The present disclosure relates to an information processing device, an information processing method, and an information processing system.
In recent years, with the emergence of moving image distribution services such as YouTube (registered trademark), moving image content covering sports, games, education, and the like is increasingly distributed with comments such as commentary and explanation added.
Patent Literature 1: JP 2018-187712 A
However, the editing work of adding comments by speech, text, or the like to moving image content is generally performed manually by the content creator, and both the cost of introducing the equipment necessary for editing and the labor cost of the editing work itself are high; it is therefore not yet the case that anyone can easily create commented moving image content.
Therefore, the present disclosure proposes an information processing device, an information processing method, and a program that enable easier creation of commented moving image content.
To solve the problems described above, an information processing device according to an embodiment of the present disclosure includes: an acquisition unit that acquires information regarding a relationship between a contributor of a content and a viewer of the content; and a comment generation unit that generates a comment to be uttered by a virtual commentator based on the information regarding the relationship.
Hereinafter, an embodiment of the present disclosure will be described in detail with reference to the drawings. Note that, in the following embodiment, the same parts are denoted by the same reference signs, and an overlapping description will be omitted.
In addition, the present disclosure will be described according to the following item order.
Hereinafter, an information processing device, an information processing method, and an information processing system according to an embodiment of the present disclosure will be described in detail with reference to the drawings. In the present embodiment, an optimal comment is automatically generated for a video of sports, games, or the like in accordance with “in what position and to whom to give a comment”. Representative examples of a service for adding a comment to a video include live broadcasting of a sports program broadcast on a television or the like, such as baseball broadcasting or soccer broadcasting. Recently, however, with the emergence of moving image distribution services using the Internet, such as YouTube (registered trademark), such services are also expanding to moving image content such as live game broadcasting and product introductions. As a result, whereas a conventional commentary in a television program is broadcast to an unspecified large number of viewers, in a moving image distribution service using the Internet, a style of making a commentary in consideration of a specific viewer has developed, such as contribution/distribution to a specific group such as friends or family, or interactive response to chats from viewers during live distribution.
Here, the meaning of “in what position and to whom to give a comment” according to the present embodiment will be described with reference to
In (1), the virtual commentator C1 expresses the enthusiasm of the player U1 by saying, for example, “Let's give it a shot!”. In (2), the virtual commentator C1 says “Let's run together!” as a team member of the player U1. In (3), the virtual commentator C1 gives support as a viewer, saying “Cheer up! Let's turn the game around!”. In this way, it can be seen that, for a single video scene, the suitable comment differs depending on the position of the virtual commentator C1.
Furthermore, regarding (3), in a case where the target to whom the comment is given is the player U1 as in (3-1), the virtual commentator calls the player U1 by name and encourages him, as in “Keep it up, George!”; in a case where the target is a viewer (friend) A1 as in (3-2), a conversation expressing an honest opinion among friends may be made, such as “Isn't George slow to make judgments?”.
As described above, unlike the broadcast commentary of a television program, when generating a comment in consideration of a contributor and a viewer, it is necessary to generate an appropriate comment by clarifying the position of the virtual commentator and the target of the comment. Here, “position” has two meanings: the first is the physical place where a person stands, and the second is the role or viewpoint that a person takes within a human relationship or society. In the present embodiment, a case where a comment is generated mainly according to the position in the second meaning is exemplified. However, the present disclosure is not limited thereto, and it is also possible to change the comment to be generated according to the position, in the first meaning, of a viewer or a virtual commentator in a 3D virtual space.
Next, an example of a screen of an application that automatically adds a comment to a moving image content (hereinafter, also simply referred to as a video) will be described. In the description, an application with which a user plays a game while recording the game on a smartphone or a game console and then shares a commented playing video (moving image content) with a friend will be exemplified. The application may be an application executed by various information processing devices, such as a smartphone application, a game console application, or a personal computer application. Furthermore, the application that automatically adds a comment to a moving image content may be an application having only a comment addition function, or may be implemented as one function of a game application, a recording application, a moving image reproduction application, a moving image distribution application, or a social networking service (SNS) application capable of handling a moving image content.
Examples of a subject who adds a comment to a moving image content include a creator of the moving image content, a contributor of the moving image content, a distributor of the moving image content, and a viewer of the moving image content.
As illustrated in
On the other hand, as illustrated in
A configuration may also be adopted in which the viewers (including the person himself/herself) U1 and A11 to A13 can additionally add comments to the moving image content G1, to which a comment has already been added by the creator, contributor, or distributor of the moving image content, based on the virtual commentator, position, and comment target that each viewer selects at the time of viewing.
First, a case where a comment is added at the time of contributing (uploading) the moving image content (see
As illustrated in
The user terminal 10 is, for example, an information processing device on a user U1 side, such as a smartphone, a personal computer, or a game console, and executes an application for generating (recording or the like) a moving image content, automatically adding a comment, or the like.
The user information retention unit 20 is, for example, a database managed on a provider side of the moving image distribution service, and retains information regarding the user U1, information regarding other users (including viewers) related to the user U1, and information regarding a relationship between the user U1 and other users (viewers and the like). In the description, these pieces of information are collectively referred to as user information.
Here, the information regarding the relationship between the user U1 and other users (viewers or the like) may include, for example, at least one of the degree of intimacy (such as a following relationship for an account) between the user U1 who is the contributor and other users (viewers or the like), a relationship between the user U1 and other users (viewers or the like) in the moving image content, and history information of other users (viewers or the like) for moving image contents contributed by the user U1 in the past. However, the present disclosure is not limited thereto, and various other types of information regarding the relationship between the user U1 and other users (viewers or the like) may be included.
The event extraction unit 30 includes, for example, an image analysis unit 31 and a speech analysis unit 32, and extracts an event from the moving image content by analyzing an image and a speech in the moving image content generated by the user terminal 10. The extraction of the event will be described in detail later.
The comment generation unit 40 includes a position/target control unit 41, and generates a comment (text data) on the event extracted by the event extraction unit 30 based on the virtual commentator, the position, and the comment target selected by the user terminal 10. In other words, the comment generation unit 40 generates a comment to be uttered by the virtual commentator based on the information regarding the relationship between the user U1 and other users (viewers or the like). The generation of the comment will be described in detail later.
The utterance control unit 50 converts the comment (text data) generated by the comment generation unit 40 into sound data by using text-to-speech (TTS), for example.
For example, the avatar generation unit 60 generates an avatar of the selected virtual commentator based on the virtual commentator selected by the user terminal 10, and then, generates motion data for operating the avatar based on the position and the comment target of the selected virtual commentator, and the sound data generated by the utterance control unit 50.
The editing/rendering unit 70 renders the moving avatar generated by the avatar generation unit 60, and superimposes video data (hereinafter, also referred to as an avatar animation) generated by the rendering on video data of the moving image content. Furthermore, the editing/rendering unit 70 superimposes the sound data generated by the utterance control unit 50 on sound data of the moving image content. The editing/rendering unit 70 may digest the moving image content by editing the moving image content based on the event extracted by the event extraction unit 30, for example. The moving image content may be digested before or after the avatar animation and the sound data are superimposed.
The distribution unit 80 distributes the commented moving image content generated by the editing/rendering unit 70 to a terminal of the viewer via a predetermined network 90. The predetermined network may be any of various networks such as the Internet, a local area network (LAN), a wide area network (WAN), or a mobile communication system (including the 4th generation mobile communication system (4G), 4G long term evolution (LTE), the 5th generation mobile communication system (5G), and the like).
First, an operation until the user U1 starts the application on the user terminal 10 and sets the virtual commentator will be described.
As illustrated in
When the user information is acquired, the user terminal 10 acquires a list (also referred to as a video list) of moving image contents such as a live-action video captured by the user U1 or a game playing video based on the acquired user information, creates the moving image content selection screen G10 illustrated in
In a case where the user U1 selects a moving image content to which a comment is to be added based on the moving image content selection screen G10 displayed on the user terminal 10 (step S103), the user terminal 10 acquires the genre (video genre) of the selected moving image content from meta information (hereinafter, referred to as video information) given to the selected moving image content. The video information may be, for example, tag information such as a title and a genre of the moving image content, a game name, or a dish name. Subsequently, the user terminal 10 acquires a genre ID of the selected moving image content by referring to the moving image content management table illustrated in
For example, in a case where the user U1 selects a moving image content of a soccer game, the genre ID is G03, and in a case where the user U1 selects a moving image content of cooking, the genre ID is G04. By referring to the character management table illustrated in
Here, the priority may be, for example, the order in which virtual commentators are prioritized for each video genre. For example, in the example illustrated in
As the priority setting rule, for example, various methods such as a method of setting characters (for example, a character who likes sports) suitable for each video genre on a rule basis, a method of setting the priority of each character based on a preference history of the user, and the like may be used.
When the user U1 selects the character of the virtual commentator by using the commentator selection screen G20 (step S105), the user terminal 10 refers to the position management table illustrated in
When the user U1 selects the position of the virtual commentator by using the position selection screen G30 (step S107), the user terminal 10 refers to the comment target management table illustrated in
In this manner, the user terminal 10 sets the position of the virtual commentator and the comment target of the comment uttered by the virtual commentator, and notifies the event extraction unit 30 of the setting content. In steps S103, S105, and S107, a case where the user U1 selects each item has been exemplified, but the present disclosure is not limited thereto. For example, the user terminal 10 may be configured to automatically select each item based on the service definition, the user's preference, or the like.
As described above, when the setting of the virtual commentator is completed, next, the event extraction unit 30 executes an operation of extracting an event from the moving image content selected by the user U1.
As illustrated in
Next, the event extraction unit 30 extracts an event from the moving image content by inputting the moving image content to the acquired recognition model (step S112). More specifically, the event extraction unit 30 extracts an event by analyzing features of the video, movement of a person or a ball recognized by the recognition model, feature points extracted from the image data such as an on-screen data indicator (a score or the like), keywords recognized from the sound data, and the like. The event according to the present embodiment may be defined for each small genre such as “baseball” or “soccer”, or for each specific game name, instead of for a large genre such as “sports”. For example, in a case of a soccer game, an event ID=E001 is “goal” and an event ID=E002 is “shot”; that is, a representative technique, score, foul, or the like may be defined as the event. In addition, the extracted event may include a parameter indicating whether the extracted event is an event of the own team or an event of the opponent team. Furthermore, information such as the name and uniform number of the player who caused each event, the ball position, and the total score may be acquired as a part of the event.
When the event is extracted as described above, the event extraction unit 30 generates the event data together with a time code indicating a time at which the event occurs in the moving image content, and adds the generated event data to the event data list illustrated in
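The event data handled here can be pictured as a small record carrying an event ID, a time code, and parameters. The following is a minimal sketch in Python; the names (Event, on_event_recognized) and the sample values are illustrative assumptions, not identifiers from the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    event_id: str        # e.g., "E001" (goal), "E002" (shot)
    time_code: float     # seconds from the start of the moving image content
    params: dict = field(default_factory=dict)  # own/opponent team, player, score, ...

event_data_list: list[Event] = []

def on_event_recognized(event_id: str, time_code: float, **params) -> None:
    """Append a newly extracted event to the event data list."""
    event_data_list.append(Event(event_id, time_code, dict(params)))

# Example: a goal by the own team 83 seconds into the video.
on_event_recognized("E001", 83.0, team="own", player="George", score="2-1")
```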
Then, the event extraction unit 30 repeatedly performs the event extraction operation as described above up to the end of the moving image content (NO in step S114).
As described above, when the event data list is created based on the moving image content, the comment generation unit 40 next performs an operation of generating a comment (text data) for each event based on the virtual commentator selected by the user U1 and the position and comment target of the virtual commentator.
As illustrated in
Here, in a case where the events occur successively, it may be difficult to add a comment to all the events. In such a case, as in step S122, the comment generation unit 40 may perform filtering on the event data list. For example, the comment generation unit 40 may obtain a time interval between the events based on the time code of the event data list, and for an event that occurs within a predetermined time interval (for example, 15 seconds) (NO in step S122), the comment generation unit 40 returns to step S121 without generating a comment, and acquires an event of the next time code. At this time, priorities (for example, a goal is prioritized over a foul) may be assigned to the events, and events with low priorities may be preferentially excluded. However, the present disclosure is not limited thereto, and a configuration may be adopted in which events are distributed to a plurality of virtual commentators without filtering, and the respective virtual commentators utter comments for the distributed events.
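The interval-and-priority filtering of step S122 might look like the following sketch, reusing the illustrative Event record from the earlier example; the 15-second interval follows the text, while the priority values are assumptions.

```python
MIN_INTERVAL = 15.0                  # seconds, as in the text's example
PRIORITY = {"E001": 2, "E002": 1}    # assumed: a goal outranks a shot

def filter_events(events):
    """Drop events that occur within MIN_INTERVAL of the last kept event,
    preferring the higher-priority one when two events collide (step S122)."""
    kept = []
    for ev in sorted(events, key=lambda e: e.time_code):
        if kept and ev.time_code - kept[-1].time_code < MIN_INTERVAL:
            if PRIORITY.get(ev.event_id, 0) > PRIORITY.get(kept[-1].event_id, 0):
                kept[-1] = ev        # replace the lower-priority neighbor
            continue
        kept.append(ev)
    return kept
```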
Next, the comment generation unit 40 acquires a comment list corresponding to the position of the virtual commentator for each event by referring to the position comment list illustrated in
Furthermore, the comment generation unit 40 extracts the comments used the past n times (n is an integer of 1 or more) for the same user U1 by referring to the used-comment history for the comment list acquired in step S123 (step S124). The comments used for the same user U1 in the past may be either or both of comments used for a moving image content different from the one currently being processed and comments used for the same moving image content.
Next, the comment generation unit 40 selects one of the comments (hereinafter, a comment corresponding to the position is referred to as a position comment) obtained by excluding the n comments acquired in step S124 from the comment list acquired in step S123 (step S125). In step S125, for example, the position comment may be randomly selected using a pseudo random number or the like, or may be selected according to the order of the comment list from which the past n comments have been excluded. By controlling comments in this way so that the same comment is not repeated or used too frequently, based on the history of comments used in the past, it is possible to prevent the user U1 or the viewer from becoming bored with the comments of the virtual commentator. In step S124, the event and the comments may each be vectorized, and the comment may be selected from the candidates closest to the event.
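As one possible reading of steps S123 to S125, the selection with history exclusion could be sketched as follows; the function name, the default n=3, and the fallback behavior when every candidate was used recently are assumptions.

```python
import random

def select_position_comment(comment_list, used_history, n=3):
    """Pick one position comment, excluding the n most recently used
    comments for this user (steps S123-S125)."""
    recent = set(used_history[-n:])
    candidates = [c for c in comment_list if c not in recent]
    if not candidates:               # every candidate was used recently;
        candidates = comment_list    # fall back rather than stay silent
    choice = random.choice(candidates)   # pseudo-random selection (step S125)
    used_history.append(choice)          # record for the next exclusion
    return choice
```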
Next, the comment generation unit 40 analyzes the position comment selected in step S125 by performing morphological analysis or the like (step S126). Subsequently, the comment generation unit 40 omits an event name included in the position comment (step S127). This is because, in a case where it is assumed that the viewer recognizes the occurrence of an event from the video, it is possible to make a comment with a better tempo by omitting the event name. However, in a case where the virtual commentator is a character such as an announcer that needs to speak correctly, omission of the event name (step S127) may be skipped.
Next, the comment generation unit 40 adds an exclamation to the position comment (step S128). For example, an exclamation expressing expectation such as “Here it comes!” is added to a shooting scene of the own team, and an exclamation indicating disappointment such as “Oh . . . ” is added to a goal scene of the opponent team, so that it is possible to encourage empathy of the comment target.
Next, the comment generation unit 40 acquires a proper noun such as a player name, a team name, or a nickname, a pronoun, or the like from the moving image content or the user information retention unit 20, and adds the acquired proper noun or pronoun to the position comment (step S129). This makes it possible to give a feeling of closeness to the comment target.
Next, the comment generation unit 40 converts the ending of the position comment into one suitable for a call to the target (step S130). For example, “although <event>, I will cheer for you” of the “position comment” is converted into “Let's all cheer for George!”. As a result, it is possible to make a more natural call to the comment target person. In a case of a language, like English, in which an expression of the relationship between the users (for example, the presence or absence of “please”) appears at a position other than the end of a sentence, the processing of step S130 may be executed as conversion of a portion other than the ending.
As described above, in steps S127 to S130, the comment generation unit 40 corrects the generated position comment based on the information regarding the relationship between the user U1 and other users (viewers or the like).
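To make the flow of steps S126 to S130 concrete, the following sketch chains the corrections as simple string transformations; a real implementation would rely on morphological analysis, and every helper here is an illustrative placeholder.

```python
def omit_event_name(comment, event_name):          # step S127
    return comment.replace(event_name, "").strip()

def add_exclamation(comment, positive):            # step S128
    return ("Here it comes! " if positive else "Oh... ") + comment

def add_proper_noun(comment, name):                # step S129
    return comment.replace("<player>", name)

def convert_ending(comment, target):               # step S130 (hypothetical rule)
    return comment.rstrip(".") + (f", {target}!" if target else "!")

comment = "shot: <player> goes for it"
comment = omit_event_name(comment, "shot:")
comment = add_exclamation(comment, positive=True)
comment = add_proper_noun(comment, "George")
print(convert_ending(comment, "everyone"))
# -> "Here it comes! George goes for it, everyone!"
```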
Next, the comment generation unit 40 adds the event ID, the parameter, and the time code to the position comment and registers the resulting position comment in the target comment list illustrated in
Then, the comment generation unit 40 repeatedly performs this operation (NO in step S132) until the generation of the comment for all the pieces of event data is completed (YES in step S132).
In this example, the position comment is generated for each event after the event data list is created based on the entire moving image content, but the present disclosure is not limited thereto, and for example, the position comment may be generated every time an event is extracted from the moving image content.
Here, a modified example of the ending conversion described in step S130 of
As illustrated in
In step S1302, the comment generation unit 40 acquires a dictionary corresponding to the attribute specified in step S1301. This is because, in a case where the selected position and comment target have the same attribute, it is desirable for the comment to use popular words frequently used in the community, technical terms, and the like, and thus to use a dictionary corresponding to the attribute. The dictionary may contain fad words, buzzwords, technical terms, or the like, or may be a corpus collected from a community social networking service (SNS) or the like. Furthermore, the dictionary may be retained in advance by the comment generation unit 40, or information on a network such as the Internet may be searched in real time and used as a dictionary.
Next, if there is a word corresponding to a community language, the comment generation unit 40 replaces a word in the position comment with the word corresponding to the community language (step S1303).
Next, the comment generation unit 40 checks the hierarchical relationship between the selected position and comment target by referring to the hierarchical relationship management table illustrated in
In a case where the level of the position of the virtual commentator is higher than that of the comment target (“position>target” in step S1304), the comment generation unit 40 converts the ending of the position comment into an imperative tone (step S1305); in a case where the levels of the position and the comment target are the same (“position=target” in step S1304), it converts the ending into a familiar tone (step S1306); and in a case where the level of the position is lower than that of the comment target (“position<target” in step S1304), it converts the ending into a polite tone (step S1307). Thereafter, the comment generation unit 40 returns to the operation illustrated in
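A minimal sketch of this three-way branch, assuming illustrative level values for the hierarchical relationship management table and placeholder English endings, could look as follows.

```python
# Assumed levels for a hierarchical relationship management table.
LEVEL = {"coach": 3, "team member": 2, "friend": 2, "viewer": 1}

def convert_ending_by_hierarchy(comment, position, target):
    p, t = LEVEL[position], LEVEL[target]
    if p > t:        # step S1305: imperative tone
        return comment + " Go for it!"
    elif p == t:     # step S1306: familiar tone
        return comment + " Let's do this!"
    else:            # step S1307: polite tone
        return comment + " I hope you will keep it up."

print(convert_ending_by_hierarchy("Nice shot.", "coach", "team member"))
# -> "Nice shot. Go for it!"
```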
Next, an example of an operation of generating a commented moving image content by using the position comment generated as described above and an operation of distributing the generated commented moving image content will be described.
As illustrated in
Furthermore, the avatar generation unit 60 executes avatar generation processing of generating the avatar animation of the virtual commentator (step S150).
Next, the editing/rendering unit 70 executes editing/rendering processing of generating the commented moving image content based on the moving image content, the comment voice, and the avatar animation (step S160).
The commented moving image content generated in this way is distributed from the distribution unit 80 to the user U1 and the viewer via the predetermined network 90 (step S171), and then this operation ends.
Here, the avatar generation processing described in step S150 of
As illustrated in
Then, the avatar generation unit 60 creates an animation of moving the avatar selected in step S1501 in accordance with an utterance section of the comment voice (step S1503), and returns to the operation illustrated in
Furthermore, the editing/rendering processing described in step S160 of
As illustrated in
Next, the editing/rendering unit 70 arranges the comment voice and the avatar animation on the moving image content in accordance with the time code (step S1602), and generates the commented moving image content by rendering the moving image content on which the comment voice and the avatar animation are arranged (step S1603). The text data of the position comment may be arranged as a subtitle in the moving image content.
The commented moving image content generated in this manner may be distributed to the user U1 in advance via the distribution unit 80 for confirmation by the user U1 (player) before being distributed to the viewer (step S164). Then, when the confirmation (distribution approval) by the user U1 is obtained via a communication unit (not illustrated) or the like (step S165), this operation returns to the operation illustrated in
The position selected in step S107 and the comment target selected in step S109 in
In
At time T1, the own team has scored and the ball possession rate of the team is also high, so that the comment target remains “friend”. Therefore, as illustrated in
At time T2, the ball possession rate is lower than a preset threshold (for example, 50%). The position/target control unit 41 of the comment generation unit 40 switches the comment target from “friend” to “team member” after giving a comment “Oh no, isn't it getting worse? I'll go give them some encouragement” to the friend at a timing (time T2) when the ball possession rate falls below the threshold. Therefore, the comment generation unit 40 operates to give a comment to the team member during a period in which the comment target is “team member”.
When the team concedes a goal at time T3, the virtual commentator gives an encouraging comment to the team members, such as “A team (team name), don't worry! Your movement is not bad at all”.
Thereafter, when the ball possession rate exceeds the threshold at time T5, the position/target control unit 41 switches the comment target from “team member” to “friend”. Therefore, the comment generation unit 40 operates to generate a comment to be given to the friend after time T5.
As illustrated in
Next, in a case where the ball possession rate of the own team acquired in step S2202 falls below the threshold set in step S2201 (“lower” in step S2203), the position/target control unit 41 switches the comment target to “team member” and returns to the operation illustrated in
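The switching behavior of steps S2201 to S2204 can be pictured as a small state machine; in the sketch below, the possession rate is assumed to be sampled per scene by some recognition process outside the snippet.

```python
THRESHOLD = 0.5  # step S2201: preset threshold (50%)

class TargetController:
    """Corresponds to the position/target control unit 41 in this example."""
    def __init__(self):
        self.comment_target = "friend"

    def update(self, possession_rate: float) -> str:
        if possession_rate < THRESHOLD and self.comment_target == "friend":
            self.comment_target = "team member"   # game is getting worse
        elif possession_rate >= THRESHOLD and self.comment_target == "team member":
            self.comment_target = "friend"        # game has recovered
        return self.comment_target

ctrl = TargetController()
for rate in (0.62, 0.48, 0.45, 0.57):  # possession rate sampled per scene
    print(ctrl.update(rate))           # friend, team member, team member, friend
```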
In
The comment generation unit 40 according to the present embodiment may change a comment to be generated based on an emotion value, for example, in addition to the event and the position.
As classification of emotions, there are various models such as Russell's model and Plutchik's model, but in the present embodiment, a simple positive/negative emotion is applied. In the present embodiment, the emotion value is mapped to a numerical value between 0 and 1, where 1 is defined as the most positive emotion and 0 is defined as the most negative emotion.
As illustrated in
The amount of change in emotion value illustrated in
As illustrated in
As illustrated in
Next, the comment generation unit 40 determines whether or not the emotion value a of the comment target is smaller than a threshold (for example, defined as 0.3) of a negative state (step S303), and in a case where the emotion value a is smaller than the threshold (YES in step S303), the comment generation unit 40 sets “encouragement” as a search tag for the position comment list illustrated in
Furthermore, in a case where the emotion value a of the comment target is equal to or larger than the threshold of the negative state (NO in step S303), the comment generation unit 40 determines whether or not an absolute value (|a-b|) of a difference between the emotion values of the comment target and the virtual commentator is larger than a threshold (for example, defined as 0.3) (step S304). In a case where the absolute value of the difference between the emotion values is larger than the threshold (YES in step S304), the comment generation unit 40 sets “empathy” and “sarcasm” as the search tags (step S306), and proceeds to step S122.
On the other hand, in a case where the absolute value of the difference between the emotion values is equal to or smaller than the threshold (NO in step S304), the comment generation unit 40 sets “empathy” as the search tag (step S307), and proceeds to step S122.
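The branch of steps S303 to S307 reduces to a short tag-selection function. The sketch below follows the thresholds named in the text (0.3 in both cases); the function name is an assumption.

```python
NEGATIVE_THRESHOLD = 0.3   # step S303: negative-state threshold
GAP_THRESHOLD = 0.3        # step S304: emotion-gap threshold

def search_tags(a: float, b: float) -> list[str]:
    """a: emotion value of the comment target; b: emotion value of the
    virtual commentator; both in [0, 1]."""
    if a < NEGATIVE_THRESHOLD:          # target is in a negative state
        return ["encouragement"]        # step S305
    if abs(a - b) > GAP_THRESHOLD:      # emotions diverge strongly
        return ["empathy", "sarcasm"]   # step S306
    return ["empathy"]                  # step S307

print(search_tags(0.2, 0.5))  # ['encouragement']
print(search_tags(0.9, 0.4))  # ['empathy', 'sarcasm']
print(search_tags(0.6, 0.5))  # ['empathy']
```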
Next, after executing steps S122 to S124, the comment generation unit 40 selects one of the position comments to which the search tag set in step S305, S306, or S307 is attached among the position comments obtained by excluding the comments of n times acquired in step S124 from the comment list acquired in step S123 (step S325). In step S325, similarly to step S125, for example, the position comment may be randomly selected using a pseudo random number or the like, or the position comment may be selected according to the order of the comment list from which the past comments of n times are excluded.
Thereafter, the comment generation unit 40 creates the target comment list by performing the same operation as steps S126 to S132 illustrated in
In this way, by limiting the selection range of the position comment based on the emotion value, in other words, by narrowing down the candidates of the position comment using the search tag set based on the emotion value, it is possible to generate a comment that better reflects the emotion of a player, a friend, or the like. For example, by setting “encouragement” as the search tag, it is possible to generate a comment encouraging a player who is disappointed due to the negative state, like “Don't worry, it's just the beginning. You can make a comeback!”. In addition, by setting “empathy” as the search tag, it is possible to generate a comment empathizing with a disappointed player, like “It's tough, I know how you feel!”. Furthermore, by setting both “empathy” and “sarcasm” as the search tags, it is also possible to generate a comment representing a complicated psychological state. For example, from the standpoint of a friend who is actually bored by monotonous play, it is possible to express sarcasm toward a player who keeps scoring in a very positive state, like “Impressive. Is it because of the difference in PC specs, perhaps?”: although the comment is ostensibly praising, it also implies that the success may be due to the difference in machines. In this way, it is possible to generate a more human comment by generating a comment based on the relationship between emotions.
The definition of the emotion value for the event illustrated in
Next, a case where a comment is added at the time of viewing (downloading) the moving image content (see
In a case of adding a comment at the time of viewing (downloading) the moving image content, it is possible to generate a comment better reflecting the taste and situation of the viewer than in a case of adding a comment at the time of contributing (uploading) the moving image content. The selection of the moving image content to be viewed by the viewer may be performed, for example, by the viewer starting the application on the terminal and using the moving image content selection screen G10 (see
As illustrated in
Similarly to the user terminal 10, the user terminal 110 is, for example, an information processing device on the viewer A10 side, such as a smartphone, a personal computer, or a game console, and executes an application for reproducing the moving image content, automatically adding a comment, or the like. Furthermore, the user terminal 110 includes a communication unit (not illustrated), and acquires the user information retained by the user information retention unit 20 and the moving image content to which a comment is to be added via the communication unit.
When receiving selection of the moving image content to which a comment is to be added sent from the user terminal 110, the setting unit 120 sets the position and the comment target of the virtual commentator based on the video information of the selected moving image content and the user information of the user U1 and the viewer A10 acquired from the user information retention unit 20. In other words, in this example, the setting unit 120 can function as the acquisition unit that acquires the information regarding a relationship between the user U1 who is the contributor of the moving image content and the viewers of the moving image content. Furthermore, the setting unit 120 sets the position of the virtual commentator and the comment target of the comment uttered by the virtual commentator, and notifies the event extraction unit 30 of the setting content. As described in
Other configurations may be similar to the system configuration example of the information processing system 1 illustrated in
An operation flow in a case of adding a comment at the time of viewing (downloading) may be basically similar to the operation flow in a case of adding a comment at the time of contributing (uploading) the moving image content described above with reference to
Furthermore, in the description, as described above, a case where the degree of intimacy is recognized from the information regarding the viewer A10 and the moving image contributor (the user U1 in the description), and the character, the position, and the comment target of the virtual commentator are automatically set is exemplified. Therefore, the operation illustrated in
As illustrated in
When the user information is acquired, the user terminal 110 acquires a list (video list) of moving image contents that can be viewed by the viewer A10 based on the acquired user information, creates the moving image content selection screen G10 illustrated in
When the viewer A10 selects the moving image content to which the comment is to be added based on the moving image content selection screen G10 displayed on the user terminal 110 (step S403), the user terminal 110 notifies the setting unit 120 of the selected moving image content. The setting unit 120 acquires the genre (video genre) of the selected moving image content from the meta information (video information) given to the moving image content sent from the user terminal 110, and acquires the genre ID of the selected moving image content by referring to the moving image content management table illustrated in
Next, the setting unit 120 acquires the user information of the user U1 who has uploaded the moving image content selected in step S403 to the cloud 100 (see
Next, the setting unit 120 acquires the degree of intimacy with the viewer A10 from the name of the user U1 included in the user information of the user U1, the service ID, and the like (step S406). For example, the degree of intimacy may be set based on information such as whether or not the user U1 has played an online game together, whether or not the user U1 is registered as a friend, whether or not the viewer A10 has viewed a moving image content contributed by the user U1 in the past, and whether or not the viewer A10 and the user U1 have had a conversation on an SNS or the like. For example, in a case where the user U1 is set as a friend who plays an online game together, or in a case where the user U1 and the viewer A10 have had a conversation within one month on the SNS, the degree of intimacy may be set to 3 which is the highest level, and in a case where there is no direct conversation but the viewer A10 has viewed moving image contents uploaded by the contributor (user U1) three or more times in the past, the degree of intimacy may be set to 2, which is the second highest level, and in other cases, the degree of intimacy may be set to 1, which is the lowest level.
The setting unit 120 sets the position and the comment target defined on a rule basis or the like based on the degree of intimacy acquired in step S406. Specifically, in a case where the degree of intimacy between the viewer and the contributor is level 3 (“3” in step S406), the setting unit 120 determines that the viewer and the contributor are familiar with each other, sets the position of the virtual commentator to a team member, and sets the comment target to the contributor (player=user U1) (step S407). Furthermore, in a case where the degree of intimacy between the viewer and the contributor is level 2 (“2” in step S406), the setting unit 120 determines that the viewer and the contributor are not as familiar as team members, sets the position of the virtual commentator to a friend, and sets the comment target to the contributor (player=user U1) (step S408). On the other hand, in a case where the degree of intimacy between the viewer and the contributor is level 1 (“1” in step S406), the setting unit 120 sets the position of the virtual commentator to a friend and sets the comment target to a friend (step S408), thereby selecting a style of watching as observers.
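Steps S406 to S408 can be read as two rule-based mappings, first from intimacy signals to a level and then from the level to a position/target pair. The sketch below uses the illustrative signals mentioned in the text; the function names are assumptions.

```python
def intimacy_level(is_friend: bool, chatted_within_month: bool,
                   past_views: int) -> int:
    """Map intimacy signals between viewer and contributor to a level (step S406)."""
    if is_friend or chatted_within_month:
        return 3
    if past_views >= 3:
        return 2
    return 1

def position_and_target(level: int):
    """Map the intimacy level to (position, comment target) (steps S407-S408)."""
    if level == 3:
        return ("team member", "contributor")  # familiar with each other
    if level == 2:
        return ("friend", "contributor")
    return ("friend", "friend")                # observers watching together

lvl = intimacy_level(is_friend=False, chatted_within_month=False, past_views=4)
print(lvl, position_and_target(lvl))  # 2 ('friend', 'contributor')
```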
The degree of recognition for the moving image content may be acquired from the viewing history of the viewer A10, and in a case where the moving image content is a moving image content that is viewed for the first time, an explanation for a beginner, a basic rule, or the like may be preferentially provided as comments, and a more advanced content may be included when the viewer becomes accustomed to the moving image content. Furthermore, the comment may be changed according to a sensed emotional arousal level of the viewer A10. For example, in a case where the viewer A10 is relaxed, a comment or a comment voice having a calm content and tone may be generated, and in a case where the viewer A10 is excited, a comment or a comment voice having a more intensified content and tone may be generated. In addition, the degree of comfort and the degree of discomfort of the sensed emotion of the viewer A10 may be learned, and the result may be used to generate a comment or a comment voice.
As described above, it is possible to generate a comment according to the situation at each viewing. For example, in the first viewing of a video of a game in which the own team suffered a crushing defeat, a comment “So frustrating, I will practice more!” is generated by the virtual commentator whose position is “player” for the comment target “friend”; in a second viewing half a year later, the number of subsequent wins, the number of trophies acquired, and the like are acquired from the activity information of the player, improvement is detected, and a retrospective comment such as “I owe my current self to the intense practice that followed the frustration” can be generated.
Next, an example of generating conversation comments of two virtual commentators for real-time moving image distribution (hereinafter, also referred to as live distribution) will be described. In the description, it is assumed that the way of conversation between the two virtual commentators is, for example, that a commentator first comments on “what event has occurred” in response to the occurrence of the event, and then an expositor explains the event.
Next, an operation example in a case of generating conversation comments of two virtual commentators for real-time moving image distribution will be described.
As illustrated in
Meanwhile, a screen for selecting a video genre to be distributed live is displayed on the user terminal 10. When the user U1 selects the video genre based on this selection screen (step S502), a notification of the selected video genre is made from the user terminal 10 to the setting unit 120. The selection screen may have various modes such as an icon mode and a menu mode.
Next, the setting unit 120 automatically selects the characters and the positions of the two virtual commentators based on, for example, the video genre sent from the user terminal 10, the user information of the user U1 acquired from the user information retention unit 20, and the like (step S503). In this example, since it is assumed that “sports 1” in
Furthermore, the setting unit 120 automatically selects the comment target similarly based on the video genre, the user information, and the like (step S504). In this example, in which “sports 1” is assumed as the video genre, for example, “viewer” is automatically set as the comment target.
Next, as in step S111 of
When the preparation for the live distribution is completed in this way, next, the user U1 starts the live distribution of the moving image content by operating the user terminal 10 or an imaging device connected thereto (step S506). When the live distribution is started, the captured or taken-in video data (hereinafter, referred to as the moving image content) is sequentially output from the user terminal 10. At this time, the moving image content may be transmitted by a streaming method.
The moving image content transmitted from the user terminal 10 is input to the event extraction unit 30 directly or via the setting unit 120. The event extraction unit 30 extracts an event from the moving image content by inputting the moving image content to the recognition model acquired in step S505, similarly to step S112 in
Meanwhile, the comment generation unit 40 periodically confirms whether or not the event data has been input (step S508). In a case where no event data has been input (NO in step S508), this operation proceeds to step S540. On the other hand, in a case where the event data has been input (YES in step S508), the comment generation unit 40 determines whether or not a predetermined time (for example, 30 seconds) or more has elapsed from a time when the immediately previous event data has been input (step S509). When the predetermined time or more has elapsed (YES in step S509), the comment generation unit 40 proceeds to step S520. In step S122 and the like of
On the other hand, in a case where the next event data has been input before the predetermined time has elapsed (NO in step S509), the comment generation unit 40 determines whether or not the event data is event data of an event with a high priority (step S510). In a case where the event data is event data of an event with a high priority (YES in step S510), in order to interrupt the utterance for the previous event, the comment generation unit 40 notifies the editing/rendering unit 70 of a request (utterance interruption/stop request) to interrupt or stop the utterance currently being executed or about to be executed by one of the virtual commentators (step S511), and proceeds to step S520. In a case where the event data is not event data of an event with a high priority (NO in step S510), the comment generation unit 40 discards the input event data, and this operation proceeds to step S540. For example, in a case where the moving image content is a soccer game playing video, the priority of an event such as “passed the ball” is low, but the priority of an event such as “goal” is high. The priority of each event may be set in the parameter in the event management table illustrated in
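The gating logic of steps S508 to S511 might be sketched as follows; the 30-second gap follows the text, and the set of high-priority event IDs is an assumption standing in for the parameter in the event management table.

```python
MIN_GAP = 30.0             # seconds between commented events
HIGH_PRIORITY = {"E001"}   # e.g., "goal" interrupts, "passed the ball" does not

class LiveEventGate:
    def __init__(self):
        self.last_commented = -MIN_GAP

    def handle(self, event_id: str, now: float) -> str:
        if now - self.last_commented >= MIN_GAP:
            self.last_commented = now
            return "comment"                   # proceed to step S520
        if event_id in HIGH_PRIORITY:
            self.last_commented = now
            return "interrupt_then_comment"    # step S511, then step S520
        return "discard"                       # low priority, too soon

gate = LiveEventGate()
print(gate.handle("E002", now=0.0))    # comment
print(gate.handle("E002", now=10.0))   # discard
print(gate.handle("E001", now=12.0))   # interrupt_then_comment
```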
In step S520, the comment generation unit 40 generates the position comment to be uttered by one of the virtual commentators based on the input event data.
In step S530, the utterance control unit 50 converts text data of the position comment generated by the comment generation unit 40 into the sound data.
In step S540, the avatar generation unit 60 generates the avatar animation in which the avatar of the virtual commentator moves according to the sound data generated by the utterance control unit 50. The operation example of the avatar generation processing may be similar to the operation example described above with reference to
In step S550, the editing/rendering unit 70 generates a commented moving image content from the selected moving image content, the sound data, and the avatar animation.
When the commented moving image content is generated in this manner, the generated commented moving image content is distributed live from the distribution unit 80 via the predetermined network 90 (step S512).
Thereafter, for example, a control unit (not illustrated) in the cloud 100 determines whether or not to end distribution (step S513), and in a case where the control unit determines not to end the distribution (NO in step S513), the operation returns to step S507. On the other hand, in a case where the control unit determines to end the distribution (YES in step S513), this operation ends.
1.10.1 Example of comment generation processing
Here, the comment generation processing described in step S520 of
As illustrated in
Next, the comment generation unit 40 refers to the used comment history for the comment list acquired in step S5201, and selects one of the position comments obtained by excluding the past comments of n times from the comment list acquired in step S5201 (step S5202). In step S5202, similarly to step S125 of
Similarly, the comment generation unit 40 first acquires the comment list for the position of “commentator” with respect to the comment data sequentially input from the event extraction unit 30 (step S5203), and selects one of the position comments obtained by excluding the past comments of n times based on the comment history used for the acquired comment list (step S5204).
Next, as in steps S126 to S131 of
Furthermore, the utterance control processing described in step S530 of
As illustrated in
Next, the utterance control unit 50 converts each position comment into the sound data (comment voice) by performing voice synthesis processing using TTS on the text data of each position comment of “commentator” and “expositor” (step S5303).
Next, the utterance control unit 50 acquires an utterance time of each of the comment voice of “commentator” and the comment voice of “expositor” (step S5304).
Next, the utterance control unit 50 sets an utterance start time (time code) of the expositor to a time point after the lapse of the utterance time of the commentator from an utterance start time (time code) of the commentator (step S5305), and updates the target comment list at the updated utterance start time of the expositor (step S5306), so that the utterance of the virtual commentator of the expositor is started after the end of the utterance of the virtual commentator of the commentator, as illustrated in
Thereafter, the utterance control unit 50 stores voice files of the comment voice of “commentator” and the comment voice of “expositor” (step S5307), and returns to the operation illustrated in
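The scheduling of steps S5304 to S5306 amounts to shifting the expositor's time code by the commentator's utterance time. In the sketch below, the utterance durations are passed in directly as an assumption, whereas the real system would measure them from the synthesized voice files.

```python
def schedule_utterances(event_time_code, commentator_sec, expositor_sec):
    """Return (start, end) time codes for the two comment voices, starting
    the expositor right after the commentator finishes (step S5305)."""
    commentator = (event_time_code, event_time_code + commentator_sec)
    expositor = (commentator[1], commentator[1] + expositor_sec)
    return commentator, expositor

c, e = schedule_utterances(event_time_code=83.0,
                           commentator_sec=2.0, expositor_sec=4.0)
print(c, e)  # (83.0, 85.0) (85.0, 89.0)
```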
Furthermore, the editing/rendering processing illustrated in step S550 of
As illustrated in
In step S5503, the editing/rendering unit 70 acquires the comment voice from the utterance control unit 50 and acquires the avatar animation from the avatar generation unit 60. Subsequently, similarly to steps S1602 and S1603 of
Next, an example of generating a comment according to a feedback from the viewer during real-time moving image distribution will be described.
As illustrated in
In step S601, the comment generation unit 40 determines whether or not a predetermined time (for example, 90 seconds) or more has elapsed from a time when the immediately previous event data has been input. In a case where the predetermined time or more has not elapsed (NO in step S601), the comment generation unit 40 proceeds directly to step S530. On the other hand, in a case where the predetermined time or more has elapsed (YES in step S601), the comment generation unit 40 performs viewer feedback (step S610), and then proceeds to step S530.
Here, in this example, a function in which each viewer (which may include a contributor (a player himself/herself)) can send a comment in a chat or the like of a distribution service as a feedback from a viewer is assumed.
As illustrated in
Next, the comment generation unit 40 determines whether or not there is a feedback target matching the target (a person himself/herself, a team member, a friend, or the like) set as the comment target (step S6104), and in a case where there is no matching target (NO in step S6104), the comment generation unit 40 returns to the operation illustrated in
Next, the comment generation unit 40 acquires a list of comments in which the position is “expositor” from the position comment list illustrated in
Next, the comment generation unit 40 generates the position comment to be actually uttered by the virtual commentator by performing the operation similar to steps S5204 to S5209 of
Next, in order to control the event interval in steps S510 and S601, the comment generation unit 40 counts the position comment registered in the target comment list in step S6114 as one of the events (step S6115), and returns to the operation illustrated in
As illustrated in
Meanwhile, a screen for selecting a video genre to be viewed live is displayed on the user terminal 110. When the viewer A10 selects the video genre based on this selection screen (step S702), a notification of the selected video genre is made from the user terminal 110 to the setting unit 120. The selection screen may have various modes such as an icon mode and a menu mode.
Next, the setting unit 120 automatically selects the character of the virtual commentator based on, for example, the video genre sent from the user terminal 110, the user information of the viewer A10 acquired from the user information retention unit 20, and the like, and sets “viewer” as the position (step S703).
Next, as in step S111 of
When the preparation for the live viewing is completed in this manner, next, the distribution unit 80 starts distributing a real-time video to the user terminal 110 (step S705). When the live viewing is started, the real-time video (moving image content) is sequentially output from the distribution unit 80 to the user terminal 110. At this time, the moving image content may be transmitted by a streaming method.
The number of viewers participating in the live distribution, the user information of the player and the viewers, and the like are managed by the setting unit 120, for example. The setting unit 120 manages the number of viewers who are currently viewing the live distribution and determines whether or not the number of viewers has increased or decreased (step S706); in a case where there is no increase or decrease in the number of viewers (NO in step S706), the operation proceeds to step S709.
In a case where there is an increase or decrease in the number of viewers (YES in step S706), the setting unit 120 performs display adjustment for the virtual commentators (step S707). Specifically, in a case where a new viewer joins, the setting unit 120 adjusts the setting in such a way that the virtual commentator of the newly joined viewer is additionally displayed in the real-time video (moving image content) distributed to the viewer A10. Furthermore, in a case where a viewer leaves the live distribution, the setting unit 120 adjusts the setting in such a way that the virtual commentator of the viewer who has left is no longer displayed in the real-time video for the viewer A10. At this time, a specific animation may be displayed, for example, the virtual commentator of the joining or leaving viewer enters or exits through a door that appears on the screen. For example, while the viewer A10 is cheering for team A, a virtual commentator supporting the opponent team B may appear, which leads to a spirited competition of cheering and intensifies the battle. Furthermore, when a friend of the viewer A10 newly joins the live distribution, the friend's virtual commentator appears, which can further intensify the cheering or the battle.
Next, the setting unit 120 adjusts the comment target of the virtual commentator related to the viewer A10 based on the increase or decrease in number of virtual commentators (step S708). For example, in a case where the number of virtual commentators increases, the increased virtual commentators are added as the comment targets, and in a case where the number of virtual commentators decreases, the reduced virtual commentators are deleted from the comment targets. Then, the setting unit 120 sequentially inputs the adjusted virtual commentator and the comment target thereof to the event extraction unit 30 and the avatar generation unit 60.
Next, the comment generation unit 40 acquires the position comment generated for the virtual commentator of another viewer (hereinafter, referred to as a virtual commentator B for convenience of description) (step S709), and determines whether or not the target (listener) of the position comment is the virtual commentator of the viewer A10 (hereinafter, referred to as the virtual commentator A for convenience of description) (step S710).
In a case where the virtual commentator A is the target (YES in step S710), the comment generation unit 40 generates the position comment for the virtual commentator B (step S712) by setting the target (listener) of the comment to be uttered by the virtual commentator A to the virtual commentator B who has spoken to (step S711), and proceeds to step S715. On the other hand, in a case where the virtual commentator A is not the target (NO in step S710), the comment generation unit 40 sets the target of the comment to the viewer (step S713), generates the position comment for the viewer on an event basis (step S714), and proceeds to step S715.
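The target selection of steps S709 to S714 can be condensed into a small decision function; the tuple representation of a position comment's speaker and listener is an assumption for illustration.

```python
def choose_target(incoming_comment, my_commentator="A"):
    """incoming_comment: (speaker, target) of the latest position comment
    from another virtual commentator, or None if nobody has spoken."""
    if incoming_comment and incoming_comment[1] == my_commentator:
        speaker, _ = incoming_comment
        return speaker        # step S711: reply to the commentator who spoke
    return "viewer"           # step S713: default to commenting for the viewer

print(choose_target(("B", "A")))   # 'B'  -> generate a reply to commentator B
print(choose_target(("B", "C")))   # 'viewer'
print(choose_target(None))         # 'viewer'
```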
As in steps S530 to S550 and S512 of
Thereafter, for example, a control unit (not illustrated) in the cloud 100 determines whether or not to end distribution (step S719), and in a case where the control unit determines not to end the distribution (NO in step S719), the operation returns to step S706. On the other hand, in a case where the control unit determines to end the distribution (YES in step S719), this operation ends.
In this example, for example, in a case where no event has occurred for a predetermined period (for example, 90 seconds), the virtual commentator A may actively speak to another commentator, for example, in a manner in which the virtual commentator A generates and inserts a comment for speaking to another virtual commentator B. Furthermore, regarding utterance timings of a plurality of virtual commentators, in the example illustrated in
Furthermore, the display position of the virtual commentator on the user terminal 10/110 may be adjusted according to the position. In other words, a virtual position may be set for the virtual commentator according to the position.
For example, when the moving image content is a two-dimensional image, the virtual position may be within a region of the image or may be outside the region. In addition, in a case of a three-dimensional video, the virtual position may be in a 3D space or in a region of a two-dimensional video superimposed on the three-dimensional video. Further, the virtual commentator does not have to be visually displayed. In this case, the viewer (including the user) may feel the presence of the virtual commentator by sound source localization or the like at the time of rendering the voice of the virtual commentator.
For example, in a case where the position of the virtual commentator is “friend watching together” or the like, it is assumed that the virtual position of the virtual commentator is next to the viewer. In this case, the virtual position can be set to either the right or left side of the viewer, at a distance from the moving image content (specifically, the user terminal 10/110) equal to that of the viewer; the orientation is basically toward the content, and the position and orientation can be turned toward the viewer when speaking to the viewer.
Furthermore, in a case where the position of the virtual commentator is the person himself/herself, conceivable virtual positions include a position next to the viewer, in order to give a feeling of experiencing the content together with the viewer, and a position facing the viewer from the side of the content, as a presenter.
In a case where such a virtual position is determined, it is possible to make the viewer feel the virtual position including the position and orientation, a time-series change thereof, and the like by 2D/3D drawing of the virtual commentator or localizing a sound source of the virtual commentator in a three-dimensional space. For example, it is possible to express a sound difference between speaking in a direction toward the viewer and speaking in a direction 90 degrees different from the direction toward the viewer, a sound difference between speaking from a distance and speaking near the ear, a state in which the virtual commentator approaches/moves away, and the like.
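As a minimal sketch of such sound-source localization, a stereo approximation might derive left/right gains from the azimuth and apply a simple distance attenuation, as below; a real renderer would use HRTFs or a 3D audio engine, and the pan law and all names here are assumptions.

```python
import math

def localize(azimuth_deg: float, distance_m: float) -> tuple[float, float]:
    """Rough stereo stand-in for 3D sound-source localization.

    azimuth_deg: 0 = directly in front of the viewer, +90 = fully right.
    Returns (left_gain, right_gain).
    """
    pan = max(-1.0, min(1.0, azimuth_deg / 90.0))  # map azimuth to [-1, 1]
    theta = (pan + 1.0) * math.pi / 4.0            # constant-power pan law
    attenuation = 1.0 / max(distance_m, 0.3)       # simple distance falloff
    return math.cos(theta) * attenuation, math.sin(theta) * attenuation

near_ear = localize(azimuth_deg=80.0, distance_m=0.3)   # speaking near the ear
from_afar = localize(azimuth_deg=80.0, distance_m=3.0)  # speaking from a distance
```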
Such adjustment of the display position (virtual position) according to the position of the virtual commentator can be implemented, for example, by the avatar generation unit 60 adjusting the display position of the generated avatar animation in the moving image content according to the position of the virtual commentator. Furthermore, the adjustment of the orientation of the virtual commentator according to the position of the virtual commentator can be implemented, for example, by the avatar generation unit 60 adjusting the orientation of the virtual commentator when generating the avatar animation according to the position of the virtual commentator.
As illustrated in
Furthermore, as illustrated in
As illustrated in
As described above, by controlling the display position and orientation of the virtual commentator according to the position of the virtual commentator and the comment target, it is possible to improve the realistic feeling of the viewer in the content experience.
Furthermore, it is assumed that the comment content of the virtual commentator indicates a specific point in the moving image frame, such as “the movement of this player is great”. On the other hand, since the region that a human can gaze at is limited compared with the entire moving image frame, the ease of understanding for the viewer may be reduced when the line of sight of the viewer has to move largely. For such a problem, for example, it is conceivable to set a rule that, in a case where there are a plurality of virtual commentators, if a certain virtual commentator makes a comment on a specific point in a moving image frame, another virtual commentator does not, within a certain time, make a comment on a point that does not fall within the central viewing angle.
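A minimal sketch of such a rule follows, approximating the gazed region by a visual angle around the last pointed-at location; the threshold values and all identifiers are illustrative assumptions.

```python
import math

CENTRAL_VIEWING_ANGLE_DEG = 20.0  # assumed extent of the central field of view
COOLDOWN_SEC = 5.0                # assumed "certain time" from the text

def may_point_at(new_point, last_point, last_time, now,
                 viewing_distance_px=1000.0):
    """Allow another commentator to reference new_point only if it lies
    within the central viewing angle around last_point, or if the
    cooldown since the last pointing comment has elapsed."""
    if last_point is None or now - last_time >= COOLDOWN_SEC:
        return True
    dx = new_point[0] - last_point[0]
    dy = new_point[1] - last_point[1]
    # On-screen distance converted to an approximate visual angle.
    angle = math.degrees(math.atan2(math.hypot(dx, dy), viewing_distance_px))
    return angle <= CENTRAL_VIEWING_ANGLE_DEG
```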
Furthermore, a gaze point on the screen may be detected from line-of-sight information of the contributor, and the virtual commentator may be caused to comment on that portion, so that the attention of the viewer is guided in a direction desired by the contributor. Conversely, a gaze point on the screen may be detected from line-of-sight information of the viewer, and the virtual commentator may be caused to make a comment on that portion.
Furthermore, a caption of a voice comment uttered by the virtual commentator may be superimposed and displayed on the moving image content. At this time, for example, by adjusting the display position of the caption based on the line-of-sight information of the contributor or the viewer, it is possible to reduce the line-of-sight movement of the viewer and improve the ease of understanding. Specifically, by displaying the caption in the vicinity of the region in the moving image content gazed at by the contributor or the viewer, the comment and the region targeted by the comment can be visually linked. Therefore, the line-of-sight movement of the viewer can be reduced, and the ease of understanding can be improved.
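For example, the caption placement might be sketched as follows, assuming pixel coordinates for the gaze point and the frame; the margin value and all names are assumptions.

```python
def caption_position(gaze_xy, caption_w, caption_h, frame_w, frame_h,
                     margin=16):
    """Place the caption just below the gazed region, clamped inside the
    frame, so that the viewer's line-of-sight movement stays small."""
    x = min(max(gaze_xy[0] - caption_w // 2, margin),
            frame_w - caption_w - margin)
    y = max(min(gaze_xy[1] + margin, frame_h - caption_h - margin), margin)
    return x, y
```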
As illustrated in
On the other hand, as illustrated in
Furthermore, as illustrated in
In order to acquire the line-of-sight information of the contributor or the viewer, for example, a line-of-sight detection sensor or a camera provided in the user terminal 10/110 may be used. That is, the user terminal 10/110 can also function as the acquisition unit that acquires the line-of-sight information of the contributor or the viewer.
Next, an example of a case where machine learning is applied to generate the comment for each position of the virtual commentator will be described. In recent years, moving image distribution has become common, and the number of distributions has rapidly increased, particularly for game videos. On major distribution platforms, comments of a game player, comments of moving image viewers, and comments of a commentator/expositor at an e-SPORTS (registered trademark) match or the like are exchanged in real time during moving image distribution, and these platforms are expected to develop further in the future, including categories other than games, as means for interactively enjoying moving images.
Such diversification of comment senders corresponds to the position of the virtual commentator. For example, in a case of a game commentary, the comment of the position “the person himself/herself” in
That is, the number of distributed moving images to which comments from the positions of various virtual commentators are added is rapidly increasing, and a large amount of comments can be acquired independently for each of the various positions. As a result, it has become possible to create a large-scale comment data set (for example, the position comment list in
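A minimal sketch of grouping harvested comments into per-position data sets, assuming simple role labels attached to each comment (the field names are assumptions), is:

```python
from collections import defaultdict

def build_position_datasets(comments):
    """Group platform comments by sender role, one data set per position."""
    datasets = defaultdict(list)
    for c in comments:  # c = {"role": ..., "text": ...}
        datasets[c["role"]].append(c["text"])
    return datasets

datasets = build_position_datasets([
    {"role": "player", "text": "That dodge was close!"},
    {"role": "viewer_chat", "text": "GG, nice play"},
    {"role": "expositor", "text": "Notice how the item route changes here."},
])
```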
Although such a large-scale comment data set is smaller in comment amount than the ultra-large-scale data sets used to train general-purpose language models, a language model for each position can be constructed using both the general-purpose language model and the large-scale position data set: the general-purpose language model is optimized with the data set of each position by techniques such as fine tuning and N-shot learning.
General-purpose language models 231 to 236 prepared for the players 1 to 3, the chat 204 by the viewer, the commentator, and the expositor are trained using, as training data, comment data sets registered in the position comment groups 221 to 226 of the respective positions. As a result, language models (position comment lists) 241 to 246 of the respective positions are created.
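As a non-authoritative sketch, one run of such per-position optimization could look like the following, using the Hugging Face transformers library; the base model, hyperparameters, and identifiers are placeholders and not values from the embodiment.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

def finetune_position_model(position_texts, base_model="gpt2",
                            out_dir="position_model"):
    """Fine-tune a general-purpose causal language model on the comment
    data set of one position (minimal sketch; no padding-token masking)."""
    tok = AutoTokenizer.from_pretrained(base_model)
    tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(base_model)

    enc = tok(position_texts, truncation=True, padding=True,
              max_length=128, return_tensors="pt")
    train_set = [{"input_ids": enc["input_ids"][i],
                  "attention_mask": enc["attention_mask"][i],
                  "labels": enc["input_ids"][i]}
                 for i in range(len(position_texts))]

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=out_dir, num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=train_set,
    )
    trainer.train()
    return model, tok
```

One such run would be performed per position, corresponding to the position comment groups 221 to 226, yielding the position-specific language models 241 to 246.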
The comment generation unit 40 in the information processing system 1 according to the present embodiment can generate appropriate position comments according to various positions of the virtual commentator by using the language models (position comment lists) 241 to 246 of the respective positions.
Although an example of arranging the comment voice and the avatar animation in the moving image content has been mainly described so far, the processing described based on each flowchart may be executed for generation and addition of a text (caption) comment, or may be executed for both the voice comment and the text comment.
Furthermore, only the text comment and/or the voice comment may be added without adding the avatar animation to the moving image content.
At least some of the setting unit 120, the event extraction unit 30, the comment generation unit 40, the utterance control unit 50, the avatar generation unit 60, the editing/rendering unit 70, and the distribution unit 80 according to the above-described embodiments may be implemented in the user terminal 10/110, and the rest may be implemented in one or more information processing devices such as a cloud server on a network, or all may be implemented in a cloud server on a network. For example, the setting unit 120 may be implemented by the user terminal 10/110, and the event extraction unit 30, the comment generation unit 40, the utterance control unit 50, the avatar generation unit 60, the editing/rendering unit 70, and the distribution unit 80 may be implemented in a cloud server or the like on a network.
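The split just exemplified could be captured in a simple configuration, purely for illustration; the unit names follow the embodiment, but the structure itself is an assumption.

```python
DEPLOYMENT = {
    "user_terminal": ["setting_unit_120"],
    "cloud_server": [
        "event_extraction_unit_30", "comment_generation_unit_40",
        "utterance_control_unit_50", "avatar_generation_unit_60",
        "editing_rendering_unit_70", "distribution_unit_80",
    ],
}

def runs_on(unit: str) -> str:
    """Return the host on which a given processing unit is implemented."""
    return next(host for host, units in DEPLOYMENT.items() if unit in units)
```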
One or more information processing devices that execute at least one of the user terminal 10/110, the setting unit 120, the event extraction unit 30, the comment generation unit 40, the utterance control unit 50, the avatar generation unit 60, the editing/rendering unit 70, or the distribution unit 80 according to the above-described embodiment can be implemented by a computer 1000 having a configuration as illustrated in
As illustrated in
The CPU 1001 operates based on a program stored in the ROM 1002 or the storage unit 1105, and controls each unit. For example, the CPU 1001 loads a program stored in the ROM 1002 or the storage unit 1105 into the RAM 1003, and executes processing corresponding to various programs.
The ROM 1002 stores a booting program such as a basic input output system (BIOS) executed by the CPU 1001 when the computer 1000 is started, a program depending on hardware of the computer 1000, and the like.
The storage unit 1105 is a computer-readable recording medium that non-transitorily records a program executed by the CPU 1001, data used by the program, and the like. Specifically, the storage unit 1105 is a recording medium that records a program for executing each operation according to the present disclosure.
The communication unit 1106 is an interface for the computer 1000 to be connected to an external network (for example, the Internet). For example, the CPU 1001 receives data from another device or transmits data generated by the CPU 1001 to another device via the communication unit 1106.
The sensor input unit 1101 includes, for example, a line-of-sight detection sensor or a camera that detects a line of sight of a contributor, a viewer, or the like, and generates line-of-sight information of the contributor, the viewer, or the like based on the acquired sensor information. Furthermore, in a case where the user terminal 10/110 is a game console or the like, the sensor input unit 1101 may also include an inertial measurement unit (IMU), a microphone, a camera, or the like provided in the game console or a controller thereof.
The operation unit 1102 is an input device such as a keyboard, a mouse, a touch pad, a touch panel, or a controller for a contributor or a viewer to input operation information.
The display unit 1103 is a display that displays a game screen or a moving image content. For example, various selection screens as illustrated in
The sound output unit 1104 includes, for example, a speaker or the like, and outputs sound of a game or a moving image content, a voice comment uttered by the virtual commentator in the moving image content, or the like.
For example, in a case where the computer 1000 functions as any one or more of the user terminal 10/110, the setting unit 120, the event extraction unit 30, the comment generation unit 40, the utterance control unit 50, the avatar generation unit 60, the editing/rendering unit 70, and the distribution unit 80 according to the above-described embodiment, the CPU 1001 of the computer 1000 implements the function of each corresponding unit by executing a program loaded into the RAM 1003. Furthermore, the storage unit 1105 stores programs and the like according to the present disclosure. The CPU 1001 reads the programs from the storage unit 1105 and executes the programs, but as another example, these programs may be acquired from another device on the network via the communication unit 1106.
Although the embodiments of the present disclosure have been described above, the technical scope of the present disclosure is not limited to the above-described embodiments as they are, and various modifications can be made without departing from the gist of the present disclosure. In addition, components of different embodiments and modified examples may be appropriately combined.
Furthermore, the effects of each embodiment described in the present specification are merely examples and are not limitative, and other effects may be provided.
Furthermore, each of the above-described embodiments may be used alone, or may be used in combination with another embodiment.
Note that the present technology can also have the following configurations.
(1)
An information processing device including:
an acquisition unit that acquires information regarding a relationship between a contributor of a content and a viewer of the content; and
a comment generation unit that generates a comment to be uttered by a virtual commentator based on the information regarding the relationship.
(2)
The information processing device according to (1), wherein the information regarding the relationship includes at least one of a degree of intimacy between the contributor and the viewer, a relationship between the contributor and the viewer in the content, or history information of the viewer for the content contributed by the contributor in the past.
(3)
The information processing device according to (1) or (2), wherein the information regarding the relationship includes at least one of the degree of intimacy between the contributor and the viewer, a relationship between the contributor and the viewer in the content, or history information of the viewer for the content contributed by the contributor in the past.
(4)
The information processing device according to any one of (1) to (3), wherein the acquisition unit sets a position of the virtual commentator, and
the comment generation unit generates the comment according to the position.
(5)
The information processing device according to (4), wherein the comment generation unit generates the comment based on a position comment list in which candidates of the comment to be uttered by the virtual commentator for each position are listed.
(6)
The information processing device according to (5), wherein the comment generation unit generates the comment based on a comment list obtained by excluding, from the position comment list, a predetermined number of comments previously uttered by the virtual commentator.
(7)
The information processing device according to any one of (4) to (6), wherein the acquisition unit causes the contributor or the viewer to select the position of the virtual commentator.
(8)
The information processing device according to any one of (4) to (6), wherein the acquisition unit automatically sets the position of the virtual commentator based on the information regarding the relationship.
(9)
The information processing device according to any one of (1) to (8), wherein the acquisition unit sets a target of the comment to be uttered by the virtual commentator, and
the comment generation unit generates the comment according to the target of the comment.
(10)
The information processing device according to (9), wherein the acquisition unit causes the contributor or the viewer to select a target of the comment.
(11)
The information processing device according to (9), wherein the acquisition unit automatically sets a target of the comment based on the information regarding the relationship.
(12)
The information processing device according to any one of (1) to (11), wherein the comment generation unit generates the comment according to a genre to which the content belongs.
(13)
The information processing device according to any one of (1) to (12), wherein the comment generation unit corrects the generated comment based on the information regarding the relationship.
(14)
The information processing device according to (13), wherein the comment generation unit corrects an ending of the generated comment based on a hierarchical relationship between the contributor and the viewer.
(15)
The information processing device according to any one of (1) to (14), further including an extraction unit that extracts an event of the content,
wherein the comment generation unit generates the comment for the event.
(16)
The information processing device according to (15), wherein in a case where a time difference from a time at which a preceding event occurs to a time at which a next event occurs in the content is less than a predetermined time, the comment generation unit skips generation of a comment for the next event.
(17)
The information processing device according to (15), wherein in a case where a time difference from a time at which a preceding event occurs to a time at which a next event occurs in the content is less than a predetermined time, and a priority of the next event is higher than a priority of the preceding event, the comment generation unit generates a comment for the next event and requests to stop utterance of the comment generated for the preceding event.
(18)
The information processing device according to any one of (1) to (17), wherein the comment generation unit generates a comment to be uttered by each of two or more virtual commentators.
(19)
The information processing device according to (18), wherein the comment generation unit generates the comment in such a way that a second virtual commentator of the two or more virtual commentators makes utterance after utterance of a first virtual commentator of the two or more virtual commentators is completed.
(20)
The information processing device according to (18) or (19), wherein the comment generation unit generates the comment to be uttered by one of the two or more virtual commentators to the other one of the two or more virtual commentators.
(21)
The information processing device according to any one of (1) to (20), wherein the acquisition unit acquires the number of viewers who are currently viewing the content, and the comment generation unit generates the comment to be uttered by each of virtual commentators whose number corresponds to the number of viewers, and increases or decreases the number of virtual commentators according to an increase or decrease in the number of viewers.
(22)
The information processing device according to any one of (1) to (21), wherein the comment generation unit acquires a feedback from the viewer and generates the comment according to the feedback.
(23)
The information processing device according to any one of (1) to (22), further including:
an editing/rendering unit that incorporates at least one of text data or sound data corresponding to the comment into the content.
(24)
The information processing device according to (23), further including an animation generation unit that generates an animation of the virtual commentator,
wherein the editing/rendering unit superimposes the animation of the virtual commentator on the content.
(25)
The information processing device according to (24), wherein the editing/rendering unit adjusts a position of the animation in the content according to a position of the virtual commentator.
(26)
The information processing device according to any one of (23) to (25), further including an utterance control unit that converts the comment into the sound data.
(27)
The information processing device according to any one of (9) to (11), further including:
an utterance control unit that converts the comment into sound data;
an editing/rendering unit that incorporates the sound data into the content; and
an animation generation unit that generates an animation of the virtual commentator,
wherein the acquisition unit sets a target of the comment to be uttered by the virtual commentator, and
the animation generation unit generates the animation in which an orientation of the virtual commentator is adjusted according to the target of the comment, and
the editing/rendering unit superimposes the animation of the virtual commentator on the content.
(28)
The information processing device according to any one of (1) to (27), wherein the acquisition unit acquires line-of-sight information of the contributor or the viewer, and the comment generation unit generates the comment based on the line-of-sight information.
(29)
The information processing device according to (28), wherein the comment generation unit adjusts a display position of the comment in the content based on the line-of-sight information.
(30)
An information processing method including:
acquiring information regarding a relationship between a contributor of a content and a viewer of the content; and
generating a comment to be uttered by a virtual commentator based on the information regarding the relationship.
(31)
An information processing system in which a first user terminal, an information processing device, and a second user terminal are connected via a predetermined network,
the information processing device including:
an acquisition unit that acquires information regarding a relationship between a contributor who contributes a content from the first user terminal to the information processing device and a viewer who views the content via the second user terminal; and
a comment generation unit that generates a comment to be uttered by a virtual commentator based on the information regarding the relationship.
Priority application: 2021-086935, filed May 2021, Japan (national).
Filing document: PCT/JP2021/048460, filed December 27, 2021 (WO).