The present application claims priority to European Patent Application No. 23386122.8, filed Nov. 27, 2023, the contents of which are incorporated by reference herein in their entirety for all purposes.
This disclosure relates to a data processing apparatus and method.
The “background” description provided is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in the background section, as well as aspects of the description which may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
Video games are sometimes subjected to undesirable “lag”, which is a time delay between a user providing an input (e.g. on a video game controller) and that input being executed in the video game. The lag may be due to network latency (e.g. in an online multiplayer game) or due to latency between a video game controller and a video games console.
Lag can negatively affect a user's performance in the game through no fault of the user. For example, in a video game requiring quick reaction times (such as a first-person shooter, FPS, game), lag may cause a user's instructed actions to be delayed and cause negative consequences for the user (e.g. a successful attack by an enemy player) which would not have occurred if there had been no (or less) lag. Alleviating the consequences of lag is therefore highly desirable.
One way in which the consequences of lag may be alleviated is by predicting, based on a current situation in the video game (e.g. the player's environment and the number and position of enemy players), the next action of the player. If the prediction is correct, the result is as if no (or less) lag occurred. Thus, for example, if the player is predicted to perform an action causing a character controlled by the player to run for cover, this action is started before an input from the player has been received. Once the input from the player has been received and the input corresponds to the predicted action, the effect is that the player experiences their character carrying out the action concerned with no (or negligible) perceivable lag, even if, in reality, there was a lag between the player providing the input and the input being received. This reduces the negative effects of lag and improves the player's gaming experience.
A problem, however, is if the predicted action does not actually correspond with the action instructed via the user input. For instance, if the predicted action is for the player's character to run for cover but, in fact, the player instructs the character to attack, then there is a discrepancy between the predicted action and instructed action which will not become apparent until the player's input instructing the action has been received. If there is a significant lag between the user providing the input and the input being received (e.g. due to network and/or controller communication latency), this will result in the player's character beginning to perform an action not instructed by the user. The predicted action thus needs to be corrected to correspond with the instructed action. There is a desire to achieve this correction while maintaining a natural and engaging gameplay experience for the user.
The present disclosure is defined by the claims.
Non-limiting embodiments and advantages of the present disclosure are explained with reference to the following detailed description taken in conjunction with the accompanying drawings, wherein:
Like reference numerals designate identical or corresponding parts throughout the drawings.
A display device 100 (e.g. a television or monitor), associated with a games console 110, is used to display content to one or more users. A user is someone who interacts with the displayed content, such as a player of a game, or, at least, someone who views the displayed content. A user who views the displayed content without interacting with it may be referred to as a viewer. This content may be a video game, for example, or any other content such as a movie or any other video content. The games console 110 is an example of a content providing device or entertainment device; alternative, or additional, devices may include computers, mobile phones, set-top boxes, and physical media playback devices, for example. In some embodiments the content may be obtained by the display device itself—for instance, via a network connection or a local hard drive.
One or more video and/or audio capture devices (such as the integrated camera and microphone 120) may be provided to capture images and/or audio in the environment of the display device. While shown as a separate unit in
In some implementations, an additional or alternative display device such as a head-mountable display (HMD) 130 may be provided. Such a display can be worn on the head of a user and is operable to provide augmented reality or virtual reality content to a user via a near-eye display screen. A user may be further provided with a video game controller 140 which enables the user to interact with the games console 110. This may be through the provision of buttons, motion sensors, cameras, microphones, and/or any other suitable method of detecting an input from or action by a user.
The games console 110 comprises a central processing unit or CPU 20. This may be a single or multi core processor, for example comprising eight cores. The games console also comprises a graphical processing unit or GPU 30. The GPU can be physically separate to the CPU or integrated with the CPU as a system on a chip (SoC).
The games console also comprises random access memory, RAM 40, and may either have separate RAM for each of the CPU and GPU, or shared RAM. The or each RAM can be physically separate or integrated as part of an SoC. Further storage is provided by a disk 50, either as an external or internal hard drive, or as an external solid-state drive (SSD), or an internal SSD.
The games console may transmit or receive data via one or more data ports 60, such as a universal serial bus (USB) port, Ethernet® port, WiFi® port, Bluetooth® port or similar, as appropriate. It may also optionally receive data via an optical drive 70.
Interaction with the games console is typically provided using one or more instances of the controller 140. In an example, communication between each controller 140 and the games console 110 occurs via the data port(s) 60.
Audio/visual (A/V) outputs from the games console are typically provided through one or more A/V ports 90, or through one or more of the wired or wireless data ports 60. The A/V port(s) 90 may also receive audio/visual signals output by the integrated camera and microphone 120, for example. The microphone is optional and/or may be separate to the camera. Thus, the integrated camera and microphone 120 may instead be a camera only. The camera may capture still and/or video images.
Where components are not integrated, they may be connected as appropriate either by a dedicated data link or via a bus 200.
As explained, examples of a device for displaying images output by the game console 110 are the display device 100 and the HMD 130. The HMD is worn by a user 201. In an example, communication between the display device 100 and the games console 110 occurs via the A/V port(s) 90 and communication between the HMD 130 and the games console 110 occurs via the data port(s) 60.
The controller 140 is an example of a peripheral device for allowing the games console 110 to receive input from and/or provide output to the user. Examples of other peripheral devices include wearable devices (such as smartwatches, fitness trackers and the like), microphones (for receiving speech input from the user) and headphones (for outputting audible sounds to the user).
In an example, if the peripheral device 205 is a controller (like controller 140), the input interface 203 comprises buttons, joysticks and/or triggers or the like operable by the user. In another example, if the peripheral device 205 is a microphone, the input interface 203 comprises a transducer for detecting speech uttered by a user as an input. In another example, if the peripheral device 205 is a fitness tracker, the input interface 203 comprises a photoplethysmogram (PPG) sensor for detecting a heart rate of the user as an input. The input interface 203 may take any other suitable form depending on the type of input the peripheral device is configured to detect.
The controller (typically in a central portion of the controller) may also comprise one or more system buttons 136, which typically cause interaction with an operating system of the entertainment device rather than with a game or other application currently running on it. Such buttons may summon a system menu or allow for recording or sharing of displayed content, for example. Furthermore, the controller may comprise one or more other elements such as a touchpad 138, a light for optical tracking (not shown), a screen (not shown), haptic feedback elements (not shown), and the like.
There is therefore a discrepancy between the predicted action of
It is noted that, in a multiplayer online game, each player is typically located in a different geographical location with their own games console 110. The games consoles 110 are connected to each other over a network. A signal indicating an input provided by one user to control a character in the game must thus be transmitted from that user's games console to each of the other games consoles in order for every player to see that character perform the same action. Depending on network conditions, the geographical distances between players, etc., the lag experienced by different players may therefore be different. The games console 110 of each player may therefore be configured to predict the next action of a given user and, once the signal indicating the actual instructed action is received, determine whether or not there is a discrepancy. Each individual games console may then correct the discrepancy in the way described below.
As will be explained, this allows the discrepancy to be corrected in a smooth manner for each games console, even if a different amount of lag is experienced for different games consoles. The example(s) below may be carried out by each games console which is able to make a prediction of a particular action in the game and determine whether or not that prediction was correct once a signal indicating the action which was actually instructed is received (for example, directly from the controller 140 or over a network from another games console of another user).
In particular,
On the other hand,
As discussed, however, a problem occurs if the lag between a user inputting a command (e.g. via controller 140) and a signal indicative of that command being received by the games console 110 is sufficiently large that the signal is only received after the games console 110 has already started to output one or more frames showing the incorrect predicted action.
This is illustrated in
If the action instructed via the input command 600 is the same as the predicted action, then there is no problem. In fact, because output of the successive frames representing the instructed action has already begun, the effect is to alleviate the perception of any lag between the user providing the input command and it being received by the games console.
On the other hand, if the action instructed via the input command 600 is different to the predicted action (e.g. because, rather than the instructed action being for the character 400 to take cover behind the object 402 as predicted, the instructed action is for the character 400 to attack enemy players 403A to 403C, as shown in
As shown in
The present technique addresses this problem as exemplified in
The present technique thus enables a user to benefit from the perception of reduced lag when actions are predicted correctly. At the same time, if an action is predicted incorrectly, then the predicted action is corrected in a smooth and natural manner, thereby maintaining the user's immersion in and enjoyment of the game. User experience is therefore improved.
The interpolated frames 601A and 601B are generated, for example, by interpolating between the current frame 501B (first source frame) of the set of frames showing the incorrect action and the later frame 502E (second source frame) of the set of frames showing the correct action. Any suitable known interpolation technique, such as any suitable known motion interpolation technique, may be used.
In one example, a pose representation of the character 400 is generated for each of the first and second source frames. A pose representation represents the character's body, including arms, legs, torso, neck and head, by lines and joints (connecting the lines) defined in 3D space. In each of the interpolated frames, each joint is positioned between its position in the first source frame and its position in the second source frame.
The position of each joint in each interpolated frame depends on the number of interpolated frames.
For example, if there is to be only one interpolated frame between the first and second source frames, then each joint in the interpolated frame may be positioned halfway between its position in the first source frame and its position in the second source frame.
On the other hand, if there are to be two interpolated frames, then each joint in the first interpolated frame may be positioned one third of the way from its position in the first source frame to its position in the second source frame, and each joint in the second interpolated frame may be positioned two thirds of the way between those positions.
This may be generalized such that, for a given joint and a given number N of interpolated frames, the position of that joint in the nth interpolated frame is a fraction n/(N+1) of the way from its position in the first source frame to its position in the second source frame.
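By way of illustration only, the following sketch shows one way in which such joint interpolation might be implemented. The pose format (a mapping from joint names to 3D positions) and the joint names used here are assumptions made purely for illustration; a real implementation would use the skeleton representation of the game engine concerned.

```python
# Illustrative sketch only: linear interpolation of character joint
# positions between a first and second source frame. The nth of N
# interpolated poses places each joint a fraction n / (N + 1) of the
# way from its first-source position to its second-source position.
import numpy as np

def interpolate_poses(first_pose, second_pose, num_interpolated):
    """Return num_interpolated poses between the two source poses.

    Each pose maps joint names to 3D positions (numpy arrays).
    """
    interpolated = []
    for n in range(1, num_interpolated + 1):
        fraction = n / (num_interpolated + 1)
        pose = {
            joint: (1.0 - fraction) * first_pose[joint]
                   + fraction * second_pose[joint]
            for joint in first_pose
        }
        interpolated.append(pose)
    return interpolated

# Example: two interpolated frames between the source frames, so that
# joints sit at 1/3 and 2/3 of the way, respectively, as described above.
first = {"head": np.array([0.0, 0.0, 1.8]), "hand_r": np.array([0.5, 0.0, 1.0])}
second = {"head": np.array([3.0, 0.0, 1.8]), "hand_r": np.array([2.0, 0.0, 1.2])}
frames = interpolate_poses(first, second, num_interpolated=2)
```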
For the avoidance of doubt, the first source frame is the frame of the set of frames showing the incorrect predicted action displayed when the input command is received by the games console 110 (e.g. frame 501B displayed at frame display time t2 in
It will be appreciated that
It is also noted that, to make the movement of the character 400 between frames visually clearer, the frames shown in
An action (such as the predicted action shown in
The input data (during training and/or inference) may comprise various types of data, such as numerical values, images, video, text, or audio. Raw input data may be pre-processed to obtain an appropriate feature vector used as input to the model. For example, features of an image or audio input may be extracted to obtain a corresponding feature vector. It will be appreciated that the type of input data and techniques for pre-processing the data (if required) may be selected based on the specific task the supervised learning model is used for.
Once prepared, the labelled training data set is used to train the supervised learning model. During training, the model adjusts its internal parameters (e.g. weights) so as to optimize (e.g. minimize) an error function, aiming to minimize the discrepancy between the model's predicted outputs and the labels provided as part of the training data. In some cases, the error function may include a regularization penalty to reduce overfitting of the model to the training data set.
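Purely by way of illustration (the precise form of the error function being implementation-dependent), such a regularized error function for model parameters w over M training examples (x_i, y_i) may take the form:

E(w) = (1/M) Σ_{i=1}^{M} ℓ(f(x_i; w), y_i) + λ‖w‖²

where f(x_i; w) is the model's predicted output for input x_i, ℓ is a per-example loss measuring the discrepancy between that prediction and the label y_i, and λ ≥ 0 controls the strength of the regularization penalty.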
The supervised learning model may use one or more machine learning algorithms in order to learn the mapping between its inputs and outputs. Example suitable learning algorithms include linear regression, logistic regression, artificial neural networks, decision trees, support vector machines (SVM), random forests, and the K-nearest neighbor algorithm.
Once trained, the supervised learning model may be used for inference—i.e. for predicting outputs for previously unseen input data. The supervised learning model may perform classification and/or regression tasks. In a classification task, the supervised learning model predicts discrete class labels for input data, and/or assigns the input data into predetermined categories. In a regression task, the supervised learning model predicts labels that are continuous values.
In some cases, limited amounts of labelled data may be available for training of the model (e.g. because labelling of the data is expensive or impractical). In such cases, the supervised learning model may be extended to further use unlabeled data and/or to generate labelled data.
Considering using unlabeled data, the training data may comprise both labelled and unlabeled training data, and semi-supervised learning may be used to learn a mapping between the model's inputs and outputs. For example, a graph-based method such as Laplacian regularization may be used to extend an SVM algorithm to Laplacian SVM in order to perform semi-supervised learning on the partially labelled training data.
Considering generating labelled data, an active learning model may be used in which the model actively queries an information source (such as a user or operator) to label data points with the desired outputs. Labels are typically requested for only a subset of the training data set, thus reducing the amount of labelling required as compared to fully supervised learning. The model may choose the examples for which labels are requested. For example, the model may request labels for data points that would most change the current model or that would most reduce the model's generalization error. Semi-supervised learning algorithms may then be used to train the model based on the partially labelled data set.
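By way of illustration only, the following sketch shows a simple least-confidence query strategy, in which labels are requested for the unlabeled examples about which the current model is least certain. The use of scikit-learn and of logistic regression here is an assumption for illustration; any classifier providing class probabilities could be substituted.

```python
# Illustrative sketch only: least-confidence active learning. Labels
# are requested for the unlabeled examples whose top-class probability
# under the current model is lowest (i.e. highest model uncertainty).
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_queries(model, unlabeled_X, num_queries):
    """Return indices of the unlabeled examples the model is least sure of."""
    probabilities = model.predict_proba(unlabeled_X)
    confidence = probabilities.max(axis=1)
    return np.argsort(confidence)[:num_queries]

# Train on a small labelled seed set, then pick the examples whose
# labels would be most informative to request from the operator.
rng = np.random.default_rng(0)
labelled_X, labelled_y = rng.normal(size=(20, 4)), rng.integers(0, 3, 20)
unlabeled_X = rng.normal(size=(200, 4))
model = LogisticRegression(max_iter=1000).fit(labelled_X, labelled_y)
query_indices = select_queries(model, unlabeled_X, num_queries=10)
```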
Here, the inputs (independent variables) are vectors associated with each of the character 400, enemy characters 403A to 403C and the object 402 the character 400 may use for cover.
Each vector associated with the character 400 includes a first element indicating an amount of health of the character 400 (ranging from full health at 10.0 to death at 0.0) and a second element indicating one of a plurality of predetermined weapons (e.g. “handgun”, “machine gun”, “baseball bat” or “machete”) held by the character 400.
Each vector associated with each of the enemy characters 403A to 403C (denoted “Enemy 1”, “Enemy 2” and “Enemy 3”, respectively in
Each vector associated with the object 402 includes a first element indicating a distance of the object 402 from the character 400 in the x direction and a second element indicating a distance of the object 402 from the character 400 in the y direction.
In this example, the x direction corresponds to the direction in which the character 400 moves from side to side and the y direction corresponds to the direction in which the character 400 moves backwards and forwards. The distances in this example are given in virtual meters, which may be determined based on the scale of the video game so that an object in the video game has a size in virtual meters corresponding to the size in meters such an object would have in the real world. However, any other suitable unit (e.g. a normalized unit extending between −1 and 1 in each of the x and y dimensions) could be used. The position of an object (including both character and non-character objects) is, for example, the center of mass position of that object.
The labelled outputs (dependent variables) indicate the action that was instructed for the character 400 by the user at the point at which the values of the independent variables were recorded. The action is one of a plurality of predetermined actions, for example “Run to cover position” (corresponding to the character 400 moving to a position where they are protected from enemy fire, e.g. behind the object 402), “Attack (shoot)” and “Attack (melee)”.
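Purely by way of illustration, one such training example might be assembled as follows. The exact composition of each vector is an assumption here (in particular, the enemy-character vectors are assumed to comprise x and y distances from the character 400), and the weapon element is one-hot encoded so that the model receives purely numerical inputs.

```python
# Illustrative sketch only: assembling one training example in the
# format described above. The elements of each vector, notably those
# of the enemy-character vectors, are assumptions for illustration.
WEAPONS = ["handgun", "machine gun", "baseball bat", "machete"]
ACTIONS = ["Run to cover position", "Attack (shoot)", "Attack (melee)"]

def encode_example(health, weapon, enemy_positions, cover_position):
    """Flatten the game state into one numerical feature vector."""
    weapon_one_hot = [1.0 if weapon == w else 0.0 for w in WEAPONS]
    features = [health] + weapon_one_hot
    for x, y in enemy_positions:          # distances in virtual meters
        features += [x, y]
    features += list(cover_position)      # object 402 (cover) distances
    return features

# One row of the training set: the state recorded when the user
# instructed an action, paired with the label for that action.
x = encode_example(health=7.5, weapon="handgun",
                   enemy_positions=[(4.0, 12.0), (-3.0, 9.0), (1.5, 15.0)],
                   cover_position=(2.0, 3.0))
y = ACTIONS.index("Run to cover position")
```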
In an example, the training data is collected from real game data collected from players of the particular video game concerned. For example, users may give permission for their gameplay data to be used in training the model. Each time such a user takes a particular action, the data of the independent variable vectors is recorded and associated with the label (e.g. “Run to cover position”, “Attack (shoot)” or “Attack (melee)”) indicating the action. The training data set is thus gradually built up.
Once a sufficient amount of training data has been obtained (e.g. a predetermined number of sets of input and output data, such as 10000, 25000, 50000 or 100000 sets, each set corresponding to one row of a table like that shown in
Once trained, the model is able to use new input data (that is, previously unseen independent variable vectors) to predict the action a user will take for that new input data. Specifically, based on the new input data, the model will output a classification indicating one of the plurality of predetermined actions (e.g. “Run to cover position”, “Attack (shoot)” or “Attack (melee)”).
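By way of illustration only, training and inference with such data might be sketched as follows, here using a random forest (one of the algorithms listed above); the use of scikit-learn is an assumption for illustration.

```python
# Illustrative sketch only: training a classifier on feature vectors in
# the format sketched above and predicting the next action for new,
# previously unseen input data.
from sklearn.ensemble import RandomForestClassifier

ACTIONS = ["Run to cover position", "Attack (shoot)", "Attack (melee)"]

def train_action_model(feature_rows, action_label_indices):
    """feature_rows: one encoded feature vector per recorded action;
    action_label_indices: for each row, an index into ACTIONS."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(feature_rows, action_label_indices)
    return model

def predict_action(model, feature_row):
    """Return the predicted action classification for new input data."""
    return ACTIONS[model.predict([feature_row])[0]]
```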
As previously described, frames showing the predicted action are then output by the games console 110 before an input command instructing an action is received from the user. If the predicted action matches the instructed action, this has the effect of alleviating any lag perceived by the user. On the other hand, if the predicted action does not match the instructed action, the predicted action is corrected using interpolation so that the frames output by the games console 110 smoothly transition from those of the predicted action to those of the instructed action.
It will be appreciated that the input and output data of
It will also be appreciated that the input data will likely vary depending on the genre of game being played. For example, in a soccer game (rather than a first-person shooter), the position of the soccer ball in the game will be part of the input data (but the weapons held by each player and position of objects such as object 402 which players may take cover behind will not be part of the input data). The present technique is therefore not limited regarding exactly what input data is used, since this depends on the video game being played.
In an example, the machine learning model for predicting the intended action of the user is executed by an external server 800 which connects to a plurality of games consoles 110A to 110C over a network 806 (e.g. the internet). This forms a system 1000 exemplified in
This allows the machine learning model to be trained based on the gameplay of multiple real players (with player permission). Data for training the machine learning model can therefore be obtained quickly and easily as users play the video game concerned. Furthermore, once trained, input data can be provided to the machine learning model from each games console 110A to 110C to predict the next action of a given character (e.g. character 400). In response, the machine learning model then provides data indicative of the predicted action to each games console. Each games console then outputs frames according to the predicted action until a signal indicating the instructed action is received (at which point, the output of frames is either continued if the predicted action matches the instructed action or the output of frames is corrected in the way previously described). The present technique thus uses readily available gameplay data to train the machine learning model and make the trained machine learning model available to all users.
When the machine learning model is executed by the separate server 800 rather than on each of the games consoles 110A to 110C (meaning input data must be provided from each games console over the network 806 and data indicating the predicted action must be received over the network), the network architecture may be designed such that this is still associated with less lag overall than each games console waiting for a signal indicating the input command from the relevant user (e.g. the user controlling character 400 in an online multiplayer game—in this case, each of the other users may be controlling a respective one of the enemy characters 403A to 403C, for example).
For example, if all input commands are routed between the games consoles 110A to 110C over the network 806 via the server 800 (or via a separate router (not shown) co-located with the server 800) then, for a games console (e.g. games console 110A) located closer to the server 800 than another games console (e.g. games console 110B), the closer games console 110A may receive the data indicating the predicted action from the server 800 before it receives the input command indicating the actual instructed action from the further games console 110B. The games console 110A is thus able to begin output of frames according to the predicted action before the input command from the games console 110B is received.
In an example, each games console may transmit input data to the server 800 periodically (e.g. every 1, 2, 5 or 10 frames). If the resulting signal indicating the predicted action is received before the relevant input command, the output of frames showing the predicted action is carried out (and continued or corrected depending on the instructed action indicated by the input command when the input command is received). On the other hand, if the relevant input command is received before the signal indicating the predicted action, then the output of frames showing the action instructed by the input command is carried out and the signal indicating the predicted action (once it is subsequently received) is discarded. Action prediction is therefore used only when it provides reduced lag, thereby allowing lag to be alleviated for each games console in the most appropriate way.
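By way of illustration only, the choice between the predicted action and the actual input command, depending on which is received first, might be expressed as follows. The arrival-time and action representations here are assumptions made purely for illustration.

```python
# Illustrative sketch only: use whichever of the prediction and the
# input command arrives first, as described above. Arrival times are
# assumed to be comparable timestamps (e.g. in seconds).
def plan_frame_output(prediction_time, command_time,
                      predicted_action, instructed_action):
    """Return an ordered plan of frame-output stages."""
    if command_time <= prediction_time:
        # The input command won the race: the (late) prediction is
        # discarded and frames for the instructed action are output.
        return [("output", instructed_action)]
    if predicted_action == instructed_action:
        # Prediction arrived first and was correct: simply continue
        # outputting the frames of the predicted action.
        return [("output", predicted_action)]
    # Prediction arrived first but was wrong: correct it by outputting
    # interpolated frames towards the instructed action.
    return [("output", predicted_action),
            ("interpolate", predicted_action, instructed_action),
            ("output", instructed_action)]
```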
The server 800 of
There may be multiple servers 800 distributed around the world, each executing the trained machine learning model. Each games console 110A to 110C then obtains predicted actions from the nearest server 800. This helps further reduce the time required for receiving data indicating the predicted action.
In an example, game developers may access the machine learning model via an application programming interface (API) made available by the server 800. This may allow both training of the machine learning model (e.g. by providing input gameplay data and corresponding actions as labelled output data) and use of the trained machine learning model through specific API calls executed by the video game application concerned on each of the games consoles 110A to 110C. This reduces the need for individual game developers to have specialist expertise in the machine learning model concerned and also reduces the processing that needs to be carried out on each individual games console (thereby alleviating any reduction in games console performance).
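Purely by way of illustration, such an API call might resemble the following. The endpoint URL, request fields and response format are all hypothetical assumptions rather than a defined interface; a real service would be defined by the API actually made available by the server 800.

```python
# Illustrative sketch only: a hypothetical API call from the game
# application to the server 800. The endpoint, request fields and
# response format are assumptions made purely for illustration.
import requests

def request_prediction(game_state_vectors):
    response = requests.post(
        "https://prediction.example.com/v1/predict",  # hypothetical endpoint
        json={"game_id": "example-fps", "inputs": game_state_vectors},
        timeout=0.05,  # small timeout: a late prediction is of no use
    )
    response.raise_for_status()
    return response.json()["predicted_action"]
```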
In an example, whenever a predicted action is required, the game application makes an API call to the server 800 indicating the current input gaming data to be used for generating the prediction (e.g. input data vectors in the format exemplified in
It will be appreciated that the present technique may be applied to any game (e.g. virtual soccer games, racing games, combat games, first-person shooters, etc.) for which, based on a learned relationship between past input data and instructed actions, the actions of a user can be predicted. The present technique is not limited to actions or commands input by the user using a controller 140. For example, it may be applied to gesture inputs and/or vocal inputs or the like.
The method starts at step 901.
At step 902, video game data (e.g. data representing input variables of the machine learning model, as exemplified by the vectors of
At step 903, data indicating a predicted in-game action of a user is obtained in response to the in-game situation. The predicted in-game action (e.g. the label “Run to cover position”, “Attack (shoot)” or “Attack (melee)”) is output by the machine learning model based on the video game data, for example.
At step 904, output of a first sequence of video frames depicting the predicted in-game action (e.g. the sequence of video frames 501A to 501E) is begun.
At step 905, data indicating an instructed in-game action of the user is received (e.g. via a signal indicating the input command from the user, which is transmitted either directly from the controller 140 or transmitted between games consoles over a network).
At step 906, it is determined whether the predicted in-game action is the same as the instructed in-game action.
If the predicted in-game action is the same as the instructed in-game action, the method proceeds to step 907, at which output of the first sequence of video frames is continued. The method then ends at step 911.
On the other hand, if the predicted in-game action is not the same as the instructed in-game action, the method proceeds to step 908.
At step 908, output of the first sequence of video frames is stopped (so, for example, the currently displayed frame of the first sequence, such as frame 501B in
At step 909, one or more interpolated video frames (e.g. interpolated frames 601A and 601B in
At step 910, output of a second sequence of video frames depicting the instructed in-game action is begun.
The first output video frame of the second sequence of video frames is the video frame output at two or more frame display times after a frame display time of the final output video frame of the first sequence of video frames. Furthermore, the one or more interpolated video frames are output at respective frame display times between the frame display time of the final output video frame of the first sequence of video frames and the frame display time of the first output video frame of the second sequence of video frames. Thus, for example, as exemplified in
The method ends at step 911.
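By way of illustration only, the overall control flow of steps 901 to 911 might be sketched as follows. Frames are represented by labelled strings purely for illustration, and the helper functions are placeholders rather than a prescribed implementation.

```python
# Illustrative sketch only of the control flow of steps 901 to 911.
# A real implementation would render actual video frames rather than
# the labelled strings used here for illustration.
def frames_for(action, count=5):
    return [f"{action}-frame-{i}" for i in range(count)]

def interpolated_frames(last_frame, target_frame, num=2):
    # Placeholder for the joint interpolation sketched earlier.
    return [f"interp({last_frame}->{target_frame})-{n}" for n in range(1, num + 1)]

def prediction_cycle(predicted_action, instructed_action):
    output = []
    first_sequence = frames_for(predicted_action)            # steps 902-904
    output += first_sequence[:2]   # frames shown before the command arrives
    if predicted_action == instructed_action:                # steps 905-906
        output += first_sequence[2:]                         # step 907
    else:                                                    # step 908
        second_sequence = frames_for(instructed_action)
        # Step 909: interpolate towards a later frame of the second
        # sequence, then begin output of the second sequence (step 910).
        output += interpolated_frames(output[-1], second_sequence[2])
        output += second_sequence[2:]
    return output                                            # step 911

print(prediction_cycle("Run to cover position", "Attack (shoot)"))
```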
Example(s) of the present technique are defined by the following numbered clauses:
Numerous modifications and variations of the present disclosure are possible in light of the above teachings. It is therefore to be understood that, within the scope of the claims, the disclosure may be practiced otherwise than as specifically described herein.
In so far as embodiments of the disclosure have been described as being implemented, at least in part, by one or more software-controlled information processing apparatuses, it will be appreciated that a machine-readable medium (in particular, a non-transitory machine-readable medium) carrying such software, such as an optical disk, a magnetic disk, semiconductor memory or the like, is also considered to represent an embodiment of the present disclosure. In particular, the present disclosure should be understood to include a non-transitory storage medium comprising code components which cause a computer to perform any of the disclosed method(s).
It will be appreciated that the above description for clarity has described embodiments with reference to different functional units, circuitry and/or processors. However, it will be apparent that any suitable distribution of functionality between different functional units, circuitry and/or processors may be used without detracting from the embodiments.
Described embodiments may be implemented in any suitable form including hardware, software, firmware or any combination of these. Described embodiments may optionally be implemented at least partly as computer software running on one or more computer processors (e.g. data processors and/or digital signal processors). The elements and components of any embodiment may be physically, functionally and logically implemented in any suitable way. Indeed, the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the disclosed embodiments may be implemented in a single unit or may be physically and functionally distributed between different units, circuitry and/or processors.
Although the present disclosure has been described in connection with some embodiments, it is not intended to be limited to these embodiments. Additionally, although a feature may appear to be described in connection with particular embodiments, one skilled in the art would recognize that various features of the described embodiments may be combined in any manner suitable to implement the present disclosure.