The present invention relates, in general terms, to a system for sensor-based training intervention, and the method implemented on such a system. In particular, embodiments of the present invention relate to sensor-based training intervention for developing particular social behaviours.
Autism Spectrum Disorder (ASD) is a pervasive neuropsychiatric disorder and the top cause of disease burden in children aged 14 and below, in Singapore and worldwide. This lifelong disorder, marked by deficits in social communication, interaction, and imagination, has an average prevalence of 1%. Children with ASD also present with severe functioning problems in day-to-day activities and are at an increased risk of developing depression, conduct disorders, and anxiety disorders.
There is no known cure or generally approved medication for ASD. Current treatments are primarily behavioural interventions of limited efficacy, and they involve considerable effort and expense for the child and family. Evidence also suggests that early intervention may lead to better outcomes. However, many children with ASD are diagnosed late.
Given these limitations, there is a need to explore alternative and novel approaches for early diagnosis and intervention which can lead to improvement in functioning even if a cure is not available.
It would be desirable to overcome or alleviate at least one of the above-described problems, or at least to provide a useful alternative.
Disclosed herein is a system for sensor-based training intervention including:
The visuospatial attention indicator is an indication or determination of the level of concentration of the subject on a point or points of interest.
The system may be employed to train social behaviour of the subject, and further comprise a display, wherein, in advance of steps (a) and (b), the display displays a social cue to the subject, and wherein step (c)ii. comprises measuring a visuospatial attention indicator associated with the social cue. Step (c)i. may comprise modelling a joint state space relating to the social cue.
The social behaviour may comprise interacting with the gaze of another person (the third party), and the display may then display the third party to the subject, and the social cue comprises one or both eyes of the third party. The eye or eyes of the third party may have a focus, and step (c)ii. may then comprise measuring a visuospatial attention indicator with reference to the focus. The one or more processors may be configured, at step (c)ii., to determine whether the subject is focussing on the focus of the third party. Determining whether the subject is focussing on the focus of the third party may comprise removing the social cue, wherein the one or more processors are configured to measure the visuospatial attention indicator based on whether the combined data infers recollection by the subject of the focus.
The social behaviour may comprise facial recollection, the display displaying a target face and, separately, a plurality of other faces, at least one said other face being the target face, and wherein the one or more processors are configured to measure the visuospatial attention indicator by determining if the subject focuses on the target face in the plurality of other faces.
The social behaviour may instead comprise facial expression recognition, the display displaying a scenario and the social cue, the social cue comprising a plurality of faces, each face of the plurality of faces expressing a response to the scenario, and wherein the one or more processors are configured to measure the visuospatial attention indicator by determining if the subject focuses on the face, of the plurality of faces, for which the response matches the scenario.
The system may be configured to be used repetitively, each subsequent repetition comprising displaying a more difficult or easier social cue depending on the visuospatial attention indicator of a previous repetition.
Also disclosed herein is a method for sensor-based training intervention, comprising:
The method may be employed to train social behaviour of the subject, and further comprise displaying, in advance of steps (a) and (b), a social cue to the subject, wherein step (d) comprises measuring a visuospatial attention indicator associated with the social cue.
Step (c) may comprise modelling a joint state space relating to the social cue. The social behaviour may comprise interacting with the gaze of another person (the third party), and displaying the social cue may comprise displaying the third party to the subject, and the social cue comprises one or both eyes of the third party. The eye or eyes of the third party may have a focus, and the visuospatial attention indicator may be measured with reference to the focus. Step (d) may comprise determining whether the subject is focussing on the focus of the third party. Determining whether the subject is focussing on the focus of the third party may comprise removing the social cue, and measuring the visuospatial attention indicator based on whether the combined data infers recollection by the subject of the focus.
The social behaviour may instead comprise facial recollection, wherein displaying the social cue comprises displaying a target face and, separately, a plurality of other faces, at least one said other face being the target face, and measuring the visuospatial attention indicator by determining if the subject focuses on the target face in the plurality of other faces.
The social behaviour may instead comprise facial expression recognition, and the social cue is displayed in relation to a scenario, the social cue comprising a plurality of faces, each face of the plurality of faces expressing a response to the scenario, and measuring the visuospatial attention indicator comprises determining if the subject focuses on the face, of the plurality of faces, for which the response matches the scenario.
Steps (a) to (d) may be performed repetitively, wherein displaying the social cue for a repetition comprises displaying a more difficult or easier social cue depending on the visuospatial attention indicator of a previous repetition.
Advantageously, embodiments of the present invention combine eye tracker data and EEG data and model a combined feature space or state space from which it can be determined whether there is a point of interest on which the subject is focusing and/or whether the subject is focusing on a specific point of interest. From that determination, it can be inferred whether the subject understands a particular social cue with which they are presented.
More broadly, therefore, embodiments of the present invention enable the system or method to determine whether the subject is focusing on a point of interest that indicates they understand a social cue with which they are presented.
Embodiments of the present invention will now be described, by way of non-limiting example, with reference to the drawings in which:
The systems and methods disclosed herein may measure the interactive focus of a subject (may also be referred to as a patient) on points of interest. Some prior art endeavours to achieve this by measuring electroencephalogram (EEG) signals and inferring concentration from those signals. However, the signals fail to take into account what the subject is looking at. For example, while concentrating, the gaze of a subject may move across multiple points of interest. Therefore, the subject will be concentrating but that concentration is placed on a thought rather than anything necessarily in the visual field of the subject. In other cases, the prior art focuses on tracking the eye movement of the subject and inferring, when the gaze lingers on a particular point, that the subject is focused on that particular point. However, the subject may not be concentrating at all.
In contrast, systems and methods disclosed herein seek to identify a joint feature space or joint state space of EEG and eye-tracker signals from which to infer a level of focus on points of interest. Thus, systems and methods disclosed herein may determine interactive focus on points within the subject's field of view.
Such a method 100 is broadly defined in
The method 100 may be employed, for example, on a computer system 200 as shown in
As shown, the computer system 200 includes the following components in electronic communication via a bus 212:
Although the components depicted in
The main subsystems whose operation is described in detail herein are the EEG sensors 202, the eye trackers 204, the one or more processors (i.e. the N processing components) 216 and the display 208. The sensors 202 and 204 measure a subject's response to social cues or stimuli presented on display 208. The one or more processors 216 then interpret the data from the sensors 202 and 204 to measure a visuospatial attention indicator from which the correctness or otherwise of the subject's response to the social cues can be inferred or determined. The display 208 may be realized by any of a variety of displays (e.g., CRT, LCD, HDMI, micro-projector and OLED displays).
In general, the non-volatile data storage 210 (also referred to as non-volatile memory) functions to store (e.g., persistently store) data and executable code, such as the instructions necessary for the computer system 200 to perform the method 100. The executable code in this instance thus comprises instructions enabling the system 200 to perform the methods disclosed herein, such as that described with reference to
In some embodiments, for example, the non-volatile memory 210 includes bootloader code, modem software, operating system code, file system code, and code to facilitate the implementation of components well known to those of ordinary skill in the art which, for simplicity, are not depicted or described.
In many implementations, the non-volatile memory 210 is realized by flash memory (e.g., NAND or ONENAND memory), but it is certainly contemplated that other memory types may be utilized as well. Although it may be possible to execute the code from the non-volatile memory 210, the executable code in the non-volatile memory 210 is typically loaded into RAM 214 and executed by one or more of the N processing components 216.
The N processing components 216 in connection with RAM 214 generally operate to execute the instructions stored in non-volatile memory 210. As one of ordinary skill in the art will appreciate, the N processing components 216 may include a video processor, modem processor, DSP, graphics processing unit, and other processing components. The N processing components 216 may form a central processing unit (CPU), which executes operations in series. In some embodiments, it may be desirable to use a graphics processing unit (GPU) to increase the speed of analysis and thereby enable, for example, the real-time assessment of visuospatial attention—e.g. during performance of a task. Whereas a CPU would need to perform the actions using serial processing, a GPU can provide multiple processing threads to perform processes in parallel.
The transceiver component 218 includes N transceiver chains, which may be used for communicating with external devices via wireless networks, microphones, servers, memory devices and others. Each of the N transceiver chains may represent a transceiver associated with a particular communication scheme. For example, each transceiver may correspond to protocols that are specific to local area networks, cellular networks (e.g., a CDMA network, a GPRS network, a UMTS network), and other types of communication networks. In some embodiments, one or both of sensors 202 and 204 may be remote, rather than form components of the system as shown with reference to
Reference numeral 224 indicates that the computer system 200 may include physical buttons, as well as virtual buttons such as those that would be displayed on display 208. Moreover, the computer system 200 may communicate with other computer systems or data sources over network 226.
It should be recognized that
To provide versatility, it may be desirable to implement the method 100 in the form of an app, or use an app to interface with a server on which the method 100 is executed. These functions and any other desired functions may be achieved using apps 222, which can be installed on a mobile device.
The system 200 may be more realistically presented in the network or system 300 shown in
The system 300 can be employed to train social behaviour of the subject. To that end, in advance of performing steps 102 and 104, the display 304 will display a social cue to the subject 308. The workstation 310 then measures the visuospatial attention indicator associated with the social cue. This can involve modelling a joint state space relating to the social cue. This ensures the workstation 310 identifies features, or common features, in the EEG data and eye tracker data that are important for the particular social cue in question. In some cases, the joint state space may be a similar or the same state space for all social behaviour training programs.
The method 100, when performed in a system such as computer 200 or 300, produces a computerised system that tracks visuospatial attention to train the visual, memory and social skills of, for example, autistic children. The method and system enable customised social skills training to be delivered through software intended to target particular deficiencies in visual and social functions. These deficiencies include deficiencies in the ability of the subject to identify facial expressions and emotions, maintain eye-contact and interact with other people, and perform facial recognition.
As discussed below, the steps of the method may be performed repetitively, by displaying social cues at each repetition, and displaying a more difficult or easier social cue at any particular repetition depending on the visuospatial attention indicator of a previous repetition. As such, progressive training programs can be implemented. These training programs can range from guided (easier) to unguided (more difficult) scenarios. The training programs may also range from abstract to more realistic scenarios, where the user is first exposed to cartoons followed by realistic faces or human faces.
With further reference to system overview
In general, a customised training program will comprise a series of exercises (repetitions of the steps of method 100) integrated with the physiological measurements—i.e. data as measured by sensors 302 and 306. These exercises include maintaining eye contact, recognising facial expressions or emotions, and training the focusing of attention. The eye tracker 302 allows an accurate mapping of the subject's eyes onto the targets or points of interest—in some embodiments a point of interest may be a target face, being a face the subject is being trained to recognise. Relatedly, the EEG headband 306 provides an objective measurement of the subject's attention level while looking at these targets.
To improve engagement over delivery of standard training material, the present system 200, 300 may gamify the delivery of method 100. The general gameplay mechanism is summarised in
Depending on the subject's progress, the subject will be presented with a target objective such as a virtual avatar's face on which to focus. At the start of each trial (repetition), the software embodying method 100 will display via display 208, 304 the target objective to the subject and continuously monitor the subject's visuospatial attention from the subject's eye gaze and brain computer interface (BCI) score—the BCI score will hereinafter be interchangeably referred to as the visuospatial attention indicator.
The general gameplay mechanism 400 involves the selection of appropriate objectives—e.g. social behavioural training objectives or programs, whether abstract or real—402. This may be done in an automated way, such as during a program in which a subject is tested on all available social behavioural programs in sequence, or in a manual way. After selection of appropriate objectives, the trial commences—404. Software implementing the method 100 shows the target objective to the subject—406. In some embodiments, the subject will be made aware of the nature of the exercise they are about to undertake—e.g. their ability to track the gaze of an avatar displayed to them—and in other embodiments they will be unaware of the exercise so that the system 200, 300 can determine whether the subject's response is a natural or learned one. During display of the target objective, the hybrid BCI (i.e. sensors 202, 204, 302, 306) monitors eye gaze positions of the subject and EEG signals of the subject—408. The processors 316 then measure the subject's visuospatial attention or indicator as computed from the combined EEG and eye tracker data—410.
Based on the visuospatial attention indicator, the system 200, 300 may determine whether the visuospatial attention of the subject was sustained on the target objective, for example a target face or object—412. If visuospatial attention was appropriately sustained, the trial ends—414. At this point, the difficulty of the objective may be increased, for example made more abstract, or the objective (the social behaviour being trained) may be changed. If visuospatial attention was not maintained or was below a desired threshold, guidance may be provided to the subject to help them focus—416. In some embodiments, the gamification mechanism may revert to step 402 and select an easier objective for the subject to attempt, and perform a new trial.
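By way of illustration only, the adaptive trial loop of gameplay mechanism 400 may be sketched in code. The sketch below is a minimal, hypothetical rendering in Python; the callables it accepts (select_objective, show_objective, read_hybrid_bci, compute_indicator, provide_guidance) are assumed stand-ins for the display, hybrid BCI sensors and fusion processing described elsewhere in this specification, rather than part of the disclosed system.

```python
# Minimal sketch of the adaptive trial loop (gameplay mechanism 400).
# The callables passed in are hypothetical stand-ins for the display,
# the hybrid BCI sensors and the fusion/scoring step described herein.
def run_training_session(select_objective, show_objective, read_hybrid_bci,
                         compute_indicator, provide_guidance,
                         num_trials=10, threshold=0.7):
    difficulty = 1
    for _ in range(num_trials):
        objective = select_objective(difficulty)       # 402: select an appropriate objective
        show_objective(objective)                      # 404/406: start the trial, show the target objective
        eeg, gaze = read_hybrid_bci()                  # 408: monitor EEG signals and eye-gaze positions
        indicator = compute_indicator(eeg, gaze)       # 410: fuse into a visuospatial attention indicator
        if indicator >= threshold:                     # 412: was attention sustained on the target?
            difficulty += 1                            # 414: end trial; next objective is more difficult
        else:
            provide_guidance(objective)                # 416: guide the subject to help them focus
            difficulty = max(1, difficulty - 1)        # optionally revert to an easier objective (402)
    return difficulty
```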
An example of feedback delivered during performance of gameplay mechanism 400 is shown in
As mentioned above, one social behaviour that can be trained is the interaction between the subject and the gaze of another person (a third party or, presently, a virtual avatar). The social cue in this instance will be one or both eyes of the virtual avatar, which may include the direction the eyes are looking. The system 200, 300 may therefore provide a training scenario involving training the subject to focus on and/or follow the gaze of other people.
There are various mechanisms envisaged for achieving this. In one embodiment, the program trains the subject to process and follow eye gazes of another person, or other people. The program comprises one or more trials 600, each being divided into three parts as illustrated in
In the first part 602, the target objective is to identify the eyes of the virtual avatar 604. The subject is required to focus on the eyes. Presently, the eyes are looking in a particular direction as shown. The system 200, 300 may, using eye trackers, confirm whether or not the subject is focusing on a location of the display corresponding to the eyes of the virtual avatar 604.
In the second part 606, various objects are introduced into the field of view of the display—607. The direction of the eyes of the virtual avatar 604 remain unchanged. However, the eyes of the virtual avatar 604 now have a focus, namely the ice cream cone. The objective is to have the subject focus on the object at which the eyes of the virtual avatar 604 are looking. The system 200, 300 may therefore measure a visuospatial attention indicator with reference to the focus. For example, the system 200, 300 may determine whether the subject is looking at the object at which the virtual avatar is looking—i.e. whether the subject is focusing on the same thing as the virtual avatar.
In the third part 608, the virtual avatar is removed. The objective is to have the subject identify the same object, presently the ice cream cone, from a variety of objects displayed to the subject. Therefore, determining whether the subject is focusing on the same focus as that on which the virtual avatar focussed in 606 may involve removing the social cue (i.e. the avatar or their directional gaze) and determining whether the visuospatial attention indicator infers recollection by the subject of the virtual avatar's focus (i.e. the ice cream cone).
The third part 608 may involve shuffling the objects, or introducing or replacing some objects with new objects. This increases the difficulty of recollection of the object on which the virtual avatar was focusing.
A flow chart 700 for illustrating the process of
Another social behaviour that may be sought to be trained is facial recollection. Processing of faces is an important element for social interaction. However, individuals with Autism Spectrum Disorder often show a general face discrimination deficit. Accordingly, the system 200, 300 may include training programs (e.g. stored in memory 210) to train the subject to focus their attention and discriminate different faces as shown on the display—e.g. in a bubble launching game such as that shown in
In this scenario, the subject focuses on one of a plurality of other, candidate, faces at the top of the display 800 that best matches a target face 802 at the bottom of the display 800. Presently, the best match is candidate face 804. Therefore, the system 200, 300 measures the visuospatial attention indicator by determining if the subject focuses on the target face in the plurality of other faces.
The scenario displayed in
Another training program for training facial recognition is shown in
Another social behaviour that may be sought to be trained is facial expression recognition. To train facial expression recognition, the display can be used to display a scenario and a social cue. For example, the scenario may be a picture or series of pictures or text designed to elicit an emotional response from a person. The social cue in this instance may be a plurality of faces each of which expresses a different response to the scenario, such as a different emotional expression—e.g. laughter, happiness, sadness or shock. The visuospatial attention indicator can then be measured by determining if the subject focuses on the face, of the plurality of faces, for which the response matches the scenario. An instance of such training as reflected in
As mentioned above, the method 100, and consequently the system 200, 300 implementing that method, may provide progressive levels of difficulty to challenge and engage the subject based on their performance. To challenge the subject across multiple sessions during which they engage with the system 200, 300, the training exercises or trials employ a progressive level advancement structure. For example, when training the ability of the subject to follow the gaze of a virtual avatar, the number of objects in the visual field of the display may increase in more advanced levels, once the subject has shown aptitude in the social behaviour by correctly answering the earlier levels. In the example shown in
The method 100 may also progress from guided training to non-guided training. In earlier, easier levels, the exercises are designed to guide the subject step-by-step. An example of the progressive nature of the method 100 is shown in
In more guided examples, only the eyes are shown and the other parts of the face are masked out to reduce the amount of distracting information presented to the subject. In the non-guided examples, the full face is shown and no further clues are given on where to focus. This is shown by the arrow 1300 indicating progressively increased difficulty as progressively more distracting information is introduced to the subject.
Progressively increased difficulty can also apply when transitioning from recognising a gaze of a set of cartoon eyes when compared with recognising the gaze of virtual human eyes. In earlier levels, abstract cartoon characters are used to attract the subject's attention. In later levels, pictures of real humans are used. The objective is to guide the subject towards familiarising themselves with real people in real life situations, and the manner in which people behave or facial information should be interpreted in real life situations. Increased difficulty in moving from more abstract representations to more real-life representations is indicated by arrow 1302.
If the BCI score is high during gameplay, indicating a high degree of error, and the player continues to make mistakes in their selection of the appropriate object, facial expression or face match, the method 100 may guide the player towards a correct selection. As a consequence, the level of difficulty decreases with increased guidance. This can be done in several ways. Examples are shown in
To combine the EEG data and eye tracker data, sequential Bayesian inference is proposed.
To perform sequential Bayesian fusion of EEG and eye tracker measurements, the following state space model for the visuospatial attention process is considered:
$$ s_t = A\,s_{t-1} + n_s \tag{1} $$
where s_t is a vector describing the visual attention state at time t, and s_{t−1} is the state at time t−1. The particular state vector presently proposed is described by Equation (2). A is the state transfer matrix that describes a linear transformation model of the visual attention state from time t−1 to t, and n_s is the stationary state process noise associated with the linear transformation model. In particular, the following visuospatial attention state vector is used:
$$ s = [\mathbf{x}, \mathbf{v}, \beta] \tag{2} $$
where $\mathbf{x} = [x, y, z]$ is the Cartesian coordinate triplet that defines the gaze position, $\mathbf{v} = [v_x, v_y, v_z]$ is the linear speed of the gaze motion in the Cartesian space, and β is the score of attention.
This state space model is thus parametrised by A and n_s only. The two parameters can be customised (e.g. through machine learning) to fit different visuospatial attention processes. Described below is a basic example of the model parameters that treats visual attention processes as Gaussian processes, in which the attention score β follows a random walk process and the motion of the gaze point follows a smooth trajectory. Specifically, matrix A takes the following form:
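For example (as an assumption of this description rather than a limitation), with the gaze position advanced by its velocity over each interval Δt, the gaze velocity held constant, and the attention score β following a random walk, A may take the block form, where $I_3$ is the 3×3 identity matrix:

$$ A = \begin{bmatrix} I_3 & \Delta t\, I_3 & \mathbf{0}_{3\times1} \\ \mathbf{0}_{3\times3} & I_3 & \mathbf{0}_{3\times1} \\ \mathbf{0}_{1\times3} & \mathbf{0}_{1\times3} & 1 \end{bmatrix} \tag{3} $$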
where Δt is simply the time interval between t and (t−1).
Now the state vector s_t is recursively estimated from a sequence of measurement data points in EEG and eye-tracker. We take a linear generative model that associates the state vector with the measurement vector u_t:
$$ u_t = B\,s_t + n_u \tag{4} $$
Here, u_t contains measured variables from an EEG and/or eye-tracker, B is the mapping matrix that describes the transfer from state-space to measurement-space, and n_u is the stationary measurement noise. The measurement vector used for present purposes comprises two parts:
$$ u = [\hat{\mathbf{x}}, w] \tag{5} $$
The first part, $\hat{\mathbf{x}}$, is the measured location of the gaze by the eye-tracker or sensor, and the second part, w, is the EEG representation vector of attentional state. Generally, w can be a combination of both temporal and spectral EEG features. Any validated EEG features can be used to represent the brain signals for attention. Presently, the feature extraction algorithm used in the presently described attention detection and training system generates the measurements or features.
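For concreteness only, if in a given embodiment w is a single scalar attention feature derived from the attention score β, and $\hat{\mathbf{x}}$ corresponds directly to the gaze position $\mathbf{x}$, then B could take the illustrative (assumed, non-limiting) form:

$$ B = \begin{bmatrix} I_3 & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times1} \\ \mathbf{0}_{1\times3} & \mathbf{0}_{1\times3} & 1 \end{bmatrix} $$

so that $Bs = [\mathbf{x}, \beta]$ and the measurement u differs from these quantities only by the measurement noise n_u.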
A Gaussian random variable model is then used for the noise component, which is characterised by:
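In the standard multivariate form consistent with the definitions below, each noise term n (whether n_s or n_u) of dimension d has the density:

$$ p(n) = \frac{1}{\sqrt{(2\pi)^{d}\,\lvert\Sigma\rvert}} \exp\!\left( -\frac{1}{2} (n-\mu)^{T}\,\Sigma^{-1}\,(n-\mu) \right) \tag{6} $$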
where T is the transpose operator, μ the mean value, and Σ the covariance matrix.
The sequential Bayesian fusion algorithm is implemented in a series of steps. Firstly, the algorithm is initialised. That involves setting initial values for the state vector s_0 at time step t=0. Thereafter the algorithm moves to the next time point which, without loss of generality, is reflected by t ← t+1. Subsequently, the measurement vector u_t is computed from the EEG and eye tracker data. The initial prediction for the state vector is then computed using the state transition model according to:
$$ \hat{s}_t^{-} = A\,\hat{s}_{t-1} \tag{7} $$
The a priori estimate of the error covariance is then made according to:
$$ P_t^{-} = A\,P_{t-1}\,A^{T} + \Sigma_s \tag{8} $$
The gain matrix K (the Kalman gain) is subsequently updated, where:
$$ K_t = P_t^{-} B^{T} \left( B\,P_t^{-}\,B^{T} + \Sigma_u \right)^{-1} \tag{9} $$
The state vector estimate is then updated, where:
s
t
=s
t
−
+K
t(uk−Bst−) (11)
And the error covariance estimate is updated, where:
P
t=(I−KtB)Pt− (12)
The algorithm then repeats by incrementing the time step. Using this algorithm, the EEG data and eye tracker data may be fused (i.e. combined) to yield a visuospatial attention representation of that data, which can be used to measure the visuospatial attention indicator for the subject at the time the data was measured.
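By way of non-limiting illustration, the fusion steps above correspond to a conventional Kalman-filter recursion. The following Python sketch assumes a seven-dimensional state (gaze position, gaze velocity and attention score, per Equation (2)) and a four-dimensional measurement (measured gaze position plus a single scalar EEG attention feature); the specific matrices and noise levels shown are assumptions for illustration, not a definitive implementation of the disclosed system.

```python
import numpy as np

def fuse_step(s_prev, P_prev, u_t, A, B, Sigma_s, Sigma_u):
    # Prediction step, Equations (7) and (8)
    s_pred = A @ s_prev
    P_pred = A @ P_prev @ A.T + Sigma_s
    # Kalman gain, Equation (9)
    K = P_pred @ B.T @ np.linalg.inv(B @ P_pred @ B.T + Sigma_u)
    # Correction step, Equations (11) and (12): fuse the EEG/eye-tracker measurement u_t
    s_t = s_pred + K @ (u_t - B @ s_pred)
    P_t = (np.eye(len(s_prev)) - K @ B) @ P_pred
    return s_t, P_t

# Illustrative set-up: state s = [x, v, beta] (Equation (2)), measurement u = [x_hat, w] (Equation (5)).
dt = 0.1
A = np.eye(7)
A[0:3, 3:6] = dt * np.eye(3)            # constant-velocity gaze, random-walk attention score
B = np.zeros((4, 7))
B[0:3, 0:3] = np.eye(3)                 # measured gaze position maps from x
B[3, 6] = 1.0                           # scalar EEG attention feature maps from beta
Sigma_s = 0.01 * np.eye(7)              # assumed process noise covariance
Sigma_u = 0.05 * np.eye(4)              # assumed measurement noise covariance

s, P = np.zeros(7), np.eye(7)           # initialisation (t = 0)
u = np.array([0.2, 0.1, 0.6, 0.8])      # one measurement: gaze (x, y, z) and EEG attention feature
s, P = fuse_step(s, P, u, A, B, Sigma_s, Sigma_u)
attention_indicator = s[6]              # fused visuospatial attention indicator (beta)
```

In practice, A, B, Σ_s and Σ_u would be fitted to the visuospatial attention process in question, for example through machine learning as noted above.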
It will be appreciated that many further modifications and permutations of various aspects of the described embodiments are possible. Accordingly, the described aspects are intended to embrace all such alterations, modifications, and variations that fall within the spirit and scope of the appended claims.
Throughout this specification and the claims which follow, unless the context requires otherwise, the word “comprise”, and variations such as “comprises” and “comprising”, will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps.
The reference in this specification to any prior publication (or information derived from it), or to any matter which is known, is not, and should not be taken as an acknowledgment or admission or any form of suggestion that that prior publication (or information derived from it) or known matter forms part of the common general knowledge in the field of endeavour to which this specification relates.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/SG2020/050565 | 10/7/2020 | WO |