The present application is a U.S. National Phase of International Patent Application Serial No. PCT/RU2020/000478 entitled “SYSTEM AND METHOD FOR DETERMINING COGNITIVE DEMAND,” filed on Sep. 11, 2020. The entire contents of each of the above-referenced applications are hereby incorporated by reference for all purposes.
The present disclosure relates to systems and methods for determining cognitive demand, and more particularly, to determining the cognitive demand of a driver of a vehicle.
Vehicle drivers may be distracted from driving by simultaneously executing one or more additional tasks, or by experiencing stress induced by the traffic situation. This increases the cognitive demand of the driver, which may adversely affect the driver's performance, depending on whether he is distracted or not. Advanced driver assistance systems to support the driver are typically configured for a constant and low cognitive demand. These systems therefore benefit from detecting the cognitive demand of a driver. Accordingly, a need exists for detecting the cognitive demand of a person, in particular of a driver of a car.
The topic of cognitive demand is discussed in:
The validity of driving simulators is disclosed in:
The following documents relate to configuring vehicle-mounted electronics according to the state of mind of the driver:
The following disclosures relate to analysis of the state of mind:
Disclosed and claimed herein are methods and systems for determining cognitive demand.
The present disclosure relates to a computer-implemented method for determining a cognitive demand of a user. The method comprises
The disclosure is based on the principle that cognitive demand of a user can be determined using externally observable biosignals. Biosignals are recorded, and a change in biosignals is detected, i. e. a deviation from a baseline. An artificial neural network is trained to determine how the cognitive demand biases biosignals. The artificial neural network may then obtain information on the cognitive demand level, even in a noisy environment such as a vehicle. The term “cognitive demand” is understood to encompass both cognitive load and stress. Cognitive load is related to whether the user is only occupied with the first task, e. g. driving a vehicle, or flying a plane, or whether the user is simultaneously occupied with at least one task in addition to the first task. Stress may be induced by the first task itself. For example, a driver of a vehicle may be stressed by a dangerous traffic situation, or by any other stress-inducing event.
The method comprises a training phase and an inference phase. In the training phase, biosignals are recorded from a subject in a training setting similar to the setting in which the method is to be used. This may be, e. g., a laboratory setup such as a driving simulator or a more realistic situation such as driving a vehicle. Information on the cognitive demand the subject is experiencing, in particular whether the cognitive demand is increased with respect to the typical cognitive demand of driving, is supplied along with the biosignals as a training dataset to the artificial neural network. The artificial neural network is configured to receive one or more biosignals, such as a heart interbeat interval (IBI), or eye gaze, as an input, and to produce an output signal indicative of the cognitive demand level. By training, weights in the artificial neural network are determined that relate the biosignals to the output cognitive demand level signal. In the inference phase, the artificial neural network yields an output signal indicating a cognitive demand level. When the signal exceeds a predetermined threshold, the cognitive demand level is considered high. In response to that determination, an output signal is generated, and other connected devices may be configured differently. For example, the sensitivity of a steering wheel or a braking pedal may be modified. Determining the biosignals may include recording data by a sensor and converting the data into a biosignal, by one or more pre-analysis steps as described below.
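By way of illustration only, the training phase and the inference phase may be sketched as follows. The sketch assumes a single logistic unit as a stand-in for the artificial neural network, two hypothetical input features (a normalized heart IBI and a blink rate), invented training values, and a hypothetical threshold of 0.5. None of these choices is taken from the disclosure.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=2000, lr=0.5):
    """Training phase: fit the weights of one logistic unit by gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = pred - y  # gradient of the log loss w.r.t. the pre-activation
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def infer(weights, bias, x) -> float:
    """Inference phase: output signal indicative of the cognitive demand level."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical training dataset: (normalized IBI, blinks per second).
# Label 1 marks periods recorded under increased cognitive demand.
TRAIN_X = [(1.00, 0.30), (0.80, 0.12), (0.95, 0.28),
           (0.78, 0.10), (1.05, 0.33), (0.82, 0.15)]
TRAIN_Y = [0, 1, 0, 1, 0, 1]
THRESHOLD = 0.5  # hypothetical predetermined threshold

weights, bias = train(TRAIN_X, TRAIN_Y)
level = infer(weights, bias, (0.79, 0.11))
is_high = level > THRESHOLD  # high cognitive demand if the output exceeds the threshold
```

A practical network would have more inputs and layers. The sketch only shows how trained weights relate biosignal inputs to an output level that is then compared with a threshold.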
In an embodiment, the method further comprises converting the one or more biosignals, after recording the biosignals, into a continuous format indicative of a frequency and/or duration of events per time by applying a time window to the recorded biosignals, the time window in particular being a sliding time window. This preprocessing step allows specifying the biosignal in a useful format, e. g., the number of heart beats per unit time. By averaging the biosignals over a time window, quick changes are smoothed out. Thereby, if a measurement fails for a short period of time due to noise, a valid signal is continuously available.
In an embodiment, the at least one first task comprises driving a vehicle. The method is particularly useful for determining a driver's cognitive demand level because the driver's biosignal can be obtained with a variety of sensors comprised in the vehicle. Furthermore, determining a driver's cognitive demand level is particularly important to improve road safety.
In an embodiment, the one or more biosignals comprise a heart interbeat interval, IBI. The heart IBI is a particularly useful biosignal as it is related to both cognitive load and stress, and because it can be detected with a plurality of methods including contactless methods.
In an embodiment, recording a biosignal comprises determining the heart interbeat intervals by means of a contact sensor attached to the user. Alternatively, recording the biosignal comprises determining the heart interbeat intervals by means of an RGB camera facing the user. This allows detecting a heart interbeat interval without the need of a sensor that is in direct contact with the user. Thereby, the user can use the system without the need to be connected to a sensor.
In an embodiment, the biosignal comprises eye metrics, in particular one or more of eye gaze, eye openness, and eye movement data. Eye metrics are highly related to cognitive load.
In an embodiment, recording the biosignal comprises: capturing, by a camera, images of the user, and analyzing the images to determine the eye metrics, in particular one or more of eye gaze, eye openness, and eye movement data. This method is contactless and requires only the use of a camera facing the user. This improves the ease of use, in particular in a vehicle.
In an embodiment, the method further comprises analyzing, by a pre-processor, the eye metrics to determine occurrences of one or more of fixation, saccade, and eye blinking, in particular blink duration, eyelid close speed, eyelid open speed. Thereby, entropy in the data is reduced, and the biosignals comprise information particularly adapted for training a neural network.
In an embodiment, the cognitive demand comprises cognitive load related to the user being occupied with at least a second task other than the first task in addition to the first task. This is one of the major applications of the present disclosure. A user, e. g. a driver of a vehicle or a pilot of an aircraft, may be distracted by a plurality of other activities, for example the use of electronic devices, such as a telephone, or by other persons in the vehicle. If the user is occupied by a second task, this induces an increased cognitive load. The cognitive load can then be detected by a biosignal, in particular eye movement characteristics, which are particularly affected by cognitive load. However, other biosignals may be used as well, for example the heart interbeat interval.
In an embodiment, the method further comprises:
Thereby, the output of the artificial neural network is transformed into a binary output signal that represents information whether the user is experiencing high cognitive demand or not. In particular, “high cognitive demand” may be defined as a Boolean variable indicative of whether the cognitive demand significantly exceeds the cognitive demand typical for a driving situation. This may be the case either due to the driver being distracted by a second task, or due to stress experienced by the driver. The binary output signal may then be used to control devices and systems comprised in, attached to, or connected to the vehicle.
In an embodiment, the cognitive demand comprises stress. The stress can be sensed by biosignals. For example, a heart interbeat interval is strongly related to stress and therefore well suited for stress detection.
In an embodiment, the method further comprises:
Thereby, a binary signal is generated that not only depends on the current cognitive demand level as determined on-line, but also allows reacting to a short-time peak in cognitive demand within a predetermined time period in the past. If, for example, the user was stressed due to a stressful traffic situation, there is an increased probability that further stress is imminent, and psychological effects of stress may remain longer than physical effects. Therefore, there is an advantage if the stress level indicated at the output is maintained high for a certain previously determined duration.
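The behavior described above may be sketched, purely as an example, as a threshold detector that keeps its binary output high for a hold period after the last exceedance. The class name `HoldingThreshold`, the threshold of 0.7 and the hold duration of 10 seconds are hypothetical and are not specified in the disclosure.

```python
from typing import Optional

class HoldingThreshold:
    """Binary high-demand flag that stays set for hold_seconds after a peak."""

    def __init__(self, threshold: float, hold_seconds: float) -> None:
        self.threshold = threshold        # hypothetical decision threshold
        self.hold_seconds = hold_seconds  # hypothetical hold duration
        self._last_exceeded: Optional[float] = None

    def update(self, level: float, timestamp: float) -> bool:
        # Remember the most recent time the level exceeded the threshold.
        if level > self.threshold:
            self._last_exceeded = timestamp
        if self._last_exceeded is None:
            return False
        # Stay high while the last exceedance is recent enough.
        return timestamp - self._last_exceeded <= self.hold_seconds

detector = HoldingThreshold(threshold=0.7, hold_seconds=10.0)
# (timestamp in seconds, cognitive demand level output by the network)
readings = [(0.0, 0.2), (1.0, 0.9), (5.0, 0.3), (11.0, 0.3), (12.0, 0.3)]
flags = [detector.update(level, t) for t, level in readings]
# flags == [False, True, True, True, False]
```

The single peak at 1 second keeps the output high until 11 seconds, although the level itself has already dropped below the threshold.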
In an embodiment, the method further comprises undertaking, based on the determination that the user is experiencing high cognitive demand, one or more of the following actions:
These actions have the effect of increasing the traffic safety by reacting to the increased cognitive demand of the driver. Any of these actions may be executed if high cognitive demand is determined in any way as described in the present disclosure. In particular, different actions may be executed depending on whether the cognitive demand is related to cognitive load, i. e. the driver executing a second task, or to a stress level.
In a second aspect of the disclosure, a system for determining a cognitive demand level of a user is provided. The system comprises a first computing device, a first sensor, a second computing device, and a second sensor. The system is configured to execute the steps described above. All properties of the computer-implemented method of the present disclosure are also valid for the system.
The features, objects, and advantages of the present disclosure will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference numerals refer to similar elements.
The data from the cameras and/or other sensors are processed by a computing device 106. The components 108-118 of the computing device 106 may be implemented in hardware or in software. First, metrics are generated based on the camera images by a metrics generator, which may be implemented as a heart inter-beat interval extractor 108, or an eye gaze extractor 110, to convert the camera image into a biosignal, as discussed with reference to step 206 in
The process begins, 202, when operation of the system, in particular operation of a vehicle to which the system is attached, is initiated. Images of a user's face are captured with a camera, 204, at a frame rate of 30 fps-60 fps, for example. The following steps are executed for each image, but may comprise re-using intermediate results from previous images. Therefore, the biosignals may be generated at a rate equal to the frame rate of the cameras.
Metrics are generated, 206, based on the camera images. These metrics comprise, in particular, a heart interbeat interval (IBI), and/or eye gaze metrics. To determine a heart IBI, the head of the user is identified on the image and a region of interest is identified that moves with the head if the user turns the head. This generates a signal indicative of the pixel values in the region of interest. After removal of perturbing effects, periodic variations indicative of the heart IBI are found, and the heart IBI is determined based on these periodic variations. This particular process is, however, only an exemplary embodiment of a process for sensing a heart IBI. Alternatively, the heart IBI may be determined by a contact sensor, based on electrocardiography, for example.
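A simplified sketch of deriving interbeat intervals from the region-of-interest signal is given below. It assumes that the mean pixel value of the region of interest is already available for each frame, and it uses a synthetic sinusoid with a linear drift in place of real camera data. Moving-average detrending and local-maximum peak picking are illustrative substitutes for the removal of perturbing effects and the detection of periodic variations; the disclosure does not prescribe these particular techniques.

```python
import math

FPS = 30.0  # frame rate within the 30 fps-60 fps range mentioned above

def detrend(signal, window):
    """Subtract a centered moving average to remove slow drift."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(signal[i] - sum(signal[lo:hi]) / (hi - lo))
    return out

def find_peaks(signal, min_distance):
    """Indices of positive local maxima at least min_distance samples apart."""
    peaks = []
    for i in range(1, len(signal) - 1):
        if signal[i] > 0 and signal[i - 1] < signal[i] >= signal[i + 1]:
            if not peaks or i - peaks[-1] >= min_distance:
                peaks.append(i)
    return peaks

def interbeat_intervals(signal, fps):
    """Interbeat intervals in seconds between successive detected pulses."""
    cleaned = detrend(signal, window=int(2 * fps))
    peaks = find_peaks(cleaned, min_distance=int(0.3 * fps))
    return [(b - a) / fps for a, b in zip(peaks, peaks[1:])]

# Synthetic ROI signal: 1.25 Hz pulse (75 beats per minute) plus slow drift.
roi_signal = [
    0.5 * math.sin(2 * math.pi * 1.25 * (n / FPS)) + 0.01 * n
    for n in range(int(10 * FPS))
]
ibis = interbeat_intervals(roi_signal, FPS)
mean_ibi = sum(ibis) / len(ibis)  # approximately 0.8 s for the 1.25 Hz pulse
```

Real camera signals are far noisier than this synthetic input, so a practical implementation would likely need band-pass filtering and motion compensation in addition to the steps shown.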
In the same step, eye movement characteristics are inferred from the images of the infrared camera 104. On the image, a position of the pupil of a user's eye is determined in relation to the user's head. Thereby, a continuous sequence of coordinates is determined that describes the direction of a vector pointing outwards from the center of the pupil, i. e. normal to the surface of the eye. The coordinates can be either Euler angles or be coordinates of a vector in three-dimensional space. In order to determine by how much the eyes are open, the contours of the upper and lower eyelids are determined, and a maximum distance between them is determined. This maximum distance is an indicator of the openness of the eye and can be normalized, i. e. expressed as a percentage of the state of the eyes being fully open. Alternatively, it can be expressed as a distance in pixels or in millimeters or milliradians using an appropriate calibration. This particular method is, however, only an exemplary embodiment of a sensor for eye gaze and openness information. Furthermore, the present disclosure is not limited to detecting data related to eyes and heart IBI. Other biosignals may be measured in other exemplary embodiments.
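The two representations mentioned above may be illustrated as follows. The sketch assumes one particular Euler-angle convention (yaw about the vertical axis, pitch about the lateral axis, z pointing forward), eyelid contours given as lists of image points sampled at the same horizontal positions, and a calibration distance for the fully open eye. All of these are assumptions made for the example.

```python
import math

def gaze_vector(yaw_deg: float, pitch_deg: float):
    """Unit gaze vector in 3-D from two Euler angles (assumed convention)."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    x = math.cos(pitch) * math.sin(yaw)
    y = math.sin(pitch)
    z = math.cos(pitch) * math.cos(yaw)
    return (x, y, z)

def eye_openness(upper_lid, lower_lid, fully_open_distance: float) -> float:
    """Maximum eyelid distance as a percentage of the fully open eye.

    upper_lid and lower_lid are lists of (x, y) points in image coordinates,
    where y grows downward; fully_open_distance is in the same unit (pixels).
    """
    max_gap = max(low_y - up_y
                  for (_, up_y), (_, low_y) in zip(upper_lid, lower_lid))
    return 100.0 * max(0.0, max_gap) / fully_open_distance

# Hypothetical eyelid contours in pixels.
UPPER = [(0, 10), (1, 6), (2, 5), (3, 7), (4, 10)]
LOWER = [(0, 10), (1, 13), (2, 14), (3, 12), (4, 10)]

openness = eye_openness(UPPER, LOWER, fully_open_distance=12.0)  # 75.0 percent
straight_ahead = gaze_vector(0.0, 0.0)                            # (0.0, 0.0, 1.0)
```

Expressing openness in millimeters or milliradians instead would only require replacing the normalization by a calibration factor.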
Optionally, second order metrics may be derived, 208. This term relates to behavioral patterns with a finite temporal duration, which have a physiological meaning. For example, periods when the eyes are closed (i. e. the openness is below a threshold) may be referred to as blinks. During this time, it is impossible to determine the eye gaze. In contrast, when the eyes are open, the eye gaze can be determined. Fixations are time intervals of slow eye movement, defined as periods during which the eye gaze does not move more than by a two degree angle during a predetermined duration. In contrast, the rest of the time, i. e. when the eyes are moving comparably quickly, is considered a sequence of saccades (periods of fast eye movement). Thereby, any moment in time can be classified unambiguously as belonging to a saccade, a fixation, or a blink. Based on this classification, length of each timeframe, average gaze movement speed, blink duration, eyelid closing speed, eyelid opening speed, and other second order metrics may be determined.
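One possible way to implement this classification is sketched below. The two-degree limit is taken from the text. The openness threshold of 20 percent and the minimum fixation duration of 100 ms are hypothetical values, and the greedy grouping of samples is a simplified stand-in rather than the specific procedure of the disclosure. Angular distance is approximated by the Euclidean distance of yaw and pitch, which is adequate only for small angles.

```python
import math

OPENNESS_BLINK_THRESHOLD = 20.0  # percent; hypothetical
FIXATION_MAX_ANGLE_DEG = 2.0     # limit stated in the text
FIXATION_MIN_DURATION_MS = 100   # hypothetical predetermined duration

def classify(samples):
    """Label each (time_ms, openness_pct, yaw_deg, pitch_deg) sample."""
    n = len(samples)
    labels = ["saccade"] * n
    i = 0
    while i < n:
        t0, openness, yaw0, pitch0 = samples[i]
        if openness < OPENNESS_BLINK_THRESHOLD:
            labels[i] = "blink"
            i += 1
            continue
        # Grow a candidate fixation while the eye stays open and the gaze
        # stays within two degrees of where the candidate started.
        j = i
        while (j + 1 < n
               and samples[j + 1][1] >= OPENNESS_BLINK_THRESHOLD
               and math.hypot(samples[j + 1][2] - yaw0,
                              samples[j + 1][3] - pitch0)
               <= FIXATION_MAX_ANGLE_DEG):
            j += 1
        if samples[j][0] - t0 >= FIXATION_MIN_DURATION_MS:
            for k in range(i, j + 1):
                labels[k] = "fixation"
            i = j + 1
        else:
            i += 1  # too short for a fixation: the sample remains a saccade
    return labels

SAMPLES = [
    (0, 90.0, 0.0, 0.0), (50, 90.0, 0.5, 0.0), (100, 90.0, 0.3, 0.2),
    (150, 90.0, 8.0, 0.0), (200, 90.0, 16.0, 0.0),
    (250, 10.0, 16.0, 0.0), (300, 10.0, 16.0, 0.0),
    (350, 90.0, 20.0, 1.0), (400, 90.0, 20.4, 1.0), (450, 90.0, 20.2, 1.1),
]
labels = classify(SAMPLES)
# 3 x "fixation", 2 x "saccade", 2 x "blink", 3 x "fixation"
```

Every sample receives exactly one of the three labels, which mirrors the unambiguous classification described above.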
The metrics are further converted into continuous metrics, 210. A heart IBI may be normalized and averaged within a time window. Likewise, blink frequency, eye openness, and eye movement characteristics may be averaged over the duration of a sliding window. In particular, the sliding windows may be of different lengths and shifted with respect to each other. If, for example, high cognitive load leads to a modification in eye movement after a short time and to a different heart IBI after a longer time, the two sliding time windows for these two biosignals that correspond to each other may not be of equal length, and may be non-overlapping.
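As a non-limiting sketch, normalization and averaging over windows of different lengths and offsets may look as follows. The baseline IBI of 800 ms, the window length of 4 seconds and the delay of 5 seconds are invented for the example, as are the function names.

```python
def normalize(samples, baseline):
    """Express each (time_s, value) sample relative to a baseline value."""
    return [(t, v / baseline) for t, v in samples]

def windowed_mean(samples, now, length, delay=0.0):
    """Mean of samples with now - delay - length < time <= now - delay.

    Returns None if the window contains no sample, e.g. after a sensor dropout.
    """
    end = now - delay
    start = end - length
    values = [v for t, v in samples if start < t <= end]
    return sum(values) / len(values) if values else None

# Hypothetical heart IBI in milliseconds, one sample per second:
# 800 ms during seconds 1-5, then 600 ms during seconds 6-10.
raw_ibi = [(t, 800) for t in range(1, 6)] + [(t, 600) for t in range(6, 11)]
ibi = normalize(raw_ibi, baseline=800)

recent = windowed_mean(ibi, now=10, length=4)             # seconds 7-10 -> 0.75
earlier = windowed_mean(ibi, now=10, length=4, delay=5)   # seconds 2-5  -> 1.0
```

Giving each biosignal its own `length` and `delay` allows windows that differ in duration and do not overlap, as described for eye movement and heart IBI.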
Furthermore, the discrete sequence of saccades, fixations and blinks may also be transformed into a continuous representation. For example, in a predetermined time window of 30 seconds, the user may blink six times, hold three fixations with a total time of 10 seconds and perform four saccades with an overall time of 18 seconds. The corresponding continuous metrics then comprise: a number of blinks equal to 6, a number of fixations equal to 3, an overall fixation time equal to 10 seconds, and an average fixation time equal to 3.3 seconds. The window may be sliding so that the next evaluation of continuous metrics may be done within the window starting 1 time unit later, having significant overlap with the previous window. Thereby, the metrics as output signals vary continuously with time, and do not typically jump from one value to an entirely different value from one instant in time to the next. Thereby, the effect of noise and spurious effects on the output signals is reduced. These metrics are then used to determine a cognitive demand level as described with reference to
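The 30-second example above may be reproduced with the following sketch. The event timings are invented so that the window contains six blinks and three fixations totalling 10 seconds. The event tuple layout and the function name `window_metrics` are assumptions made for illustration.

```python
def window_metrics(events, window_start, window_length):
    """Count and duration per event kind inside one time window.

    events is a list of (kind, start_s, end_s); durations are clipped to the
    window, and events outside the window are ignored.
    """
    window_end = window_start + window_length
    counts, totals = {}, {}
    for kind, start, end in events:
        overlap = min(end, window_end) - max(start, window_start)
        if overlap <= 0:
            continue
        counts[kind] = counts.get(kind, 0) + 1
        totals[kind] = totals.get(kind, 0.0) + overlap
    return {
        kind: {
            "count": counts[kind],
            "total_s": totals[kind],
            "mean_s": totals[kind] / counts[kind],
        }
        for kind in counts
    }

# Hypothetical events within 30 seconds.
EVENTS = [
    ("fixation", 0.0, 4.0), ("fixation", 10.0, 13.0), ("fixation", 20.0, 23.0),
    ("blink", 4.0, 4.25), ("blink", 8.0, 8.25), ("blink", 13.0, 13.25),
    ("blink", 16.0, 16.25), ("blink", 23.0, 23.25), ("blink", 27.0, 27.25),
]

first = window_metrics(EVENTS, window_start=0.0, window_length=30.0)
# first["blink"]["count"] == 6, first["fixation"]["count"] == 3,
# first["fixation"]["total_s"] == 10.0, mean fixation time about 3.3 s

# Sliding the window by one time unit changes the metrics only slightly.
second = window_metrics(EVENTS, window_start=1.0, window_length=30.0)
# second["fixation"]["total_s"] == 9.0
```

Because successive windows overlap almost entirely, each metric changes gradually from one evaluation to the next.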
The process begins, 302, when the operation of the system is initiated. Metrics are determined, 304, according to the process described in
Controlling or modifying settings may comprise, for example, transmitting a warning to one or more vehicles in proximity of the vehicle to inform them about a stressed or distracted driver. Audio, lighting, navigation, or air conditioning systems in the vehicle may be set to help the driver focus on traffic. For example, music on the radio may be set to a lower volume. Further, a driver assistance system may be configured differently, e. g. to the effect that the brakes react faster to any action on the braking pedal, or that a lane keeping system is activated. A speed limiter may be activated to prevent the vehicle from exceeding a predetermined maximum speed. Also, an automatic emergency stop maneuver may be executed in case of extreme cognitive demand.
The artificial neural network may be trained by letting a large number of subjects, e. g. some hundred subjects, drive in a car simulator, in a controlled environment. Thereby, the training conditions are more flexible, controllable, and safe. Known driving simulators are sufficiently similar to real-world driving to allow training an artificial neural network.
The biosignals are recorded and processed, 404, in the same way as during execution of the method for inference of the artificial neural network, as described with reference to
In an alternative embodiment, the controlled variable is a stress-inducing situation. In order to train the artificial neural network, a potentially dangerous situation occurs and the occurrence is indicated as part of the second training data subset. For example, a simulator may simulate an ordinary traffic situation for ten minutes, and then a series of dangerous situations that also lasts ten minutes. The artificial neural network can be trained using these datasets. This procedure may be improved by measuring the user's stress level using a variety of other methods, such as electrocardiography, or electroencephalography, at the same time, thereby obtaining precise information on the stress level. Additionally or alternatively, the stress level may be characterized by determining correlated values, such as the reaction time of the user, the stability of holding a lane, or time allocation patterns (i. e. how much time the user pays attention to the road, the dashboard, or mirrors, for example). Alternatively or additionally, the expected or realized stress level of the situation may be determined manually after assessment of both situation and recorded data, in order to adjust the second training dataset before initiating training. The second training dataset thus comprises labels given to a period in time upon recording the first training dataset. The labels indicate a stress level according to the assessment based on suitable knowledge from neurophysiological research. This allows training the artificial neural network to detect the stress actually experienced by the user, so that different stress reactions among the training subjects do not induce uncertainty into the training phase. Thus, by reducing the number of false positives and false negatives, the accuracy of the training is improved, i. e. the F1 score is increased. Thereby, a smaller number of training runs is necessary.
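The labeling scheme and the F1 score mentioned above may be illustrated as follows. The sketch assumes one label per second for the simulator run described in the text (ten ordinary minutes followed by ten minutes of dangerous situations) and two invented sets of predictions. The error counts are hypothetical and do not represent measured results.

```python
def f1_score(true_labels, predicted_labels) -> float:
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels (1 = stressed)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)

# Second training data subset: one label per second of the simulator run.
labels = [0] * 600 + [1] * 600

# Hypothetical predictions with 60 false positives and 60 false negatives.
noisy = [1] * 60 + [0] * 540 + [0] * 60 + [1] * 540
# Hypothetical predictions with 30 false positives and 30 false negatives.
cleaner = [1] * 30 + [0] * 570 + [0] * 30 + [1] * 570

f1_noisy = f1_score(labels, noisy)      # 1080 / 1200 = 0.9
f1_cleaner = f1_score(labels, cleaner)  # 1140 / 1200 = 0.95
```

Halving the false positives and false negatives raises the F1 score from 0.9 to 0.95 in this example, which matches the direction of the effect described above.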
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/RU2020/000478 | 9/11/2020 | WO | |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2022/055383 | 3/17/2022 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
6090051 | Marshall | Jul 2000 | A |
6102870 | Edwards | Aug 2000 | A |
7344251 | Marshall | Mar 2008 | B2 |
7435227 | Farbos | Oct 2008 | B2 |
7438418 | Marshall | Oct 2008 | B2 |
7938785 | Aguilar et al. | May 2011 | B2 |
9132839 | Tan | Sep 2015 | B1 |
9248819 | Tan | Feb 2016 | B1 |
9646046 | Sadowsky et al. | May 2017 | B2 |
9723992 | Senechal et al. | Aug 2017 | B2 |
9763573 | Distasi et al. | Sep 2017 | B2 |
9934425 | el Kaliouby et al. | Apr 2018 | B2 |
10111611 | el Kaliouby et al. | Oct 2018 | B2 |
10368741 | Courtemanche et al. | Aug 2019 | B2 |
10399575 | Spasojevic et al. | Sep 2019 | B2 |
20050128092 | Bukman | Jun 2005 | A1 |
20070066916 | Lemos | Mar 2007 | A1 |
20080150734 | Johns | Jun 2008 | A1 |
20100117814 | Lermer | May 2010 | A1 |
20180125356 | Yamada | May 2018 | A1 |
20180125405 | Yamada | May 2018 | A1 |
20180125406 | Yamada | May 2018 | A1 |
20190038204 | Beck et al. | Feb 2019 | A1 |
20190101985 | Sajda et al. | Apr 2019 | A1 |
20200003570 | Marti et al. | Jan 2020 | A1 |
20200156654 | Boss et al. | May 2020 | A1 |
20210245766 | Sato | Aug 2021 | A1 |
Number | Date | Country |
---|---|---|
109572550 | Apr 2019 | CN |
102015200775 | Jul 2016 | DE |
2006024129 | Mar 2006 | WO |
2008107832 | Sep 2008 | WO |
2015116832 | Aug 2015 | WO |
Entry |
---|
Machine Translation of German Publication No. DE 102015200775 A1 of Decke et al., Jul. 21, 2016, Translated on Sep. 30, 2024. |
Hone, K. et al., “Towards a tool for the Subjective Assessment of Speech System Interfaces (SASSI),” Natural Language Engineering, vol. 6, No. 3-4, Sep. 1, 2000, 17 pages. |
“Driver Workload Metrics Project Task 2 Final Report,” NHTSA Website, Available Online at https://www.nhtsa.gov/sites/nhtsa.gov/files/documents/driver_workload_metrics_final_report.pdf, Nov. 2006, 460 pages. |
Niculescu, A. et al., “Stress and Cognitive Load in Multimodal Conversational Interactions,” ResearchGate Website, Available Online at 228782716_Stress_and_Cognitive_Load_in_Multimodal_Conversational_Interactions, May 2010, 6 pages. |
Ekanayake, H. et al., “Comparing Expert Driving Behavior in Real World and Simulator Contexts,” International Journal of Computer Games Technology, vol. 2013, No. 891431, Aug. 18, 2013, 15 pages. |
Barua, S. et al., “Supervised Machine Learning Algorithms to Diagnose Stress for Vehicle Drivers Based on Physiological Sensor Signals,” Studies in Health Technology and Informatics, vol. 211, Jun. 2015, 8 pages. |
Deniaud, C. et al., “The concept of “presence” as a measure of ecological validity in driving simulators,” Journal of Interaction Science, vol. 3, No. 1, Jul. 13, 2015, 13 pages. |
“Energizing comfort control: Wellness while driving,” Wayback Machine Internet Archive Website, Daimler Media, Available Online at https://web.archive.org/web/20171031020835/https://media.daimler.com/marsmediasite/en/instance/ko/energizing-comfort-control-wellness-while-driving.xhtml?oid=22934464, Available as Early as Oct. 31, 2017, 4 pages. |
McWilliams, T. et al., “Assessing Driving Simulator Validity: A Comparison of Multi-Modal Smartphone Interactions across Simulated and Field Environments,” Transportation Research Record: Journal of the Transportation Research Board, vol. 2672, No. 37, Dec. 2018, 8 pages. |
“European New Car Assessment Programme (Euro NCAP),” Euro NCAP Website, Available Online at https://cdn.euroncap.com/media/43373/euro-ncap-assessment-protocol-sa-v901.pdf, Feb. 2019, 35 pages. |
“Energizing comfort control: Your personal coach is always on board,” Mercedes-Benz Group Media Website, Available Online at https://group-media.mercedes-benz.com/marsMediaSite/en/instance/ko/ENERGIZING-comfort-control-Your-personal-coach-is-always-on-board.xhtml?oid=43504472, Jun. 8, 2019, 4 pages. |
Chua, S. et al., “Virtual Reality for Screening of Cognitive Function in Older Persons: Comparative Study,” Journal of Medical Internet Research, vol. 21, No. 8, Aug. 1, 2019, 10 pages. |
Musabini, A. et al., “Heatmap-Based Method for Estimating Drivers' Cognitive Distraction,” ArXiv Cornell University Website, Available Online at https://arxiv.org/abs/2005.14136, Available as Early as May 28, 2020, Revised Oct. 31, 2020, 8 pages. |
ISA European Patent Office, International Search Report and Written Opinion Issued in Application No. PCT/RU2020/000478, May 11, 2021, WIPO, 30 pages. |
State Intellectual Property Office of the People's Republic of China, Office Action and Search Report Issued in Application No. 202080103866.9, Mar. 27, 2025, 25 pages. (Submitted with Partial Translation). |
Number | Date | Country |
---|---|---|
20230339479 A1 | Oct 2023 | US |