This application claims priority to Japanese Patent Application No. 2021-75345 filed on Apr. 27, 2021, the entire disclosure of which is incorporated herein by reference.
The present invention relates to an electronic device, a method for controlling an electronic device, and a program.
Safe driving of a movable body requires the driver's attention. Studies have been conducted which involve observing the level of the driver's attention and providing the driver with a warning or driving assistance if the driver's attention drops. In a proposed technique for observing the level of attention, a cumulative visibility is calculated and compared with a reference value. The cumulative visibility is a cumulative value of the degrees to which the line of sight overlaps a nearby object, such as an oncoming vehicle (see Patent Literature 1).
Also, in recent years, studies have been conducted which attempt to estimate an internal state, such as a concentration level or emotion, of a subject. For example, an approach to estimating the mental states of learners has been reported. In this approach, the teacher's utterances, the learners' biological information, and video images of the learners are recorded during a lecture. Then, after the lecture, the learners report introspective observations of their own emotions in each scene (see Non Patent Literature 1). Also, for example, an approach has been reported in which diagnostic data and eye-gaze data of readers who read X-ray pictures are collected to interpret chest X-ray pictures using deep learning (see Non Patent Literature 2).
Patent Literature 1: International Publication No. 2008/029802
Non Patent Literature 1: Tatsunori Matsui, Tatsuro Uno, Yoshimasa Tawatsuji, “Study on Estimation of Learner's Mental States from Physiological Indexes Considering Time Dilation and Persistent Model of Mental States”, The 32nd Annual Conference of the Japanese Society for Artificial Intelligence, 2018, The Japanese Society for Artificial Intelligence
Non Patent Literature 2: Taiki Inoue, Nisei Kimura, Kotaro Nakayama, Kenya Sakka, Rahman Abdul, Ai Nakajima, Patrick Radkohl, Satoshi Iwai, Yoshimasa Kawazoe, Kazuhiko Ohe, “Diagnostic Classification of Chest X-Rays Pictures with Deep Learning Using Eye Gaze Data”, The 33rd Annual Conference of the Japanese Society for Artificial Intelligence, 2019, The Japanese Society for Artificial Intelligence
In an embodiment, an electronic device includes an encoder and a decoder. The encoder is configured to estimate an unknown value on the basis of first biological information including a line of sight of a subject extracted from an image of the subject, subject's environmental information representing an environment of the subject, and subject's internal state information representing an internal state of the subject. The decoder is configured to estimate second biological information including the line of sight of the subject on the basis of the unknown value, the subject's environmental information, and the subject's internal state information. The electronic device adjusts parameters of the encoder and the decoder on the basis of reproducibility of the first biological information from the second biological information.
In another embodiment, an electronic device includes an encoder, a decoder, and an estimator. The encoder is configured to estimate an unknown value on the basis of first biological information including a line of sight of a subject extracted from an image of the subject, subject's environmental information representing an environment of the subject, and a value assumed to be subject's internal state information representing an internal state of the subject. The decoder is configured to estimate second biological information including the line of sight of the subject on the basis of the unknown value, the subject's environmental information, and the value assumed to be the subject's internal state information. The estimator is configured to assume a plurality of values to be the subject's internal state information, and estimate a value of the plurality of values, corresponding to the highest reproducibility of the first biological information from the second biological information, to be the subject's internal state information.
In another embodiment, a method for controlling an electronic device includes estimating an unknown value, estimating second biological information, and adjusting. The estimating an unknown value estimates an unknown value on the basis of first biological information including a line of sight of a subject extracted from an image of the subject, subject's environmental information representing an environment of the subject, and subject's internal state information representing an internal state of the subject. The estimating second biological information estimates second biological information including the line of sight of the subject on the basis of the unknown value, the subject's environmental information, and the subject's internal state information. The adjusting adjusts parameters in the estimating an unknown value and the estimating second biological information on the basis of reproducibility of the first biological information from the second biological information.
In another embodiment, a method for controlling an electronic device includes estimating an unknown value, estimating second biological information, and estimating a value. The estimating an unknown value estimates an unknown value on the basis of first biological information including a line of sight of a subject extracted from an image of the subject, subject's environmental information representing an environment of the subject, and a value assumed to be subject's internal state information representing an internal state of the subject. The estimating second biological information estimates second biological information including the line of sight of the subject on the basis of the unknown value, the subject's environmental information, and the value assumed to be the subject's internal state information. The estimating a value assumes a plurality of values to be the subject's internal state information, and estimates a value of the plurality of values, corresponding to the highest reproducibility of the first biological information from the second biological information, to be the subject's internal state information.
In another embodiment, a program causes an electronic device to execute estimating an unknown value, estimating second biological information, and adjusting. The estimating an unknown value estimates an unknown value on the basis of first biological information including a line of sight of a subject extracted from an image of the subject, subject's environmental information representing an environment of the subject, and subject's internal state information representing an internal state of the subject. The estimating second biological information estimates second biological information including the line of sight of the subject on the basis of the unknown value, the subject's environmental information, and the subject's internal state information. The adjusting adjusts parameters in the estimating an unknown value and the estimating second biological information on the basis of reproducibility of the first biological information from the second biological information.
In another embodiment, a program causes an electronic device to execute estimating an unknown value, estimating second biological information, and estimating a value. The estimating an unknown value estimates an unknown value on the basis of first biological information including a line of sight of a subject extracted from an image of the subject, subject's environmental information representing an environment of the subject, and a value assumed to be subject's internal state information representing an internal state of the subject. The estimating second biological information estimates second biological information including the line of sight of the subject on the basis of the unknown value, the subject's environmental information, and the value assumed to be the subject's internal state information. The estimating a value assumes a plurality of values to be the subject's internal state information, and estimates a value of the plurality of values, corresponding to the highest reproducibility of the first biological information from the second biological information, to be the subject's internal state information.
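The estimating step in the embodiments above, in which a plurality of values are assumed to be the internal state information and the one giving the highest reproducibility is selected, can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the functions `encode` and `decode` are hypothetical linear stand-ins for a trained encoder and decoder, the table `DIRECTIONS` is invented for the example, and reproducibility is assumed to be measured by mean squared error (lower error meaning higher reproducibility).

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical per-state gaze directions standing in for trained decoder weights.
# Key 0 is taken to mean "concentrated" and key 1 "distracted" (an assumption).
DIRECTIONS = {0: (1.0, 0.0), 1: (0.0, 1.0)}


def encode(x: Vector, s: Vector, y: int) -> float:
    """Toy encoder: infer a scalar unknown value Z from X, S and an assumed Y."""
    w = DIRECTIONS[y]
    num = sum((xi - si) * wi for xi, si, wi in zip(x, s, w))
    return num / sum(wi * wi for wi in w)


def decode(z: float, s: Vector, y: int) -> List[float]:
    """Toy decoder: reconstruct second biological information X' from Z, S, Y."""
    return [si + z * wi for si, wi in zip(s, DIRECTIONS[y])]


def reconstruction_error(x: Vector, x_rec: Vector) -> float:
    """Mean squared error between observed X and reconstructed X'."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)


def estimate_internal_state(
    x: Vector,
    s: Vector,
    candidates: Sequence[int],
    encode_fn: Callable[[Vector, Vector, int], float] = encode,
    decode_fn: Callable[[float, Vector, int], List[float]] = decode,
) -> Tuple[int, float]:
    """Try each assumed Y and keep the one whose reconstruction is closest to X."""
    best_y, best_err = candidates[0], float("inf")
    for y in candidates:
        z = encode_fn(x, s, y)
        x_rec = decode_fn(z, s, y)
        err = reconstruction_error(x, x_rec)
        if err < best_err:
            best_y, best_err = y, err
    return best_y, best_err


# Observed gaze X and environmental information S (illustrative numbers only).
best_y, best_err = estimate_internal_state((0.25, 1.0), (0.25, 0.5), [0, 1])
```

In this toy setting the observed gaze deviates from the environmental reference only along the direction associated with the value 1, so assuming 1 reproduces the observation exactly while assuming 0 leaves a residual.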
In the technique disclosed in Patent Literature 1, hourly visibility is calculated using a table to determine cumulative visibility. However, the appropriate table varies depending on the driving situation in an actual environment, so it has been difficult to accurately observe the level of the driver's attention in various driving situations. In the technique disclosed in Non Patent Literature 1, it may be difficult to reasonably model a causal relation between the biological information and the internal state (e.g., emotion) of the subject using a simple discriminative model. That is, in a reasonable flow of information processing, a mental state, such as emotion, causes a biological response. In learning a simple discriminative model, however, a mental state is inferred from biological information. The model structure thus differs from the true causal structure, and model learning is unlikely to be carried out well. Also, there are occasions where the behavior of the model that estimates the internal state of the subject on the basis of biological information of the subject is to be explained to the user. From this perspective, it is desirable to further verify the reasonableness of the causal relation in the model that estimates the internal state of the subject on the basis of biological information of the subject. In the technique disclosed in Non Patent Literature 2, as in Non Patent Literature 1, it may be difficult to reasonably model a causal relation between biological information (e.g., eye-gaze data) of the subject and the internal state (e.g., determination of disease) of the subject using a simple discriminative model. Also, in the technique disclosed in Non Patent Literature 2, it is desirable to further verify the reasonableness of the causal relation in the model that estimates the internal state of the subject on the basis of biological information of the subject.
As described above, to accurately estimate an internal state, such as a concentration level or emotion, of the subject from biological information of the subject, it is desirable to reasonably model a causal relation in data generation.
The present disclosure provides an electronic device, a method for controlling an electronic device, and a program that reasonably estimate an internal state, such as a concentration level, of a subject on the basis of a data generation process. An embodiment can provide an electronic device, a method for controlling an electronic device, and a program that reasonably estimate an internal state, such as a concentration level, of a subject.
Embodiments of the electronic device to which the present disclosure is applied will now be described with reference to the drawings. The following description may also serve as a description of the method for controlling an electronic device and the program to which the present disclosure is applied.
In the present disclosure, “electronic device” may be a device driven by electricity. In an embodiment, an electronic device estimates an internal state, such as a concentration level, of a subject. “Subject” may be a person (typically a human being) whose internal state is estimated by the electronic device according to the embodiment. In the present disclosure, “user” may be a person (typically a human being) who uses the electronic device according to the embodiment. The “user” and “subject” may be either the same person or different persons. The “user” and “subject” may be either human beings or animals other than human beings.
In the embodiment according to the present disclosure, the electronic device is installed, for example, in a movable body. Movable bodies may include, for example, vehicles, ships, and aircraft. Vehicles may include, for example, automobiles, industrial vehicles, railroad vehicles, living vehicles, and fixed-wing aircraft traveling on runways. Automobiles may include, for example, passenger cars, trucks, buses, motorcycles, and trolley buses. Industrial vehicles may include, for example, industrial vehicles for agriculture or construction. Industrial vehicles may include, for example, forklifts and golf carts. Industrial vehicles for agriculture may include, for example, tractors, cultivators, transplanters, binders, combines, and lawnmowers. Industrial vehicles for construction may include, for example, bulldozers, scrapers, excavators, crane trucks, dump trucks, and road rollers. Vehicles may include human-powered vehicles. The classification of vehicles is not limited to the examples described above. For example, automobiles may include industrial vehicles that can travel on the road. The same vehicle may be included in multiple categories. Ships may include, for example, personal watercraft (PWCs), boats, and tankers. Aircraft may include, for example, fixed-wing aircraft and rotorcraft. In the present disclosure, “user” and “subject” may be persons who drive a movable body, such as a vehicle, or may be passengers who are non-drivers travelling in a movable body, such as a vehicle.
In an embodiment, an electronic device 1 may be any of various types of devices. In the embodiment, for example, the electronic device may be any device, such as a dedicated terminal, a general-purpose smartphone, a tablet, a phablet, a notebook-size personal computer (notebook PC), a computer, or a server. In the embodiment, for example, the electronic device may have the function of communicating with other electronic devices, such as mobile phones or smartphones. The “other electronic devices”, described above, may be electronic devices, such as mobile phones or smartphones, or may be any devices, such as base stations, servers, dedicated terminals, or computers. In the present disclosure, “other electronic devices” may also be devices or apparatuses driven by electricity. In the embodiment, the electronic device may communicate with other electronic devices via wired and/or wireless communication.
In the embodiment, the electronic device 1 is described as being installed in a movable body, such as a passenger car. In the embodiment, the electronic device 1 can estimate, in this case, a predetermined internal state (e.g., predetermined mental state) of a person (driver or non-driver) travelling in the movable body, such as a passenger car. The following describes an example in which, in the embodiment, the electronic device 1 estimates the concentration level of the driver during driving, as the internal state of the driver who drives the movable body, such as a passenger car. In this case, in the embodiment, the electronic device 1 can estimate the concentration level of the driver during driving on the basis of, for example, an image of the driver and a scene image captured during driving.
As illustrated in
The controller 10 controls and/or manages an overall operation of the electronic device 1, as well as the operation of each functional unit of the electronic device 1. To provide control and processing capabilities for performing various functions, the controller 10 may include at least one processor, such as a central processing unit (CPU) or a digital signal processor (DSP). The controller 10 may be implemented by a single processor or by several processors. The controller 10 may be implemented by discrete processors. The processor may be implemented as a single integrated circuit. The integrated circuit is also referred to as an IC. The processor may be implemented as a plurality of integrated circuits and discrete circuits connected to be capable of communicating with each other. The processor may be implemented on the basis of various other known techniques.
The controller 10 may include at least one processor and memory. The processor may include a general-purpose processor configured to read a specific program and execute a specific function, and a dedicated processor dedicated to specific processing. The dedicated processor may include an application specific integrated circuit (ASIC). The processor may include a programmable logic device (PLD). The PLD may include a field-programmable gate array (FPGA). The controller 10 may be either a system-on-a-chip (SoC) or a system in a package (SiP) where one or more processors work together. The controller 10 controls the operation of each component of the electronic device 1.
The controller 10 may include, for example, at least one of a software resource or a hardware resource. In the electronic device 1 according to the embodiment, the controller 10 may be constituted by concrete means in which software and hardware resources work in coordination. At least one selected from the group consisting of the extractor 12, the estimator 14, and the determiner 16 included in the controller 10 may include at least one of a software resource or a hardware resource. In the electronic device 1 according to the embodiment, at least one selected from the group consisting of the extractor 12, the estimator 14, and the determiner 16 may be constituted by concrete means in which software and hardware resources work in coordination.
The extractor 12 extracts a line of sight of a subject (subject's line of sight) from an image of the subject (subject's image) captured by the first imager 21. The estimator 14 estimates an internal state of the subject (subject's internal state), such as a concentration level of the subject (subject's concentration level). The determiner 16 determines whether the subject's internal state estimated by the estimator 14 satisfies a predetermined condition. If the subject's internal state satisfies the predetermined condition (e.g., if the subject's concentration level drops to a value equal to or less than a predetermined value), the determiner 16 outputs a predetermined alarm signal to the informing unit 40. In the present disclosure, line-of-sight data extracted as data of the subject's line of sight may be treated as coordinate values (x, y) of a point of gaze. In the present disclosure, the line-of-sight data is not limited to coordinates of a point of gaze of the subject. For example, a pupil diameter and/or eye rotation information may be used as a line-of-sight feature.
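The determination described above can be sketched as a simple threshold check. This is only an illustrative sketch: the class `AlarmSignal`, the function `determine`, and the default threshold of 0.4 are hypothetical names and values chosen for the example, and the concentration level is assumed to be a number where a lower value means lower concentration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlarmSignal:
    """Hypothetical payload passed from the determiner to the informing unit."""
    reason: str
    concentration: float


def determine(concentration: float, threshold: float = 0.4) -> Optional[AlarmSignal]:
    """Return an alarm signal if the estimated level is equal to or less than
    the predetermined value; otherwise return None (no alarm)."""
    if concentration <= threshold:
        return AlarmSignal(reason="concentration_dropped", concentration=concentration)
    return None


signal_low = determine(0.3)    # below the threshold: alarm
signal_edge = determine(0.4)   # equal to the threshold: alarm
signal_ok = determine(0.9)     # above the threshold: no alarm
```

The comparison uses "less than or equal to" so that a level exactly at the predetermined value also triggers the alarm, matching the condition stated in the text.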
The operation of the controller 10, and the operation of the extractor 12, the estimator 14, and the determiner 16 included in the controller 10 will be described further later below.
The first imager 21 may include an image sensor, such as a digital camera, that electronically captures an image. The first imager 21 may include an imaging element, such as a charge coupled device image sensor (CCD) or a complementary metal oxide semiconductor (CMOS) sensor, that performs photoelectric conversion. For example, the first imager 21 may feed a signal based on the captured image to the controller 10. For this, as illustrated in
The first imager 21 captures a subject's image. Hereinafter, a driver who drives a movable body, such as a passenger car, will be described as an example of the subject. That is, in the embodiment, the first imager 21 captures an image of a driver driving a movable body, such as a passenger car. In the embodiment, for example, the first imager 21 may capture a still image of the subject at predetermined intervals (e.g., at 30 frames per second). In the embodiment, for example, the first imager 21 may capture a sequence of moving images of the subject. An imager 20 may capture a subject's image in various data forms, such as RGB data and/or infrared data.
To capture an image of the driver, the first imager 21 may be installed toward the driver, in the forward part of the interior of the movable body, such as a passenger car. The subject's image captured by the first imager 21 is fed to the controller 10. As described below, in the controller 10, the extractor 12 extracts biological information including the subject's line of sight from the subject's image. For this, the first imager 21 may be installed in an area suitable for capturing an image including an eyeball region of the driver. In the following description, information fed to the neural network may be defined as line-of-sight information, as it is biological information obtained after processing an image.
The first imager 21 may include a line-of-sight detector, such as an eye tracker. For example, the eye tracker may be installed in the movable body in such a way that the line of sight of the subject sitting in the driver's seat of the movable body can be detected. In this case, for example, the eye tracker may be either a contact eye tracker or a non-contact eye tracker. The eye tracker may be of any type that can detect the subject's line of sight with respect to the scene.
Like the first imager 21, the second imager 22 may include an image sensor, such as a digital camera, that electronically captures an image. That is, the second imager 22 may include an imaging element, such as a CCD or a CMOS sensor, that performs photoelectric conversion. For example, the second imager 22 may feed a signal based on the captured image to the controller 10. For this, as illustrated in
The second imager 22 primarily captures a scene image in front of the subject. More specifically, the second imager 22 may capture an image including a direction in which the subject's line of sight is directed. Hereinafter, a driver driving a movable body, such as a passenger car, will be described as an example of the subject. That is, in the embodiment, the second imager 22 captures an image of a scene toward which the line of sight of the driver driving a movable body, such as a passenger car, is directed. Typically, a driver of a movable body directs the line of sight in the direction of travel of the movable body. Therefore, the second imager 22 may primarily capture a scene image in front of the subject. Depending on the situation, the driver of the movable body may direct the line of sight to the right or left of the direction of travel of the movable body. In this case, the second imager 22 may capture, for example, a scene image to the right or left of the subject. In the embodiment, for example, the second imager 22 may capture a still image of the scene at predetermined intervals (e.g., at 30 frames per second). In the embodiment, for example, the second imager 22 may capture a sequence of moving images of the scene.
To capture an image of the scene in front of the driver, the second imager 22 may be installed toward the front of the movable body, such as a passenger car, in the forward part of the interior of the movable body. The scene image captured by the second imager 22 is fed to the controller 10. As described below, in the controller 10, the image captured by the second imager 22 is associated with the position toward which the line of sight of the subject whose image is captured by the first imager 21 is directed. Therefore, the second imager 22 may be installed in an area suitable for capturing an image including a direction in which the driver's line of sight is directed.
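One simple way to associate a point of gaze with a position in the scene image is to map gaze coordinates onto pixel indices. The sketch below is illustrative only and rests on assumptions not stated in the text: the gaze coordinates (x, y) are taken to be normalized to the range 0 to 1 relative to the scene image, with the origin at the top-left corner, and `gaze_to_pixel` is a hypothetical helper name.

```python
from typing import Tuple


def gaze_to_pixel(gaze_xy: Tuple[float, float], width: int, height: int) -> Tuple[int, int]:
    """Map a normalized point of gaze to a (row, column) index in the scene image.

    Coordinates outside the image are clamped to the nearest edge pixel.
    """
    gx, gy = gaze_xy
    col = min(max(int(gx * width), 0), width - 1)
    row = min(max(int(gy * height), 0), height - 1)
    return row, col


center = gaze_to_pixel((0.5, 0.5), 640, 480)
corner = gaze_to_pixel((1.0, 1.0), 640, 480)
outside = gaze_to_pixel((-0.1, 0.0), 640, 480)
```

Clamping keeps the index valid when the line of sight momentarily leaves the field of view of the scene camera; a real system might instead discard such samples.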
The storage 30 may serve as a memory that stores various types of information. The storage 30 may store, for example, a program executed by the controller 10 and a result of processing performed by the controller 10. The storage 30 may serve as a working memory for the controller 10. Thus, as illustrated in
The storage 30 may store, for example, machine learning data. Here, the machine learning data may be data generated by machine learning. The machine learning data may include parameters generated by machine learning. Machine learning may be based on artificial intelligence (AI) technology which enables specific tasks to be executed through training. More specifically, machine learning may be a technology that enables an information processing device, such as a computer, to learn a large amount of data and automatically construct algorithms or models for performing tasks, such as classification and/or prediction and/or data generation. In the present specification, AI may include machine learning. In the present specification, machine learning may include supervised learning which involves learning features or rules of input data on the basis of correct data. Machine learning may include unsupervised learning which involves learning features or rules of input data in the absence of correct data. Machine learning may include reinforcement learning which involves learning features or rules of input data through the process of being rewarded or punished. In the present specification, machine learning may be a combination of any of supervised learning, unsupervised learning, and reinforcement learning.
In the present embodiment, the concept of machine learning data may include an algorithm that outputs a result of predetermined inference (estimation) using an algorithm learned from input data. Other appropriate examples of the algorithm that can be used in the present embodiment include linear regression that predicts a relation between a dependent variable and an independent variable, a neural network (NN) obtained by mathematically modeling human cerebral neurons, a least-squares method that squares errors for calculation, a decision tree that uses a tree structure for problem solving, and regularization that modifies data by a predetermined method. The present embodiment may use a deep neural network, which is a neural network with deep layers. A machine learning algorithm using a deep neural network is referred to as deep learning. Deep learning is frequently used as an algorithm that constitutes AI.
In the embodiment, information stored in the storage 30 may be, for example, information stored in advance before shipment from the factory, or may be information appropriately acquired by the controller 10. In the embodiment, the storage 30 may store information received from a communication unit (communication interface) connected to the controller 10 or the electronic device 1. In this case, for example, the communication unit may communicate with an external electronic device or a base station via at least one of wireless or wired communication to receive various types of information. In the embodiment, the storage 30 may store information received by an input unit (input interface) connected to the controller 10 or the electronic device 1. In this case, the user of the electronic device 1 or other persons may enter various types of information by operating the input unit.
The informing unit 40 may output a predetermined alarm for alerting the user of the electronic device 1, on the basis of a predetermined signal (e.g., alarm signal) output from the controller 10. For this, as illustrated in
In the embodiment, for example, if the concentration level (internal state) of the subject is estimated to drop to a value equal to or less than a predetermined threshold, the informing unit 40 may output an alarm indicating that the concentration level of the subject has dropped. In the embodiment, for example, if the concentration level of the driver is estimated to drop to a value equal to or less than a predetermined threshold, the informing unit 40 configured to output visual information may inform the driver and/or other users, through the use of light emission or predetermined display, that the concentration level of the driver has dropped. In the embodiment, for example, if the concentration level of the driver is estimated to drop to a value equal to or less than a predetermined threshold, the informing unit 40 configured to output audio information may inform the driver and/or other users, through the use of a predetermined sound or voice, that the concentration level of the driver has dropped. In the embodiment, for example, if the concentration level of the driver is estimated to drop to a value equal to or less than a predetermined threshold, the informing unit 40 configured to output tactile information may inform the driver and/or other users, through the use of a predetermined vibration, that the concentration level of the driver has dropped. The driver and/or other users can thus notice that the concentration level of the driver has dropped.
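The visual, audio, and tactile outputs described above can be sketched as a dispatch over output modalities. The enum `Modality`, the function `inform`, and the action strings are hypothetical placeholders for whatever display, speaker, or vibration hardware an actual informing unit would drive.

```python
from enum import Enum
from typing import Dict, Iterable, List


class Modality(Enum):
    VISUAL = "visual"
    AUDIO = "audio"
    TACTILE = "tactile"


# Placeholder actions; a real informing unit would drive hardware instead.
ACTIONS: Dict[Modality, str] = {
    Modality.VISUAL: "show warning display",
    Modality.AUDIO: "play warning sound",
    Modality.TACTILE: "vibrate",
}


def inform(concentration: float, threshold: float, modalities: Iterable[Modality]) -> List[str]:
    """Return the alarm actions to perform when the estimated concentration
    level is equal to or less than the threshold; otherwise return nothing."""
    if concentration > threshold:
        return []
    return [ACTIONS[m] for m in modalities]


alerts = inform(0.2, 0.4, [Modality.VISUAL, Modality.TACTILE])
no_alerts = inform(0.8, 0.4, [Modality.AUDIO])
```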
The following describes how the electronic device 1 estimates the internal state of the subject in the embodiment.
In the embodiment, the electronic device 1 performs, by using an autoencoder, machine learning based on a driver's image captured during driving, and estimates a driver's internal state, such as a concentration level. The autoencoder is an architecture of a neural network. The autoencoder is a neural network including an encoder (which may hereinafter be associated with symbol ENN) and a decoder (which may hereinafter be associated with symbol DNN). In the electronic device 1 according to the embodiment, the controller 10 may include capabilities of the autoencoder. That is, in the embodiment, the controller 10 of the electronic device 1 includes capabilities of the encoder ENN and the decoder DNN.
To estimate a subject's internal state in the embodiment, the electronic device 1 assumes a generation process, as illustrated in
In machine learning in the electronic device 1 according to the embodiment, as illustrated in
When the unknown value Z is inferred as described above, the decoder DNN of the neural network illustrated in
As illustrated in
In the electronic device 1 according to the embodiment, the internal state information Y and the unknown value Z may be estimated by receiving only the first biological information X and the environmental information S. Here, the first biological information X may be information including the subject's line of sight extracted from the subject's image captured by the first imager 21. The environmental information S may include information of the scene image captured by the second imager 22.
As illustrated in
In the electronic device 1 according to the embodiment, the internal state information Y representing the internal state during observation of the first biological information X may be received to reconstruct the subject's second biological information X′. In the embodiment, for various cases of the internal state, such as a concentration level, the subject's second biological information X′ may be reconstructed by using the internal state information Y representing the internal state during observation of the information (second biological information X′) including the subject's line of sight. For example, in the embodiment, a state where the subject is fully concentrating only on driving the movable body may be artificially created. In this case, in the embodiment, the autoencoder of the electronic device 1 may reconstruct the corresponding information (second biological information X′) including the subject's line of sight, from the information (first biological information X) including the subject's line of sight observed in the created state and the internal state information Y representing the internal state during the observation. Also, for example, a state where the subject is not fully concentrating on driving the movable body may be artificially created, and the information (second biological information X′) including the subject's line of sight corresponding to the internal state information Y representing the internal state in the created state may be reconstructed by the autoencoder of the electronic device 1 according to the embodiment. Here, the state where the subject is not fully concentrating on driving the movable body may be a state where, for example, the driver is performing a predetermined mental calculation while driving the movable body. 
Then, in accordance with the level of difficulty of the predetermined mental calculation (e.g., relatively simple mental calculation or relatively complex mental calculation), the level of the state where the subject is not fully concentrating on driving the movable body may be adjusted stepwise. For example, if the driver is performing a very simple mental calculation while driving the movable body, the subject may be regarded as being relatively concentrated while not fully concentrating on driving the movable body. If the driver is performing a very complex mental calculation while driving the movable body, the subject may be regarded as being relatively distracted from driving the movable body. In the present disclosure, the concentration level Y corresponding to the line of sight X in the learning phase may be a known value. Therefore, a plurality of Ys does not need to be assumed in the learning phase. For example, in the present disclosure, the concentration level corresponding to the observed line of sight may be defined depending on the mental calculation task.
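For illustration only, the assignment of a known concentration level Y to each recording condition in the learning phase may be sketched as follows. The task names and the intermediate value 0.5 are hypothetical and do not appear in the embodiment. The sketch assumes the convention that Y is 0 in a concentrated state and 1 in a distracted state.

```python
# Hypothetical recording conditions and label values (assumptions for
# illustration): Y = 0 means concentrated, Y = 1 means distracted.
TASK_TO_Y = {
    "no_task": 0.0,              # fully concentrating on driving
    "simple_calculation": 0.5,   # relatively concentrated
    "complex_calculation": 1.0,  # relatively distracted
}


def label_for_task(task: str) -> float:
    """Return the known internal state label Y for a recorded session."""
    return TASK_TO_Y[task]
```

Because the label is fixed by the task under which the line of sight was recorded, no search over a plurality of Y values is needed during learning.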
As described above, in the electronic device 1 according to the embodiment, the information (second biological information X′) including the subject's line of sight may be reconstructed by using the corresponding internal state information Y. For example, the internal state information Y may be zero (Y=0) in a concentrated state and one (Y=1) in a distracted state. Then, the parameters of the encoder ENN and the decoder DNN may be adjusted to increase the degree to which the original information (first biological information X) including the subject's line of sight is reproduced by the information (second biological information X′) including the subject's line of sight and reconstructed on the basis of the corresponding internal state information Y. In the embodiment, the electronic device 1 may adjust the parameters of the encoder ENN and the decoder DNN on the basis of reproducibility of the first biological information X from the second biological information X′.
At the start of operation of the learning phase illustrated in
Upon starting the operation illustrated in
After the subject's image is acquired in step S11, the extractor 12 of the controller 10 extracts the subject's line of sight from the subject's image (step S12). Any technique, such as image recognition, may be adopted to extract the subject's line of sight from the subject's image in step S12. Alternatively, the first imager 21 may include a line-of-sight detector, such as an eye tracker, that performs the function of the extractor 12 in its place. In the embodiment, the controller 10 of the electronic device 1 thus acquires the first biological information X including the subject's line of sight extracted from the subject's image in step S12.
After the subject's line of sight is extracted in step S12, the controller 10 acquires predetermined environmental information of the subject (step S13). In step S13, as the predetermined environmental information of the subject, for example, the controller 10 may acquire, from the second imager 22, a scene image captured by the second imager 22. In step S13, when, for example, the scene image captured by the second imager 22 is stored in the storage 30, the controller 10 may acquire the scene image from the storage 30. In the embodiment, the controller 10 of the electronic device 1 thus acquires the environmental information S representing an environment of the subject (subject's environmental information) in step S13.
After the subject's environmental information is acquired in step S13, the estimator 14 of the controller 10 estimates an unknown value (step S14). In step S14, the estimator 14 may estimate the unknown value Z using the encoder ENN of the autoencoder, on the basis of the first biological information X including the subject's line of sight, the subject's environmental information S, and the subject's internal state information Y (see
After the unknown value is estimated in step S14, the estimator 14 of the controller 10 estimates the second biological information including the subject's line of sight (step S15). In step S15, the estimator 14 may estimate, using the decoder DNN of the autoencoder, the second biological information X′ including the subject's line of sight, on the basis of the subject's internal state information Y, the unknown value Z, and the subject's environmental information S (see
After the second biological information X′ is estimated in step S15, the controller 10 adjusts parameters of the encoder ENN and the decoder DNN (step S16). In step S16, the controller 10 may adjust the parameters of the encoder ENN and the decoder DNN on the basis of the degree to which the first biological information X including the subject's line of sight is reproduced from the second biological information X′ including the subject's line of sight. Also, as described above, the controller 10 may adjust the parameters of the encoder ENN and the decoder DNN on the basis of a loss function including not only this degree of reproduction, but also the degree of distribution deviation representing the degree to which the probability distribution followed by the unknown value Z inferred by the encoder ENN deviates from a predetermined probability distribution. In the embodiment, the electronic device 1 can perform learning in accordance with the operation in the learning phase described above.
As described above, in the electronic device 1 according to the embodiment, the encoder ENN of the controller 10 estimates the unknown value Z on the basis of the first biological information X including the subject's line of sight extracted from the subject's image, the subject's environmental information S, and the subject's internal state information Y. Also, in the electronic device 1 according to the embodiment, the decoder DNN of the controller 10 estimates the second biological information X′ including the subject's line of sight, on the basis of the unknown value Z, the subject's environmental information S, and the subject's internal state information Y. Then, in the embodiment, the electronic device 1 adjusts the parameters of the encoder ENN and the decoder DNN on the basis of reproducibility of the first biological information X from the second biological information X′.
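As a non-limiting sketch, the flow of steps S14 to S16 may be illustrated with a toy model in which X, S, Y, and Z are scalars and the encoder ENN and the decoder DNN are linear functions. The parameter names a to f, the sample data, the learning rate, and the use of numerical gradients are all assumptions made for illustration. An actual implementation would typically use neural networks and automatic differentiation.

```python
def encode(p, x, s, y):
    """Toy encoder ENN: unknown value Z from X, S, and Y (cf. step S14)."""
    return p["a"] * x + p["b"] * s + p["c"] * y


def decode(p, z, s, y):
    """Toy decoder DNN: reconstruction X' from Z, S, and Y (cf. step S15)."""
    return p["d"] * z + p["e"] * s + p["f"] * y


def reconstruction_loss(p, data):
    """Mean square error between X and X' over all samples."""
    total = 0.0
    for x, s, y in data:
        x_rec = decode(p, encode(p, x, s, y), s, y)
        total += (x - x_rec) ** 2
    return total / len(data)


def train(p, data, lr=0.05, steps=200, eps=1e-5):
    """Adjust encoder and decoder parameters by gradient descent (cf. step S16)."""
    p = dict(p)
    for _ in range(steps):
        grads = {}
        for key in p:
            upper = dict(p)
            upper[key] += eps
            lower = dict(p)
            lower[key] -= eps
            # Central finite difference approximates the partial derivative.
            grads[key] = (
                reconstruction_loss(upper, data) - reconstruction_loss(lower, data)
            ) / (2 * eps)
        for key in p:
            p[key] -= lr * grads[key]
    return p


# Made-up samples (X, S, Y): scalar gaze feature, scalar scene feature, known label.
DATA = [(0.2, 0.1, 0.0), (0.8, 0.3, 1.0), (0.3, 0.2, 0.0), (0.9, 0.4, 1.0)]
INITIAL = {key: 0.1 for key in "abcdef"}
TRAINED = train(INITIAL, DATA)
```

In this sketch, learning is considered to have progressed when `reconstruction_loss(TRAINED, DATA)` is smaller than `reconstruction_loss(INITIAL, DATA)`, that is, when the reproducibility of X from X′ has increased.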
In the embodiment, the subject's internal state information Y may include information representing the subject's concentration level. In the embodiment, the subject's internal state information Y may include information representing the subject's concentration level, particularly during driving of the vehicle.
In the embodiment, the subject's environmental information S may include information of a scene image in front of the subject. Also, in the embodiment, the subject's environmental information S may include information of a captured image including a direction in which the subject's line of sight is directed.
In step S12, described above, the extractor 12 of the controller 10 extracts the subject's line of sight from the subject's image. In step S12, however, the extractor 12 of the controller 10 may extract, from the subject's image, coordinates representing the point to which the subject's line of sight is directed. In this case, in step S15, the estimator 14 of the controller 10 may estimate, as the second biological information including the subject's line of sight, the coordinates representing the point to which the subject's line of sight is directed. Thus, the point to which the subject's line of sight is directed, included in the first biological information X and the second biological information X′, can be easily associated with the position in the subject's environmental information S (scene image) acquired in step S13. Thus, in the electronic device 1 according to the embodiment, at least one of the first biological information X or the second biological information X′ may include coordinates of the subject's line of sight.
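By way of example, the association between line-of-sight coordinates and a position in the scene image may be sketched as follows. The sketch assumes that the line-of-sight coordinates are normalized to the range 0 to 1 relative to the scene image. This normalization, the function name, and the image size used below are assumptions, not part of the embodiment.

```python
def gaze_to_pixel(gaze_xy, width, height):
    """Map normalized gaze coordinates in [0, 1] to a pixel (column, row)."""
    gx, gy = gaze_xy
    # Clamp so that coordinates on or beyond the border stay inside the image.
    col = min(max(int(gx * width), 0), width - 1)
    row = min(max(int(gy * height), 0), height - 1)
    return col, row
```

For a hypothetical 640 × 480 scene image, a gaze point at the center (0.5, 0.5) maps to pixel (320, 240).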
In the electronic device 1 according to the embodiment, the controller 10 can infer the unknown value Z, which is a latent variable, using the encoder ENN of the autoencoder. Also, in the electronic device 1 according to the embodiment, the controller 10 can estimate, using the decoder DNN of the autoencoder, the second biological information X′ as a reconstruction of the first biological information X, on the basis of the unknown value Z. As described above, in the embodiment, the electronic device 1 can adjust the parameters of the encoder ENN and the decoder DNN on the basis of reproducibility of the first biological information X from the second biological information X′.
In the electronic device 1 according to the embodiment, the controller 10 may calculate a difference, such as a mean square error or an absolute difference, between the first biological information X and the second biological information X′. The controller 10 may output the second biological information X′ as a probability distribution to calculate a probability or a logarithm of probability of the first biological information X in the probability distribution. In the embodiment, the controller 10 may define a prior probability distribution for the unknown value Z. In this case, the controller 10 may calculate the prior probability of the estimated unknown value Z, and use the calculated prior probability together with the probability of the first biological information X. That is, the controller 10 may adjust the parameters of the encoder ENN and the decoder DNN on the basis of the degree to which the unknown value Z diverges from a predetermined probability distribution, such as a normal distribution. Also, in the embodiment, the controller 10 may output, from the encoder of the autoencoder, the unknown value Z as an approximate posterior probability distribution. In this case, the degree to which the unknown value Z diverges from the predetermined probability distribution may be an indicator of divergence between the prior distribution and the posterior distribution of the unknown value Z. For example, the Kullback-Leibler divergence may be used as a divergence indicator. In this case, the controller 10 may sample a plurality of unknown values Z to determine a plurality of pieces of second biological information X′. Thus, in the embodiment, the electronic device 1 may adjust the parameters of the encoder ENN and the decoder DNN on the basis of the degree to which the unknown value Z diverges from a predetermined probability distribution.
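One possible form of such a loss function is sketched below. The sketch assumes that the encoder outputs the mean and the logarithm of the variance of a diagonal normal distribution as the approximate posterior of the unknown value Z, and that the predetermined probability distribution is a standard normal distribution. The weighting factor `beta` and all function names are hypothetical.

```python
import math


def mean_square_error(x, x_rec):
    """Difference between first biological information X and reconstruction X'."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)


def kl_to_standard_normal(mu, log_var):
    """Kullback-Leibler divergence KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def loss(x, x_rec, mu, log_var, beta=1.0):
    """Reconstruction error plus weighted distribution deviation of Z."""
    return mean_square_error(x, x_rec) + beta * kl_to_standard_normal(mu, log_var)
```

When the approximate posterior coincides with the standard normal distribution (zero mean and zero log-variance), the divergence term vanishes and only the reconstruction error remains.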
In the embodiment, by performing a learning phase, the electronic device 1 can acquire parameters suitable for estimating the subject's internal state. Hereinafter, a phase for estimating the subject's internal state may be simply referred to as “estimation phase”.
At the start of operation of the estimation phase illustrated in
Upon starting the operation illustrated in
After the subject's image is acquired in step S21, the extractor 12 of the controller 10 extracts a subject's line of sight from the subject's image (step S22). The operation of step S22 may be performed in a manner same as, or similar to, the operation of step S12 illustrated in
After the subject's line of sight is extracted in step S22, the estimator 14 of the controller 10 estimates the subject's internal state information Y (step S23). The subject's internal state information Y estimated in step S23 may be, for example, information representing a subject's concentration level. In the embodiment, the subject's internal state information Y may include information representing a subject's concentration level, particularly during driving of a vehicle (movable body), such as a passenger car.
In step S23, in the embodiment, the electronic device 1 may estimate the subject's internal state information Y in the following manner. That is, in the embodiment, the controller 10 of the electronic device 1 assumes, for example, that the internal state information Y in a concentrated state is 0 and that the internal state information Y in a distracted state is 1. The controller 10 thus assumes a plurality of pieces of internal state information Y. Also, in the embodiment, the controller 10 may assume a plurality of pieces of internal state information Y to be values ranging from 0 to 1.
Then, for each of the plurality of pieces of internal state information Y assumed as described above, the controller 10 examines the degree to which the reconstructed information (second biological information X′) including the subject's line of sight reproduces the original information (first biological information X) including the subject's line of sight. The estimator 14 then estimates the internal state information Y corresponding to the highest degree to which the reconstructed information (second biological information X′) including the subject's line of sight reproduces the original information (first biological information X) including the subject's line of sight (or the highest reproducibility of the first biological information X from the second biological information X′), to be the current internal state (concentration level) of the subject. For example, if the reproducibility is highest when the subject's internal state information Y is 0, the estimator 14 may estimate that the subject is in a concentrated state. On the other hand, for example, if the reproducibility is highest when the subject's internal state information Y is 1, the estimator 14 may estimate that the subject is in a distracted state. Also, for example, if the reproducibility is highest when the subject's internal state information Y takes a value ranging from 0 to 1, the estimator 14 may estimate that the subject is at a concentration level corresponding to this value. In the embodiment, the controller 10 may define a prior probability distribution for the unknown value Z. In this case, the controller 10 may calculate the prior probability and/or logarithmic prior probability of the estimated unknown value Z and use the calculated prior probability and/or logarithmic prior probability together with the reproducibility described above. 
Also, in the embodiment, the controller 10 may output, from the encoder of the autoencoder, the unknown value Z as an approximate posterior probability distribution. In this case, the controller 10 may sample a plurality of unknown values Z to determine a plurality of pieces of second biological information X′. In this case, the controller 10 may calculate and use the approximate posterior probability and/or approximate logarithmic posterior probability of the estimated unknown value Z. The controller 10 may perform estimation on the basis of a probability or logarithmic probability representing a likelihood that the unknown value Z estimated by the encoder ENN is generated from a predetermined probability distribution. The subject's line-of-sight image may include at least one of a set of coordinates of the subject's line of sight, or a line-of-sight feature, such as a pupil diameter or eye rotation information.
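The search over assumed values of the internal state information Y in step S23 may be sketched as follows. The functions `toy_encode` and `toy_decode` are hypothetical stand-ins for a trained encoder ENN and decoder DNN, and the mean square error is only one possible measure of reproducibility. The candidate values and the toy gaze model are assumptions for illustration.

```python
def reproduction_error(x, x_rec):
    """Mean square error; a lower value means higher reproducibility."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)


def estimate_internal_state(x, s, encode, decode, candidates):
    """Return the assumed Y whose reconstruction X' best reproduces X."""
    best_y, best_error = None, float("inf")
    for y in candidates:
        z = encode(x, s, y)        # unknown value Z for the assumed Y
        x_rec = decode(z, s, y)    # second biological information X'
        error = reproduction_error(x, x_rec)
        if error < best_error:
            best_y, best_error = y, error
    return best_y


# Hypothetical stand-ins for a trained encoder ENN and decoder DNN.
def toy_encode(x, s, y):
    return 0.0  # the toy model carries no latent information


def toy_decode(z, s, y):
    # Concentrated gaze (y = 0) lands on the scene point s; distraction shifts it.
    return [s[0] + 0.4 * y + z, s[1]]


CANDIDATES = [0.0, 0.25, 0.5, 0.75, 1.0]
```

With these stand-ins, a gaze point that coincides with the scene point yields Y = 0 (concentrated), and a gaze point shifted by the full offset yields Y = 1 (distracted).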
After the subject's internal state information Y is estimated in step S23, the determiner 16 determines whether the estimated concentration level is equal to or less than a predetermined threshold (step S24). Before the operation of step S24, the predetermined threshold may be set as a criterion for determining whether to output an alarm related to the subject's concentration level. The predetermined threshold may be stored, for example, in the storage 30. In step S24, the determiner 16 may determine whether the estimated concentration level satisfies a predetermined condition, such as whether the estimated concentration level is equal to or less than the predetermined threshold.
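The determination of step S24 may be sketched as follows. The conversion of the internal state information Y into a concentration level as 1 − Y is an assumption that follows from treating Y = 0 as concentrated and Y = 1 as distracted. The function names and threshold value are hypothetical.

```python
def concentration_from_y(y: float) -> float:
    """Assumed conversion: Y = 0 (concentrated) maps to level 1.0."""
    return 1.0 - y


def should_output_alarm(concentration_level: float, threshold: float) -> bool:
    """True if the concentration level is equal to or less than the threshold."""
    return concentration_level <= threshold
```

For example, with a hypothetical threshold of 0.3, an estimated Y of 1 gives a concentration level of 0.0 and satisfies the condition, whereas an estimated Y of 0 does not.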
If the concentration level is equal to or less than the predetermined threshold (or the concentration level has dropped) in step S24, the determiner 16 may output a predetermined alarm from the informing unit 40 (step S25) and end the operation illustrated in
As described above, in the electronic device 1 according to the embodiment, the encoder ENN of the controller 10 estimates the unknown value Z on the basis of the first biological information X including the subject's line of sight extracted from the subject's image, the subject's environmental information S, and a value assumed to be the subject's internal state information Y. Also, in the electronic device 1 according to the embodiment, the decoder DNN of the controller 10 estimates the second biological information X′ including the subject's line of sight, on the basis of the unknown value Z, the subject's environmental information S, and a value assumed to be the subject's internal state information Y. Then, in the embodiment, the electronic device 1 assumes a plurality of values to be the subject's internal state information Y, and estimates one of the plurality of values, corresponding to the highest reproducibility of the first biological information X from the second biological information X′, to be the subject's internal state information Y. The electronic device 1 may estimate the subject's internal state by using a degree of distribution deviation representing the degree to which the probability distribution followed by the unknown value Z estimated by the encoder ENN deviates from a predetermined probability distribution. The predetermined probability distribution may be a normal distribution. The degree of distribution deviation may be represented by the Kullback-Leibler divergence.
In the embodiment, the electronic device 1 may output a predetermined alarm if one of the plurality of values assumed to be the subject's internal state information Y, corresponding to the highest reproducibility of the first biological information X from the second biological information X′, satisfies a predetermined condition.
In the learning phase and/or the estimation phase described above, various types of information may be acquired and estimated on the basis of time series information acquired in a predetermined period. That is, in the electronic device 1 according to the embodiment, the encoder ENN and the decoder DNN of the autoencoder may process, for example, the subject's environmental information S as time series information acquired in a predetermined period. Also, in the electronic device 1 according to the embodiment, the encoder ENN and the decoder DNN of the autoencoder may process, for example, the first biological information X and/or the second biological information X′ as time series information acquired in a predetermined period.
As described above, in the electronic device 1 according to the embodiment, at least one selected from the group consisting of the subject's environmental information S, the first biological information X, and the second biological information X′ may be time series information acquired in a predetermined period. In the electronic device 1 according to the embodiment, the encoder ENN and the decoder DNN of the autoencoder process time series information. This can be expected to improve accuracy in estimating the subject's internal state information Y.
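As one possible illustration, time series information acquired in a predetermined period may be held in a fixed-length buffer as sketched below. The class name, the window length, and the pairing of X and S per time step are assumptions.

```python
from collections import deque


class TimeSeriesWindow:
    """Keeps the most recent samples of (X, S) for a predetermined period."""

    def __init__(self, length):
        self._buffer = deque(maxlen=length)  # oldest samples drop out automatically

    def push(self, x, s):
        """Append the biological information X and environmental information S."""
        self._buffer.append((x, s))

    def ready(self):
        """True once samples covering the whole period have been collected."""
        return len(self._buffer) == self._buffer.maxlen

    def as_sequences(self):
        """Return X and S as two time-ordered lists for the encoder and decoder."""
        xs = [x for x, _ in self._buffer]
        ss = [s for _, s in self._buffer]
        return xs, ss
```

The two returned sequences could then be supplied to an encoder and a decoder that accept time series input.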
As described above, in the embodiment, the electronic device 1 can estimate the subject's internal state on the basis of the model where the subject's internal state serves as a causal factor that generates the biological information including the subject's line of sight. Therefore, in the embodiment, the electronic device 1 can reasonably estimate the internal state, such as a concentration level, of the subject, from a natural causal relation based on a data generation process. Also, in the embodiment, the electronic device 1 can output a predetermined alarm if, for example, the subject's concentration level drops while the subject is driving the movable body. Thus, in the embodiment, for example, the electronic device 1 can improve safety of the subject who is driving the movable body. In the embodiment, the internal state, such as a concentration level, of the subject can be reasonably estimated on the basis of the data generation process.
Human line-of-sight and/or attention behavior generally tends to be influenced by the environment, such as a surrounding scene. To estimate the subject's internal state, therefore, a highly accurate result cannot be obtained without appropriately considering the subject's environment, such as that described above. Also, when the subject's internal state is estimated, a model on which the result of estimation is based is preferably explained objectively to the user.
For example, to estimate the internal state, such as a concentration level, of the subject from a captured image of the subject, learning may be performed in a process opposite a causal relation between the internal state and the biological information. That is, as in known machine learning, learning may be performed to estimate the internal state from biological response data, such as the subject's line of sight. In this case, however, such a model structure with an opposite causal relation makes the internal data structure of the model a black box. As a result, since a causal factor cannot be identified, a wrong structure may be learned. Additionally, since the causal relation is a black box, it is difficult to objectively explain the model of the causal relation to the user.
In the electronic device 1 according to the embodiment, an algorithm that estimates a subject's internal state is based on a generative model different from a typical recognition model or regression model. A generative model in the electronic device 1 learns, from data, a process in which a subject's internal state and a subject's environment (e.g., surrounding scene) serve as causal factors that generate a subject's line-of-sight image. Therefore, in the embodiment, the electronic device 1 can be expected to improve estimation accuracy by taking the subject's environment into consideration. Also, with the electronic device 1 according to the embodiment, a mechanism based on the data generation process can be objectively explained to the user.
Another embodiment will now be described.
As illustrated in
The image processor 18 may extract more abstract information from a scene image captured by the second imager 22. For example, the image processor 18 may extract more abstract information predicting a subject's line of sight, on the basis of a scene image captured by the second imager 22. The image processor 18 may feed the information predicting a subject's line of sight to the estimator 14, as a line-of-sight prediction map. The image processor 18 may predict a subject's line of sight in an image of a scene that can be viewed by the subject. In the embodiment, the image processor 18 may estimate a map (line-of-sight prediction map) that predicts the subject's line of sight from an image (e.g., surrounding image) including a scene ahead of the subject's line of sight. Any existing technique may be adopted to generate the line-of-sight prediction map on the basis of an image of a scene that can be viewed by the subject. The line-of-sight prediction map to be used may be a prediction map (group) for each concentration level of the subject and/or for each factor that lowers the concentration level.
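For illustration, a line-of-sight prediction map may be thought of as a grid over the scene image in which each cell holds the predicted likelihood that the line of sight is directed to that cell. The sketch below normalizes a hypothetical raw map so that its cells sum to one and looks up the value at a gaze cell. The grid representation and the function names are assumptions, and the technique that produces the raw map is outside the scope of the sketch.

```python
def normalize_map(raw_map):
    """Scale a grid of non-negative scores so that all cells sum to one."""
    total = sum(sum(row) for row in raw_map)
    return [[value / total for value in row] for row in raw_map]


def gaze_likelihood(prediction_map, gaze_cell):
    """Return the predicted likelihood at the (row, column) cell of the gaze."""
    row, col = gaze_cell
    return prediction_map[row][col]
```

A separate map of this kind could be prepared for each concentration level or for each factor that lowers the concentration level.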
The image processor 18 may generate a semantic segmentation image on the basis of a scene image captured by the second imager 22, and output the generated semantic segmentation image. For example, in application to driving training performed in a simulated environment, a simulator may directly output a semantic segmentation image without involving the image processor 18.
In the electronic device 2 according to the embodiment illustrated in
Thus, in the electronic device 2 according to the embodiment, the subject's environmental information S may include information extracted by the image processor 18 from an image captured by the second imager 22. Also, in the electronic device 2 according to the embodiment, the subject's environmental information S may include information that predicts a subject's line of sight in an image captured by the second imager 22. In the embodiment, the electronic device 2 uses the subject's environmental information S appropriately extracted by image processing. This can be expected to improve accuracy in estimating the subject's internal state information Y.
Still another embodiment will now be described.
As illustrated in
When the biomarker acquiring unit 50 has the function of acquiring a subject's pupil radius, the biomarker acquiring unit 50 may include, for example, an imaging device capable of capturing an image of a subject's pupil. In this case, for example, the first imager 21 configured to capture an image including a subject's line of sight may also serve as the biomarker acquiring unit 50. The biomarker acquiring unit 50 may be any component capable of measuring or estimating the subject's pupil size. When the biomarker acquiring unit 50 has the function of acquiring the amount of subject's sweat, the biomarker acquiring unit 50 may include a device, such as a skin conductance unit, attached to the subject's skin. The biomarker acquiring unit 50 may be any component capable of measuring or estimating the amount of subject's sweat. The subject's biomarker information acquired by the biomarker acquiring unit 50 may be fed to, for example, the estimator 14 of the controller 10.
In the electronic device 3 illustrated in
The electronic device 3 illustrated in
Thus, in the electronic device according to the embodiment, the subject's first biological information and/or second biological information may include information representing the subject's pupil radius. In this case, information about brightness in the environment may be included as the environmental information S. Also, in the electronic device according to the embodiment, the subject's first biological information and/or second biological information may include information representing the amount of subject's sweat. In this case, the temperature and/or humidity in the environment may be included as the environmental information. In the embodiment, the electronic device 3 can be expected to improve accuracy in estimating the subject's internal state information Y by using, as the environmental information, information other than the internal state that influences biological information of the subject. That is, the pupil and the amount of sweat serve as biological information. The pupil relates to brightness, and the amount of sweat relates to temperature and humidity. Therefore, by considering the brightness, temperature, and humidity as environmental information, the electronic device can accurately estimate the internal state in the embodiment. This relation may be the same as the relation between the line of sight and the scene in front of the subject, described above.
Any person skilled in the art can make variations of, and alterations to, the present disclosure. Accordingly, it is to be noted that such variations and alterations are within the scope of the present disclosure. For example, functional units, means, and steps in each embodiment may be added to other embodiments without logical inconsistency, or may be replaced with functional units, means, and steps in other embodiments. In each embodiment, a plurality of functional units, means, and steps may be combined into one, or may each be divided. Each embodiment of the present disclosure does not necessarily need to be implemented exactly as described above. The described features may be combined or partially omitted as appropriate.
For example, in the embodiments described above, the second imager 22 is described as a component separate from the first imager 21. However, from an image captured by one imager, such as a 360° dashboard camera, the first imager 21 and the second imager 22 may individually extract image data to be used.
Number | Date | Country | Kind |
---|---|---|---|
2021-075345 | Apr 2021 | JP | national |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2022/017280 | 4/7/2022 | WO | |