The present disclosure relates to the field of vehicle technologies, and more particularly, to a method for recognizing driver's line-of-sight, an apparatus for recognizing driver's line-of-sight, a computer-readable storage medium, and a vehicle.
The direction of a driver's line-of-sight is critical to the safety of a vehicle during driving. Every year, countless car accidents are caused by distraction and fatigue during driving, leaving behind painful lessons. Accurately estimating the direction of the driver's line-of-sight in real time during driving to assist in safe driving is therefore of utmost importance.
In the related art, a line-of-sight estimation algorithm learns the human line-of-sight based only on the human eyes as an input to a network, while ignoring an influence of a head pose, leading to inaccurate line-of-sight estimation, which in turn affects driving safety and results in poor user experience.
A method for recognizing driver's line-of-sight is provided according to embodiments of the present disclosure. The method includes: obtaining image data of a target object; obtaining a face image and a head image of the target object by processing the image data; obtaining an eye image of the target object based on the face image and obtaining a head pose of the target object based on the head image; obtaining a relative position of a pupil to a sclera by processing the eye image; and determining a direction of line-of-sight of the target object based on the face image, the head pose, and the relative position of the pupil to the sclera.
A computer-readable storage medium is provided according to embodiments of the present disclosure. The computer-readable storage medium has a driver's line-of-sight recognition program stored thereon. The driver's line-of-sight recognition program causes, when being executed by a processor, the processor to: obtain image data of a target object; obtain a face image and a head image of the target object by processing the image data; obtain an eye image of the target object based on the face image and obtaining a head pose of the target object based on the head image; obtain a relative position of a pupil to a sclera by processing the eye image; and determine a direction of line-of-sight of the target object based on the face image, the head pose, and the relative position of the pupil to the sclera.
A vehicle is also provided according to embodiments of the present disclosure. The vehicle includes a memory, a processor, and a driver's line-of-sight recognition program stored on the memory and executable by the processor. The processor is configured to cause, when executing the driver's line-of-sight recognition program, the vehicle to: obtain image data of a target object; obtain a face image and a head image of the target object by processing the image data; obtain an eye image of the target object based on the face image and obtaining a head pose of the target object based on the head image; obtain a relative position of a pupil to a sclera by processing the eye image; and determine a direction of line-of-sight of the target object based on the face image, the head pose, and the relative position of the pupil to the sclera.
Additional aspects and advantages of the present disclosure will be provided at least in part in the following description, will become apparent at least in part from the following description, or can be learned from practicing of the present disclosure.
Embodiments of the present disclosure will be described in detail below with reference to examples thereof as illustrated in the accompanying drawings, throughout which same or similar elements, or elements having same or similar functions, are denoted by same or similar reference numerals. The embodiments described below with reference to the drawings are illustrative, and are intended to explain, rather than limiting, embodiments of the present disclosure.
Embodiments of the present disclosure provide a method for recognizing driver's line-of-sight, which combines a face image, a head pose, and a relative position of a pupil to a sclera as inputs to a network to recognize a direction of driver's line-of-sight, and can improve accuracy of recognizing the line-of-sight direction. In this way, the driver's line-of-sight can be accurately monitored in real time, improving driving safety and user experience.
Embodiments of the present disclosure also provide an apparatus for recognizing driver's line-of-sight.
Embodiments of the present disclosure also provide a computer-readable storage medium.
Embodiments of the present disclosure also provide a vehicle.
A method for recognizing driver's line-of-sight, an apparatus for recognizing driver's line-of-sight, a computer-readable storage medium, and a vehicle are described below with reference to the accompanying drawings.
As illustrated in
At block S1, image data of a target object is obtained.
In an exemplary embodiment of the present disclosure, an image of a target object (i.e., a driver) may be captured by an in-vehicle camera. The in-vehicle camera may be placed on an A-pillar, a rear-view mirror, or a steering wheel in the vehicle, and the camera may be an IR camera or an RGB camera. The in-vehicle camera captures an image of the target object in the vehicle in real time to obtain the image data of the target object.
At block S2, a face image and a head image of the target object are obtained by processing the image data.
In an exemplary embodiment of the present disclosure, face detection is performed on the image data using a face detection algorithm to obtain a position of a region where a face of the target object is located, thereby obtaining the face image of the target object. Head detection is also performed on the image data using an object detection algorithm to obtain a position of a region where a head of the target object is located, thereby obtaining the head image of the target object.
At block S3, an eye image of the target object is obtained based on the face image, and a head pose of the target object is obtained based on the head image.
At block S4, a relative position of a pupil to a sclera is obtained by processing the eye image.
At block S5, a direction of line-of-sight of the target object is determined based on the face image, the head pose, and the relative position of the pupil to the sclera.
In an exemplary embodiment of the present disclosure, after the image of the target object is captured by the in-vehicle camera, face detection is performed on the image data of the target object to obtain a position coordinate of a face region in the image, and the face image of the target object is obtained by extracting the face region. The face detection on the image data of the target object may be performed using algorithms such as MTCNN, RetinaFace, or the YOLO series. In another exemplary embodiment of the present disclosure, publicly available face detectors in the Face Detector module of OpenCV or face detectors in Dlib may be used for the face detection. An object detector is used to detect the head in the image data, such that the position coordinate of the region where the head of the target object is located can be obtained, and the head image of the target object can be obtained by extracting the head region. Based on the obtained face image of the target object, eye region detection is performed on the face image to determine a position of an eye in the face image, and the eye image can be obtained by extracting the eye region. The eye region detection on the face image may be performed using keypoint-based detection methods, such as PFLD. In another embodiment, open-source libraries (such as Dlib) or OpenCV may be used for face keypoint detection. In addition, the eye region may be treated as a detection target, and object detection algorithms, such as the RCNN series, the YOLO series, the SSD series, or CenterNet, may be used for the eye region detection. Further, the eye image is segmented to obtain the relative position of the pupil to the sclera. On the basis of the obtained head image of the target object, head pose estimation is performed on the head image to obtain the head pose of the target object. The direction of line-of-sight of the target object is then estimated based on the face image, the head pose, and the relative position of the pupil to the sclera to determine the direction of line-of-sight of the target object. Therefore, the above line-of-sight recognition method can monitor a direction of line-of-sight of a driver in real time and determine whether the driver's attention is on the road. When the driver's attention is off the road for a predetermined period, a real-time voice reminder is issued to redirect the driver's attention back to the driving process, which helps ensure safe driving and significantly reduces a probability of accidents.
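By way of a non-limiting illustration only, the processing flow described above may be sketched as follows using OpenCV and Dlib, both of which are mentioned in this disclosure as possible detectors. The callables head_pose_net, seg_net, and gaze_net are hypothetical placeholders for the head pose estimation, eye segmentation, and line-of-sight estimation stages described in the following paragraphs; the landmark model file name and the expansion factors are likewise illustrative assumptions.

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()  # face detection
# Assumed landmark model file; any 68-point predictor compatible with Dlib works.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def expand_rect(x1, y1, x2, y2, shape, factor):
    """Expand a rectangle up/down/left/right by a predefined scaling factor,
    clipped to the image boundaries."""
    h, w = shape[:2]
    dw, dh = int((x2 - x1) * factor), int((y2 - y1) * factor)
    return max(0, x1 - dw), max(0, y1 - dh), min(w, x2 + dw), min(h, y2 + dh)

def recognize_gaze(frame, head_pose_net, seg_net, gaze_net):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 0)
    if not faces:
        return None
    f = faces[0]
    x1, y1, x2, y2 = f.left(), f.top(), f.right(), f.bottom()
    face_img = frame[y1:y2, x1:x2]                      # face region

    # Head image: face rectangle expanded by a predefined factor (e.g., 20%).
    hx1, hy1, hx2, hy2 = expand_rect(x1, y1, x2, y2, frame.shape, 0.2)
    head_img = frame[hy1:hy2, hx1:hx2]

    # 68 keypoints; indices 36-47 (0-indexed) trace the two eye contours.
    # For brevity this sketch crops one region covering both eyes; the
    # disclosure describes separate left-eye and right-eye rectangles.
    lm = predictor(gray, f)
    eye_pts = np.array([(lm.part(i).x, lm.part(i).y) for i in range(36, 48)])
    ex1, ey1, ex2, ey2 = expand_rect(*eye_pts.min(0), *eye_pts.max(0), frame.shape, 0.75)
    eye_img = frame[ey1:ey2, ex1:ex2]

    head_pose = head_pose_net(head_img)   # Euler angles (pitch, yaw, roll)
    pupil_rel = seg_net(eye_img)          # relative position of the pupil to the sclera
    return gaze_net(face_img, head_pose, pupil_rel)
```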
It should be understood that a direction of a person's line-of-sight is closely related to a position of an eyeball and a head pose of the person. When the head pose of the person is fixed, the position of the eyeball determines the direction of the person's line-of-sight. When the position of the eyeball remains unchanged and the head rotates, the direction of the person's line-of-sight also changes accordingly. With the method for recognizing driver's line-of-sight according to the embodiment of the present disclosure, the direction of line-of-sight of the target object is estimated based on the face image, the head pose, and the relative position of the pupil to the sclera to determine the direction of line-of-sight of the target object. In this way, the accuracy of recognizing the direction of line-of-sight can be improved, and thus the driver's line-of-sight can be accurately monitored in real time, improving the driving safety and the user experience.
According to an embodiment of the present disclosure, the obtaining the relative position of the pupil to the sclera by processing the eye image includes: segmenting a pupil region and a sclera region in the eye image using an image segmentation network; obtaining pupil region position information and sclera region position information; and determining the relative position of the pupil to the sclera based on the pupil region position information and the sclera region position information.
In an exemplary embodiment of the present disclosure, after obtaining the eye image, image segmentation is performed on the eye image of the target object by the image segmentation network, and the pupil region and the sclera region in the eye image are segmented to obtain the relative position of the pupil to the sclera. The image segmentation network may use one of DDRNet, Deeplab series, PSPNet, UNet series, and a Transformer-based image segmentation algorithm.
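As a non-limiting illustration of this step, once the image segmentation network (for example, a UNet-style model) outputs a per-pixel class mask for the eye image, the relative position of the pupil to the sclera may be derived as sketched below; the class-id convention and the normalization to the sclera bounding box are assumptions rather than requirements of the disclosure.

```python
import numpy as np

SCLERA, PUPIL = 1, 2   # assumed class ids in the segmentation mask

def pupil_sclera_relative_position(mask):
    """Derive the relative position of the pupil to the sclera from a
    per-pixel segmentation mask (an H x W array of class ids)."""
    pupil_ys, pupil_xs = np.nonzero(mask == PUPIL)
    sclera_ys, sclera_xs = np.nonzero(mask == SCLERA)
    if pupil_xs.size == 0 or sclera_xs.size == 0:
        return None                                   # eye closed or segmentation failed
    # Pupil centroid expressed in the coordinate frame of the sclera's
    # bounding box, normalized to [0, 1] along each axis.
    x0, x1 = sclera_xs.min(), sclera_xs.max()
    y0, y1 = sclera_ys.min(), sclera_ys.max()
    cx, cy = pupil_xs.mean(), pupil_ys.mean()
    return (cx - x0) / max(x1 - x0, 1), (cy - y0) / max(y1 - y0, 1)
```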
According to an embodiment of the present disclosure, the obtaining the face image of the target object includes processing the image data by using a face detection model. A backbone network structure of the face detection model is a lightweight network structure, and a BiFPN structure based on a dilated convolutional neural network is used to fuse features of different layers.
In an exemplary embodiment of the present disclosure,
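As a rough, non-limiting sketch of fusing features of different layers with dilated convolutions in a BiFPN-style bidirectional manner, a simplified fusion block is shown below; the channel counts, the number of feature levels, and the layer choices are illustrative assumptions and do not reproduce the specific structure of the face detection model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedBiFPNBlock(nn.Module):
    """Simplified BiFPN-style block fusing three feature maps of decreasing
    resolution (p3, p4, p5) using dilated 3x3 convolutions."""
    def __init__(self, channels=64, dilation=2):
        super().__init__()
        def conv():
            return nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation)
        self.td4, self.td3 = conv(), conv()   # top-down path
        self.bu4, self.bu5 = conv(), conv()   # bottom-up path

    def forward(self, p3, p4, p5):
        # Top-down: propagate semantic information to higher-resolution levels.
        p4_td = F.relu(self.td4(p4 + F.interpolate(p5, size=p4.shape[-2:])))
        p3_out = F.relu(self.td3(p3 + F.interpolate(p4_td, size=p3.shape[-2:])))
        # Bottom-up: propagate fine localization information back down.
        p4_out = F.relu(self.bu4(p4_td + F.adaptive_max_pool2d(p3_out, p4_td.shape[-2:])))
        p5_out = F.relu(self.bu5(p5 + F.adaptive_max_pool2d(p4_out, p5.shape[-2:])))
        return p3_out, p4_out, p5_out
```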
According to an embodiment of the present disclosure, the obtaining the head image of the target object includes expanding, based on the face image of the target object, the face image in an upward direction, a downward direction, a leftward direction, and a rightward direction on the image data by a predefined scaling factor to obtain the head image. The predefined scaling factor may be set as desired, for example, to 20%. As a result, computing power can be saved.
According to another embodiment of the present disclosure, the obtaining the head image of the target object includes processing the image data based on one of an RCNN series deep learning algorithm, a Yolo series deep learning algorithm, an SSD series deep learning algorithm, and an anchor-free target detection algorithm to obtain the head image. That is, the head image of the target object can be obtained by performing target detection on the image data of the target object with one of the above algorithms.
According to an embodiment of the present disclosure, the obtaining the eye image of the target object based on the face image includes: obtaining, based on a face keypoint detection algorithm, an eye keypoint in a region where an eye of the target object is located; obtaining a coordinate of the eye keypoint; obtaining a bounding rectangle of the eye keypoint for a left eye and a bounding rectangle of the eye keypoint for a right eye; and expanding the bounding rectangle of the eye keypoint for the left eye and the bounding rectangle of the eye keypoint for the right eye in an upward direction, a downward direction, a leftward direction, and a rightward direction by a predefined scaling factor to determine the eye image of the target object.
According to an embodiment of the present disclosure, the obtaining the bounding rectangle of the eye keypoint for the left eye and the bounding rectangle of the eye keypoint for the right eye includes: obtaining a maximum coordinate and a minimum coordinate of the eye keypoint for the left eye along an x-axis and a y-axis or a maximum coordinate and a minimum coordinate of the eye keypoint for the right eye along the x-axis and the y-axis, respectively; and determining a difference between the maximum coordinate and the minimum coordinate in the x-axis direction as a length of the bounding rectangle, and determining a difference between the maximum coordinate and the minimum coordinate in the y-axis direction as a width of the bounding rectangle.
In an exemplary embodiment of the present disclosure, the face keypoint detection algorithm is based on a 5-keypoint model, a 68-keypoint model, or a 98-keypoint model. The more keypoints the model provides, the higher the precision of the finally obtained eye region. The 68-keypoint model is taken as an example below for illustration.
For the left eye, keypoint coordinates are (x37, y37), (x38, y38), (x39, y39), (x40, y40), (x41, y41), and (x42, y42). For the right eye, keypoint coordinates are (x43, y43), (x44, y44), (x45, y45), (x46, y46), (x47, y47), and (x48, y48). Based on the obtained coordinates of the eye keypoints for the left eye and the obtained coordinates of the eye keypoints for the right eye, the bounding rectangle of the eye keypoints for the left eye and the bounding rectangle of the eye keypoints for the right eye can be obtained. Taking the left eye as an example, a formula for obtaining the bounding rectangle of the eye keypoints for the left eye is as follows:

Xmin = min(x37, x38, x39, x40, x41, x42), Xmax = max(x37, x38, x39, x40, x41, x42),
Ymin = min(y37, y38, y39, y40, y41, y42), Ymax = max(y37, y38, y39, y40, y41, y42),
W = Xmax − Xmin, H = Ymax − Ymin,
where Xmin denotes the minimum coordinate of the eye keypoint for the left eye along the x-axis, Xmax denotes the maximum coordinate of the eye keypoint for the left eye along the x-axis, Ymin denotes the minimum coordinate of the eye keypoint for the left eye along the y-axis, Ymax denotes the maximum coordinate of the eye keypoint for the left eye along the y-axis, W denotes the length of the bounding rectangle, and H denotes the width of the bounding rectangle. Based on the coordinates of the eye keypoints for the left eye, the bounding rectangle of the eye keypoints for the left eye can be obtained through the above formula, and the bounding rectangle of the eye keypoints for the right eye can be derived through a similar method.
Since the eye keypoints closely follow the contours of the eyes, to ensure that the bounding rectangle includes an entire eye region, after obtaining the bounding rectangle of the eye keypoints for the left eye and the bounding rectangle of the eye keypoints for the right eye, the bounding rectangle of the eye keypoints for the left eye and the bounding rectangle of the eye keypoints for the right eye are expanded in the upward direction, the downward direction, the leftward direction, and the rightward direction by the predefined scaling factor. For example, the bounding rectangle of the eye keypoints for the left eye and the bounding rectangle of the eye keypoints for the right eye are expanded by 0.75*W both in the leftward direction and in the rightward direction and are expanded proportionally both in the upward direction and in the downward direction. Therefore, a complete eye image of the target object can be obtained.
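The bounding rectangle computation and expansion described above may be summarized, by way of a non-limiting sketch, as follows; interpreting the proportional vertical expansion as the same factor applied to the rectangle height H is an assumption.

```python
import numpy as np

def eye_bounding_box(eye_pts, expand=0.75):
    """Compute the bounding rectangle of one eye's keypoints (e.g., points
    37-42 of the 68-keypoint model for the left eye, given as an (N, 2)
    array) and expand it in all four directions as described above."""
    xs, ys = eye_pts[:, 0], eye_pts[:, 1]
    x_min, x_max = xs.min(), xs.max()
    y_min, y_max = ys.min(), ys.max()
    w, h = x_max - x_min, y_max - y_min   # length W and width H of the rectangle
    dx = expand * w                        # expand by 0.75*W to the left and right
    dy = expand * h                        # proportional expansion up and down (assumed)
    return x_min - dx, y_min - dy, x_max + dx, y_max + dy
```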
According to an embodiment of the present disclosure, the obtaining the head pose of the target object based on the head image includes: obtaining an Euler angle of the target object based on a deep learning network; and determining the head pose when a difference between the Euler angle and a predetermined angle is smaller than a predetermined threshold.
In an exemplary embodiment of the present disclosure, after the head image is detected, a head pose of a to-be-detected object is learned from the head image through the deep learning network. Angles of the to-be-detected object in three directions are obtained, i.e., a Pitch angle of the head, a Yaw angle of the head, and a Roll angle of the head, which together constitute an Euler angle. A current pose of the head is represented by the Euler angle. A difference between the Euler angle and a preset angle of a predetermined pose is calculated, and the smaller the difference is, the higher the accuracy of the head image is. When the difference is smaller than the predetermined threshold, the head pose of the target object can be determined.
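A minimal sketch of the Euler-angle check described above is given below; the reference pose and the threshold value are illustrative assumptions rather than values fixed by the disclosure.

```python
import numpy as np

def head_pose_accepted(euler, reference=(0.0, 0.0, 0.0), threshold_deg=30.0):
    """Compare predicted Euler angles (pitch, yaw, roll) of the head against
    a predetermined reference pose; the head pose is accepted when each
    angular difference is smaller than the predetermined threshold."""
    diff = np.abs(np.asarray(euler, dtype=float) - np.asarray(reference, dtype=float))
    return bool(np.all(diff < threshold_deg))
```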
According to an embodiment of the present disclosure, the determining the direction of the line-of-sight of the target object based on the face image, the head pose, and the relative position of the pupil to the sclera includes: performing feature extraction on each of the face image, the head pose, and the relative position of the pupil to the sclera based on a multi-branch deep learning network; and performing concatenation on a face feature of the face image, a head feature of the head pose, and the relative position of the pupil to the sclera to obtain the direction of the line-of-sight of the target object.
In an exemplary embodiment of the present disclosure, the direction of the line-of-sight is not only related to the head pose but also closely related to a position of the pupil. When the head pose is kept fixed, the position of the pupil changes and the direction of the line-of-sight changes accordingly. Therefore, it is required to combine the face image, the head pose, and the relative position of the pupil to the sclera as inputs to the multi-branch deep learning network to determine the direction of the line-of-sight of the target object. A structure of the multi-branch deep learning network is illustrated in
In an exemplary embodiment of the present disclosure, as illustrated in
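As a non-limiting sketch of such a multi-branch network, the following PyTorch module uses a convolutional branch for the face image and small fully-connected branches for the head pose (three Euler angles) and the relative pupil position (assumed here to be two normalized values), concatenates the extracted features, and regresses a gaze direction; all layer sizes and the two-angle output parameterization are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiBranchGazeNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Branch 1: convolutional feature extraction from the face image.
        self.face_branch = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 64), nn.ReLU(),
        )
        # Branch 2: head pose (pitch, yaw, roll).
        self.pose_branch = nn.Sequential(nn.Linear(3, 16), nn.ReLU())
        # Branch 3: relative position of the pupil to the sclera.
        self.pupil_branch = nn.Sequential(nn.Linear(2, 16), nn.ReLU())
        # Concatenated features regressed to a gaze direction (yaw, pitch).
        self.head = nn.Sequential(nn.Linear(64 + 16 + 16, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, face_img, head_pose, pupil_rel):
        feats = torch.cat([
            self.face_branch(face_img),    # face feature
            self.pose_branch(head_pose),   # head pose feature
            self.pupil_branch(pupil_rel),  # pupil/sclera relative position feature
        ], dim=1)
        return self.head(feats)
```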
In summary, with the method for recognizing driver's line-of-sight according to the embodiments of the present disclosure, the face image and the head image of the target object are obtained by processing the image data, and the eye image of the target object is then obtained based on the face image. After that, the head pose of the target object is obtained based on the head image, then the relative position of the pupil to the sclera is obtained by processing the eye image, and finally the direction of line-of-sight of the target object is determined based on the face image, the head pose, and the relative position of the pupil to the sclera. Therefore, the method can combine the face image, the head pose, and the relative position of the pupil to the sclera as inputs to the network to recognize the direction of driver's line-of-sight, which can improve accuracy of recognizing the direction of line-of-sight. In this way, the driver's line-of-sight can be accurately monitored in real time, improving the driving safety and the user experience.
Corresponding to the above embodiments, an apparatus for recognizing driver's line-of-sight is further provided according to the present disclosure.
As illustrated in
The image obtaining module 110 is configured to obtain image data of a target object. The face detection module 120 is configured to obtain a face image of the target object by processing the image data. The head detection module 130 is configured to obtain a head image of the target object by processing the image data. The eye detection module 140 is configured to obtain an eye image of the target object based on the face image. The head pose estimation module 150 is configured to obtain a head pose of the target object based on the head image. The image processing module 160 is configured to obtain a relative position of a pupil to a sclera by processing the eye image. The line-of-sight estimation module 170 is configured to determine a direction of line-of-sight of the target object based on the face image, the head pose, and the relative position of the pupil to the sclera.
According to an embodiment of the present disclosure, the image processing module 160 is configured to obtain the relative position of the pupil to the sclera by processing the eye image. In particular, the image processing module 160 is configured to: segment a pupil region and a sclera region in the eye image using an image segmentation network; obtain pupil region position information and sclera region position information; and determine the relative position of the pupil to the sclera based on the pupil region position information and the sclera region position information.
According to an embodiment of the present disclosure, the face detection module 120 is configured to obtain the face image of the target object. In particular, the face detection module 120 is configured to process the image using a face detection model. A backbone network structure of the face detection model is a lightweight network structure, and uses a BiFPN structure based on a dilated convolutional neural network to fuse features of different layers.
According to an embodiment of the present disclosure, the head detection module 130 is configured to obtain the head image of the target object. In particular, the head detection module 130 is configured to expand the face image in an upward direction, a downward direction, a leftward direction, and a rightward direction on the image data by a predefined scaling factor based on the face image of the target object, to obtain the head image.
According to another embodiment of the present disclosure, the head detection module 130 is configured to obtain the head image of the target object. In particular, the head detection module 130 is configured to process the image data based on one of the RCNN series deep learning algorithm, the Yolo series deep learning algorithm, the SSD series deep learning algorithm, and the anchor-free target detection algorithm to obtain the head image.
According to an embodiment of the present disclosure, the eye detection module 140 is configured to obtain the eye image of the target object based on the face image. In particular, the eye detection module 140 is configured to: obtain, based on a face keypoint detection algorithm, an eye keypoint in a region where an eye of the target object is located; obtain coordinates of the eye keypoint; obtain a bounding rectangle of the eye keypoint for a left eye and a bounding rectangle of the eye keypoint for a right eye; and expand the bounding rectangle of the eye keypoint for the left eye and the bounding rectangle of the eye keypoint for the right eye in the upward direction, the downward direction, the leftward direction, and the rightward direction by the predefined scaling factor to determine the eye image of the target object.
According to an embodiment of the present disclosure, the eye detection module 140 is configured to obtain the bounding rectangle of the eye keypoint for the left eye and the bounding rectangle of the eye keypoint for the right eye. In particular, the eye detection module 140 is configured to: obtain the maximum coordinate and the minimum coordinate of the eye keypoint for the left eye along the x-axis and the y-axis and the maximum coordinate and the minimum coordinate of the eye keypoint for the right eye along the x-axis and the y-axis, respectively; and determine a difference between the maximum coordinate and the minimum coordinate in the x-axis direction as a length of the bounding rectangle, and determine a difference between the maximum coordinate and the minimum coordinate in the y-axis direction as a width of the bounding rectangle.
According to an embodiment of the present disclosure, the head pose estimation module 150 is configured to obtain the head pose of the target object based on the head image. In particular, the head pose estimation module 150 is configured to: obtain an Euler angle of the target object based on a deep learning network; and determine the head pose when a difference between the Euler angle and a predetermined angle is smaller than the predetermined threshold.
According to an embodiment of the present disclosure, the line-of-sight estimation module 170 is configured to determine the direction of the line-of-sight of the target object based on the face image, the head pose, and the relative position of the pupil to the sclera. In particular, the line-of-sight estimation module 170 is configured to: perform feature extraction on each of the face image, the head pose, and the relative position of the pupil to the sclera based on a multi-branch deep learning network; and perform concatenation on a face feature of the face image, a head feature of the head pose, and the relative position of the pupil to the sclera to obtain the direction of the line-of-sight of the target object.
It should be noted that for details not disclosed in the apparatus for recognizing driver's line-of-sight according to the embodiments of the present disclosure, reference can be made to the details disclosed in the method for recognizing driver's line-of-sight according to the above embodiments of the present disclosure, and thus details thereof will be omitted herein.
With the apparatus for recognizing driver's line-of-sight according to the embodiments of the present disclosure, the image data of the target object is obtained by the image obtaining module, the image data is processed by the face detection module to obtain the face image of the target object, the image data is processed by the head detection module to obtain the head image of the target object, the eye image of the target object is obtained based on the face image by the eye detection module, the head pose of the target object is obtained based on the head image by the head pose estimation module, the eye image is processed by the image processing module to obtain the relative position of the pupil to the sclera, and the direction of line-of-sight of the target object is determined based on the face image, the head pose, and the relative position of the pupil to the sclera by the line-of-sight estimation module. Therefore, the apparatus for recognizing driver's line-of-sight can combine the face image, the head pose, and the relative position of the pupil to the sclera as inputs to the network to recognize the direction of driver's line-of-sight, which can improve the accuracy of recognizing the direction of line-of-sight. In this way, the driver's line-of-sight can be accurately monitored in real time, improving the driving safety and the user experience.
Corresponding to the above embodiments, a computer-readable storage medium is further provided according to the present disclosure.
The computer-readable storage medium according to the embodiments of the present disclosure has a driver's line-of-sight recognition program stored thereon. The driver's line-of-sight recognition program implements, when being executed by a processor, the above method for recognizing driver's line-of-sight.
With the computer-readable storage medium according to the embodiments of the present disclosure, the accuracy of recognizing the direction of the line-of-sight can be improved by executing the above method for recognizing driver's line-of-sight. Therefore, the real-time and accurate monitoring of the driver's line-of-sight can be achieved, improving the driving safety and the user experience.
Corresponding to the above embodiments, a vehicle is further provided according to the present disclosure.
As illustrated in
With the vehicle according to the embodiments of the present disclosure, the accuracy of recognizing the direction of the line-of-sight can be improved by executing the above method for recognizing driver's line-of-sight. Therefore, the real-time and accurate monitoring of the driver's line-of-sight can be achieved, improving the driving safety and the user experience.
It should be noted that the logic and/or steps described in the flowchart or otherwise depicted herein may, for example, be considered as a sequenced list of executable instructions for implementing logical functions, and may be specifically implemented in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device, such as a computer-based system, a system including a processor, or another system that may fetch and execute instructions from the instruction execution system, apparatus, or device. For purposes of this disclosure, the "computer-readable medium" may be any apparatus that may include, store, communicate, propagate, or transport the program for use by or in conjunction with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include an electrical connection portion (an electronic device) having at least one wire, a portable computer disk cartridge (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (an EPROM or Flash memory), a fiber optic device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program may be obtained electronically by, for example, optically scanning the paper or other medium, followed by editing, interpretation, or other suitable processing if necessary, and then being stored in a computer memory.
It should be understood that each part of the present disclosure can be implemented in hardware, software, firmware, or any combination thereof. In the above embodiments, multiple steps or methods can be implemented using software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, when implemented in hardware, as in another embodiment, the steps or methods can be implemented by any one of, or a combination of, the following technologies known in the art: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit with suitable combined logic gates, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), and the like.
In the present disclosure, the description with reference to the terms “an embodiment,” “some embodiments,” “an example,” “a specific example,” “some examples,” and the like means that specific features, structures, materials, or characteristics described in conjunction with the embodiment(s) or example(s) are included in at least one embodiment or example of the present disclosure. The appearances of the above phrases in various places throughout this specification are not necessarily referring to the same embodiment or example. In addition, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one of at least one embodiment or example.
In addition, the terms such as “first” and “second” are only used for descriptive purposes, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, the features associated with “first” and “second” may explicitly or implicitly include at least one of the features. In the description of the present disclosure, “plurality” means at least two, such as two and three, unless otherwise specifically defined.
In the present disclosure, unless otherwise clearly specified and limited, terms such as "mounting," "connect," "connect to," and "fixed to" should be understood in a broad sense. For example, unless otherwise clearly limited, a connection may be a fixed connection, a detachable connection, or an integral connection; a mechanical connection or an electrical connection; a direct connection or an indirect connection through an intermediate medium; or internal communication between two components or an interaction relationship between two components. For those skilled in the art, specific meanings of the above-mentioned terms in the present disclosure can be understood according to specific circumstances.
Although the embodiments of the present disclosure have been shown and described above, it should be understood that the above-mentioned embodiments are exemplary and should not be construed as limiting the present disclosure. Those skilled in the art can make changes, modifications, substitutions, and alternations to the above-mentioned embodiments within the scope of the present disclosure.
Number | Date | Country | Kind
---|---|---|---
202211089211.8 | Sep 2022 | CN | national
This application is a continuation of International Application No. PCT/CN2023/107264, filed on Jul. 13, 2023, which claims priority to Chinese patent application No. 202211089211.8, titled “METHOD AND APPARATUS FOR RECOGNIZING DRIVER'S LINE-OF-SIGHT, VEHICLE, AND STORAGE MEDIUM” and filed on Sep. 7, 2022, the entire contents of which are incorporated herein by reference.
 | Number | Date | Country
---|---|---|---
Parent | PCT/CN2023/107264 | Jul 2023 | WO
Child | 19066199 | | US