The present disclosure relates to an estimation device and an estimation method for estimating a skeleton location of a vehicle-occupant (e.g. a driver) in a vehicle interior, and to a storage medium storing an estimation program.
In recent years, techniques have been developed for providing useful information to occupants of moving vehicles (e.g. the interior of a car). According to these techniques, the state (action or gesture) of the vehicle-occupant is sensed, and the vehicle-occupant is provided with useful information based on the sensing result. Some of these techniques are disclosed in Unexamined Japanese Patent Publication No. 2014-221636 and No. 2014-179097.
A technique for sensing a state of the vehicle-occupant is implemented, for instance, as an estimation device that estimates the skeleton location of a specific part of the vehicle-occupant based on an image supplied from an in-vehicle camera disposed in the vehicle interior. The skeleton location can be estimated with the aid of an estimating model (algorithm) formed through machine learning. An estimating model formed through deep learning, in particular, is suited for this application because of its high accuracy in estimating the skeleton location. Deep learning refers to a type of machine learning using a neural network.
The present disclosure provides an estimation device and an estimation method that improve the accuracy of sensing the state of a vehicle-occupant, and a storage medium that stores an estimation program.
The estimation device of the present disclosure includes a storage section, an estimator, a likelihood calculator, and an output section. The storage section stores a model formed through machine learning. With the aid of the model stored in the storage section, the estimator estimates, from image data in which equipment in a vehicle interior is shot, a skeleton location of a specific part of a vehicle-occupant in the vehicle interior, and also estimates a positional relation between the equipment and the specific part. The likelihood calculator calculates a likelihood of skeleton location information, which indicates the skeleton location, based on the estimated positional relation. The output section outputs the skeleton location information.
According to the estimation method of the present disclosure, image data in which the equipment in a vehicle interior is shot is obtained first. Next, a skeleton location of a specific part of a vehicle-occupant in the vehicle interior and a positional relation between the equipment and the specific part are estimated from the obtained image data with the aid of the model stored in the storage section. Further, a likelihood of skeleton location information indicating the skeleton location is calculated based on the estimated positional relation, and then the skeleton location information is output.
A non-transitory storage medium of the present disclosure stores an estimation program to be executed by a computer of the estimation device. This estimation program includes the following processes:
1. making the computer obtain image data in which the equipment in the vehicle interior is shot;
2. making the computer estimate the skeleton location of a specific part of a vehicle-occupant in the vehicle interior, and the positional relation between the equipment and the specific part from the obtained image data with the aid of the model stored in the storage section;
3. making the computer calculate a likelihood of skeleton location information indicating the skeleton location based on the estimated positional relation; and
4. making the computer output the skeleton location information.
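The four processes above can be sketched as follows. This is a minimal illustration only; every name (`run_estimation`, `dummy_model`, the dictionary keys) is a hypothetical placeholder, not part of the disclosure.

```python
# Sketch of the four processes of the estimation program; all names and
# data shapes are illustrative assumptions.

def run_estimation(image, estimating_model):
    # 1. The image data (equipment in the vehicle interior shot by the
    #    in-vehicle camera) is assumed to have been obtained already.
    # 2. Estimate the skeleton location and the positional relations
    #    between the equipment and the specific part.
    skeleton_xy, existence = estimating_model(image)
    # 3. The likelihood is strong (1) only when the existence estimates
    #    do not contradict each other: the specific part can touch at
    #    most one piece of equipment at a time.
    likelihood = 1 if sum(existence.values()) <= 1 else 0
    # 4. Output the skeleton location information with its likelihood.
    return {"skeleton": skeleton_xy, "likelihood": likelihood}

# Hypothetical stand-in for estimating model M.
def dummy_model(image):
    return (120, 80), {"door": False, "steering_wheel": True, "seatbelt": False}

result = run_estimation(None, dummy_model)
```

In this sketch a consistent set of existence estimates (only the steering wheel flag is True) yields a strong likelihood for the estimated skeleton location.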
The present disclosure allows improving the accuracy of sensing the state of the vehicle-occupant.
Prior to the description of the embodiment of the present disclosure, the origin of the present disclosure is explained.
Some pieces of the equipment disposed in the vehicle interior have shapes similar to specific parts of the vehicle-occupant. For instance, an outer edge of a seat and an unevenness of a door resemble an arm and a hand of the vehicle-occupant, and are thus difficult to distinguish from each other in the image. In this case, there is a concern that an estimation result obtained with the aid of the estimating model might be wrong, viz. the result indicates a wrong skeleton location. The state of the vehicle-occupant is then sensed based on the erroneously estimated skeleton location, so that a correct sensing result cannot be obtained.
In the case of estimating a state of the vehicle-occupant based on the skeleton location of the specific part of the vehicle-occupant with the aid of the estimating model formed through machine learning, it is preferable that an estimation result (skeleton location information) of a weak likelihood be excluded and only an estimation result of a strong likelihood be used for sensing the state of the vehicle-occupant. Nevertheless, in the case of estimating the skeleton location with the aid of the estimating model, only the maximum-likelihood value for an image of one frame is output as the estimation result. In other words, a conventional estimation device always outputs an estimation result (skeleton location information) with a 100% likelihood. When the state of the vehicle-occupant is sensed, it is thus difficult to determine, based on the likelihood resulting from the estimation, whether or not the estimation result is usable.
On the other hand, the likelihood of an estimation result of an object frame to be estimated can be calculated based on estimation results of images of multiple frames. For instance, as
Nevertheless, as shown in
The exemplary embodiment of the present disclosure is demonstrated hereinafter with reference to the accompanying drawings.
In-vehicle camera 20 is, for example, an infrared camera disposed in the interior of the vehicle. In-vehicle camera 20 shoots a seated vehicle-occupant and a region in which the equipment around the vehicle-occupant is present. Estimation device 1 estimates the positional relation between the specific part of the vehicle-occupant and those pieces of the surrounding equipment whose shapes are similar to the specific part, in other words, the pieces that are difficult to distinguish from the specific part in the image. For instance, when the specific part is a hand of the vehicle-occupant, the positional relation between the hand and equipment such as a door, steering wheel, or seatbelt is estimated.
In the case of estimating the skeleton location of the right hand of the vehicle-occupant with estimation device 1, how to determine a likelihood of the estimated skeleton location is demonstrated hereinafter. This determination uses estimation results of a positional relation between the right hand and the door, a positional relation between the right hand and the steering wheel, and a positional relation between the right hand and the seatbelt.
As
Processor 11 includes CPU (central processing unit) 111 working as a computation/control device, ROM (read only memory) 112 working as a main storage device, and RAM (random access memory) 113. ROM 112 stores a basic program called BIOS (basic input output system) and basic setting data. CPU 111 reads a program from ROM 112 or storage section 12 in accordance with the processing content, loads the program into RAM 113, and executes the loaded program, thereby executing a given process.
Processor 11, for instance, executes an estimation program, thereby working as image receiver 11A, estimator 11B, likelihood calculator 11C, and estimation result output section 11D. To be more specific, processor 11 estimates a skeleton location of the vehicle-occupant (herein, the skeleton location of the right hand) from the image data containing an image of the equipment of the vehicle with the aid of estimating model M. The equipment of the vehicle includes, for example, a door, steering wheel, seatbelt, rear-view mirror, sunshade, center panel, car navigation system, air conditioner, shift lever, center box, dashboard, armrest, and seat. The image data containing the image of the equipment is supplied from in-vehicle camera 20 to processor 11, which then estimates the positional relation between the equipment and the specific part of the vehicle-occupant before outputting the estimation result. The functions of image receiver 11A, estimator 11B, likelihood calculator 11C, and estimation result output section 11D will be described following the flowchart shown in
Storage section 12 is an auxiliary storage device such as an HDD (hard disk drive) or an SSD (solid state drive). Storage section 12 can be a disc drive that drives an optical disc such as a CD (compact disc), a DVD (digital versatile disc), or an MO (magneto-optical) disc to read/write information. Storage section 12 can also be a USB memory or a memory card such as an SD card.
Storage section 12, for instance, stores an operating system (OS), an estimation program, and estimating model M. The estimation program can be stored in ROM 112. The estimation program is provided via a portable, computer-readable storage medium (e.g. optical disc, magneto-optical disc, or memory card) storing the program. The estimation program can also be supplied by downloading the program from a server device via a network. Estimating model M can likewise be stored in ROM 112, and can be supplied through a portable storage medium or a network as well. The portable storage medium is a non-transitory computer-readable storage medium.
Estimating model M is an algorithm formed through machine learning. Upon receiving image data containing an image of the equipment, it outputs skeleton location information that indicates the skeleton location of the specific part of the vehicle-occupant, and existence information that indicates a positional relation between the equipment and the specific part. Estimating model M is preferably formed through deep learning that uses a neural network. Estimating model M thus formed has higher image-recognition performance, and thus can estimate the positional relation between the equipment and the specific part of the vehicle-occupant with high accuracy. Estimating model M is formed, for instance, by learning device 2 shown in
Processor 21, for instance, executes a learning program, thereby functioning as training data receiver 21A and learning section 21B. To be more specific, processor 21 carries out supervised learning with the aid of training data T, thereby forming estimating model M.
Training data T includes image T1, skeleton location information T2, and existence information T3. Image T1 contains images of the equipment (door, steering wheel, and seatbelt) of the vehicle and the specific part of the vehicle-occupant. Information T2 indicates the skeleton location of the specific part of the vehicle-occupant shot in image T1. Information T3 indicates the positional relation between the equipment and the specific part. Image T1 is associated with information T2 and T3, and this set (i.e. T1, T2, and T3) forms one unit of training data T. Image T1 is the input to estimating model M, and information T2 and T3 are the outputs of estimating model M. Image T1 can contain only the image of the equipment (without the specific part of the vehicle-occupant).
Skeleton location information T2 is given as coordinates (x, y) indicating the skeleton location of the specific part in image T1.
Existence information T3 is given as 'True/False'. To be more specific, when existence information T3 is given as 'True', information T3 indicates that the hand is overlaid upon the equipment (the hand touches the equipment). On the other hand, when existence information T3 is given as 'False', information T3 indicates that the hand is off the equipment. In this context, existence information T3 includes the first individual-equipment existence information indicating the positional relation between the right hand and the door, the second individual-equipment existence information indicating the positional relation between the right hand and the steering wheel, and the third individual-equipment existence information indicating the positional relation between the right hand and the seatbelt.
The specific part of the vehicle-occupant will not touch two different pieces of equipment simultaneously. To be more specific, the right hand cannot touch a door and a steering wheel simultaneously, because the door is apart from the steering wheel by a distance greater than the size of one hand. Accordingly, when one of the three pieces of individual-equipment existence information of existence information T3 is set to 'True', the other two are set to 'False'.
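The structure of existence information T3 can be illustrated as follows. The function and equipment names are assumptions made for this sketch; only the three True/False flags and their mutual exclusivity come from the description above.

```python
# Illustrative encoding of existence information T3 as three True/False
# flags, one per piece of equipment; the names are sketch assumptions.
def make_existence_info(touching=None):
    """Return the three individual-equipment existence flags. At most one
    flag may be True, since one hand cannot touch two pieces of equipment
    that are farther apart than the size of a hand."""
    equipment = ("door", "steering_wheel", "seatbelt")
    if touching is not None and touching not in equipment:
        raise ValueError("unknown equipment: " + touching)
    # Exactly the flag matching 'touching' (if any) is set to True.
    return {name: name == touching for name in equipment}

t3 = make_existence_info("steering_wheel")
```

By construction, setting one flag to 'True' forces the other two to 'False', matching the exclusivity constraint stated above.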
Image T1 of training data T can be an entire image corresponding to the complete image shot by in-vehicle camera 20, or a partial image cut out from the entire image. In the case of using the image shot by in-vehicle camera 20 as it is as an input to estimating model M in estimation device 1, the entire image is prepared as image T1 of training data T, and skeleton location information T2 is given as coordinates on the entire image. When estimation device 1 uses an image cut out from the image shot by in-vehicle camera 20 as the input to estimating model M, the partial image is prepared as image T1 of training data T, and skeleton location information T2 is given as coordinates on the partial image. In other words, image T1 of training data T used during the learning preferably has the same object range (image size and location) as the image used as the input to estimating model M during the estimation.
Image T1 of training data T contains images of various patterns supposed to be shot by in-vehicle camera 20. To be more specific, a large number of images showing the vehicle-occupant in different states, viz. the specific part in different locations, are prepared as image T1 of training data T. Then skeleton location information T2 and existence information T3 are associated with each of these images. Preparing as many patterns as possible as image T1 will increase the accuracy of the estimation done by estimating model M.
In step S101, processor 21 obtains one set of training data T. Processor 21 executes the process as training data receiver 21A. As discussed previously, training data T contains image T1, skeleton location information T2, and existence information T3.
In step S102, processor 21 optimizes estimating model M based on obtained training data T. Processor 21 executes the process as learning section 21B. To be more specific, processor 21 reads the present estimating model M from storage section 22. Processor 21 then modifies estimating model M such that the output produced when image T1 is input to estimating model M becomes equal to the values of skeleton location information T2 and existence information T3 associated with image T1. For instance, during deep learning with the aid of a neural network, the binding strengths (parameters) between the nodes that form the neural network are modified.
In step S103, processor 21 determines whether or not training data T not yet learned is present. In the case where training data T not yet learned is found (branch YES of step S103), the process returns to step S101, so that the learning of estimating model M is repeated and the accuracy of estimating model M is increased, viz. the accuracies of estimating the skeleton location of the vehicle-occupant and the positional relation between the specific part and the equipment are increased. On the other hand, in the case where no training data T not yet learned is found (branch NO of step S103), the process moves to step S104.
In step S104, processor 21 determines whether or not the learning is fully done. For instance, processor 21 uses the average of the squared errors as a loss function, and when this value is equal to or less than a predetermined threshold, processor 21 determines that the learning has been fully done. To be more specific, processor 21 calculates the averages of the respective squared errors between the output values produced in step S102 when image T1 is input into estimating model M and the values of skeleton location information T2 and existence information T3 associated with image T1, and then determines whether or not each average is equal to or less than the respective predetermined threshold.
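The convergence test of step S104 can be sketched as below. The disclosure only states that average squared errors are compared against predetermined thresholds; the threshold values and function names here are arbitrary placeholders.

```python
# Sketch of the convergence test of step S104: the average squared error
# between model outputs and training labels is compared against a
# per-output threshold. Threshold values are arbitrary placeholders.
def mean_squared_error(predicted, target):
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

def learning_done(skel_pred, skel_true, exist_pred, exist_true,
                  skel_threshold=4.0, exist_threshold=0.05):
    # Each output (skeleton coordinates, existence flags) is checked
    # against its own predetermined threshold.
    return (mean_squared_error(skel_pred, skel_true) <= skel_threshold
            and mean_squared_error(exist_pred, exist_true) <= exist_threshold)

# Example: coordinates off by one pixel each, existence flags nearly exact.
done = learning_done([100, 50], [101, 49], [0.9, 0.1, 0.0], [1, 0, 0])
```

With these placeholder thresholds, the example errors (coordinate MSE of 1.0, existence MSE of about 0.0067) fall below both limits, so the learning would be judged complete.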
When processor 21 determines that the learning has been fully done (branch YES in step S104), the process moves to step S105. When processor 21 determines that the learning is not fully done yet (branch NO in step S104), processor 21 repeats the processes from step S101 and onward.
In step S105, processor 21 updates estimating model M stored in storage section 22 based on the result of learning.
As discussed above, learning device 2 forms estimating model M to be used for estimating the skeleton location of the vehicle-occupant in the interior of the vehicle. Learning device 2 includes training data receiver 21A (a receiver) and learning section 21B. Training data receiver 21A obtains training data T, in which image T1 containing an image of at least one piece of the equipment in the interior is associated with skeleton location information T2 (first information) indicating the skeleton location of the specific part of the vehicle-occupant and existence information T3 (second information) indicating the positional relation between the equipment and the specific part. The at least one piece of the equipment in the interior refers to, for instance, a door, a steering wheel, or a seatbelt. The specific part of the vehicle-occupant refers to, for instance, a right hand. Learning section 21B forms estimating model M such that an input of image T1 to estimating model M allows outputting skeleton location information T2 and existence information T3, both associated with image T1, from estimating model M.
Use of estimating model M formed by learning device 2 allows estimation device 1 to estimate the skeleton location of the specific part (e.g. right hand) of the vehicle-occupant based on the image supplied from in-vehicle camera 20, as well as the positional relation between the equipment and the specific part.
In step S201, processor 11 obtains image DI from in-vehicle camera 20. Processor 11 executes the process as image receiver 11A.
In step S202, processor 11 carries out an estimation of the skeleton location of the specific part of the vehicle-occupant and an estimation of the positional relation between the equipment and the specific part, based on image DI with the aid of estimating model M. Processor 11 executes the process as estimator 11B. As the estimation result obtained by estimator 11B, the skeleton location information indicating the skeleton location of the specific part and the existence information indicating the positional relation between the specific part and the equipment are obtained. The existence information in this context contains the first individual-equipment existence information indicating the positional relation between the right hand and the door, the second individual-equipment existence information indicating the positional relation between the right hand and the steering wheel, and the third individual-equipment existence information indicating the positional relation between the right hand and the seatbelt.
In step S203, processor 11 calculates a likelihood of the estimated skeleton location with the aid of the existence information. Processor 11 executes the process as likelihood calculator 11C.
For instance, processor 11 compares the multiple estimation results (three pieces of information in this embodiment) of the individual-equipment existence information with each other, thereby calculating the likelihood of the skeleton location information. In the case where no contradiction is found among the multiple estimation results of individual-equipment existence information, the likelihood of the estimated skeleton location information is strong (e.g. the likelihood is rated 1). In the case where any contradiction is found among them, the likelihood is weak (e.g. the likelihood is rated 0).
As
As discussed above, the comparisons of the estimation results of the multiple pieces of individual-equipment existence information with each other allow readily determining the likelihood of the estimated skeleton location information.
Furthermore, when the estimation results of individual-equipment existence information have contradictions to each other (estimation results 3 and 4 in
The equipment information has been established in advance and stored in ROM 112. This information is given as a region occupied by individual equipment (e.g. door, steering wheel, seatbelt) on the image. In this context, the region is given as four points in coordinates. As
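A check of an estimated skeleton location against the stored equipment information can be sketched as below. The disclosure only states that each region is given as four points in image coordinates; the assumption that the region is an axis-aligned rectangle, and all coordinate values, are placeholders for this sketch.

```python
# Sketch of the equipment information stored in ROM and a check of the
# estimated skeleton location against it. Each region is assumed here to
# be an axis-aligned rectangle given by four corner points in image
# coordinates; all coordinate values are made-up placeholders.
def point_in_region(point, corners):
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs) <= point[0] <= max(xs) and min(ys) <= point[1] <= max(ys)

# Hypothetical regions occupied by individual equipment on the image.
EQUIPMENT_REGIONS = {
    "door":           [(0, 0), (60, 0), (0, 240), (60, 240)],
    "steering_wheel": [(100, 80), (220, 80), (100, 200), (220, 200)],
}

hand = (150, 120)  # a hypothetical estimated skeleton location of the right hand
```

Such a check lets the estimated coordinates be compared with each equipment region, which is the kind of comparison used when the existence estimates contradict each other.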
As
Each of
For instance, assume that estimation result 3 shown in
When the estimation result of the individual-equipment existence information has a contradiction (estimation results 3, 4 shown in
In step S204 shown in
The state sensing device carries out an appropriate process in response to the skeleton location of the specific part of the vehicle-occupant. For instance, when the estimation result indicates that the right hand does not hold the steering wheel, the state sensing device issues a warning to hold the steering wheel. At this time, the state sensing device selects and uses only the skeleton location information having a likelihood stronger than a given value, thereby increasing the sensing accuracy, so that a proper process can be expected.
As discussed above, in step S204, processor 11 outputs skeleton location information DO1 as the estimation results for indicating the skeleton location of the specific part of the vehicle-occupant as well as likelihood information DO2 that indicates the calculated likelihood. Instead of this process, processor 11 can output only the skeleton location information having a likelihood stronger than a given value. In such a case, the state sensing device can carry out a process appropriate to the skeleton location information output from processor 11, and does not need to select the skeleton location information having a stronger likelihood.
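The alternative output policy described above, in which only sufficiently likely results are emitted, can be sketched as follows; the threshold value and record layout are illustrative assumptions.

```python
# Sketch of the alternative output policy of step S204: processor 11
# outputs only skeleton location information whose likelihood exceeds a
# given value, so a downstream state sensing device need not filter.
def filter_estimates(estimates, threshold=0.5):
    return [e for e in estimates if e["likelihood"] > threshold]

usable = filter_estimates([
    {"skeleton": (150, 120), "likelihood": 1},  # consistent -> kept
    {"skeleton": (40, 200), "likelihood": 0},   # contradictory -> dropped
])
```

With this policy the state sensing device receives only usable skeleton location information and can carry out its process directly.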
As discussed above, estimation device 1 estimates the skeleton location of the vehicle-occupant in the interior of the vehicle, and includes storage section 12, estimator 11B, likelihood calculator 11C, and estimation result output section 11D (an output section). Storage section 12 stores estimating model M formed through machine learning. Estimator 11B obtains image DI containing an image of at least one piece of the equipment (e.g. door, steering wheel, seatbelt) in the interior, and estimates a skeleton location of a specific part of the vehicle-occupant (e.g. right hand) with the aid of estimating model M. Estimator 11B also estimates the positional relation between the equipment and the specific part with the aid of estimating model M. Likelihood calculator 11C calculates a likelihood of skeleton location information DO1, which indicates the skeleton location, based on the estimated positional relation. Estimation result output section 11D outputs at least skeleton location information DO1.
The estimation method carried out in estimation device 1 estimates the skeleton location of the vehicle-occupant in the vehicle interior. According to the method, image DI containing an image of at least one piece of the equipment (e.g. door, steering wheel, seatbelt) is obtained (refer to step S201 in
The estimation program to be executed by a computer of estimation device 1 includes the first to fourth processes below:
Estimation device 1 thus allows outputting the skeleton location information of the specific part of the vehicle-occupant as well as information about a likelihood useful for sensing the state of the vehicle-occupant. These functions of estimation device 1 achieve an improvement in the accuracy of sensing the state of the vehicle-occupant. The likelihood calculation can be carried out for the image of each frame in order to increase the recognition accuracy.
As discussed previously, the present disclosure is demonstrated specifically based on the exemplary embodiment, nevertheless the present disclosure is not limited to the embodiment and can be modified within the scope not deviating from the gist of the disclosure.
For instance, estimation device 1 can output the estimated existence information as it is as the information about the likelihood. In this case, the state sensing device disposed in a later stage of estimation device 1 determines the likelihood of the estimated skeleton-location information.
As
The specific part whose skeleton location is estimated by estimation device 1 is not limited to the 'right hand' demonstrated in the embodiment; the specific part can be another part. The object equipment whose positional relation with the specific part is to be estimated can be one or two pieces of the equipment, or three or more pieces.
Estimating model M can be formed through a type of machine learning other than deep learning (e.g. random forest).
In this embodiment, an example of the method for calculating the likelihood in the case where a contradiction is found in the estimation results of individual-equipment existence information (e.g. a contradiction is found in estimation results 3 and 4 shown in
Here is another method for calculating the likelihood: An estimation result of one piece of the individual-equipment information is compared with a positional relation determined based on both of the equipment information indicating the position of the same equipment and the skeleton location information. In other words, in the case where the use of estimating model M allows estimating at least one piece of the individual-equipment existence information, the likelihood of the skeleton location information can be calculated.
Alternatively, image T1 and skeleton location information T2 are prepared as training data T to be used for the learning done in learning device 2, and existence information T3 can be produced by processor 21 of learning device 2 based on the skeleton location information and the equipment information.
In the previous description, a program is installed in a general purpose computer, thereby allowing the computer to function as processors 11 and 21; nevertheless, individual parts of processors 11 and 21 can be formed of dedicated circuits, or only portions of the individual parts can be formed of dedicated circuits and the remaining portions can be formed by installing a program into the general purpose computer.
The embodiment demonstrated hereinbefore shall be construed as illustrative in every respect and not restrictive. The scope of the present disclosure is defined not by the foregoing description but by the claims described hereinafter, and can be changed within a scope not deviating from the gist of the claims.
The present disclosure is useful for an estimation device, estimation method, and estimation program that estimate not only a skeleton location of a vehicle-occupant in a vehicle interior, but also a skeleton location of a person in a specific space.
Number | Date | Country | Kind
---|---|---|---
2017-027230 | Feb 2017 | JP | national