The present invention relates to a technique for estimating a spin state of an object such as a flying ball.
A technique for estimating the spin state of an object such as a flying ball is disclosed in NPL 1. In this technique, the spin period T is obtained by detecting, from the input video, the time t+T at which the appearance of the ball in a certain frame t appears again; the number of spins of the ball is obtained from the spin period T, and the spin shaft of the ball that best fits between adjacent frames is obtained. However, the technique of NPL 1 cannot estimate the spin state of the object when no input video covering one full period is obtained.
On the other hand, PTL 1 describes a technique that can estimate the spin state of the object even when an input video covering one period is not obtained. In this technique, the object image at a time point t and the object image at a time point t+tc obtained from an input video are used, where tc is an integer of 1 or more, to estimate the spin state of the object. Specifically, from among a plurality of hypotheses of spin states, a hypothesis of spin state is selected for which the likelihood of the image of the object obtained by spinning the object in the object image at the time point t by tc unit times on the basis of that hypothesis becomes high.
However, in the technique of PTL 1, tc is a fixed value, and depending on the hypothesis of spin state, the image of the object obtained by spinning the object in the object image at the time point t by tc unit times may not be generated. That is, in order to generate the image of the object spun by tc unit times, the component of that spun image must be included in the original object image at the time point t. When tc is a fixed value, this condition may not be satisfied depending on the hypothesis of spin state. For example, when the unit time is a frame interval, tc=1 holds, and the frame rate of the input video is low (for example, about 120 fps), then if the spin amount per unit time represented by the hypothesis of spin state is large, the component of the image of the object spun by tc unit times is not included, and the image of the object spun by tc unit times may not be generated. When the image of the object obtained by spinning the object in the object image at the time point t by tc unit times cannot be generated in this way, the method of PTL 1 cannot properly estimate the spin state of the object. Although this problem could be avoided by increasing the frame rate of the input video, a high frame rate (for example, about 480 fps) shortens the recordable time.
The present invention was made in view of the above circumstances, and an object of the present invention is to provide a technique capable of estimating a spin state of an object regardless of a frame rate of an input video.
In order to solve the above problem, a target estimation image, which is an image of the object at a time point t+w·u obtained by spinning the object in an object image, which is the image of the object at a certain time point t obtained from a time-series input video, by w unit times on the basis of a hypothesis of spin state, and an object image at the time point t+w·u obtained from the input video are used, where the absolute value of w is an integer of 1 or more and u is a unit time, to estimate the spin state of the object by selecting, from among a plurality of hypotheses of spin states and a plurality of values of w, a hypothesis of spin state and a w for which the likelihood of the target estimation image becomes high.
Since the spin state of the object is estimated by selecting not only the hypothesis of spin state but also w, the spin state of the object can be estimated regardless of the frame rate of the input video.
An embodiment of the present invention will be described below with reference to the drawings. Note that constituent units having the same function are denoted by the same number, and redundant description is omitted.
As exemplified in
<Object Image Generation Unit 11 (Step S11)>
The object image generation unit 11 receives a video of an object (referred to as an "input video" below). The object is the target of estimation of a spin state. An example of the object is a ball. Hereinafter, the case where the object is a baseball will be described as an example. Of course, the object is not limited to a baseball, and may be a softball, a bowling ball, a soccer ball, or the like. The spin state of the object is information corresponding to at least one of a spin shaft and a spin amount of the object. The information corresponding to the spin shaft of the object is, for example, information representing the spin shaft of the object (the axis about which the object rotates), and one example of such information is a coordinate or an angle representing the spin shaft. More preferably, the information corresponding to the spin shaft of the object is information representing both the spin shaft and the spin direction of the object. An example of such information is a two-dimensional coordinate (x, y): the spin shaft is parallel to a straight line L passing through the two-dimensional coordinate (x, y) and the origin (0, 0), and, when the two-dimensional coordinate (x, y) is viewed from the origin (0, 0), a predetermined spin direction R (a right spin direction or a left spin direction) around the straight line L is the spin direction of the object. The information corresponding to the spin amount of the object is, for example, an angle representing the spin amount, the number of spins per predetermined time (for example, rpm: revolutions per minute, rps: revolutions per second, or revolutions per frame), and the like. The input video is a time-series video and has images of a plurality of frames. For example, the input video is a moving image obtained by photographing the state of a thrown ball.
The input video may be photographed in advance or may be photographed in real time.
The object image generation unit 11 generates an object image, which is an image of the object, from the input video. The object image is, for example, a partial region in one frame image of the input video, segmented so that the entire object is included with the center of the object as the center of the image. The object image generation unit 11 segments, from one frame image of the input video, a partial region forming a rectangle of a size that includes the whole image of the object together with a margin of a known size around it, and sets this partial region as the object image. As an example of the margin of the known size, the margin can be set to 0.5 times the radius of the object. That is, the object image can be a square whose sides have a length three times the radius of the object: horizontally, a left margin of the object (0.5 times the radius), the object (a diameter of 2 times the radius), and a right margin of the object; and vertically, an upper margin, the object, and a lower margin of the object.
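The geometry of this segmentation can be illustrated with a minimal Python sketch, assuming the center coordinates and radius of the object have already been detected. The function name `crop_bounds` and the `margin_ratio` parameter are illustrative names, not part of the embodiment:

```python
def crop_bounds(cx, cy, r, margin_ratio=0.5):
    """Return (left, top, side) of the square object-image crop.

    The side consists of a margin (margin_ratio * r), the object
    diameter (2 * r), and another margin, i.e. 3 * r in total when
    margin_ratio is 0.5, as in the example above.
    """
    margin = margin_ratio * r
    side = 2 * r + 2 * margin
    return cx - r - margin, cy - r - margin, side

# For a ball of radius 20 px centered at (100, 100), the crop is the
# square starting at (70, 70) with a side of 60 px (three radii).
```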
The object image generation unit 11 may generate an object image in which the features of the object are extracted (the features of the object are emphasized). For example, the object image generation unit 11 may obtain, as the object image, an image obtained by performing edge extraction on a partial region segmented from the input video as described above. Thus, the features of the object can be extracted, which has the merit of enhancing the accuracy of the subsequent processing of the spin state estimation unit 13.
As described above, the input video is a time-series video, and the object image is also a time-series image. For example, when an object image is generated for each frame image of the input video, an object image corresponds to each frame. The object image at the time point t is expressed as Ot. The time point t may be any time-series information identifying a position in the time series, for example, a real time or a frame number. The generated object image is output to the spin state estimation unit 13.
The object image generated by the object image generation unit 11 is input to the spin state estimation unit 13. The spin state estimation unit 13 uses a target estimation image Et+w·u, which is the image of the object at a time point t+w·u obtained by spinning the object in an object image Ot, which is the image of the object at a time point t obtained from the time-series input video as described above, by w unit times on the basis of a hypothesis of spin state, and an object image Ot+w·u at the time point t+w·u obtained from the input video, to estimate the spin state of the object by selecting, from among a plurality of hypotheses of spin states and a plurality of values of w, a hypothesis of spin state and a w for which the likelihood of the target estimation image Et+w·u becomes high.
In other words, the spin state of the object is estimated by selecting, from among the plurality of hypotheses of spin states and the plurality of values of w, a hypothesis of spin state and a w such that the target estimation image Et+w·u, which is the image of the object at the time point t+w·u obtained by spinning the object in the object image Ot, which is the image of the object at a certain time point t obtained from the time-series input video, by w unit times on the basis of the hypothesis of spin state, and the object image Ot+w·u at the time point t+w·u obtained from the input video are close to each other.
Here, the unit time u is a predetermined time interval. The unit time u may be a frame interval (that is, the time section between adjacent frames), a time section between frames two or more frames apart, or another predetermined time section. In the following, an example in which the frame interval is the unit time u is explained. In addition, w is an integer whose absolute value is 1 or more; that is, w is an integer satisfying w≤−1 or w≥1. When w is negative, spinning the object by w unit times on the basis of the hypothesis of spin state means that the object is spun by |w| unit times in the direction opposite to the spin direction indicated by the hypothesis of spin state (the object is brought into a state going back by |w| unit times into the past). w may be limited to an integer of 1 or more, or may be limited to an integer of −1 or less. The upper limit of the absolute value of w is not limited, but the absolute value of w may be limited to a value equal to or less than the assumed spin period of the object. The hypothesis of spin state represents, for example, information r corresponding to the spin shaft of the object and information θ corresponding to the spin amount of the object.
A specific example of the step S13 will be described with reference to
The spin state estimation unit 13, for each w belonging to a search range a≤w≤b of w, uses the target estimation image Et+w·u and the object image Ot+w·u, selects the hypothesis of spin state (rw, θw) for which the likelihood of the target estimation image Et+w·u becomes high from among the plurality of hypotheses of spin states (r, θ), and obtains a matching score sw (step S131). Here, a<b is satisfied; a and b may be predetermined, set based on an input value, or automatically set based on other processing. Note that even when 0 is included in the search range a≤w≤b, the processing of the step S131 at w=0 is not required, although it may be performed. In addition, in order to select, for each w, the hypothesis of spin state (rw, θw) for which the likelihood of the target estimation image Et+w·u becomes high, for example, the method described in PTL 1 may be used. An overview of this method is shown below.
«Example of Method for Selecting Hypothesis of Spin State (rw, θw) of Each w»
When the method described in PTL 1 is used for selecting the hypothesis of spin state (rw, θw) for each w, the spin state estimation unit 13 performs the following processing for each w belonging to the search range a≤w≤b.
First, the spin state estimation unit 13 generates a plurality (a plurality of types) of hypotheses of spin states (r, θ). The generated plurality of hypotheses is expressed as (r, θ)=(r(1), θ(1)), . . . , (r(J), θ(J)). Here, J is an integer of 2 or more. For example, the spin state estimation unit 13 generates the plurality of hypotheses (r(1), θ(1)), . . . , (r(J), θ(J)) on the basis of a probability distribution given in advance. Note that, since no prior information is generally present in the initial state, the spin state estimation unit 13 generates the plurality of hypotheses (r(1), θ(1)), . . . , (r(J), θ(J)) on the basis of a uniform probability distribution (step S1311).
The spin state estimation unit 13 generates the target estimation image Et+w·u, which is the image of the object at the time point t+w·u obtained by spinning the object in the object image Ot by w unit times on the basis of the hypothesis of spin state (r(j), θ(j)) (j=1, . . . , J). That is, the spin state estimation unit 13 generates the target estimation image Et+w·u corresponding to each hypothesis of spin state (r(j), θ(j)) for each w belonging to the search range a≤w≤b.
The spin state estimation unit 13 determines whether the likelihoods of the calculated hypotheses (r(1), θ(1)), . . . , (r(J), θ(J)) satisfy a predetermined convergence condition. An example of the predetermined convergence condition is whether or not the magnitude of the difference between the maximum value of the likelihood of the hypotheses calculated last time and the maximum value of the likelihood of the hypotheses calculated this time is equal to or less than a predetermined threshold value. When the likelihoods of the calculated hypotheses do not satisfy the predetermined convergence condition, the processing returns to the step S1311. In this case, in the step S1311, the spin state estimation unit 13 newly generates a plurality of hypotheses (r(1), θ(1)), . . . , (r(J), θ(J)) by random sampling based on a probability distribution of hypotheses determined by the likelihoods calculated in the step S1312. On the other hand, when the likelihoods of the calculated hypotheses satisfy the predetermined convergence condition, the spin state estimation unit 13 selects a hypothesis (r, θ)=(rw, θw) whose likelihood is high from the currently calculated hypotheses (r(1), θ(1)), . . . , (r(J), θ(J)). For example, the spin state estimation unit 13 may select the hypothesis (rw, θw) corresponding to the maximum value of the likelihood of the currently calculated hypotheses, a hypothesis (rw, θw) whose likelihood is equal to or more than a threshold value or exceeds the threshold value, or a hypothesis (rw, θw) whose likelihood is equal to or higher than a reference order in descending order of likelihood (step S1313) (this completes the description of «Example of Method for Selecting Hypothesis of Spin State (rw, θw) of Each w»).
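The loop of steps S1311 to S1313 can be sketched as follows. This is a minimal, hypothetical Python illustration: the callable `likelihood(r, theta)` stands in for the image comparison between the target estimation image and the observed object image, and the hypothesis ranges and jitter width are arbitrary assumptions of this sketch, not values from PTL 1:

```python
import math
import random

def select_hypothesis(likelihood, n_hyp=100, tol=1e-3, max_iter=50):
    # S1311 (initial state): no prior information, so the hypotheses
    # (r, theta) are sampled from a uniform distribution.
    hyps = [(random.uniform(0.0, 1.0), random.uniform(0.0, 2.0 * math.pi))
            for _ in range(n_hyp)]
    prev_best = None
    for _ in range(max_iter):
        # S1312: evaluate the likelihood of every hypothesis (in the
        # embodiment, the similarity between the target estimation
        # image and the observed object image).
        scored = [(likelihood(r, th), (r, th)) for r, th in hyps]
        best = max(scored)[0]
        # Convergence condition: the change in the maximum likelihood
        # between iterations is at most the threshold tol.
        if prev_best is not None and abs(best - prev_best) <= tol:
            break
        prev_best = best
        # S1311 (repeat): resample hypotheses from a likelihood-weighted
        # distribution, with a small Gaussian jitter added.
        weights = [s for s, _ in scored]
        picked = random.choices([h for _, h in scored], weights=weights, k=n_hyp)
        hyps = [(r + random.gauss(0.0, 0.02), th + random.gauss(0.0, 0.02))
                for r, th in picked]
    # S1313: select the hypothesis whose likelihood is highest.
    return max(scored)[1]
```

With a unimodal likelihood, the resampling concentrates the hypotheses around the peak, so the returned pair approaches the true spin state.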
When the hypothesis of spin state (rw, θw) of each w has been selected, the spin state estimation unit 13 further obtains a matching score sw between the target estimation image Et+w·u corresponding to the selected hypothesis (rw, θw) and the object image Ot+w·u for each w. The matching score sw is an index representing the similarity between the target estimation image Et+w·u and the object image Ot+w·u. For example, the likelihood of the hypothesis (rw, θw) obtained in the step S1312, that is, the similarity between the target estimation image Et+w·u corresponding to the hypothesis (rw, θw) and the object image Ot+w·u, may be used as the matching score sw as it is, a function value of the similarity may be used as the matching score sw, or the matching score sw may be newly calculated from the target estimation image Et+w·u corresponding to the hypothesis (rw, θw) and the object image Ot+w·u. By performing the above-described processing for each w belonging to the search range a≤w≤b, the following list is obtained.
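As one concrete possibility for such a matching score, the fraction of agreeing pixels between the two images could be used. The following Python sketch is an assumed illustration; the embodiment leaves the concrete definition of sw open, and the nested-list image representation is a simplification:

```python
def matching_score(est_img, obs_img):
    """Fraction of pixels on which the target estimation image and the
    observed object image agree (1.0 means a perfect match)."""
    total = 0
    agree = 0
    for est_row, obs_row in zip(est_img, obs_img):
        for e, o in zip(est_row, obs_row):
            total += 1
            agree += (e == o)
    return agree / total
```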
The spin state estimation unit 13 selects a specific w on the basis of the matching scores sa, . . . , sb obtained as described above (step S132). That is, the spin state estimation unit 13 selects a specific w corresponding to a large matching score. For example, the spin state estimation unit 13 may select the w corresponding to the maximum matching score among the matching scores sa, . . . , sb, may select a w corresponding to a matching score which is equal to or more than a threshold value or exceeds the threshold value among the matching scores sa, . . . , sb, or may select a w corresponding to a matching score which is equal to or higher than a reference order in descending order among the matching scores sa, . . . , sb.
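The maximum-score rule of step S132 can be written as a one-line Python helper; `select_w` is an illustrative name, and `scores` is assumed to map each w in the search range a≤w≤b to its matching score sw:

```python
def select_w(scores):
    """Select the specific w with the maximum matching score s_w.

    scores: dict mapping each candidate w (int) to its matching score.
    """
    return max(scores, key=scores.get)
```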
The spin state estimation unit 13 estimates the spin state of the object from the hypothesis (rw, θw) corresponding to the selected specific w, and outputs the estimation result (step S133). That is, the spin state estimation unit 13 estimates information corresponding to at least one of the spin shaft and the spin amount of the object from the hypothesis (rw, θw), and outputs the estimation result. For example, the spin state estimation unit 13 obtains information corresponding to at least one of the spin shaft and the spin amount per unit time of the object as the spin state of the object on the basis of information rw corresponding to the spin shaft of the object, information θw corresponding to the spin amount of the object, and w represented by the hypothesis of the selected spin state (rw, θw).
The same image as the object image Ot+w·u corresponding to the selected hypothesis (rw, θw) appears at every spin period of the object. Therefore, from the selected hypothesis (rw, θw) alone, it is hard to completely specify by how much the object represented in the object image Ot has been spun to yield the object image Ot+w·u. In addition, the same image as the object image Ot+w·u appears regardless of the direction in which the object spins around a certain spin shaft. Therefore, from the selected hypothesis (rw, θw) alone, it is also hard to completely specify in which direction the object represented in the object image Ot has been spun.
That is, the spin state which can be estimated in the step S133 from only the hypothesis of spin state (rw, θw) corresponding to the specific w selected in the step S132 is as follows.
(1) The object is spun around the spin shaft corresponding to rw.
(2) The object spins by Θw+2nπ or by −Θw+2nπ during w unit times, where n is an integer. Here, Θw represents the spin amount corresponding to θw, and Θw=θw holds when θw represents the spin amount itself.
It can be estimated that the spin amount per unit time (the spin amount per frame) is (Θw+2nπ)/w or (−Θw+2nπ)/w. In addition, the number of spins per minute, which is an example of information corresponding to the spin amount per unit time, is represented by {(Θw+2nπ)/w}*fr*60/(2π) [rpm] or {(−Θw+2nπ)/w}*fr*60/(2π) [rpm]. Here, fr [fps] represents the frame rate of the input video; for example, fr=120 or fr=480.
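This conversion to revolutions per minute can be written as a small Python helper, following the formula above; the function name, the `sign` parameter (selecting the +Θw or −Θw branch), and the defaults are assumptions of this sketch:

```python
import math

def spins_per_minute(theta_w, w, fr, n=0, sign=1):
    """Number of spins per minute: {(±Θw + 2nπ)/w} * fr * 60 / (2π) [rpm].

    theta_w: spin amount Θw [radian] over w unit times
    w:       number of unit times (frame intervals)
    fr:      frame rate of the input video [fps]
    n:       integer ambiguity from the spin period
    sign:    +1 or -1, choosing the spin-direction branch
    """
    return (sign * theta_w + 2 * n * math.pi) / w * fr * 60 / (2 * math.pi)
```

For example, a quarter turn per frame (Θw=π/2, w=1) at fr=120 corresponds to 1800 rpm.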
Further, the spin state estimation unit 13 may estimate the spin state of the object by using other auxiliary information in addition to the hypothesis of spin state (rw, θw) corresponding to the selected specific w. For example, when the range of the number of spins of the object and the kinds of pitch which may be thrown are known in advance, this information may be used as the auxiliary information. Further, a sensor such as a Doppler sensor may be used to detect the kind of pitch, the position, the ball speed, and the like, and the detection result may be used as the auxiliary information.
Further, the kind of pitch may be extracted from the position change of the object in the input video and used as the auxiliary information.
Further, depending on the relationship between the frame rate of the input video and the spin amount of the object per unit time, there is a case where information corresponding to the spin shaft of the object cannot be obtained at all. For example, as exemplified in
In the present embodiment, the spin state estimation unit 13 uses the target estimation image Et+w·u which is the image of the object at the time point t+w·u when the object in the object image Ot which is the image of the object at the certain time point t obtained from the input video in time-series is spun by w unit time on the basis of the hypothesis of spin state and the object image Ot+w·u at the time point t+w·u obtained from the input video, to estimate the spin state of the object by selecting the hypothesis of spin state and w in which the likelihood of the target estimation image Et+w·u becomes high among the plurality of hypotheses of spin states and the plurality of w. Thus, even when the target estimation image Et+w·u in which the object in the object image Ot is spun by w unit time on the basis of the hypothesis of spin state for any of w (for example, w=1 is satisfied) cannot be generated, the target estimation image Et+w·u can be generated for the other w, and the spin state of the object can be estimated by selecting the optimum w. As a result, in the present embodiment, the spin state of the object can be estimated regardless of the frame rate of the input video.
Note that, in
As a general tendency, as the absolute value of w increases, the estimation accuracy of the spin state is expected to improve. A detailed explanation follows. Let e be the error (noise) incurred when the spin state of the object is estimated by using the target estimation image Et+w·u and the object image Ot+w·u and selecting the hypothesis of spin state for which the likelihood of the target estimation image Et+w·u becomes high. Here, it is assumed that the magnitude of e does not depend much on the absolute value of w; for example, e is assumed to be roughly the same in both the case of w=1 and the case of w=12. Therefore, the error per estimation of the spin state is assumed to be e regardless of the value of w. Assuming that the true spin amount of the object per unit time u (one frame interval) is θ, the numbers of spins per minute [rpm] of the object estimated with w=1 and with w=12 are as follows.
When it is estimated with w=1:
(θ+e)*fr*60/(2π) = θ*fr*60/(2π) + e*fr*60/(2π)
When it is estimated with w=12:
{(θ*12+e)/12}*fr*60/(2π) = θ*fr*60/(2π) + (e/12)*fr*60/(2π)
As described above, the error per unit time in the case of w=12, (e/12)*fr*60/(2π), is reduced to 1/12 of the error per unit time in the case of w=1, e*fr*60/(2π). In general, the error per unit time in the case of w=c (where c is an integer satisfying a≤c≤b, with a<b and c≠0) is expected to be about 1/|c| of the error per unit time in the case of w=1. Therefore, as the absolute value of w becomes larger, the estimation accuracy of the spin state is expected to improve.
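The 1/|w| scaling of the error term can be checked numerically with a small Python sketch; the helper name is illustrative:

```python
import math

def rpm_error_per_unit(e, w, fr):
    """Error term of the estimated rpm: (e/w) * fr * 60 / (2π).

    With a fixed per-estimation error e, the contribution to the
    per-unit-time estimate shrinks in proportion to 1/|w|.
    """
    return (e / w) * fr * 60 / (2 * math.pi)
```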
For this reason, the search range a≤w≤b may be limited so that the absolute value of w becomes large. For example, the absolute value of w may be limited to 2 or more. That is, with the absolute value of w being an integer of 2 or more and u being the unit time, the spin state estimation unit 13 may use the target estimation image, which is the image of the object at the time point t+w·u obtained by spinning the object in the object image, which is the image of the object at the time point t obtained from the time-series input video of the plurality of frames, by w unit times on the basis of the hypothesis of spin state, and the object image at the time point t+w·u obtained from the input video, to estimate the spin state of the object by selecting, from among the plurality of hypotheses of spin states and a plurality of values of w whose absolute values are 2 or more, a hypothesis of spin state and a w for which the likelihood of the target estimation image becomes high. Thus, the accuracy of estimating the spin state can be improved as compared with the case where w=1 may be selected as in the first embodiment.
As exemplified in
The spin state estimation processing is the same as that of the first embodiment or the modification example 1 of the first embodiment. The information corresponding to the provisional spin amount is, for example, an angle representing the spin amount of the object, the number of spins per predetermined time (for example, per one minute, per one second, or per one frame interval), and the like. The prior processing (step S130) is exemplified below.
The search range a≤w≤b of w may be determined in any manner as long as it is based on the information corresponding to the provisional spin amount. For example, the number of frames required for one spin of the object, obtained from the information corresponding to the provisional spin amount, is defined as Wtmp, and a range including Wtmp may be defined as the search range a≤w≤b. For example, the spin state estimation unit 13 may define the spin angle of the object in one frame interval, obtained from the information corresponding to the provisional spin amount, as θr [radian], obtain Wtmp=2π/θr, and determine the search range a≤w≤b as any of the following. Here, N is a positive integer, d1 and d2 are positive real numbers satisfying d1<d2 (for example, d1=0.7 and d2=1.3), and V is an integer of 2 or more.
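One of the variants, a band [d1·Wtmp, d2·Wtmp] around Wtmp, can be sketched in Python as follows, assuming d1=0.7 and d2=1.3 as in the example above. The rounding of the bounds to integers is an added assumption of this sketch, since w must be an integer:

```python
import math

def search_range(theta_r, d1=0.7, d2=1.3):
    """Search range a <= w <= b from a provisional spin amount.

    theta_r: provisional spin angle per frame interval [radian]
    Returns integer bounds of the band [d1*W_tmp, d2*W_tmp], where
    W_tmp = 2*pi/theta_r is the number of frames per full spin.
    """
    w_tmp = 2 * math.pi / theta_r
    return math.floor(d1 * w_tmp), math.ceil(d2 * w_tmp)
```

For example, a provisional spin of π/6 radian per frame gives Wtmp=12 and the range 8≤w≤16.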
The technique of the first embodiment and its modification examples 1 and 2 uses the target estimation image Et+w·u at the time point t+w·u, obtained by spinning the object in the object image Ot at the time point t by w unit times on the basis of the hypothesis of spin state, and the object image Ot+w·u at the time point t+w·u, to estimate the spin state of the object by selecting the hypothesis of spin state and w for which the likelihood of the target estimation image Et+w·u becomes high. That is, the spin state of the object is estimated on the basis of two images: the object image Ot at a certain time point t and the object image Ot+w·u at the time point t+w·u. In contrast, in the present processing, the spin state of the object is estimated on the basis of 2K images: the object images Ot1, Ot2, . . . , OtK at a plurality of time points t1, t2, . . . , tK and the object images Ot1+w·u, Ot2+w·u, . . . , OtK+w·u at the plurality of time points t1+w·u, t2+w·u, . . . , tK+w·u. Here, K is an integer of 2 or more. For example, t1≠t2≠ . . . ≠tK and tk+1=tk+u hold. Note that, although the subscript should originally be written as “tα” (α=1, . . . , K), it may be written as “t α” due to notational restrictions.
That is, the spin state estimation unit 13 may use the target estimation images Et1+w·u, Et2+w·u, . . . , EtK+w·u, which are the images of the object at the time points t1+w·u, t2+w·u, . . . , tK+w·u obtained by spinning the object in the object images Ot1, Ot2, . . . , OtK at the time points t1, t2, . . . , tK obtained from the input video by w unit times on the basis of the hypothesis of spin state of the object, and the object images Ot1+w·u, Ot2+w·u, . . . , OtK+w·u at the time points t1+w·u, t2+w·u, . . . , tK+w·u obtained from the input video, where K is an integer of 2 or more, to estimate the spin state of the object by selecting, from among the plurality of hypotheses of spin states and the plurality of values of w, a hypothesis of spin state and a w for which the likelihoods of the target estimation images Et1+w·u, Et2+w·u, . . . , EtK+w·u become high.
In other words, the spin state estimation unit 13 may use the target estimation images Et1+w·u, Et2+w·u, . . . , EtK+w·u and the object images Ot1+w·u, Ot2+w·u, . . . , OtK+w·u, to estimate the spin state of the object by selecting, from among the plurality of hypotheses of spin states and the plurality of values of w, a hypothesis of spin state and a w for which the target estimation images Et1+w·u, Et2+w·u, . . . , EtK+w·u and the object images Ot1+w·u, Ot2+w·u, . . . , OtK+w·u are close to each other.
In this way, by estimating the spin state of the object on the basis of the 2K images of the object images Ot1, Ot2, . . . , OtK and the object images Ot1+w·u, Ot2+w·u, . . . , OtK+w·u, the influence of variation in the features of the object appearing in the images is reduced, and the estimation accuracy can be improved compared with estimating the spin state of the object on the basis of two images.
As exemplified in
<Object Image Generation Unit 11 (Step S11)>
The processing of step S11 by the object image generation unit 11 is the same as that of the first embodiment. However, the object image generated in the step S11 is output to the spin state estimation unit 23.
<Spin State Estimation Unit 23 (Step S23)>
The object image generated by the object image generation unit 11 is input to the spin state estimation unit 23. As mentioned above, the spin state estimation unit 23 uses the target estimation images Et1+w·u, Et2+w·u, . . . , EtK+w·u, which are the images of the object at the time points t1+w·u, t2+w·u, . . . , tK+w·u obtained by spinning the object in the object images Ot1, Ot2, . . . , OtK at the time points t1, t2, . . . , tK obtained from the input video by w unit times on the basis of the hypothesis of spin state, and the object images Ot1+w·u, Ot2+w·u, . . . , OtK+w·u at the time points t1+w·u, t2+w·u, . . . , tK+w·u obtained from the input video, to estimate the spin state of the object by selecting, from among the plurality of hypotheses of spin states and the plurality of values of w, a hypothesis of spin state and a w for which the likelihoods of the target estimation images Et1+w·u, Et2+w·u, . . . , EtK+w·u become high, and outputs the estimation result.
Also in the present embodiment, the spin state of the object can be estimated regardless of the frame rate of the input video, as in the first embodiment. Furthermore, in the present embodiment, the spin state of the object is estimated on the basis of the 2K images of the object images Ot1, Ot2, . . . , OtK and the object images Ot1+w·u, Ot2+w·u, . . . , OtK+w·u, so that the estimation accuracy can be improved compared with estimating the spin state of the object on the basis of two images.
As in the modification example 1 of the first embodiment, the search range a≤w≤b may be limited in the second embodiment so that the absolute value of w becomes large. For example, the absolute value of w may be limited to 2 or more.
Similarly to the modification example 2 of the first embodiment, in the second embodiment, the spin state estimation unit 23 may estimate information corresponding to the provisional spin amount of the object in the prior processing (step S130), and, in the following spin state estimation processing (steps S231, S132 and S133), estimate the spin state of the object by selecting the hypothesis of spin state and w for which the likelihood of the target estimation image becomes high from among the plurality of hypotheses of spin states and the search range a≤w≤b of w based on the information corresponding to the provisional spin amount.
However, in such a case, when processing for extracting features of the object such as edge extraction is performed, features of a boundary part between the part directly irradiated with light and the part becoming a shadow are extracted, and the spin state of the object may not be correctly estimated. For example, when edge extraction of the object image Ot exemplified in
In this regard, although a method of removing the shadow from the object image by a known image processing technique is conceivable, it is difficult to appropriately determine whether or not a shadow is present in an object image obtained from input videos photographed in various environments that differ in the position of the sun, the weather, and the like, and in many cases the shadow cannot be sufficiently removed. A method of not using the pixels in a predetermined fixed region of the object image for estimating the spin state is also conceivable. However, the positions of the shadow and boundary parts differ depending on the environment, such as the position of the sun, and an appropriate fixed region cannot be set for object images obtained from input videos photographed in various environments.
Therefore, in the present embodiment, the spin state estimation device estimates the spin state of the object by using an object image obtained by excluding at least a part of a region common to a plurality of frames from an image corresponding to the object obtained from the input video of the plurality of frames in time series. The image corresponding to the object may be an image of the object itself or an image obtained by extracting features of the object. An example of an image obtained by extracting features of the object is an image obtained by performing edge extraction on the object. An example of such an image is an image in which the pixel value of the edge region of the object is pix 1 (for example, a pixel value representing black) and the pixel value outside the edge region is pix 2 (for example, a pixel value representing white). Preferably, the spin state of the object is estimated by using the object image obtained by excluding at least a part of the region common to the plurality of frames from the image obtained by extracting the features of the object from the input video. Although the positions of the shadow and the boundary part differ depending on the photographing environment as described above, the positions of the shadow and the boundary part in the images of the object obtained from the same input video hardly change, as exemplified in
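An edge-feature image of the form described above (edge pixels set to pix 1, all other pixels to pix 2) can be sketched as follows. This is only an illustrative sketch assuming grayscale NumPy arrays; a simple gradient-magnitude detector and the threshold value stand in for whatever edge extraction processing the device actually uses.

```python
import numpy as np

def edge_feature_image(obj, threshold=30.0, pix1=0, pix2=255):
    """Return an image whose edge-region pixel value is pix1 (e.g. black)
    and whose other pixel values are pix2 (e.g. white).

    The gradient-magnitude detector below is an assumed stand-in for
    any edge extraction technique (Sobel, Canny, etc.)."""
    obj = obj.astype(np.float64)
    gy, gx = np.gradient(obj)          # finite-difference gradients per axis
    magnitude = np.hypot(gx, gy)       # gradient magnitude per pixel
    out = np.full(obj.shape, pix2, dtype=np.uint8)
    out[magnitude > threshold] = pix1  # strong-gradient pixels form the edge region
    return out
```

Because pix 1 and pix 2 are parameters, the same sketch covers other value conventions (for example, 0/1 instead of black/white).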
As exemplified in
<Object Image Generation Unit 11 (Step S11)>
The processing of step S11 by the object image generation unit 11 is the same as that of the first embodiment. However, the object image generated in step S11 is output to the shadow region exclusion unit 32.
<Shadow Region Exclusion Unit 32 (Step S32)>
The object image output from the object image generation unit 11 is input to the shadow region exclusion unit 32. As described above, the object image may be an image segmented from the input video or an image obtained by extracting features of the object. The shadow region exclusion unit 32 obtains and outputs an object image obtained by excluding at least a part of a region common to a plurality of frames from the input object image (the image corresponding to the object obtained from the input video of the plurality of frames in time series).
As exemplified in
Next, the shadow region exclusion unit 32 generates a mask m for excluding information on a region including at least a part of the region common to the extracted plurality of frames (referred to as "the removal region" below) (step S322). The shadow region exclusion unit 32, for example, generates, as the mask m, an image in which the pixel value of the removal region is pix 3 (for example, pix 3=0) and the pixel value of the region other than the removal region is pix 4 (for example, pix 4=1). Alternatively, pix 3 and pix 4 may be set so that the change of the pixel value with respect to the change of the coordinates becomes continuous (smooth) in the vicinity of the boundary between the removal region and the region other than the removal region. The mask m corresponding to the object image Ot described in
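Generation of such a mask m can be sketched as follows. The sketch assumes binary edge-feature images in which edge pixels have the value 0 (pix 1), and it treats a pixel that is an edge pixel in every frame as belonging to the region common to the plurality of frames; this commonality test is an assumption made here for illustration, relying on the observation that the shadow and boundary part stay at almost the same position across frames of the same input video.

```python
import numpy as np

def build_mask(frames, pix3=0.0, pix4=1.0):
    """Sketch of step S322: build the mask m from time-series object images.

    A pixel whose value is 0 (an edge pixel) in every frame is taken as
    part of the region common to the plurality of frames, and the mask
    value there is pix3; everywhere else the mask value is pix4."""
    stack = np.stack(frames)             # shape: (number of frames, height, width)
    common = np.all(stack == 0, axis=0)  # True where every frame has an edge pixel
    return np.where(common, pix3, pix4)  # pix3 inside removal region, pix4 outside
```

The smooth-boundary variant described above could be obtained by, for example, blurring this binary mask near the boundary, which is omitted here.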
Next, the shadow region exclusion unit 32 applies the mask m obtained in step S322 to the object image input in step S321, and obtains and outputs an object image obtained by excluding the removal region from the input object image (step S323). For example, the shadow region exclusion unit 32 obtains and outputs an object image having, as the pixel value of each coordinate (x, y), the value obtained by multiplying the pixel value at each coordinate (x, y) of the mask m by the pixel value at each coordinate (x, y) of the object image input in step S321 (for example, the image obtained by extracting features of the object).
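The pixel-wise multiplication of step S323 can be sketched as a single element-wise product, assuming the object image and the mask m are NumPy arrays of the same shape:

```python
import numpy as np

def apply_mask(obj_image, mask):
    """Sketch of step S323: the pixel value at each coordinate (x, y)
    of the output is mask(x, y) * object image(x, y), so pixels in the
    removal region (mask value pix 3 = 0) are excluded from the output."""
    return mask * obj_image

# Toy usage: the top-left pixel is in the removal region and is zeroed out.
img = np.array([[10.0, 20.0], [30.0, 40.0]])
m = np.array([[0.0, 1.0], [1.0, 1.0]])
out = apply_mask(img, m)
```

With a smooth mask (pixel values between pix 3 and pix 4 near the boundary), the same multiplication attenuates pixels gradually instead of cutting them off.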
The object image from which the removal region has been excluded, generated by the shadow region exclusion unit 32, is output to the spin state estimation unit 13 (or the spin state estimation unit 23).
<Spin State Estimation Unit 13 or 23 (Step S13 or S23)>
This processing is the same as that of the first embodiment, the second embodiment, or the modification example thereof, except that the object image output from the shadow region exclusion unit 32 is used.
<Features of Present Embodiment>
Also in this embodiment, the spin state of the object can be estimated regardless of the frame rate of the input video, as in the first embodiment. Further, in the present embodiment, the spin state of the object is estimated by using an object image obtained by excluding at least a part of the region common to the plurality of frames from the image corresponding to the object obtained from the input video of the plurality of frames in time series. Therefore, the influence of the shadow appearing on the object and of the boundary region can be reduced, and the estimation accuracy of the spin state of the object can be improved.
Note that, even when no shadow appears on the object, the accuracy of estimating the spin state of the object is hardly reduced.
Also, even if the object spins, the image of the shaft center part of the spin shaft of the object does not largely change. For this reason, the shaft center part may be included in the removal region. However, even in such a case, since the region of the shaft center part is small, the estimation accuracy of the spin state of the object is hardly reduced.
[Hardware Configuration]
The spin state estimation devices 1, 2, and 3 according to the embodiments are each a device configured by a general-purpose or dedicated computer executing a predetermined program, the computer including a processor (a hardware processor) such as a CPU (central processing unit) and memory such as a RAM (random-access memory) and a ROM (read-only memory). The computer may include one processor and one memory, or may include a plurality of processors and memories. The program may be installed in the computer or may be recorded in the ROM or the like in advance. In addition, a part or all of the processing units may be constituted by an electronic circuit that realizes the processing functions by itself, instead of an electronic circuit (circuitry), such as a CPU, that realizes a functional configuration by reading a program. Further, an electronic circuit constituting one device may include a plurality of CPUs.
The above-mentioned program can be recorded on a computer-readable recording medium. An example of the computer-readable recording medium is a non-transitory recording medium. Examples of such a recording medium are a magnetic recording device, an optical disk, a magneto-optical recording medium, a semiconductor memory, and the like.
The program is distributed by, for example, selling, transferring, or lending a portable recording medium, such as a DVD or a CD-ROM, on which the program is recorded. Further, the program may be stored in a storage device of a server computer and transferred from the server computer to another computer via a network. A computer that executes such a program, for example, first stores the program recorded on the portable recording medium or transferred from the server computer temporarily in its own storage device. When executing the processing, the computer reads the program stored in its own storage device and executes processing in accordance with the read program. As another execution form of the program, the computer may directly read the program from the portable recording medium and execute processing in accordance with the program. Further, each time a program is transferred from the server computer to the computer, processing in accordance with the received program may be executed sequentially. The above-described processing may also be executed by a so-called ASP (Application Service Provider) type service that realizes the processing functions only by an execution instruction and result acquisition, without transferring the program from the server computer to the computer. It is assumed that the program in the embodiments includes information that is used for processing by the computer and is equivalent to a program (for example, data that is not a direct command to the computer but has the property of defining the processing of the computer).
Although each device is configured by executing a predetermined program on a computer in the embodiments, at least a part of the processing contents may be realized by hardware.
The present invention is not limited to the above-described embodiments. For example, the various kinds of processing described above may be executed not only in time series in accordance with the description but also in parallel or individually according to the processing capacity of the device that executes the processing, or as required. In addition, it goes without saying that changes can be made as appropriate without departing from the spirit of the present invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2021/003024 | 1/28/2021 | WO |