This specification relates to object detection in perception systems.
Object detection systems are designed into machines such as autonomous vehicles and autonomous security systems. For example, autonomous automotive vehicles include control systems responsive to object detection, with a perception system for identifying and localizing encountered objects and a maneuvering system that uses the information from the perception system to enable the vehicle to drive safely. Autonomous aircraft include control systems responsive to vision-based object classification for above-wing and below-wing autonomy. Autonomous airport security systems include control systems responsive to image-based perception systems for autonomous security.
Many object detection systems are based on Deep Neural Network (DNN) architectures that can exhibit errors due to network bias or small perturbations in the perceived data received from a sensor that scans for objects in a target environment. A significant portion of DNN errors result from misclassification of detected objects. These classification errors can lead to false detections, which introduce uncertainty and errors in, for example, the safe maneuvering of an autonomous vehicle.
Like reference symbols in the various figures indicate like elements.
Sensor 102 provides perception data 128 that captures images 130 of a detected object 132, such as detected pedestrian object 1321 or detected cyclist object 1322, in the environment surrounding or in the proximate vicinity of vehicle 100 during a sequence of time intervals t1 to tF, where the subscript F represents the total number of frames that include detected object 132 in captured images 130. For example, sensor 102, such as a camera sensor, may generate a video signal for providing perception data 128 having a sequence of frames representing captured images 130 of detected pedestrian object 1321 or detected cyclist object 1322 during the sequence of time intervals t1 to tF. The sequence of frames from captured images 130 is associated with a current frame ftc and prior frames within the sequence of time intervals t1 to tF.
Sensor 102 may utilize other sensor modalities, such as lasers, sonar, radar, and light detection and ranging (LiDAR) sensors, that scan and record data from objects surrounding autonomous vehicle 100 to provide perception data 128. In one embodiment, the timing for the sequence of frames representing captured images 130 may be a predetermined time interval between frames, such as every millisecond or every second, or may be a number of frames in a predetermined time interval, such as 10 frames per second.
Autonomous vehicle control system 104 may include a memory 106 and an autonomous vehicle controller 108. In one embodiment, memory 106 may be integrated in autonomous vehicle controller 108. Memory 106 may include a reference object class 110 that represents an object class associated with detected object 132, such as detected pedestrian object 1321 or detected cyclist object 1322. Reference object class 110 has associated reference parameters, which include reference component-descriptors 112, a reference observation time constraint 114, and a reference temporal similarity measure boundary 116. Reference object class 110 with its associated reference parameters may be determined from neural network or machine learning model training such as the training illustrated in
Autonomous vehicle controller 108 may include an object detector 120, a component-based similarity measure generator 122, an object classification verifier 124, and an autonomous decision-making system 126.
Object detector 120 is responsive to images 130 for identifying an object localization 136 of detected object 132, such as detected pedestrian object 1321 or detected cyclist object 1322, and generating an object classification 138 associated with object localization 136 at each frame in the sequence of time intervals t1 to tF. For example, object detector 120 generates object localization 136 with an associated object classification 138 for each detected object 132, such as detected pedestrian object 1321 or detected cyclist object 1322, that is identified and localized in a captured image of images 130 during each time frame in the sequence of time intervals t1 to tF. Object localization 136 may define a bounding box centered on detected pedestrian object 1321, and another bounding box centered on detected cyclist object 1322.
Component-based similarity measure generator 122 may be configured to generate a sequence of similarity measures 144 associated with the sequence of time intervals of t1 to tF. The component-based similarity measure generator 122 may be responsive to object classification 138 and object localization 136 at each frame in the sequence of time intervals t1 to tF for (i) generating object component-descriptors 140 and (ii) comparing object component-descriptors 140 with reference component-descriptors 112 to generate each similarity measure in the sequence of similarity measures 144. In one embodiment, component-based similarity measure generator 122 associates object classification 138 with reference object class 110 to determine component-descriptors 140 from object localization 136, and to generate a similarity measure (in the sequence of similarity measures 144) which compares object component-descriptors 140 with reference component-descriptors 112 at each frame in the sequence of time intervals t1 to tF. For example, each similarity measure in the sequence of similarity measures 144 may represent a difference or distance measure between object component-descriptors 140 and reference component-descriptors 112. Component-based similarity measure generator 122 may include a buffer for storing the sequence of similarity measures 144. Alternatively, object classification verifier 124 may include a buffer for storing sequence of similarity measures 144.
Object classification verifier 124 compares the sequence of similarity measures 144 generated within reference observation time constraint 114 to reference temporal similarity measure boundary 116 for generating object classification verification data 160 associated with object classification 138. Object classification verifier 124 may be configured to be responsive to object classification 138 for selecting the reference observation time constraint 114 and reference temporal similarity measure boundary 116 associated with the reference object class 110 having an object class that is the same as an object class associated with object classification 138. Object classification verification data 160 may represent a validity or error measure associated with object classification 138 from object detector 120.
Autonomous decision-making system 126, according to one embodiment, may be responsive to object classification verification data 160 for generating a decision-making command 162. Speed and control system 103 is responsive to decision-making command 162 to autonomously maneuver autonomous vehicle 100. For example, decision-making command 162 may include steering and speed controls for safe maneuvering in response to object classification verification data 160.
Reference object class 110m refers to the mth reference object class 110m in the set of reference object classes 1101 to 110M, for m=1 to M, where M is the total number of reference object classes. The mth set of reference component-descriptors 112m.1 to 112m.Nm includes Nm reference component-descriptors, where the number Nm of reference component-descriptors may depend on characteristics of an object class, such as a pedestrian or cyclist, associated with reference object class 110m.
In one embodiment, reference object class 110m may be selected in response to object classification 138 generated by object detector 120 of
Reference component-descriptors 112m.1 to 112m.Nm may include (i) a histogram of Nm reference component-descriptors 112m.1 to 112m.Nm that correspond to Nm reference component cluster centroids in a reference embedding space associated with reference object class 110m and (ii) a reference embedding space mapping protocol 112embedding_protocol_m associated with generating the histogram of Nm reference component-descriptors 112m.1 to 112m.Nm. Also, reference embedding space mapping protocol 112embedding_protocol_m may be used for generating a set of object component-descriptors from the object localization 136 associated with the object classification 138 in
Each reference embedding space mapping protocol 112embedding_protocol_m may include, according to one embodiment, neural network architecture design hyperparameters and associated weights that are determined during training of reference component descriptors 112m.1 to 112m.Nm for an object class associated with reference object class 110m. For example, the neural network architecture design hyperparameters and associated weights may be used to configure neural network architecture in component-based similarity measure generator 122 of
Reference observation time constraint 114m(tstart_m, tend_m) includes an observation start time tstart_m and an observation end time tend_m which may define the current frame and prior frames within the sequence of time intervals t1 to tF for an observation of sequence of similarity measures 144 associated with object localization 136 and object classification 138 in
Reference temporal similarity measure boundary 116m represents performance characteristics from reference similarity measure sequences within Qm frames during reference observation time constraint 114m(tstart_m, tend_m) associated with reference object class 110m. For example, the reference similarity measure sequences are learned during a training process such as illustrated in
The similarity measure may be an earth mover distance (EMD) measure. For example, an EMD measure represents an amount of work needed to transform one distribution into another distribution when measuring distance in an embedded space between components that belong to the same object type. In alternate embodiments, other distribution-based distances (such as Wasserstein distance, or any other similarity measures such as L1 norm and L2 norm distances) may also be used where the distance between components that belong to the same object type should be a low distance to provide a measure of uncertainty or certainty when verifying object classification from an object detector.
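As an illustration of this distance computation, the following is a minimal sketch assuming normalized component histograms on a common bin grid; the function name and the example values are illustrative assumptions, not the patented implementation:

```python
# Minimal sketch: comparing an object's component-descriptor histogram X
# against a reference component-descriptor histogram Y using 1-D EMD.
import numpy as np
from scipy.stats import wasserstein_distance

def emd_1d(x_hist: np.ndarray, y_hist: np.ndarray) -> float:
    """1-D earth mover distance between two component histograms."""
    x = x_hist / x_hist.sum()
    y = y_hist / y_hist.sum()
    # For 1-D histograms on a common grid, EMD reduces to the L1 distance
    # between the cumulative distributions.
    return float(np.abs(np.cumsum(x - y)).sum())

# Example: N = 5 component bins (e.g., head/torso/arms/legs/feet for a pedestrian).
x = np.array([3.0, 5.0, 2.0, 4.0, 1.0])  # object component-descriptors at frame t_c
y = np.array([2.0, 6.0, 2.0, 4.0, 1.0])  # reference component-descriptors (centroids)
print(emd_1d(x, y))
# Equivalent check with scipy on the same bin grid:
print(wasserstein_distance(np.arange(5), np.arange(5), x, y))
```

A low distance indicates the detected object's components match the reference class; a high distance flags a possible misclassification.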
Referring to the embodiment of
Classification confidence value 302 represents a confidence measure of object classification 138 associated with object localization 136. Classification confidence threshold 304 may be set at a threshold value for determining whether to provide object classification 138 with its associated object localization 136 to component-based similarity measure generator 122, and to provide object classification 138 to object classification verifier 124. That is, when classification confidence value 302 is below the threshold value of classification confidence threshold 304, object classification 138 with its associated object localization 136 is not provided to component-based similarity measure generator 122. Conversely, when classification confidence value 302 is at or above the threshold value of classification confidence threshold 304, object classification 138 with its associated object localization 136 is provided to component-based similarity measure generator 122 and object classification 138 is provided to object classification verifier 124.
For example, object detector 120 may be configured with classification confidence threshold 304 set at 50% and may receive perception data 128(t) having a captured image of detected pedestrian object 1321 and detected cyclist object 1322 at a current frame ftc in the sequence of time intervals t1 to tF:
A. Detected Pedestrian Object 1321. For detected pedestrian object 1321, object detector 120 identifies and localizes detected pedestrian object 1321 in object localization 1361(tc) with (i) an associated object classification 1381(tc) representing a PEDESTRIAN label having an 85% confidence value at classification confidence value 302, which satisfies the 50% threshold value at classification confidence threshold 304, and (ii) another object classification 1381(tc) representing a DOG label having a 15% confidence value at classification confidence value 302, which does not satisfy the 50% threshold at classification confidence threshold 304. Accordingly, object classification 1381(tc) having the PEDESTRIAN label together with its associated object localization 1361(tc) that identified and localized detected pedestrian object 1321 are provided to component-based similarity measure generator 122 of
B. Detected Cyclist Object 1322. For detected cyclist object 1322, object detector 120 identifies and localizes detected cyclist object 1322 in object localization 1362(tc) with (i) an associated object classification 1382(tc) representing a CYCLIST label having a 45% confidence value at classification confidence value 302, which does not satisfy the 50% threshold value at classification confidence threshold 304, and (ii) another object classification 1382(tc) representing a CAR label having a 55% confidence value at classification confidence value 302, which does satisfy the 50% threshold at classification confidence threshold 304. Accordingly, object classification 1382(tc) having the CAR label together with its associated object localization 1362(tc) that identified and localized detected cyclist object 1322 are provided to component-based similarity measure generator 122 of
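A minimal sketch of this confidence gate, reusing the worked examples above; the tuple layout and function name are illustrative assumptions:

```python
# Hedged sketch of the confidence gate described above.
CONFIDENCE_THRESHOLD = 0.50  # classification confidence threshold 304

def gate(detections):
    """Yield only classifications at or above the threshold for downstream verification."""
    for label, confidence, bbox in detections:
        if confidence >= CONFIDENCE_THRESHOLD:
            yield label, confidence, bbox  # forwarded to generator 122 / verifier 124

# Worked examples A and B from the text:
frame_tc = [("PEDESTRIAN", 0.85, "bbox_1"), ("DOG", 0.15, "bbox_1"),
            ("CAR", 0.55, "bbox_2"), ("CYCLIST", 0.45, "bbox_2")]
print(list(gate(frame_tc)))  # PEDESTRIAN (85%) and CAR (55%) pass; DOG and CYCLIST do not
```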
Object-component descriptor detector 140 is responsive to object classification 1381(tc) for selecting a reference embedding space mapping protocol 112embedding_protocol_1, shown as Yembedding_protocol_1, from memory 106 of
Reference embedding space mapping protocol 112embedding_protocol_1 is associated with the histogram of N1 reference component-descriptors 1121.1 to 1121.N1.
Component-descriptor comparator 142 generates a similarity measure 1441(tc), also shown as similarity measure SM1(tc), from a cumulative difference function between object component-descriptors X1.1(tc) to X1.N1(tc) and the reference component-descriptors 1121.1 to 1121.N1, shown as Y1.1 to Y1.N1.
In one embodiment, component-based similarity measure generator 122 may include, in similarity measures 144, a sequence of similarity measures 1441(t) from component-descriptor comparator 142, which is provided to object classification verifier 124 of
Object classification verifier 124 may include similarity measure comparator 148 having a buffer 1501 and a temporal similarity measure verifier 152. Similarity measure comparator 148 receives (i) object classification 1381(tc) having the PEDESTRIAN label associated with detected pedestrian object 1321(tc) at a current frame ftc, and (ii) a sequence of similarity measures 1441(t) associated with a sequence of object classifications 1381(t), where the time t includes the current frame ftc and prior frames in the sequence of time intervals t1 to tF.
Reference observation time constraint 1141(tstart_1, tend_1) includes Q1 frames within observation start time tstart_1 to observation end time tend_1. The Q1 frames define the current frame ftc and the prior frames within the sequence of time intervals t1 to tF for the sequence of similarity measures 1441(t). Reference temporal similarity measure boundary 1161 is illustrated as temporal similarity measure boundary SMboundary_1. The sequence of similarity measures 1441(t)=SM1(t) includes similarity measure SM1(tc) at the current frame ftc and similarity measures SM1(tc−1), SM1(tc−2), . . . , SM1(tc−(Q1−1)), . . . , SM1(t1) at the prior frames within the sequence of time intervals t1 to tF.
Temporal similarity measure verifier 152 compares the sequence of similarity measures SM1(t) generated within reference observation time constraint 1141(tstart_1, tend_1) to a reference temporal similarity measure boundary SMboundary_1 for generating object classification verification data 1601(tc) associated with object classification 1381(tc) at the current frame ftc.
Object classification verification data 1601(tc) may represent a validation measurement for object classification 1381(tc) at the current frame ftc from a combined similarity measure and probabilistic signal temporal logic (PSTL) constraint. The combined similarity measure and PSTL constraint is based on (i) the sequence of similarity measures SM1(t) during the current frame ftc and the prior frames within reference observation time constraint 1141(tstart_1, tend_1); and (ii) the reference temporal similarity boundary SMboundary_1.
In one embodiment, the combined similarity measure SM1(t) and probabilistic signal temporal logic (PSTL) constraint is generated as follows:
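The equation itself does not survive in this text; a plausible PSTL-style axiom consistent with the surrounding description, with the minimum probability bound P_min as an assumed symbol, is:

```latex
% Hedged reconstruction (not the published equation): a PSTL axiom requiring the
% similarity-measure signal to remain within the learned boundary with high probability.
\forall t \in [t_{start\_1},\, t_{end\_1}] : \quad
\Pr\big( SM_1(t) \in SM_{boundary\_1} \big) \;\ge\; P_{min}
```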
Object classification verification data 1601(tc) represents a validation measurement that verifies the current object classification 1381(tc) having a PEDESTRIAN label identified from the object localization 1361(tc) associated with the detected pedestrian object 1321. This verification reflects that the sequence of similarity measures 1441(t)=SM1(t) during the observation time constraint 1141(tstart_1, tend_1) is within the reference temporal similarity measure boundary 1161. For example, the sequence of similarity measures 1441(t)=SM1(t) during the observation time constraint 1141(tstart_1, tend_1) is modeled by various instances of a reference similarity measure sequence within temporal similarity measure boundary 1161.
In one embodiment, reference object class 1101 has an associated reference similarity measure threshold 1171=SMTH_1 stored in memory 106 of FIG. 2A. Similarity measure comparator 148 may include a static similarity threshold verifier 154 and a verification data selector 156. Reference similarity measure threshold 1171=SMTH_1 is selected in response to object classification 1381(tc) at the current frame. Static similarity threshold verifier 154 compares similarity measure SM1(tc) to reference similarity measure threshold 1171=SMTH_1 for generating object classification verification data 1601(tc) associated with object classification 1381(tc). Verification data selector 156 selects the output from static similarity threshold verifier 154, which utilizes reference similarity measure threshold 1171=SMTH_1, until the sequence of similarity measures SM1(t) is at least equal in length to the Q1 frames within reference observation time constraint 1141(tstart_1, tend_1). Verification data selector 156 selects the output from temporal similarity measure verifier 152 when the sequence of similarity measures SM1(t) is at least equal in length to the Q1 frames in reference observation time constraint 1141(tstart_1, tend_1).
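The switching between the static threshold verifier and the temporal verifier can be sketched as follows; this is a hedged illustration in which the class and variable names are assumptions and the in-boundary test is simplified to a scalar upper bound:

```python
# Illustrative sketch of verifier 124's switching logic.
from collections import deque

class ObjectClassificationVerifier:
    def __init__(self, q_frames: int, sm_threshold: float, sm_boundary: float):
        self.buffer = deque(maxlen=q_frames)  # buffer 150: last Q similarity measures
        self.sm_threshold = sm_threshold      # reference similarity measure threshold 117
        self.sm_boundary = sm_boundary        # reference temporal similarity measure boundary 116

    def verify(self, sm: float) -> bool:
        self.buffer.append(sm)
        if len(self.buffer) < self.buffer.maxlen:
            # Static fallback (verifier 154) until Q frames have been observed.
            return sm <= self.sm_threshold
        # Temporal check (verifier 152): the whole buffered sequence must stay in-boundary.
        return all(s <= self.sm_boundary for s in self.buffer)

verifier = ObjectClassificationVerifier(q_frames=4, sm_threshold=0.3, sm_boundary=0.25)
for sm in [0.12, 0.18, 0.15, 0.20, 0.22]:
    print(verifier.verify(sm))  # static check for the first 3 frames, temporal thereafter
```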
Object-component descriptor detector 140 is responsive to object classification 1382(tc) for selecting a reference embedding space mapping protocol 112embedding_protocol_2, shown as Yembedding_protocol_2, from memory 106 of
Reference embedding space mapping protocol 112embedding_protocol_2 is associated with the histogram of N2 reference component-descriptors 1122.1 to 1122.N2.
Component-descriptor comparator 142 generates a similarity measure 1442(tc), also shown as similarity measure SM2(tc), from a cumulative difference function between object component-descriptors X2.1(tc) to X2.N2(tc) and the reference component-descriptors 1122.1 to 1122.N2, shown as Y2.1 to Y2.N2.
In one embodiment, component-based similarity measure generator 122 may include, in similarity measures 144, a sequence of similarity measures 1442(t) from component-descriptor comparator 142, which is provided to object classification verifier 124 of
Object classification verifier 124 may include similarity measure comparator 148 having a buffer 1502 and a temporal similarity measure verifier 152. Similarity measure comparator 148 receives (i) object classification 1382(tc) having the CAR label associated with detected cyclist object 1322(tc) at a current frame ftc, and (ii) a sequence of similarity measures 1442(t) associated with a sequence of object classifications 1382(t), where the time t includes the current frame ftc and prior frames in the sequence of time intervals t1 to tF.
Reference observation time constraint 1142(tstart_2, tend_2) includes Q2 frames within observation start time tstart_2 to observation end time tend_2. The Q2 frames define the current frame ftc and the prior frames within the sequence of time intervals t1 to tF for the sequence of similarity measures 1442(t). Reference temporal similarity measure boundary 1162 is illustrated as temporal similarity measure boundary SMboundary_2. The sequence of similarity measures 1442(t)=SM2(t) includes similarity measure SM2(tc) at the current frame ftc and similarity measures SM2(tc−1), SM2(tc−2), . . . , SM2(tc−(Q2−1)), . . . , SM2(t1) at the prior frames within the sequence of time intervals t1 to tF.
Temporal similarity measure verifier 152 compares the sequence of similarity measures SM2(t) generated within reference observation time constraint 1142(tstart_2, tend_2) to a reference temporal similarity measure boundary SMboundary_2 for generating object classification verification data 1602(tc) associated with object classification 1382(tc) at the current frame ftc.
Object classification verification data 1602(tc) may represent a validation measurement for object classification 1382(tc) at the current frame ftc from a combined similarity measure and probabilistic signal temporal logic (PSTL) constraint. The combined similarity measure and PSTL constraint is based on (i) the sequence of similarity measures SM2(t) during the current frame ftc and the prior frames within reference observation time constraint 1142(tstart_2, tend_2); and (ii) the reference temporal similarity boundary SMboundary_2.
In one embodiment, the combined similarity measure SM2(t) and probabilistic signal temporal logic (PSTL) constraint is generated as follows:
Object classification verification data 1602(tc) represents a validation measurement that does not verify the current object classification 1382(tc) having a CAR label identified from the object localization 1362(tc) associated with the detected cyclist object 1322. This non-verification reflects that the sequence of similarity measures 1442(t)=SM2(t) during the observation time constraint 1142(tstart_2, tend_2) is not within the reference temporal similarity measure boundary 1162. For example, the sequence of similarity measures 1442(t)=SM2(t) during the observation time constraint 1142(tstart_2, tend_2) does not satisfy model characteristics of any instance of a reference similarity measure sequence within temporal similarity measure boundary 1162.
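As with the pedestrian example, the equation is not reproduced in this text; under the same assumed PSTL form, the axiom for the CAR-labeled track would be the following, and here it fails because SM2(t) exits the learned boundary:

```latex
% Same assumed PSTL form applied to the CAR-labeled track; the axiom is violated
% when SM_2(t) leaves the learned boundary during the observation window.
\forall t \in [t_{start\_2},\, t_{end\_2}] : \quad
\Pr\big( SM_2(t) \in SM_{boundary\_2} \big) \;\ge\; P_{min}
```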
In another embodiment, reference object class 1102 has an associated reference similarity measure threshold 1172=SMTH_2 stored in memory 106 of
Step 408 generates a sequence of similarity measures associated with the sequence of frames. The step of generating the sequence of similarity measures includes being responsive to the object classification and the object localization at each frame for (i) generating object component-descriptors and (ii) comparing the object component-descriptors with the reference component-descriptors to generate each similarity measure in the sequence of similarity measures. Step 410 compares the sequence of similarity measures generated within the reference observation time constraint to the reference temporal similarity measure boundary for generating object classification verification data associated with the object classification. Step 412 generates a decision-making command in response to the object classification verification data. Step 414 controls the speed and control system in response to the decision-making command for autonomously maneuvering the autonomous vehicle.
Step 508 generates a sequence of similarity measures associated with the sequence of frames. The step of generating the sequence of similarity measures is responsive to the object classification and the object localization at each frame for (i) generating object component-descriptors and (ii) comparing the object component-descriptors with the reference component-descriptors to generate each similarity measure in the sequence of similarity measures. Step 510 compares the sequence of similarity measures generated within the reference observation time constraint to the reference temporal similarity measure boundary for generating object classification verification data associated with the object classification. Step 512 generates a decision-making command in response to the object classification verification data. Step 514 controls the autonomous system in response to the decision-making command.
The perception system may be embedded in an autonomous vehicle that includes (i) a sensor and (ii) a speed and steering control system, and the step of controlling the perception system includes controlling the speed and steering control system in response to the decision-making command for autonomously maneuvering the autonomous vehicle. Alternatively, the perception system may be embedded in an autonomous aviation security system that includes a surveillance system, and the step of controlling the perception system includes controlling the surveillance system in response to the decision-making command for autonomously controlling the aviation security system.
Referring to
The reference component-descriptors may include (i) a histogram of reference component-descriptors that represent reference component cluster centroids in an embedded space associated with the reference object class and (ii) an embedding space mapping protocol associated with generating the histogram of reference component-descriptors; and the object component-descriptors may include a histogram of object component-descriptors that represent object component locations in the embedded space associated with the reference object class. In one embodiment, the histogram of reference component-descriptors may include a histogram of N reference component-descriptors Y1 through YN that represent N reference component cluster centroids in the embedded space associated with the reference object class; the histogram of object component-descriptors may include a histogram of N object component-descriptors X1 through XN that represent N object component locations in the embedded space associated with the reference object class; and the embedding space mapping protocol generates the histogram of N object component-descriptors X1 through XN from the object localization at each frame.
The memory may include a set of reference object classes, each reference object class in the set of reference object classes having an associated set of reference component-descriptors, a reference observation time constraint, and a reference temporal similarity measure boundary. A reference object class is selected from the set of reference object classes in response to an object classification label associated with the generated object classification.
The method steps may include generating a classification confidence value associated with the object classification; and generating the sequence of similarity measures when the classification confidence value satisfies a classification confidence threshold. The reference embedding space mapping protocol may be selected in response to the object classification to generate the histogram of object component-descriptors from the object localization at each frame.
The similarity measure may be generated as a cumulative difference between the object component-descriptors and the reference component-descriptors. The similarity measure may be generated as a cumulative difference function between the object component-descriptors X1 through XN and the reference component-descriptors Y1 through YN, and the cumulative difference function is:
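The function itself is not reproduced in this text; a reconstruction consistent with the earth mover distance discussion above (an assumption, since other cumulative difference forms are possible) is:

```latex
% Assumed reconstruction consistent with the EMD discussion: the 1-D earth mover
% distance between the object and reference component histograms.
SM \;=\; \sum_{n=1}^{N} \Big|\, \sum_{i=1}^{n} \big( X_i - Y_i \big) \Big|
```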
In one embodiment, the sequence of frames has a current frame and prior frames, and the reference observation time constraint includes an observation start time tstart and an observation end time tend. The object classification verification data represents a validation measurement for the object classification at the current frame, where the validation measurement is a comparison of (i) the sequence of similarity measures associated with the detected object at the current frame and the prior frames within the observation start time tstart and the observation end time tend and (ii) the reference temporal similarity measure boundary associated with the reference object class. The validation measurement for the object classification is a verified classification when the sequence of similarity measures associated with the detected object at the current frame and the prior frames during the reference observation time constraint is within the reference temporal similarity measure boundary associated with the reference object class.
For example, the validation measurement for the object classification at the current frame is determined from combined similarity measures and probabilistic signal temporal logic constraints based on (i) the sequence of similarity measures during the current frame and the prior frames within reference observation time constraint; and (ii) the reference temporal similarity boundary associated with the reference object class. The validation measurement represents a verified classification at the current frame when the sequence of similarity measures associated with the detected object during the reference observation time constraint is within the reference temporal similarity measure boundary. The validation measurement represents a misclassification at the current frame when the sequence of similarity measures associated with the detected object during the reference observation time constraint are not within the reference temporal similarity measure boundary. The combined similarity measures with probabilistic signal temporal logic constraints may be generated as follows:
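Again the formula is missing from this text; a hedged general form, writing the reference temporal similarity measure boundary as a set B_ref and P_min as an assumed probability bound, would be:

```latex
% Hedged general form of the combined similarity measure / PSTL constraint.
\forall t \in [t_{start},\, t_{end}] : \quad
\Pr\big( SM(t) \in \mathcal{B}_{ref} \big) \;\ge\; P_{min}
```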
In one embodiment, method 600 illustrates an example of developing the set of reference object classes 1101 to 110M for post-processing an object detector output in autonomous systems such as autonomous surface vehicles, autonomous aerial vehicles, and aviation security systems. Each reference object class 110m in the set of reference object classes 1101 to 110M may have trained parameters that include (i) a set of reference component-descriptors 112m.1 to 112m.Nm, (ii) a reference observation time constraint 114m(tstart_m, tend_m), and (iii) a reference temporal similarity measure boundary 116m associated with reference object class 110m. For example, the set of reference object classes 1101 to 110M together with their respective trained parameters may be stored in memory 106 of
Method 600 includes step 602 that provides a benchmark dataset having objects associated with reference object classes 1101 to 110M, step 604 that generates reference component-descriptors 112m.1 to 112m.Nm, step 606 that determines a similarity measure threshold 117m=SMTH_m, and step 608 that validates a reference observation time constraint 114m (tstart_m, tend_m) for a reference similarity measure boundary 116m:
I. Generating Reference Component-Descriptors. Step 604 performs the following steps to generate reference component-descriptors 112m.1 to 112m.Nm: step 604-1 selects a first set of images having an object associated with a selected reference object class 110m, which represents an object class such as a pedestrian or cyclist object class, from a benchmark dataset; step 604-2 maps component patches from detected objects in the first set of images to an embedding space with a selected number of N clusters having maximum inter-cluster variations and discriminative visual appearance; and step 604-3 generates a set of reference component-descriptors 112m.1 to 112m.Nm in response to the N clusters. The selected reference object class 110m refers to the mth reference object class 110m in the set of reference object classes 1101 to 110M, for m=1 to M. The mth set of reference component-descriptors 112m.1 to 112m.Nm includes Nm reference component-descriptors, where the number Nm of reference component-descriptors may depend on characteristics of an object class, such as a pedestrian or cyclist, associated with reference object class 110m.
II. Determine Similarity Measure Threshold. Step 606 performs the following steps to determine a similarity measure threshold 117m=SMTH_m: Step 606-1 selects a second set of images having an object associated with the selected reference object class 110m from the benchmark dataset. Step 606-2 maps component patches from detected objects to an embedding space defining a set of training component-descriptors TDm.1 to TDm.Nm for each detected object. Step 606-3 generates similarity measures between each set of training component-descriptors TDm.1 to TDm.Nm and the reference set of component-descriptors 112m.1 to 112m.Nm. Step 606-4 determines a reference similarity measure threshold 117m=SMTH_m for identifying similarity measures associated with expected true positive measures and similarity measures associated with expected false positive measures.
III. Validate Observation Time Constraint and Similarity Measure Boundary. Step 608 includes the following steps to determine validation accuracy for reference observation time constraint 114m (tstart_m, tend_m): Step 608-1 selects a set of video sequences with images having an object associated with the selected reference object class 110m from the benchmark dataset. Step 608-2 selects a reference observation time constraint 114m (tstart_m, tend_m). Step 608-3 maps component patches from detected objects to an embedding space that defines a set of validation component-descriptors VDm.1 to VDm.Nm for each detected object during the reference observation time constraint 114m (tstart_m, tend_m). Step 608-4 determines reference temporal similarity measure boundary 116m defining sequences of generated similarity measures representing expected true positive detections and expected false positive detections during the reference observation time constraint 114m (tstart_m, tend_m).
Step 608-5 compares the sequences of generated similarity measures to ground truth data for the selected reference object class 110m to determine validation accuracy for the reference observation time constraint 114m (tstart_m, tend_m). If the validation accuracy of reference observation time constraint 114m (tstart_m, tend_m) does not satisfy a validating accuracy threshold target, then perform step 608-6 to adjust reference observation time constraint 114m (tstart_m, tend_m) in step 608-2 and repeat steps 608-3 to 608-5. If the validation accuracy for reference observation time constraint 114m (tstart_m, tend_m) satisfies the validating accuracy threshold target, then perform step 608-7 to store the training parameters developed in steps 604, 606, and 608, in memory such as memory 106 of
Step 604-1 selects a first set of images having an object associated with the selected reference object class 110m from a benchmark dataset.
For each image in the first set of images, step 604-2 maps component patches from detected objects to an embedding space having N clusters by performing the following steps: step 630 detects the object having the selected reference object class 110m; step 634 extracts patches from the detected object; and step 640 maps the extracted patches to an embedding space 642 with a clustering criterion and a selected number of Nm clusters that maximizes inter-cluster variations and discriminative visual appearance of reference component descriptor clusters in embedding space 642, where the clustering criterion is defined by an embedding space protocol 112embedding_protocol_m associated with reference component descriptor clusters 644.
Step 604-3 generates a set of reference component-descriptors 112m.1 to 112m.Nm=Ym.1 to Ym.Nm in response to the N clusters having component cluster centroids for detected objects having the selected reference object class 110m. An example of embedded clustering to create a histogram of component attributes is disclosed in commonly assigned issued U.S. Pat. No. 11,023,798 entitled Machine-Vision Method to Classify Input Data Based on Object Components, issued on Jun. 1, 2021, which is hereby incorporated by reference in its entirety.
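A minimal sketch of steps 604-2 and 604-3, assuming k-means clustering over patch embeddings; the specification does not mandate a particular clustering algorithm, and the embedding network producing the patch embeddings is assumed to be already trained:

```python
# Hedged sketch: cluster patch embeddings so the N_m centroids act as
# reference component-descriptors Y_m.1 .. Y_m.N_m for one object class.
import numpy as np
from sklearn.cluster import KMeans

def reference_component_descriptors(patch_embeddings: np.ndarray, n_components: int):
    """Fit k-means on patch embeddings and return the cluster centroids."""
    km = KMeans(n_clusters=n_components, n_init=10, random_state=0)
    km.fit(patch_embeddings)
    return km.cluster_centers_  # shape: (n_components, embedding_dim)

# Example: 500 patch embeddings of dimension 64 from pedestrian training images.
rng = np.random.default_rng(0)
centroids = reference_component_descriptors(rng.normal(size=(500, 64)), n_components=8)
print(centroids.shape)  # (8, 64)
```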
For each image in the second set of images, steps 606-2 and 606-3 compile a distribution of similarity measures generated between training component descriptors and the reference component descriptors by performing the following steps: step 650 detects the object having the selected reference object class 110m; step 652 extracts patches from the detected object; step 654 utilizes the embedding space mapping protocol 112embedding_protocol_m to determine a set of training component descriptors TDm.1 to TDm.Nm (also shown as Xm.1 to Xm.Nm) from the extracted patches; step 656 generates a similarity measure between the training component descriptors TDm.1 to TDm.Nm=Xm.1 to Xm.Nm and the reference component descriptors 112m.1 to 112m.Nm=Ym.1 to Ym.Nm; and step 658 compiles a distribution of similarity measures representing a likelihood of possible similarity measures for the detected object.
Step 606-4 selects the reference similarity measure threshold 117m=SMTH_m for defining the similarity measures associated with expected true positive detections and the similarity measures associated with expected false positive detections.
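Step 606-4 can be sketched as a threshold scan over the two compiled distributions; this is a hedged illustration, and the balanced-accuracy criterion is an assumption, since the specification only requires a threshold that separates expected true positives from expected false positives:

```python
# Illustrative sketch: pick SM_TH separating true-positive from false-positive
# similarity-measure distributions compiled in steps 606-2/606-3.
import numpy as np

def select_similarity_threshold(tp_sms: np.ndarray, fp_sms: np.ndarray) -> float:
    """Scan candidate thresholds; keep the one maximizing balanced separation accuracy."""
    candidates = np.linspace(min(tp_sms.min(), fp_sms.min()),
                             max(tp_sms.max(), fp_sms.max()), 200)
    def accuracy(th):
        # True positives have low distances, false positives high distances.
        return ((tp_sms <= th).mean() + (fp_sms > th).mean()) / 2.0
    return float(max(candidates, key=accuracy))

tp = np.random.default_rng(1).normal(0.15, 0.05, 1000)  # low distances: correct class
fp = np.random.default_rng(2).normal(0.45, 0.10, 1000)  # high distances: misclassifications
print(select_similarity_threshold(tp, fp))
```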
For each video sequence during the reference observation time constraint 114m(tstart_m, tend_m), step 608-3 compiles a distribution of similarity measures generated between validation component-descriptors and the reference component-descriptors by performing the following steps: step 680 detects the object having the selected reference object class 110m; step 682 extracts patches from the detected object; step 684 utilizes the embedding space mapping protocol 112embedding_protocol_m to determine a set of validation component descriptors VDm.1 to VDm.Nm=Xm.1 to Xm.Nm from the extracted patches; step 686 generates a similarity measure between the validation component descriptors VDm.1 to VDm.Nm=Xm.1 to Xm.Nm and the reference component descriptors 112m.1 to 112m.Nm=Ym.1 to Ym.Nm; and step 688 compiles a distribution of similarity measures representing a likelihood of possible similarity measures for each detected object during the reference observation time constraint 114m(tstart_m, tend_m).
Step 608-4 determines temporal similarity measure boundary 116m based on performance characteristics from similarity measure sequences within the time constraint 114m(tstart_m, tend_m) for the selected reference object class 110m. For example, by using probabilistic signal temporal logic (PSTL) formulation, the efficacy of temporal similarity measures may be examined and captured for predicting correct object classification.
Step 608-5 compares the sequences of generated similarity measures for the observation zm of the detected object to ground truth data for the selected reference object class 110m to determine validation accuracy of the reference observation time constraint 114m(tstart_m, tend_m). If the validation accuracy for reference observation time constraint 114m(tstart_m, tend_m) satisfies the validating accuracy threshold target, then step 608-7 is performed to store the training parameters developed in steps 604, 606, and 608 in memory such as memory 106 of
The reference similarity measure sequences represent expected true positive detections and expected false positive detections within boundary performance characteristics for correct object classification associated with the selected reference object class 110m.
The similarity measure may be an earth mover distance (EMD) measure. For example, an EMD measure represents an amount of work needed to transform one distribution into another distribution when measuring distance in an embedding space between components that belong to the same object type. In alternate embodiments, other distribution-based distances (such as Wasserstein distance, or any other similarity measures such as L1 norm and L2 norm distances) may also be used where the distance between components that belong to the same object type should be a low distance to provide a measure of uncertainty or certainty when verifying object classification from an object detector.
The set of reference object classes 1101 to 110M together with their respective trained parameters may be used in methods and systems for post-processing an object detector output to verify object classifications that reflect true detections or identify object misclassification errors that reflect false detections. For example, the reference component-based descriptors may be (i) formulated such that each object class is encoded into specific components and (ii) converted into probabilistic signal temporal logic for object detection verification. Perception error evaluation and detection using axioms generated with the probabilistic signal temporal logic may be defined from similarity measures such as earth mover distances. The probabilistic signal temporal logic may be used to learn a discriminative pattern in the histogram of false positives vs. true positives for each object class. Probabilistic signal temporal logic may provide axioms, each of which may be constructed with a single or multiple probes having corresponding statistical analyses. The axioms may provide object misclassification error information with an uncertainty measure through perception error evaluation associated with an object detection or recognition, and may be used to weight uncertainties for decision-making commands in autonomous systems.
In one embodiment, the perception error evaluation may be used to generate object classification verification data that reflects a confidence level for object detection. Also, the detected objects may be verified by their components or parts, such that if the components or parts which constitute the object exist, then the likelihood of false-positive detection decreases. Accordingly, the context of detected objects with verified object parts may be used to provide an accurate, robust, and verifiable decision-making process such as safe maneuvering with steering and speed control. For example, an autonomous vehicle may respond to a verified pedestrian object detection by actuating a deceleration maneuver to slow down the vehicle. The object component-descriptors of the detected pedestrian object are generated to identify body parts such as hands, head, and legs, to validate whether the detection is correct or is an error, and to generate a confidence measure based on recognized object parts in the pedestrian detection. This confidence measure may be used to steer and accelerate safely with higher confidence.
The verification of an object classification associated with a detected object may prevent a false positive detection from being sent to the decision-making control command in an autonomous system. Also, the object classification verification data, including error detection results, may be used to weight object detections with uncertainty information for perception-based decision making such as steering and speed control. For example, if a pedestrian object is detected and verified with a high degree of confidence, then a self-driving car would react accordingly to control the steering and speed control command, such as by slowing down to a stop.
Perception system 700 may further include a conventional tracking performance module 710 and a conventional action recognition module 716. Tracking performance module 710 and action recognition module 716 are associated with higher order temporal logic 722. Higher order temporal logic 722 may include a temporal logic 706, a temporal logic 712, and a temporal logic 718. Temporal logic 706 may be connected to object detection module 704, tracking performance module 710, and action recognition module 716 for generating a verified and corrected object detection 708. Temporal logic 712 is connected to object detection module 704 and tracking performance module 710 for generating a verified and corrected tracking 714. Temporal logic 718 is connected to object detection module 704 and action recognition module 716 for generating a verified and corrected action recognition 720. For example, suitable temporal logic is illustrated in commonly assigned and co-pending U.S. patent application Ser. No. 17/030,354 entitled System and Method of Perception Error Evaluation and Correction by Solving Optimization Problems Under the Probabilistic Signal Temporal Logic Based Constraints, filed on Sep. 23, 2020, which is hereby incorporated by reference in its entirety.
Verified and corrected object detection 708, verified and corrected tracking 714, and verified and corrected action recognition 720 are each provided to autonomous decision-making system 126.
Computer system 800 may include an address/data bus 802 that is configured to communicate information. Additionally, one or more data processing units, such as a processor 804 (or processors), are coupled with address/data bus 802. Processor 804 is configured to process information and instructions. Processor 804 may be a microprocessor. Alternatively, processor 804 may be a parallel processor, application-specific integrated circuit (ASIC), programmable logic array (PLA), complex programmable logic device (CPLD), or a field programmable gate array (FPGA).
Computer system 800 may be configured to utilize one or more data storage units such as a volatile memory unit 806 (e.g., random access memory ("RAM"), such as static RAM or dynamic RAM) coupled with address/data bus 802. Volatile memory unit 806 may be configured to store information and instructions for processor 804. Also, computer system 800 may include a non-volatile memory unit 808 (e.g., read-only memory ("ROM"), programmable ROM ("PROM"), erasable programmable ROM ("EPROM"), electrically erasable programmable ROM ("EEPROM"), flash memory, etc.) coupled with address/data bus 802. Non-volatile memory unit 808 may be configured to store static information and instructions for processor 804. Alternatively, computer system 800 may execute instructions retrieved from an online data storage unit such as in "Cloud" computing.
Computer system 800 may include one or more interfaces configured to enable computer system 800 to interface with other electronic devices and computer systems. The communication interfaces implemented by the one or more interfaces may include wireline (e.g., serial cables, modems, network adaptors, etc.) and/or wireless (e.g., wireless modems, wireless network adaptors, etc.) communication technology.
Computer system 800 may include an input device 812 coupled with address/data bus 802. Input device 812 may be configured to communicate information and command selections to processor 804. Input device 812 may be an alphanumeric input device, such as a keyboard, that may include alphanumeric and/or function keys. Computer system 800 may include a cursor control device 814 coupled with address/data bus 802, wherein cursor control device 814 is configured to communicate user input information and/or command selections to processor 804. Cursor control device 814 may be implemented using a device such as a mouse, a track-ball, a track-pad, an optical tracking device, or a touch screen. Cursor control device 814 may be directed and/or activated via input from input device 812, such as in response to the use of special keys and key sequence commands associated with input device 812. Alternatively, cursor control device 814 may be configured to be directed or guided by voice commands.
Computer system 800 further may include one or more optional computer usable data storage devices, such as a storage device 816, coupled with the address/data bus 802. Storage device 816 is configured to store information and/or computer executable instructions. Storage device 816 may be a storage device such as a semiconductor storage device, magnetic storage device, or optical storage device. A display device 818 may be coupled with address/data bus 802. Display device 818 may be configured to display video and/or graphics. Display device 818 may include a cathode ray tube (“CRT”), liquid crystal display (“LCD”), field emission display (“FED”), plasma display, or any other display device suitable for displaying video and/or graphic images and alphanumeric characters recognizable to a user.
The processes and steps for the example embodiments in
A number of example embodiments have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the devices and methods described herein.
The present application claims priority to U.S. Provisional Application No. 63/220,965 entitled VERIFYING OBJECT CLASSIFICATION USING COMPONENT-BASED DESCRIPTORS AND TEMPORAL SIMILARITY MEASURES, filed on Jul. 12, 2021, the entirety of which is hereby incorporated by reference, and U.S. Provisional Application No. 63/224,216 entitled VERIFYING OBJECT CLASSIFICATION USING COMPONENT-BASED DESCRIPTORS AND TEMPORAL SIMILARITY MEASURES, filed on Jul. 21, 2021, the entirety of which is hereby incorporated by reference.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2022/073573 | 7/8/2022 | WO |
Number | Date | Country
---|---|---
63/220,965 | Jul. 12, 2021 | US
63/224,216 | Jul. 21, 2021 | US