This application claims the benefit of priority to Korean Patent Application No. 10-2024-0001033, filed in the Korean Intellectual Property Office on Jan. 3, 2024, the entire contents of which are incorporated herein by reference.
The present disclosure relates to an object detection apparatus for detecting a three-dimensional (3D) object using a light detection and ranging (LiDAR) point cloud and a data augmentation method thereof.
A 3D object detection device may simultaneously estimate bounding box information (e.g., a position (x, y, z), a size (width, length, height), and a heading angle) and a class of an object from a LiDAR point cloud by means of a deep learning-based detection network. The 3D object detection device may train the detection network to improve the performance of the detection network. A labeled bounding box and class ground-truth (GT) may be used to train the detection network. Furthermore, a data augmentation technology may be applied to improve the performance of training the detection network.
The data augmentation technology is a technique for applying various manipulation techniques to original data to increase the amount of data. Data augmentation may be further categorized as global augmentation and object augmentation. The global-augmentation technology may process an entire scene at once by using an augmentation technique of rotating, reversing, or zooming in and/or out on the scene. The object-augmentation technology may allow data to be processed for each object and/or each class.
In some implementations, the object-augmentation technology may perform probabilistic random augmentation for all objects without considering characteristics of the objects. In other words, the object-augmentation technology in some implementations may randomly and probabilistically select which object to augment and what intensity to apply. Thus, data augmentation may be accompanied by a side effect in which data quality is not improved and/or deteriorates. For example, if a high degree of occlusion occurs over many areas of an object due to an obstacle or a preceding vehicle, the augmented data of the object may be of lower quality than the original data.
The present disclosure has been made to solve the above-mentioned problems occurring in some implementations while advantages achieved by those implementations are maintained intact.
An aspect of the present disclosure provides an object detection apparatus for augmenting data with regard to a characteristic of a single object and a data augmentation method thereof.
Another aspect of the present disclosure provides an object detection apparatus for adaptively adjusting an object area and application intensity based on a characteristic of a single object and performing efficient data augmentation to suit an object and dataset situation and a data augmentation method thereof.
The technical problems to be solved by the present disclosure are not limited to the aforementioned problems, and any other technical problems not mentioned herein will be clearly understood from the following description by those skilled in the art to which the present disclosure pertains.
According to one or more example embodiments of the present disclosure, an object detection apparatus may include: a processor configured to: obtain point cloud data including an object; and perform data augmentation of the point cloud data by applying an augmentation technique. The augmentation technique may be determined based on a characteristic of the object. The processor may be further configured to control a vehicle based on the augmented point cloud data.
The processor may be configured to perform the data augmentation by: segmenting the object into a plurality of partitions; determining a density of each of the plurality of segmented partitions; and determining, based on the determined density of each of the plurality of segmented partitions, a valid partition.
The processor may be configured to segment the object by: determining a quantity of partitions and a partition segmentation scheme based on a type of the object and the characteristic of the object.
The processor may be configured to determine the density by: determining the density using a quantity of a plurality of points of the object and a quantity of points which belong to each partition of the plurality of segmented partitions.
The processor may be configured to determine the valid partition by: comparing the determined density of a selected partition, of the plurality of segmented partitions, with a predetermined threshold; and determining, based on the determined density of the selected partition being greater than the predetermined threshold, the selected partition as the valid partition.
The processor may be configured to perform the data augmentation by: determining augmentation application intensity based on at least one of: a distance of the object, or a degree of occlusion of the object.
The processor may be configured to determine the augmentation application intensity by: determining a smallest distance of the object among distances between: an origin with respect to a vehicle coordinate system, and points in the object.
The augmentation technique may include at least one of a dropout technique, a sparse technique, or a noise technique.
The processor may be configured to perform the data augmentation by: determining a final augmentation technique to be applied to the data augmentation based on a selection probability for each predetermined augmentation technique.
The processor may be further configured to: train an object detection model using the augmented point cloud data.
According to one or more example embodiments of the present disclosure, a method may include: obtaining, by one or more processors, point cloud data including an object; and performing data augmentation of the point cloud data by applying an augmentation technique. The augmentation technique may be determined based on a characteristic of the object. The method may further include controlling a vehicle based on the augmented point cloud data.
Performing the data augmentation may include: segmenting the object into a plurality of partitions; determining a density of each of the plurality of segmented partitions; and determining, based on the determined density of each of the plurality of segmented partitions, a valid partition.
Segmenting the object may include: determining a quantity of partitions and a partition segmentation scheme based on a type of the object and the characteristic of the object.
Determining the density may include: determining the density using a quantity of a plurality of points of the object and a quantity of points which belong to each partition of the plurality of segmented partitions.
Determining the valid partition may include: comparing the determined density of a selected partition, of the plurality of segmented partitions, with a predetermined threshold; and determining, based on the determined density of the selected partition being greater than the predetermined threshold, the selected partition as the valid partition.
Performing the data augmentation may include: determining augmentation application intensity based on at least one of: a distance of the object, or a degree of occlusion of the object.
Determining the augmentation application intensity may include: determining a smallest distance of the object among distances between: an origin with respect to a vehicle coordinate system, and points in the object.
The augmentation technique may include at least one of a dropout technique, a sparse technique, or a noise technique.
Performing the data augmentation may include: determining a final augmentation technique to be applied to the data augmentation based on a selection probability for each predetermined augmentation technique.
The method may further include: training an object detection model using the augmented point cloud data.
The above and other objects, features and advantages of the present disclosure will be more apparent from the following detailed description taken in conjunction with the accompanying drawings.
Hereinafter, some embodiments of the present disclosure will be described in detail with reference to the exemplary drawings. In adding reference numerals to the components of each drawing, it should be noted that an identical component is designated by identical numerals even when it is displayed in other drawings. In addition, a detailed description of well-known features or functions will be omitted in order not to unnecessarily obscure the gist of the present disclosure.
In describing components of exemplary embodiments of the present disclosure, the terms first, second, A, B, (a), (b), and the like may be used herein. These terms are only used to distinguish one component from another component, but do not limit the corresponding components irrespective of the order or priority of the corresponding components. Furthermore, unless otherwise defined, all terms including technical and scientific terms used herein have the same meaning as being generally understood by those skilled in the art to which the present disclosure pertains. Such terms as those defined in a generally used dictionary are to be interpreted as having meanings equal to the contextual meanings in the relevant field of art, and are not to be interpreted as having ideal or excessively formal meanings unless clearly defined as having such in the present application.
This specification proposes a single-object-based attribute-aware data augmentation technology that efficiently augments data by using a characteristic of a single object during data augmentation for improving object detection training performance. Data augmentation may refer to the application of one or more data manipulation techniques, such as dropout, swap, mix, sparse, and noise, to an original data set (e.g., original data of a point cloud) to obtain a new data set having a different quantity of points (e.g., increased points or decreased points).
An object detection apparatus 100 may be mounted on a vehicle to detect 3D information (or 3D object information) about an object (e.g., a pedestrian, a vehicle, a bus, a truck, and/or the like) around the vehicle. Referring to the accompanying drawings, the object detection apparatus 100 may include sensors 110, a memory 120, and a processor 130.
The sensors 110 may obtain object information. The object information may be point cloud scene data (or point cloud data) having a labeled bounding box (or object) or the like. The sensors 110 may obtain a point cloud using a light detection and ranging (LiDAR) sensor or the like. The LiDAR sensor may radiate a laser pulse and may measure the arrival time of the laser pulse reflected from an object, thus calculating spatial position coordinates (or point information) of the reflection point.
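As a brief illustration of the time-of-flight principle described above (not a detail recited in the disclosure), the range R to a reflection point follows from the measured round-trip time Δt of the pulse:

    R = c · Δt / 2

where c ≈ 3 × 10^8 m/s is the speed of light. Combined with the emission direction (azimuth and elevation) of the pulse, R yields the spatial coordinates (x, y, z) of the reflection point.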
The memory 120 may store a dataset which is training data. The training data may include ground truth (GT) data, which is correct-answer information about an object. The memory 120 may store an object detection model (or an object detection network) which is a training target. The memory 120 may store a training model to which a deep learning network is applied. Furthermore, the memory 120 may store a data augmentation algorithm (or a data augmentation model).
The memory 120 may store the information (e.g., the point cloud data) obtained by the sensors 110. Furthermore, the memory 120 may store preset information and input data and/or output data of the processor 130.
The memory 120 may be a non-transitory storage medium which stores instructions executed by the processor 130. Although the memory 120 is shown as being located outside the processor 130, the present disclosure is not limited thereto; the memory 120 may be located inside the processor 130. The memory 120 may include at least one of storage media such as a flash memory, a hard disk, a solid state disk (SSD), a universal flash storage (UFS), a random access memory (RAM), a static RAM (SRAM), a read only memory (ROM), a programmable ROM (PROM), an electrically erasable and programmable ROM (EEPROM), or an erasable and programmable ROM (EPROM).
The processor 130 may include at least one of processing devices such as an application specific integrated circuit (ASIC), a digital signal processor (DSP), a programmable logic device (PLD), a field programmable gate array (FPGA), a central processing unit (CPU), a microcontroller, or a microprocessor.
The processor 130 may perform data augmentation depending on the data augmentation algorithm. The processor 130 may receive point cloud data as an input. The processor 130 may apply an augmentation technique with regard to a characteristic of a single object to perform the data augmentation. The augmentation technique may be a technique performed based on the single object. The processor 130 may output point cloud data augmented with regard to the characteristic of the single object. The processor 130 may train the object detection model by using the augmented data as training data. The augmented point cloud data may be used to control a vehicle (e.g., detect an object, steer or adjust speed to avoid the object, etc.).
Hereinafter, a description will be given of a process of performing single-object-based attribute-aware data augmentation with regard to the characteristic of the single object.
The processor 130 may receive point cloud data. The processor 130 may receive the point cloud data from the sensors 110. Alternatively, the processor 130 may access the point cloud data from the memory 120. The point cloud data may include at least one labeled object.
The processor 130 may perform partition segmentation for each object included in the point cloud data. The processor 130 may segment a bounding box (i.e., a labeled 3D box) of the object into a plurality of partitions, such as m partitions. The number (e.g., quantity) of partitions and the partition segmentation scheme may be determined based on a type of the object and/or a characteristic of the object.
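The following is a minimal illustrative sketch in Python of such partition segmentation. A uniform grid over the box frame, the per-axis split counts, and the helper names are assumptions for illustration; the disclosure leaves the quantity of partitions and the segmentation scheme to the type and characteristic of the object.

    import numpy as np

    def segment_into_partitions(points, box_dims, splits=(2, 2, 1)):
        # points: (n, 3) array already expressed in the bounding-box frame,
        #         with coordinates in [0, box_dims) along each axis.
        # box_dims: (length, width, height) of the labeled 3D box.
        # splits: assumed grid resolution per axis; m = prod(splits) partitions.
        splits = np.asarray(splits)
        cell = np.asarray(box_dims) / splits               # size of one partition
        idx3 = np.clip((points // cell).astype(int), 0, splits - 1)
        # flatten the per-axis indices into a single partition id in [0, m)
        part_ids = (idx3[:, 0] * splits[1] + idx3[:, 1]) * splits[2] + idx3[:, 2]
        return part_ids, int(np.prod(splits))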
The processor 130 may calculate a density (or point density) for each partition of the object. The processor 130 may calculate the density d_{p_i} of the i-th partition using Equation 1 below.

    d_{p_i} = n_{p_i} / n_{tot}   (Equation 1)

Herein, n_{p_i} denotes the number of points which belong to the i-th partition, and n_{tot} denotes the number of all the points of the object. The number of all the points of the object may be represented as Equation 2 below.

    n_{tot} = Σ_{i=1}^{m} n_{p_i}   (Equation 2)

Herein, m denotes the total number (e.g., quantity) of partitions.
The processor 130 may select a valid partition based on the calculated density for each partition of the object. The processor 130 may compare the density d_{p_i} of the i-th partition with a predetermined threshold d_{thr}. If it is determined that the density d_{p_i} of the i-th partition is less than or equal to the predetermined threshold d_{thr}, the processor 130 may exclude the i-th partition from the augmentation application targets. The processor 130 may determine that a partition with low density does not have sufficient point information to represent a characteristic of the object and may therefore exclude the partition with the low density. The processor 130 may define the valid partition set P, which remains after excluding the partitions whose density is less than or equal to the threshold, as Equation 3 below.

    P = { p_i | d_{p_i} > d_{thr}, i = 1, …, m }   (Equation 3)

In other words, if it is determined that the density d_{p_i} of the i-th partition is greater than the predetermined threshold d_{thr}, the processor 130 may select the i-th partition as an augmentation application target, that is, a valid partition.
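Continuing the sketch above, the density of Equation 1 and the valid-partition selection of Equation 3 may be expressed as follows; the threshold value d_thr = 0.05 is an assumed example, not a value taken from the disclosure.

    import numpy as np

    def select_valid_partitions(part_ids, m, d_thr=0.05):
        n_tot = len(part_ids)                         # Equation 2: total object points
        valid = []
        for i in range(m):
            n_pi = int(np.sum(part_ids == i))         # points in the i-th partition
            d_pi = n_pi / n_tot if n_tot else 0.0     # Equation 1: partition density
            if d_pi > d_thr:                          # Equation 3: keep if above threshold
                valid.append(i)
        return valid                                  # the valid partition set P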
The processor 130 may determine an augmentation application intensity α based on a distance of the object. The processor 130 may determine the value having the shortest distance from the origin of a vehicle coordinate system among the points in the object as the distance r of the object. The distance r of the object may be represented as Equation 4 below.

    r = min_{(x_j, y_j, z_j) ∈ Q} √(x_j² + y_j² + z_j²)   (Equation 4)

Herein, Q denotes the set of the points in the object, which may be defined as Q = {(x_j, y_j, z_j)}_{j=1}^{n}.

Because an object far from the vehicle lacks feature information, as there is a low number of points in the original data of the object, deterioration in data quality may occur when augmentation is applied at high intensity. The augmentation application intensity α may have a range from a minimum of 0 to a maximum of 1 and may be applied adaptively for each distance section. The augmentation application intensity according to the distance section may be predetermined by a designer or may be set as a relational expression in the form of a function.
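A sketch of the distance-based intensity determination follows; the distance sections and the α value per section are assumed examples, since the disclosure leaves them to the designer.

    import numpy as np

    def intensity_from_distance(points):
        # Equation 4: r is the smallest distance from the vehicle-frame origin
        r = float(np.min(np.linalg.norm(points, axis=1)))
        # assumed distance sections (meters) and their intensities α in [0, 1];
        # farther objects receive weaker augmentation
        sections = ((0.0, 20.0, 0.9), (20.0, 40.0, 0.6), (40.0, float("inf"), 0.3))
        for lo, hi, alpha in sections:
            if lo <= r < hi:
                return alpha, r
        return 0.0, r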
Furthermore, the processor 130 may adjust the determined augmentation application intensity based on an occlusion degree of the object. If the occlusion degree of the object is high, points are lost where a specific area of the object is occluded, so deterioration in data quality occurs when augmentation is applied at high intensity. Using the occlusion attribute present in the dataset, the augmentation application intensity α may be adjusted depending on the occlusion degree so that augmentation is applied adaptively. The designer may set the augmentation application intensity according to the occlusion degree of the object as a relational expression in the form of a function. Furthermore, the designer may segment the occlusion degree of the object into a plurality of sections and may set the augmentation application intensity to apply a predetermined specific calculation for each section.
The processor 130 may adjust the augmentation application intensity according to the distance of the object based on the occlusion degree of the object. The processor 130 may adjust the determined augmentation application intensity α with regard to the occlusion degree O_A of the object, as in Equation 5 below, where f(·) denotes the designer-set relational expression.

    α′ = f(α, O_A)   (Equation 5)
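One assumed concrete form of the Equation 5 adjustment is a simple linear attenuation by the occlusion degree; the actual relational expression is designer-defined and not specified in the disclosure.

    def adjust_intensity_for_occlusion(alpha, occlusion):
        # alpha: distance-based intensity in [0, 1]
        # occlusion: occlusion degree O_A in [0, 1] from the dataset attribute
        # assumed form: heavier occlusion -> weaker augmentation
        return alpha * (1.0 - occlusion)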
The processor 130 may apply a probability-based augmentation technique based on the selected valid partition and the adjusted augmentation application intensity. The probability-based augmentation technique augments data based on a single object and may include techniques such as a dropout technique, a sparse technique, and a noise technique. The dropout technique may reflect the augmentation application intensity to eliminate points of a current partition. The sparse technique may reflect the augmentation application intensity to reduce the point density of a current partition. The noise technique may reflect the augmentation application intensity to add arbitrary noise points to a current partition.
The processor 130 may finally select an augmentation technique to be applied to each partition belonging to the valid partition set P, based on a selection probability predetermined for each augmentation technique. The selection of the augmentation technique through the selection probability may follow a well-known method. For example, if the selection probabilities of the dropout technique, the sparse technique, and the noise technique are preset to 0.1, 0.15, and 0.2, respectively, the processor 130 may apply the dropout technique to any one partition belonging to the valid partition set P with a probability of 10%, the sparse technique with a probability of 15%, and the noise technique with a probability of 20%. At this time, several techniques may not be applied to one partition in duplicate; if a technique has already been applied to a partition, that partition may be skipped. The augmentation application intensity used when applying the augmentation technique may be the previously adjusted augmentation application intensity.
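The probability-based selection and application described above may be sketched as follows. The selection probabilities follow the 0.1/0.15/0.2 example in the text; the per-technique point operations are illustrative assumptions, and returning after the first applied technique enforces that no partition receives duplicate techniques.

    import numpy as np

    def augment_partition(points, alpha, rng):
        # points: (n, 3) points of one valid partition; alpha: adjusted intensity α
        techniques = (("dropout", 0.10), ("sparse", 0.15), ("noise", 0.20))
        for name, p in techniques:
            if rng.random() >= p:
                continue                               # this technique not selected
            if name == "dropout":                      # remove an α-fraction of points
                return points[rng.random(len(points)) >= alpha]
            if name == "sparse":                       # thin point density by α/2
                return points[rng.random(len(points)) >= 0.5 * alpha]
            if name == "noise":                        # add α-scaled jittered points
                noise = rng.normal(0.0, 0.05 * alpha, points.shape)
                return np.vstack([points, points + noise])
        return points                                  # no technique applied: skip

    # usage sketch: rng = np.random.default_rng(0); out = augment_partition(pts, 0.6, rng)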
Referring to the drawings, the processor 130 may first receive point cloud data including at least one labeled object.
In S200, the processor 130 may perform data augmentation with regard to a characteristic of a single object. The processor 130 may perform data augmentation for each object included in the point cloud data.
In S300, the processor 130 may train an object detection network using the augmented data. In other words, the processor 130 may train the object detection network by using the augmented data as training data. When the training with all pieces of point cloud data of the dataset is complete, the processor 130 may end the training.
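A hypothetical driver for this receive-augment-train loop is sketched below; the callables are placeholders, not names from the disclosure.

    from typing import Any, Callable, Iterable, Tuple

    def train_with_augmentation(dataset: Iterable[Tuple[Any, Any]],
                                augment: Callable[[Any, Any], Any],
                                train_step: Callable[[Any, Any], None]) -> None:
        for scene, labels in dataset:            # receive point cloud data
            augmented = augment(scene, labels)   # S200: attribute-aware augmentation per object
            train_step(augmented, labels)        # S300: train the detection network
        # training ends once all point cloud data in the dataset is processed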
The above-mentioned embodiment is applied only in the training stage and is not applied in the inference stage when the trained network model is used in an application. Thus, the above-mentioned training method may improve detection accuracy without affecting the execution time of actual inference.
In S210, the processor 130 may select a valid partition based on a partition density of an object. The processor 130 may segment a bounding box of the object into a plurality of partitions, calculate a density for each segmented partition, and select a valid partition based on the calculated density.
In S220, the processor 130 may determine an augmentation application intensity based on an object attribute. The processor 130 may adjust the augmentation application intensity to be applied to the valid partition, based on the object attribute.
In S230, the processor 130 may apply a probability-based augmentation technique to perform the data augmentation. The processor 130 may apply the probability-based augmentation technique based on the valid partition selected in S210 and the augmentation application intensity determined in S220. The probability-based augmentation technique augments data based on a single object and may include techniques such as a dropout technique, a sparse technique, and a noise technique. The dropout technique may reflect the augmentation application intensity to eliminate points of a current partition. The sparse technique may reflect the augmentation application intensity to reduce the point density of a current partition. The noise technique may reflect the augmentation application intensity to add arbitrary noise points to a current partition.
The processor 130 may finally select an augmentation technique to be applied to each partition belonging to the valid partition set P, based on a selection probability predetermined for each augmentation technique. The selection of the augmentation technique through the selection probability may follow a well-known method. For example, if the selection probabilities of the dropout technique, the sparse technique, and the noise technique are preset to 0.1, 0.15, and 0.2, respectively, the processor 130 may apply the dropout technique to any one partition belonging to the valid partition set P with a probability of 10%, the sparse technique with a probability of 15%, and the noise technique with a probability of 20%. At this time, several techniques may not be applied to one partition in duplicate; if a technique has already been applied to a partition, that partition may be skipped. The augmentation application intensity used when applying the augmentation technique may be obtained by reflecting the previously adjusted augmentation application intensity.
In S211, the processor 130 may segment an object into a plurality of partitions. The processor 130 may segment a bounding box (i.e., a labeled 3D box) of the object into m partitions, for example, as shown in the accompanying drawings.
In S212, the processor 130 may calculate a density for each partition of the object. The processor 130 may calculate the density d_{p_i} of the i-th partition using Equation 1 above.
In S213, the processor 130 may select a valid partition based on the calculated density for each partition of the object. The processor 130 may determine whether the density d_{p_i} of the i-th partition is less than or equal to the predetermined threshold d_{thr}. If it is determined that the density d_{p_i} of the i-th partition is less than or equal to the predetermined threshold d_{thr}, the processor 130 may exclude the i-th partition from the augmentation application targets. The processor 130 may determine that a partition with low density does not have sufficient point information to represent a characteristic of the object and may therefore exclude the partition with the low density. The processor 130 may define the valid partition set P including the partitions which remain after excluding the partitions whose density is less than or equal to the threshold.
In S221, the processor 130 may determine an augmentation application intensity α based on a distance of an object. The processor 130 may determine the value having the shortest distance from the origin of the vehicle coordinate system among the points in the object as the distance r of the object. Because an object far from the vehicle lacks feature information, as there is a low number of points in the original data of the object, deterioration in data quality may occur when augmentation is applied at high intensity. The augmentation application intensity α may have a range from a minimum of 0 to a maximum of 1 and may be applied adaptively for each distance section. The augmentation application intensity according to the distance section may be predetermined by a designer or may be set as a relational expression in the form of a function.
In S222, the processor 130 may adjust the determined augmentation application intensity α based on an occlusion degree of the object. The processor 130 may adjust the augmentation application intensity α, determined based on the distance of the object, with regard to the occlusion degree O_A of the object. If the occlusion degree of the object is high, points are lost where a specific area of the object is occluded, so deterioration in data quality occurs when augmentation is applied at high intensity. Using the occlusion attribute present in the dataset, the augmentation application intensity α may be adjusted depending on the occlusion degree so that augmentation is applied adaptively. The designer may set the augmentation application intensity according to the occlusion degree of the object as a relational expression in the form of a function. Furthermore, the designer may segment the occlusion degree of the object into a plurality of sections and may set the augmentation application intensity to apply a predetermined specific calculation for each section.
Furthermore, Table 1 discloses results evaluated using mean average precision (mAP), an evaluation method used in the field of object detection. When an existing object-augmentation technology is applied to a baseline to train the object detection network, the improvement in training performance (i.e., the performance of object detection in the object detection network) is small, and performance actually deteriorates, compared to training the object detection network using the baseline alone.
In contrast, when an augmentation technology considering an object characteristic according to embodiments disclosed in the specification is applied to the baseline to train the object detection network, it may be identified that the improvement in training performance increases compared to training the object detection network using the baseline alone, and that the performance degradation observed when applying the existing object-augmentation technology is recovered.
The processor 1100 may be a central processing unit (CPU) or a semiconductor device that processes instructions stored in the memory 1300 and/or the storage 1600. The memory 1300 and the storage 1600 may include various types of volatile or non-volatile storage media. For example, the memory 1300 may include a read only memory (ROM) 1310 and a random access memory (RAM) 1320.
Accordingly, the operations of the method or algorithm described in connection with the embodiments disclosed in the specification may be directly implemented with a hardware module, a software module, or a combination of the hardware module and the software module, which is executed by the processor 1100. The software module may reside on a storage medium (that is, the memory 1300 and/or the storage 1600) such as a RAM, a flash memory, a ROM, an EPROM, an EEPROM, a register, a hard disk, a removable disk, or a CD-ROM. The exemplary storage medium may be coupled to the processor 1100. The processor 1100 may read out information from the storage medium and may write information in the storage medium. Alternatively, the storage medium may be integrated with the processor 1100. The processor 1100 and the storage medium may reside in an application specific integrated circuit (ASIC). The ASIC may reside within a user terminal. In another case, the processor 1100 and the storage medium may reside in the user terminal as separate components.
Embodiments of the present disclosure may improve training performance through efficient data augmentation even for a dataset with high difficulty.
Hereinabove, although the present disclosure has been described with reference to exemplary embodiments and the accompanying drawings, the present disclosure is not limited thereto, but may be variously modified and altered by those skilled in the art to which the present disclosure pertains without departing from the spirit and scope of the present disclosure claimed in the following claims. Therefore, embodiments of the present disclosure are not intended to limit the technical spirit of the present disclosure, but are provided only for illustrative purposes. The scope of the present disclosure should be construed on the basis of the accompanying claims, and all technical ideas within the scope equivalent to the claims should be included in the scope of the present disclosure.