This application is based on and claims priority under 35 U.S.C. § 119 to Korean Patent Application No. 10-2022-0111491, filed on Sep. 2, 2022, No. 10-2022-0135629, filed on Oct. 20, 2022, No. 10-2022-0164963, filed on Nov. 30, 2022 and No. 10-2022-0164964, filed on Nov. 30, 2022, in the Korean Intellectual Property Office, the disclosures of which are incorporated by reference herein in their entireties.
The present disclosure relates to a method and device for annotating a detected object.
Artificial intelligence (AI) refers to a technology for artificially implementing human learning and reasoning abilities by using computer programs. In relation to AI, machine learning refers to training a model that includes a plurality of parameters by optimizing those parameters based on given data.
For AI training, it is necessary to design training data. Designing training data requires several data processing operations; among these, annotation refers to generating a bounding box for an object included in image data and inputting necessary information about the object.
Provided are methods and devices for annotating a detected object. Technical objectives of the present disclosure are not limited to the foregoing, and other unmentioned objects or advantages of the present disclosure would be understood from the following description and be more clearly understood from the embodiments of the present disclosure. In addition, it would be appreciated that the objectives and advantages of the present disclosure can be implemented by means provided in the claims and a combination thereof.
Additional aspects will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the presented embodiments of the disclosure.
According to an aspect of an embodiment, a method of annotating a detected object includes classifying a class of the object, generating a bounding box including a first quadrangle and a second quadrangle both sharing one side, for the object based on the class of the object, and generating an annotation on the object in units of the bounding box.
The method may further include determining attributes of the object, and the generating of the annotation may include generating an annotation regarding the class of the object and the attributes of the object, in units of the bounding box.
In the method, the class of the object may be any one of ‘car’, ‘van’, ‘truck’, ‘two-wheeled vehicle’, ‘pedestrian’, ‘emergency vehicle’, or ‘etc.’.
In the method, in a case where the object is a vehicle designed to transport goods and is loaded with another vehicle, a class of the other vehicle may not be classified.
In the method, the generating of the bounding box may include generating the bounding box by applying different criteria depending on the classified class of the object.
In the method, the first quadrangle may be a rectangle, and the second quadrangle may be a trapezoid.
In the method, the first quadrangle may correspond to a front or a rear of the object, and the second quadrangle may correspond to a left side or a right side of the object.
In the method, in a case where an upper surface of the object is exposed, the first quadrangle or the second quadrangle may include the upper surface of the object.
In the method, in a case where the object is a wheeled vehicle, the second quadrangle may include a line segment connecting wheel-ground points to each other.
In the method, in a case where the class of the object is ‘two-wheeled vehicle’ or ‘pedestrian’, a width of the first quadrangle may be generated to be equal to a width of a shoulder of a person included in the object.
The method may further include, in a case where the detected object is a vehicle and a proportion of the object visible in image data is less than a threshold proportion of a total size of the object, determining not to generate the bounding box.
In the method, the attributes of the object may include visibility, a movement state, a lane position, a major state, a size and a subclass of the object.
The method may further include controlling an ego vehicle based on the generated annotation.
According to an aspect of another embodiment, a device for annotating a detected object includes a memory storing at least one program, and a processor configured to execute the at least one program to classify a class of the object, generate a bounding box including a first quadrangle and a second quadrangle both sharing one side, for the object based on the class of the object, and generate an annotation on the object in units of the bounding box.
According to an aspect of another embodiment, a method of generating an outline of an object includes recognizing at least one object included in a first image obtained while driving, generating a first outline for each recognized object, extracting the recognized object from the first image based on the generated first outline, obtaining first coordinate values for a second outline of the extracted object as a result of inputting the extracted object into a first learning model, and merging the object to which the second outline is applied, with the first image based on the obtained first coordinate values.
The method may further include calculating second coordinate values for a position of the extracted object in the first image, and calculating third coordinate values, which are values for coordinates of the second outline, based on the second coordinate values, and the merging with the first image may include merging the object to which the second outline is applied, with the first image further considering the obtained third coordinate values.
In the method, the first coordinate values may include 7 coordinate values.
In the method, the first coordinate values may include 8 coordinate values.
In the method, the first outline may have a polygonal shape.
In the method, the first outline may have a rectangular shape.
In the method, the first outline may be generated as a result of inputting the first image into a second learning model.
In the method, the second outline may have a shape of a cuboid.
In the method, the second outline may have a quadrangular shape.
In the method, the second outline may have a shape including two polygons.
In the method, any one of the two polygons may represent the front or rear of the object, and the other may represent a side of the object.
In the method, the second outline may have a shape in which two polygons are combined based on at least one common side.
In the method, the second outline may have a shape including a rectangle and a trapezoid.
In the method, a standard of the first outline may include a standard of the second outline.
According to an aspect of another embodiment, a device for generating an outline of an object includes a memory storing at least one program, and a processor configured to execute the at least one program to recognize at least one object included in a first image obtained while driving, generate a first outline for each recognized object, extract the recognized object from the first image based on the generated first outline, obtain first coordinate values for a second outline of the extracted object as a result of inputting the extracted object into a first learning model, and merge the object to which the second outline is applied, with the first image based on the obtained first coordinate values.
According to an aspect of another embodiment, a method of obtaining a cuboid includes recognizing at least one object included in a first image obtained while driving and generating a first outline for each recognized object, extracting first coordinate values constituting the generated first outline from the first image, and obtaining coordinates of a cuboid of the recognized object based on the recognized object and the extracted first coordinate values.
In the method, the first outline may be a bounding box for the recognized object.
In the method, the cuboid may be a sign that visually indicates information about the height, width, and depth of the recognized object.
In the method, the cuboid may include side points indicating a side of the recognized object.
In the method, the side points may include two points vertically arranged.
In the method, the cuboid may include a rear sign indicating the rear of the recognized object.
In the method, the cuboid may include an exposure sign indicating the degree of exposure of the recognized object.
In the method, the cuboid may include at least one of side points indicating a side of the recognized object, a rear sign indicating the rear of the recognized object, and an exposure sign indicating the degree of exposure of the recognized object.
In the method, the obtained cuboid may include a rear sign indicating the rear of the recognized object, and when the direction of the recognized object is a front-only direction, the rear sign may be a line that passes through the exact center of the recognized object and extends in the vertical direction.
In the method, the length of the rear sign may be equal to the height of the first outline.
In the method, the obtained cuboid may include a rear sign indicating the rear of the recognized object, and when the direction of the recognized object is the rear-only direction, the rear sign may coincide with the first outline.
In the method, the obtained cuboid may include side points indicating a side of the recognized object and a rear sign indicating the rear of the recognized object, and when the direction of the recognized object is a direction in which only the front and a side of the recognized object are visible, the length of the line connecting the side points is equal to the height of the first outline, and the rear sign may be limited to a line that shares at least part of the left or right side of the first outline.
In the method, the obtained cuboid may include side points indicating a side of the recognized object and a rear sign indicating the rear of the recognized object, and when the direction of the recognized object is a direction in which only a side and the rear of the recognized object are visible, the length of the line connecting the side points may be limited to the length of part of the left or right side of the first outline, and the rear sign may be expressed as a quadrangle that shares the right side of the first outline.
In the method, the obtained cuboid may include side points indicating a side of the recognized object and a rear sign indicating the rear of the recognized object, and when the direction of the recognized object is a direction in which only the left or right side of the recognized object is visible, the length of the line connecting the side points is equal to the height of the first outline, and the rear sign may be limited to a line that shares the left or right side of the first outline.
According to an aspect of another embodiment, a device for obtaining a cuboid includes a memory storing at least one program, and a processor configured to execute the at least one program to recognize at least one object included in a first image obtained while driving, generate a first outline for each recognized object, extract first coordinate values constituting the generated first outline from the first image, and obtain coordinates of a cuboid of the recognized object based on the recognized object and the extracted first coordinate values.
According to an aspect of another embodiment, a method of correcting a visual sign includes inputting an image obtained while driving into a learning model and generating a visual sign for the position and direction of an object detected from the image, performing first correction to move the position of the generated visual sign to the inside of an outline of the object, and performing second correction to change at least one of the position and shape of the visual sign moved to the inside of the outline, based on the characteristics of the visual sign.
In the method, the outline may be a bounding box for the recognized object.
In the method, the visual sign may be a cuboid that visually indicates information about the height, width, and depth of the recognized object.
In the method, the visual sign may include side points indicating a side of the recognized object.
In the method, the visual sign may include a rear sign indicating the rear of the recognized object.
In the method, the visual sign may include an exposure sign indicating the degree of exposure of the recognized object.
In the method, the performing of the first correction may include moving the position of the generated visual sign to the inside of the outline of the object and identifying the direction of the recognized object, and the performing of the second correction may include performing the second correction further considering the identified direction of the object.
In the method, the performing of the first correction may include identifying the direction of the recognized object and then moving the position of the generated visual sign to the inside of the outline of the object, and the performing of the second correction may include performing the second correction further considering the identified direction of the object.
In the method, the performing of the first correction may include moving a visual sign located outside the outline, among the generated visual signs, to the inside of the outline.
In the method, the performing of the first correction may include moving a visual sign located outside the outline, among the generated visual signs, to a boundary of the outline.
In the method, the performing of the first correction may include moving a visual sign located outside the outline to the closest point among a plurality of points constituting the outline.
In the method, the performing of the second correction may include, when the visual sign is a rear sign, changing the position and shape of the visual sign to a box corresponding to the area of the rear of the recognized object.
In the above method, the performing of the second correction may include, when the visual sign is a rear sign, changing the shape of the visual sign to a line corresponding to the height of the rear of the recognized object.
According to an aspect of another embodiment, a device for correcting a visual sign includes a memory storing at least one program, and a processor configured to execute the at least one program to input an image obtained while driving into a learning model and generate a visual sign for the position and direction of an object detected from the image, perform first correction to move the position of the generated visual sign to the inside of an outline of the object, and perform second correction to change at least one of the position and shape of the visual sign moved to the inside of the outline, based on the characteristics of the visual sign.
According to an aspect of another embodiment, provided is a computer-readable recording medium having recorded thereon a program for executing at least one of the methods described above.
The above and other aspects, features, and advantages of certain embodiments of the disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. In this regard, the present embodiments may have different forms and should not be construed as being limited to the descriptions set forth herein. Accordingly, the embodiments are merely described below, by referring to the figures, to explain aspects. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. Expressions such as “at least one of,” when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list.
Advantages and features of the present disclosure and a method for achieving them will be apparent with reference to embodiments of the present disclosure described below together with the attached drawings. The present disclosure may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein, and all changes, equivalents, and substitutes that do not depart from the spirit and technical scope of the present disclosure are encompassed in the present disclosure. These embodiments are provided such that the present disclosure will be thorough and complete, and will fully convey the concept of the present disclosure to those of skill in the art. In describing the present disclosure, detailed explanations of the related art are omitted when it is deemed that they may unnecessarily obscure the gist of the present disclosure.
The terms used in the present application are merely used to describe example embodiments, and are not intended to limit the present disclosure. Singular forms are intended to include plural forms as well, unless the context clearly indicates otherwise. As used herein, terms such as “comprises,” “includes,” or “has” specify the presence of stated features, numbers, stages, operations, components, parts, or a combination thereof, but do not preclude the presence or addition of one or more other features, numbers, stages, operations, components, parts, or a combination thereof.
Some embodiments of the present disclosure may be represented by functional block components and various processing operations. Some or all of the functional blocks may be implemented by any number of hardware and/or software elements that perform particular functions. For example, the functional blocks of the present disclosure may be embodied by at least one microprocessor or by circuit components for a certain function. In addition, for example, the functional blocks of the present disclosure may be implemented by using various programming or scripting languages. The functional blocks may be implemented by using various algorithms executable by one or more processors. Furthermore, the present disclosure may employ known technologies for electronic settings, signal processing, and/or data processing. Terms such as “mechanism”, “element”, “unit”, or “component” are used in a broad sense and are not limited to mechanical or physical components.
In addition, connection lines or connection members between components illustrated in the drawings are merely exemplary of functional connections and/or physical or circuit connections. Various alternative or additional functional connections, physical connections, or circuit connections between components may be present in a practical device.
Hereinafter, the term ‘vehicle’ may refer to all types of transportation instruments with engines that are used to move passengers or goods, such as cars, buses, motorcycles, kick scooters, or trucks.
Hereinafter, the present disclosure will be described in detail with reference to the accompanying drawings.
Referring to
At least one of the sensors configured to collect the situational information around the autonomous vehicle may have a certain field of view (FoV) as illustrated in
The autonomous driving device may control the movement of the autonomous vehicle 10 by processing information collected by the sensors of the autonomous vehicle 10 in real time, while storing, in a memory device, at least part of the information collected by the sensors.
Referring to
Data collected by the sensors 42 to 45 may be delivered to the processor 46. The processor 46 may store, in the memory system 47, the data collected by the sensors 42 to 45, and control the body control module 48 based on the data collected by the sensors 42 to 45 to determine the movement of the vehicle. The memory system 47 may include two or more memory devices and a system controller configured to control the memory devices. Each of the memory devices may be provided as a single semiconductor chip.
In addition to the system controller of the memory system 47, each of the memory devices included in the memory system 47 may include a memory controller, which may include an artificial intelligence (AI) computation circuit such as a neural network. The memory controller may generate computational data by applying certain weights to data received from the sensors 42 to 45 or the processor 46, and store the computational data in a memory chip.
In the image data 50 according to the embodiment illustrated in
On the other hand, the distance to the preceding vehicle 52 and a movement of the traveling vehicle 53 to change lanes or the like may be significantly important factors in terms of safe driving of the autonomous vehicle. Accordingly, data regarding a region including the preceding vehicle 52 and the traveling vehicle 53 in the image data 50 may have a relatively high importance in terms of the driving of the autonomous vehicle.
A memory device of the autonomous driving device may apply different weights to different regions of the image data 50 received from a sensor, and then store the image data 50. For example, a high weight may be applied to the data regarding the region including the preceding vehicle 52 and the traveling vehicle 53, and a low weight may be applied to the data regarding the region including the front area 51 of the autonomous vehicle and the background 54.
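The region-dependent weighting described above may be illustrated by the following minimal Python sketch; the region boundaries, weight values, frame size, and the multiplicative form of the weighting are assumptions made for illustration and are not taken from the disclosure.

import numpy as np

def store_weighted_regions(image, regions):
    # Apply a per-region weight to the image data before storage.
    # `regions` is a list of (row_slice, col_slice, weight) tuples;
    # the slices and weights used below are illustrative assumptions.
    weighted = image.astype(np.float32).copy()
    for row_slice, col_slice, weight in regions:
        weighted[row_slice, col_slice] *= weight
    return weighted

# Hypothetical example: emphasize the band containing the preceding and
# adjacent vehicles, de-emphasize the ego vehicle's front area and the background.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
regions = [
    (slice(300, 500), slice(0, 1280), 1.0),  # preceding/adjacent vehicles
    (slice(500, 720), slice(0, 1280), 0.2),  # front area of the ego vehicle
    (slice(0, 300), slice(0, 1280), 0.2),    # background
]
stored = store_weighted_regions(frame, regions)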
Hereinafter, operations according to various embodiments may be understood as being performed by the autonomous driving device or the processor included in the autonomous driving device.
A device for annotating a detected object according to various embodiments of the present disclosure may be substantially the same as the autonomous driving device, may be included in the autonomous driving device, or may be a component implemented as part of a function performed by the autonomous driving device.
The device for annotating a detected object of the present disclosure may classify the class of a detected object. The device for annotating a detected object of the present disclosure may generate a bounding box for the object based on the classified class of the object. The device for annotating a detected object of the present disclosure may generate an annotation for the object in units of generated bounding boxes. Hereinafter, a method, performed by the device for annotating a detected object, of annotating a detected object will be described in detail.
The device for annotating a detected object of the present disclosure may classify the class of a detected object.
In an embodiment, in order to classify the class of the object, the device for annotating a detected object may receive an input of data. For example, the input data may be image data collected by a camera, but is not limited thereto and may include various types of data such as images, text, or voice.
In an embodiment, the device for annotating a detected object may include a classifier, and the classifier may calculate a probability that the input data is classified as a particular class. Through the probability calculation of the classifier, the device for annotating a detected object may classify the class of an object included in the input data. In addition, the classifier may be trained based on training data such that an error between a result output from the classifier and the ground truth is reduced.
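By way of illustration only, a classifier of the kind described above might be sketched as follows in Python using PyTorch; the number of classes, the network architecture, the input size, and the use of cross-entropy loss are assumptions chosen for the sketch, not details taken from the disclosure.

import torch
import torch.nn as nn

NUM_CLASSES = 7  # e.g., passenger car, van, truck, two-wheeled vehicle,
                 # pedestrian, emergency vehicle, etc.

# Hypothetical classifier; the actual model structure is not specified.
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 128),
    nn.ReLU(),
    nn.Linear(128, NUM_CLASSES),
)
criterion = nn.CrossEntropyLoss()  # error between output and ground truth
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

def training_step(images, labels):
    # One training step that reduces the error between the classifier
    # output and the ground-truth labels.
    logits = classifier(images)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def classify(images):
    # Probability that the input data is classified as each class.
    with torch.no_grad():
        probs = torch.softmax(classifier(images), dim=1)
    return probs.argmax(dim=1), probs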
In the present disclosure, the device for annotating a detected object may classify a detected object as any one of a plurality of classes.
In an embodiment, the plurality of classes may be described with reference to Table 1 below.
Among the plurality of classes in Table 1, ‘Passenger car’ may correspond to vehicles designed to transport a few passengers. For example, sedans, sport utility vehicles (SUVs), taxis, and the like may be classified as passenger vehicles.
Among the plurality of classes in Table 1, ‘Van’ may correspond to vehicles designed to transport a number of passengers. For example, 16-seater cars, buses, and the like may be classified as vans.
Among the plurality of classes in Table 1, ‘Truck’ may correspond to vehicles designed to transport goods. In a case where a vehicle designed to transport goods is loaded with other vehicles including heavy equipment vehicles, the vehicle designed to transport goods and the other vehicles may be regarded as one vehicle and classified as a truck. That is, the device for annotating a detected object does not classify the classes of the other vehicles loaded on the vehicle designed to transport goods.
Among the plurality of classes in Table 1, ‘Two-wheeled vehicle’ may correspond to vehicles that run on two wheels and passengers thereof. For example, motorcycles, scooters, mopeds, bicycles, segways, and the like may be classified as two-wheeled vehicles. Strollers and hand trucks may also be classified as two-wheeled vehicles.
Among the plurality of classes in Table 1, ‘Pedestrian’ may correspond to people traveling on foot.
Among the plurality of classes in Table 1, ‘Emergency vehicle’ refers to vehicles designed for special purposes, and may correspond to vehicles used in emergency situations. For example, vehicles equipped with sirens, such as police cars, fire trucks, or tow trucks, may be classified as emergency vehicles. For example, in a case where a vehicle that may be classified as another class, such as an unmarked police car, is equipped with a siren, the vehicle may be classified as an emergency vehicle only when the siren is operating.
Among the plurality of classes in Table 1, ‘etc.’ may correspond to vehicles that are not classified as the above classes. For example, three-wheel vehicles, forklifts, excavators, rickshaws, snowplows, bulldozers, road rollers, and the like may be classified as ‘etc.’.
However, Table 1 above is provided as an example, and any suitable classification method may be applied.
In the present disclosure, the device for annotating a detected object may apply different annotation methods depending on the class of the object.
The device for annotating a detected object of the present disclosure may generate a bounding box for a detected object based on the classified class of the object.
In an embodiment, the bounding box may include two quadrangles. In detail, the device for annotating a detected object may generate a bounding box including a first quadrangle and a second quadrangle, which share one side, for the detected object based on the class of the object.
A bounding box 400 according to an embodiment of the present disclosure may include a first quadrangle 410 and a second quadrangle 420. By using the bounding box 400 of the form according to the present embodiment, the front, rear, and sides of an object indicated by two-dimensional image data may be specified separately from each other.
In an embodiment, the first quadrangle 410 may have a rectangular shape. That is, each of the four sides of the first quadrangle 410 may be perpendicular to its adjacent sides. In an embodiment, two sides of the first quadrangle 410 may be horizontal to the ground surface, and the other two sides of the first quadrangle 410 may be perpendicular to the ground surface.
In an embodiment, the second quadrangle 420 may have a trapezoidal shape. That is, two of the four sides of the second quadrangle 420 may be parallel to each other.
In an embodiment, the first quadrangle 410 and the second quadrangle 420 may share one side. In an embodiment, the side shared by the first quadrangle 410 and the second quadrangle 420 may be perpendicular to the ground surface.
In an embodiment, the first quadrangle 410 may correspond to the front or rear of a vehicle. In an embodiment, the second quadrangle 420 may correspond to the right or left side of the vehicle. Meanwhile, in a case where the top of the vehicle is exposed, the first quadrangle 410 or the second quadrangle 420 may be generated to include the top of the vehicle.
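As a minimal illustration, the two-quadrangle bounding box described above might be represented by a data structure such as the following; the corner naming, the choice of six stored points, and the assumption that the shared side is the rectangle's right edge are illustrative only.

from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]

@dataclass
class TwoQuadBoundingBox:
    # First quadrangle (rectangle) corresponding to the front or rear.
    r_top_left: Point
    r_top_right: Point
    r_bottom_right: Point
    r_bottom_left: Point
    # Remaining corners of the second quadrangle (trapezoid) corresponding
    # to a side; its other two corners coincide with the shared side.
    t_top_far: Point
    t_bottom_far: Point

    def shared_side(self) -> Tuple[Point, Point]:
        # Assumption: the trapezoid extends to the right of the rectangle,
        # so the shared, ground-perpendicular side is the rectangle's right edge.
        return self.r_top_right, self.r_bottom_right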
In the present disclosure, the bounding box including the first quadrangle 410 and the second quadrangle 420 may be appropriately generated according to various situations.
In an embodiment, the first quadrangle 410 may be generated to exclude side mirrors of the vehicle.
In an embodiment, in a case where the object is a wheeled vehicle, the second quadrangle 420 may be generated to include a line segment connecting wheel-ground points to each other.
In an embodiment, the first quadrangle 410 may be generated to be wider than the front or rear of the vehicle, i.e., to include portions of the sides of the vehicle, but not beyond the wheel-ground points.
In an embodiment, when a door of the vehicle is open, the bounding box 400 may be generated not to include the door.
In an embodiment, in a case where the size or length of equipment on the vehicle changes, for example, the vehicle is a ladder truck or a crane, or in a case where part of the vehicle moves separately, the bounding box 400 may be generated to include only the part corresponding to the main body of the vehicle.
In an embodiment, when the vehicle tows another object with wheels, such as a cargo trailer, the bounding box 400 may be generated for each of the vehicle and the towed object. For example, when a tow vehicle tows another vehicle, the tow vehicle may be classified as the emergency vehicle class, the towed vehicle may be classified as the passenger car class, and the bounding box 400 may be generated for each of the tow vehicle and the other vehicle.
In
In the present disclosure, the device for annotating a detected object may generate a bounding box by applying different criteria depending on the classified class of an object.
In an embodiment, in a case where the class of the object is ‘Truck’, ‘Two-wheeled vehicle’, or ‘etc.’, the height of the first quadrangle (i.e., the length of the side perpendicular to the ground surface) and the width of the second quadrangle (i.e., the length of the side horizontal to the ground surface) may be generated to include a cargo (e.g., luggage or freight) loaded on the object. In the present embodiment, the width of the first quadrangle (i.e., the length of the side horizontal to the ground surface) may be generated regardless of the loaded cargo.
Referring to
In an embodiment, in a case where the class of the object is ‘Two-wheeled vehicle’, the width of the first quadrangle may be generated to correspond to a particular dimension of the body of a person included in the object. This is because, in a case where the class of the object is ‘Two-wheeled vehicle’, it is difficult to specify the width of the front or rear, or the width of the front or rear is significantly narrow. For example, the particular dimension of the body of the person may be the width of the shoulder of the person. In detail, in a case where the class of the object is ‘Two-wheeled vehicle’, the width of the first quadrangle may be equal to the width of the shoulder of the person, and the height of the first quadrangle may be equal to the height of the object including the vehicle and the person on the vehicle. Here, the second quadrangle may be generated on the right or left of the first quadrangle to include a side of the object.
Referring to
In an embodiment, in a case where the class of the object is ‘Pedestrian’, the width of the first quadrangle may be generated to correspond to a particular dimension of the body of a person included in the object. This is because, in a case where the class of the object is ‘Pedestrian’, it is difficult to specify the width of the front or rear, or the width of the front or rear is significantly narrow, as in the case of ‘Two-wheeled vehicle’. For example, the particular dimension of the body of the person may be the width of the shoulder of the person. In detail, in a case where the class of the object is ‘Pedestrian’, the width of the first quadrangle may be equal to the width of the shoulder of the person, and the height of the first quadrangle may be equal to the height of the person.
Meanwhile, in an embodiment, in a case where the class of the object is ‘Two-wheeled vehicle’ but there is no person on the object, the width of the first quadrangle 410 may be generated based on a stem or a fork.
In an embodiment, in a case where the class of the object is ‘Pedestrian’, a walking stick, a broom, a handbag, an umbrella, or the like carried by the person may not be included in the bounding box.
In the present disclosure, in a case where only part of an object is included in the image data, the device for annotating a detected object may determine whether to generate a bounding box, according to a condition.
In an embodiment, whether to generate a bounding box may be determined based on the proportion of the object visible in the image data. For example, in a case where the proportion of the vehicle visible in the image data is less than a threshold proportion (e.g., 30%) of the total size of the object, the device for annotating a detected object may determine not to generate a bounding box.
In an embodiment, whether to generate a bounding box may be determined based on the proportion of part of the object visible in the image data. For example, in a case where the proportion of a wheel of the vehicle visible in the image data is less than 50% of the size of the wheel of the vehicle, the device for annotating a detected object may determine not to generate a bounding box.
In an embodiment, whether to generate a bounding box may be determined based on the area occupied by a particular object in the image data. For example, in a case where an object classified as ‘Van’, ‘Truck’, or ‘etc.’ occupies more than 150 px in the image data, the device for annotating a detected object may determine to generate a bounding box.
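The conditions discussed above might be combined in a function such as the following minimal sketch; the function name and parameters are hypothetical, and the thresholds simply restate the example values (30% of the whole object, 50% of a wheel, 150 px).

def should_generate_bounding_box(object_class, is_vehicle, visible_proportion,
                                 wheel_visible_proportion, pixel_extent):
    # Skip vehicles whose visible proportion is below the threshold proportion.
    if is_vehicle and visible_proportion < 0.30:
        return False
    # Skip vehicles whose visible wheel proportion is below 50%.
    if is_vehicle and wheel_visible_proportion is not None and wheel_visible_proportion < 0.50:
        return False
    # 'Van', 'Truck', and 'etc.' objects must occupy more than 150 px.
    if object_class in ("Van", "Truck", "etc.") and pixel_extent <= 150:
        return False
    return True

# Hypothetical usage
generate = should_generate_bounding_box("Truck", True, 0.6, 0.8, 420)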
In the present disclosure, by determining whether to generate a bounding box, applying a method of generating a bounding box, and the like differently according to various conditions, annotation for a detected object and safe and effective driving control of an ego vehicle may be achieved based on the annotation.
The device for annotating a detected object of the present disclosure may generate an annotation on an object in units of bounding boxes.
In an embodiment, the device for annotating a detected object may determine an attribute of an object in units of bounding boxes. In the present disclosure, an attribute of an object may refer to the nature of the object in its relationship with an ego vehicle, which is independent of the class of the object.
In an embodiment, attributes of an object that may be determined may be described with reference to Table 2 below.
In an embodiment, attributes of an object may include visibility of the object. In the present disclosure, the visibility may refer to an attribute related to the proportion of part of the object visible in the image data with respect to the entire object.
The visibility may be determined according to the proportion of the object visible in the image data, and any suitable criterion may be applied to the determination. For example, when the proportion of the object visible is at least 0% but less than 20%, the visibility may be determined as ‘Level 1’; when the proportion is at least 20% but less than 40%, as ‘Level 2’; when the proportion is at least 40% but less than 60%, as ‘Level 3’; when the proportion is at least 60% but less than 80%, as ‘Level 4’; when the proportion is at least 80% but less than 100%, as ‘Level 5’; and when the entire object is visible, as ‘Level 6’.
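The level assignment above can be expressed as a small function; the sketch below simply encodes the example thresholds and is not a required implementation.

def visibility_level(visible_proportion):
    # Map the visible proportion of an object (0.0 to 1.0) to a visibility level.
    if visible_proportion >= 1.0:
        return "Level 6"  # the entire object is visible
    for lower_bound, level in ((0.8, "Level 5"), (0.6, "Level 4"),
                               (0.4, "Level 3"), (0.2, "Level 2")):
        if visible_proportion >= lower_bound:
            return level
    return "Level 1"      # at least 0% but less than 20% visible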
In an embodiment, attributes of an object may include a movement state of the object. In the present disclosure, the movement state may refer to an attribute related to a movement of the object.
The movement state may be determined according to whether the object is moving. A criterion according to any suitable manner may be applied to determination of the movement state. For example, when the object is moving, the movement state may be determined as ‘Moving’, when the object is stationary but movable, the movement state may be determined as ‘Stopped’, and when the object is stationary and is not intended to move, the movement state may be determined as ‘Parked’. In an embodiment, when it is detected that an object includes a person, determination of the movement state of the object may be limited to the states other than ‘Parked’.
Depending on the movement state of the object, the ego vehicle may vary route settings or driving control. For example, when the movement state of an object in front is ‘Stopped’, the ego vehicle may be controlled to wait for a certain period of time, but when the movement state of the object in front is ‘Parked’, the ego vehicle may be controlled to move to avoid the object in front.
In an embodiment, attributes of an object may include a lane position of the object. In the present disclosure, the lane position may refer to information or an attribute related to a positional relationship with the ego vehicle.
In an embodiment, the lane position may be determined based on the position of the object relative to the ego vehicle, in units of lanes. For example, when the object is in the same lane as the ego vehicle, the lane position of the object may be ‘0’. When the object is in any one of the lanes to the right of the ego vehicle's lane, the lane position of the object may be a positive integer that is directly proportional to the distance between the lane of the ego vehicle and the lane of the object: when the object is in the first lane to the right (i.e., the closest lane among the lanes on the right), the lane position of the object may be ‘+1’; in the second lane to the right, ‘+2’; and in the third lane to the right, ‘+3’. Conversely, when the object is in the first lane to the left of the ego vehicle's lane (i.e., the closest lane among the lanes on the left), the lane position of the object may be ‘−1’; in the second lane to the left, ‘−2’; and in the third lane to the left, ‘−3’.
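Under the assumption that lane indices increase from left to right, the signed lane position described above reduces to a difference of indices, as in the following sketch; the indexing convention is an assumption for illustration.

def lane_position(object_lane_index, ego_lane_index):
    # Positive values (+1, +2, ...) for lanes to the right of the ego vehicle,
    # negative values (-1, -2, ...) for lanes to the left, 0 for the same lane.
    return object_lane_index - ego_lane_index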
In an embodiment, attributes of an object may include a major state of the object. In the present disclosure, the major state may refer to an attribute regarding whether the object affects driving of the ego vehicle.
For example, when the object is directly in front of the ego vehicle and may directly affect the driving of the ego vehicle, the major state may be ‘Caution’. For example, when the object does not affect the driving of the ego vehicle at all, the major state may be ‘No attention required’. For example, when there is a possibility that the object will affect the driving of the ego vehicle due to, for example, a sudden lane change, the major state may be ‘Attention required’. For example, when the ego vehicle is in a lane adjacent to a sidewalk, the major state of a pedestrian on the sidewalk may be ‘Attention required’.
In addition, median barriers, tubular markers, and the like may be considered in determination of the major state of an object. For example, even when the object is a vehicle driving next to the ego vehicle in an adjacent lane, the major state of the object may be ‘No attention required’ if a median barrier or similar structure separates the object from the ego vehicle.
By determining the major state of an object, it is possible to differentiate the degrees of dependence of driving control on one or more detected objects. For example, an object whose major state is ‘No attention required’ may have no effect on driving control. For example, an object whose major state is ‘Attention required’ may affect driving control, and the ego vehicle may be controlled to respond sensitively to driving of the object (e.g., the speed or a lane change).
In an embodiment, attributes of an object may include the size of the object.
For example, the size of an object may be determined based on its relative size to the ego vehicle. For example, the size of an object may be determined based on size classification according to predetermined vehicle types. For example, the size of an object may be any one of ‘Small’, ‘Medium’, or ‘Large’.
In an embodiment, attributes of an object may include a subclass of the object. In the present disclosure, a subclass of an object is a secondary class included in the class of the object, and may refer to an attribute for subdividing and distinguishing a particular class.
For example, in a case where the class of an object is ‘Two-wheeled vehicle’, subclasses of the object may include Rear_car, Electric_cart, Hand_truck, Three_wheels, and the like. For example, in a case where the class of an object is ‘Pedestrian’, subclasses of the object may include ‘Standing’, ‘Carrying a load’, ‘Accompanying an animal’, ‘Holding an umbrella’, and the like.
In the present disclosure, the device for annotating a detected object may generate an annotation by storing the class and attributes of an object in units of bounding boxes.
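One way to store such an annotation in units of a bounding box is sketched below; the record layout and field names are illustrative assumptions, and the attribute values are example entries.

from dataclasses import dataclass, field

@dataclass
class ObjectAnnotation:
    bounding_box: object                 # e.g., a two-quadrangle bounding box
    object_class: str                    # e.g., 'Passenger car'
    attributes: dict = field(default_factory=dict)

annotation = ObjectAnnotation(
    bounding_box=None,                   # a previously generated bounding box
    object_class="Passenger car",
    attributes={
        "visibility": "Level 5",
        "movement_state": "Moving",
        "lane_position": +1,
        "major_state": "Attention required",
        "size": "Medium",
        "subclass": None,
    },
)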
In the present disclosure, as attributes of an object are determined, a degree of risk or the like of the object based on a relationship with the ego vehicle may be evaluated. In addition, Table 2 above is provided as an example, and any appropriate attributes of an object may be assigned.
An object annotated by the device for annotating a detected object of the present disclosure may be used as a basis for controlling driving of an ego vehicle, and may also be used for training of an autonomous driving device.
Operations illustrated in
In operation 610, the device for annotating a detected object may classify the class of an object.
In an embodiment, the class of the object may be any one of ‘Passenger car’, ‘Van’, ‘Truck’, ‘Two-wheeled vehicle’, ‘Pedestrian’, ‘Emergency vehicle’, or ‘etc.’.
In an embodiment, in a case where the object is a vehicle designed to transport goods and is loaded with another vehicle, the device for annotating a detected object may not classify the class of the loaded vehicle.
In operation 620, the device for annotating a detected object may generate a bounding box including a first quadrangle and a second quadrangle, which share one side, for the object based on the class of the object.
In an embodiment, operation 620 may be performed by applying different criteria depending on the classified class of the object.
In an embodiment, the first quadrangle may be a rectangle.
In an embodiment, the first quadrangle may correspond to the front or rear of the object.
In an embodiment, the second quadrangle may be a trapezoid.
In an embodiment, the second quadrangle may correspond to the left or right side of the object.
In an embodiment, when the top of the object is exposed, the first quadrangle or the second quadrangle may include the top of the object.
In an embodiment, in a case where the object is a wheeled vehicle, the second quadrangle may include a line segment connecting wheel-ground points to each other.
In an embodiment, in a case where the class of the object is ‘Two-wheeled vehicle’ or ‘Pedestrian’, the width of the first quadrangle may be generated to be equal to the width of the shoulder of a person included in the object.
In operation 630, the device for annotating a detected object may generate an annotation on the object in units of bounding boxes.
In an embodiment, the device for annotating a detected object may determine attributes of the object, and operation 630 may include generating an annotation on the class of the object and the attributes of the object in units of bounding boxes.
In an embodiment, the attributes of the object may include visibility, a movement state, a lane position, a major state, a size, and a subclass of the object.
In an embodiment, in a case where the detected object is a vehicle and the proportion of the object visible in the image data is less than a threshold proportion of the size of the entire object, the device for annotating a detected object may determine not to generate a bounding box.
In an embodiment, the device for annotating a detected object may control the ego vehicle based on the generated annotation.
Referring to
The communication unit 710 may include one or more components for performing wired/wireless communication with an external server or an external device. For example, the communication unit 710 may include at least one of a short-range communication unit (not shown), a mobile communication unit (not shown), and a broadcast receiver (not shown).
The DB 730 is hardware for storing various pieces of data processed by the device 700 for annotating a detected object, and may store a program for the processor 720 to perform processing and control. The DB 730 may store payment information, user information, and the like.
The DB 730 may include random-access memory (RAM) such as dynamic RAM (DRAM) or static RAM (SRAM), read-only memory (ROM), electrically erasable programmable ROM (EEPROM), a compact disc-ROM (CD-ROM), a Blu-ray or other optical disk storage, a hard disk drive (HDD), a solid-state drive (SSD), or flash memory.
The processor 720 controls the overall operation of the device 700 for annotating a detected object. For example, the processor 720 may execute programs stored in the DB 730 to control the overall operation of an input unit (not shown), a display (not shown), the communication unit 710, the DB 730, and the like. The processor 720 may execute programs stored in the DB 730 to control the operation of the device 700 for annotating a detected object.
The processor 720 may control at least some of the operations of the device 700 for annotating a detected object described above with reference to
The processor 720 may be implemented by using at least one of application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, and other electrical units for performing functions.
In an embodiment, the device 700 for annotating a detected object may be a mobile electronic device. For example, the device 700 for annotating a detected object may be implemented as a smart phone, a tablet personal computer (PC), a PC, a smart television (TV), a personal digital assistant (PDA), a laptop computer, a media player, a navigation system, a camera-equipped device, and other mobile electronic devices. In addition, the device 700 for annotating a detected object may be implemented as a wearable device having a communication function and a data processing function, such as a watch, glasses, a hair band, a ring, or the like.
In another embodiment, the device 700 for annotating a detected object may be an electronic device embedded in a vehicle. For example, the device 700 for annotating a detected object may be an electronic device that is manufactured and then inserted into a vehicle through tuning.
In another embodiment, the device 700 for annotating a detected object may be a server located outside a vehicle. The server may be implemented as a computer device or a plurality of computer devices that provide a command, code, a file, content, a service, and the like by performing communication through a network. The server may receive data necessary for determining a movement path of a vehicle from devices mounted on the vehicle, and determine the movement path of the vehicle based on the received data.
In another embodiment, a process performed by the device 700 for annotating a detected object may be performed by at least some of a mobile electronic device, an electronic device embedded in a vehicle, and a server located outside a vehicle.
The camera may be mounted on the vehicle to photograph the outside of the vehicle. The camera may photograph front, side, and rear areas around the vehicle. An object outline generating device according to the present disclosure may obtain a plurality of images captured by the camera. The plurality of images captured by the camera may include a plurality of objects.
Information about an object includes object type information and object attribute information. Here, the object type information is index information indicating the type of object, and includes a group indicating a supercategory, and a class indicating a subcategory. In addition, the object attribute information indicates attribute information about the current state of an object, and includes movement information, rotation information, traffic information, color information, visibility information, and the like.
In an embodiment, groups and classes included in object type information may be as shown in Table 3 below, but are not limited thereto.
The movement information represents a movement of an object, and may be defined as ‘Stopped’, ‘Parked’, ‘Moving’, or the like. Object attribute information of a vehicle may be determined as ‘Stopped’, ‘Parked’, or ‘Moving’, object attribute information of a pedestrian may be determined as ‘Moving’, ‘Stopped’, or ‘Unknown’, and object attribute information of an immovable object, such as a traffic light, may be determined as ‘Stopped’, which is a default.
Rotation information represents the rotation of an object, and may be defined as ‘Forward’, ‘Backward’, ‘Horizontal’, ‘Vertical’, ‘Lateral’, or the like. Object attribute information of a vehicle may be determined as ‘Front’, ‘Rear’, or ‘Side’, and object attribute information of a horizontal or vertical traffic light may be determined as ‘Horizontal’ or ‘Vertical’.
Traffic information is traffic-related information of an object, and may be defined as ‘Instruction’, ‘Caution’, ‘Regulation’, ‘Auxiliary sign’, or the like of a traffic sign. Color information is information about the color of an object, and may represent the color of an object, a traffic light, or a traffic sign.
Referring to
Using all images to determine which object is the same in the images causes significant increases in the amount of data transmission and the amount of computation. Accordingly, it is difficult to perform processing through edge computing on an apparatus mounted on a vehicle, and it is also difficult to perform real-time analysis.
Referring to
The object outline generating device according to the present disclosure may obtain a plurality of frames by dividing a video obtained from a camera into frames. The plurality of frames may include a previous frame 910 and a current frame 920. The object outline generating device may primarily analyze a frame to recognize an object included in the frame, and secondarily generate a bounding box corresponding to the recognized object, as described above with reference to
The object outline generating device may recognize a first pedestrian object 911 in the previous frame 910.
In an embodiment, the object outline generating device may divide a frame into grid cells of the same size, predict a designated number of bounding boxes having predefined shapes around the center of each grid cell, and calculate a confidence of the object based on a result of the prediction. The object outline generating device may determine whether the frame contains an object or only a background, select a position having a high object confidence, and determine an object category, thereby recognizing the object. However, the method of recognizing an object in the present disclosure is not limited thereto.
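A minimal sketch of this grid-based recognition is given below, under the assumption that the prediction for each grid cell contains, for each predefined box, the box parameters, an object confidence, and class probabilities; the tensor layout and threshold are assumptions, and the disclosure is not limited to this scheme.

import numpy as np

def recognize_objects(prediction_grid, conf_threshold=0.5):
    # `prediction_grid` is assumed to have shape (S, S, B, 5 + C):
    # for each of S x S grid cells and B predefined boxes,
    # (x, y, w, h, confidence) followed by C class probabilities.
    detections = []
    S = prediction_grid.shape[0]
    for i in range(S):
        for j in range(S):
            for box in prediction_grid[i, j]:
                x, y, w, h, conf = box[:5]
                class_probs = box[5:]
                if conf >= conf_threshold:  # object rather than background
                    detections.append({
                        "cell": (i, j),
                        "box": (float(x), float(y), float(w), float(h)),
                        "confidence": float(conf),
                        "category": int(np.argmax(class_probs)),
                    })
    return detections

# Hypothetical usage with a 7 x 7 grid, 2 boxes per cell, and 7 classes.
detections = recognize_objects(np.zeros((7, 7, 2, 5 + 7)))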
The object outline generating device may obtain first position information of the first pedestrian object 911 recognized in the previous frame 910. As described above with reference to
In addition, the object outline generating device may obtain second position information of a second pedestrian object 921 recognized in the current frame 920.
The object outline generating device may calculate a similarity between the first position information of the first pedestrian object 911 recognized in the previous frame 910, and the second position information of the second pedestrian object 921 recognized in the current frame 920.
Referring to
However, the method of determining identity between objects is not limited to the above method.
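One common way to compute such a similarity between the position information of objects in consecutive frames is intersection over union (IoU), sketched below; the disclosure does not limit the similarity measure to IoU, and the box format and threshold are assumptions.

def iou(box_a, box_b):
    # Intersection over union of two axis-aligned boxes given as
    # (x_min, y_min, x_max, y_max).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical positions of a pedestrian in the previous and current frames;
# the objects may be treated as identical when the similarity exceeds a threshold.
prev_box = (100, 200, 180, 320)
curr_box = (110, 205, 190, 325)
same_object = iou(prev_box, curr_box) > 0.5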
Here, the learning model may be a deep learning-based model, but is not limited thereto. The learning model is trained to receive an image as input data, recognize objects based on particular points of the objects included in the image, and generate outlines each surrounding an entire object. When a target frame of an image captured by the autonomous vehicle is input as test data to the fully trained learning model, outlines may be generated around the objects included in the target frame.
For example, an outline generated around an object may be a polygon. As another example, an outline generated around an object may have a rectangular shape as illustrated in
Information for generating an outline may be expressed as coordinates or coordinate values. A rectangular outline as illustrated in
According to an embodiment, an outline of an object recognized in a target frame may be determined by a user input. In detail, in a case where an image captured by an autonomous vehicle includes a target frame in which an object is recognized, the object outline generating device may selectively extract only the target frame and output it to a user terminal, and when the user recognizes the object and provides an input for generating an outline, the outline may be confirmed based on the user's input.
Hereinafter, the term ‘first outline’ is considered to refer to an outline generated around an object to extract the object immediately after the object is recognized in a target frame, as illustrated in
Based on the first outline 1010 of the motorcycle and the first outline 1030 of the passenger car of
The extracted first object 1110 and the extracted second object 1130 have a lower resolution than the total resolution of the target frame, and because the characteristics of each object are recognized through an automated process, dummy data that reduces accuracy may be minimized. The extracted first object 1110 and the extracted second object 1130 may be input to a learning model configured to output second outlines of the first object and the second object as result data. The term ‘second outline’ is a different concept from the term ‘first outline’ described above, and will be described below.
In the process of extracting the motorcycle and the passenger car from the target frame, the object outline generating device may identify and store the size (e.g., the width and height), absolute position information, and relative position information in the target frame, regarding the motorcycle and the passenger car, based on the coordinate values of the first outlines of the motorcycle and passenger car. Examples of utilization of coordinate values of first outlines, the size of an object, the position information of the object, and the relative position information in a target frame will be described below.
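The extraction and bookkeeping described above might look like the following sketch, assuming the first outline is available as (x_min, y_min, x_max, y_max) pixel coordinates in the target frame; the metadata keys are illustrative.

import numpy as np

def extract_object(frame, first_outline):
    # Crop an object from the target frame using its first outline and
    # record its size and position for later merging.
    x_min, y_min, x_max, y_max = first_outline
    crop = frame[y_min:y_max, x_min:x_max].copy()
    meta = {
        "size": (x_max - x_min, y_max - y_min),          # width, height
        "absolute_position": (x_min, y_min),             # top-left corner in the frame
        "relative_position": (x_min / frame.shape[1],    # normalized by frame size
                              y_min / frame.shape[0]),
    }
    return crop, meta

# Hypothetical usage with a 720 x 1280 frame and a first outline of a motorcycle.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
crop, meta = extract_object(frame, (400, 300, 520, 480))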
In more detail,
The object outline generating device may input the extracted first object 1110 and the extracted second object 1130 into a learning model, and control the learning model to generate second outlines for the motorcycle and the passenger car as illustrated in
Here, the learning model is a different model from the above-described learning model configured to generate a first outline, and for example, may be a deep learning model configured to receive an image of an extracted object as an input, and output a second outline for the object as output data. According to an embodiment, the learning model configured to generate a second outline may be a model using a machine learning technique other than deep learning.
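By way of illustration, such a model could be a small convolutional regression network that maps a cropped object image to the coordinate values of its second outline, as in the sketch below; the architecture, crop size, and the choice of 7 coordinate pairs are assumptions, not the disclosed model.

import torch
import torch.nn as nn

NUM_POINTS = 7  # assumption: 7 (x, y) coordinate values; 8 for a cuboid form

# Hypothetical regression network; the actual learning model is not disclosed.
second_outline_model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, NUM_POINTS * 2),  # (x, y) for each coordinate value
)

crop = torch.zeros(1, 3, 64, 64)  # an extracted object image
first_coordinate_values = second_outline_model(crop).view(-1, NUM_POINTS, 2)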
Here, the term ‘second outline’ refers to a type of metadata that compressively expresses three-dimensional information of an object, considering that the object expressed in two dimensions is actually expressed as three-dimensional information. The autonomous vehicle may perform calculation of the distance to and perspective of an object that is gradually approaching or moving away from the autonomous vehicle, by recognizing the object surrounded by a second outline. In the present disclosure, a second outline may function as reference data for an autonomous vehicle to perform calculation of the distances to and perspectives of various objects as described above. The second outline also surrounds the object like the first outline, but includes more coordinates (or coordinate values) due to the characteristic of compressively expressing three-dimensional information.
Hereinafter, various embodiments of coordinate characteristics of the second outline and morphological characteristics of the second outline will be described.
For example, coordinate values of the second outline may include 7 coordinate values. In
As another example, the second outline may include 8 coordinate values. Although not illustrated in
The shape of the second outline may be a shape including two polygons. In particular, in the present embodiment, any one of the two polygons may represent the front or rear of the object, and the other polygon may represent a side of the object.
Table 4 shows an example of interpretations of object movements in a case where the shape of a second outline is a combination of two polygons joined horizontally along one common side. The learning model has been trained, based on training data, to receive an image of an object surrounded by a first outline as an input, and to indicate the front and rear of the object as rectangles, and the left and right sides of the object as trapezoids whose parallel sides have different lengths.
Referring to
As an embodiment of the present disclosure, it has already been described above that the shape of the second outline may be a cuboid, and in this case, a total of 8 coordinate values are required.
Although not listed in Table 4, in a case where only the front or rear of the object is included in an image captured by a camera mounted on the autonomous vehicle, or only a side of the object is included and neither the front nor the rear of the object is included in the image, the second outline may have a quadrangular shape. That is, a second outline of an object of which only the front or rear is included in the image may have a rectangular shape, and a second outline of an object of which only a side is included, with neither the front nor the rear included in the image, may have the shape of a simple quadrangle that does not include any right angle.
In addition, the second outline may be a shape of a combination of two polygons based on at least one common side, which has been schematically described with reference to the second outline 1210 of the motorcycle and the second outline 1230 of the passenger car of
The specifications of the second outline may be included in the specifications of the first outline. In detail, because the learning model is trained to generate a second outline based on an image whose resolution is limited to a first outline, the specifications of the second outline may be automatically adjusted to be less than or equal to the specifications of the first outline.
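As a non-limiting illustration, the following sketch shows one way the second outline could be constrained to fall within the first outline, assuming both are expressed in pixel coordinates and the first outline is an axis-aligned box; the function name and representations are assumptions.

```python
def clamp_to_first_outline(points, first_outline):
    """Force every vertex of a candidate second outline to lie within the first outline.

    `points` is a list of (x, y) vertices predicted by the learning model, and
    `first_outline` is (x_min, y_min, x_max, y_max); both representations are assumptions.
    """
    x_min, y_min, x_max, y_max = first_outline
    return [(min(max(x, x_min), x_max), min(max(y, y_min), y_max)) for x, y in points]
```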
As illustrated in
In the present disclosure, coordinate values for generating a second outline of an extracted object may be referred to simply as first coordinate values, coordinate values of the center of the object in an original image, used to refer to the extracted object, may be referred to simply as second coordinate values, and coordinate values for generating a second outline of the object in the original image may be referred to simply as third coordinate values. After the first coordinate values for forming the second outline are determined, the object outline generating device may identify the position of the object in the original image through the second coordinate values and perform processing such that a second outline is generated around the second coordinate values in the original image, thereby merging the object to which the second outline is applied, with the original image.
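As a non-limiting illustration of the merging step, the following sketch translates second-outline vertices expressed in the cropped-object image (first coordinate values) into the original image using the center position of the object (second coordinate values), producing third coordinate values; the center-based offset and the parameter names are assumptions made for illustration.

```python
def merge_second_outline(first_coords, crop_size, second_coords):
    """Translate second-outline vertices from the cropped-object image into the original image.

    first_coords  : (x, y) vertices of the second outline inside the cropped image
    crop_size     : (width, height) of the cropped object image
    second_coords : (cx, cy) center of the object in the original image
    """
    crop_w, crop_h = crop_size
    cx, cy = second_coords
    offset_x, offset_y = cx - crop_w / 2, cy - crop_h / 2
    # third coordinate values: second-outline vertices expressed in the original image
    return [(x + offset_x, y + offset_y) for x, y in first_coords]
```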
According to the present disclosure, accurate annotation may be performed on each object through a process of primarily clearly setting a first outline for the object, then extracting the object to perform training to generate a second outline individually, and finally merging the second outline. That is, compared to a related-art method of three-dimensionally annotating an entire target frame at once, according to the present disclosure, the accuracy of annotation may be significantly increased.
In addition,
The method of
First, the object outline generating device may recognize an object in a first image and perform processing through a learning model such that a first outline is generated for each object (S1410). The first outline may have a polygonal shape.
Next, the object outline generating device may extract the object from the first image based on the first outline (S1430). In more detail, the extracted object may be an image limited to the size area of the first outline, with the first outline as a boundary line.
The object outline generating device may obtain first coordinate values of a second outline for the object (S1450). According to an embodiment, the first coordinate values may include 7 coordinate values or 8 coordinate values.
The object outline generating device may merge the object to which the second outline is applied, with the first image based on the first coordinate values (S1470). As an alternative embodiment of operation S1470, the object outline generating device may calculate second coordinate values for a position of the extracted object in the original image (or the target frame) and calculate third coordinate values, which are values for coordinates of the second outline, based on the second coordinate values, and may implement the effect of merging the object to which the second outline is applied, with the original image considering both the first coordinate values and the third coordinate values.
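As a non-limiting illustration, operations S1410 to S1470 could be combined as in the following sketch, which reuses the hypothetical extract_object and merge_second_outline helpers sketched earlier and assumes that outline_model and second_outline_model are callables returning first outlines and second-outline vertices, respectively.

```python
def annotate_frame(first_image, outline_model, second_outline_model):
    """Sketch of operations S1410 to S1470 under assumed model interfaces."""
    results = []
    for first_outline in outline_model(first_image):                    # S1410: first outlines
        obj = extract_object(first_image, first_outline)                 # S1430: extract the object
        first_coords = second_outline_model(obj["image"])                # S1450: first coordinate values
        cx = first_outline[0] + obj["size"][0] / 2                       # second coordinate values
        cy = first_outline[1] + obj["size"][1] / 2
        third_coords = merge_second_outline(first_coords, obj["size"], (cx, cy))  # S1470: merge
        results.append({"first_outline": first_outline, "second_outline": third_coords})
    return results
```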
Referring to
The communication unit 1510 may include one or more components for performing wired/wireless communication with an external server or an external device. For example, the communication unit 1510 may include at least one of a short-range communication unit (not shown), a mobile communication unit (not shown), and a broadcast receiver (not shown).
The DB 1530 is hardware for storing various pieces of data processed by the object outline generating device 1500, and may store a program for the processor 1520 to perform processing and control.
The DB 1530 may include RAM such as DRAM or SRAM, ROM, EEPROM, a CD-ROM, a Blu-ray or other optical disk storage, an HDD, an SSD, or flash memory.
The processor 1520 controls the overall operation of the object outline generating device 1500. For example, the processor 1520 may execute programs stored in the DB 1530 to control the overall operation of an input unit (not shown), a display (not shown), the communication unit 1510, the DB 1530, and the like. The processor 1520 may execute programs stored in the DB 1530 to control the operation of the object outline generating device 1500.
The processor 1520 may control at least some of the operations of the object outline generating device 1500 described above with reference to
For example, as described above with reference to
The processor 1520 may be implemented by using at least one of ASICs, DSPs, DSPDs, PLDs, FPGAs, controllers, microcontrollers, microprocessors, and other electrical units for performing functions.
Hereinafter, the method of obtaining a cuboid of an object in an image according to the present disclosure will be referred to as simply a cuboid obtaining method, and a device for obtaining a cuboid of an object in an image according to the present disclosure will be referred to as simply a cuboid obtaining device.
First,
An object recognized in the image of
In general, when an image is input into the learning model, cuboid coordinates of objects included in the image may be obtained as illustrated in
Meanwhile, according to the cuboid obtaining device according to the present disclosure, a cuboid as illustrated in
Hereinafter, the definitions of side points, exposure sign, and rear sign included in a cuboid in the present disclosure will be described in detail.
Side points may represent a boundary between the front and a side of an object recognized in an image. Here, the term ‘side’ refers to a side of an object that may be identified through an image, and thus may refer to any one of the left side or the right side of the object. Side points include upper and lower points and a connection line connecting the two points, and the upper and lower points have the same x-coordinate and different y-coordinates. For example, the coordinates of the two points of the side points may be (x1, y1) and (x1, y2), respectively. The connection line may be determined immediately after the two points forming the side points are determined. Because the side points represent the boundary between the front and the side of the object, the length of the connection line of the side points may represent the height of the front of the object.
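As a non-limiting illustration, side points could be represented as in the following sketch, in which the shared x-coordinate and the two y-coordinates are stored and the length of the connection line gives the height of the front of the object; the class and attribute names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SidePoints:
    """Boundary between the front and a side of an object.

    The upper and lower points share the same x-coordinate, and the connection
    line between them represents the height of the front of the object.
    """
    x: float        # shared x-coordinate of both points
    y_upper: float
    y_lower: float

    @property
    def front_height(self) -> float:
        # length of the connection line between the two points
        return abs(self.y_upper - self.y_lower)
```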
An exposure sign (lateral sign) may indicate the degree of exposure of an object recognized in an image, and may be expressed as a single large dot in the image, as illustrated in
A rear sign is an indicator of the position of the rear of an object in an image. According to an embodiment, the rear sign may be in the form of a line or a box. Changes in the shape of the rear sign will be described below with reference to
Referring to
First, the image data 1710 is raw data of an image obtained by a camera installed in the autonomous vehicle while the vehicle is driving, and objects (e.g., people, two-wheeled vehicles, four-wheeled vehicles, other moving objects, etc.) included in the image may be recognized by the model training unit 1740. In the present disclosure, the method of recognizing an object in an image is not limited to a particular method, and thus, in addition to currently known methods, object recognition methods developed in the future may be used.
First outline data 1721 refers to information about an outline generated for each object recognized as an object in the image. In detail, information about the outline 1605A in
Cuboid data 1722 refers to information about a cuboid obtained for each object recognized as an object in the image. For example, when a first object and a second object are included in the image, the cuboid data 1722 may include coordinates for a cuboid of the first object and coordinates for a cuboid of the second object. Here, the cuboids may each include side points, a rear sign, and an exposure sign, which are described above, and are obtained without considering the first outline data 1721, and thus, some cuboids are inaccurate.
In the present disclosure, when preprocessed image data is input into the learning model, objects are recognized in the image data, a certain number (e.g., 13) of values for each object are extracted as predicted values, then the first outline data 1721 is calculated by using some of the predicted values, the cuboid data 1722 is calculated in parallel by using the remaining predicted values, and the cuboid data 1722 is corrected to be dependent on the first outline data 1721. That is, the first outline data 1721 may function as a kind of boundary for converting the cuboid data 1722 into final cuboid coordinates.
When the first outline is a bounding box, the minimum number of pairs of coordinates for forming the first outline may be four. For example, the minimum coordinates of the first outline may be (x1, y1), (x1, y5), (x4, y1), and (x4, y5). As described above, when the coordinates of the first outline are determined, the coordinate values for forming the first outline may be x1, y1, x4, and y5, and all coordinates that coincide with at least one of the x-axis or y-axis coordinates of the coordinates of the first outline may be regarded as coordinates of the first outline. For example, (x3, y1) or (x1, y3) may be regarded as coordinates of the first outline. When the number of objects recognized in the image is n, the number of first outlines may also be n.
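As a non-limiting illustration, the rule above could be checked as in the following sketch, assuming the first outline is the axis-aligned box formed by the four values (x_min, y_min, x_max, y_max); a point is regarded as a coordinate of the first outline when it lies on one of the four sides. The function and parameter names are hypothetical.

```python
def on_first_outline(point, outline_values, tol=1e-6):
    """Return True when a point lies on one of the four sides of a bounding-box first outline.

    `outline_values` is the minimal set of values forming the outline,
    (x_min, y_min, x_max, y_max); the names are illustrative.
    """
    x, y = point
    x_min, y_min, x_max, y_max = outline_values
    on_vertical_side = abs(x - x_min) <= tol or abs(x - x_max) <= tol
    on_horizontal_side = abs(y - y_min) <= tol or abs(y - y_max) <= tol
    within_x = x_min - tol <= x <= x_max + tol
    within_y = y_min - tol <= y <= y_max + tol
    return (on_vertical_side and within_y) or (on_horizontal_side and within_x)
```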
The preprocessor 1730 may preprocess the image data 1710. For example, the preprocessor 1730 may perform an embedding function such that the image data 1710 may be input into and learned by the learning model of the model training unit 1740 without an error, and may also perform a function of checking the integrity of the image data 1710. According to an embodiment, the preprocessor 1730 may be implemented to be integrated into the model training unit 1740.
The model training unit 1740 may receive the preprocessed image data 1710 from the preprocessor 1730 to perform training to obtain a cuboid of an object included in an image.
As illustrated in
In particular, in response to obtaining coordinates of side points, an exposure sign, and a rear sign for the object after obtaining the coordinates of the first outline of the object, the learning model of the cuboid obtaining device 1700 may obtain cuboid coordinates of the object by correcting the coordinates of the obtained side points, exposure sign, and rear sign considering the coordinate values of the first outline.
For example, when coordinates of two points in the vertical direction forming side points are obtained, the cuboid obtaining device 1700 may correct the y-coordinate values of the coordinates of the two points in the vertical direction forming the side points, considering the coordinate values of the first outline. In more detail, the cuboid obtaining device 1700 may determine the direction of the recognized object, then obtain the coordinates of the two points in the vertical direction forming the side points, and then correct the y-coordinate values of the coordinates of the two points to match the height of the first outline.
The direction of the recognized object may be any one of a total of eight directions. For example, the direction of the recognized object may be any one of a front-left direction, a front direction, a front-right direction, a left-only direction, a right-only direction, a rear-left direction, a rear direction, and a rear-right direction. When the direction of the object recognized in the image is determined, the cuboid obtaining device 1700 performs control such that an accurate cuboid may be obtained by correcting the coordinates of the side points, the exposure sign, and the rear sign obtained as the cuboid of the object, according to a preset formula for each determined direction. The correction method applied differently for each of the eight directions of objects will be described below with reference to
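As a non-limiting illustration, the eight directions and the resulting shape of the rear sign could be encoded as in the following sketch; treating the rear-only case as using the box form is an assumption made here for illustration.

```python
from enum import Enum

class ObjectDirection(Enum):
    FRONT_LEFT = "front-left"
    FRONT = "front"
    FRONT_RIGHT = "front-right"
    LEFT_ONLY = "left-only"
    RIGHT_ONLY = "right-only"
    REAR_LEFT = "rear-left"
    REAR = "rear"
    REAR_RIGHT = "rear-right"

# Directions in which the rear of the object is observed; in these cases the rear sign
# takes the quadrangular (box) form, and otherwise it remains a line.
# Treating the rear-only case as a box is an assumption made here for illustration.
REAR_VISIBLE = {ObjectDirection.REAR_LEFT, ObjectDirection.REAR, ObjectDirection.REAR_RIGHT}

def rear_sign_shape(direction: ObjectDirection) -> str:
    return "box" if direction in REAR_VISIBLE else "line"
```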
In
The cuboid obtaining device may primarily analyze the object to detect that the object is in the front-left direction, and then secondarily obtain coordinates for the side points 1810 and the rear sign 1830 as a cuboid for the object. According to an embodiment, the cuboid obtaining device may primarily obtain the coordinates for the side points 1810 and the rear sign 1830 for the object and then secondarily detect the direction of the object.
After detecting that the direction of the object is the front-left direction, the cuboid obtaining device may correct y-coordinates of two points in the vertical direction constituting the side points 1810 to the closest y-coordinates of the first outline 1800. For example, when the first outline 1800 includes a side formed from (x1, y5) to (x10, y5), and the coordinates of the upper point of the side points 1810 are (x4, y4), the cuboid obtaining device may correct the coordinates of the upper point of the side points 1810 to (x4, y5). As another example, when the first outline 1800 includes a side formed from (x1, y5) to (x10, y5), and the coordinates of the upper point of the side points 1810 are (x4, y7), the cuboid obtaining device may correct the coordinates of the upper point of the side points 1810 to (x4, y5). That is, in the present disclosure, when the coordinates of a point before correction are outside or inside the first outline 1800, the coordinates of the point are corrected to the coordinates of a point forming the first outline 1800. Here, it would be understood by those of skill in the art that the y-coordinates of points between the coordinates (x1, y5) and (x10, y5) on the first outline 1800, for example, (x2, y5), (x3, y5), and (x4, y5), are maintained and only the x-coordinates of the points may be changed to values between x1 and x10.
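As a non-limiting illustration, the y-coordinate correction described above (for example, (x4, y4) or (x4, y7) being corrected to (x4, y5)) could be implemented as in the following sketch, which follows the "closest side" wording of the preceding sentence, reuses the hypothetical SidePoints class sketched earlier, and assumes the first outline is given as (x_min, y_min, x_max, y_max); the per-direction formulas described in the text may differ.

```python
def snap_to_outline_height(side_points: SidePoints, first_outline) -> SidePoints:
    """Correct the y-coordinates of the side points to the closest horizontal side
    of the first outline."""
    _, y_min, _, y_max = first_outline

    def nearest(y):
        # choose whichever horizontal side of the first outline is closer
        return y_min if abs(y - y_min) <= abs(y - y_max) else y_max

    return SidePoints(x=side_points.x,
                      y_upper=nearest(side_points.y_upper),
                      y_lower=nearest(side_points.y_lower))
```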
In addition, as another example, after detecting that the direction of the object is the front-left direction, the cuboid obtaining device may perform a training process such that a line connecting the two points in the vertical direction constituting the rear sign 1830 is limited to lines that share at least part of the left or right side of the first outline 1800. In detail, the cuboid obtaining device may calculate the height of the rear of the object considering the first outline 1800, the overall outline of the recognized object, and the direction of the detected object (i.e., the front-left direction), and set the length of the rear sign 1830 to be equal to the calculated height. Here, because the rear of the object is not observed in the front-left direction, the rear sign 1830 may be expressed as a line sharing at least part of the left or right side of the first outline. Referring to
In
The cuboid obtaining device may primarily analyze the object to detect that the object is in the front-only direction, and then secondarily obtain coordinates for the side points 1910 and the rear sign 1930 as a cuboid for the object. According to an embodiment, the cuboid obtaining device may primarily obtain the coordinates for the side points 1910 and the rear sign 1930 for the object and then secondarily detect the direction of the object.
After detecting that the direction of the object is the front direction, the cuboid obtaining device may set two points in the vertical direction constituting the side points 1910 as points located at the exact center between the upper and lower ends of the first outline 1900. Thereafter, a connection line of the side points 1910 may be automatically set.
In addition, after detecting that the direction of the object is the front direction, the cuboid obtaining device may obtain, as the rear sign 1930, a line that passes through the exact center of the recognized object and extends in the vertical direction. The rear sign 1930 is a sign indicating the rear of an object, and thus, according to the above description, would not be indicated when the object is in the front-only direction; in the present disclosure, however, in a situation where the sides and rear of the object are not observed, the rear sign 1930 may be indicated as a line located at the exact center of the front of the object as illustrated in
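As a non-limiting illustration, the front-only placement could be sketched as follows, placing both the side points and the rear sign on the vertical line through the exact center of the first outline; extending the side points over the full height of the outline is an assumption made by analogy with the rear-only description below, and the sketch reuses the hypothetical SidePoints class shown earlier.

```python
def front_only_signs(first_outline):
    """Place the side points and the rear sign on the vertical line through the exact
    center of the first outline for a front-only object."""
    x_min, y_min, x_max, y_max = first_outline
    center_x = (x_min + x_max) / 2
    side_points = SidePoints(x=center_x, y_upper=y_max, y_lower=y_min)
    rear_sign = ((center_x, y_min), (center_x, y_max))  # vertical line through the center
    return side_points, rear_sign
```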
Referring to
In
The cuboid obtaining device may primarily analyze the object to detect that the object is in the front-right direction, and then secondarily obtain coordinates for the side points 2010 and the rear sign 2030 as a cuboid for the object. According to an embodiment, the cuboid obtaining device may primarily obtain the coordinates for the side points 2010 and the rear sign 2030 for the object and then secondarily detect the direction of the object.
After detecting that the direction of the object is the front-right direction, the cuboid obtaining device may correct y-coordinates of two points in the vertical direction constituting the side points 2010 to the closest y-coordinates of the first outline 2000. For example, when the first outline 2000 includes a side formed from (x1, y5) to (x10, y5), and the coordinates of the upper point of the side points 2010 are (x4, y4), the cuboid obtaining device may correct the coordinates of the upper point of the side points 2010 to (x4, y5). As another example, when the first outline 2000 includes a side formed from (x1, y5) to (x10, y5), and the coordinates of the upper point of the side points 2010 are (x4, y7), the cuboid obtaining device may correct the coordinates of the upper point of the side points 2010 to (x4, y5). That is, in the present disclosure, when the coordinates of a point before correction are outside or inside the first outline 2000, the coordinates of the point are corrected to the coordinates of a point forming the first outline 2000. Here, it would be understood by those of skill in the art that the y-coordinates of points between the coordinates (x1, y5) and (x10, y5) on the first outline 2000, for example, (x2, y5), (x3, y5), and (x4, y5), are maintained and only the x-coordinates of the points may be changed to values between x1 and x10.
In addition, as another example, after detecting that the direction of the object is the front-right direction, the cuboid obtaining device may perform a training process such that a line connecting the two points in the vertical direction constituting the rear sign 2030 is limited to lines that share at least part of the left or right side of the first outline 2000. In detail, the cuboid obtaining device may calculate the height of the rear of the object considering the first outline 2000, the overall outline of the recognized object, and the direction of the detected object (i.e., the front-right direction), and set the length of the rear sign 2030 to be equal to the calculated height. Here, because the rear of the object is not observed in the front-right direction, the rear sign 2030 may be expressed as a line sharing at least part of the left side of the first outline. Referring to
In
The cuboid obtaining device may primarily analyze the object to detect that only the left side of the object is observed (left-only), and then secondarily obtain coordinates for the side points 2110 and the rear sign 2130 as a cuboid for the object. According to an embodiment, the cuboid obtaining device may primarily obtain the coordinates for the side points 2110 and the rear sign 2130 for the object and then secondarily detect the direction of the object.
After detecting that only the left side of the object is observed in the image, the cuboid obtaining device may correct y-coordinates of two points in the vertical direction constituting the side points 2110 to the closest y-coordinates of the first outline 2100. For example, when the first outline 2100 includes a side formed from (x1, y5) to (x10, y5), and the coordinates of the upper point of the side points 2110 are (x4, y4), the cuboid obtaining device may correct the coordinates of the upper point of the side points 2110 to (x4, y5). As another example, when the first outline 2100 includes a side formed from (x1, y5) to (x10, y5), and the coordinates of the upper point of the side points 2110 are (x4, y7), the cuboid obtaining device may correct the coordinates of the upper point of the side points 2110 to (x4, y5). That is, in the present disclosure, when the coordinates of a point before correction are outside or inside the first outline 2100, the coordinates of the point are corrected to the coordinates of a point forming the first outline 2100.
In addition, as another example, when only the left side of the object is observed, the cuboid obtaining device may perform a training process such that a line connecting the two points in the vertical direction constituting the rear sign 2130 is limited to lines that share the left or right side of the first outline 2100. In detail, the cuboid obtaining device may calculate the height of the rear of the object considering the first outline 2100, the overall outline of the recognized object, and the observed direction of the object (i.e., the left side), and set the length of the rear sign 2130 to be equal to the calculated height. Here, because the rear of the object is not observed in the left side direction, the rear sign 2130 may be expressed as a line sharing at least part of the left or right side of the first outline. Referring to
In
The cuboid obtaining device may primarily analyze the object to detect that only the right side of the object is observed (right-only), and then secondarily obtain coordinates for the side points 2210 and the rear sign 2230 as a cuboid for the object. According to an embodiment, the cuboid obtaining device may primarily obtain the coordinates for the side points 2210 and the rear sign 2230 for the object and then secondarily detect the direction of the object.
After detecting that only the right side of the object is observed in the image, the cuboid obtaining device may correct y-coordinates of two points in the vertical direction constituting the side points 2210 to the closest y-coordinates of the first outline 2200. For example, when the first outline 2200 includes a side formed from (x1, y5) to (x10, y5), and the coordinates of the upper point of the side points 2210 are (x4, y4), the cuboid obtaining device may correct the coordinates of the upper point of the side points 2210 to (x4, y5). As another example, when the first outline 2200 includes a side formed from (x1, y5) to (x10, y5), and the coordinates of the upper point of the side points 2210 are (x4, y7), the cuboid obtaining device may correct the coordinates of the upper point of the side points 2210 to (x4, y5). That is, in the present disclosure, when the coordinates of a point before correction are outside or inside the first outline 2200, the coordinates of the point are corrected to the coordinates of a point forming the first outline 2200.
In addition, as another example, when only the right side of the object is observed, the cuboid obtaining device may perform a training process such that a line connecting the two points in the vertical direction constituting the rear sign 2230 is limited to lines that share the left or right side of the first outline 2200. In detail, the cuboid obtaining device may calculate the height of the rear of the object considering the first outline 2200, the overall outline of the recognized object, and the observed direction of the object (i.e., the right side), and set the length of the rear sign 2230 to be equal to the calculated height. Here, because the rear of the object is not observed in the right side direction, the rear sign 2230 may be expressed as a line sharing at least part of the left or right side of the first outline. Referring to
In
The cuboid obtaining device may primarily analyze the object to detect that the object is in the rear-left direction, and then secondarily obtain coordinates for the side points 2310 and the rear sign 2330 as a cuboid for the object. According to an embodiment, the cuboid obtaining device may primarily obtain the coordinates for the side points 2310 and the rear sign 2330 for the object and then secondarily detect the direction of the object.
After detecting that the direction of the object is the rear-left direction, the cuboid obtaining device may correct y-coordinates of two points in the vertical direction constituting the side points 2310 to the closest y-coordinates of the first outline 2300. For example, when the first outline 2300 includes a side formed from (x1, y5) to (x10, y5), and the coordinates of the upper point of the side points 2310 are (x4, y4), the cuboid obtaining device may correct the coordinates of the upper point of the side points 2310 to (x4, y5). As another example, when the first outline 2300 includes a side formed from (x1, y5) to (x10, y5), and the coordinates of the upper point of the side points 2310 are (x4, y7), the cuboid obtaining device may correct the coordinates of the upper point of the side points 2310 to (x4, y5).
Meanwhile, the cuboid obtaining device may determine coordinates of the lower point of the side points 2310 by using the direction of the object and the length between the starting and ending points that determine the height of the front of the object. That the direction of the recognized object is the rear-left direction means that the front of the object is not observed, and thus, the x-coordinates of the side points 2310 coincide with the x-coordinates of the left side of the first outline. In addition, that the direction of the object is the rear-left direction means that the object is moving in a direction away from the observer, and thus, the starting point that determines the height of the front of the object has the y-coordinate of the uppermost point of the first outline, and the ending point that determines the height of the front of the object has the y-coordinate of a point where the left front tire of the object is in contact with the ground surface. Accordingly, as illustrated in
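As a non-limiting illustration, the determination of the side points for a rear-left object could be sketched as follows, with the y-coordinate of the tire-ground contact point supplied externally and a y-axis assumed to increase upward; the function and parameter names are hypothetical, and the sketch reuses the SidePoints class shown earlier.

```python
def rear_left_side_points(first_outline, tire_ground_y):
    """Side points for a rear-left object.

    The shared x-coordinate coincides with the left side of the first outline, the upper
    point takes the y of the outline's uppermost side, and the lower point takes the y at
    which the left front tire meets the ground (`tire_ground_y`, supplied externally).
    """
    x_min, _y_min, _x_max, y_max = first_outline
    return SidePoints(x=x_min, y_upper=y_max, y_lower=tire_ground_y)
```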
In addition, as another example, after detecting that the direction of the object is the rear-left direction, the cuboid obtaining device may perform a training process such that the rear sign 2330 is limited to quadrangles sharing the right side of the first outline 2300. Referring to
In
The cuboid obtaining device may primarily analyze the object to detect that the object is in the rear-right direction, and then secondarily obtain coordinates for the side points 2410 and the rear sign 2430 as a cuboid for the object. According to an embodiment, the cuboid obtaining device may primarily obtain the coordinates for the side points 2410 and the rear sign 2430 for the object and then secondarily detect the direction of the object.
After detecting that the direction of the object is the rear-right direction, the cuboid obtaining device may correct y-coordinates of two points in the vertical direction constituting the side points 2410 to the closest y-coordinates of the first outline 2400. For example, when the first outline 2400 includes a side formed from (x1, y5) to (x10, y5), and the coordinates of the upper point of the side points 2410 are (x4, y4), the cuboid obtaining device may correct the coordinates of the upper point of the side points 2410 to (x4, y5). As another example, when the first outline 2400 includes a side formed from (x1, y5) to (x10, y5), and the coordinates of the upper point of the side points 2410 are (x4, y7), the cuboid obtaining device may correct the coordinates of the upper point of the side points 2410 to (x4, y5).
Meanwhile, the cuboid obtaining device may determine coordinates of the lower point of the side points 2410 by using the direction of the object and the length between the starting and ending points that determine the height of the front of the object. That the direction of the recognized object is the rear-right direction means that the front of the object is not observed, and thus, the x-coordinates of the side points 2410 coincide with the x-coordinates of the right side of the first outline 2400. In addition, that the direction of the object is the rear-right direction means that the object is moving in a direction away from the observer, and thus, the starting point that determines the height of the front of the object has the y-coordinate of the uppermost point of the first outline 2400, and the ending point that determines the height of the front of the object has the y-coordinate of a point where the right front tire of the object is in contact with the ground surface. Accordingly, as illustrated in
In addition, as another example, after detecting that the direction of the object is the rear-right direction, the cuboid obtaining device may perform a training process such that the rear sign 2430 is limited to quadrangles sharing the left side of the first outline 2400. Referring to
In
When the direction of the object recognized in the image is the rear-only direction, the cuboid obtaining device may generate the side points 2510 as a line connecting two points respectively arranged at the uppermost and lowermost points of the vertical line passing through the exact center of the object, as described above with reference to
In addition, when the direction of the recognized object is the rear-only direction, as illustrated in
As described above with reference to
Referring to
Referring to
Referring to
The method of
The cuboid obtaining device 1700 may recognize at least one object included in a first image obtained while driving, and generate a first outline for each recognized object (S2910).
The cuboid obtaining device 1700 may extract first coordinate values constituting the generated first outline from the first image (S2930).
The cuboid obtaining device 1700 may obtain coordinates of a cuboid of the recognized object based on the recognized object and the extracted first coordinate values (S2950).
Referring to
The communication unit 3010 may include one or more components for performing wired/wireless communication with an external server or an external device. For example, the communication unit 3010 may include at least one of a short-range communication unit (not shown), a mobile communication unit (not shown), and a broadcast receiver (not shown).
The DB 3030 is hardware for storing various pieces of data processed by the cuboid obtaining device 3000, and may store a program for the processor 3020 to perform processing and control.
The DB 3030 may include RAM such as DRAM or SRAM, ROM, EEPROM, a CD-ROM, a Blu-ray or other optical disk storage, an HDD, an SSD, or flash memory.
The processor 3020 controls the overall operation of the cuboid obtaining device 3000. For example, the processor 3020 may execute programs stored in the DB 3030 to control the overall operation of an input unit (not shown), a display (not shown), the communication unit 3010, the DB 3030, and the like. The processor 3020 may execute programs stored in the DB 3030 to control the operation of the cuboid obtaining device 3000.
The processor 3020 may control at least some of the operations of the cuboid obtaining device described above with reference to
For example, the processor 3020 may recognize at least one object included in a first image obtained while driving, generate a first outline for each recognized object, extract first coordinate values constituting the generated first outline from the first image, and obtain coordinates of a cuboid of the recognized object based on the recognized object and the extracted first coordinate values.
The processor 3020 may be implemented by using at least one of ASICs, DSPs, DSPDs, PLDs, FPGAs, controllers, microcontrollers, microprocessors, and other electrical units for performing functions.
In more detail,
In general, a camera installed in an autonomous vehicle is able to photograph a scene observed while the autonomous vehicle is driving, and the image captured by the camera may be raw data illustrated in
The present disclosure suggests a method for improving the speed and accuracy of labeling objects included in an image obtained while driving, by implementing the inspector 3110, who inspects primarily labeled data, as an automated device rather than a human. The visual sign correction device 3100 according to the present disclosure may receive raw data, primarily perform auto-labeling, and perform inspection on a result of the auto-labeling, so as to repeatedly perform a process of correcting (modifying) any part that is not completely labeled and classifying, as labeled data, data on which labeling has been completely performed, thereby improving an AI model that is physically or logically included therein.
As described above with reference to
When primarily labeled data is generated as illustrated in
In addition, the visual sign correction device may detect the shape of the quadrangular rear sign 3230, remove the part of the rear sign 3230 located outside the outline 3200, and then perform correction such that portions of the upper side and the right side of the outline 3200 constitute the rear sign 3230.
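As a non-limiting illustration, such a first correction of a quadrangular rear sign could be sketched as an intersection of two axis-aligned boxes, so that the part lying outside the outline is removed and the remaining sides coincide with the outline where the two boxes overlapped; the box representation and the function name are assumptions.

```python
def clip_rear_sign_to_outline(rear_sign_box, outline):
    """First-correction sketch: clip a quadrangular rear sign to the object outline.

    Both boxes are assumed to be axis-aligned and given as (x_min, y_min, x_max, y_max).
    """
    rx1, ry1, rx2, ry2 = rear_sign_box
    ox1, oy1, ox2, oy2 = outline
    # intersection of the two boxes keeps only the part of the rear sign inside the outline
    return (max(rx1, ox1), max(ry1, oy1), min(rx2, ox2), min(ry2, oy2))
```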
Referring to
In
The visual sign correction device according to the present disclosure may detect the direction of an object in a state where the first correction is applied as illustrated in
In the above second correction process, the visual sign correction device may also move the position of an exposure sign 3350 to a point forming the outline 3300. The exposure sign 3350 may indicate whether the entire object is captured (exposed) in the image without omitting any part thereof, and because the entire object is surrounded by the outline 3300 in
The visual sign correction device according to the present disclosure may detect the direction of an object in a state where the first correction is applied as illustrated in
In detail, when the direction of the object is detected as the left-only direction, in order to schematically indicate that both the front and rear of the object are not observed, the visual sign correction device may perform second correction to change the positions of the side points 3410 in
The visual sign correction device according to the present disclosure may detect the direction of an object in a state where the first correction is applied as illustrated in
In detail, when the direction of the object is detected as the rear-only direction, in order to schematically indicate that the front and sides of the object are all not observed, the visual sign correction device may perform second correction to change the positions of the side points 3510 in
The method of
The visual sign correction device 3100 may input an image obtained while driving into a learning model, and generate a visual sign for the position and direction of an object detected from the image (S3610). Alternatively, an object included in the image obtained while driving may be recognized, and the recognized object may be input into the learning model to generate a visual sign for the position and direction of the object.
The visual sign correction device 3100 may perform first correction to move the position of the visual sign generated in operation S3610 to the inside of an outline of the object (S3630).
The visual sign correction device 3100 may perform second correction to change at least one of the position and shape of the visual sign moved to the inside of the outline, based on the characteristics of the visual sign (S3650).
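As a non-limiting illustration, operations S3610 to S3650 could be combined as in the following sketch, in which the learning model and the two correction steps are passed as callables; all interfaces shown are assumptions made for illustration.

```python
def correct_visual_signs(image, learning_model, first_correction, second_correction):
    """Sketch of operations S3610 to S3650 under assumed interfaces.

    learning_model(image)                        -> iterable of detections with
                                                    "outline", "direction", and "signs"
    first_correction(signs, outline)             -> signs moved inside the outline (S3630)
    second_correction(signs, direction, outline) -> signs with position/shape adjusted (S3650)
    """
    corrected = []
    for detection in learning_model(image):                 # S3610: generate visual signs
        signs = first_correction(detection["signs"], detection["outline"])
        signs = second_correction(signs, detection["direction"], detection["outline"])
        corrected.append({**detection, "signs": signs})
    return corrected
```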
Referring to
The communication unit 3710 may include one or more components for performing wired/wireless communication with an external server or an external device. For example, the communication unit 3710 may include at least one of a short-range communication unit (not shown), a mobile communication unit (not shown), and a broadcast receiver (not shown).
The DB 3730 is hardware for storing various pieces of data processed by the visual sign correction device 3700, and may store a program for the processor 3720 to perform processing and control.
The DB 3730 may include RAM such as DRAM or SRAM, ROM, EEPROM, a CD-ROM, a Blu-ray or other optical disk storage, an HDD, an SSD, or flash memory.
The processor 3720 controls the overall operation of the visual sign correction device 3700. For example, the processor 3720 may execute programs stored in the DB 3730 to control the overall operation of an input unit (not shown), a display (not shown), the communication unit 3710, the DB 3730, and the like. The processor 3720 may execute programs stored in the DB 3730 to control the operation of the visual sign correction device 3700.
The processor 3720 may control at least some of the operations of the visual sign correction device described above with reference to
For example, the processor 3720 may use an image obtained while driving as an input of a learning model, generate a visual sign for the position and direction of an object detected from the image, perform first correction to move the position of the generated visual sign to the inside of an outline of the object, and perform second correction to change at least one of the position and shape of the visual sign moved to the inside of the outline, based on the characteristics of the visual sign.
The processor 3720 may be implemented by using at least one of ASICs, DSPs, DSPDs, PLDs, FPGAs, controllers, microcontrollers, microprocessors, and other electrical units for performing functions.
An embodiment of the present disclosure may be implemented as a computer program that may be executed through various components on a computer, and such a computer program may be recorded in a computer-readable medium. In this case, the medium may include a magnetic medium, such as a hard disk, a floppy disk, or a magnetic tape, an optical recording medium, such as a CD-ROM or a digital video disc (DVD), a magneto-optical medium, such as a floptical disk, and a hardware device specially configured to store and execute program instructions, such as ROM, RAM, or flash memory.
Meanwhile, the computer program may be specially designed and configured for the present disclosure or may be well-known to and usable by those skilled in the art of computer software. Examples of the computer program may include not only machine code, such as code made by a compiler, but also high-level language code that is executable by a computer by using an interpreter or the like.
According to an embodiment, the method according to various embodiments of the present disclosure may be included in a computer program product and provided. The computer program product may be traded as commodities between sellers and buyers. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., a CD-ROM), or may be distributed online (e.g., downloaded or uploaded) through an application store (e.g., Play Store™) or directly between two user devices. In a case of online distribution, at least a portion of the computer program product may be temporarily stored in a machine-readable storage medium such as a manufacturer's server, an application store's server, or a memory of a relay server.
The operations of the methods described herein may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The present disclosure is not limited to the described order of the operations. The use of any and all examples, or exemplary language (e.g., ‘and the like’) provided herein, is intended merely to better illuminate the present disclosure and does not pose a limitation on the scope of the present disclosure unless otherwise claimed. Also, numerous modifications and adaptations will be readily apparent to one of ordinary skill in the art without departing from the spirit and scope of the present disclosure.
Accordingly, the spirit of the present disclosure should not be limited to the above-described embodiments, and all modifications and variations that may be derived from the meanings, scope, and equivalents of the claims should be construed as falling within the scope of the present disclosure.
According to an embodiment of the present disclosure, effective annotation may be performed on an object included in image data for artificial intelligence learning and vehicle driving control. In particular, a bounding box may be generated in an appropriate manner according to the type of an object.
In addition, by suggesting particular criteria for objects that may be difficult to annotate, errors in extracting particular objects may be minimized.
According to the present disclosure, it is possible to accurately collect three-dimensional information (cuboid information) about the direction of an object in an image even in a complex road environment, making it possible to generate basic information for implementing a stable autonomous driving algorithm.
It should be understood that embodiments described herein should be considered in a descriptive sense only and not for purposes of limitation. Descriptions of features or aspects within each embodiment should typically be considered as available for other similar features or aspects in other embodiments. While one or more embodiments have been described with reference to the figures, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope as defined by the following claims.
Number | Date | Country | Kind |
---|---|---|---
10-2022-0111491 | Sep 2022 | KR | national |
10-2022-0135629 | Oct 2022 | KR | national |
10-2022-0164963 | Nov 2022 | KR | national |
10-2022-0164964 | Nov 2022 | KR | national |