This invention relates to the technical field of image processing. More particularly, this invention relates to a system, apparatus, method or computer program for measuring the dimensions of an article such as a bag belonging to a passenger.
A major issue in the air transport industry (ATI) is departure delays caused by an excessive amount of hand luggage (or “cabin baggage” or “carry-on baggage”) being brought into the cabin of an aircraft. If passengers bring too many items of hand luggage into the cabin, or if those items are too large to be stored in the cabin, then the excess hand luggage must be transferred to the hold at the departure gate, which is time consuming and may result in departure delays.
Although the ATI recommends that carry-on baggage conform to a size, weight and shape that allows it to fit under a passenger seat or in an enclosed storage compartment, there is no universal standard for the acceptable dimensions of carry-on baggage across different airlines. Indeed, even different seat classifications within the same aircraft may have different carry-on baggage allowances. In addition, baggage manufacturers often inaccurately quote the size of a particular bag or fail to include wheels or handles when stating measurements. Therefore, although items of luggage are often marked with indications such as “cabin approved” or “TSA approved”, this is not a guarantee that every airline will accept the item of luggage in the cabin. It is also difficult for passengers and airport agents to prevent departure delays by accurately identifying whether a particular item of hand luggage should be stored in the hold of an aircraft during a journey. Additionally, there is no existing means of automating the measurement of carry-on baggage, so any inspection of the size of baggage must be done manually, which is very time-intensive.
Existing technologies seek to address the above problems by requiring manual user interaction, for example by manually identifying the corners of a bag, or by precisely manipulating a camera to generate point cloud data of the bag. US2020/0065988 A1 describes methods of measuring an object that involve identifying a set of feature points of the object (i.e. point cloud data) during an image data gathering phase. A 3D model of the object is then generated after the image data gathering phase based on the set of feature points. The generated model is measured with respect to a reference plane, which provides an indication of the measured size of the generated model.
However, systems such as those described above are unreliable and inaccurate. This is partially because the final measurement relies on the accuracy of the inputs provided by the user: if the user inaccurately identifies the corners of a bag then the resulting measurement will not accurately capture the dimensions of the bag. The accuracy of the measurement is therefore highly dependent on the skill level of the user. The use of Augmented Reality (AR) based point cloud data is also problematic. This is because AR point cloud data indiscriminately identifies features within an image, which creates huge volumes of data that slow down the processing time for each image and result in high levels of random noise and inaccurate point capture.
These problems are inextricably linked to real-world images of baggage due to, for example, different textures, uneven surfaces, lighting conditions, and additional items shown within the image frame. These factors may cause an image processing system to falsely identify features of a bag that are not present, or fail to identify a sufficiently large number of points on the bag to enable an accurate size measurement to be determined. Additionally, it can be difficult to track the same feature—such as the handle of a bag—over multiple image frames because the use of point cloud data may result in the same feature migrating location between different image frames. The huge volume of information within the point cloud data also means that the computational burden of tracking the changing location of a particular feature is extremely high. This can lead to features being duplicated, or detected multiple times, which slows down the image processing time. Additionally, techniques requiring point cloud data rely on camera movement to accumulate the necessary data, and so are limited in applicability as they cannot be applied to CCTV images.
Some of the above problems may be addressed using more specialised, non-AR-based point cloud detection technologies, such as LIDAR; however, these technologies are typically prohibitively expensive to acquire and operate, and so are not widely enough available to be used industry-wide.
Accordingly, all of the above factors make the use of point cloud data unsuitable for accurately measuring the dimensions of an object by determining feature points with AR technology.
The present invention overcomes or ameliorates the above problems in the manner described below.
The invention is defined by the independent claims, to which reference should now be made. Preferred features are set out in the dependent claims.
Embodiments of the invention seek to address problems arising from oversized or excessive carry-on baggage by using artificial intelligence (AI) and augmented reality (AR) to automatically measure the dimensions (i.e. the height, width and depth) of a bag, which may be based on a single image of the bag. The invention may be implemented at certain key locations in a journey—such as during check-in and/or at a departure gate—in order to monitor baggage and determine whether certain items of baggage meet or exceed the permitted dimensions specified by the airline. In some embodiments, the invention may be implemented on a computer, for example when performed at check-in, or the invention may be implemented on a mobile phone, for example when performed at a departure gate.
According to a first embodiment there is provided a method for measuring the dimensions of an article, the method comprising obtaining image data associated with an article, identifying, based on the image data, a plurality of pixel locations associated with the article, each of the pixel locations corresponding to a respective corner of the article, calculating corresponding 3D coordinates for each of the pixel locations, and determining the dimensions of the article based on the 3D coordinates.
In one example the 3D coordinates are calculated using a Perspective-n-Point (PnP) algorithm based on camera calibration data.
In one example the dimensions of the article are determined based on a calculated scaling factor and the 3D coordinates.
In one example the scaling factor is calculated based on a predetermined height between the camera and the floor, a relative height of the article and a relative depth of the article.
In one example the relative height of the article and the relative depth of the article are determined based on the calculated 3D coordinates, wherein the height and depth of the article are relative to the width of the article.
In one example the PnP algorithm determines the pose of a camera that provides the image data based on calibration parameters of the camera.
In one example the plurality of pixel locations includes a pixel location associated with the centroid of the article.
Another example further comprises determining a plurality of corresponding 2D coordinates based on the calculated 3D coordinates.
Another example further comprises comparing the plurality of 2D coordinates with the plurality of pixel locations to determine an error.
Another example further comprises identifying a discrete range of acceptable values that define a relative height of the article and a relative depth of the article, wherein the height and depth of the article are relative to the width of the article.
In one example the discrete range of acceptable values that define the relative height and the relative depth of the article is determined by standard travel industry sizes for articles of baggage.
Another example further comprises determining a set of 3D coordinates for each possible combination for the relative height and relative depth of the article and determining a corresponding set of 2D coordinates for each set of 3D coordinates.
Another example further comprises identifying an optimum set of 3D coordinates based on a least-squares analysis of the error between the plurality of pixel locations and each set of 2D coordinates.
In one example the relative height and relative depth of the article associated with the optimum 3D coordinates are used as relative size parameters.
In one example the 3D coordinates are calculated using a ray-casting algorithm in an augmented reality environment.
In one example the dimensions of the article are determined by calculating the distance between 3D coordinates in the AR environment.
In one example the ray-casting algorithm includes simultaneous localisation and mapping techniques.
In one example each of the pixel locations corresponds to a respective corner of the article.
Another example further comprises generating a bounding box based on the 3D coordinates.
In one example, the bounding box may be a cuboid defined by the 3D coordinates of the pixel locations, and the bounding box approximates the dimensions of the article.
In one example the image data is obtained from a single image of the article.
In one example each of the plurality of pixel locations is identified using a neural network.
In one example the article is a bag in an airport environment and the method further comprises calculating a volume of carry-on baggage based on the dimensions of one or more articles associated with checked-in passengers intending to board an aircraft, identifying the total cabin storage capacity of the aircraft, and comparing the volume of carry-on baggage with the total cabin storage capacity to identify a remaining cabin storage capacity for the aircraft.
Another embodiment further comprises sending a notification if the remaining cabin storage capacity falls below a threshold value.
Another embodiment further comprises a training phase wherein an annotation tool is used to train a neural network to identify the plurality of pixel locations associated with the article, wherein one or more of the pixel locations may be identified manually with the annotation tool.
In a second embodiment there is provided a system for measuring the dimensions of an article, the system comprising a camera configured to obtain image data associated with an article, a neural network configured to identify, based on the image data, a plurality of pixel locations associated with the article, each of the pixel locations corresponding to a respective corner of the article, and a processor configured to calculate corresponding 3D coordinates for each of the pixel locations, and further configured to determine the dimensions of the article based on the 3D coordinates.
In one example the location of the camera is fixed.
In one example the processor is implemented on an edge device.
In one example the camera and the processor are located on a mobile device.
Another example further comprises an airport operation database and common use terminal equipment system.
The above embodiments of the invention may provide the following benefits or advantages:
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
The following exemplary description is based on a system, apparatus, and method for use in the aviation industry. However, it will be appreciated that the invention may find application outside the aviation industry in any context that requires remote or non-physical measurement of objects. This may include other transportation industries, such as shipping, cruises or trains, or delivery industries where items are transported between locations.
A bag detection and metering algorithm 121, which may be implemented on the edge device or the mobile device, processes and analyses the bag image 111 to determine the dimensions of the bag 112 based on the pixel locations of the vertices. This is achieved by translating the pixel locations of the vertices and centroid (which are in two-dimensional pixel coordinates) into a set of three-dimensional coordinates and determining a scaling factor to resolve monocular scaling ambiguities. For fixed-height cameras, the scaling factor may be determined based on the known height between the camera and the floor, as further described below. For mobile cameras, the scaling factor may be determined using an augmented reality mobile application, as further described below.
Airport Operation Database (AODB) 130 is a central database or repository for all operative systems and may provide all flight-related data in real-time. For example, the AODB may provide the limitations of each aircraft arriving at a particular airport terminal such as the cabin space capacity of the particular aircraft. Accordingly, interfacing with the AODB 130 enables the system to calculate the remaining cabin space 131 based on the cabin space capacity for a particular aircraft at the airport terminal and the total amount of hand luggage or carry-on baggage that has been identified for passengers checked onto a flight on the particular aircraft.
Once the dimensions of the bag 112 have been calculated by the machine learning algorithm, the bag dimensions can be paired with passenger related information, which in some embodiments may be provided from Common Use Terminal Equipment (CUTE) systems 140 once a passenger has checked in for a flight. The CUTE system 140 can provide information indicating whether a particular passenger has checked in, and integrating with CUTE system 140 enables baggage information to be associated with passenger information.
All of this data—the bag dimensions, remaining cabin size, passenger-related information, and any other pertinent data—may be integrated within an Application Programming Interface (API) which enables notifications 141 to be communicated between systems. This may, for example, enable the system to notify an airport agent 150 that the amount of remaining cabin space has fallen below a particular threshold, such as 10% or 5% of the total cabin capacity. On receiving such a notification, the airport agent can then inform subsequent passengers checking in for the flight that their carry-on bags must be placed in the hold instead of being carried into the cabin. In preferred embodiments, on determining that a carry-on bag must be placed in the hold, the system is further configured to provide a bag tag for the bag. The tagged bag may then be loaded into the hold of the aircraft. In addition, the API may be implemented on a bag measurement mobile application that enables passengers to check whether their bags comply with carry-on baggage regulations at any time.
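By way of illustration only, the capacity check described above could be implemented along the following lines; the treatment of each bag as its bounding cuboid, the function names and the threshold handling are illustrative assumptions rather than a prescribed implementation.

```python
def remaining_cabin_capacity(cabin_capacity_m3, measured_bags):
    """Cabin storage capacity left after subtracting the volume of each measured carry-on bag.

    measured_bags: iterable of (width, depth, height) tuples in metres, one per checked-in passenger.
    Each bag is approximated by its bounding cuboid."""
    used_m3 = sum(w * d * h for (w, d, h) in measured_bags)
    return cabin_capacity_m3 - used_m3


def should_notify_agent(cabin_capacity_m3, measured_bags, threshold_fraction=0.10):
    """True once the remaining capacity falls below the threshold fraction (e.g. 10% or 5%)."""
    remaining = remaining_cabin_capacity(cabin_capacity_m3, measured_bags)
    return remaining < threshold_fraction * cabin_capacity_m3
```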
In some embodiments, the invention is implemented on an edge device. In such embodiments, image data is obtained via cameras that are a fixed height above the floor, such as CCTV cameras, and a scaling factor is determined based on the known floor-to-camera height in order to calculate the dimensions of a bag. An example process for calculating the dimensions of a bag using fixed-height cameras, such as CCTV cameras, is shown in
In a first step 210, an image of a bag 211 is provided as an input image to a neural network 212. The neural network 212 identifies the bag within the bag image 211 and determines 9 key points by identifying the 2D pixel locations of each of the 8 vertices of the bag and of its centroid.
In preferred embodiments, the system performs an iterative optimisation process 220 to identify the relative dimensions of the object based on the input image. In preferred embodiments, the optimisation process involves steps 221, 222, 223, 224 and 225 as described further below.
In step 221, the 2D pixel locations are converted to corresponding 3D coordinates for each of the 9 key points determined by the neural network 212 in order to determine a 3D bounding box, which may be used to calculate the relative dimensions of the bag. The advantage of using a 3D bounding box, as opposed to using conventional 2D object detection, is that the 3D bounding box of the present invention has 9 degrees of freedom: the width, height and length dimensions; the pitch, roll and yaw rotational directions; and the three-dimensional location of the centroid. By contrast, conventional 2D object detection can only consider 4 degrees of freedom. Providing more degrees of freedom advantageously enables the system to provide more accurate results.
In preferred embodiments, the 3D bounding box may be determined using a known Perspective-n-Point (PnP) algorithm 222 that calculates the position and orientation (i.e. the pose) of a camera based on the projected locations of n 3D points in a 2D image. Intrinsic information relating to camera calibration parameters, such as focal length, is also provided to the PnP algorithm to more accurately determine the 3D coordinates.
The system may preferably determine a discrete range of acceptable values that the dimensions of the object may take. In some embodiments, this is achieved by determining a discretised acceptable range for both the relative depth and the relative height of the bag. The relative width may be normalised as unity and so the relative depth and relative height of the bag are expressed relative to the width of the bag. In some embodiments, the range of acceptable depth and height for a bag is based on industry standard bag sizes but, alternatively, the acceptable range may be determined by setting appropriate limits on the relative depth and height, for example the height dimension being no more than 10 times the depth dimension of the bag. In such embodiments, the PnP algorithm may determine a 3D bounding box based on the 9 key points determined by neural network 212 for each possible combination of the relative height and relative depth in the range determined above. Accordingly, the PnP algorithm outputs a plurality of bounding boxes based on the acceptable range of relative height and depth for the bag.
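A minimal sketch of a single PnP evaluation is given below, assuming OpenCV's solvePnP, a pinhole camera with known intrinsics, and a model cuboid of unit width with relative depth a and relative height b; the vertex ordering, channel of intrinsic values and function names are illustrative assumptions.

```python
import numpy as np
import cv2

def candidate_box_points(a, b):
    """3D model points of a cuboid with width 1, depth a and height b (relative units),
    plus its centroid. The vertex ordering is assumed to match the 9 key points
    output by the neural network."""
    corners = np.array([[x, y, z] for x in (0.0, 1.0)
                                  for y in (0.0, a)
                                  for z in (0.0, b)], dtype=np.float32)
    centroid = corners.mean(axis=0, keepdims=True)
    return np.vstack([corners, centroid])

def solve_pose(pixel_points, a, b, camera_matrix):
    """Recover the camera pose for one (a, b) hypothesis from the 9 pixel locations."""
    object_points = candidate_box_points(a, b)
    ok, rvec, tvec = cv2.solvePnP(object_points, pixel_points.astype(np.float32),
                                  camera_matrix, None, flags=cv2.SOLVEPNP_ITERATIVE)
    return ok, rvec, tvec, object_points

# Intrinsics assumed known from calibration: focal length f and principal point (cx, cy).
f, cx, cy = 800.0, 112.0, 112.0
camera_matrix = np.array([[f, 0, cx], [0, f, cy], [0, 0, 1]], dtype=np.float32)
```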
In step 223, the 3D coordinates produced by the PnP algorithm are projected back to 2D pixel locations in order to enable a comparison between the pixel locations derived by the PnP algorithm and the pixel locations derived by the neural network 212.
In step 224, the 2D pixel locations associated with each calculated 3D bounding box are compared with the original 2D pixel locations determined by the neural network 212 to determine an error for the calculated 3D coordinates. For embodiments where a plurality of 3D bounding boxes are generated, as described above, a comparison may be made to determine which of the plurality of bounding boxes most closely aligns with the original 2D pixel locations. This may be achieved using a least squares analysis whereby the set of 3D coordinates that provides the smallest output value from the least squares analysis is selected as the closest fit to the original 2D pixel locations.
In step 225, the relative height and relative depth are identified based on the optimum 3D coordinates associated with the bounding box that provides the least error; this bounding box acts as a proxy for the relative dimensions of the bag. Accordingly, the system determines the respective size parameters a and b corresponding to the relative height and depth of the bag. In preferred embodiments, the relative height and relative depth of the bag are normalised with respect to the width of the bag, which accordingly has a respective width parameter equal to 1. As in step 224, in some embodiments a comparison may be made between the respective size parameters a and b and the original 2D pixel locations to determine the accuracy of the calculated size parameters a and b compared to the original data.
An example algorithm that performs the above iterative optimisation process may be expressed in pseudocode as sketched below.
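A minimal Python-style rendering of such pseudocode is given below; it assumes a simple grid search over the discretised (a, b) range, OpenCV's projectPoints for the re-projection step, and the solve_pose helper sketched above, and the grid step and variable names are illustrative.

```python
import numpy as np
import cv2

def optimise_relative_size(pixel_points, camera_matrix,
                           a_min, a_max, b_min, b_max, step=0.05):
    """Grid-search the relative depth a and relative height b (width normalised to 1).

    For each (a, b) hypothesis: solve PnP, re-project the candidate 3D bounding box into
    the image and score it with a least-squares error against the network's pixel locations."""
    best = None
    for a in np.arange(a_min, a_max + step, step):
        for b in np.arange(b_min, b_max + step, step):
            ok, rvec, tvec, object_points = solve_pose(pixel_points, a, b, camera_matrix)
            if not ok:
                continue
            projected, _ = cv2.projectPoints(object_points, rvec, tvec, camera_matrix, None)
            error = np.sum((projected.reshape(-1, 2) - pixel_points) ** 2)
            if best is None or error < best[0]:
                best = (error, a, b, rvec, tvec)
    return best  # (error, a, b, rvec, tvec) for the best-fitting bounding box
```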
In the above sketch, a_max and a_min are the largest and smallest acceptable values of size parameter a, and b_max and b_min are the largest and smallest acceptable values of size parameter b. Accordingly, in steps 210 to 230 above the system is able to accurately determine the relative dimensions of a bag. However, due to the monocular scale ambiguity, the distance between coordinates will include a scaling error, and so the above steps will not provide the absolute dimensions of the bag.
Therefore, in a further step 240 the system resolves the monocular scale ambiguity by calculating a scaling factor based on the known and fixed camera height 241 of the camera 110 and the relative size parameters a and b. The absolute size of the bag is then calculated based on the scaling factor and the relative size parameters, as further described below with reference to
The scaled ground-to-camera height, H_scaled, is calculated using the dot product of V_normal and A: H_scaled = V_normal · A.
Since the distance between the camera and the floor (i.e. the real height, H_real) is known, a scaling factor can be calculated by comparing the calculated vector length H_scaled with the known ground-to-camera height H_real.
Assuming that the depth of the bounding box (for example, the displacement between vertices 331 and 332) is “a”, the height of the bounding box (for example, the displacement between vertices 331 and 334) is “b”, and the width of the bounding box is 1, then the real width, depth and height of the bag may be calculated using the following formulae, where a is the scaled depth and b is the scaled height.
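Expressed explicitly, and on the assumption that the scaling factor is simply the ratio of the known real camera height to the scaled camera height recovered from the pose solution, these relationships may take the form:

```latex
s = \frac{H_{\mathrm{real}}}{H_{\mathrm{scaled}}}, \qquad
W_{\mathrm{real}} = s \times 1, \qquad
D_{\mathrm{real}} = s \times a, \qquad
H_{\mathrm{real,bag}} = s \times b
```

where s is the scaling factor and W_real, D_real and H_real,bag denote the real width, depth and height of the bag respectively.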
In alternative embodiments, the invention may be implemented on a mobile device such as a cell phone. In such embodiments, image data is obtained via a camera module of the mobile device, and a scaling factor is determined using applications implemented on the mobile device in order to calculate the dimensions of a bag. In particular, AR applications are employed to overcome scale ambiguity by using known AR libraries, such as ARCore and ARKit.
This is achieved by first establishing a surface—usually the ground—on which the item of baggage is placed. The location of the ground surface may be determined by AR applications that use known techniques to track locations in the field of view of the camera and derive the distance and position of the device relative to those locations, based on input data from the mobile device, such as data from sensors (for example accelerometers, magnetometers and gyroscopes) and image data from the camera. This enables the AR library to generate a virtual surface that corresponds to a flat surface in the real world, such as the ground underneath a bag.
Once the virtual ground surface has been established, the 8 vertices of the bounding box are ray-cast with absolute scale. This is achieved by using the mobile application to project imaginary light rays from the calculated focal point of the camera to the ground plane through each pixel of the camera sensor. As the four bottom, or base, corners of the bag (such as vertices 331, 332 and 333) lie on the established ground plane, the intersection of each ray with the virtual ground surface yields the 3D location of the corresponding base corner in absolute scale.
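A minimal sketch of the underlying ray-casting geometry, independent of any particular AR library (ARCore and ARKit expose equivalent hit-testing functionality), is given below; it assumes a pinhole camera model with intrinsics matrix K, and the function names are illustrative.

```python
import numpy as np

def pixel_to_ray(u, v, K):
    """Direction (in camera coordinates) of the ray leaving the camera focal point
    through pixel (u, v), under a pinhole camera model with intrinsics K."""
    direction = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return direction / np.linalg.norm(direction)

def intersect_ground(ray_origin, ray_direction, plane_point, plane_normal):
    """3D point, in absolute scale, where the ray meets the detected ground plane."""
    denom = float(np.dot(plane_normal, ray_direction))
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the ground plane
    t = float(np.dot(plane_normal, plane_point - ray_origin)) / denom
    return None if t < 0 else ray_origin + t * ray_direction
```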
In some embodiments, the accuracy of the 3D locations of the base corners may be improved by using a limited amount of point cloud data. For example, a cluster of data points that defines where a bag face starts may be used to make small adjustments to the calculated 3D coordinates, by determining whether an extrapolated bag-face plane would intersect the calculated 3D coordinates and adjusting the coordinates accordingly.
The absolute size calculation 560 comprises determining the distance between each vertex in the virtual space based on the calculated 3D coordinates. Since the ray-casting produces 3D coordinates in absolute scale the dimensions of the bag 562 may then be calculated straightforwardly based on the absolute distance between the 3D coordinates of each vertex.
Advantageously, compared with techniques for measuring objects using point-cloud data, only a single surface (the ground plane) needs to be detected in order to accurately measure the dimensions of an item of baggage.
In preferred embodiments the neural network comprises six layers: four convolutional layers 610 and two fully connected layers 620. Each of the four convolutional layers 610 includes one normalization layer 611, one nonlinear activation function (PReLU) 612, and one max-pooling layer 613.
The convolution layers extract the high-level features from an image, such as edges, colour, and gradient orientation. The max pooling layer reduces the spatial size of the image, extracts dominant features within the image, and suppresses noise within the image. Each of the two fully connected layers 620 comprises one linear layer 621, and one nonlinear activation function (PReLU) 622.
In a preferred embodiment, the neural network is designed based on a modified version of the known AlexNet deep neural network, whereby the final PReLU layer 622 is modified to provide an output of 18 floating point numbers, which corresponds to an x,y coordinate pair for each of the 9 key points for each image as described above. Thus, in preferred embodiments the neural network 212 is provided with bag images as an input 630 and provides an output 640 of 18 numbers corresponding to the 2D pixel locations of the 8 vertices and centroid described above. In preferred embodiments, the input bag images have an image size of 224×224 pixels and have three colour channels, as shown in
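A minimal PyTorch-style sketch of such a network is given below. The four convolutional blocks (each with a normalisation layer, a PReLU activation and a max-pooling layer), the two fully connected blocks and the 18-value output follow the structure described above, while the channel counts, kernel sizes and hidden layer width are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BagKeypointNet(nn.Module):
    """Four convolutional blocks followed by two fully connected blocks,
    regressing 18 values: an (x, y) pixel pair for each of the 9 key points."""
    def __init__(self):
        super().__init__()
        channels = [3, 32, 64, 128, 256]  # illustrative channel counts
        blocks = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            blocks += [
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),   # normalisation layer
                nn.PReLU(),              # nonlinear activation
                nn.MaxPool2d(2),         # max-pooling layer
            ]
        self.features = nn.Sequential(*blocks)
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 14 * 14, 512), nn.PReLU(),   # first fully connected block
            nn.Linear(512, 18), nn.PReLU(),              # second block: 9 key points x (x, y)
        )

    def forward(self, x):  # x: (N, 3, 224, 224)
        return self.regressor(self.features(x))
```

In use, a batch of three-channel 224×224 bag images with shape (N, 3, 224, 224) would yield an (N, 18) tensor of predicted pixel coordinates.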
The location of hidden corners can be determined because the neural network learns to predict the position of each corner based on the training data used to train the neural network. In particular, the training data set includes labels that provide the x-y pixel coordinates of all 9 key points described above, regardless of whether each corner is visible to the camera. Therefore, the neural network is always able to predict the position of the corners of the bag based on this learned understanding of where the corners of the bag are expected to be.
An example of an annotated training image 900 is provided in
As shown in
While the application is gathering data, the machine learning algorithm described above may calculate a predicted 3D bounding box based on the available data and the GUI may display the predicted 3D bounding box 1003 and calculated dimensions of the bag 1004 based on the 3D bounding box. As may be seen in
In
Again, throughout the process the machine learning algorithm described above calculates a predicted 3D bounding box based on the newly available data and the GUI displays an updated predicted 3D bounding box 1010 and calculated dimensions of the bag 1011 based on the 3D bounding box. As may be seen in
Finally, in
The above detailed description of embodiments of the invention is not intended to be exhaustive or to limit the invention to the precise form disclosed. For example, while processes or blocks are presented in a given order, alternative embodiments may perform routines having steps, or employ systems having blocks, in a different order, and some processes or blocks may be deleted, moved, added, subdivided, combined, and/or modified. Each of these processes or blocks may be implemented in a variety of different ways. Also, while processes or blocks are at times shown as being performed in series, these processes or blocks may instead be performed in parallel, or may be performed at different times.
Additionally, it will be noted that although the above description may discuss measuring regular cuboid-shaped objects, it will be appreciated that embodiments of the invention are not limited to measuring such shapes. Where the object to be measured is a non-cuboid rigid, or semi-rigid, shape, the above-described methods may be used without modification to provide a cuboid-shaped bounding box that encloses the measured object.
The teachings of the invention provided herein can be applied to other systems, not necessarily the system described above. The elements and acts of the various embodiments described above can be combined to provide further embodiments.
While some embodiments of the invention have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the disclosure. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the disclosure.
This application is a continuation of International Patent Application No. PCT/GB2022/052832, filed on Nov. 9, 2022, and entitled “METHOD AND SYSTEM FOR MEASURING AN ARTICLE,” which claims the benefit of and priority of: EP Patent Application No. 21208064.2, filed Nov. 12, 2021 and entitled “METHOD AND SYSTEM FOR MEASURING AN ARTICLE,” which are each incorporated by reference herein in their entireties.