This disclosure deals with a system and method for approximating traditional SAR imaging on mobile millimeter-wave (mmWave) devices. The presently disclosed technology enables human-perceptible and machine-readable shape generation and classification of hidden objects on mobile mmWave devices. The resulting system and corresponding methodology are capable of imaging through obstructions, like clothing, and under low visibility conditions.
mmWave systems enable through-obstruction imaging and are widely used for screening in state-of-the-art airports and security portals[1, 2]. They can detect hidden contraband, such as weapons, explosives, and liquids, because mmWave signals penetrate clothes, bags, and non-metallic obstructions[3]. In addition, mmWave imaging systems could enable applications that track beyond line-of-sight[4-7], see through walls[8-10], recognize humans through obstructions[10-12], and analyze materials without contaminating them[13]. mmWave systems also have advantages over other screening modalities: they preserve privacy and work in low-light conditions, unlike optical cameras; they have a very weak ionization effect, unlike x-ray systems; and they can detect the shapes of non-metallic objects, unlike metal detectors.
Furthermore, the ubiquity of mmWave technology in 5G-and-beyond devices enables opportunities for bringing imaging and screening functionalities to handheld settings. Hidden shape perception by humans or classification by machines under handheld settings will enable multiple applications, such as in situ security check without pat-down searches, baggage discrimination (i.e., without opening the baggage), packaged inventory item counting without intrusions, discovery of faults in water pipes or gas lines without tearing up walls, etc.
Traditional mmWave imaging systems operate under the Synthetic Aperture Radar (SAR) principle[14-19]. They use bulky mechanical motion controllers or rigid bodies that move the mmWave device along a predetermined trajectory forming an aperture[1, 2, 14]. As it moves along the aperture, the device transmits a wireless signal and measures the reflections bounced off nearby objects. Combining all the reflected signals coherently across the trajectory allows the system to discriminate objects with higher reflectivity against the background noise. The spatial resolution of the final 2D or 3D shape depends on the span of the aperture along the horizontal and vertical axes and on the bandwidth of the system[16, 20]. However, emulating the SAR principle on a handheld mmWave device is challenging for a key reason: mmWave signals are highly specular due to their small wavelength, i.e., many objects introduce mirror-like reflections[21, 22]. Thus, the effective strength of the reflections from various parts of the object depends highly on its orientation with respect to the aperture plane. So, even if some parts of the object reflect the mmWave signal strongly, those reflections may not arrive at the receiver. Consequently, some parts and edges of the object do not appear in the reconstructed mmWave shape.
In addition, due to the weak reflectivity of various materials, their reflected signals may be buried under the signals from strong reflectors. Thus, the weakly reflecting parts of the object may appear blurry with poor resolution or may be missing from the final shape entirely, allowing only a partial shape reconstruction. The resulting shape could lack discriminating features for automatic object classification, or it could be imperceptible to humans.
Aspects and advantages of the presently disclosed subject matter will be set forth in part in the following description, or may be apparent from the description, or may be learned through practice of the presently disclosed subject matter.
Broadly speaking, the presently disclosed subject matter relates to human-perceptible and machine-readable shape generation and classification of hidden objects.
We propose MilliGAN, a system and corresponding methodology that approximates traditional SAR imaging on mobile mmWave devices. It enables human-perceptible and machine-readable shape generation and classification of hidden objects on mobile mmWave devices. The system and methodology are capable of imaging through obstructions, like clothing, and under low visibility conditions.
Since traditional SAR mmWave imaging suffers from poor resolution, specularity, and weak reflectivity from objects, the reconstructed shapes can often be imperceptible to humans. To address this, MilliGAN designs a machine-learning model to recover the high-spatial frequencies in the object, reconstruct an accurate 2D shape, and predict its 3D features and category. Although we have customized MilliGAN for security applications, the model is adaptable to different applications with limited training samples. We implement our system using off-the-shelf components and demonstrate performance improvement over traditional SAR, qualitatively and quantitatively.
More broadly, the presently disclosed subject matter relates to sensors.
In some exemplary embodiments disclosed herewith, systems and methods for hidden objects' shape generation, detection, and classification are described.
It is to be understood that the presently disclosed subject matter equally relates to associated and/or corresponding methodologies. One exemplary such method relates to a method for approximating SAR imaging on mobile mmWave devices to enable human-perceptible and machine-readable shape generation, and for classification of hidden objects on mobile mmWave devices, comprising obtaining 3D mmWave shape data for a target object from a mobile device; and using a machine-learning model to recover high-spatial frequencies in the object and reconstruct a 2D shape of the target object.
Another exemplary such method relates to a method for imaging and screening in handheld device settings, to achieve hidden shape perception by humans or classification by machines, to enable in situ security check without physical search of persons or baggage, comprising training a machine-learning model, based on inputs of examples of 3D mmWave shapes and based on the corresponding ground truth shapes, to learn the association between 3D mmWave shapes and the corresponding ground truth shapes; providing input to the trained machine-learning model, such input comprising 3D mmWave shape data from a mobile device; and operating the trained machine-learning model to process such input data to determine and output the corresponding ground truth 2D shape.
Other example aspects of the present disclosure are directed to systems, apparatus, tangible non-transitory computer-readable media, user interfaces, memory devices, and electronic devices for shape generation, detection, and classification of hidden objects. To implement the methodology and technology herewith, one or more processors may be provided, programmed to perform the steps and functions called for by the presently disclosed subject matter, as will be understood by those of ordinary skill in the art.
Another exemplary embodiment of presently disclosed subject matter relates to a system that approximates, on mobile mmWave devices, SAR imaging of full-sized systems, to enable on mobile mmWave devices human-perceptible and machine-readable shape generation and classification of hidden objects, comprising a conditional generative adversarial network (cGAN)-based machine-learning system, trained based on inputs of examples of 3D mmWave shapes and on the corresponding ground truth shapes, to learn the association between 3D mmWave shapes and the corresponding 2D ground truth shapes; an input to the cGAN-based machine-learning system from a mobile device of 3D mmWave shape data of target objects; and a display for producing corresponding human perceptible 2D shapes output from the cGAN-based machine-learning system based on the input thereto.
Still another exemplary embodiment of presently disclosed subject matter relates to a cGAN-based machine-learning system, comprising one or more processors programmed to use a machine-learning model to recover the high-spatial frequencies in imperceptible 3D mmWave shape data for a target object, and to reconstruct and display an accurate human perceivable 2D shape for the target object.
Additional objects and advantages of the presently disclosed subject matter are set forth in, or will be apparent to, those of ordinary skill in the art from the detailed description herein. Also, it should be further appreciated that modifications and variations to the specifically illustrated, referred and discussed features, elements, and steps hereof may be practiced in various embodiments, uses, and practices of the presently disclosed subject matter without departing from the spirit and scope of the subject matter. Variations may include, but are not limited to, substitution of equivalent means, features, or steps for those illustrated, referenced, or discussed, and the functional, operational, or positional reversal of various parts, features, steps, or the like.
Still further, it is to be understood that different embodiments, as well as different presently preferred embodiments, of the presently disclosed subject matter may include various combinations or configurations of presently disclosed features, steps, or elements, or their equivalents (including combinations of features, parts, or steps or configurations thereof not expressly shown in the figures or stated in the detailed description of such figures). Additional embodiments of the presently disclosed subject matter, not necessarily expressed in the summarized section, may include and incorporate various combinations of aspects of features, components, or steps referenced in the summarized objects above, and/or other features, components, or steps as otherwise discussed in this application. Those of ordinary skill in the art will better appreciate the features and aspects of such embodiments, and others, upon review of the remainder of the specification, and will appreciate that the presently disclosed subject matter applies equally to corresponding methodologies as associated with practice of any of the present exemplary devices, and vice versa.
These and other features, aspects and advantages of various embodiments will become better understood with reference to the following description and appended claims. The accompanying figures, which are incorporated in and constitute a part of this specification, illustrate embodiments of the present disclosure and, together with the description, serve to explain the related principles.
A full and enabling disclosure of the present subject matter, including the best mode thereof to one of ordinary skill in the art, is set forth more particularly in the remainder of the specification, including reference to the accompanying figures in which:
Repeat use of reference characters in the present specification and figures is intended to represent the same or analogous features, elements, or steps of the presently disclosed subject matter.
Reference will now be made in detail to various embodiments of the disclosed subject matter, one or more examples of which are set forth below. Each embodiment is provided by way of explanation of the subject matter, not limitation thereof. In fact, it will be apparent to those skilled in the art that various modifications and variations may be made in the present disclosure without departing from the scope or spirit of the subject matter. For instance, features illustrated or described as part of one embodiment may be used in another embodiment to yield a still further embodiment. Thus, it is intended that the presently disclosed subject matter covers such modifications and variations as come within the scope of the appended claims and their equivalents.
In general, the present disclosure is directed to a system (and methodology) designed to improve the human perceptibility of mmWave shapes. MilliGAN uses a conditional generative adversarial network (cGAN)[23-25]. The high-level idea is intuitive: MilliGAN trains a cGAN framework by showing it thousands of examples of mmWave shapes from traditional reconstruction and the corresponding ground truth shapes. The cGAN framework uses a Generator (G) to learn the association between the 3D mmWave shape and the 2D ground truth shape, and a Discriminator (D) that teaches G to learn a better association at each iteration[23]. At run time, once the cGAN has been trained appropriately, G can estimate an accurate 2D depth map outlining the shape without the ground truth. In addition to the shape, we also use a Quantifier network (Q) that predicts the mean depth and orientation in the 3D plane, and a Classifier network (C) to automatically classify the objects into different categories.
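As a concrete illustration of this adversarial training, the following is a minimal sketch of one training iteration in PyTorch; the module names, optimizers, and the weight on the L1 term are illustrative assumptions rather than MilliGAN's exact implementation.

```python
# Minimal sketch of one cGAN training iteration (PyTorch), assuming a generator G
# that maps a 3D mmWave voxel grid to a 2D depth map and a discriminator D that
# scores (3D shape, 2D shape) pairs. Names and the lambda value are illustrative.
import torch
import torch.nn.functional as F

def train_step(G, D, opt_g, opt_d, voxels, gt_depth, lambda_l1=100.0):
    # --- Discriminator update: real pairs -> 1, generated pairs -> 0 ---
    fake_depth = G(voxels).detach()
    d_real = D(voxels, gt_depth)
    d_fake = D(voxels, fake_depth)
    loss_d = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- Generator update: fool D while staying close to the ground truth (L1) ---
    fake_depth = G(voxels)
    d_fake = D(voxels, fake_depth)
    loss_g = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake)) + \
             lambda_l1 * F.l1_loss(fake_depth, gt_depth)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```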
More particularly regarding the subject MilliGAN Learning System,
The core purpose of G is to convert the imperceptible 3D mmWave shape into a human-perceivable 2D shape with all the edges, parts, and high-spatial frequencies. To this end, we utilize the traditional encoder-decoder architecture[26]. The encoder converts the 3D mmWave shape into a 1D feature vector using multiple 3D convolution layers and a final flatten layer. This 1D representation compresses the 3D shape so that the deeper layers can learn the high-level abstract features. By the end of the 3D convolutions, the spatial dimensions have been reduced to 1×1×1, and the number of channels has been increased to hold these abstract features. The decoder takes these 1D features and applies multiple deconvolution layers to decrease the number of channels and increase the spatial dimensions. Deconvolution stops when we reach the desired output size, at which point we have a single channel representing the 2D shape. In our design, we follow prior literature[27] and use six 3D convolution layers at the encoder and eight 2D deconvolution layers at the decoder, respectively, as represented in
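For illustration, a minimal PyTorch sketch of such an encoder-decoder generator follows; the input voxel size (64×64×64), channel widths, and output size (256×256) are assumptions for the example, the actual parameters are those of Table I, and the skip connection discussed next is omitted for brevity.

```python
# Minimal sketch of the G encoder-decoder: six strided 3D convolutions compress
# the voxel grid to 1x1x1, and eight 2D deconvolutions grow a 2D depth map.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: shrink 64^3 spatial dims while growing channels for abstract features.
        chans3d = [1, 16, 32, 64, 128, 256, 512]
        self.encoder = nn.Sequential(*[
            nn.Sequential(nn.Conv3d(chans3d[i], chans3d[i + 1], 4, stride=2, padding=1),
                          nn.LeakyReLU(0.2))
            for i in range(6)
        ])
        # Decoder: grow 1x1 to 256x256 while reducing to a single-channel 2D shape.
        chans2d = [512, 512, 256, 128, 64, 32, 16, 8, 1]
        self.decoder = nn.Sequential(*[
            nn.Sequential(nn.ConvTranspose2d(chans2d[i], chans2d[i + 1], 4, stride=2, padding=1),
                          nn.ReLU() if i < 7 else nn.Tanh())
            for i in range(8)
        ])

    def forward(self, voxels):                      # voxels: (B, 1, 64, 64, 64)
        feat = self.encoder(voxels)                 # (B, 512, 1, 1, 1)
        feat = feat.flatten(2).squeeze(-1)          # 1D feature vector per sample
        feat = feat.view(feat.size(0), 512, 1, 1)   # reshape for 2D deconvolution
        return self.decoder(feat)                   # (B, 1, 256, 256)
```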
Yet, passing the 3D mmWave shape through the encoder-decoder layers may cause a loss of detailed high-frequency information during encoding[28]. This is because the object could spread over the reconstructed volume while only a few 2D slices contain the high-spatial frequencies; the encoder compresses these further while converting them into abstract 1D features. To preserve such high-frequency details, G employs a skip connection[27, 28] between the input layer and the 6th deconvolution layer. The skip connection extracts the highest-energy 2D slice from the 3D shape and concatenates it to the 2D deconvolution layer. However, due to different orientations of the object, various parts of it may not appear in a single highest-energy slice; thus, a single 2D slice may not capture all the relevant high-frequency depth information and might cause instability in the network[27]. Therefore, G first finds the plane that intersects the 3D voxel volume and likely carries the highest energy from the object. Then, it selects a few neighboring 2D slices parallel to the highest-energy plane, both towards and away from the aperture plane. In practice, four neighboring slices on each side of the highest-energy plane perform well.
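The slice selection could be sketched as follows, assuming slices are taken along the depth (range) axis of the voxel grid and that per-slice energy is the summed squared magnitude; the actual system selects slices parallel to the estimated highest-energy plane, which need not be axis-aligned.

```python
# Minimal sketch of the skip-connection slice selection described above.
import torch

def select_skip_slices(voxels: torch.Tensor, neighbors: int = 4) -> torch.Tensor:
    """voxels: (B, 1, D, H, W) reconstruction magnitudes (real or complex).
    Returns (B, 2*neighbors + 1, H, W): the highest-energy depth slice plus
    `neighbors` slices on each side, zero-padded at the volume boundary."""
    mag = voxels.abs().squeeze(1)                       # (B, D, H, W)
    energy = (mag ** 2).sum(dim=(-1, -2))               # per-slice energy, (B, D)
    center = energy.argmax(dim=1)                       # highest-energy slice index
    padded = torch.nn.functional.pad(mag, (0, 0, 0, 0, neighbors, neighbors))
    out = []
    for b in range(mag.size(0)):
        c = center[b].item() + neighbors                # index into padded volume
        out.append(padded[b, c - neighbors:c + neighbors + 1])
    return torch.stack(out)                             # (B, 9, H, W) for neighbors=4
```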
Finally, G leverages the feedback from the D to adjust the weights of its encoder-decoder layers to learn and predict the accurate 2D shapes. Table I summarizes the G network parameters.
The purpose of D is to teach G a better association between the 3D mmWave shape and its 2D ground truth shape. D achieves this by distinguishing real and generated samples during the training process. It takes two inputs, the 3D mmWave shape and a 2D shape that is either real or generated by G, and outputs the probability that the 2D shape is real, as illustrated by
Finally, the two 1D feature vectors from the 3D and 2D convolutions are concatenated and fed into two fully connected dense layers that lead to a single-neuron output layer. The output layer is passed through a sigmoid activation function and outputs the probability that the given 2D shape is real. With G trying to minimize this expected value and D trying to maximize it, the entire cGAN converges when D consistently outputs a probability close to 0.5, i.e., real and generated shapes have an equal probability of being judged real. This ensures that G has learned enough to produce correct 2D shapes. Table II summarizes the D network parameters.
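A minimal PyTorch sketch of such a two-branch discriminator follows; the number of convolution layers and channel widths are illustrative assumptions, with the actual parameters given in Table II.

```python
# Minimal sketch of D: a 3D-convolution branch for the mmWave voxels, a
# 2D-convolution branch for the (real or generated) 2D shape, and two dense
# layers on the concatenated features ending in a sigmoid P(real).
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.branch3d = nn.Sequential(                      # voxel grid -> 1D features
            nn.Conv3d(1, 16, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv3d(16, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv3d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.branch2d = nn.Sequential(                      # 2D shape -> 1D features
            nn.Conv2d(1, 16, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(16, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Sequential(                          # two dense layers -> P(real)
            nn.Linear(128, 64), nn.LeakyReLU(0.2),
            nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, voxels, shape2d):
        feat = torch.cat([self.branch3d(voxels), self.branch2d(shape2d)], dim=1)
        return self.head(feat)                              # probability the 2D shape is real
```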
Although our cGAN can recover most of the missing edges and parts of the objects, its output is only a 2D shape. Rather than predicting the entire 3D shape directly from the cGAN, which would be not only computationally expensive but also hard to learn due to inadequate input 3D data[29, 30], MilliGAN leverages a Q network that estimates the 3D features of the object: its mean depth and its orientation in the 3D plane.
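One way such a Q network could be structured is sketched below, assuming it regresses four scalars (mean depth plus three orientation angles) from the generated 2D shape; the exact inputs, feature set, and layer sizes are assumptions rather than the documented architecture.

```python
# Minimal sketch of a Q regression head over the generated 2D shape.
import torch.nn as nn

class Quantifier(nn.Module):
    def __init__(self, n_features: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(16, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fc = nn.Sequential(nn.Linear(64, 32), nn.ReLU(),
                                nn.Linear(32, n_features))   # [mean depth, 3 orientation angles]

    def forward(self, shape2d):
        return self.fc(self.conv(shape2d))
```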
So far, MilliGAN recovers the full 2D shape and 3D features of an object from its 3D mmWave shape. We now extend MilliGAN's capability to detect and classify various real-life objects automatically. This is useful in non-intrusive applications like automated packaged inventory counting, remote pat-down searching, etc. To this end, we propose a C network customized for a handheld security application that leverages the predicted 2D shape to label it to one of the object classes automatically. Similar to D and Q, C leverages seven 2D convolution layers and two fully connected dense layers to predict the classes.
In our implementation, we select a sample of items targeted by most security screening procedures (e.g., pistols, knives, scissors, hammers, boxcutters, cell phones, explosives, screwdrivers[3]) as the categorical outputs. In addition to these categories, we add one extra "Other" category to include various other items, such as books, keyrings, wallets, keychains, etc. Hence, the categorical output has nine neurons in the output layer. Although C is currently not trained exhaustively on all possible items of interest, we note that our network is scalable to more objects without requiring substantial changes in the layers or training with large samples. In addition to fine-grained classification, we also incorporate a binary classification of objects as suspicious or not. Such binary output could be very useful for hidden object annotation so security personnel can perform additional checks. Dangerous objects that should not be missed during classification are labeled as suspicious, e.g., knives, pistols, explosives, etc. Finally, C uses softmax and sigmoid activation functions for the categorical and binary output layers, respectively. Table IV summarizes the C network parameters.
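A minimal PyTorch sketch of such a classifier follows; the channel widths and 256×256 input size are illustrative assumptions, with the actual parameters given in Table IV.

```python
# Minimal sketch of C: seven strided 2D convolutions and two dense layers, with
# a nine-way categorical head (softmax) and a binary suspicious/not head (sigmoid).
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self, n_classes: int = 9):
        super().__init__()
        chans = [1, 8, 16, 32, 64, 128, 256, 512]
        self.conv = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(chans[i], chans[i + 1], 4, stride=2, padding=1),
                          nn.ReLU())
            for i in range(7)                        # 256x256 input -> 2x2 after seven layers
        ], nn.Flatten())
        self.fc = nn.Sequential(nn.Linear(512 * 2 * 2, 256), nn.ReLU())  # first dense layer
        self.category = nn.Sequential(nn.Linear(256, n_classes), nn.Softmax(dim=1))
        self.suspicious = nn.Sequential(nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, shape2d):
        feat = self.fc(self.conv(shape2d))
        return self.category(feat), self.suspicious(feat)
```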
Regarding network loss functions, all the network blocks rely on their loss functions to tune their weights appropriately and train themselves. We use the L1-norm loss L1(G)[31] as well as the traditional GAN loss L(G)[32] to train the cGAN consisting of G and D. The L1 loss helps the network predict the 2D shape better by estimating the pixel-to-pixel mean absolute error, while the traditional GAN loss maintains the adversarial game. Our combined cGAN loss is determined by:
LcGAN = L(G) + λL·L1(G), where L1(G) = E∥xL − G(zL)∥1  (1)
Q network leverages the cGAN loss LcGAN and 3D features' loss between the ground truth and the prediction to determine its loss function:
LQ = LcGAN + λF·LF(G), where LF(G) = E∥xF − G(zF)∥1  (2)
Finally, the C network leverages LcGAN, a categorical loss LC, and a binary loss LB. The categorical and binary losses are computed as the cross-entropy losses between the actual probabilities and the predicted probabilities of the different categories and binary classes[33], and are calculated as:
LC = −Σi ti·log(c(si)), LB = −(t0·log(p0) + (1 − t0)·log(1 − p0))  (3)
where c(si) and ti are the predicted and actual probabilities of the ith class (categorical output), p0 and t0 are the predicted and actual probabilities of a suspicious object (binary output), and the hyper-parameters (λL, λF, λC, λB) represent the networks' relative focus on shape reconstruction, feature prediction, and classification.
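One plausible way to assemble these weighted terms in code is sketched below (PyTorch); the exact combination used for the C loss, the weight values, and the assumption that the categorical head produces raw logits are illustrative, not MilliGAN's exact settings.

```python
# Sketch of the loss terms corresponding to Eqs. (1)-(3).
import torch
import torch.nn.functional as F

def milligan_losses(d_fake, pred_shape, gt_shape, pred_feat, gt_feat,
                    pred_cat_logits, gt_cat, pred_bin, gt_bin,
                    lam_l=100.0, lam_f=10.0, lam_c=1.0, lam_b=1.0):
    adv = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))   # adversarial term L(G)
    l1 = F.l1_loss(pred_shape, gt_shape)                            # L1(G), Eq. (1)
    l_cgan = adv + lam_l * l1
    l_feat = F.l1_loss(pred_feat, gt_feat)                          # LF(G), Eq. (2)
    l_q = l_cgan + lam_f * l_feat
    l_cat = F.cross_entropy(pred_cat_logits, gt_cat)                # LC, Eq. (3) (expects raw logits)
    l_bin = F.binary_cross_entropy(pred_bin, gt_bin)                # LB, Eq. (3)
    l_c = l_cgan + lam_c * l_cat + lam_b * l_bin                    # one plausible combination
    return l_cgan, l_q, l_c
```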
Our goal is to find the set of values for these hyper-parameters that minimizes the individual losses. However, determining the exact values is difficult, although intuitively the value of λL should be the largest since it is responsible for the accurate reconstruction of human-perceivable 2D shapes. These networks, with their optimized loss functions, enable MilliGAN to fill in the missing edges and parts in the 2D shapes, predict the 3D features, and classify the objects accurately.
Results include shape improvement from cGAN and 3D features prediction, as discussed hereafter.
With reference to shape improvement from cGAN, we evaluate MilliGAN's cGAN architecture in enhancing the shapes.
First,
Second, to evaluate the generalizability of MilliGAN, we run cGAN over 150 test samples and calculate the SSIM[34] by considering the 2D ground truth shapes as the reference.
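A minimal sketch of such an SSIM evaluation, assuming scikit-image and that generated and ground truth shapes are provided as same-sized 2D float arrays scaled to [0, 1]:

```python
# Average SSIM of generated 2D shapes against ground truth references.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def average_ssim(generated_shapes, ground_truth_shapes):
    """Both arguments: iterables of 2D numpy arrays with values in [0, 1]."""
    scores = [ssim(gt, gen, data_range=1.0)
              for gt, gen in zip(ground_truth_shapes, generated_shapes)]
    return float(np.mean(scores)), scores
```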
Regarding prediction of 3D features, recall that Q leverages the generated 2D shape to predict the object's 3D features (mean depth and 3D orientation). We use the previous 150 test samples and estimate the error in predicting the features. We also compare the results with a baseline network that uses the shapes reconstructed by the traditional SAR only. To create the baseline, we use Q's architecture but train the layers with traditional SAR generated shapes. This baseline network is also trained with identical sets of synthesized and real samples for the same number of epochs that were used in MilliGAN training.
More particularly,
Recall that C can predict nine object categories along with their binary classes. We randomly select 540 test samples (60 from each category) and use the cGAN to produce the accurate 2D shapes. Then, we input these 2D shapes to C to predict their class labels. Since C is customized for security applications, we use 0.98 as the class probability threshold, so any object with less than 98% confidence is placed in the "Other" class. We also use the same set of samples for the binary classification of labeling the objects as suspicious or not.
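The thresholding rule can be sketched as follows; the class-name list and array layout are illustrative.

```python
# Security-oriented decision rule: predictions below the 0.98 confidence
# threshold fall back to the "Other" class.
import numpy as np

CLASSES = ["pistol", "knife", "scissors", "hammer", "boxcutter",
           "cell phone", "explosive", "screwdriver", "other"]

def label_with_threshold(class_probs: np.ndarray, threshold: float = 0.98) -> list:
    """class_probs: (N, 9) softmax outputs from C, one row per test sample."""
    labels = []
    for probs in class_probs:
        idx = int(np.argmax(probs))
        labels.append(CLASSES[idx] if probs[idx] >= threshold else "other")
    return labels
```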
Table V shows the confusion matrix of categorical labeling with rows as the predicted probability.
Cell phones, explosives, hammers, pistols, and scissors all show 100% accuracy in Table V because these objects reflect mmWave signals strongly, and the cGAN can accurately reconstruct their shapes, aiding C in a perfect classification. We also observe that 13% and 25% of the "Other" category are classified as cell phones and explosives, respectively, because of their shape similarity (e.g., wallets and keychains). Overall, C has an average prediction accuracy of ~90%. If, instead of the 98% confidence threshold, we use the highest output probability to predict the labels, we still find an average prediction accuracy of ~88%, indicating that our model does not overfit to any particular category.
We also observe in Table VI that the binary classification is more accurate, which is expected since there are only two class labels. Still, we get 6% false positives (non-suspicious items classified as suspicious), mostly due to misclassifications in the "Other" category. However, the false-negative rate in our test samples is low (1.75%), which makes MilliGAN promising for security applications.
Finally,
While traditional SAR fails to generate any interpretable results in either partially or fully occluded scenes, MilliGAN clearly shows sharp images with discriminating features, even for scenes it has never encountered during training. These results demonstrate that MilliGAN generalizes well under real conditions with different background noise and movements in the environment.
While certain embodiments of the disclosed subject matter have been described using specific terms, such description is for illustrative purposes only, and it is to be understood that changes and variations may be made without departing from the spirit or scope of the subject matter. In particular, this written description uses examples to disclose the presently disclosed subject matter, including the best mode, and also to enable any person skilled in the art to practice the presently disclosed subject matter, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the presently disclosed subject matter is defined by the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they include structural and/or step elements that do not differ from the literal language of the claims, or if they include equivalent structural and/or step elements with insubstantial differences from the literal language of the claims.
The present application claims the benefit of priority of U.S. Provisional Patent Application No. 63/192,345, titled Human-Perceptible and Machine-Readable Shape Generation and Classification of Hidden Objects, filed May 24, 2021; and claims the benefit of priority of U.S. Provisional Patent Application No. 63/303,805, titled Human-Perceptible and Machine-Readable Shape Generation and Classification of Hidden Objects, filed Jan. 27, 2022, both of which are fully incorporated herein by reference for all purposes.