The camera is a technology that has existed in some form for over a thousand years, with the first permanent photograph produced in 1827, and its features and capabilities have improved dramatically even within the last decade (Grepstad, 2006). Currently, the average resolution of an off-the-shelf camera is roughly 10 megapixels; resolution determines how much detail can be captured in an image (Loebich, 2007). For the development of a social distancing alerting system for schools and social gatherings, we propose the use of an AI-enabled camera paired with a thermal camera that can measure students' temperatures and verify that they are keeping a safe distance from one another by analyzing real-time video streams from both cameras. The detector highlights, in red, students whose temperature is above normal body temperature or whose separation is below the minimum acceptable distance, and draws a line between violating pairs to emphasize this. The system can also issue an alert reminding students to keep a safe distance whenever the protocol is violated.
We seek to develop a Social Distancing and Contact Mapping Alerting device that includes an AI-enabled smart camera and a thermal camera to facilitate personnel identification, body temperature measurement, contact tracing, and community mapping. Ultimately, this device can be shipped to schools, set up remotely in classrooms, hallways, dorms, etc., and, with the help of our software and apps, used to monitor students remotely as schools reopen across the United States. Our artificial intelligence software, in conjunction with a targeted community map of the surveilled area, will identify which communities within the school campus need to be prioritized and will use the data collected by our devices to report to health departments. What is unique in our remote monitoring kit is our smart camera, which uses computer vision, machine learning, and AI algorithms to determine whether students are maintaining the social distancing protocol. We also capitalize on smart devices such as smartphones, which the AI system can alert via text message or automated phone call.
Deep learning and machine learning are two related but distinct forms of AI. Machine learning trains an algorithm by feeding it large amounts of data so that it can adjust itself and improve its performance. Deep learning is a more complex form of machine learning based on artificial neural networks (ANNs), which mimic the structure of the human brain. Each ANN has distinct layers (each layer picks out a specific feature, such as a curve or edge in an image) with connections to other neurons; the more layers, the deeper the learning. Aside from the different types of machine learning used for AI in video surveillance, there are also different avenues of deployment, including on the edge (i.e., the camera) or the backend (i.e., a server), and on the physical network or through the cloud. Here we deploy on the camera itself, which monitors students in the classroom.
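To make the layered structure concrete, the sketch below defines a minimal convolutional network in PyTorch. The architecture, layer sizes, and two-class output are illustrative assumptions for exposition, not the network our system actually runs.

```python
import torch
import torch.nn as nn

# A minimal illustrative CNN: early layers pick out low-level features
# (edges, curves); deeper layers combine them into higher-level concepts.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level edges/curves
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # combines into shapes/parts
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 56 * 56, 2),  # e.g., "person" vs. "no person" (assumed)
)

# A single 224x224 RGB frame passes through all layers in order.
frame = torch.randn(1, 3, 224, 224)
logits = model(frame)  # shape: (1, 2)
```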
Machine learning in artificial intelligence has many supervised and unsupervised algorithms that use distance metrics to find patterns in the input data. Choosing a good distance metric improves how well a classification or clustering algorithm performs. A distance metric employs a distance function that tells us the distance between elements in the dataset. The Manhattan distance metric is the most appropriate for our model because we want to calculate the distance between two points along a grid-like path, where every data point has a set of numerical Cartesian coordinates that uniquely specify that point. These coordinates are signed distances from the point to two fixed, mutually perpendicular axes.
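As a concrete illustration, here is a minimal sketch of the Manhattan (L1) distance between two points; the coordinates are made-up values for demonstration only.

```python
def manhattan_distance(p, q):
    """Manhattan (L1) distance: the sum of absolute coordinate
    differences, i.e. the length of a grid-like path between p and q."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Two hypothetical student positions in bird's-eye-view pixels.
p, q = (120, 45), (80, 95)
print(manhattan_distance(p, q))  # |120 - 80| + |45 - 95| = 90
```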
Our system's methodology consists of three main steps, namely Calibration, Detection, and Measurement, to enforce social distancing among students in classrooms, dorms, hallways, or any other social gathering.
As the input video from the camera may be taken from an arbitrary perspective view, the first step of the pipeline is computing the transform that morphs the perspective view into a bird's-eye (top-down) view. We term this process calibration. Since the input frames are monocular (taken from a single camera), the simplest calibration method is to select four points in the perspective view and map them to the corners of a rectangle in the bird's-eye view; this assumes that every person is standing on the same flat ground plane. From this mapping, we can derive a transformation that can be applied to the entire perspective image. This method, while well known, can be tricky to apply correctly, so we have built a lightweight tool that enables even non-technical users to calibrate the system in real time. During the calibration step, we also estimate the scale factor of the bird's-eye view, i.e., how many pixels correspond to 6 feet in real life.
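A minimal sketch of the four-point calibration using OpenCV follows. The pixel coordinates, the image filename, and the 6-foot reference are illustrative assumptions; in practice these come from our calibration tool.

```python
import cv2
import numpy as np

# Four hand-picked ground-plane points in the perspective view
# (e.g., corners of a known rectangle on the classroom floor; assumed values).
src = np.float32([[420, 610], [880, 600], [1010, 900], [300, 910]])

# Where those points should land in the bird's-eye (top-down) view.
dst = np.float32([[0, 0], [400, 0], [400, 400], [0, 400]])

# Homography that warps the whole perspective image to top-down.
M = cv2.getPerspectiveTransform(src, dst)

frame = cv2.imread("classroom.jpg")  # hypothetical input frame
birds_eye = cv2.warpPerspective(frame, M, (400, 400))

# Scale factor: if the rectangle's 400-pixel side is known to span
# 6 feet on the real floor, then 400 px / 6 ft ≈ 66.7 px per foot.
pixels_per_foot = 400 / 6.0
```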
The second step of the pipeline applies a human detector to the perspective view to draw a bounding box around each student. For simplicity, we use an open-source human detection network based on the Faster R-CNN architecture. To clean up the output bounding boxes, we apply minimal post-processing such as non-maximum suppression (NMS) and various rule-based heuristics. We choose rules that are grounded in real-life assumptions (such as humans being taller than they are wide) to minimize the risk of overfitting.
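The sketch below shows one way to run an off-the-shelf Faster R-CNN person detector from torchvision, followed by NMS and a simple aspect-ratio heuristic. The confidence threshold and the taller-than-wide rule are illustrative assumptions, not our tuned values.

```python
import torch
import torchvision
from torchvision.ops import nms

# Pretrained Faster R-CNN detector; in the COCO label map, class 1 is "person".
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_people(frame_tensor, score_thresh=0.7):
    """frame_tensor: (3, H, W) float tensor with values in [0, 1]."""
    with torch.no_grad():
        out = model([frame_tensor])[0]
    keep = (out["labels"] == 1) & (out["scores"] > score_thresh)
    boxes, scores = out["boxes"][keep], out["scores"][keep]
    # Non-max suppression removes duplicate boxes on the same person.
    boxes = boxes[nms(boxes, scores, iou_threshold=0.5)]
    # Rule-based heuristic: people are taller than they are wide,
    # so drop any box that is wider than it is tall.
    w, h = boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1]
    return boxes[h > w]
```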
Given the bounding box for each person, we estimate their (x, y) location in the bird's-eye view. Since the calibration step outputs a transformation for the ground plane, we apply that transformation to the bottom-center point of each person's bounding box, yielding their position in the bird's-eye view. The last step is to compute the bird's-eye-view distance between every pair of people and scale the distances by the scaling factor estimated during calibration. We highlight people whose distance is below the minimum acceptable distance in red and draw a line between them to emphasize this.
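Putting the steps together, here is a minimal sketch of the measurement stage. The homography M and scale pixels_per_foot are assumed to come from the calibration sketch above, the boxes from the detection step, and the 6-foot threshold is the example value used earlier.

```python
import cv2
import numpy as np
from itertools import combinations

def find_violations(boxes, M, pixels_per_foot, min_feet=6.0):
    """boxes: iterable of (x1, y1, x2, y2) from the detection step.
    Returns index pairs of people standing closer than min_feet apart."""
    # The bottom-center of each box is the person's ground-plane point.
    pts = np.float32([[(x1 + x2) / 2.0, y2] for x1, y1, x2, y2 in boxes])
    # Map each ground point into the bird's-eye view via the homography.
    top_down = cv2.perspectiveTransform(pts.reshape(-1, 1, 2), M).reshape(-1, 2)
    violations = []
    for i, j in combinations(range(len(top_down)), 2):
        # Manhattan (L1) distance in bird's-eye pixels, per our metric choice.
        d_px = np.abs(top_down[i] - top_down[j]).sum()
        if d_px / pixels_per_foot < min_feet:
            violations.append((i, j))  # pair to highlight in red
    return violations
```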
This product has the following components:
The proposed system, consisting of AI cameras, hardware, and software, works in an integrated form. An edge device runs multiple neural networks in parallel for applications such as image classification, object detection, segmentation, and speech processing, all in an easy-to-use platform that draws as little as 5 watts. Existing hardware infrastructure can be reused: the classroom camera connects to an edge device such as an NVIDIA Jetson Nano or a Google Coral Dev Board to monitor social distancing. Smart Distancing, an open-source application, can be used to quantify social distancing measures with edge computer vision systems. Docker must first be installed on the device, after which the application can run inside a container on edge devices such as NVIDIA's Jetson Nano or Google's Coral Edge TPU. It measures inter-person distances and issues a notification each time someone ignores the social distancing rules. By generating and analyzing data, this solution outputs statistics about the communities that are at high risk of exposure to COVID-19 or any other contagious virus. Since all computation runs on the device, the system requires minimal setup and minimizes privacy and security concerns. This model takes advantage of existing hardware infrastructure and state-of-the-art embedded edge devices, eliminating the need for investment in IT cloud infrastructure.
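As a deployment illustration, the sketch below launches a container with the Docker SDK for Python. The image name, device path, and port are hypothetical placeholders; the actual Smart Distancing image and run options should be taken from that project's documentation.

```python
import docker

client = docker.from_env()

# Hypothetical image name; substitute the Smart Distancing image
# built for your edge device (Jetson Nano, Coral Dev Board, etc.).
container = client.containers.run(
    "example/smart-distancing:latest",
    detach=True,
    runtime="nvidia",                     # expose the Jetson's GPU (assumed)
    devices=["/dev/video0:/dev/video0"],  # the classroom camera (assumed path)
    ports={"8000/tcp": 8000},             # hypothetical dashboard port
)
print(container.status)
```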