This invention relates generally to climate control units, and more particularly to controlling air conditioner (AC) units according to the locations of objects (people) in an environment using a camera.
In the prior art, various techniques have been used to improve the performance of climate control units, such as air conditioner (AC) units or heating units.
3D sensors have been used to obtain 3D location information. 2D cameras have also been used, but not for estimating 3D locations. 2D sensors other than cameras, such as motion sensors, have not been used to obtain 3D locations.
U.S. Pat. No. 6,645,066, “Space-Conditioner Control Employing Image-Based Detection of Occupancy and Use,” uses a conventional 2D camera to detect an occupancy rate, an occupant activity rate, and an occupant activity class. That system only counts people; it does not determine the locations of the people in the environment.
U.S. Pat. App. Pub. No. US 200910193, “Person Location Detection Apparatus and Air Conditioner,” uses a time-of-flight (TOF) 3D sensor to determine 3D locations of people in an environment. The publication describes the TOF sensor and a method for detecting a person at a location, given a time sequence of depth maps, in order to control an AC unit.
In U.S. Pat. No. 5,634,846, “Object Detector for Air Conditioner,” motion detection is performed with an infrared (IR) sensor with a Fresnel lens. The system detects the amount of motion in different zones in a field of view of the sensor, which provides very rough information about the 2D locations of people.
Japanese Pat. JP02197747 uses a thermal IR camera to detect people and determine their 2D locations to control an air flow from an AC unit. 3D locations are not described.
The embodiments of the invention provide a method and system for controlling climate control units, such as air conditioner (AC) or heating units. The method takes input from a 2D monocular camera to determine 3D locations of objects in an environment to be climatically controlled.
As an advantage, a 2D monocular camera is inexpensive compared with 3D sensors, has better resolution than other types of 2D sensors, and can have a relatively high frame rate, which enables real-time object tracking.
The embodiments can not only count objects but also locate and track them.
Using a time-of-flight (TOF) sensor or other 3D sensors makes location determination simpler, but such sensors are generally more expensive than a 2D monocular camera.
Instead of obtaining rough 2D locations, we perform accurate 3D tracking.
As shown in
The environment can include a set of objects 102, e.g., people, animals, perishable goods, etc. The objects can move. The set can be empty, i.e., the environment contains no objects.
The output of the camera is connected to a processor 120. The output can be in the form of a sequence of one or more images, such as video frames. A control signal, which depends on the locations of the objects in the environment, is fed back to one or more climate control units 130. In some embodiments, the camera is incorporated into the climate control unit(s) 130.
As shown in
If the environment includes multiple climate control units, for example, in an office space in which warm or cold air can be directed at every desk, then the local environments can be individually controlled.
As shown in
The background model is a per-pixel mixture of one or more Gaussian distributions that estimates the distribution of background intensities at each pixel of the image. The intensities can be represented in a color space, e.g., as grayscale values, RGB color values, or near-infrared intensities.
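The per-pixel mixture model can be sketched as follows. This is a minimal illustration, not the method of the embodiments: it assumes grayscale frames, uses a simplified update rule in the style of the widely used Stauffer-Grimson adaptive mixture model, and the class name PixelGMMBackground and the parameters k, alpha, init_var, min_var and match_sigmas are hypothetical choices made for the example.

```python
import numpy as np


class PixelGMMBackground:
    """Per-pixel mixture of K Gaussians over grayscale intensities.

    Simplified Stauffer-Grimson-style update; k, alpha, init_var, min_var and
    match_sigmas are illustrative assumptions, not values from the source.
    """

    def __init__(self, first_frame, k=3, alpha=0.05, init_var=225.0,
                 min_var=4.0, match_sigmas=2.5):
        h, w = first_frame.shape
        self.k, self.alpha = k, alpha
        self.init_var, self.min_var, self.match_sigmas = init_var, min_var, match_sigmas
        self.mean = np.zeros((h, w, k))
        self.mean[..., 0] = first_frame          # seed one component per pixel
        self.var = np.full((h, w, k), init_var)
        self.weight = np.zeros((h, w, k))
        self.weight[..., 0] = 1.0

    def likelihood(self, frame):
        """Background likelihood p(x) = sum_k w_k N(x; mu_k, var_k) at each pixel."""
        x = frame[..., None].astype(float)
        gauss = np.exp(-0.5 * (x - self.mean) ** 2 / self.var) / np.sqrt(2 * np.pi * self.var)
        return (self.weight * gauss).sum(axis=-1)

    def update(self, frame):
        """Fold one new grayscale frame into the per-pixel mixtures."""
        a = self.alpha
        x = frame[..., None].astype(float)
        diff = x - self.mean
        # Distance in standard deviations; unused (zero-weight) components never match.
        dist = np.where(self.weight > 0, np.abs(diff) / np.sqrt(self.var), np.inf)
        best = dist.argmin(axis=-1)
        matched = np.take_along_axis(dist, best[..., None], axis=-1)[..., 0] < self.match_sigmas
        hit = np.eye(self.k, dtype=bool)[best] & matched[..., None]
        # The matched component gains weight and moves toward x; the others decay.
        self.weight = (1 - a) * self.weight + a * hit
        self.mean = np.where(hit, self.mean + a * diff, self.mean)
        self.var = np.where(hit, np.maximum((1 - a) * self.var + a * diff ** 2, self.min_var), self.var)
        # Where nothing matched, replace the weakest component with one centred on x.
        weakest = np.eye(self.k, dtype=bool)[self.weight.argmin(axis=-1)] & ~matched[..., None]
        self.mean = np.where(weakest, x, self.mean)
        self.var = np.where(weakest, self.init_var, self.var)
        self.weight = np.where(weakest, a, self.weight)
        self.weight /= self.weight.sum(axis=-1, keepdims=True)
```

In use, a pixel whose likelihood falls below a chosen threshold (for example, bg.likelihood(frame) < 1e-3, an assumed value) would be treated as foreground, and update(frame) would then be called so that the model follows slow background changes.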
During operation of the system, a foreground model 211 is also constructed 220 for each person in the environment from a sequence of images 202.
Each foreground model can be a histogram, in a color space, of all pixels in the foreground region, or it can be a mixture of Gaussian distributions over all pixels in the foreground region.
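A histogram foreground model can be illustrated with a short sketch. It assumes 8-bit RGB images and a boolean mask marking the person's foreground region; the helper names color_histogram and back_project and the bin count are hypothetical. Back-projection looks up each pixel's color in the normalized histogram, giving a per-pixel score of how well that pixel fits the person's appearance.

```python
import numpy as np


def color_histogram(image, mask, bins=8):
    """Normalized bins x bins x bins RGB histogram of the pixels where mask is True."""
    pixels = image[mask].reshape(-1, 3).astype(int)
    idx = pixels * bins // 256                      # quantize 0..255 into bin indices
    hist = np.zeros((bins, bins, bins))
    np.add.at(hist, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    return hist / max(hist.sum(), 1)


def back_project(image, hist):
    """Per-pixel probability of each pixel's color under the foreground histogram."""
    bins = hist.shape[0]
    idx = image.astype(int) * bins // 256
    return hist[idx[..., 0], idx[..., 1], idx[..., 2]]
```

Coarse bins make the model tolerant to small color shifts at the cost of discriminating less between similar colors; the bin count here is only an example.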
Alternatively, the foreground model can be a template. A template is typically a region of an image that covers the foreground object.
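When the foreground model is a template, the object can be relocated in a new image by searching for the window that best matches the template. The sketch below uses an exhaustive sum-of-squared-differences (SSD) search over a grayscale image; SSD is one common matching score chosen here for illustration, and match_template_ssd is a hypothetical helper. It relies on numpy.lib.stride_tricks.sliding_window_view, available in NumPy 1.20 and later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def match_template_ssd(image, template):
    """Top-left (row, col) of the window of `image` with the smallest sum of
    squared differences to `template` (both 2D grayscale arrays)."""
    windows = sliding_window_view(image.astype(float), template.shape)
    ssd = ((windows - template.astype(float)) ** 2).sum(axis=(-2, -1))
    row, col = np.unravel_index(np.argmin(ssd), ssd.shape)
    return int(row), int(col)
```

In a tracker, the search would usually be restricted to a neighbourhood of the predicted location rather than the whole image, which keeps the cost low at video frame rates.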
Pixels associated with foreground objects such as people will have a low probability of being classified as background, because they do not correspond to the background model.
The models are used to identify 230 regions of pixels in the images 202 that are likely to be associated with people. The background and foreground models can be updated dynamically as new images are acquired, which can improve the accuracy of the system when the appearance of the background or foreground changes, for example, due to changes in lighting, moved furniture, or changes in a person's pose.
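One common way to turn a per-pixel foreground mask into candidate person regions is connected-component labelling. The sketch below assumes a boolean mask (for example, pixels whose background likelihood is below a threshold) and an assumed minimum area that suppresses isolated noise pixels; foreground_regions is a hypothetical helper using 4-connectivity and a breadth-first search.

```python
from collections import deque

import numpy as np


def foreground_regions(mask, min_area=20):
    """4-connected components of a boolean foreground mask.

    Returns (row_min, col_min, row_max, col_max, area) for each component whose
    area is at least min_area (an assumed noise threshold), in row-major order.
    """
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    regions = []
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0, c0] or seen[r0, c0]:
                continue
            queue = deque([(r0, c0)])
            seen[r0, c0] = True
            rows, cols = [], []
            while queue:                      # flood-fill one component
                r, c = queue.popleft()
                rows.append(r)
                cols.append(c)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < h and 0 <= nc < w and mask[nr, nc] and not seen[nr, nc]:
                        seen[nr, nc] = True
                        queue.append((nr, nc))
            if len(rows) >= min_area:
                regions.append((min(rows), min(cols), max(rows), max(cols), len(rows)))
    return regions
```

Each returned bounding box is a candidate region that the foreground models and the tracker can then associate with a particular person.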
The 2D location of each person is tracked 240 using the background and foreground models. The sequence of locations of a person over time is called a track 241. The track is used to estimate the location of the person in the next image.
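A track and its use for prediction can be sketched as below. This is a deliberately simple stand-in, assuming each image's person regions have already been reduced to 2D centroids: it predicts the next location with a constant-velocity model and associates detections greedily to the nearest prediction within an assumed gating distance. The names Track and update_tracks and the gate value are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Track:
    points: list = field(default_factory=list)  # (x, y) image locations over time

    def predict(self):
        """Constant-velocity estimate of the location in the next image."""
        if len(self.points) < 2:
            return self.points[-1]
        (x1, y1), (x2, y2) = self.points[-2], self.points[-1]
        return (2 * x2 - x1, 2 * y2 - y1)


def update_tracks(tracks, detections, gate=30.0):
    """Greedily extend each track with the nearest detection within `gate`
    pixels of its prediction; unmatched detections start new tracks."""
    unused = list(detections)
    for track in tracks:
        if not unused:
            break
        px, py = track.predict()
        best = min(unused, key=lambda d: math.hypot(d[0] - px, d[1] - py))
        if math.hypot(best[0] - px, best[1] - py) <= gate:
            track.points.append(best)
            unused.remove(best)
    for d in unused:
        tracks.append(Track([d]))
    return tracks
```

A practical tracker would also retire tracks that go unmatched for several images and might use a Kalman filter instead of the two-point velocity estimate.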
Using the 2D location of a person and other information inferred from the image sequence, the depth of the person is determined 250, which enables the person's 3D location 251 to be estimated.
In one embodiment, the inferences are made by a head detector or a head-and-shoulders detector. The inferences can be used to verify whether a tracked object is a person, and also to determine the 2D location and 2D size of the head. By assuming that the 3D head sizes of people are substantially similar, the depth can be determined 250 from the 2D size of the head. Combining the estimate of the depth (i.e., the distance from the camera) with the 2D location information yields the estimated 3D location 251 of the person. The number and 3D locations of the people in the environment are then used to improve the control of the climate control unit(s).
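Under a pinhole camera model, an object of real size S that spans s pixels in an image taken with focal length f (in pixels) lies at depth Z = f S / s, and a pixel (u, v) at depth Z back-projects to X = (u - cx) Z / f, Y = (v - cy) Z / f, where (cx, cy) is the principal point. The sketch below assumes a calibrated camera with square pixels and an assumed average head height of roughly 0.23 m; the function names are hypothetical. The same relation applies to any object of approximately known size, such as a bounding box of known real height.

```python
def depth_from_size(size_px, focal_px, size_m):
    """Pinhole relation Z = f * S / s: an object of real size S metres that spans
    s pixels lies at depth Z metres from a camera with focal length f pixels."""
    if size_px <= 0:
        raise ValueError("size_px must be positive")
    return focal_px * size_m / size_px


def pixel_to_3d(u, v, depth, focal_px, cx, cy):
    """3D point (X, Y, Z) in camera coordinates for pixel (u, v) at the given depth."""
    return ((u - cx) * depth / focal_px, (v - cy) * depth / focal_px, depth)
```

For example, a head 46 pixels tall seen with f = 500 pixels gives Z = 500 * 0.23 / 46 = 2.5 m under the assumed head height; errors in that assumption scale the depth estimate proportionally.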
In an alternative embodiment, to find the depth of each person, a 3D ground plane of the environment is automatically estimated from one or more images in the sequence 202. The 2D location of the person's feet is estimated from the track. Because the ground plane is known, the 2D image location of a point on that plane is sufficient to determine the point's distance from the camera; thus, by assuming that the person's feet rest on the ground plane, we obtain the depth 250 and hence the 3D location 251.
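The ground-plane computation amounts to intersecting the viewing ray through the feet pixel with the plane. The sketch below assumes a calibrated pinhole camera with coordinates x right, y down and z forward, and represents the estimated ground plane as n . X = d in camera coordinates. The helper plane_from_height_and_pitch is a hypothetical convenience for the special case of a camera mounted at a known height and tilted down by a known angle; feet_to_3d is likewise a hypothetical name.

```python
import numpy as np


def plane_from_height_and_pitch(height_m, pitch_rad):
    """Ground plane n . X = d in camera coordinates (x right, y down, z forward)
    for a camera height_m above the floor, tilted down by pitch_rad."""
    normal = np.array([0.0, np.cos(pitch_rad), np.sin(pitch_rad)])  # world "down" in camera axes
    return normal, height_m


def feet_to_3d(u, v, focal_px, cx, cy, normal, offset):
    """Intersect the viewing ray through the feet pixel (u, v) with the ground
    plane n . X = offset; returns the 3D point in camera coordinates."""
    ray = np.array([(u - cx) / focal_px, (v - cy) / focal_px, 1.0])
    denom = float(np.dot(normal, ray))
    if denom <= 0:
        raise ValueError("ray does not meet the ground plane in front of the camera")
    t = offset / denom    # the ray has z = 1, so t is also the depth Z
    return t * ray
```

For a level camera 1 m above the floor with f = 500 pixels, feet imaged 100 pixels below the principal point lie at depth 1 / (100 / 500) = 5 m.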
Other shape characteristics of known objects can also be used to determine the depth. For example, the shape can be represented by a bounding box, and the depth can be estimated from the size of the bounding box.
The 3D location is processed by a controller 260, which can be part of the processor, to generate control signals for the unit(s) 130.
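The control policy itself is not specified above, so the following is only a hypothetical illustration of how a controller could map 3D locations to per-unit signals when several units serve local environments. It assumes the 3D locations have already been transformed into a room coordinate frame shared with the units, that each unit faces the +z direction, and that a unit should run only when someone is within an assumed reach; the names UnitCommand and control_signals, the reach value, and the louver-angle convention are all invented for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class UnitCommand:
    unit_id: int
    on: bool
    louver_deg: float  # horizontal aim; 0 = straight ahead along +z (assumed convention)


def control_signals(unit_positions, people, reach_m=4.0):
    """unit_positions: (x, z) floor positions of units, each assumed to face +z.
    people: (x, y, z) 3D locations in the same room coordinates."""
    commands = []
    for uid, (ux, uz) in enumerate(unit_positions):
        nearby = [(px, pz) for px, _, pz in people if math.hypot(px - ux, pz - uz) <= reach_m]
        if not nearby:
            commands.append(UnitCommand(uid, False, 0.0))  # nobody in this unit's zone
            continue
        # Aim the air flow at the centroid of the people this unit serves.
        mx = sum(p[0] for p in nearby) / len(nearby)
        mz = sum(p[1] for p in nearby) / len(nearby)
        commands.append(UnitCommand(uid, True, math.degrees(math.atan2(mx - ux, mz - uz))))
    return commands
```

Aiming at the centroid is one simple choice; a real controller might instead split the air flow among occupants or adjust set points and fan speed according to the number of people.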
Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications may be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.