The present disclosure relates to a system and method for camera registration in a three-dimensional (3-D) geometry model.
The surveillance and monitoring of a building, a facility, a campus, or other area can be accomplished via the placement of a variety of cameras throughout the building, the facility, the campus, or the area. However, in the current state of the art, it is difficult to determine the most efficient and economical uses of the camera resources at hand.
With the development of geometry technology, high-fidelity three-dimensional (3-D) geometry models of buildings, such as a Building Information Model (BIM) or an Industry Foundation Classes (IFC) model, and their data are becoming increasingly popular. Such technology makes it possible to display cameras and their coverage areas in a 3-D geometry model (hereinafter used interchangeably with “3-D model” or “3-D building model”). Three-dimensional (3-D) solutions are more intuitive and more usable for a video surveillance application than two-dimensional (2-D) solutions. This is because 3-D solutions, for example, better visualize occlusions by objects in the field of view (FOV) of a camera, such as a surveillance camera installed in or around a building or any other area. Applicants have realized that with camera parameters, such as a location (or position) (x, y, z) or an orientation (pan, tilt, zoom), in a 3-D model, it is possible to further enhance situation awareness, for example, by integrating a real video from the surveillance camera into the 3-D model.
However, automatic registration of a camera in the 3-D model is a difficult task. For example, it is difficult to place, for a real camera installed in or around a physical building or other area, a virtual camera simulating the real camera in a 3-D model view (or scene). Although the camera parameters may be imported from a camera planning system, there is almost always some offset between the camera parameters as planned and recorded in the camera planning system and the camera parameters as currently installed in a relevant location. For example, the installation of the camera may not be as precise as originally planned or, in some cases, the parameters of the camera, position or orientation, may need to be changed later in order to provide a better surveillance view.
Consequently, if the camera parameters imported from the planning system are used directly to obtain an image (e.g., video) of the coverage area of the (real) surveillance camera and then to map that image (which captures an actual view or scene) into the 3-D model, the image will not display the view or scene as precisely as a user (e.g., administrator of a camera system) originally intended, giving the user an inconsistent description of the real situation. This lowers user awareness in many security-related applications, such as a surveillance camera system. Thus, Applicants have realized that it is beneficial to adjust the virtual camera from the initial position or orientation based on the camera planning system to a refined position or orientation that reflects the actual position or orientation of the real camera more closely. This allows, for example, providing better situation awareness of the coverage area of the real camera in a 3-D environment (hereinafter used interchangeably with “3-D virtual environment”) provided by the 3-D model.
Conventional camera parameter registration is not only complex but also inaccurate because each reference point needs to be manually specified from a relevant 2-D image (e.g., 2-D video). In other words, the conventional system or method requires a user (e.g., system administrator) to micro-adjust the parameters of the virtual camera to match the coverage area of the real camera substantially precisely. For example, each pixel in a 2-D video represents all points along a line from the camera, so camera registration technology is often used. However, traditional camera registration requires an auxiliary device, such as a planar (2-D) checkerboard.
Even if it is acceptable to place the auxiliary device in the environment, obtaining the geometry data of the device becomes another difficult problem. Registering the camera using only the video scene itself is called “the self camera registration problem” and remains an open problem in the art. Applicants have also realized that to solve these problems, it is beneficial to use semantic data from the 3-D model, such as a building information model (BIM), to automate camera registration of the virtual camera in the 3-D model.
Some embodiments described herein may comprise a system, apparatus and method of automating camera registration, and/or video integration in a 3D environment, using BIM semantic data and a real video. In various embodiments, the real image (e.g., video) may be imported from a real camera (e.g., surveillance camera) physically installed in or around a building or other outdoor area (e.g., park or street). Rough (or initial) camera parameters, such as a position or orientation, of the real camera may also be obtained. The rough (or initial) parameters may be provided by a user, such as a system administrator, or automatically imported from a relevant system, such as a camera planning system as described in the cross-referenced application entitled “SYSTEM AND METHOD FOR AUTOMATIC CAMERA PLACEMENT.”
In various embodiments, as illustrated in
Camera registration may then be performed, for example, using mapping information between the virtual image and the real image. To perform the mapping, feature points may be extracted from the virtual image and the real image, respectively, and then one or more of the extracted feature points may be matched to each other. Then, at least one pair of the matched points may be selected and marked as a matching pair.
In various embodiments, one or more points (or vertices) associated with at least one of the features in the virtual image may be mapped to a corresponding point (or vertex) associated with a matching feature in the real image, detecting one or more pairs of matching points. The mapping may be performed manually (i.e., as a function of one or more user inputs), automatically or by a combination thereof (e.g., heuristic basis).
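The automatic part of the point mapping described above could be prototyped as a mutual nearest-neighbour search. The following is only a minimal sketch under stated assumptions, not the disclosed matching algorithm: the helper name `match_points` and the `max_dist` threshold are hypothetical, and the sketch assumes the virtual image was rendered from rough camera parameters that are close enough to the real ones for pixel distance to be a meaningful similarity measure.

```python
import math

def match_points(virtual_pts, real_pts, max_dist=20.0):
    """Return (virtual_index, real_index) pairs that are mutual nearest neighbours.

    Hypothetical helper: both lists hold (u, v) pixel coordinates of feature
    vertices; max_dist is an assumed rejection threshold in pixels.
    """
    def nearest(p, candidates):
        # Index of the candidate closest to p, and the distance to it.
        best = min(range(len(candidates)), key=lambda j: math.dist(p, candidates[j]))
        return best, math.dist(p, candidates[best])

    pairs = []
    if not virtual_pts or not real_pts:
        return pairs
    for i, p in enumerate(virtual_pts):
        j, d = nearest(p, real_pts)
        back, _ = nearest(real_pts[j], virtual_pts)
        # Keep the pair only if each point is the other's nearest neighbour
        # and the two points are reasonably close.
        if back == i and d <= max_dist:
            pairs.append((i, j))
    return pairs

virtual = [(100.0, 50.0), (300.0, 50.0), (300.0, 400.0)]
real = [(306.0, 47.0), (104.0, 55.0), (600.0, 600.0)]
pairs = match_points(virtual, real)
```

Pairs suggested this way would still be candidates only; a user could confirm or remove them, in line with the manual and combined mapping modes mentioned above.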
In various embodiments, semantic information in a BIM model may be used to extract the features. For example, for a door in the field of view, the entire boundary of the door in the virtual image may be detected concurrently instead of detecting each edge at a time, using relevant semantic BIM data. Similarly, for a column, parallel lines may be detected concurrently instead of detecting its edges one by one. These geometric features may be automatically presented in the virtual image. The matching pair features in the real image from the real camera may be automatically selected using the semantic features.
Various algorithms may be used to match features between a virtual image and a corresponding real image. In various embodiments, as described in
As illustrated in
As illustrated in
Then, as illustrated in
Camera calibration may be further performed using the pairs of matching vertices (or points), computing refined (or substantially precise) camera parameters for the virtual camera as a function of a camera registration algorithm. Using the camera calibration process, a position and orientation may be calculated for the virtual camera placed in the 3D environment that are the same as the position and orientation of the corresponding camera in the real world.
Once the mapping process described above is completed, then two groups of matching points in the virtual image and the real image may be obtained:
In various embodiments, the following procedures and formulas may be used to perform the camera calibration by the corresponding points in the real image and 3D environment:
Step 1: find a 3×4 matrix $M$ that satisfies $P_{r,i} = M\,P_{3d,i}$ $(i = 1, 2, \ldots, n)$
With
We can get:
$$m_{11}X + m_{12}Y + m_{13}Z + m_{14} - m_{31}uX - m_{32}uY - m_{33}uZ - m_{34}u = 0$$
$$m_{21}X + m_{22}Y + m_{23}Z + m_{24} - m_{31}vX - m_{32}vY - m_{33}vZ - m_{34}v = 0$$
Then, the following equation can be formed: AL=0, in which A is a 2n×12 matrix, wherein:
Now, the proper L which minimizes ∥AL∥ may be calculated.
With the constraint $m_{34} = 1$, we can get $L' = -(C^{T}C)^{-1}C^{T}B$, wherein:
$L' = [l_1, l_2, \ldots, l_{11}]$;
$C = [a_1, a_2, \ldots, a_{11}]$;
$B = [a_{12}]$.
Further, M can be obtained from L.
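Step 1 can be illustrated with a small pure-Python sketch. It builds the columns a1 to a11 (matrix C) and a12 (vector B) from the two linear equations per point pair, fixes m34 = 1, and solves the normal equations for L′ = −(CᵀC)⁻¹CᵀB. The function names, the Gaussian-elimination solver, and the synthetic test camera are assumptions made for illustration; a production system would more likely rely on a numerical library and normalized coordinates.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def estimate_projection(points_3d, points_2d):
    """Estimate the 3x4 matrix M (with m34 = 1) from at least 6 point pairs."""
    c_rows, b = [], []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        # Coefficients of m11..m33 in the u-equation; a12 = -u multiplies m34.
        c_rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        b.append(-u)
        # Coefficients of m11..m33 in the v-equation; a12 = -v multiplies m34.
        c_rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        b.append(-v)
    ctc = [[sum(r[i] * r[j] for r in c_rows) for j in range(11)] for i in range(11)]
    ctb = [sum(r[i] * bi for r, bi in zip(c_rows, b)) for i in range(11)]
    l = solve_linear(ctc, [-x for x in ctb])  # L' = -(C^T C)^-1 C^T B
    l.append(1.0)                             # the constraint m34 = 1
    return [l[0:4], l[4:8], l[8:12]]

def project(m, point):
    """Project a 3-D point with M and return pixel coordinates (u, v)."""
    X, Y, Z = point
    h = [row[0] * X + row[1] * Y + row[2] * Z + row[3] for row in m]
    return (h[0] / h[2], h[1] / h[2])

# Synthetic check with an assumed camera matrix and non-coplanar points.
m_true = [[800.0, 0.0, 320.0, 100.0],
          [0.0, 800.0, 240.0, 50.0],
          [0.0, 0.0, 1.0, 1.0]]
pts3d = [(0, 0, 4), (1, 0, 5), (0, 1, 6), (1, 1, 4),
         (2, 1, 7), (1, 2, 5), (-1, 1, 6), (2, -1, 8)]
pts2d = [project(m_true, p) for p in pts3d]
m_est = estimate_projection(pts3d, pts2d)
```

At least six non-coplanar point pairs are needed because M has eleven unknowns once m34 is fixed and each pair contributes two equations.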
Step 2: extract the parameter matrices from M
For $M = KR[I_3 \mid -C]$, the left 3×3 submatrix $P$ of $M$ has the form $P = KR$, wherein:
Now, the orientation parameters of the camera can be computed from R, and the interior (intrinsic) parameters of the camera can be computed from K.
Since $KRC = -m_4$, where $m_4$ is the fourth column of $M$, C can also be computed and the position of the camera can be obtained.
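Step 2 can be sketched in the same style. The sketch below factors P = KR by running Gram-Schmidt over the rows of P from the bottom up (one common way to obtain an upper-triangular K and an orthonormal R, assumed here rather than taken from the disclosure), then recovers C from KRC = −m4 by back substitution. The function name and the example matrix are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def decompose_projection(m):
    """Split M = K R [I | -C] into K (upper triangular), R (rotation), C (centre)."""
    p = [list(row[:3]) for row in m]
    m4 = [row[3] for row in m]
    det = (p[0][0] * (p[1][1] * p[2][2] - p[1][2] * p[2][1])
           - p[0][1] * (p[1][0] * p[2][2] - p[1][2] * p[2][0])
           + p[0][2] * (p[1][0] * p[2][1] - p[1][1] * p[2][0]))
    if det < 0:
        # M is defined only up to scale; flip its sign so that det(R) = +1.
        p = [[-x for x in row] for row in p]
        m4 = [-x for x in m4]
    # Gram-Schmidt on the rows of P, bottom row first.
    k33 = math.sqrt(dot(p[2], p[2]))
    r3 = [x / k33 for x in p[2]]
    k23 = dot(p[1], r3)
    t = [x - k23 * y for x, y in zip(p[1], r3)]
    k22 = math.sqrt(dot(t, t))
    r2 = [x / k22 for x in t]
    k13, k12 = dot(p[0], r3), dot(p[0], r2)
    t = [x - k12 * y - k13 * z for x, y, z in zip(p[0], r2, r3)]
    k11 = math.sqrt(dot(t, t))
    r1 = [x / k11 for x in t]
    r = [r1, r2, r3]
    # Solve K y = -m4 by back substitution, then C = R^T y (since K R C = -m4).
    y3 = -m4[2] / k33
    y2 = (-m4[1] - k23 * y3) / k22
    y1 = (-m4[0] - k12 * y2 - k13 * y3) / k11
    y = [y1, y2, y3]
    c = [sum(r[i][j] * y[i] for i in range(3)) for j in range(3)]
    # Normalise K so that its bottom-right entry is 1.
    k = [[k11 / k33, k12 / k33, k13 / k33],
         [0.0, k22 / k33, k23 / k33],
         [0.0, 0.0, 1.0]]
    return k, r, c

# Assumed example: focal length 800, principal point (320, 240),
# a 90-degree rotation about the z axis, camera centre (1, 2, 3).
m_example = [[0.0, -800.0, 320.0, 640.0],
             [800.0, 0.0, 240.0, -1520.0],
             [0.0, 0.0, 1.0, -3.0]]
k, r, c = decompose_projection(m_example)
```

Pan and tilt angles could then be read from R under whatever angle convention the camera system uses; that convention is not specified here, so the sketch stops at the rotation matrix.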
As illustrated in
As illustrated in
In various embodiments, shadow mapping may be used to integrate the updated image (e.g., surveillance video) into the 3-D environment. Shadow mapping is one of the popular methods for computing shadows and is mainly based on pipelined 3-D rendering. In one example embodiment, shadow mapping may comprise two passes, as follows:
As illustrated in
Shadow mapping may be applied to a coverage area of a camera. In order to project a predefined texture onto a scene (a window view of an application to display the 3-D model in the 3-D environment) to show the effect of the coverage area, a process of projecting coverage texture onto the scene may be added to the above two passes, and the light for the shadow mapping may be defined as the camera.
In various embodiments, the second pass may be modified. For each pixel rendered in the second stage, if a pixel can be illuminated from the light position (that is, the pixel can be seen from the camera), the color of the pixel may be blended with the color of the coverage texture projected from the camera position based on the projection transform of the camera. Otherwise, the original color of the pixel may be preserved. The flow of implementation of display of the coverage area is illustrated in
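The per-pixel decision in the modified second pass can be pictured as a depth test followed by a blend. The sketch below is illustrative only: in practice this logic would typically run in a fragment shader, and the function name, the `alpha` blend weight, and the depth `bias` are assumptions that do not come from the disclosure.

```python
def shade_pixel(scene_color, coverage_color, depth_from_camera, shadow_map_depth,
                alpha=0.5, bias=1e-3):
    """Blend the coverage texture into a pixel only if the camera can see it.

    depth_from_camera: distance of this surface point from the camera (the "light").
    shadow_map_depth: nearest depth stored in the first-pass depth map along the
    same camera ray. Both are assumed to be expressed in the same units.
    """
    visible = depth_from_camera <= shadow_map_depth + bias
    if not visible:
        # Occluded from the camera: preserve the original colour.
        return tuple(scene_color)
    return tuple((1.0 - alpha) * s + alpha * c
                 for s, c in zip(scene_color, coverage_color))

lit = shade_pixel((1.0, 1.0, 1.0), (0.0, 1.0, 0.0), 2.0, 2.0)
occluded = shade_pixel((1.0, 1.0, 1.0), (0.0, 1.0, 0.0), 3.0, 2.0)
```

The small bias is a common guard against self-shadowing artifacts caused by limited depth precision.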
However, in some instances, as illustrated in
In various embodiments, for example, video distortion for rectangle ABCD (as illustrated in
Distortion of the video may be computed as follows:
xA=PXλ
$$D = \left\| \left( \alpha_{angle} D_{angle},\ \alpha_{ratio} D_{ratio},\ \alpha_{rotation} D_{rotation} \right) \right\|$$
Then, the perspective constraint may be enforced as follows, wherein QD is the failure threshold for mapping video into the 3D scene: according to the perspective of the current user, if D is greater than QD (i.e., D>QD), the display of the video will be removed because of serious distortion; otherwise, the video will be mapped to the 3D scene to enhance the situation awareness of the user. The flow of implementation of display of the coverage area is illustrated in
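The threshold test on D can be sketched as follows. The disclosure does not state which norm is used, so the Euclidean norm here is an assumption, as are the function names and the default weights of 1.0.

```python
import math

def distortion(d_angle, d_ratio, d_rotation,
               a_angle=1.0, a_ratio=1.0, a_rotation=1.0):
    """Combine the weighted distortion terms into a single value D.

    Assumes a Euclidean norm over the three weighted components.
    """
    return math.sqrt((a_angle * d_angle) ** 2
                     + (a_ratio * d_ratio) ** 2
                     + (a_rotation * d_rotation) ** 2)

def should_display_video(d, q_d):
    """Map the video into the 3-D scene only when D does not exceed QD."""
    return d <= q_d

d_example = distortion(3.0, 4.0, 0.0)
```

With these example values, D is 5.0, so the video would be hidden for a threshold QD of 4.0 and shown for a threshold of 5.0 or more.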
Camera drift of the real camera may be further detected in substantially real time based on discrepancy detected as a result of comparing the feature points. Once detected, a notification for the camera drift may be sent to the user and/or the real camera may be automatically adjusted using the above described camera registration methods.
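One simple way to turn the feature-point discrepancy into a drift decision is to threshold the mean pixel distance between where matched features are expected (for example, in the virtual image rendered with the registered parameters) and where they are currently observed in the real image. This is a hedged sketch: the function name, the use of the mean, and the 5-pixel default threshold are assumptions, not details from the disclosure.

```python
import math

def detect_drift(expected_pts, observed_pts, threshold_px=5.0):
    """Return (drifted, mean_discrepancy) for matched (u, v) feature points."""
    if not expected_pts or len(expected_pts) != len(observed_pts):
        raise ValueError("need two non-empty point lists of equal length")
    mean = sum(math.dist(e, o) for e, o in zip(expected_pts, observed_pts)) / len(expected_pts)
    return mean > threshold_px, mean

drifted, mean_px = detect_drift([(0.0, 0.0), (10.0, 0.0)],
                                [(3.0, 4.0), (13.0, 4.0)],
                                threshold_px=4.0)
```

A positive result could trigger the user notification or the automatic re-registration described above.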
Various embodiments described herein may comprise a system, apparatus and method of automating camera registration, and/or video integration in a 3D environment, using BIM semantic data and a real video. In the following description, numerous examples having example-specific details are set forth to provide an understanding of example embodiments. It will be evident, however, to one of ordinary skill in the art, after reading this disclosure, that the present examples may be practiced without these example-specific details, and/or with different combinations of the details than are given here. Thus, specific embodiments are given for the purpose of simplified explanation, and not limitation. Some example embodiments that incorporate these mechanisms will now be described in more detail.
The camera registration server 120 may comprise one or more central processing units (CPUs) 122, one or more memories 124, a user interface (I/F) module 130, a camera registration module 132, a rendering module 134, one or more user input devices 136, and one or more displays 140.
The camera planning server 160 may be operatively coupled with one or more cameras 162, such as surveillance cameras installed in a building or other outdoor area (e.g., street or park, etc.). The camera planning server 160 may store extrinsic parameters 166 for at least one of the one or more cameras 162 as registered at the time the at least one camera 162 is physically installed in the building or other outdoor area. Also, the camera planning server 160 may receive one or more real images 164, such as surveillance images, from a corresponding one of the one or more cameras 162 in real time and then present the received images to a user (e.g., administrator) via its one or more display devices 140 or provide the received images to another system, such as the camera registration server 120, for further processing. In one example embodiment, the camera planning server 160 may store the received image in its associated one or more memories 124 for later use.
The BIM server 170 may store BIM data 174 for a corresponding one of the building or other outdoor area. In one example embodiment, the BIM server 170 may be operatively coupled with a BIM database 172 locally or remotely, via the network 150 or other network (not shown in
In various embodiments, the camera registration server 120 may comprise one or more processors, such as the one or more CPUs 122, to operate the camera registration module 132. The camera registration module 132 may be configured to receive a real image 164 of a coverage area of a surveillance camera. The coverage area may correspond to at least one portion of a surveillance area. The camera registration module 132 may receive BIM data 174 associated with the coverage area. The camera registration module 132 may generate a virtual image based on the BIM data 174, for example, using the rendering module 134. The virtual image may include at least one three-dimensional (3-D) image that substantially corresponds to the real image 164. The camera registration module 132 may map the virtual image with the real image 164. Then, the camera registration module 132 may register the surveillance camera in a BIM coordination system using an outcome of the mapping.
In various embodiments, the camera registration module 132 may be configured to generate the virtual image based on initial extrinsic parameters 166 of the surveillance camera. The initial extrinsic parameters 166 may be parameters used at the time the surveillance camera is installed in a relevant building or area. In one example embodiment, the initial extrinsic parameters 166 may be received as one or more user inputs 138 from a user (e.g., administrator) of the camera registration server 120 via one or more of the input devices 136. In yet another example embodiment, the initial extrinsic parameters 166 may be imported from the camera planning server 160 or a camera installation system (not shown in
In various embodiments, for example, to perform the mapping between the virtual image and the real image 164, the camera registration module 132 may be configured to match a plurality of pairs of points on the virtual image and the real image 164, calculate at least one geometry coordination for a corresponding one of the points on the virtual image, and calculate refined extrinsic parameters (not shown in
In various embodiments, each of the plurality of points may comprise a vertex associated with a geometric feature extracted from a corresponding one of the virtual image or the real image 164. In one example embodiment, the geometric feature associated with the virtual image may be driven using semantic information of the BIM data 174. For example, the geometric feature may comprise a shape or at least one portion of a boundary line of an object or a building structure (e.g., door, desk or wall, etc.) viewed in a corresponding one of the virtual image or the real image 164.
In various embodiments, for example, during the matching between the virtual image and the real image 164, the camera registration module 132 may be configured to mark at least one of the plurality of pairs as matching as a function of the user input 138 received from the user (e.g., administrator), for example, via one or more of the input devices 136. The camera registration module 132 may further be configured to remove at least one pair of points from a group of automatically suggested pairs of points as a function of a corresponding user input.
In various embodiments, the camera registration module 132 may be configured to display at least a portion of the mapping process via a display unit, such as the one or more displays 140.
In various embodiments, for example, to perform the registering of the surveillance camera in the BIM coordination system, the camera registration module 132 may be configured to calculate refined extrinsic parameters of the surveillance camera using the outcome of the mapping. For example, camera registration equation as described earlier may be used to calculate the refined extrinsic parameters. The refined extrinsic parameters may include information indicating a current location and a current orientation of the surveillance camera in the BIM coordination system.
In various embodiments, the camera registration module 132 may be configured to present, via a display unit, such as the one or more displays 140, the coverage area in three-dimensional (3-D) graphics using the refined extrinsic parameters. The camera registration module 132 may be further configured to highlight the coverage area as distinguished from a non-highlighted portion of the 3-D graphics displayed via the display 140, for example, using a different color or texture or a combination thereof, etc.
In various embodiments, the camera registration module 132 may be further configured to project updated real image 168, such as updated surveillance video, on a portion of the coverage area displayed via the display 140. The updated real image 168 may be obtained directly from the surveillance camera in real time or via a camera management system, such as the camera planning server 160.
In various embodiments, the camera registration module 132 may be configured to inhibit display of at least one portion of the updated real image 168 based on a constraint on a user perspective. The camera registration module 132 may be configured to determine the user perspective using the refined extrinsic parameters. In one example embodiment, the user perspective may comprise a cone shape or other similar shape.
In various embodiments, the camera registration module 132 may be configured to use the rendering module 134 to render any graphical information via the one or more displays 140. For example, the camera registration module 132 may be configured to control the rendering module 134 to render at least one portion of the virtual image, real image 164, updated real image 168, or the mapping process between the virtual image and the real image 164, etc. Also, the camera registration module 132 may be configured to store at least one portion of images from the one or more cameras 162, the BIM data 174, or the virtual image generated using the BIM data 174 in a memory device, such as the memory 124.
In various embodiments, the camera registration module 132 may be further configured to detect a camera drift of a corresponding one of the one or more cameras 162 using the refined extrinsic parameters (not shown in
In various embodiments, the camera registration module 132 may be configured to determine refined extrinsic parameters of a corresponding one of the one or more cameras 162 periodically. In one example embodiment, for a non-initial iteration (i.e., 2nd, 3rd . . . Nth) of determining the refined extrinsic parameters, the camera registration module 132 may be configured to use the refined extrinsic parameters determined for a previous iteration (e.g., 1st iteration) as new initial extrinsic parameters 166. The refined extrinsic parameters calculated for each iteration may be stored in a relevant memory, such as the one or more memories 124, for later use.
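The periodic scheme above amounts to feeding each iteration's refined parameters back in as the next iteration's initial parameters and keeping a history. In the sketch below, `run_periodic_registration` and `toy_refine` are hypothetical names; `toy_refine` is only a stand-in for the full registration step and simply moves an assumed pan angle halfway toward an assumed true value.

```python
def run_periodic_registration(initial_params, refine, iterations):
    """Run `refine` repeatedly, reusing each result as the next starting point."""
    history = []
    params = initial_params
    for _ in range(iterations):
        params = refine(params)   # refined parameters for this iteration
        history.append(params)    # stored for later use
    return history

TRUE_PAN = 30.0  # assumed ground-truth pan angle, for illustration only

def toy_refine(params):
    # Stand-in for the registration step: halves the remaining pan error.
    return {"pan": params["pan"] + 0.5 * (TRUE_PAN - params["pan"])}

history = run_periodic_registration({"pan": 26.0}, toy_refine, 3)
```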
Each of the modules described above in
A computer-implemented method 200 that can be executed by one or more processors may begin at block 205 with receiving a real image for a coverage area of a surveillance camera. The coverage area may correspond to at least one portion of a surveillance area. At block 210, Building Information Model (BIM) data associated with the coverage area may be received. At block 215, a virtual image may be generated using the BIM data. The virtual image may include at least one three-dimensional (3-D) image substantially corresponding to the real image. At block 220, the virtual image may be mapped with the real image. Then, at block 240, the surveillance camera may be registered in a BIM coordination system using an outcome of the mapping.
In various embodiments, the mapping of the virtual image with the real image may comprise matching a plurality of pairs of points on the virtual image and the real image, calculating at least one geometry coordination for a corresponding one of the points on the virtual image, and calculating refined extrinsic parameters for the surveillance camera using the at least one geometry coordination, as depicted at blocks 225, 230 and 235, respectively.
In various embodiments, at block 245, the computer-implemented method 200 may further present, via a display unit, such as the one or more displays 140 in
Although only some activities are described with respect to
The methods described herein do not have to be executed in the order described, or in any particular order. Moreover, various activities described with respect to the methods identified herein can be executed in repetitive, serial, heuristic, or parallel fashion. The individual activities of the method 200 shown in
The method 200 shown in
For example,
One of ordinary skill in the art will further understand the various programming languages that may be employed to create one or more software programs designed to implement and perform the methods disclosed herein. The programs may be structured in an object-oriented format using an object-oriented language such as Java or C++. Alternatively, the programs can be structured in a procedure-oriented format using a procedural language, such as assembly or C. The software components may communicate using any of a number of mechanisms well known to those of ordinary skill in the art, such as application program interfaces or interprocess communication techniques, including remote procedure calls. The teachings of various embodiments are not limited to any particular programming language or environment. Thus, other embodiments may be realized.
For example, an article 300 of manufacture, such as a computer, a memory system, a magnetic or optical disk, some other storage device, and/or any type of electronic device or system may include one or more processors 304 coupled to a machine-readable medium 308 such as a memory (e.g., removable storage media, as well as any memory including an electrical, optical, or electromagnetic conductor) having instructions 312 stored thereon (e.g., computer program instructions), which when executed by the one or more processors 304 result in the machine 302 performing any of the actions described with respect to the methods above.
The machine 302 may take the form of a specific computer system having a processor 304 coupled to a number of components directly, and/or using a bus 316. Thus, the machine 302 may be similar to or identical to the apparatus 102 or system 100 shown in
Returning to
A network interface device 340 to couple the processor 304 and other components to a network 344 may also be coupled to the bus 316. The instructions 312 may be transmitted or received over the network 344 via the network interface device 340 utilizing any one of a number of well-known transfer protocols (e.g., HyperText Transfer Protocol and/or Transmission Control Protocol). Any of these elements coupled to the bus 316 may be absent, present singly, or present in plural numbers, depending on the specific embodiment to be realized.
The processor 304, the memories 320, 324, and the mass storage 306 may each include instructions 312 which, when executed, cause the machine 302 to perform any one or more of the methods described herein. In some embodiments, the machine 302 operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked environment, the machine 302 may operate in the capacity of a server or a client machine in server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
The machine 302 may comprise a personal computer (PC), a tablet PC, a set-top box (STB), a PDA, a cellular telephone, a web appliance, a network router, switch or bridge, server, client, or any specific machine capable of executing a set of instructions (sequential or otherwise) that direct actions to be taken by that machine to implement the methods and functions described herein. Further, while only a single machine 302 is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
While the machine-readable medium 308 is shown as a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers, and/or a variety of storage media, such as the registers of the processor 304, memories 320, 324, and the mass storage 306 that store the one or more sets of instructions 312). The term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine 302 and that cause the machine 302 to perform any one or more of the methodologies of the present invention, or that is capable of storing, encoding or carrying data structures utilized by or associated with such a set of instructions. The terms “machine-readable medium” or “computer-readable medium” shall accordingly be taken to include tangible media, such as solid-state memories and optical and magnetic media.
Various embodiments may be implemented as a stand-alone application (e.g., without any network capabilities), a client-server application or a peer-to-peer (or distributed) application. Embodiments may also, for example, be deployed by Software-as-a-Service (SaaS), an Application Service Provider (ASP), or utility computing providers, in addition to being sold or licensed via traditional channels.
Embodiments of the invention can be implemented in a variety of architectural platforms, operating and server systems, devices, systems, or applications. Any particular architectural layout or implementation presented herein is thus provided for purposes of illustration and comprehension only, and is not intended to limit the various embodiments.
The Abstract of the Disclosure is provided to comply with 37 C.F.R. §1.72(b) and will allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims.
In this Detailed Description of various embodiments, a number of features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as an implication that the claimed embodiments have more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment.
The present application is also related to U.S. Non-Provisional patent application Ser. No. 13/150,965 entitled “SYSTEM AND METHOD FOR AUTOMATIC CAMERA PLACEMENT” that was filed on the date of Jun. 1, 2011, the contents of which are incorporated by reference in their entirety.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/CN2011/000983 | 6/14/2011 | WO | 00 | 2/28/2014 |