The present invention relates to a video distribution technique for an image capturing apparatus that includes two or more image capturing units.
In recent years, among network cameras used for monitoring purposes, models capable of capturing images at night and/or under adverse conditions, such as rain and snow, using infra-red light have been on the increase. Many network cameras are used for security purposes, and some of these models include both an infra-red light camera and a visible light camera.
An infra-red light camera uses a dedicated sensor to sense infra-red light emitted from an object and performs image processing on the sensed data, thereby generating a video that can be visually confirmed. The infra-red light camera has the following advantages: it does not require a light source, is less likely to be influenced by rain or fog, and is suitable for long-distance monitoring. On the other hand, it has the disadvantage of lower resolution than a general visible light camera, and is therefore not suitable for capturing a color or a design such as a character.
Recently, a technique for generating a video by clipping the shape of an object sensed by an infra-red light camera and combining the clipped shape with a visible light video has been used.
However, in a case where there are a plurality of types of video data to be transmitted by a twin-lens network camera as described above, the transmission band may be strained by transmitting both an infra-red video and a visible video. Thus, Japanese Patent No. 6168024 discusses a method for combining an infra-red video with a portion of a visible video where contrast is low, and distributing the combined video.
It may, however, be difficult for a user to determine which of an infra-red light video, a visible light video, and a combined video is more desirable for use in monitoring, because the determination depends on the image capturing situation, which varies. The method discussed in Japanese Patent No. 6168024 cannot assist a user in determining a video desirable for use in monitoring.
According to an aspect of the present invention, an image capturing apparatus including an infra-red light capturing unit and a visible light capturing unit includes a detection unit configured to detect an object from at least one of a first image obtained by the infra-red light capturing unit and a second image obtained by the visible light capturing unit, a combining unit configured to generate a combined image based on the first and second images, and an output unit configured to, based on a result of the detection by the detection unit, output at least one of the first image, the second image, and the combined image to a client apparatus via a network. The detection unit includes a first detection unit configured to detect an object from the first image obtained by the infra-red light capturing unit, and a second detection unit configured to detect an object from the second image obtained by the visible light capturing unit.
Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
With reference to the drawings, a first exemplary embodiment is described below.
In
An interface (IF) 204 communicates with the network camera 100 via the network 120 according to a protocol such as the Transmission Control Protocol/Internet Protocol (TCP/IP), the Hypertext Transfer Protocol (HTTP), or the ONVIF protocol. The IF 204 receives video data, metadata of detected object information, and the above responses from the network camera 100 and transmits the above various commands to the network camera 100.
A display apparatus 205 is a display device such as a display for displaying a video according to video data. The housing of the client apparatus 110 may be integrated with the display apparatus 205. A user interface (UI) 206 is an input apparatus such as a keyboard and a mouse; it may instead be a joystick or a voice input apparatus.
As the client apparatus 110, a general personal computer (PC) can be used. The CPU 201 reads a program code stored in the HDD 202 and executes the read program code, whereby the client apparatus 110 can provide a graphical user interface (GUI) for setting the function of detecting an object. The present exemplary embodiment is described on the assumption that the CPU 201 performs processing. Alternatively, at least a part of the processing of the CPU 201 may be performed by dedicated hardware. For example, the process of displaying a GUI and video data on the display apparatus 205 may be performed by a graphics processing unit (GPU). The process of reading a program code from the HDD 202 and loading the read program code into the RAM 203 may be performed by direct memory access (DMA) that functions as a transfer device.
Next, the hardware configuration of the network camera 100 is described. A CPU 210 is a central processing unit for performing overall control of the network camera 100. A read-only memory (ROM) 211 stores a program for the CPU 210 to control the network camera 100. The network camera 100 may include a secondary storage device equivalent to the HDD 202 in addition to the ROM 211. A RAM 212 is a memory into which the CPU 210 loads the program read from the ROM 211 and in which the CPU 210 executes processing. Further, the RAM 212 as a primary storage memory is also used as a storage area for temporarily storing, in the network camera 100, data on which various processes are to be performed.
An IF 213 communicates with the client apparatus 110 via the network 120 according to a protocol such as the TCP/IP, the HTTP, or the ONVIF protocol. The IF 213 transmits video data, metadata of a detected object, or the above responses to the client apparatus 110 or receives the above various commands from the client apparatus 110.
An image capturing device 214 is an image capturing device such as a video camera for capturing a live video as a moving image or a still image. The housing of the network camera 100 may be integrated with or separate from the housing of the image capturing device 214.
Next, with reference to
A visible light image capturing unit 301 includes an image capturing unit 3011, which includes a lens and an image sensor, an image processing unit 3012, a face detection unit 3013, and a pattern detection unit 3014. The visible light image capturing unit 301 captures an image of a subject and performs various types of image processing and detection processes.
The image processing unit 3012 performs image processing necessary for the detection process at a subsequent stage on an image signal captured by the image capturing unit 3011, thereby generating image data (also referred to as a “visible light image” or a “visible light video”). For example, in a case where matching is performed based on a shape characteristic in the detection process at the subsequent stage, the image processing unit 3012 performs a binarization process or a process of extracting edges in the subject. Further, in a case where detection is performed based on a color characteristic in the detection process at the subsequent stage, the image processing unit 3012 performs color correction based on the color temperature of a light source or the tint of a lens estimated in advance, or performs a dodging process for backlight correction or blurring correction. Further, in a case where a histogram process based on the luminance component of the captured image signal indicates that the captured image includes overexposed or underexposed portions, the image processing unit 3012 may perform high-dynamic-range (HDR) imaging in conjunction with the image capturing unit 3011. As the HDR imaging, a general technique for combining a plurality of images captured by changing the exposure of the image capturing unit 3011 can be used.
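As an illustrative sketch only (not part of the disclosure), the binarization and edge extraction described above could be expressed as follows, assuming an OpenCV/NumPy environment; the function name and threshold values are hypothetical.

```python
import cv2
import numpy as np

def preprocess_for_shape_matching(frame_bgr: np.ndarray) -> np.ndarray:
    """Binarize the captured frame and extract edges for shape-based matching."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # Otsu thresholding chooses the binarization threshold automatically.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Canny edge extraction on the binarized image (illustrative thresholds).
    edges = cv2.Canny(binary, 50, 150)
    return edges
```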
The face detection unit 3013 analyzes the image data sent from the image processing unit 3012 and determines whether a portion that can be recognized as a person's face is present in an object in the video. “Face detection” refers to the process of extracting any portion from an image and checking (matching) the extracted portion image with a pattern image representing a characteristic portion forming the person's face, thereby determining whether a face is present in the image. Examples of the characteristic portion include the relative positions between the eyes and the nose, and the shapes of the cheekbones and the chin. Further, a pattern characteristic (e.g., the relative positions between the eyes and the nose, and the shapes of the cheekbones and the chin) may be held instead of the pattern image and compared with a characteristic extracted from the portion image, thereby matching the portion image with the pattern characteristic.
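For illustration, face detection of this kind could be realized with a pre-trained cascade classifier that matches image regions against a characteristic pattern of the face; the following sketch assumes OpenCV and its bundled frontal-face model and is not the disclosed implementation.

```python
import cv2

# Pre-trained frontal-face model bundled with OpenCV (an assumption for illustration).
_face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(image_bgr):
    """Return (x, y, w, h) rectangles of portions recognizable as a person's face."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = _face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return list(faces)
```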
The pattern detection unit 3014 analyzes the image data sent from the image processing unit 3012 and determines whether a portion where a pattern such as a color or character information can be recognized is present in an object in the video. “Pattern detection” refers to the process of extracting any portion in an image and comparing the extracted portion with a reference image (or a reference characteristic) such as a particular character or mark, thereby determining whether the extracted portion matches the reference image. Taking maritime surveillance and border surveillance as examples, the reference image may include characters written on the body of a detected object, or the color or the design of a displayed national flag.
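As a hypothetical example, such pattern detection could be approximated by normalized template matching against a reference image; the function name and match threshold below are assumptions for illustration.

```python
import cv2

def detect_pattern(image_gray, reference_gray, threshold=0.8):
    """Return (matched, top_left): whether a region resembling the reference image exists."""
    # Normalized cross-correlation between the image and the reference pattern.
    result = cv2.matchTemplate(image_gray, reference_gray, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    return max_val >= threshold, max_loc
```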
An infra-red light capturing unit 302 includes an image capturing unit 3021, which includes a lens and an image sensor, an image processing unit 3022, and an object detection unit 3023. The infra-red light capturing unit 302 captures an image of a subject and performs necessary image processing and a detection process.
The image processing unit 3022 performs signal processing for converting a signal captured by the image capturing unit 3021 into an image that can be visually recognized, thereby generating image data (an infra-red light image or an infra-red light video).
The object detection unit 3023 analyzes the image data sent from the image processing unit 3022 and determines whether an object different from the background is present in the video. For example, the object detection unit 3023 uses, as a background image, an image captured in a situation where no object appears. Then, based on the difference between the background image and the captured image on which the detection process is to be performed, the object detection unit 3023 extracts, as the foreground, a portion where the difference is greater than a predetermined threshold and the difference region is equal to or greater than a predetermined size. Further, in a case where the circumscribed rectangle of the difference region has an aspect ratio corresponding to a person, a vehicle, or a vessel, the object detection unit 3023 may sense the type of the object. Further, the object detection unit 3023 may execute frame subtraction together with background subtraction to enable distinction between a moving object and a still object. If a region sensed by the background subtraction includes a predetermined proportion or more of a difference region obtained by the frame subtraction, the region is distinguished as a moving object. If not, the region is distinguished as a still object.
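The background subtraction and frame subtraction described above could be sketched, for example, as follows; the thresholds, minimum region size, and moving-object proportion are illustrative assumptions, and grayscale frames of equal size are assumed.

```python
import cv2
import numpy as np

def detect_objects(background, previous_frame, current_frame,
                   diff_threshold=30, min_area=500, moving_ratio=0.5):
    """Return a list of ((x, y, w, h), is_moving) tuples for detected foreground regions."""
    # Background subtraction: foreground where the difference exceeds the threshold.
    bg_diff = cv2.absdiff(current_frame, background)
    _, fg_mask = cv2.threshold(bg_diff, diff_threshold, 255, cv2.THRESH_BINARY)
    # Frame subtraction: difference from the previous frame indicates motion.
    fr_diff = cv2.absdiff(current_frame, previous_frame)
    _, motion_mask = cv2.threshold(fr_diff, diff_threshold, 255, cv2.THRESH_BINARY)

    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    results = []
    for contour in contours:
        if cv2.contourArea(contour) < min_area:
            continue  # ignore difference regions smaller than the predetermined size
        x, y, w, h = cv2.boundingRect(contour)
        region_motion = motion_mask[y:y + h, x:x + w]
        # Moving object if frame-subtraction pixels occupy the predetermined proportion or more.
        is_moving = (np.count_nonzero(region_motion) / float(w * h)) >= moving_ratio
        results.append(((x, y, w, h), is_moving))
    return results
```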
A network video processing unit 303 includes a video determination unit 3031, which determines video data to be distributed, a combining processing unit 3032, which performs the process of combining the infra-red light video with the visible light video, and an encoder 3033, which performs a video compression process for distribution of the video data to the network 120.
The combining processing unit 3032 generates combined image data (a combined image or a combined video) based on the determination by the video determination unit 3031. For example, if it is determined that the visible light video has poor visibility, the combining processing unit 3032 performs a combining process in which the details (the shape and the texture) of the object detected in the infra-red light video are clipped and the clipped details are superimposed on the corresponding position in the visible light video. The details of the determination process performed by the video determination unit 3031 will be described below. Examples of techniques used for the combining process by the combining processing unit 3032 include a technique for superimposing, on a portion of the visible light video where contrast is low, an image at the same position in the infra-red light video, and a technique for superimposing the foreground of the infra-red light video on the background image of the visible light video. Alpha blending may also be used so long as the visible light video and the infra-red light video can be combined such that the background of the visible light video and the foreground of the infra-red light video are emphasized.
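One possible sketch of this combining process, assuming the visible light frame and the infra-red light frame are already aligned and of equal size, and that a foreground mask is available from the object detection on the infra-red light video, is the following; the blending ratio is an illustrative assumption.

```python
import cv2
import numpy as np

def combine_videos(visible_bgr, infrared_gray, foreground_mask, alpha=0.7):
    """Superimpose the infra-red foreground onto the visible light frame."""
    infrared_bgr = cv2.cvtColor(infrared_gray, cv2.COLOR_GRAY2BGR)
    # Alpha-blend the two frames so the infra-red foreground stands out.
    blended = cv2.addWeighted(infrared_bgr, alpha, visible_bgr, 1.0 - alpha, 0)
    combined = visible_bgr.copy()
    # Keep the visible background; replace only foreground pixels with the blend.
    combined[foreground_mask > 0] = blended[foreground_mask > 0]
    return combined
```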
The encoder 3033 performs the process of compressing the video data determined by the video determination unit 3031 and transmits the video data to the network 120 via the IF 213. As the method for compressing the video data, an existing compression method such as Joint Photographic Experts Group (JPEG), Moving Picture Experts Group phase 4 (MPEG-4), H.264, or High Efficiency Video Coding (HEVC) may be used.
Each of the visible light image capturing unit 301 and the infra-red light capturing unit 302 in
Next, with reference to
If an object is not detected in step S402 (No in step S402), then in step S408, the video determination unit 3031 determines the infra-red light video as the distribution video. This is because it is desirable to use the infra-red light video for monitoring in priority to the other videos for the following reasons: unlike the visible light video obtained at night or in bad weather, the infra-red light video is less likely to lose sensing accuracy even under adverse conditions, and an object at a longer distance can be sensed in the infra-red light video than in the visible light video.
If, on the other hand, an object is detected in step S402 (Yes in step S402), then in step S403, the video determination unit 3031 acquires a face detection result from the face detection unit 3013 and acquires a pattern detection result from the pattern detection unit 3014. Then, based on the acquired detection results, in step S404, the video determination unit 3031 determines whether a face is sensed. Further, in step S405, the video determination unit 3031 determines whether a pattern is sensed.
If a face is detected in step S404 (Yes in step S404), or if a pattern is detected in step S405 (Yes in step S405), the processing proceeds to step S407. In step S407, the video determination unit 3031 determines the visible light video as the distribution video. This is because distributing a video in which a face can be detected enables the client apparatus 110 to use the video in a face authentication process, and distributing a video in which a pattern can be detected enables the client apparatus 110 to identify the object using a larger dictionary.
If, on the other hand, a face is not detected in step S404 (No in step S404), and if a pattern is not detected in step S405 (No in step S405), then in step S406, the video determination unit 3031 determines the combined video as the distribution video. This is because a background portion that can be visually recognized in the visible light video and the position of the object can be confirmed together. When a user references the distribution video displayed on the display apparatus 205 to actually visually confirm the object, the combined video obtained by combining the visible light video and the infra-red video such that the background of the visible light video and the foreground of the infra-red video are emphasized is advantageous for monitoring purposes.
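For illustration, the determination flow of steps S402 to S408 can be summarized as the following sketch, which takes as inputs the detection results already produced by the detection units; the function and return values are hypothetical names, not part of the disclosure.

```python
def determine_distribution_video(object_detected: bool,
                                 face_detected: bool,
                                 pattern_detected: bool) -> str:
    """Select the distribution video type from the detection results."""
    if not object_detected:                # step S402: No
        return "infrared"                  # step S408
    if face_detected or pattern_detected:  # steps S404/S405: Yes
        return "visible"                   # step S407
    return "combined"                      # step S406
```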
As described above, according to the present exemplary embodiment, a video type suitable for monitoring is determined based on the result of the detection of an object and transmitted to the client apparatus 110, so that the user does not need to determine and switch to the video type desirable for monitoring, which leads to improvement of convenience. Further, control can be performed so that video data undesirable for monitoring is not distributed. Thus, it is possible to perform efficient monitoring.
Further, depending on the installation location, there are cases where a network camera can transmit only a single video among the plurality of types in the first place. This corresponds to, for example, a network camera installed deep in the mountains or near a coastline where there is no building or street light around the network camera. In such a location, an infrastructure for transmitting a video is not put in place, so a sufficient transmission band often cannot be secured. However, in a case where only one of the infra-red light video and the visible light video can be transmitted, if the infra-red light video is always distributed, a face authentication function or an object specifying function cannot be achieved even in good image capturing conditions, whereas if the visible light video is always distributed, an object cannot be detected in adverse image capturing conditions. According to the above exemplary embodiment, a video suitable for monitoring that is less likely to be influenced by weather conditions can be distributed even in an installation location where a large amount of data cannot be transferred.
Further, there are cases where, even if it is detected that an object is present in the infra-red light video, it is difficult to determine whether the infra-red light video should be switched to the visible light video. Moreover, since the visible light video generally has higher resolution and lower compression efficiency than the infra-red light video, the amount of data of the visible light video to be transmitted via a network tends to be large. Thus, if no benefit to monitoring can be expected, it may be desirable, in terms of the amount of data transfer, not to switch from the infra-red light video to the visible light video.
In such a case, machine learning may be applied to an object determination process, and the type of an object may be determined based on a characteristic such as the shape or the size. Then, only if an object at a certain detection level or higher is identified, the infra-red light video may be switched to the visible light video. The “detection level” indicates the degree at which an object should be monitored.
Further, “machine learning” refers to a technique of performing recursive learning from particular sample data, finding a characteristic hidden in the sample data, and applying the learning result to new data, thereby enabling the prediction of the future according to the found characteristic. An existing machine learning framework such as TensorFlow, TensorFlow Lite, or Caffe2 may be used. In the following description, components or steps having functions similar to those in
With reference to
With reference to
The machine learning processing unit 5041 prepares in advance data obtained by learning the characteristics of objects and vessels to be sensed at sea and performs a machine learning process on a video input from the visible light image capturing unit 301 or the infra-red light capturing unit 302.
Based on the result of the determination by the machine learning processing unit 5041, the detection level determination unit 5042 determines the detection level.
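As a hypothetical illustration only, the mapping from a classification result to a detection level might look like the following; the class name, confidence threshold, and level values are assumptions chosen to be consistent with the description of levels 2 and 3 below, not the disclosed definitions.

```python
def determine_detection_level(predicted_class: str, confidence: float) -> int:
    """Map a machine learning classification result to a detection level (illustrative)."""
    if confidence < 0.5:
        return 1   # something is detected, but the type is uncertain
    if predicted_class != "vessel":
        return 2   # an object is identified, but it is not a vessel
    return 3       # identified as a vessel: an object that should be monitored
```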
Next, with reference to
First, in step S801, the video determination unit 3031 acquires the detection level from the machine learning unit 504.
If the detection level is 2 or lower (Yes in step S802), then in step S408, the video determination unit 3031 determines the infra-red light video as the distribution video. This is because, if the detection level is 2 or lower, the object is not identified as a vessel, and therefore, it is not necessary to distribute the visible light video, which has a large amount of data. If, on the other hand, the detection level is 3 or higher (No in step S802), then in step S403, the video determination unit 3031 acquires a face detection result from the face detection unit 3013 and also acquires a pattern detection result from the pattern detection unit 3014.
Based on the detection results, if a face is detected (Yes in step S404), or if a pattern is detected (Yes in step S405), then in step S407, the video determination unit 3031 determines the visible light video as the distribution video. If a face is not detected (No in step S404), and if a pattern is not detected (No in step S405), then in step S406, the video determination unit 3031 determines the combined video as the distribution video.
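The flow of steps S801 and S802 combined with steps S403 to S408 can likewise be summarized as the following illustrative sketch (hypothetical names, not the disclosed implementation).

```python
def determine_distribution_video_with_level(detection_level: int,
                                            face_detected: bool,
                                            pattern_detected: bool) -> str:
    """Select the distribution video type using the detection level first."""
    if detection_level <= 2:               # step S802: Yes (not identified as a vessel)
        return "infrared"                  # step S408
    if face_detected or pattern_detected:  # steps S404/S405: Yes
        return "visible"                   # step S407
    return "combined"                      # step S406
```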
As described above, according to the configuration in
Further, as illustrated in
Based on the position coordinates and the object size associated with an acquired object number, the encoder 3033 sets a rectangular region and performs the process of reducing the bit rate of the portion outside the rectangular region. Further, using the video determination unit 3031, the encoder 3033 may perform a high compression process on a video of a type other than the distribution target and distribute that video at a low bit rate together with the video of the distribution target type. The above description has been given using the face detection unit 3013 as an example. Alternatively, the function of detecting a human body (the upper body, the whole body, or a part of the body) may be used.
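As one possible (hypothetical) realization of reducing the bit rate outside the rectangular region, the portion outside the region could be strongly blurred before compression, as sketched below; an actual encoder might instead lower the quantization quality for that portion.

```python
import cv2

def deprioritize_outside_roi(frame_bgr, roi):
    """Blur everything outside roi = (x, y, w, h) so it compresses to fewer bits."""
    x, y, w, h = roi
    blurred = cv2.GaussianBlur(frame_bgr, (31, 31), 0)
    # Restore the original, sharp pixels inside the rectangular region of interest.
    blurred[y:y + h, x:x + w] = frame_bgr[y:y + h, x:x + w]
    return blurred
```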
In the above description, an example has been described where the distribution video is determined within the network camera 100. Alternatively, the network camera 100 may transmit the infra-red light capturing video and the visible light capturing video to the client apparatus 110 connected to the network camera 100, and the client apparatus 110 may select a video to be output.
In this case, the CPU 201 of the client apparatus 110 may execute a predetermined program, thereby functioning as the video determination unit 3031 and the combining processing unit 3032.
Further, the face detection unit 3013, the pattern detection unit 3014, and the object detection unit 3023 may also be achieved by the CPU 201 of the client apparatus 110. Further, a configuration may be employed in which the machine learning unit 504 is achieved by the CPU 201 of the client apparatus 110.
Further, the client apparatus 110 may display only a video of the type selected by the video determination unit 3031 on the display apparatus 205, or may emphasize the video of the type selected by the video determination unit 3031 or cause the video to pop up when a plurality of types of videos are displayed. In the specification, “detection” and “sensing” have the same meaning and mean finding something by examination.
Further, the present invention can be achieved also by performing the following process. This is the process of supplying software (a program) for achieving the functions of the above exemplary embodiment to a system or an apparatus via a network or various recording media, and of causing a computer (or a CPU or a microprocessor unit (MPU)) of the system or the apparatus to read the program and execute the read program.
Based on the image capturing state of a video captured by the camera, it is possible to facilitate the determination of a video suitable for monitoring use, from among an infra-red light video, a visible light video, and a combined video.
Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
This application claims the benefit of Japanese Patent Application No. 2017-251719, filed Dec. 27, 2017, which is hereby incorporated by reference herein in its entirety.