The present invention relates to an apparatus, a system and a method of information processing, and a storage medium.
In video image production, an automatic tracking control technique automatically performs pan, tilt, and zoom (PTZ) control for an imaging apparatus to place a specific subject at a desired position within the imaging angle of view.
Japanese Patent Application Laid-Open Publication No. 2009-218719 discusses a technique for showing the moving direction and moving speed of a tracking subject, and if a behavior likely to depart from the angle of view is detected, displaying a warning.
However, Japanese Patent Application Laid-Open Publication No. 2009-218719 does not disclose control for tracking a plurality of subjects at the same time.
According to an aspect of the present invention, an information processing apparatus includes a detection unit configured to detect a plurality of subjects from an image, a selection unit configured to select a subject as a tracking target, a tracking unit configured to track the subject as the tracking target selected by the selection unit using information about the plurality of subjects detected by the detection unit, a counting unit configured to count the number of subjects currently being tracked by the tracking unit, and a notification unit configured to notify a tracking state by the tracking unit. The notification unit notifies the tracking state according to the number of subjects selected by the selection unit and the number of subjects counted by the counting unit.
Further features of the present invention will become apparent from the following description of embodiments with reference to the attached drawings.
Embodiments of the present invention will be described in detail with reference to the accompanying drawings. The following embodiments do not limit the present invention within the scope of the appended claims. While a plurality of features is described in the embodiments, not all of the plurality of features is used in the present invention, and any combination of the plurality of features can be used. In the accompanying drawings, identical or similar components are assigned the same reference numerals, and duplicated descriptions thereof will be omitted.
The PTZ camera 100 is an imaging apparatus capable of capturing images of tracking targets (subjects) and areas around subjects, and outputting captured images to the PC 200 and an external device. The PTZ camera 100 according to the present embodiment includes a drive unit 109 (described below) provided with a mechanism for performing pan and tilt operations to change the imaging direction. The PTZ camera 100 also includes an inference unit 111 (described below) for inferring positions of subjects on captured images.
The PC 200 accesses the PTZ camera 100 via the LAN 300 to acquire images output by the PTZ camera 100, perform imaging controls based on the user's operations, and set various imaging conditions. Images according to the present embodiment include moving and still images, and the present embodiment is applicable to both types of images.
The PTZ camera 100 according to the present embodiment includes a central processing unit (CPU) 101, a read only memory (ROM) 102, a random access memory (RAM) 103, an image output interface (I/F) 104, and a network I/F 105. The PTZ camera 100 further includes an image processing unit 106, an image sensor 107, a drive I/F 108, a drive unit 109, the inference unit 111, and an internal bus 110 for communicably connecting the above-described components.
The CPU 101 generally controls the apparatus by controlling different components of the PTZ camera 100.
The ROM 102 is a nonvolatile storage device, such as a flash memory, a hard disk drive (HDD), a solid state drive (SSD), or a secure digital (SD) card. The ROM 102 is used as a permanent storage area for storing an operating system (OS), various programs, and various types of data, and is also used as a temporary storage area for storing various types of data.
The RAM 103 is a volatile high-speed storage device, such as a dynamic random access memory (DRAM), into which an OS, various programs, and various types of data are loaded. The RAM 103 is also used as a work area of the OS and various programs.
The image output I/F 104, an interface for outputting images captured by the image sensor 107 (described below) to an external device, includes a serial digital interface (SDI) and a high-definition multimedia interface (HDMI®).
The network I/F 105, an interface for connecting with the LAN 300, communicates with an external device, such as the PC 200, via a communication medium, such as Ethernet®.
The image processing unit 106 connected to the image sensor 107 performs various types of image processing (e.g., defect correction, noise reduction (NR) processing, and color conversion processing) on image data acquired from the image sensor 107 based on instructions from the CPU 101, performs image data conversions into predetermined formats, and performs compression processing. The processed image data is stored in the RAM 103.
The image sensor 107 including a charge coupled device (CCD) sensor or a complementary metal oxide semiconductor (CMOS) sensor functions as an imaging unit in the PTZ camera 100. The image sensor 107 photoelectrically converts subject images formed by a non-illustrated imaging optical system to generate image data.
While, according to the present embodiment, image data is output to the image processing unit 106 as digital signals by an analog-to-digital (A/D) conversion circuit included in the image sensor 107, the image data can be output as analog signals. The image sensor 107 and the image processing unit 106 can be integrated in a stacked chip configuration. While, according to the present embodiment, the imaging optical system through which the image sensor 107 receives subject images is also integrated, the imaging optical system can be configured to be attachable to, detachable from, and interchangeable for the image sensor 107. According to the present embodiment, the imaging optical system and the image sensor 107 are collectively referred to as an imaging unit in some cases.
The drive I/F 108 is an interface for transmitting instructions received from the CPU 101 to the drive unit 109.
The drive unit 109 is a mechanism and an optical system for changing the imaging direction of the PTZ camera 100. According to the present embodiment, the drive unit 109 changes the imaging direction by rotatably driving the image sensor 107.
The drive unit 109 consists of a mechanical drive system and motors as driving sources. The drive unit 109 performs rotational drive, such as pan and tilt operations, to change the imaging direction with respect to the horizontal and vertical directions based on instructions received from the CPU 101 via the drive I/F 108. With an imaging optical system provided with a magnification lens (also referred to as a zoom lens), the drive unit 109 can perform a zoom control to optically change the imaging angle of view by moving the zoom lens in the optical axis direction.
The inference unit 111 performs inference processing using a learned inference model and inference parameters according to an inference program. The inference processing of the inference unit 111 can be performed by a calculation processing apparatus specialized in image processing and inference processing, such as a graphics processing unit (GPU). The GPU is a processor capable of performing a large number of sum-of-product calculations, and has the ability to perform a neural network matrix calculation in a short time. The inference processing of the inference unit 111 can also be performed by a reconfigurable logic circuit, such as a Field-Programmable Gate Array (FPGA). The inference unit 111 can perform the inference processing in collaboration with the CPU 101.
A lamp 112, also referred to as a tally lamp, is a light source, such as a light emitting diode (LED), indicating the control to which the PTZ camera 100 is currently subjected. The CPU 101 receives instructions from the outside via the network I/F 105 to change the display pattern. The CPU 101 performs status display by color or periodic blinking. For example, red indicates that a video image is used for broadcasting and recording, and green indicates a preview state.
The PC 200 according to the present embodiment includes a CPU 201, a ROM 202, a RAM 203, a network I/F 204, a display unit 205, a user input I/F 206, and an internal bus 207 for communicably connecting these components.
The CPU 201 controls components of the PC 200 to generally control the apparatus.
The ROM 202 is a nonvolatile storage device, such as a flash memory, an HDD, an SSD, or an SD card. The ROM 202 is used as a permanent storage device for storing an OS, various programs, and various types of data, and is also used as a temporary storage area for storing various types of data.
The RAM 203 is a volatile high-speed storage device, such as a DRAM, into which an OS, various programs, and various types of data are loaded. The RAM 203 is also used as a work area of the OS and various programs.
The network I/F 204, an interface for connecting with the LAN 300, communicates with an imaging apparatus, such as the PTZ camera 100, and an external device, such as a server, via a communication medium, such as Ethernet.
The display unit 205 displays images acquired from the PTZ camera 100 and setting screens of the PC 200. For example, the display unit 205 is a liquid crystal panel or an organic electroluminescence (EL) panel. While the PC 200 includes the display unit 205 as an example, the PC 200 and the display unit 205 can be configured as different components. For example, a display monitor for displaying only captured images can be provided separately from the PC 200.
The user input I/F 206 includes input devices (operation units), such as a keyboard, a pointing device (mouse), a touch panel, and switches, and receives instructions from the user to the PC 200. The keyboard can be a software keyboard. The CPU 201 monitors the user input I/F 206 and, on a detection of the user's operation on the user input I/F 206, performs processing in response to the detected operation.
Basic controls performed by the system will be described. The basic controls include the automatic tracking control for controlling the PTZ camera 100 to track subjects, and the subject selection for selecting tracking target subjects for the PTZ camera 100 based on the user's operations received by the PC 200.
The automatic tracking control will be described with reference to
This control flow is started when the CPU 101 of the PTZ camera 100 receives a control command for performing the automatic tracking control via the network I/F 105.
In step S301, the CPU 101 determines which of a control command and an end command is received via the network I/F 105. If the CPU 101 determines that a control command is received (YES as the result of checking the operation state in step S301), the CPU 101 stores the received control command in the RAM 103. Then, the processing proceeds to step S302. If the CPU 101 determines that an end command is received (NO as the result of checking the operation state in step S301), the CPU 101 completes this control.
In step S302, the CPU 101 acquires the image data stored by the image processing unit 106 from the RAM 103.
In step S303, the CPU 101 determines information about features and positions of subjects in each frame of the captured image data and stores the information in the RAM 103. More specifically, the CPU 101 inputs the image data acquired from the RAM 103 to the inference unit 111. Then, the CPU 101 stores the features of the subjects inferred by the inference unit 111 and the positional information on the images of the subjects in the RAM 103. The inference unit 111 includes a learned model created using a machine learning technique, such as deep learning, and receives images as input data and outputs identifiers (ID) for identification and positional information as subject information. The positional information will be described as information about the upper left point, the width and height, and the coordinates of the center of gravity of a rectangle that circumscribes a subject on the image. However, the present invention is not limited thereto.
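As an illustration only, the subject information described above (an ID plus a circumscribing rectangle and its center of gravity) might be represented as follows. The names SubjectInfo, contains, and find_selected are hypothetical and do not appear in the embodiment. The point-in-rectangle test is one plausible form of the comparison between positional information and selected coordinates.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SubjectInfo:
    """One detected subject: an ID plus its circumscribing rectangle (pixels)."""
    subject_id: int
    left: float    # x of the upper left point
    top: float     # y of the upper left point
    width: float
    height: float

    @property
    def centroid(self) -> Tuple[float, float]:
        # Center of gravity of the rectangle.
        return (self.left + self.width / 2, self.top + self.height / 2)

    def contains(self, x: float, y: float) -> bool:
        # True when the point lies inside the rectangle (edges included).
        return (self.left <= x <= self.left + self.width
                and self.top <= y <= self.top + self.height)


def find_selected(subjects: List[SubjectInfo], x: float, y: float) -> Optional[SubjectInfo]:
    """Return the first subject whose rectangle contains the selected point."""
    for subject in subjects:
        if subject.contains(x, y):
            return subject
    return None
```

In this sketch, a selection point that falls outside every rectangle yields no tracking target.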
A table illustrated in
In step S304, the CPU 101 determines whether subject selection information is received as a control command from the PC 200 via the network I/F 105. If received (YES in step S304), the CPU 101 stores the subject selection information received from the PC 200 in the RAM 103. The present embodiment will be described below on the premise that the subject selection information refers to coordinates (region) in the angle of view. Whether or not any subject is selected, the processing proceeds to step S305.
In step S305, the CPU 101 reads the subject information output in step S303 and the subject selection information received in step S304 from the RAM 103. The CPU 101 compares the positional information in the subject information with the coordinate information included in the subject selection information to check whether the coordinate information is included in the positional information about the subject. A specific example will be described with reference to
In step S306, the CPU 101 calculates control positional information (control information) in the automatic tracking control, and stores the control positional information in the RAM 103. The control positional information refers to information (imaging parameters), such as a pan angle, a tilt angle, and a zoom angle of view used to control (move) the image sensor 107 to any desired point. The CPU 101 calculates the pan angle, the tilt angle, the zoom angle of view, and the angular speed when the coordinates of the center of gravity of the tracking target subject selected in step S305 are moved to the center position of the angle of view, and stores the calculation result in the RAM 103 as the control positional information. A state where no subject is selected is equivalent to a state where the control positional information is absent, in which the angle of view is not controlled. When the control positional information is received as a control command from the outside in step S301, the CPU 101 stores the information in the RAM 103 giving priority to the control command, enabling position control from the outside.
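The angular part of this calculation can be sketched as follows, assuming an ideal pinhole camera whose horizontal and vertical fields of view are known. The function name angle_offsets and its parameters are hypothetical, and the sketch covers only the pan and tilt offsets, not the zoom angle of view or the angular speed.

```python
import math


def angle_offsets(cx, cy, image_w, image_h, hfov_deg, vfov_deg):
    """Pan/tilt offsets (degrees) that would bring pixel (cx, cy) to the image center.

    Assumes an ideal pinhole camera; positive pan is rightward, positive tilt is upward.
    """
    # Focal lengths in pixel units, derived from the fields of view.
    fx = (image_w / 2) / math.tan(math.radians(hfov_deg) / 2)
    fy = (image_h / 2) / math.tan(math.radians(vfov_deg) / 2)
    pan = math.degrees(math.atan((cx - image_w / 2) / fx))
    # Image y grows downward, so the sign is flipped for tilt.
    tilt = math.degrees(math.atan((image_h / 2 - cy) / fy))
    return pan, tilt
```

A center of gravity already at the image center gives offsets of zero, which corresponds to a state where no movement is needed.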
In step S307, the CPU 101 reads the control positional information stored in the RAM 103. Based on the control positional information, the CPU 101 extracts drive parameters (control details) for the drive unit 109 to enable pan, tilt, and zoom controls at desired speeds in desired directions. More specifically, the drive parameters are used to control the motors included in the drive unit 109. Amounts of operations based on the control positional information can be converted into drive parameters with reference to a conversion table prestored in the RAM 103.
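A conversion table of this kind might look like the following sketch. The table values, the notion of discrete speed levels, and the name drive_parameter are all assumptions made for illustration. The embodiment does not specify the table contents.

```python
import bisect

# Hypothetical table: angular-offset upper bounds (degrees) and the motor
# speed level used for offsets up to each bound. Values are illustrative only.
OFFSET_BOUNDS_DEG = [1.0, 5.0, 15.0, 30.0]
SPEED_LEVELS = [0, 1, 3, 6, 10]  # one more entry than bounds: the last covers offsets beyond 30


def drive_parameter(offset_deg):
    """Return (direction, speed_level) for a signed angular offset."""
    direction = (offset_deg > 0) - (offset_deg < 0)  # -1, 0, or +1
    level = SPEED_LEVELS[bisect.bisect_left(OFFSET_BOUNDS_DEG, abs(offset_deg))]
    if level == 0:
        direction = 0  # inside the dead band: no motion
    return direction, level
```

The dead band around zero is a design choice of this sketch, intended to avoid driving the motors for negligible offsets.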
In step S308, the CPU 101 controls the drive unit 109 via the drive I/F 108 based on the extracted drive parameters, and the drive unit 109 performs rotation operations based on the drive parameters for the PTZ camera 100 to perform the pan, tilt, and zoom operations. This control flow for calculation based on the subject selection information received in step S304 enables the PTZ camera 100 to perform an operation according to the user's subject selection successively transmitted from the PC 200.
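The overall flow of steps S301 to S308 can be summarized in the following sketch. Every argument is a caller-supplied callable, and the dictionary-based command format is an assumption made for illustration, not the command format of the embodiment.

```python
def tracking_loop(receive_command, get_frame, detect, pick_targets, compute_control, drive):
    """Run steps S301 to S308 repeatedly until an end command arrives.

    All arguments are hypothetical callables supplied by the caller.
    """
    selection = None
    while True:
        command = receive_command()                   # S301
        if command.get("type") == "end":
            break
        frame = get_frame()                           # S302
        subjects = detect(frame)                      # S303
        if command.get("type") == "select":           # S304
            selection = command["point"]
        targets = pick_targets(subjects, selection)   # S305
        control = compute_control(targets)            # S306
        if control is not None:
            drive(control)                            # S307 and S308
```

In this sketch, the absence of control information (None) leaves the angle of view uncontrolled, mirroring the state where no subject is selected.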
Processing for controlling the PTZ camera 100 based on the user's operations on the PC 200 illustrated in
In step S401, the CPU 201 of the PC 200 detects the user's operation for closing the menu screen via the user input I/F 206. If the user's operation is not detected (YES in step S401), the processing proceeds to step S402. If the user's operation is detected (NO in step S401), the processing exits the control flow.
In step S402, the CPU 201 of the PC 200 transmits a control command for acquiring information to the PTZ camera 100 via the network I/F 204. When the CPU 101 of the PTZ camera 100 detects the reception of the control command, the CPU 101 reads the subject information output in step S303 in
In step S403, the CPU 201 of the PC 200 detects whether the user is performing a subject selection operation, via the user input I/F 206. The user can input an operation by specifying one point on the camera image 502 through a mouse operation or touching on the camera image 502 displayed on the touch panel. However, the method for inputting operations is not limited thereto. As in step S304 in the above-described control flow in
In step S404, the CPU 201 of the PC 200 acquires the tracking state from the PTZ camera 100. The tracking state refers to information about whether the ID read in step S305 in the above-described control flow in
In step S405, the CPU 201 of the PC 200 displays the menu screen 510 updated on the RAM 203 by controlling the display unit 205. When this control flow is started, the screen illustrated in
If the point specified by the user in step S403 is on a subject that is already a tracking target, the CPU 101 can release that tracking target, and the tracking target can be changed by selecting another subject. Although the tracking operation is started through the subject selection, the user can change the setting to issue explicit instructions for starting and stopping the tracking operation.
The above-described basic operations enable an automatic tracking control of the PTZ camera 100 according to the user's subject selection in the control flow of the PTZ camera 100 and the PC 200.
A tracking operation when a plurality of subjects is selected, which is a characteristic operation of the present invention, will now be described. The description of the operation of the PTZ camera 100 focuses on differences from the control flow illustrated in
The control flow illustrated in
In step S705, as in step S305, the CPU 101 transmits subject information for three different subjects from the PTZ camera 100. However, the PC 200 displays a menu screen illustrated in
In step S704, the CPU 101 of the PTZ camera 100 determines whether subject selection information is received as a control command from the PC 200, via the network I/F 105. As with the basic operation, if a point 804 is specified, the CPU 101 of the PTZ camera 100 transmits the information illustrated in
In step S706, the CPU 101 of the PTZ camera 100 compares the tracking state received from the PTZ camera 100 with the number of targets selected. The processing in step S706 is equivalent to the control flow illustrated in
In step S710, the CPU 101 determines whether the number of targets selected is zero. As described above, since the number of targets selected is one at this timing, the processing proceeds to step S711. The number of targets selected is zero when the control flow illustrated in
In step S711, the CPU 101 determines whether the number of tracking targets is zero. Since the number of tracking targets is one at this timing, the processing proceeds to step S712. The advance of the processing to step S714 when the number of tracking targets becomes zero will be described below.
In step S712, the CPU 101 compares the number of tracking targets with the number of targets selected. If the number of targets selected matches the number of tracking targets (YES in step S712), the processing proceeds to step S715. If the number of targets selected does not match the number of tracking targets (NO in step S712), the processing proceeds to step S716. Since both the numbers are one at this timing, the processing proceeds to step S715. In step S715, the CPU 101 transmits information so that a tracking control state 810 becomes “Tracking”. The CPU 201 of the PC 200 updates the menu screen on the RAM 203, as illustrated in
On completion of the above-described operations, a menu screen illustrated in
The PTZ camera 100 is to be controlled to track the subjects selected as tracking targets. Taking a state in
Assume a case where one of the subjects moves out of the angle of view of the PTZ camera 100.
In a case of a camera image 843 illustrated in
Further, assume a state where all of the subjects move out of the angle of view of the PTZ camera 100, and a subject 854 that was not tracked enters the angle of view thereof. In a case of a camera image 853 illustrated in
If a subject is not a tracking target, a frame equivalent to a “Stopped” frame is displayed as the detection result. Besides moving out of the angle of view, a subject may become undetectable depending on its posture or orientation. The user can explicitly specify coordinates, or a subject that continues to be caught in the angle of view for a certain time period (a predetermined time period or a predetermined number of times of detection) can be automatically determined to be a tracking target and placed in a selected state. If a subject is off the angle of view for a certain time period (a predetermined time period or a predetermined number of times of detection), the subject can be excluded from the tracking targets.
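The automatic selection and exclusion described above can be sketched with per-subject counters, here using the number of consecutive detections rather than elapsed time. The class name PresenceTracker and the thresholds add_after and drop_after are hypothetical.

```python
class PresenceTracker:
    """Promote or drop tracking targets based on consecutive detection counts."""

    def __init__(self, add_after=3, drop_after=3):
        self.add_after = add_after    # consecutive detections before auto-selection
        self.drop_after = drop_after  # consecutive misses before exclusion
        self.seen = {}                # subject ID -> consecutive detections
        self.missed = {}              # subject ID -> consecutive misses (targets only)
        self.targets = set()

    def update(self, detected_ids):
        detected = set(detected_ids)
        # Count consecutive detections; anything not seen this frame is reset.
        self.seen = {i: self.seen.get(i, 0) + 1 for i in detected}
        for i in detected:
            self.missed.pop(i, None)
            if self.seen[i] >= self.add_after:
                self.targets.add(i)
        # Count consecutive misses for current targets and drop stale ones.
        for i in list(self.targets - detected):
            self.missed[i] = self.missed.get(i, 0) + 1
            if self.missed[i] >= self.drop_after:
                self.targets.discard(i)
                del self.missed[i]
        return set(self.targets)
```

Calling update once per frame with the detected IDs returns the current set of tracking targets. A brief miss shorter than drop_after frames does not remove a target, which tolerates momentary detection failures caused by posture or orientation.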
In the present embodiment, the example with the PTZ camera 100 has been described. However, the form of the present embodiment is not limited thereto. The present embodiment is also applicable, for example, to an apparatus capable of detecting a plurality of subjects from an image and controlling the PTZ camera 100 based on the detection result. Specific examples of such apparatuses include an edge device and a PC provided with an image input unit, a network communication unit, and a GPU.
According to the first embodiment described above, an imaging system for tracking a plurality of subjects can be driven in consideration of the increase and decrease in the number of tracking targets and show changes in tracking state to the user.
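The determination in steps S710 to S716 can be condensed into the following sketch. The labels "Tracking", "Partially Lost", and "Lost" come from the description. The label "Stopped" for the case where no target is selected and the function name tracking_control_state are assumptions made for illustration.

```python
def tracking_control_state(num_selected, num_tracked):
    """Map the selected and tracked counts to a tracking control state label."""
    if num_selected == 0:              # S710: nothing selected
        return "Stopped"               # assumed label for this case
    if num_tracked == 0:               # S711: every selected subject is missing
        return "Lost"
    if num_tracked == num_selected:    # S712 to S715: counts match
        return "Tracking"
    return "Partially Lost"            # S716: counts differ
```

For example, with three subjects selected and two currently tracked, this sketch returns "Partially Lost".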
A second embodiment will be described. According to the first embodiment, a plurality of subjects is selected by a user successively selecting subjects. It is conceivable, however, that selecting predetermined subjects beforehand could save the time and trouble of making successive selections. Thus, according to the present embodiment, a method for managing a plurality of subjects and presenting them to users will be described with reference to
The first group 1001 indicates a state where subjects A to C are selected, which is close to a selected state where the number of targets selected is three in the first embodiment.
In a similar system configuration to that of the first embodiment, modifications in the control flowcharts in
If the user selects the first group 1001 via the user input I/F 206, the CPU 201 displays a selection group 1110 in
Since the number of targets selected and the number of tracking targets are both three, in step S715, the tracking control state is determined to be “Tracking” as indicated by a tracking control state 1111. Then, the CPU 201 of the PC 200 displays the menu screen in
A case where a subject other than those in the groups appears will be described. Assume a case where a camera image 1130 illustrated in
Assume a case where the user selects the second group 1002 via the user input I/F 206. In the second group 1002, a priority setting is made for a selected subject, and the subject A is given higher priority than subjects D and E. A difference made by a priority-based tracking operation can be caused by changing the above-described center-of-gravity calculation in the calculation of control positional information in step S306.
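One way to modify the center-of-gravity calculation for priority is a weighted mean, sketched below. The weighting scheme and the name weighted_centroid are assumptions. The embodiment states only that the calculation is changed, not how.

```python
def weighted_centroid(subjects):
    """subjects: iterable of ((x, y), weight). Returns the weighted mean point."""
    items = list(subjects)
    total = sum(weight for _, weight in items)
    if total <= 0:
        raise ValueError("at least one subject with positive weight is required")
    x = sum(point[0] * weight for point, weight in items) / total
    y = sum(point[1] * weight for point, weight in items) / total
    return x, y
```

Giving the subject A a larger weight than the subjects D and E pulls the control target toward the subject A, so that the subject A stays closer to the center of the angle of view.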
Assume a case where the second group 1002 is selected, and a menu screen illustrated in
A case where no subjects are detected in the angle of view will be described. In step S716 (described above), the CPU 101 displays the non-match between the number of tracking targets and the number of targets selected in the form of “Partially Lost”. However, the addition of the priority results in a modification described below, allowing display of additional information to the user. Assume a case where the main subject A is not detected in the angle of view, i.e., subject information illustrated in
In step S716, the menu screen being displayed becomes “Partially Lost”. However, to emphasize that the main subject is not detected, the CPU 101 of the PTZ camera 100 resets a tracking control state 1310 as “Main Subject Lost”. The CPU 101 updates the menu screen so that the number of detections of main subjects 1311 is zero. Likewise, with subject information illustrated in
As in the case where the first group is selected, a detection of an unselected subject or a subject other than the groups results in “Lost”.
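The state determination with a main subject can be sketched as follows. The ordering of the checks, the "Stopped" label for an empty selection, and the name group_tracking_state are assumptions made for illustration.

```python
def group_tracking_state(selected_ids, main_ids, detected_ids):
    """Tracking control state for a group that may contain main (priority) subjects."""
    selected, main, detected = set(selected_ids), set(main_ids), set(detected_ids)
    tracked = selected & detected
    if not selected:
        return "Stopped"             # assumed label when nothing is selected
    if not tracked:
        return "Lost"                # only unselected subjects, or none, are detected
    if main and not (main & detected):
        return "Main Subject Lost"   # other members are visible but no main subject
    if tracked == selected:
        return "Tracking"
    return "Partially Lost"
```

With the second group (subjects A, D, and E, where A is the main subject), detecting only D and E yields "Main Subject Lost", whereas detecting only A and D yields "Partially Lost".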
Instead of explicit group selections by the user, a group selection can be made resulting from the CPU 201 determining that the result of subject detection corresponds to group information about any group. In this case, as priority when a plurality of groups matches the result of subject detection, controls can be performed, for example, of continuously setting the group found first or of setting a specific group given priority.
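A sketch of such automatic group selection follows. It assumes that a group corresponds to the detection result when at least one of its members is detected, which is only one possible interpretation. The name choose_group and its parameters are hypothetical.

```python
def choose_group(groups, detected_ids, current=None, preferred=None):
    """Pick a group name from groups (dict: name -> set of member IDs).

    Dictionary insertion order is used as the search order.
    """
    detected = set(detected_ids)
    matches = [name for name, members in groups.items() if members & detected]
    if not matches:
        return None
    if preferred in matches:
        return preferred   # a specific group given priority
    if current in matches:
        return current     # continue with the group found first
    return matches[0]
```

Passing the previously chosen group as current keeps the selection stable while that group still matches, and preferred overrides it whenever the prioritized group matches.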
According to the second embodiment described above, an effect similar to that of the first embodiment can be produced by showing users the corresponding information, even when the method for selecting a plurality of subjects and the subject types differ from those according to the first embodiment.
A third embodiment will be described. According to the first and the second embodiments, a menu screen is updated to be shown to the user. As a method for determining the tracking control state more intuitively, the display of the lamp 112 mounted on the PTZ camera 100 can be changed to be shown to the user.
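As a purely illustrative sketch, the tracking control state could be mapped to a blink pattern of the lamp 112 as follows. The patterns, the blink rates, and the name lamp_pattern are assumptions. The embodiment does not specify which pattern corresponds to which state.

```python
# Hypothetical mapping from tracking control state to (pattern, blinks per second).
LAMP_PATTERNS = {
    "Tracking": ("steady", 0.0),
    "Partially Lost": ("slow blink", 1.0),
    "Lost": ("fast blink", 4.0),
}


def lamp_pattern(state):
    """Return the lamp pattern for a state, or an unlit lamp for unknown states."""
    return LAMP_PATTERNS.get(state, ("off", 0.0))
```

Varying the blink rate rather than the color leaves the color available for the broadcasting and preview indications of a tally lamp.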
In the third embodiment described above, a similar effect to that of the first and second embodiments can be produced in modified notification forms.
A fourth embodiment will be described. While, according to the first to the third embodiments, the PTZ camera 100 as an imaging apparatus performs subject detection and tracking control, the PC 200 as an information apparatus or a server as an external device can serve to perform subject detection and tracking. An example where the PC 200 serves to perform subject detection and tracking will be described below.
In step S1701, the CPU 201 determines which of a control command and an end command is received via the network I/F 204. When the CPU 201 receives a control command (YES as the result of checking the operation state in step S1701), the CPU 201 stores the received control command in the RAM 203. Then, the processing proceeds to step S1702. When the CPU 201 receives an end command (NO as the result of checking the operation state in step S1701), the CPU 201 completes this control.
In step S1702, the CPU 201 receives a captured image of the PTZ camera 100 via the network I/F 204 and stores the image in the RAM 203.
In step S1703, as in step S303, the CPU 201 determines information about the features and positions of subjects from the captured image using the inference unit 208 and stores the information in the RAM 203.
In step S1704, as in step S304, the CPU 201 determines subject selection information. According to the present embodiment, as in the first embodiment, a menu screen can be displayed using the display unit 205 to detect the coordinates on the angle of view via the user input I/F 206. In addition, as in the second embodiment, group selection information can be received via the user input I/F 206. The subject information is stored in the RAM 203.
In step S1705, as in step S705, the CPU 201 determines a tracking target in the subject information and stores the subject information in the RAM 203 without transmitting the subject information to the outside.
In step S1706, as in step S306, the CPU 201 calculates control positional information. The calculated control positional information is transmitted via the network I/F 204 to the PTZ camera 100. The CPU 101 of the PTZ camera 100 performs the pan, tilt, and zoom operations via instructions from the PC 200 by performing the control in steps S307 to S308 based on the information.
In step S1707, as in step S706, the CPU 201 compares the tracking state with the selected target. As in the other embodiments, the CPU 201 determines the tracking control state based on the number of targets selected and the number of tracking targets and stores the tracking control state in the RAM 203.
In step S1708, as in the other embodiments, the CPU 201 displays the captured images, the subject information, and the tracking control state stored in the steps described above, using the display unit 205.
According to the fourth embodiment, the PC 200 as an information apparatus can produce a similar effect to that of the other embodiments.
The present invention can also be implemented by a program for implementing at least one of the functions according to the above-described embodiments being supplied to a system or apparatus via a network or storage medium, and one or more processors in a computer of the system or apparatus reading and executing the program. Further, the present invention can also be implemented with a circuit, such as an Application Specific Integrated Circuit (ASIC) for implementing one or more functions. The present invention is not limited to the above-described embodiments but can be modified and changed in diverse ways without departing from the scope thereof.
The present invention enables an information processing apparatus for tracking a plurality of subjects to inform a user of tracking states.
Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc™ (BD)), a flash memory device, a memory card, and the like.
While the present invention has been described with reference to embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but is defined by the scope of the following claims.
This application claims the benefit of Japanese Patent Application No. 2023-220111, filed Dec. 26, 2023, which is hereby incorporated by reference herein in its entirety.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2023-220111 | Dec 2023 | JP | national |