The present disclosure relates to an image compression coding technique.
In recent years, along with the proliferation of smartphones, digital video cameras, and the like, opportunities for generating video data by performing image capturing have increased. On the other hand, since the storage capacity for storing data and the communication band used for exchanging data are limited, there is a need for a technique that can efficiently compress video data. As a video compression method, H.264/AVC is known as a standard. In addition, H.265/HEVC has also become popular as a standard.
In a video data compression coding technique, parameters such as a quantization parameter and the like are defined for the adjustment of image quality. By using these parameters, the data amount needs to be minimized as much as possible while maintaining necessary information. More specifically, there is a method in which a region of interest in a video is extracted as an ROI (Region of Interest), and different quantization parameters are used for the ROI and each region other than the ROI. Since a moving object tends to be an object of importance in the case of a network camera which is used mainly for the purpose of monitoring, a method that detects a moving object and sets the detected moving object as an ROI is known. In addition, a method that detects a specific object, such as a person, a car, or the like that tends to be regarded as more important among moving objects, and specifies only the specific object as an ROI is also generally used.
Although moving objects tend to be objects of importance, there can be exceptions. For example, moving objects can be background objects which constantly sway such as a fountain, the sea surface, trees which are being blown by the wind, and the like. Since such background objects move in a complicated manner, an accurate reproduction of these objects can degrade the compression efficiency and increase the data amount. However, the information contained in these objects is generally unimportant. Hence, by increasing the image quality of an important region as an ROI while simultaneously decreasing the image quality of an unimportant region which has movement, it will be possible to reduce the bit rate without the loss of important information.
A region such as a water surface, vegetation, or the like can be obtained by applying a region-based segmentation (also referred to as segmentation) method on every single image (to be referred to as a frame) forming an obtained video. However, since the regions cannot be correctly segmented if a person or a car which is to be the foreground is included, a background image needs to be generated by excluding the foreground. Japanese Patent Laid-Open No. 2012-203680 discloses a method of generating a background image by using a plurality of frames. In addition, Japanese Patent Laid-Open No. 8-181992 discloses a method that changes the image quality by segmenting a human facial region, which is regarded to be an important region, into a region with movement and a region without movement.
Although the method of Japanese Patent Laid-Open No. 2012-203680 can be used to create a background image that excludes the foreground, the method of Japanese Patent Laid-Open No. 2012-203680 does not perform compression control by using the background image. Since a region with movement included in the background is not targeted in the method of Japanese Patent Laid-Open No. 8-181992, the movement of vegetation or the like is not assumed in the method. Furthermore, although region-based segmentation can be performed for each frame and the image quality parameter can be changed depending on the segmented contents, the image quality will be set uniformly for the vegetation in such a case and the image quality settings cannot be changed between vegetation which is moving and vegetation which is not moving.
The present disclosure provides, in a case in which different compression coding parameters are to be set for a specific region and a non-specific region of a background image used in compression coding, a technique for setting a compression coding parameter that corresponds to an amount of movement in the specific region.
In order to set, in a case in which different compression coding parameters are to be set for a specific region and a non-specific region of a background image used in compression coding, a compression coding parameter that corresponds to an amount of movement in the specific region, a first aspect of the present disclosure provides an image processing apparatus comprising: a determination unit configured to obtain a pixel value of the same pixel position from a plurality of images and determine, based on a frequency distribution of the obtained pixel value, the pixel value and an amount of movement in the pixel position in a background image; and a setting unit configured to set a compression coding parameter to the background image, wherein in a specific region in the background image, the setting unit sets a compression coding parameter corresponding to an amount of movement of a pixel belonging to the specific region.
A second aspect of the present disclosure provides an image processing method performed by an image processing apparatus, comprising: obtaining a pixel value of the same pixel position from a plurality of images, and determining, based on a frequency distribution of the obtained pixel value, the pixel value and an amount of movement in the pixel position in a background image; and setting a compression coding parameter to the background image, wherein in a specific region in the background image, a compression coding parameter corresponding to an amount of movement of a pixel belonging to the specific region is set in the setting.
Further features of the present disclosure will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
Hereinafter, embodiments will be described in detail with reference to the attached drawings. Note, the following embodiments are not intended to limit the scope of the claimed invention. Multiple features are described in the embodiments, but limitation is not made to an invention that requires all such features, and multiple such features may be combined as appropriate. Furthermore, in the attached drawings, the same reference numerals are given to the same or similar configurations, and redundant description thereof is omitted.
Each of the following embodiments will describe an example in which image capturing is performed for the purpose of monitoring. However, the present disclosure is not limited to this, and each of the following embodiments is applicable to image capturing performed for various kinds of purposes such as for the purpose of broadcasting and the like. In addition, each of the following embodiments will describe an image processing apparatus that functions as an image capturing apparatus (network camera) that can connect to a network to communicate with another apparatus. However, the present disclosure is not limited to this, and each of the following embodiments can be applied to an image processing apparatus that functions as an image capturing apparatus that is not capable of connecting to a network. Furthermore, although the image processing apparatus will be described as having an image capturing function in each of the following embodiments, the present disclosure is not limited to an example in which the image processing apparatus has the image capturing function. The image capturing function may be implemented by an apparatus separate from the image processing apparatus, and the image processing apparatus may be configured to obtain a captured image from this separate apparatus.
This embodiment includes an analysis stage in which a background image that is to be used for compressing (compression-coding) an image of a frame in a captured moving image is analyzed, and a compression stage in which an image of a frame in a moving image captured after the analysis stage is compression-coded by using the result of the analysis.
In the preceding analysis stage, a background image and an amount of movement in a given time for each pixel position in the background image are obtained from the respective images of a plurality of frames of a moving image capturing the same scene by a fixed angle of view or the like. Subsequently, different compression coding parameters are set for a specific region and a non-specific region in the background image. A compression coding parameter that corresponds to the amount of movement in the specific region will be set for the specific region. Although an example that uses, as the compression coding parameter, a Qp value which is a quantization parameter value will be described hereinafter, the compression coding parameter is not limited to the Qp value. Any kind of compression coding parameter can be employed as long as it is a compression coding parameter that influences the image quality.
In the subsequent compression stage, a foreground is extracted from an image of each frame in the moving image capturing the same scene (the same scene as the scene captured in the analysis stage) by a fixed angle of view or the like, and an ROI is set to the extracted foreground. Subsequently, “a Qp value set to the specific region of the background image” is set to a correspondence region that corresponds to the above-described specific region in the image, and “a Qp value set to a non-specific region of the background image” is set to a correspondence region that corresponds to the above-described non-specific region in the image. At this time, “a Qp value corresponding to high image quality (a Qp value which is smaller than either of the Qp value of the specific region and the Qp value of the non-specific region)” will be set to the ROI of the image. Subsequently, by executing compression coding in which each region of the image is quantized by using the Qp value of the region, image compression can be performed so that the image quality is decreased only in a background region that has a large movement, and thus a high compression cost, but does not include important information, while the image quality of an important region in the foreground is maintained.
First, an example of the arrangement of the image processing system 10 according to this embodiment will be described with reference to the block diagram of
Based on an operation by a user, the client apparatus 200 transmits, to the image processing apparatus 100, a distribution request command for requesting the distribution of a moving image (stream) and a setting command for setting various kinds of parameters, information of the ROI, and the like. The image processing apparatus 100 will transmit a stream to the client apparatus 200 in response to the distribution request command, and store the various kinds of parameters, the information of the ROI, and the like in response to the setting command. The client apparatus 200 is a computer apparatus such as a personal computer, a tablet terminal, a smartphone, or the like. A processor such as a CPU or the like of the client apparatus 200 will use computer programs and data stored in a memory of the client apparatus 200 to execute various kinds of processing. As a result, the processor of the client apparatus 200 can control the operation of the entire client apparatus 200 as well as execute or control each processing operation to be described as processing to be executed by the client apparatus 200.
An example of the arrangement of the image processing apparatus 100 will be described with reference to
The example of the functional arrangement of the image processing apparatus 100 will be described first with reference to the block diagram of
A background analysis unit 214 uses the captured images, of a plurality of frames, obtained by the image obtainment unit 211 to generate a background image from which the foreground of the captured scene has been excluded and to obtain an amount of movement corresponding to each pixel in the background image. Next, the background analysis unit 214 will execute region-based segmentation so that the generated background image is segmented into regions for respective objects, and will set a Qp value for each segmented region. For a segmented region of a specific object, however, a Qp value corresponding to the amount of movement in that region will be set. Subsequently, the background analysis unit 214 stores the Qp values set for the respective regions in the storage unit 222.
A foreground extraction unit 215 extracts the foreground (foreground region) from each captured image obtained by the image obtainment unit 211, and sets an ROI to each extracted foreground. A compression coding unit 212 uses the Qp values stored in the storage unit 222 by the background analysis unit 214 to perform compression coding of each captured image obtained as a compression coding target by the image obtainment unit 211.
A compression coding unit 213 transmits, for example, in a streaming format via a communication unit 224 (
Next, an example of the hardware arrangement of the image processing apparatus 100 will be described with reference to
The storage unit 222 includes memory devices such as a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The storage unit 222 stores the computer programs and data to be used by a control unit 223 to execute or control various kinds of processing which are described as processing performed by the image processing apparatus 100. In addition, the storage unit 222 can store data (commands and images) and various kinds of parameters obtained via the communication unit 224 from an external device such as the client apparatus 200 or the like. For example, the storage unit 222 stores camera parameters such as the settings of white balance, exposure, and the like of the moving image obtained by the image capturing unit 221, the compression coding parameters, and the like. Quantization parameter values (Qp values) are included in the compression coding parameters. Note that the quantization step will increase as the value of the Qp value increases and will decrease as the value of the Qp value decreases. Hence, the image quality will degrade as the Qp value used to perform the compression coding increases, and the image quality will improve as the Qp value used in the compression coding decreases. In addition, the storage unit 222 can store parameters related to each captured image such as the frame rate of the moving image, the size (resolution) of the captured image, and the like.
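The relationship between the Qp value and the quantization step noted above can be illustrated with a minimal sketch. The base table below is the commonly cited set of H.264 quantization step sizes for Qp values 0 to 5, with the step doubling for every increase of 6 in the Qp value; the table, the function name `qstep`, and the range check are assumptions introduced for illustration and are not taken from this disclosure.

```python
# Commonly cited H.264 quantization step sizes for Qp 0..5.
# Assumption: these values are not stated in this disclosure.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Return the approximate quantization step for an H.264 Qp value (0..51)."""
    if not 0 <= qp <= 51:
        raise ValueError("Qp is assumed to lie in the 8-bit H.264 range 0..51")
    # The step doubles for every increase of 6 in Qp.
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))
```

Under these assumptions, a larger Qp value yields a coarser quantization step and therefore lower image quality, which is the direction of the relationship described above.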
In addition, the storage unit 222 can provide a work area to be used when the control unit 223 is to execute various kinds of processing. Furthermore, the storage unit 222 can function as a frame memory and a buffer memory. Note that other than memories like the ROM, the RAM, and the like, a storage medium such as a flexible disk, a hard disk, an optical disk, a magnetooptical disk, a CD-ROM, a CD-R, a magnetic tape, a nonvolatile memory card, a DVD or the like can be used as the storage unit 222.
The control unit 223 includes a CPU (Central Processing Unit), an MPU (Micro Processing Unit), and the like. The control unit 223 executes various kinds of processing by using computer programs and data stored in the storage unit 222. As a result, in addition to controlling the operation of the entire image processing apparatus 100, the control unit 223 executes or controls each processing described to be performed by the image processing apparatus 100. Note that the control unit 223 may control the entire image processing apparatus 100 based on the cooperation of an OS (Operating System) and the computer programs stored in the storage unit 222. Note that the control unit 223 may be formed by a processor such as a DSP (Digital Signal Processor) or the like or an ASIC (Application Specific Integrated Circuit).
The communication unit 224 transmits/receives wired signals or wireless signals to communicate with the client apparatus 200 via the network 300. Note that each functional unit of the image processing apparatus 100 shown in
An accelerator unit 225 includes a CPU, a GPU (Graphics Processing Unit), an FPGA (Field-Programmable Gate Array), a storage unit, and the like. The accelerator unit 225 is a processing unit added to the image capturing unit 221 to mainly execute high performance processing by deep learning. The accelerator unit 225 may also perform the processing operations of the background analysis unit 214 and the foreground extraction unit 215.
The processing operations of the functional units shown in
The processing performed by the image processing apparatus 100 in the analysis stage will be described in accordance with the flowchart of
In addition, the image obtainment unit 211 obtains compression coding parameters from the storage unit 222. The compression coding parameters obtained by the image obtainment unit 211 from the storage unit 222 include the respective Qp values (quantization parameter values), described above, for executing compression coding in compliance with H.264. The Qp values to be obtained by the image obtainment unit 211 include a Qp value (Qp value of a non-specific region) for the general background and a Qp value of a specific region. As one example, assume that the Qp value of the general background is “36” and the Qp value of the specific region is “40”.
In step S420, the image obtainment unit 211 generates, from the moving image captured by the image capturing unit 221, captured images consisting of frames of a corresponding predetermined time in accordance with the various kinds of settings obtained in step S410. In this embodiment, in a case in which the predetermined time is, for example, 10 minutes and the frame rate is 30 fps, 18,000 frames of captured images will be generated from the moving image.
This embodiment assumes a use case targeting monitoring of a general road as shown in
In step S430, the background analysis unit 214 uses the 18,000 captured images obtained by the image obtainment unit 211 in step S420 to obtain a background image and an amount of movement for each small region in the background image.
The generation method of the background image will be described first. The background image is generated by combining, for each small region, the most frequent pixel value of each correspondence region of the 18,000 captured images, corresponding to the small region. A case in which each small region is a pixel and the pixel value is a luminance value will be described below. That is, a determination method for determining the luminance value of each pixel position (x, y) in the background image from the 18,000 captured images will be described below. Applying this determination method to each pixel position in the background image will determine the luminance value of each pixel position in the background image, and a background image in which the luminance values of the respective pixel positions have been determined can be generated as a result. First, the background analysis unit 214 will collect the luminance value of a pixel position (x, y) from each of the 18,000 captured images, and generate a frequency distribution of the collected luminance values (the luminance values of the 18,000 pixels). In this embodiment, as one example of the frequency distribution, the background analysis unit 214 will generate a histogram representing the frequency of each luminance value.
Hence, the background analysis unit 214 determines the highest frequency luminance value in the histogram generated for each pixel position (x, y) in the background image as the luminance value of the pixel at the pixel position (x, y) in the background image.
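The determination described above can be sketched as follows. This is a minimal illustration, assuming that each captured image is given as a nested list of luminance values indexed as `frame[y][x]`; the function names are hypothetical, and the handling of ties between equally frequent values (the first value encountered wins) is an assumption that the text does not specify.

```python
from collections import Counter


def luminance_histogram(frames, x, y):
    """Count how often each luminance value occurs at (x, y) over all frames."""
    return Counter(frame[y][x] for frame in frames)


def background_image(frames):
    """Build a background image from the most frequent value at each pixel."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [luminance_histogram(frames, x, y).most_common(1)[0][0]
         for x in range(width)]
        for y in range(height)
    ]
```

Because a passing foreground object occupies a given pixel position in only a small share of the frames, its luminance values rarely become the most frequent value, so the foreground tends to drop out of the result.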
For example, in
In addition, for example, in
A method of obtaining the amount of movement for each pixel in a background image will be described next. A method for obtaining the amount of movement in each pixel position (x, y) in the background image will be described below. By applying this method to each pixel position in the background image, the amount of movement in each pixel position in the background image can be obtained.
The amount of movement in the pixel position (x, y) in the background image can be a reciprocal of the width of a peak that includes the highest frequency in the histogram generated for the pixel position (x, y) or a reciprocal of the ratio, to the total frequency (the sum of all frequencies, which is 18,000 in this case), of the total value of the highest frequency and the frequencies distributed in its periphery. The latter method will be used here to describe the method of obtaining the amount of movement in the pixel position (x, y) in the background image.
First, the total value of the highest frequency in the histogram generated for the pixel position (x, y) in the background image and the respective frequencies of the two luminance values adjacent to the luminance value corresponding to the highest frequency in the histogram is obtained as “the width of the peak”. Subsequently, the background analysis unit 214 obtains the ratio of “the width of the peak” to the total frequency of “18,000”, and the reciprocal of the obtained ratio is obtained as the amount of movement in the pixel position (x, y) in the background image. Since the aim here is to obtain the amount of movement in the background while excluding the movement of the foreground, for example, the influence of the variation spreading in a low luminance value of
For example, in a case in which the amount of movement in a pixel position corresponding to the pixel position 360 in the background image is to be obtained, first, the width of the peak of each of the R, G, and B values will be obtained with reference to the histogram of
In regards to the R value, since the highest frequency is “3,544” and the frequencies corresponding to the luminance values adjacent to the luminance value corresponding to the highest frequency are “1,532” and “0”, the width of the peak will be the total value “5,076” (=3,544+1,532+0) of these values. Hence, the ratio of “the width of the peak” to the total frequency “18,000” will be 5,076/18,000=0.282.
In regards to the G value, since the highest frequency is “4,898” and the frequencies corresponding to the luminance values adjacent to the luminance value corresponding to the highest frequency are “2,761” and “0”, the width of the peak will be the total value “7,659” (=4,898+2,761+0) of these values. Hence, the ratio of “the width of the peak” to the total frequency “18,000” will be 7,659/18,000=0.426.
In regards to the B value, since the highest frequency is “4,055” and the frequencies corresponding to the luminance values adjacent to the luminance value corresponding to the highest frequency are “3,573” and “0”, the width of the peak will be the total value “7,628” (=4,055+3,573+0) of these values. Hence, the ratio of “the width of the peak” to the total frequency “18,000” will be 7,628/18,000=0.424.
The amount of movement can be obtained for each of the R, G, and B values of one pixel position or a single amount of movement can be obtained for one pixel position. The latter method will be employed here. Hence, in this case, the average value “0.377” (=(0.282+0.426+0.424)/3) of the ratios obtained for the respective R, G, and B values will be obtained, and a reciprocal “2.65” of this average value will be obtained as “the amount of movement in the pixel position corresponding to the pixel position 360 in the background image”.
In addition, for example, in a case in which the amount of movement in a pixel position corresponding to the pixel position 370 in the background image is to be obtained, first, the width of the peak of each of the R, G, and B values will be obtained with reference to the histogram of
In regards to the R value, since the highest frequency is “693” and the frequencies corresponding to the luminance values adjacent to the luminance value corresponding to the highest frequency are “512” and “334”, the width of the peak will be the total value “1,539” (=693+512+334) of these values. Hence, the ratio of “the width of the peak” to the total frequency “18,000” will be 1,539/18,000=0.086.
In regards to the G value, since the highest frequency is “727” and the frequencies corresponding to the luminance values adjacent to the luminance value corresponding to the highest frequency are “631” and “540”, the width of the peak will be the total value “1,898” (=727+631+540) of these values. Hence, the ratio of “the width of the peak” to the total frequency “18,000” will be 1,898/18,000=0.105.
In regards to the B value, since the highest frequency is “1,020” and the frequencies corresponding to the luminance values adjacent to the luminance value corresponding to the highest frequency are “816” and “511”, the width of the peak will be the total value “2,347” (=1,020+816+511) of these values. Hence, the ratio of “the width of the peak” to the total frequency “18,000” will be 2,347/18,000=0.130.
The average value “0.107” (=(0.086+0.105+0.130)/3) of the ratios obtained for the respective R, G, and B values will be obtained, and a reciprocal “9.35” of this average value will be obtained as “the amount of movement in the pixel position corresponding to the pixel position 370 in the background image”.
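The calculation walked through above for the pixel positions 360 and 370 can be sketched as follows. This is a minimal illustration, assuming that each histogram is a list of frequencies indexed by luminance value; the function names are hypothetical, and treating a missing neighbor at either end of the histogram as a frequency of 0 is an assumption.

```python
def peak_ratio(hist):
    """Share of all samples in the highest bin plus its two adjacent bins."""
    total = sum(hist)
    peak = max(range(len(hist)), key=hist.__getitem__)
    # Assumption: a neighbor outside the histogram counts as frequency 0.
    left = hist[peak - 1] if peak > 0 else 0
    right = hist[peak + 1] if peak + 1 < len(hist) else 0
    return (hist[peak] + left + right) / total


def movement_amount(channel_hists):
    """Reciprocal of the average peak ratio over the channels (e.g. R, G, B)."""
    ratios = [peak_ratio(hist) for hist in channel_hists]
    return len(ratios) / sum(ratios)
```

Fed with the frequencies quoted above, this sketch gives about 2.65 for the pixel position 360. For the pixel position 370 it gives about 9.34, slightly below the quoted 9.35, because the text rounds the average ratio to 0.107 before taking the reciprocal.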
As described above, the smaller the movement, the more the luminance values will be concentrated around the highest frequency, and the above-described ratio will increase accordingly. Hence, based on such a relationship, the reciprocal of the average value of the ratios is set as the amount of movement in this embodiment.
Note that the method of obtaining the amount of movement from the histogram described above is merely an example, and the method is not limited to this. For example, the total value of the highest frequency and the frequencies of luminance values adjacent to the luminance value corresponding to the highest frequency is obtained in the above-described embodiment. However, as the width of the peak increases, the height of the peak will be averaged out with the surroundings even in a case in which the movement is small. Hence, to prevent such an influence, the reciprocal of the ratio of the highest frequency to the total frequency may be set as the amount of movement. Alternatively, the total value of the highest frequency and the higher frequency among the frequencies of the luminance values adjacent to the luminance value corresponding to the highest frequency may be obtained as the total value described above. Note that “neighboring luminance values” may be used instead of “the adjacent luminance values”.
Next, in step S440, the background analysis unit 214 segments the background image generated in step S430 into regions for respective objects by executing semantic segmentation processing (segmentation) on the background image. Note that in this embodiment, “a region of vegetation (vegetation region)” among the segmented regions obtained by the region-based segmentation executed in step S440 will be set as the specific region, and segmented regions other than the “vegetation region” will be set as non-specific regions. However, the attributes of the specific region and the non-specific region are not limited to the “vegetation region” and a “segmented region other than the vegetation region”, respectively.
Although a plurality of methods are known as segmentation methods, DeepLab (Google), which is a method based on machine learning, particularly deep learning, will be used here. To construct a discriminator to obtain regions corresponding to the road, the sky, the trees, and buildings by using DeepLab, frame images in which the road and the buildings appear are collected as training data from the moving image. More specifically, regions of the road and the buildings are extracted from each frame image in the moving image to generate a file in which the labels (of the road and the buildings) are written. By executing training using the training data prepared in this manner, a discriminator that can segment the regions of the road and the buildings can be constructed.
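The two alternatives mentioned above can be sketched as follows. This is a minimal illustration, assuming a histogram given as a list of frequencies indexed by luminance value; the function name and the `mode` strings are hypothetical labels introduced here.

```python
def movement_variant(hist, mode="peak_only"):
    """Amount of movement as the reciprocal of an alternative peak ratio."""
    total = sum(hist)
    peak = max(range(len(hist)), key=hist.__getitem__)
    # Assumption: a neighbor outside the histogram counts as frequency 0.
    left = hist[peak - 1] if peak > 0 else 0
    right = hist[peak + 1] if peak + 1 < len(hist) else 0
    if mode == "peak_only":
        # Highest frequency alone, relative to the total frequency.
        mass = hist[peak]
    elif mode == "peak_plus_higher_neighbor":
        # Highest frequency plus the larger of the two adjacent frequencies.
        mass = hist[peak] + max(left, right)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return total / mass
```

Because both variants count fewer bins than the three-bin total, they produce a larger amount of movement for the same histogram, so any value derived from the amount of movement would likely need to be rescaled accordingly.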
Next, in step S450, the background image generated in step S430 is segmented into a plurality of unit regions by the background analysis unit 214. The background analysis unit 214 subsequently sets a Qp value to each unit region in the background image. Since a Qp value will be set based on a unit of 16×16 pixels as a macroblock in H.264, a Qp value will be set for each macroblock (that is, a unit region=a macroblock will be set) in this embodiment. However, if a macroblock could be segmented even smaller, a Qp value may be set based on a smaller unit. Furthermore, since a Qp value can be set on a CTU basis in H.265, the setting can be performed in accordance with the size of the unit region to which a Qp value can be set.
If even a single pixel among the pixels forming a macroblock belongs to a non-specific region of the segmented regions obtained by the region-based segmentation executed in step S440, the background analysis unit 214 will determine that this macroblock belongs to a non-specific region. The background analysis unit 214 will set a value of “36” which is the Qp value for the non-specific region to the macroblock determined to belong to the non-specific region.
On the other hand, if all of the pixels forming a macroblock belong to a specific region of the segmented regions obtained by the region-based segmentation executed in step S440, the background analysis unit 214 will determine that this macroblock belongs to a specific region. For a macroblock determined to belong to the specific region, the background analysis unit 214 will set a Qp value obtained by adjusting the value “40”, which is the Qp value for the specific region, based on the amounts of movement of the pixels forming the macroblock. For example, for a macroblock in which all of the pixels belong to a specific region, the background analysis unit 214 will obtain an average value Av of the amounts of movement corresponding to pixels forming the macroblock, and set the Qp value to be set to the macroblock to be “40+Av”.
For example, since the average value of the amounts of movement in a macroblock formed by pixels (the pixel in the pixel position 370 of
Note that the Qp value “40” may also be added to a weighted average value of the amounts of movement. More specifically, by setting γ to be a weighting coefficient, the Qp value to be set to a vegetation region in the periphery of the pixel position 370 of
In this manner, by setting a greater Qp value in a location with greater movement even in the same vegetation region, it will be possible to degrade the image quality of a movement region in the background that does not include important information but can increase the bit rate due to low compression efficiency. As a result, the bit rate can be reduced. Subsequently, the background analysis unit 214 will store, in the storage unit 222, the Qp value of each macroblock in the background image.
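The per-macroblock rule of step S450 can be sketched as follows. This is a minimal illustration, assuming that the segmentation result is given as a nested list of booleans (`True` for a pixel in the specific region) and that the amounts of movement are given as a nested list of the same shape. The function name is hypothetical. Rounding “40+Av” to an integer and clamping it to 51, the upper limit of the Qp value in 8-bit H.264, are assumptions that the text does not state.

```python
MB_SIZE = 16           # macroblock size in H.264
QP_NON_SPECIFIC = 36   # example Qp value of the general background
QP_SPECIFIC = 40       # example base Qp value of the specific region
QP_MAX = 51            # assumption: clamp to the 8-bit H.264 upper limit


def macroblock_qp_map(specific_mask, movement):
    """Return one Qp value per 16x16 macroblock of the background image."""
    height, width = len(specific_mask), len(specific_mask[0])
    qp_map = []
    for top in range(0, height, MB_SIZE):
        qp_row = []
        for left in range(0, width, MB_SIZE):
            pixels = [(y, x)
                      for y in range(top, min(top + MB_SIZE, height))
                      for x in range(left, min(left + MB_SIZE, width))]
            if all(specific_mask[y][x] for y, x in pixels):
                # Every pixel is in the specific region: use 40 + Av.
                av = sum(movement[y][x] for y, x in pixels) / len(pixels)
                # Assumption: round to an integer and clamp to QP_MAX.
                qp_row.append(min(QP_MAX, round(QP_SPECIFIC + av)))
            else:
                # A single non-specific pixel makes the block non-specific.
                qp_row.append(QP_NON_SPECIFIC)
        qp_map.append(qp_row)
    return qp_map
```

Treating a mixed macroblock as non-specific errs on the side of the lower Qp value, so a block that straddles the boundary of the vegetation region is not degraded more than the general background.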
Next, the processing performed by the image processing apparatus 100 in the compression stage will be described in accordance with the flowchart of
In step S720, the control unit 223 obtains, from the storage unit 222, the respective Qp values of the macroblocks in the background image which were obtained by the processing in accordance with the flowchart of
In step S740, the foreground extraction unit 215 extracts the foreground (foreground region) to be the detection target from each captured image obtained in step S730. The scene of the road shown in
As a method of detecting a car or a person by image analysis, a method based on machine learning, particularly, deep learning, is known as a method that can achieve high accuracy and high speed processing that can support real time processing. More specifically, methods such as YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector), and the like can be used. A case that uses SSD will be illustrated here. SSD is a method for detecting each object from an image that includes a plurality of objects.
To construct a discriminator that uses SSD to detect a car or a person from an image, training data is prepared by collecting each image that includes a car or a person from a plurality of images. More specifically, each person region and each car region are extracted from each image, and a file in which the coordinates of the center position and the size of each region are written is created. A discriminator that can detect a car or a person from an image is constructed by performing training by using the training data prepared in this manner.
Upon detecting a car or a person from a captured image by using the discriminator that has been generated in this manner, the foreground extraction unit 215 outputs the position and the size (the width and the height) of the detected car region or person region (foreground region) to the compression coding unit 212. The position of each foreground region is set at the center position of the foreground region in a coordinate system in which an upper left position of a captured image is set as the origin. Also, the size of the foreground region is the ratio of the foreground region (the width and the height) to the size of the captured image (the width and the height). The position and the size of each foreground region obtained in this manner will be output to the compression coding unit 212 in a list since a plurality of cars and persons may be detected in a captured image.
In step S750, the compression coding unit 212 specifies each correspondence region on the background image corresponding to a “foreground region on the captured image” which is specified based on “the position and the size of the foreground region” output from the foreground extraction unit 215 in step S740. Subsequently, the compression coding unit 212 specifies, among the macroblocks in the background image, a macroblock that is partially or entirely included in the correspondence region, and changes the setting so that the Qp value “32” for an ROI will be used instead of the Qp value of the specified macroblock.
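The mapping from a detected foreground region to the macroblocks whose Qp value is replaced can be sketched as follows. This is a minimal sketch under assumptions: the function name, the tuple layout of a region, the 16-pixel macroblock size, and the interpretation of the center position as pixel coordinates are hypothetical. The Qp value “32” for an ROI and the "partially or entirely included" rule come from the description.

```python
import math

ROI_QP = 32  # Qp value for an ROI, as given in the description


def apply_roi_qp(qp_map, regions, image_w, image_h, mb_size=16):
    """Return a copy of qp_map with ROI_QP written into every macroblock
    that a foreground region covers partially or entirely.

    Each region is (cx, cy, w_ratio, h_ratio): center position measured
    from the upper-left origin, and size as a ratio of the image size.
    Treating the center as pixel coordinates is an assumption here.
    """
    out = [row[:] for row in qp_map]
    rows, cols = len(out), len(out[0])
    for cx, cy, w_ratio, h_ratio in regions:
        half_w = w_ratio * image_w / 2
        half_h = h_ratio * image_h / 2
        left, right = cx - half_w, cx + half_w
        top, bottom = cy - half_h, cy + half_h
        # Index range of macroblocks overlapped by the rectangle,
        # clipped to the image.
        first_col = max(0, int(left // mb_size))
        last_col = min(cols - 1, math.ceil(right / mb_size) - 1)
        first_row = max(0, int(top // mb_size))
        last_row = min(rows - 1, math.ceil(bottom / mb_size) - 1)
        for r in range(first_row, last_row + 1):
            for c in range(first_col, last_col + 1):
                out[r][c] = ROI_QP
    return out
```

Because the function returns a copy, the Qp values stored for the background image stay intact and can be reused for the next captured image, in which the foreground regions may have moved.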
In step S760, the captured image is segmented into a plurality of macroblocks by the compression coding unit 212, and compression coding of each segmented macroblock is performed by using the Qp value of the macroblock in the background image corresponding to the segmented macroblock. Subsequently, the compression coding unit 212 controls the communication unit 224 to distribute the captured image, in which all of the macroblocks have been compression-coded, to the client apparatus 200 via the network 300. Note that the distribution destination of the communication unit 224 is not limited to a specific destination. For example, the communication unit 224 may distribute the compression-coded captured image to another apparatus in addition to or instead of the client apparatus 200, or may store the compression-coded captured image in its own storage unit 222.
In step S770, the control unit 223 determines whether to continue executing the compression coding operation (whether a captured image to be processed is present). If it is determined that the processing is to be continued, the process advances to step S730. Otherwise, the processing according to the flowchart of
In this manner, in this embodiment, a Qp value is set to the background based on a background image and each amount of movement in the background that have been generated and extracted by analyzing captured images of frames corresponding to a predetermined time. As a result, compression coding can be performed at a high compression rate on a region, such as vegetation with constant movement or the like, in which the bit rate can increase but important information is not included. Furthermore, according to this embodiment, compression coding of an ROI of a captured image can be performed by using a Qp value for the ROI, and compression coding of a region in which the ROI has been excluded can be performed by using a Qp value set to a correspondence region which corresponds to the region in the background image. Hence, the image quality of the foreground can be increased preferentially in a case in which a target passes the front of a vegetation region, and the image quality of the vegetation region can be decreased as a background in a case in which a target does not pass in front of the vegetation region. As a result, the bit rate can be reduced more effectively.
In this embodiment, the number of frames to be used for the background analysis by the background analysis unit 214 and a target time (that is, whether to thin or use all of the 30 fps) and the timing at which the background information (the background image and the amount of movement for each pixel in the background image) is updated become important.
The time to be taken for the background analysis needs to be changed in accordance with the use case. For example, the semantics of the movement to be extracted in the background will differ between a case in which the background information is updated once a month by executing a background analysis on a moving image equivalent to a day's worth of image capturing and a case in which the background information is updated every few minutes by executing a background analysis on a moving image of approximately a few GOPs (Groups of Pictures). This embodiment assumes a use case that targets monitoring of a general road as shown in
In each embodiment including this embodiment hereinafter, differences from the first embodiment will be described. Assume that various kinds of arrangements are similar to those of the first embodiment unless particularly mentioned below. The control of compression coding includes not only control by designating Qp values, but also control by using a CBR (Constant Bit Rate). Control by CBR is control performed to maintain a constant bit rate by changing the Qp values in accordance with the moving image. Although the control by CBR is advantageous in that the capacity for recording a moving image can be controlled, negative effects such as a large degradation in the image quality may occur depending on the contents of the moving image. It is also possible that the image quality of a main object will change, even if the same scene is captured, because the Qp value to be set will differ between a day on which trees sway greatly due to a strong wind and a day on which the trees do not sway. In order to prevent such a state, the bit rate will be controlled by selectively reducing the image quality of a region with large movement in this embodiment.
Processing performed by an image processing apparatus 100 in an analysis stage will be described in accordance with the flowchart of
In step S810, in addition to the settings obtained in the process of step S410, an image obtainment unit 211 obtains, with respect to the Qp values to be used when encoding is to be performed in compliance with H.264, a difference between a Qp value for a general background and a Qp value for an ROI and a difference between a Qp value for a specific region and the Qp value for the ROI.
In this case, “4” is obtained as the difference (to be referred to as “Δ general background Qp value” hereinafter) between the Qp value for the general background and the Qp value for the ROI, and “8” is obtained as the difference (to be referred to as “Δ specific region Qp value”) between the Qp value for the specific region and the Qp value for the ROI.
Next, in step S850, the background image generated in the process of step S430 is segmented into a plurality of unit regions by a foreground extraction unit 215. The foreground extraction unit 215 subsequently sets a difference Qp value to each of the unit regions in the background image. A difference Qp value will be set for each macroblock in this embodiment as well.
If even a single pixel among the pixels forming a macroblock belongs to a non-specific region of the segmented regions obtained by the region-based segmentation executed in step S440, the foreground extraction unit 215 will determine that this macroblock belongs to a non-specific region. Subsequently, the foreground extraction unit 215 will set, to the macroblock determined to belong to the non-specific region, a difference Qp value=α×Δgeneral background Qp value as the compression coding parameter. In this case, α represents a weighting coefficient.
If all of the pixels forming a macroblock belong to a specific region of the segmented regions obtained by the region-based segmentation executed in step S440, the foreground extraction unit 215 will determine that this macroblock belongs to a specific region. Subsequently, the foreground extraction unit 215 will set, to the macroblock determined to belong to the specific region, a difference Qp value=β×Δspecific region Qp value+γ×v as the compression coding parameter. In this case, β and γ are weighting coefficients (γ is as described above) and v is an average value of the amounts of movement corresponding to pixels forming the macroblock. The foreground extraction unit 215 will store, in a storage unit 222, the difference Qp value set for each macroblock in the background image.
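The two difference Qp formulas can be written as one small function. This is a sketch for illustration only: the function name and argument layout are hypothetical, and the default weighting coefficients of 1 are placeholders. The values “4” and “8” are the Δ general background Qp value and the Δ specific region Qp value given in the description.

```python
DELTA_GENERAL = 4   # Δ general background Qp value
DELTA_SPECIFIC = 8  # Δ specific region Qp value


def difference_qp(is_specific, avg_movement,
                  alpha=1.0, beta=1.0, gamma=1.0):
    """Return the difference Qp value for one macroblock.

    is_specific  -- True when all pixels of the macroblock belong to the
                    specific region, False otherwise
    avg_movement -- v, the average amount of movement in the macroblock
    """
    if is_specific:
        return beta * DELTA_SPECIFIC + gamma * avg_movement
    return alpha * DELTA_GENERAL
```

For example, a specific-region macroblock with v = 3 yields a difference Qp value of 11 when β and γ are both 1, while any non-specific macroblock yields 4 when α is 1.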
Next, the processing performed by the image processing apparatus 100 in the compression stage will be described in accordance with the flowchart of
In step S910, the image obtainment unit 211 obtains, in a manner similar to the process of the above-described step S410, the settings necessary for analyzing the moving image. In addition, a compression coding unit 212 obtains the compression coding parameters from the storage unit 222. The compression coding parameters obtained in this step include the Qp value for the ROI (assume that the value is “32” in this case), an initial value of the Qp value for the ROI under CBR (assume that the value is “38” in this case), and “2 Mbps” as the target bit rate of the CBR.
In step S920, a control unit 223 obtains, from the storage unit 222, the difference Qp value of each macroblock in the background image obtained by the processing in accordance with the flowchart of
Next, in step S950, the compression coding unit 212 sets the corresponding Qp value to each of the ROI, the specific region, and the non-specific region in the captured image. Although a plurality of methods are known for controlling the bit rate, the simplest control method will be employed here. That is, in the control method to be employed, compression coding will be performed by setting an initial Qp value, and the Qp value will be increased if the bit rate is higher than expected and decreased if the bit rate is lower than expected. A Qp value for a comparatively low image quality will be set as the initial Qp value to prevent a state in which the distribution and storage will be suppressed due to the occurrence of an unexpectedly extremely high bit rate. As an example, the compression coding unit 212 will set the following Qp values as the respective Qp values of the ROI, the specific region, and the non-specific region in the background image.
Qp value for the ROI=R
Qp value for the specific region=R+(β×Δspecific region Qp value+γ×v)
Qp value for the non-specific region=R+(α×Δgeneral background Qp value)
Here, the term “(β×Δ specific region Qp value+γ×v)” of the Qp value for the specific region is the difference Qp value set to the macroblock in the background image that corresponds to the macroblock in the specific region. Also, the term “(α×Δ general background Qp value)” of the Qp value for the non-specific region is the difference Qp value set to the macroblock in the background image that corresponds to the macroblock in the non-specific region.
Let “38” be the initial value of R, and “1” be the initial value of each of α, β, and γ. In this case, the Qp value for the ROI, the Qp value for the specific region, and the Qp value for the non-specific region will be as follows respectively.
Qp value for the ROI=38
Qp value for the specific region=38+(8×β+v)
Qp value for the non-specific region=38+(4×α)
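As a worked check of these three expressions, the following sketch evaluates them for given values of R, α, β, γ, and v. The function name is hypothetical, and the rounding and the clamping to the H.264 Qp range of 0 to 51 are assumptions added for the example; the description itself does not state how out-of-range or fractional values are handled.

```python
H264_QP_MAX = 51    # upper bound of the H.264 Qp range for 8-bit video
DELTA_GENERAL = 4   # Δ general background Qp value
DELTA_SPECIFIC = 8  # Δ specific region Qp value


def region_qps(r=38, alpha=1.0, beta=1.0, gamma=1.0, v=0.0):
    """Return the Qp values for the ROI, the specific region, and the
    non-specific region, where v is the average amount of movement."""
    def clamp(qp):
        # Rounding and clamping are assumptions of this sketch.
        return max(0, min(H264_QP_MAX, round(qp)))

    return {
        "roi": clamp(r),
        "specific": clamp(r + beta * DELTA_SPECIFIC + gamma * v),
        "non_specific": clamp(r + alpha * DELTA_GENERAL),
    }
```

With the initial values R = 38 and α = β = γ = 1, a specific-region macroblock with v = 3 receives 38 + 8 + 3 = 49, and a non-specific macroblock receives 38 + 4 = 42. For large v the unclamped sum would exceed 51, which is why the sketch clamps the result.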
Next, in step S960, the compression coding unit 212 uses the Qp value for the ROI, the Qp value for the specific region, and the Qp value for the non-specific region to perform compression coding of the background image. The compression coding of the ROI is performed by using the Qp value for the ROI, the compression coding of the specific region is performed by using the Qp value for the specific region, and the compression coding of the non-specific region is performed by using the Qp value for the non-specific region. Subsequently, the compression coding unit 212 reduces the value of R so as to bring the bit rate obtained as a result of the compression coding closer to the target bit rate. Hence, in the next compression coding processing, compression coding will be performed by using Qp values to which the reduced R value has been reflected.
For example, in a case in which the bit rate obtained as a result of the compression coding is lower than the target bit rate, the compression coding unit 212 will reduce the R value (note that the R value will not be decreased any further once it has reached 32). Since it can be assumed that the first result of the compression coding will be smaller than the target bit rate, the R value will be reduced one by one from the initial value of 38 per processing. However, in a case in which the bit rate is half the target value or lower, the R value may be reduced by a value of 2 per processing.
Subsequently, if the current bit rate is still lower than the target bit rate even when the R value has reached 32, the compression coding unit 212 will reduce the image quality degradation of the background by reducing, while keeping the R value fixed to 32, the weighting coefficients α and β to reduce the difference between the Qp value for the ROI and the Qp values for the specific region and the non-specific region.
If the current bit rate is still lower than the target bit rate even when the weighting coefficients α and β have reached 0, the compression coding unit 212 will reduce the weighting coefficient γ while keeping the R value fixed to 32 and the weighting coefficients α and β fixed to 0 (that is, the degree of contribution of the average value of the amounts of movement to the Qp values will be decreased). The method of reducing the weighting coefficients α, β, and γ is not limited to a particular method. For example, the weighting coefficient γ may be reduced if the weighting coefficients α and β become 0.5 or less, or the weighting coefficients α, β, and γ may be simultaneously reduced based on a predetermined ratio (for example, α:β:γ=4:2:1).
In addition, in a case in which the current bit rate has become higher than the target bit rate before the R value has reached 32, the compression coding unit 212 executes adjustment by increasing the weighting coefficients α, β, and γ so that the current bit rate will become lower than the target bit rate even when the R value reaches 32. In this case, the weighting coefficient γ will be increased first. Subsequently, if the current bit rate is still higher than the target bit rate even when the weighting coefficient γ has been increased to 15, the weighting coefficient β will be increased, and then the weighting coefficient α will be increased last. There are a plurality of methods for adjusting the weighting coefficients α, β, and γ, and the adjustment method may be changed in accordance with the use case.
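One possible reading of this adjustment order is sketched below as a single update step that is called once per compression coding pass. It is an illustration under assumptions rather than the control method of the compression coding unit 212: the class and function names, the step size of 0.25 for the weighting coefficients, and the upper limit for β are hypothetical, since the description gives none of them. The sketch also applies the overshoot branch regardless of the current R value, which is a simplification.

```python
from dataclasses import dataclass, replace

R_MIN = 32       # Qp value for the ROI; R is not lowered below this
GAMMA_MAX = 15   # γ is raised up to 15 before β is increased
BETA_MAX = 4.0   # assumption: no upper limit for β is given
STEP = 0.25      # assumption: no step size for α, β, γ is given


@dataclass(frozen=True)
class RateState:
    r: int = 38
    alpha: float = 1.0
    beta: float = 1.0
    gamma: float = 1.0


def update(state, bitrate, target):
    """Return the state to use for the next compression coding pass."""
    if bitrate < target:
        # Undershoot: raise quality. First lower R toward 32.
        if state.r > R_MIN:
            step = 2 if bitrate <= target / 2 else 1
            return replace(state, r=max(R_MIN, state.r - step))
        # Then shrink α and β toward 0, and finally γ.
        if state.alpha > 0 or state.beta > 0:
            return replace(state,
                           alpha=max(0.0, state.alpha - STEP),
                           beta=max(0.0, state.beta - STEP))
        return replace(state, gamma=max(0.0, state.gamma - STEP))
    if bitrate > target:
        # Overshoot: lower background quality, γ first, then β, then α.
        if state.gamma < GAMMA_MAX:
            return replace(state, gamma=min(GAMMA_MAX, state.gamma + STEP))
        if state.beta < BETA_MAX:
            return replace(state, beta=min(BETA_MAX, state.beta + STEP))
        return replace(state, alpha=state.alpha + STEP)
    return state
```

With bit rates expressed in Mbps and the target of 2 Mbps, `update(RateState(), 1.5, 2.0)` lowers R from 38 to 37, whereas `update(RateState(), 2.5, 2.0)` leaves R unchanged and raises γ from 1 to 1.25.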
In this manner, according to this embodiment, the moving image can be distributed without reducing the image quality of the ROI when bit rate control is to be executed based on CBR. In this case, control can be performed by using different weights for a background with movement, a specific region such as a vegetation region, and a general background. In particular, the image quality of the background with movement will be reduced first, the image quality of the specific region such as the vegetation region will be reduced next, and the image quality of the general background will be reduced last. This will allow the image quality of a region with a lesser amount of information where the bit rate can be increased more easily to be reduced first.
In each of the above-described embodiments, Qp value control based on a difference between an I-frame and a P-frame that characterizes moving image compression according to standards such as H.264 and H.265 has not been performed. Instead, control has been performed by setting common Qp values to both kinds of frames. However, compared to an I-frame in which compression is performed by using information within the frame, compression is performed only on a difference from a previous frame in the P-frame. Thus, the influence from a movement in the background will increase in the P-frame. Hence, compression coding will be performed by using a Qp value (a Qp value which is not dependent on the amount of movement) in which the weighting coefficient γ=0 in an I-frame captured image, and compression coding will be performed by using a Qp value (a value which is dependent on the amount of movement) in which the weighting coefficient γ is set in a manner similar to the above-described embodiments in a P-frame. Although the compression effect will decrease by making settings in this manner, the image quality of the moving image will greatly improve. This is because a target unit region (macroblock) will be skipped more easily when the Qp value set to the P-frame increases, and each value of the previous frame will be directly used. Therefore, although the change in the movement due to the swaying of the trees will not be reflected accurately, a moving image with a comparatively good image quality background can be obtained because each value of the I-frame that has been compressed in a comparatively high image quality will be directly used. Alternatively, a method for setting each P-frame to be skipped may be employed when there is a large amount of movement.
Executing such processing will allow a moving image in which the image quality has been maintained to be obtained by losing only unnecessary information such as the fine swaying of the trees or the like in a use case such as a scene of a park or the like with a large vegetation region.
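The frame-type-dependent choice of the weighting coefficient γ can be sketched as follows. The function name, the string labels "I" and "P", and the default base value of 40 (the specific-region Qp value of the first embodiment) are assumptions made for illustration; frame types other than I and P are not discussed in the description and are rejected here.

```python
def specific_region_qp(frame_type, avg_movement, base_qp=40, gamma=1.0):
    """Return the Qp value for a specific-region macroblock.

    For an I-frame, γ is forced to 0 so that the Qp value does not depend
    on the amount of movement. For a P-frame, the configured γ is used.
    """
    if frame_type == "I":
        effective_gamma = 0.0
    elif frame_type == "P":
        effective_gamma = gamma
    else:
        raise ValueError(f"unsupported frame type: {frame_type!r}")
    return base_qp + effective_gamma * avg_movement
```

For a macroblock whose average amount of movement is 3, this yields 40 for an I-frame and 43 for a P-frame when γ is 1.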
Although each of the above-described embodiments exemplified an arrangement in which an image processing apparatus 100 and a client apparatus 200 are connected via a network 300, the present disclosure is not limited to this, and the image processing apparatus 100 and the client apparatus 200 may be integrated.
In addition, each of the above-described embodiments described a case in which background analysis processing by a background analysis unit 214 and foreground extraction processing by a foreground extraction unit 215 are performed by the image processing apparatus 100 which includes an accelerator unit 225. However, in particular, in relation to the background analysis processing, the background analysis processing may be executed in a computer apparatus such as the client apparatus 200 or the like after the moving image has once been distributed or may be executed by an accelerator unit that has been added externally. Also, a moving image captured by the image processing apparatus 100 may be stored in a storage medium such as an SD card that has been inserted in the image processing apparatus 100, and the storage medium may be inserted into a computer apparatus, which is not connected to the network 300, to copy the moving image to the computer apparatus. As a result, the computer apparatus will be able to perform the background analysis processing, the foreground extraction processing, and the like, described above, on the moving image.
In addition, the numerical values, the processing timings, the processing orders, and the like used in the above description are merely examples that are used for the sake of a more specific explanation, and the present disclosure is not limited to these numerical values, processing timings, processing orders, and the like.
Furthermore, some or all of the above-described embodiments may be appropriately combined and used. Additionally, some or all of the above-described embodiments may be selectively used.
The present disclosure can also be implemented by processing for supplying a program configured to implement at least one function of the above-described embodiments to a system or an apparatus via a network or a storage medium and causing at least one processor in the computer of the system or the apparatus to read out and execute the program. The present disclosure can also be implemented by a circuit (for example, ASIC) that implements the at least one function.
The present disclosure is not limited to the above embodiments and various changes and modifications can be made within the spirit and scope of the present disclosure. Therefore, to apprise the public of the scope of the present disclosure, the following claims are made.
According to the above-described embodiments, in a case in which different compression coding parameters are to be set for a specific region and a non-specific region in a background image to be used for compression coding, a technique for setting a compression coding parameter that corresponds to an amount of movement in the specific region can be provided.
Embodiment(s) of the present disclosure can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
While the present disclosure has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
This application claims the benefit of Japanese Patent Application No. 2020-075607, filed Apr. 21, 2020, which is hereby incorporated by reference herein in its entirety.
Number | Date | Country | Kind |
---|---|---|---|
2020-075607 | Apr 2020 | JP | national |