The present invention relates to an image processing apparatus, an image processing method, and a storage medium.
In a wide range of fields, there is demand for being able to crop desired subject regions from images. One technique for cropping a subject region is to create an AlphaMatte and use the AlphaMatte to crop the subject. "AlphaMatte" refers to an image in which the original image is separated into a foreground region (the subject) and a background region.
A method of using intermediate data called a “Trimap” is often used to create a high-precision AlphaMatte. “Trimap” is an image divided into three regions, namely a foreground region, a background region, and an unknown region.
The technique of Japanese Patent Laid-Open No. 2010-066802, for example, is known as a technique for generating a Trimap. Japanese Patent Laid-Open No. 2010-066802 discloses a technique for generating an AlphaMatte, in which a binary image of a foreground and a background is generated from an input image using an object extraction technique, and a tri-level image is then generated by setting an undefined region of a predetermined width at a boundary between the foreground and background.
However, because Japanese Patent Laid-Open No. 2010-066802 does not use distance information, the accuracy of the Trimap worsens when, for example, the subject and background are the same color.
Having been achieved in light of such circumstances, the present invention provides a technique for generating a highly-accurate Trimap by using distance information obtained through shooting using an image plane phase detection sensor.
According to a first aspect of the present invention, there is provided an image processing apparatus comprising at least one processor and/or at least one circuit which functions as: an obtainment unit configured to obtain a captured image and a plurality of parallax images generated through shooting using an image sensor in which a plurality of photoelectric conversion units are arranged, each photoelectric conversion unit receiving a light flux passing through a different partial pupil region of an imaging optical system; a generation unit configured to generate a background separation image in which regions of the captured image are classified as a foreground region, a background region, and an unknown region, based on distance distribution information obtained from the plurality of parallax images; and an output unit configured to output the captured image and the background separation image, wherein the generation unit generates the background separation image such that a region in which a distance in the distance distribution information is within a first range is classified as the foreground region, a region in which a distance in the distance distribution information is outside a second range broader than the first range is classified as the background region, and a region in which a distance in the distance distribution information is outside the first range and inside the second range is classified as the unknown region.
According to a second aspect of the present invention, there is provided an image processing method executed by an image processing apparatus, comprising: obtaining a captured image and a plurality of parallax images generated through shooting using an image sensor in which a plurality of photoelectric conversion units are arranged, each photoelectric conversion unit receiving a light flux passing through a different partial pupil region of an imaging optical system; generating a background separation image in which regions of the captured image are classified as a foreground region, a background region, and an unknown region, based on distance distribution information obtained from the plurality of parallax images; and outputting the captured image and the background separation image, wherein the background separation image is generated such that a region in which a distance in the distance distribution information is within a first range is classified as the foreground region, a region in which a distance in the distance distribution information is outside a second range broader than the first range is classified as the background region, and a region in which a distance in the distance distribution information is outside the first range and inside the second range is classified as the unknown region.
According to a third aspect of the present invention, there is provided a non-transitory computer-readable storage medium which stores a program for causing a computer to execute an image processing method comprising: obtaining a captured image and a plurality of parallax images generated through shooting using an image sensor in which a plurality of photoelectric conversion units are arranged, each photoelectric conversion unit receiving a light flux passing through a different partial pupil region of an imaging optical system; generating a background separation image in which regions of the captured image are classified as a foreground region, a background region, and an unknown region, based on distance distribution information obtained from the plurality of parallax images; and outputting the captured image and the background separation image, wherein the background separation image is generated such that a region in which a distance in the distance distribution information is within a first range is classified as the foreground region, a region in which a distance in the distance distribution information is outside a second range broader than the first range is classified as the background region, and a region in which a distance in the distance distribution information is outside the first range and inside the second range is classified as the unknown region.
Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
Hereinafter, embodiments of the present invention will be described with reference to the attached drawings. Elements that are given the same reference numerals throughout all of the attached drawings represent the same or similar elements, unless otherwise specified. Note that the technical scope of the present invention is defined by the claims, and is not limited by the following respective embodiments. Also, not all of the combinations of the aspects that are described in the embodiments are necessarily essential to the present invention. Also, the aspects that are described in the individual embodiments can be combined as appropriate.
First, the internal configuration of an image processing apparatus 100 used in each embodiment will be described with reference to the drawings.
The lens unit 106 (an imaging optical system) includes a lens group including a zoom lens and a focus lens, an aperture mechanism, and a drive motor. An optical image that passes through the lens unit 106 is received by the image capturing unit 107. The image capturing unit 107 uses a CCD, CMOS, or similar sensor which serves to convert an optical signal into an electrical signal. Because the electrical signal obtained here is an analog value, the image capturing unit 107 also has a function for converting the analog value into a digital value. The image capturing unit 107 is an image plane phase detection sensor, and will be described in detail later.
The CPU 102 controls each unit of the image processing apparatus 100 according to programs stored in the ROM 103, using the RAM 104 as work memory. This control includes control of display on the display unit 114 and control of recording into the recording medium 112. The ROM 103 is a non-volatile recording device, in which programs for causing the CPU 102 to operate, various adjustment parameters, and the like are recorded. The RAM 104 is volatile memory that uses a semiconductor device, and is generally slower and lower in capacity than the frame memory 111.
The frame memory 111 is a device that can temporarily store image signals and read out those signals when necessary. Image signals contain huge amounts of data, and thus a high-bandwidth, high-capacity device is required. In recent years, Double Data Rate 4 Synchronous Dynamic RAM (DDR4 SDRAM) has often been used. By using this frame memory 111, it is possible, for example, to composite images that differ in time, or to cut out only the necessary regions from an image.
The image processing unit 105 performs various types of image processing on data from the image capturing unit 107 or image data stored in the frame memory 111 or the recording medium 112 under the control of the CPU 102. The image processing carried out by the image processing unit 105 includes image data pixel interpolation, encoding processing, compression processing, decoding processing, enlargement/reduction processing (resizing), noise reduction processing, color conversion processing, and the like. The image processing unit 105 also performs processing such as correction of performance variations of pixels in the image capturing unit 107, defective pixel correction, white balance correction, luminance correction, correction of distortion and peripheral light loss caused by lens characteristics, and the like. Note that the image processing unit 105 may be constituted by a dedicated circuit block for carrying out specific image processing. Depending on the type of the image processing, it is also possible for the CPU 102 to carry out image processing in accordance with a program, rather than using the image processing unit 105.
Based on calculation results obtained by the image processing unit 105, the CPU 102 can control the lens unit 106 to magnify the optical image, adjust the focal length, adjust the aperture and the like to adjust the amount of light, and so on. It is also possible to correct camera shake by moving part of the lens group in a plane orthogonal to the optical axis.
The operation unit 113 is one interface with the outside of the device, and receives user operations. The operation unit 113 uses devices such as mechanical buttons, switches, and the like, including a power switch and a mode changing switch.
The display unit 114 provides a function for displaying images. The display unit 114 is a display device that can be seen by the user, and can display, for example, images processed by the image processing unit 105, setting menus, and the like. The user can check the operation status of the image processing apparatus 100 by looking at the display unit 114. In recent years, a compact, low-power-consumption device such as a liquid crystal display (LCD) or an organic electroluminescence (EL) display has been used as the display unit 114. In addition, a resistive film-based or electrostatic capacitance-based thin-film device called a "touch panel" can be provided on the display unit 114, and may also be used instead of the operation unit 113.
The CPU 102 generates character strings to inform the user of the setting state and the like of the image processing apparatus 100, menus for configuring the image processing apparatus 100, and the like, superimposes these items on the image processed by the image processing unit 105, and displays the result in the display unit 114. In addition to text information, shooting assistance displays such as a histogram, vectorscope, waveform monitor, zebra, peaking, false color, and the like can also be superimposed.
The image terminal 109 serves as another interface. Typical examples of such an interface include Serial Digital Interface (SDI), High Definition Multimedia Interface (HDMI, registered trademark), DisplayPort (registered trademark), and various other interfaces. Using the image terminal 109 makes it possible to display real-time images on an external monitor or the like.
The image processing apparatus 100 also includes the network terminal 108, which can transmit control signals as well as images. The network terminal 108 is an interface for inputting and outputting image signals, audio signals, and the like. The network terminal 108 can also communicate with external devices over the Internet or the like to send and receive various data such as files, commands, and the like.
The image processing apparatus 100 not only outputs images to the exterior, but also has a function for recording images internally. The recording medium 112 is capable of recording image data, various types of setting data, and the like, and uses a high-capacity storage device. For example, a Hard Disk Drive (HDD), a Solid State Drive (SSD), or the like is used as the recording medium 112. The recording medium 112 is mounted to the recording medium I/F 110.
The object detection unit 115 is a block for detecting objects using, for example, artificial intelligence, as represented by deep learning using neural networks. Taking object detection through deep learning as an example, the CPU 102 sends a program for the processing stored in the ROM 103, as well as a network structure, weighting parameters, and the like for a model such as Single Shot Multibox Detector (SSD) or You Only Look Once (YOLO), to the object detection unit 115. The object detection unit 115 performs processing to detect objects from image signals based on various parameters obtained from the CPU 102, and loads the processing results into the RAM 104.
Finally, to drive these systems, the image processing apparatus 100 also includes the power supply unit 116, the oscillation unit 117, and the like. The power supply unit 116 is a part that supplies power to each of the blocks described above, and has a function of converting and distributing power from a commercial power supply supplied from the outside, a battery, or the like to any desired voltage. The oscillation unit 117 is an oscillation device called a “crystal”. The CPU 102 and the like generate a desired timing signal based on a periodic signal input from this oscillation device, and proceed through program sequences.
The foregoing has described an example of the overall system of the image processing apparatus 100.
The image capturing unit 107, which serves as an image plane phase detection sensor, will now be described. In the image capturing unit 107, each pixel unit is provided with a plurality of photoelectric conversion units, and a signal for image capturing and a signal for phase difference detection can be output for each pixel unit.
The image capturing unit 107 can output the signal for phase difference detection for each pixel unit, but can also output a value obtained by finding the arithmetic mean of the signals for phase difference detection for a plurality of pixel units in proximity to each other. By outputting the arithmetic mean, the time required to read out the signal from the image capturing unit 107 can be reduced, and the bandwidth of the internal bus 101 can be reduced.
Using the output signal from the image capturing unit 107 serving as an image sensor, the CPU 102 calculates the correlation between the two image signals to calculate a defocus amount, parallax information, various types of reliability information, and the like. The defocus amount at the image plane is calculated based on the misalignment between the A image signal and the B image signal. The defocus amount has a positive or negative value, and whether the focus is front focus or rear focus can be determined from whether the defocus amount is positive or negative. The extent to which the subject is out of focus can be determined from the absolute value of the defocus amount, and the subject is determined to be in focus when the defocus amount is 0. In other words, the CPU 102 calculates information indicating front focus or rear focus based on whether the defocus amount is positive or negative. Additionally, the CPU 102 calculates information indicating the degree of focus, corresponding to the degree to which the subject is out of focus, based on the absolute value of the defocus amount. The CPU 102 outputs the information as to whether the focus is front focus or rear focus when the absolute value of the defocus amount is greater than a predetermined value, and outputs information indicating that the subject is in focus when the absolute value of the defocus amount is within the predetermined value. The CPU 102 controls the lens unit 106 to adjust the focus according to the defocus amount.
Additionally, based on the parallax information and the lens information of the lens unit 106, the CPU 102 calculates a distance to the subject using the principle of triangulation. Furthermore, the CPU 102 generates a Trimap taking into account the distance to the subject, the lens information of the lens unit 106, and the setting status of the image processing apparatus 100. The method of generating a Trimap will be described in detail later.
Here, two signals are output from the image capturing unit 107 for each pixel, namely the (A image signal+B image signal) for image capturing, and the A image signal for phase difference detection. In this case, the B image signal for phase difference detection can be calculated by subtracting the A image signal from the (A image signal+B image signal) after the output. The method is not limited thereto, however, and the output from the image capturing unit 107 may be performed as the A image signal and the B image signal, in which case the (A image signal+B image signal) for image capturing can be calculated by adding the A image signal and the B image signal.
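As an illustration of the signal handling described above, the following sketch (Python; not part of the embodiment itself) derives the B image signal by subtraction and estimates the defocus state from the shift between the A and B image signals. The cross-correlation-based shift estimate, the conversion factor from shift to defocus amount, and the in-focus threshold are illustrative assumptions rather than values defined by the present disclosure.

```python
import numpy as np

def b_image_from_capture(a_plus_b, a):
    """Derive the B image signal for phase difference detection from the
    (A image signal + B image signal) for image capturing and the A image signal."""
    return a_plus_b - a

def estimate_defocus(a_row, b_row, shift_to_defocus=1.0, in_focus_thresh=0.5):
    """Estimate a defocus amount and a focus state from one row of A/B signals.

    The image shift is taken as the peak of the cross-correlation between the
    two rows; shift_to_defocus and in_focus_thresh are illustrative constants.
    """
    a = a_row - a_row.mean()
    b = b_row - b_row.mean()
    corr = np.correlate(a, b, mode="full")
    shift = int(np.argmax(corr)) - (len(a) - 1)   # signed shift in pixels
    defocus = shift * shift_to_defocus
    if abs(defocus) <= in_focus_thresh:
        state = "in focus"
    elif defocus > 0:
        state = "rear focus"    # sign convention assumed for illustration
    else:
        state = "front focus"
    return defocus, state

# Example with synthetic signals shifted by a few pixels.
x = np.linspace(0, 4 * np.pi, 64)
a_sig = np.sin(x)
b_sig = np.roll(a_sig, 3)
print(estimate_defocus(a_sig, b_sig))
```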
The image processing apparatus 100 has the above configuration, and it is therefore possible to obtain a captured image and a plurality of parallax images generated by shooting using an image sensor in which a plurality of photoelectric conversion units, each receiving a light flux passing through a different partial pupil region of the imaging optical system, are arranged.
In each of the following embodiments, the image processing apparatus 100 described above is used unless otherwise noted. Additionally, the configurations in each of the following embodiments can be combined as appropriate.
Embodiment 10 describes an example of processing for generating a Trimap (a background separation image).
When the power is turned on to the power supply unit 116 by the user operating the operation unit 113, the CPU 102 performs shooting standby processing in step S1001. In the shooting standby processing, the CPU 102 displays, in the display unit 114, an image captured by the image capturing unit 107 and processed by the image processing unit 105, such as that illustrated in the drawings.
In step S1002, the user operates the operation unit 113 while looking at the display unit 114. The CPU 102 performs settings and processing in response to the above operations for each processing unit of the image processing apparatus 100.
Here, the CPU 102 displays the background threshold setting menu screen 1300 in such a manner that the user cannot set a value smaller than the value set as the reference value for the foreground threshold. For example, if 2 is set as the reference value for the foreground threshold, the CPU 102 performs a display such as the gray display 1302 illustrated in the drawings.
The CPU 102 also determines the foreground threshold and the background threshold according to the reference values for the foreground threshold and the background threshold set in step S1002, respectively.
In step S1003, the CPU 102 calculates distance information to the subject for each pixel based on the parallax information and lens information of the lens unit 106 (i.e., distance distribution information is obtained).
In step S1004, the CPU 102 determines, for each pixel, whether the distance information to the subject is within the range of the foreground threshold determined in step S1002. If the distance information is within the range of the foreground threshold, the processing moves to step S1006, whereas if the distance information is outside the range of the foreground threshold, the processing moves to step S1005.
In step S1005, the CPU 102 determines, for each pixel, whether the distance information to the subject is outside the range of the background threshold determined in step S1002. If the distance information is outside the range of the background threshold, the processing moves to step S1007, whereas if the distance information is within the range of the background threshold, the processing moves to step S1008.
In step S1006, the CPU 102 classifies a region of pixels for which the distance information is determined to be within the range of the foreground threshold in step S1004 as a foreground region, and performs processing for replacing the pixel values in that region with white data.
In step S1007, the CPU 102 classifies a region of pixels for which the distance information is determined to be outside the range of the background threshold in step S1005 as a background region, and performs processing for replacing the pixel values in that region with black data.
In step S1008, the CPU 102 classifies a region of pixels for which the distance information is determined to be within the range of the background threshold in step S1005 as an unknown region, and performs processing for replacing the pixel values in that region with gray data.
Specifically, assume that, for example, the distance information calculated by the CPU 102 in step S1003 takes a value in the range of −128 to +127, and that the value of the distance information at the position where the defocus amount is 0 is 0. Furthermore, assume that the reference value of the threshold set by the user in step S1002 and the range of values according to the reference value are in the relationship illustrated in the drawings.
Through the above processing, the CPU 102 generates a Trimap divided into three regions, namely the foreground region, the background region, and the unknown region.
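The classification in step S1003 to step S1008 can be summarized by the following sketch (Python, for illustration only). It assumes the normalized distance range of −128 to +127 described above; the concrete foreground and background ranges used in the example are placeholders, since the actual correspondence between a reference value and a range is given by the relationship referred to above.

```python
import numpy as np

FOREGROUND, UNKNOWN, BACKGROUND = 255, 128, 0   # white, gray, black data

def generate_trimap(distance_map, fg_range, bg_range):
    """Classify each pixel of a distance map (values in [-128, 127]) into a
    foreground, unknown, or background region (steps S1004 to S1008).

    fg_range and bg_range are (low, high) tuples around the in-focus distance 0,
    with bg_range broader than fg_range.
    """
    fg_lo, fg_hi = fg_range
    bg_lo, bg_hi = bg_range
    trimap = np.full(distance_map.shape, UNKNOWN, dtype=np.uint8)           # unknown region (step S1008)
    trimap[(distance_map >= fg_lo) & (distance_map <= fg_hi)] = FOREGROUND  # steps S1004 -> S1006
    trimap[(distance_map < bg_lo) | (distance_map > bg_hi)] = BACKGROUND    # steps S1005 -> S1007
    return trimap

# Example: assumed ranges for the foreground (+/-2) and background (+/-16) thresholds.
distance_map = np.random.randint(-128, 128, size=(4, 6))
print(generate_trimap(distance_map, fg_range=(-2, 2), bg_range=(-16, 16)))
```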
In step S1009, the CPU 102 performs processing for outputting the Trimap to the display unit 114, the image terminal 109, or the network terminal 108.
As described above, in the present embodiment, a Trimap can be generated easily, without calibration, by generating the Trimap using the distance information calculated from data from an image plane phase detection sensor.
Although the present embodiment describes a configuration in which the Trimap is displayed or output, the configuration may be such that the Trimap is recorded into the recording medium 112 via the recording medium I/F 110. The configuration may be such that the Trimap is displayed, output, or recorded as a single still image, or such that a plurality of sequential Trimaps are displayed, output, or recorded as a moving image.
Additionally, although the present embodiment describes a configuration in which the signals for phase difference detection are output for each pixel unit from the image capturing unit 107, the configuration may be such that values obtained by finding the arithmetic mean of the signals for phase difference detection from a plurality of pixel units in proximity to each other in the image capturing unit 107 are output and a reduced Trimap is generated using those values. The reduced Trimap may be displayed, output, or recorded at the original image size, or may be resized by the image processing unit 105 and displayed, output, or recorded at a different image size.
Additionally, although the present embodiment describes a configuration in which the Trimap is displayed using white data for the foreground region, black data for the background region, and gray data for the unknown region, the color data for each region may be replaced with color data different from that in the above example.
In Embodiment 10, it is difficult for the user to grasp a positional relationship between a shot image and the boundaries of each region of the Trimap. Therefore, Embodiment 20 will describe an example of processing of superimposing boundary lines of each region of the Trimap on the captured image.
In step S2001, the user sets the display settings for the boundary lines, such as the color, gain, and frequency, in the boundary line setting menu screen 2100 by operating the operation unit 113.
Note that in step S2001, the user also sets the reference value for the foreground threshold and the reference value for the background threshold, in the same manner as in step S1002.
In step S2002, the CPU 102 generates the Trimap by performing the same processing as step S1003 to step S1008 described in Embodiment 10.
In step S2003, the CPU 102 extracts the boundaries of each region in the Trimap. Specifically, the boundaries of each region can be extracted by, for example, applying a high-pass filter with a predetermined cutoff frequency to luminance values of the Trimap in which the foreground region, the background region, and the unknown region are constituted by white data, black data, and gray data, respectively, and extracting high-frequency components. The cutoff frequency is determined by the CPU 102 according to the value of a frequency set by the user through the operation unit 113 in step S2001.
Furthermore, the CPU 102 can also determine whether a boundary is between white data and gray data, between gray data and black data, or between white data and black data, based on the positive/negative sign and magnitude of the values extracted by the aforementioned high-pass filter. For example, because the difference in luminance between white data and gray data is smaller than the difference in luminance between white data and black data, the magnitude of the value extracted by the high-pass filter can be used to determine whether a pixel in the white data region is on the boundary of the gray data or the boundary of the black data. When the gray data is used as a reference, the difference in luminance between the gray data and white data and the difference in luminance between the gray data and black data are opposite in terms of the positive/negative sign, and thus the positive/negative sign of the values extracted by the high-pass filter can be used to determine whether a pixel in the gray data region is on the boundary of the white data or on the boundary of the black data.
In this manner, it is possible to determine whether a boundary is between white data and gray data, between gray data and black data, or between white data and black data, i.e., whether a boundary is between the foreground region and the unknown region, between the unknown region and the background region, or between the foreground region and the background region.
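The boundary extraction and boundary-type determination of step S2003 can be sketched as follows (Python, illustration only). A simple horizontal first difference stands in for the high-pass filter; with the white, gray, and black levels used here, the three boundary types produce responses of distinct magnitude, and the sign indicates which side is brighter, in line with the determination described above.

```python
import numpy as np

FOREGROUND, UNKNOWN, BACKGROUND = 255, 128, 0   # white, gray, black data

def extract_boundaries(trimap):
    """Label Trimap boundary pixels: 0 = none, 1 = foreground/unknown,
    2 = unknown/background, 3 = foreground/background.

    A horizontal first difference is used as a stand-in for the high-pass
    filter of step S2003; the actual filter and cutoff handling may differ.
    """
    t = trimap.astype(np.int16)
    response = np.zeros_like(t)
    response[:, 1:] = t[:, 1:] - t[:, :-1]             # crude high-pass response
    mag = np.abs(response)
    labels = np.zeros(trimap.shape, dtype=np.uint8)
    labels[mag == abs(FOREGROUND - UNKNOWN)] = 1       # |255 - 128| = 127
    labels[mag == abs(UNKNOWN - BACKGROUND)] = 2       # |128 - 0| = 128
    labels[mag == abs(FOREGROUND - BACKGROUND)] = 3    # |255 - 0| = 255
    return labels
```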
In step S2004, the CPU 102 determines, for each pixel, whether the boundary extracted in step S2003 is a boundary between the foreground region and the unknown region. If the boundary is a boundary between the foreground region and the unknown region, the processing moves to step S2005, whereas when such is not the case, i.e., if the boundary is a boundary between the unknown region and the background region or between the foreground region and the background region, the processing moves to step S2006.
In step S2005, the CPU 102 superimposes color data, corresponding to the setting of the boundary line between the foreground region and the unknown region set in step S2001, on an output image signal from the image processing unit 105, at the same position as the pixel determined to be on the boundary between the foreground region and the unknown region in step S2004. Specifically, the higher the gain value set in the boundary line setting menu screen 2100, the darker the set color of the data superimposed on the output image signal from the image processing unit 105 appears.
In step S2006, the CPU 102 superimposes color data, corresponding to the setting of the boundary line between the unknown region and the background region set in step S2001, on the output image signal from the image processing unit 105, at the position of a pixel determined in step S2004 to be on a boundary other than the boundary between the foreground region and the unknown region, i.e., on the boundary between the unknown region and the background region or the boundary between the foreground region and the background region. Specifically, the higher the gain value set in the boundary line setting menu screen 2100, the darker the set color of the data superimposed on the output image signal from the image processing unit 105 appears.
In step S2007, the CPU 102 performs processing for outputting the image signal on which the boundary lines have been superimposed in step S2005 or step S2006 to the display unit 114, the image terminal 109, or the network terminal 108.
As described above, the present embodiment makes it easier for the user to understand the relationship between the shot image and the boundaries between the regions of the Trimap by superimposing the boundary lines among the Trimap regions on the captured image.
Additionally, by making the setting of the boundary lines between the foreground region and the background region the same as the setting of the boundary lines between the unknown region and the background region, it can be made easier for the user to recognize that the subject is in the unknown region.
There is an issue in that when the image and the Trimap are displayed separately, it is difficult to check whether the foreground region and the unknown region of the Trimap cover the subject of the image. The present embodiment will describe a configuration that addresses this issue.
In the present embodiment, the image processing unit 105 illustrated in the drawings is used.
The processing of the present embodiment will be described next with reference to the drawings.
In step S3003, in response to the user operating the operation unit 113, the CPU 102 displays a Trimap transparency setting menu screen 3100, illustrated in the drawings, in the display unit 114.
In step S3004, the user moves the cursor 3101 displayed in the Trimap transparency setting menu screen 3100 and selects “preset setting” as the transparency setting of the Trimap by operating the operation unit 113. In response to the user operation, the CPU 102 displays a list of presets in the Trimap transparency setting menu screen 3100. In this case, the processing moves from step S3004 to step S3005. Here, the list of presets may be displayed when the Trimap transparency setting menu screen 3100 is displayed in step S3003. Note that a case where a user setting is selected (when the processing moves from step S3004 to step S3007) will be described in Embodiment 31.
In step S3005, the user moves a cursor 3201 displayed in the Trimap transparency setting menu screen 3100 and selects a desired preset as the transparency setting of the Trimap by operating the operation unit 113.
In step S3008, the CPU 102 performs transparency processing on the Trimap based on the transparencies read out in step S3006. Here, the transparency processing may be realized by applying a different degree of transparency to each region in a single instance of processing for the entire Trimap, based on region information of the Trimap. Alternatively, the transparency processing may be realized by performing the transparency processing on each region of the Trimap in order, temporarily recording the intermediate data into the frame memory 111, and reading the data out when the transparency processing is performed on the next region.
In step S3009, the CPU 102 superimposes the Trimap, which has undergone the transparency processing in step S3008, on the image obtained in step S3001. In step S3010, the CPU 102 loads the Trimap superimposed image into the frame memory 111 and displays that image in the display unit 114. The Trimap superimposed image may be displayed in picture-in-picture format, or the image may be output from the image terminal 109, or may be recorded into the recording medium 112. The CPU 102 may also record the Trimap superimposed image and the Trimap region information and then change the transparency during playback, or display the recorded Trimap superimposed image in the display unit 114 only during REC review.
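One way to realize the transparency processing of step S3008 and the superimposition of step S3009 is sketched below (Python, illustration only). It assumes that a transparency α of 1 means the Trimap is fully transparent and only the image is visible, which is consistent with the handling of α = 1 described in Embodiment 32; the linear blend itself is an assumed formulation.

```python
import numpy as np

FOREGROUND, UNKNOWN, BACKGROUND = 255, 128, 0   # Trimap region values

def superimpose_trimap(image, trimap, alpha):
    """Blend a Trimap onto an image with a per-region transparency.

    alpha maps each region value to a transparency in [0, 1]; 1 means the
    Trimap is fully transparent for that region (only the image shows).
    """
    out = image.astype(np.float32).copy()
    for region, a in alpha.items():                           # per-region processing (step S3008)
        mask = trimap == region
        out[mask] = a * image[mask] + (1.0 - a) * region      # superimposition (step S3009)
    return out.astype(np.uint8)

# Example preset: foreground mostly visible, background strongly overlaid.
image = np.random.randint(0, 256, size=(4, 6), dtype=np.uint8)
trimap = np.random.choice([FOREGROUND, UNKNOWN, BACKGROUND], size=(4, 6))
print(superimpose_trimap(image, trimap, {FOREGROUND: 0.8, UNKNOWN: 0.5, BACKGROUND: 0.2}))
```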
As described above, according to Embodiment 30, the image and the Trimap can easily be checked at the same time.
Embodiment 30 described an example where the user selects the transparency setting for the Trimap from presets, but an example where the user manually sets the transparency of the Trimap is conceivable as another embodiment.
Embodiment 31 will describe an example of the user manually setting the transparency of the Trimap, with reference to the flowchart in the drawings.
First, step S3001 to step S3003 are the same as in Embodiment 30 and will therefore be omitted. Next, in step S3004, the user operates the menu in the same manner as in Embodiment 30, and selects "user setting" as the transparency setting for the Trimap. In response to the user operation, the CPU 102 displays a Trimap transparency setting screen 3800 in the display unit 114. In this case, the processing moves from step S3004 to step S3007.
In step S3007, the user moves the scroll bar 3801, the scroll bar 3802, and the scroll bar 3803 displayed in the Trimap transparency setting screen 3800 by operating the operation unit 113. In response to the user operation, the CPU 102 sets the transparency α for each of the foreground region, the unknown region, and the background region of the Trimap. Here, the transparency setting of the Trimap may be realized not only by using a Graphical User Interface (GUI) such as a scroll bar, but also by using a physical interface such as a volume knob that can change the setting value as desired. Next, step S3008 to step S3010 are the same as in Embodiment 30 and will therefore be omitted.
As described above, according to Embodiment 31, the image and the Trimap can easily be checked at the same time.
In Embodiment 30 and Embodiment 31, there is an issue in that it is difficult to check the image or the Trimap when a state that affects the image or the Trimap regions arises, or when an operation that affects the image or the Trimap regions is performed. The present embodiment will describe a configuration that addresses this issue.
Embodiment 32 will describe an example of automatically setting the transparency of the Trimap with reference to the flowchart in the drawings.
First, step S3901 and step S3902 are the same as step S3001 and step S3002 in Embodiment 30, and will therefore not be described. In step S3903, the CPU 102 sets the transparency of the Trimap in the same manner as in Embodiment 30 or Embodiment 31.
Next, in step S3904, the CPU 102 determines whether a Trimap transparency change condition, which is held in the ROM 103, is satisfied. Here, “transparency change condition” refers to whether a state, operation, or the like that affects the image or the Trimap regions is detected, e.g., when a subject enters from outside the angle of view and an additional foreground region is detected, when a lens operation is detected, or the like. If the transparency change condition is satisfied, the processing moves to step S3905, whereas if the transparency change condition is not satisfied, the processing moves to step S3906.
Note that to improve the visibility by preventing continuous changes in the transparency, a configuration may be employed in which the processing moves to step S3905 and the transparency is changed even when the transparency change condition is not satisfied, as long as the frame is within a predetermined number of frames after the transparency change condition is satisfied. In addition to the presence or absence of detection, other conditions may be used as the transparency change condition.
In step S3905, the CPU 102 reads out a transparency according to the transparency set in step S3903 and the transparency change condition from the ROM 103, and changes the transparency. For example, during lens operation, the user will wish to prioritize checking the image, and thus the CPU 102 reads out the setting value of α=1 for all of the foreground region, the unknown region, and the background region as the transparency of the Trimap during lens operation detection, and changes the transparency. In this case, during lens operation, only the image is displayed in the display unit 114, and after the lens operation is completed, the image is displayed in the display unit 114 having been subjected to the transparency processing reflecting the transparency set in step S3903. Here, the transparency according to the transparency change condition may be set as desired by the user. Additionally, when using a transparency change condition aside from the presence or absence of the detection of a state or operation that affects the image or the Trimap regions, a configuration may be employed in which a transparency corresponding to each condition is held in the ROM 103, the transparency setting value corresponding to the condition is read out, and the transparency is changed.
A case where the transparency change condition is not satisfied in step S3904 and the processing moves to step S3906 will be described next. In step S3906, the CPU 102 maintains the transparency set in step S3903 without change.
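The handling of the transparency change condition in step S3904 to step S3906 might be organized as in the following sketch (Python, illustration only). The condition name, the override transparencies, and the number of hold frames are hypothetical values, not values defined by the present disclosure.

```python
# Hypothetical override table: change condition -> per-region transparency.
OVERRIDES = {
    "lens_operation": {"foreground": 1.0, "unknown": 1.0, "background": 1.0},  # image only
}
HOLD_FRAMES = 10   # keep the override for a few frames after the condition clears (assumed)

class TransparencyController:
    def __init__(self, user_alpha):
        self.user_alpha = user_alpha   # transparency set in step S3903
        self.active = None
        self.hold = 0

    def update(self, detected_condition=None):
        """Return the transparency to use for the current frame (steps S3904 to S3906)."""
        if detected_condition in OVERRIDES:
            self.active, self.hold = detected_condition, HOLD_FRAMES   # step S3905
        elif self.hold > 0:
            self.hold -= 1        # suppress flicker just after the condition ends
        else:
            self.active = None    # step S3906: keep the user setting
        return OVERRIDES[self.active] if self.active else self.user_alpha

controller = TransparencyController({"foreground": 0.8, "unknown": 0.5, "background": 0.2})
print(controller.update("lens_operation"))   # image-only display during lens operation
print(controller.update())                   # override held briefly before the user setting returns
```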
Step S3907, step S3908, and step S3909 following the processing of step S3905 or step S3906 are the same as step S3008, step S3009, and step S3010 in Embodiment 30, and will therefore not be described.
As described above, according to Embodiment 32, the image and the Trimap can be easily checked at the same time, and the image or the Trimap can be easily checked when a state or operation that affects the image or the Trimap regions occurs.
A configuration that makes it easy for the user to recognize a relationship between the thresholds used when generating the Trimap and the distance information of the subject to be shot, for the Trimap output by the image processing apparatus 100, will be described next. The present embodiment will describe an example of generating and outputting a distance distribution display histogram from a distribution of the distance information.
In step S4001, the CPU 102 obtains the foreground threshold and the background threshold set in step S1002 of Embodiment 10, and stores the thresholds in the RAM 104. Step S4004 is the same as step S1003 in Embodiment 10, and will therefore not be described.
In step S4005, the CPU 102 determines whether a display setting for the distance distribution display histogram is on or off. The display setting of the distance distribution display histogram is set by the user by operating the menu using the operation unit 113. If the display setting is on, the processing moves to step S4006, whereas if the display setting is off, the processing moves to step S4014.
In step S4006, the CPU 102 generates a distance distribution display histogram based on the distance information obtained in step S4004. In the present embodiment, the CPU 102 obtains the distance information of corresponding pixels in the image obtained from the frame memory 111 in step S4004, and generates a distance distribution display histogram expressing the distribution of the distance information.
The distance distribution display histogram takes the horizontal axis as the distance, and takes the position where the distance information is 0 as the center value. The distance has a range in both the positive and negative directions, with the positive direction being the direction away from the image processing apparatus. For example, the actual distance (meters) is normalized to a real number from −128 to 127, and an in-focus position is expressed as 0. Furthermore, the number of pixels in the image having each distance value is expressed as a frequency on the vertical axis.
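The construction of the distance distribution display histogram in step S4006 can be sketched as follows (Python, illustration only), assuming the per-pixel distance information has already been normalized to the integer range of −128 to 127 as described above.

```python
import numpy as np

def distance_histogram(distance_map):
    """Count, for each normalized distance value in [-128, 127], how many pixels
    of the image take that value (step S4006); 0 corresponds to the in-focus position."""
    values = np.clip(np.rint(distance_map), -128, 127).astype(np.int32)
    counts = np.bincount((values + 128).ravel(), minlength=256)
    bin_centers = np.arange(-128, 128)
    return bin_centers, counts

# Example: a synthetic distance map concentrated near the in-focus position.
distance_map = np.random.normal(loc=0.0, scale=10.0, size=(120, 160))
centers, counts = distance_histogram(distance_map)
```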
In step S4007, the CPU 102 reads out the foreground threshold and the background threshold stored in the RAM 104. The foreground threshold is constituted by a first foreground threshold having a negative value and a second foreground threshold having a positive value. The background threshold is constituted by a first background threshold having a negative value and a second background threshold having a positive value.
In step S4008, the CPU 102 superimposes the foreground threshold and the background threshold read out in step S4007 on the distance distribution display histogram generated in step S4006. Specifically, the CPU 102 superimposes a vertical dotted line 4106 at a position that matches the first foreground threshold and a vertical dotted line 4107 at a position that matches the second foreground threshold on the horizontal axis of the distance distribution display histogram 4109, as illustrated in the drawings.
Additionally, as illustrated in the drawings, the CPU 102 superimposes vertical lines at positions that match the first background threshold and the second background threshold on the horizontal axis of the distance distribution display histogram 4109 in the same manner.
In step S4009, the CPU 102 obtains an image from the frame memory 111. In step S4010, the CPU 102 superimposes the distance distribution display histogram generated in step S4008 onto the image obtained in step S4009.
In step S4011, the CPU 102 outputs an image such as that illustrated in the drawings, onto which the distance distribution display histogram has been superimposed, to the display unit 114, and causes the image to be displayed in the display unit 114.
A case where the processing has moved from step S4005 to step S4014 will be described next. The process of step S4014 is the same as step S4009 and will therefore not be described. In step S4015, the CPU 102 outputs the image obtained in step S4014 to the display unit 114 and causes the image to be displayed in the display unit 114. This makes it possible to display only the shot image in the display unit 114 when the distance distribution display histogram is set to be hidden.
As described above, according to the present embodiment, the distribution of the distance information in the image is represented by a distance distribution display histogram, which makes it easy for the user to recognize the relationship between the thresholds used when generating the Trimap and the distance information of the subject being shot. This also makes it possible for the user to make adjustments while visually checking the ranges of the thresholds.
Embodiment 40 described an example of generating a distance distribution display histogram from the distribution of distance information and displaying the histogram such that the positional relationship between the subject and the foreground and background thresholds can be easily recognized. The embodiment also described an example where by displaying the foreground threshold and the background threshold, the user can make adjustments while visually checking the ranges of the thresholds. However, in the above embodiment, if the subject moves or takes action, the user may not notice that the subject is out of the range of the background threshold, and it may not be possible to generate the Trimap as intended by the user and crop the subject in the intended shape.
In contrast, Embodiment 41 will describe a configuration that expresses the distance distribution display histogram and the image in an emphasized manner to reduce the possibility that the subject to be shot jumps out of the range of the background threshold and the cropping fails.
The processing of step S4502 and step S4503 is the same as step S4006 and step S4007 in Embodiment 40, and will therefore not be described. In step S4504, the CPU 102 obtains the display range offset value stored in the ROM 103 in advance. Note that the storage location of the display range offset values is not limited to the ROM 103, and may instead be the recording medium 112 or the like. The user may also be able to change the display range offset value as desired. For example, the user selects the display range offset value by operating the menu using the operation unit 113, and the CPU 102 obtains the display range offset value from the operation unit 113.
In step S4505, the CPU 102 calculates the display threshold based on the background threshold read out in step S4503 and the display range offset value obtained in step S4504. For example, a first display threshold and a second display threshold may be obtained by extending the first background threshold and the second background threshold outward by the display range offset value, respectively, so that the display thresholds define a range broader than that of the background threshold.
In step S4506, the CPU 102 superimposes the foreground threshold and the background threshold read out in step S4503, as well as the display threshold calculated in step S4505, on the distance distribution display histogram generated in step S4502. The method of superimposing the foreground threshold and the background threshold on the distance distribution display histogram is the same as in step S4008 of Embodiment 40, and will therefore not be described. A method for superimposing the display threshold on the distance distribution display histogram will be described with reference to the drawings.
In step S4507, the CPU 102 obtains coloring setting information stored in the ROM 103 in advance. The coloring setting information is information of colors specifying each region in order to color the distance distribution display histogram and the image such that the regions to which those items belong can be distinguished. In the present embodiment, an item is colored with a first color if the item belongs to the foreground region or the unknown region. The background region is colored with a second color if the distance information is negative, and with a third color if the distance information is positive. Note that the storage location of the coloring setting information is not limited to the ROM 103, and may instead be the recording medium 112 or the like. The user may also be able to change the coloring setting information as desired. For example, the user specifies the first color, the second color, and the third color by operating a menu using the operation unit 113, and the CPU 102 obtains the coloring setting information from the operation unit 113.
In step S4508, the CPU 102 obtains a number of classes in the distance distribution display histogram. The obtained number of classes is stored in the RAM 104 as a variable Nmax. For example, if the number of classes in the distance distribution display histogram is 256, then the variable Nmax is 256.
In step S4509, the CPU 102 focuses on the class, among the classes in the distance distribution display histogram, that has the shortest distance information. Specifically, the class in the distance distribution display histogram that is focused on is set as a variable n; n is then set to 1 and stored in the RAM 104. A higher variable n corresponds to a histogram in a class of a distance further away from the image processing apparatus.
In step S4510, the CPU 102 determines whether the variable n is within a range from the first display threshold to the second display threshold. If the variable n is within the range of the display thresholds, the processing moves to step S4511, whereas if the variable n is not within the range, the processing moves to step S4516.
In step S4511, the CPU 102 determines whether the variable n is within a range from the first background threshold to the second background threshold. If the variable n is within the range from the first background threshold to the second background threshold, the processing moves to step S4512, whereas if the variable n is not within the range from the first background threshold to the second background threshold, the processing moves to step S4513.
In step S4512, the CPU 102 sets the histogram of the class of the variable n to be colored using the first color.
In step S4513, the CPU 102 determines whether the variable n is within a range from the first display threshold to the first background threshold. If the variable n is within the range from the first display threshold to the first background threshold, the processing moves to step S4514, whereas if the variable n is not within the range of the first display threshold to the first background threshold, the processing moves to step S4515.
In step S4514, the CPU 102 sets the histogram of the class of the variable n to be colored using the second color.
In step S4515, the CPU 102 sets the histogram of the class of the variable n to be colored using the third color.
In step S4516, the CPU 102 sets the histogram of the class of the variable n to be hidden.
In step S4517, the CPU 102 determines whether the variable n is equal to the number of classes Nmax of the histogram. If these items are equal, the processing moves to step S4519, whereas if these items are not equal, the processing moves to step S4518.
In step S4518, the CPU 102 substitutes n+1 for the variable n and stores the result in the RAM 104. Through this, the CPU 102 raises the histogram being focused on by one class.
In step S4519, the CPU 102 stores the distance distribution display histogram subjected to the coloring settings in the RAM 104.
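The per-class coloring of step S4509 to step S4518 amounts to the following sketch (Python, illustration only). The display thresholds are assumed to be the background thresholds extended outward by the display range offset value, as described above for step S4505, and the colors themselves are placeholders.

```python
def color_histogram_classes(distances, bg_threshold, display_offset,
                            first_color="green", second_color="blue", third_color="red"):
    """Return a color (or None for hidden) for each histogram class.

    bg_threshold is (first background threshold, second background threshold);
    the display thresholds are assumed to widen that range by display_offset.
    """
    bg_lo, bg_hi = bg_threshold
    disp_lo, disp_hi = bg_lo - display_offset, bg_hi + display_offset   # step S4505 (assumed)
    colors = []
    for d in distances:                      # steps S4509 to S4518
        if not (disp_lo <= d <= disp_hi):
            colors.append(None)              # hidden (step S4516)
        elif bg_lo <= d <= bg_hi:
            colors.append(first_color)       # foreground/unknown side (step S4512)
        elif d < bg_lo:
            colors.append(second_color)      # near side outside the background threshold (step S4514)
        else:
            colors.append(third_color)       # far side outside the background threshold (step S4515)
    return colors

# Example over the normalized distance range of -128 to 127.
colors = color_histogram_classes(range(-128, 128), bg_threshold=(-16, 16), display_offset=8)
```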
The processing of step S4520 and step S4521 is the same as step S4012 and step S4013 in Embodiment 40, and will therefore not be described. If a determination of "no" is made in step S4520, the processing moves to step S4406.
As described above, by executing the processing in the flowcharts described above, a distance distribution display histogram colored according to the coloring setting information and the thresholds can be generated.
Next, the processing for coloring the image will be described. In step S4601, the CPU 102 obtains an image from the frame memory 111.
In step S4602, of the distance information calculated in step S4404, the CPU 102 focuses on the distance information corresponding to a pixel (x,y). Note that the variable x represents a coordinate on the horizontal axis of the image, and the variable y represents a coordinate on the vertical axis of the image.
In step S4603, the CPU 102 determines whether the distance information of the pixel (x,y) being focused on in step S4602 is within the range from the first display threshold to the second display threshold. If the information is within the range of the display thresholds, the processing moves to step S4604, whereas if the information is not within the range, the processing moves to step S4608.
In step S4604, the CPU 102 determines whether the distance information of the pixel (x,y) being focused on in step S4602 is within the range from the first background threshold to the second background threshold. If the information is within the range of the background thresholds, the processing moves to step S4608, whereas if the information is not within the range, the processing moves to step S4605.
In step S4605, the CPU 102 determines whether the distance information of the pixel (x,y) being focused on in step S4602 is within the range from the first display threshold to the first background threshold. If the information is within the range from the first display threshold to the first background threshold, the processing moves to step S4606, whereas if the information is not within the range, the processing moves to step S4607.
In step S4606, the CPU 102 sets the pixel (x,y) of the image obtained in step S4601 such that the second color obtained in step S4507 is superimposed.
In step S4607, the CPU 102 sets the pixel (x,y) of the image obtained in step S4601 such that the third color obtained in step S4507 is superimposed.
In step S4608, the CPU 102 determines whether the variable x is equal to the horizontal size Xmax of the image. If these items are equal, the processing moves to step S4610, whereas if these items are not equal, the processing moves to step S4609.
In step S4609, the CPU 102 substitutes x+1 for the variable x and stores the result in the RAM 104. As a result, the CPU 102 focuses on the pixel one place to the right in the same line.
In step S4610, the CPU 102 determines whether the variable y is equal to the vertical size Ymax of the image. If these items are equal, the processing moves to step S4612, whereas if these items are not equal, the processing moves to step S4611.
In step S4611, the CPU 102 substitutes 0 for the variable x and y+1 for the variable y, and stores the results in the RAM 104. As a result, the CPU 102 focuses on the first pixel one line below.
In step S4612, the CPU 102 stores the image subjected to the processing of step S4603 to step S4611 in the RAM 104.
As described above, by executing the processing in the flowcharts described above, an image in which the portions of the subject that fall outside the range of the background threshold are colored with the second color or the third color can be generated.
In step S4407, the CPU 102 superimposes the distance distribution display histogram generated through the above processing onto the image generated through the above processing.
Furthermore, if the subject moves during shooting and a part of the subject jumps out of the range of the background threshold, the CPU 102 performs the same emphasis as for the region 4701 and the region 4702 of the image and the distribution 4306 and the distribution 4307 of the distance distribution display histogram. This makes it possible to notify the user in real time that a part of the subject has jumped out, which makes it possible to avoid the need to re-shoot the image.
Note that when superimposing the image and the distance distribution display histogram, these items are not limited to being arranged vertically, and another superimposing method may be used as long as the image and the distance distribution display histogram can be checked at the same time. For example, the image and the distance distribution display histogram may be displayed side by side on the left and right, or the distance distribution display histogram may have transparency and be superimposed on part of the image.
In step S4408, the CPU 102 outputs the image generated in step S4407 to the display unit 114, and causes the image to be displayed.
As described above, according to the present embodiment, when the subject to be shot jumps out of the range of the background threshold, the user is notified by coloring the distance distribution display histogram and the image, which makes it possible to prevent re-shooting due to cropping failures.
Embodiment 40 described an example of generating a distance distribution display histogram from the distribution of distance information and displaying the histogram such that the positional relationship between the subject and the foreground and background thresholds can be easily recognized. The embodiment also described an example where by displaying the foreground threshold and the background threshold, the user can make adjustments while visually checking the ranges of the thresholds. In addition, Embodiment 41 described an example of adding emphasis to the distance distribution display histogram and the image and presenting these items to the user in order to prevent the subject to be shot from jumping out of the range of the background threshold and having to re-shoot due to a cropping failure.
Incidentally, it is unclear to the user which part of the image has distance information that is 0, and the user cannot fully grasp the relationship between the subject of the image and the distribution of the distance distribution display histogram.
Accordingly, Embodiment 42 will describe an example in which pixels having distance information of 0 are colored in an image and presented to the user along with the distance distribution display histogram.
According to the present embodiment, pixels for which the distance information is 0 can be clearly indicated, which makes it easier for the user to identify to which part of the subject being shot the distance distribution display histogram corresponds.
The processing of step S4801 and step S4804 is the same as step S4001 and step S4004 in Embodiment 40, and will therefore not be described.
In step S4805, the CPU 102 obtains coloring setting information stored in the ROM 103 in advance. The coloring setting information has information of a fourth color with which the pixels having distance information of 0 are to be colored. Note that the storage location of the coloring setting information is not limited to the ROM 103, and may instead be the recording medium 112 or the like. The user may also be able to change the coloring setting information as desired. For example, the user specifies the fourth color by operating a menu using the operation unit 113, and the CPU 102 obtains the coloring setting information from the operation unit 113.
The processing of step S4806 to step S4809 is the same as step S4005 to step S4008 in Embodiment 40, and will therefore not be described.
In step S4810, the CPU 102 obtains an image from the frame memory 111. In step S4811, for the distance information obtained in step S4804, the CPU 102 sets a flag to 1 for pixels for which the distance information is 0, sets the flag to 0 for pixels for which the distance information is not 0, and stores the set flag in the frame memory 111.
In step S4812, the CPU 102 refers to the flag stored in the frame memory 111 in step S4811. For pixels having a flag of 1, the CPU 102 colors the corresponding pixels in the image obtained in step S4810 with the fourth color obtained in step S4805. For pixels having a flag of 0, the CPU 102 uses the pixels of the image obtained in step S4810 as-is. As a result, an image on which the fourth color is partially superimposed is generated.
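The flagging of step S4811 and the coloring of step S4812 correspond to the following sketch (Python, illustration only); the concrete fourth color is a placeholder.

```python
import numpy as np

def mark_in_focus_pixels(image, distance_map, fourth_color=(0, 255, 0)):
    """Color the pixels whose distance information is 0 with the fourth color.

    image is an H x W x 3 array; fourth_color is an illustrative RGB value."""
    flags = distance_map == 0        # step S4811
    out = image.copy()
    out[flags] = fourth_color        # step S4812
    return out

# Example with a synthetic image and distance map.
image = np.random.randint(0, 256, size=(120, 160, 3), dtype=np.uint8)
distance_map = np.random.randint(-128, 128, size=(120, 160))
marked = mark_in_focus_pixels(image, distance_map)
```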
In step S4813, the CPU 102 superimposes the distance distribution display histogram generated in step S4809 onto the image generated in step S4812.
Note that when superimposing the image and the distance distribution display histogram, these items are not limited to being arranged vertically, and another superimposing method may be used as long as the image and the distance distribution display histogram can be checked at the same time. For example, the image and the distance distribution display histogram may be displayed side by side on the left and right, or the distance distribution display histogram may have transparency and be superimposed on part of the image.
In step S4814, the CPU 102 outputs the image generated in step S4813 to the display unit 114, and causes the image to be displayed.
The processing of step S4815 and step S4816 is the same as step S4012 and step S4013 in Embodiment 40, and will therefore not be described.
The processing of step S4817 and step S4818 is the same as step S4014 and step S4015 in Embodiment 40, and will therefore not be described. This makes it possible to display only the shot image in the display unit 114 when the distance distribution display histogram is set to be hidden.
As described above, according to the present embodiment, a subject region for which the distance information is 0 can be clearly indicated in the image of the subject, which makes it easier to identify to which part of the subject being shot the distance distribution display histogram corresponds.
As one embodiment, it is also possible to generate a Trimap using parallax information, a defocus amount, and the like that can be calculated by the CPU 102 based on the information obtained from the image plane phase detection sensor. However, there is an issue in that, in actual shooting, it is not possible to check in real time whether the captured image and the foreground region in the Trimap match. The present embodiment will describe a configuration that addresses this issue by generating and outputting a bird's-eye view image from the distance information and clearly showing, in real time, the image serving as the foreground region.
The bird's-eye view image will be described with reference to
The processing of step S5001 and step S5004 is the same as step S4001 and step S4004 in Embodiment 40, and will therefore not be described.
In step S5005, the CPU 102 determines whether the display setting for the bird's-eye view image is on or off. The display setting of the bird's-eye view image is set by the user by operating the menu using the operation unit 113. If the setting is on, the processing moves to step S5006, whereas if the setting is off, the processing moves to step S5014.
In step S5006, the CPU 102 generates a bird's-eye view image such as that illustrated in
The processing of step S5007 is the same as step S4007 in Embodiment 40, and will therefore not be described.
In step S5008, the CPU 102 superimposes the foreground threshold and the background threshold on the bird's-eye view image.
The processing of step S5009 is the same as step S4009 in Embodiment 40, and will therefore not be described.
In step S5010, the CPU 102 combines the two images, i.e., the bird's-eye view image generated in step S5008 and the image obtained in step S5009, into a single image in which the two are arranged side by side or superimposed. In step S5011, the CPU 102 outputs the image generated in step S5010 to the display unit 114.
The processing of step S5012 and step S5013 is the same as step S4012 and step S4013 in Embodiment 40, and will therefore not be described.
The processing of step S5014 and step S5015 is the same as step S4014 and step S4015 in Embodiment 40, and will therefore not be described.
As described above, according to the present embodiment, the image that will be the foreground region can be clearly indicated in real time by generating and outputting a bird's-eye view image from the distance information.
As described in Embodiment 50, the image that will be the foreground region can be clearly indicated in real time by generating and outputting a bird's-eye view image from the distance information.
On the other hand, with the method described in Embodiment 50, there is an issue in that it is difficult to check in real time whether the subject itself is outside a region of image separation when the subject requires a deep depth of field. The present embodiment will describe a method expected to provide an effect of making it easier to understand parts that are outside the stated region of image separation.
The present embodiment provides a configuration that performs processing on the captured image and the bird's-eye view image described in Embodiment 50, and this processing is expected to provide the stated effect of making such parts easier to understand.
A region 5306 in
The subject 5301 in
In the present embodiment, the CPU 102 performs processing of coloring a part where the implement 5303 overlaps with the region 5308 (i.e., the region 5304) with a predetermined color in each of the captured image and the bird's-eye view image. Additionally, in the present embodiment, the CPU 102 performs processing of coloring a part where the region 5308 and the background 5302 overlap (i.e., a region 5305) with a predetermined color in each of the captured image and the bird's-eye view image.
As described above, according to the present embodiment, an effect can be expected in which parts outside the stated region of image separation are made easier to understand.
As described in Embodiment 50 and Embodiment 51, the image that will be the foreground region can be clearly indicated in real time by generating and outputting a bird's-eye view image from the distance information. However, the method described in Embodiment 50 and Embodiment 51 has an issue in that it is difficult to check in real time whether the subject itself is in focus. The present embodiment will describe a method for checking, in an easy-to-understand manner, whether a region that is in focus, as mentioned above, is equivalent to the subject itself.
The present embodiment provides a configuration that performs processing on the captured image and the bird's-eye view image, and this processing is expected to provide the stated effect of making the in-focus part easier to understand.
In the present embodiment, the CPU 102 performs processing of coloring the corresponding pixel in the image illustrated in
The user can check whether the subject itself is in focus in the image obtained by the image processing apparatus 100 by viewing both a region 5401 and the subject in the image in
As described above, according to the present embodiment, it is possible to check, in an easy-to-understand manner, whether the stated region that is in focus is equivalent to the subject itself.
The image capturing unit 107 of the image processing apparatus 100 can transmit the parallax information of a plurality of pixel ranges of the image signal together, as illustrated in
In a parallax information range A illustrated in
Embodiment 60 will describe an example of using an edge detection result of the image signal to reclassify the pixels in the unknown region into the foreground region, the background region, and the unknown region in finer units than the parallax information range, and generate a second Trimap in which the area of the unknown region is reduced.
In step S6001, the CPU 102 generates a first Trimap by performing the same processing as step S1003 to step S1008 described in Embodiment 10. The CPU 102 records the first Trimap into the frame memory 111.
In step S6002, the CPU 102 performs edge detection by causing the image processing unit 105 to process the image signal read out from the frame memory 111. The edge detection performed by the image processing unit 105, for example, detects positions where luminance changes, color changes, or the like in the image signal are discontinuous, and specifically, the edge detection is realized through the gradient method, the Laplacian method, or the like. The CPU 102 records the edge detection result processed by the image processing unit 105 in the frame memory 111. The image processing unit 105 outputs the edge detection result as a flag, for each pixel in the image signal, indicating whether the pixel corresponds to an edge.
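As a concrete illustration of the edge detection in step S6002, the sketch below derives a per-pixel edge flag from luminance differences (the gradient method); the function name and the threshold `grad_thresh` are assumptions and not values taken from the embodiment.

```python
import numpy as np

def detect_edges(luma, grad_thresh=16):
    """Return a boolean flag per pixel indicating whether the pixel
    corresponds to an edge, using simple finite differences (gradient
    method) on the luminance plane."""
    luma = luma.astype(np.int32)
    # Horizontal and vertical finite differences.
    gx = np.zeros_like(luma)
    gy = np.zeros_like(luma)
    gx[:, 1:] = np.abs(luma[:, 1:] - luma[:, :-1])
    gy[1:, :] = np.abs(luma[1:, :] - luma[:-1, :])
    # A pixel is treated as an edge when the change in luminance is
    # discontinuous enough, i.e. exceeds the (assumed) threshold.
    return (gx > grad_thresh) | (gy > grad_thresh)
```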
In step S6003, the CPU 102 reads out the region, in the first Trimap, that corresponds to the parallax information range to be processed, from the frame memory 111, and determines whether the range is classified as an unknown region. If the parallax information range to be processed is classified as an unknown region, the processing moves to step S6004. However, if the parallax information range to be processed is not classified as an unknown region, the processing moves to step S6016.
In step S6004, the CPU 102 reads out the region, in the edge detection result, that corresponds to the parallax information range to be processed, from the frame memory 111, and determines whether there is a pixel corresponding to an edge within that range. If the parallax information range to be processed contains a pixel that corresponds to an edge, the processing moves to step S6005. However, if the parallax information range to be processed does not contain a pixel that corresponds to an edge, the processing moves to step S6016.
In step S6005, the CPU 102 keeps the pixel corresponding to the edge, in the region of the first Trimap corresponding to the parallax information range to be processed, as the unknown region.
In step S6006, the CPU 102 reads out the region, in the first Trimap, that corresponds to the parallax information range adjacent to the left of the parallax information range to be processed, from the frame memory 111, and determines whether that range is classified as a foreground region. If the parallax information range on the left is classified as a foreground region, the processing moves to step S6007. However, if the parallax information range on the left is not classified as a foreground region, the processing moves to step S6008.
In step S6007, the CPU 102 changes, to the foreground region, the pixel located to the left of the pixel corresponding to an edge in the region of the first Trimap corresponding to the parallax information range to be processed. The CPU 102 records the changed Trimap in the frame memory 111.
In step S6008, the CPU 102 reads out the region, in the first Trimap, that corresponds to the parallax information range adjacent to the left of the parallax information range to be processed, from the frame memory 111, and determines whether that range is classified as a background region. If the parallax information range on the left is classified as a background region, the processing moves to step S6009. However, if the parallax information range on the left is not classified as a background region, the processing moves to step S6010.
In step S6009, the CPU 102 changes, to the background region, the pixel located to the left of the pixel corresponding to an edge in the region of the first Trimap corresponding to the parallax information range to be processed. The CPU 102 records the changed Trimap in the frame memory 111.
In step S6010, the CPU 102 keeps the pixel located to the left of the pixel corresponding to the edge, in the region of the first Trimap corresponding to the parallax information range to be processed, as the unknown region.
In step S6011, the CPU 102 reads out the region, in the first Trimap, that corresponds to the parallax information range adjacent to the right of the parallax information range to be processed, from the frame memory 111, and determines whether that range is classified as a foreground region. If the parallax information range on the right is classified as a foreground region, the processing moves to step S6012. However, if the parallax information range on the right is not classified as a foreground region, the processing moves to step S6013.
In step S6012, the CPU 102 changes, to the foreground region, the pixel located to the right of the pixel corresponding to an edge in the region of the first Trimap corresponding to the parallax information range to be processed. The CPU 102 records the changed Trimap in the frame memory 111.
In step S6013, the CPU 102 reads out the region, in the first Trimap, that corresponds to the parallax information range adjacent to the right of the parallax information range to be processed, from the frame memory 111, and determines whether that range is classified as a background region. If the parallax information range on the right is classified as a background region, the processing moves to step S6014. However, if the parallax information range on the right is not classified as a background region, the processing moves to step S6015.
In step S6014, the CPU 102 changes, to the background region, the pixel located to the right of the pixel corresponding to an edge in the region of the first Trimap corresponding to the parallax information range to be processed. The CPU 102 records the changed Trimap in the frame memory 111.
In step S6015, the CPU 102 keeps the pixel located to the right of the pixel corresponding to the edge, in the region of the first Trimap corresponding to the parallax information range to be processed, as the unknown region.
In step S6016, the CPU 102 determines whether all of the parallax information ranges in the image signal recorded in the frame memory 111 have been processed. If all the parallax information ranges have been processed, the processing moves to step S6018. However, if not all the parallax information ranges have been processed, the processing moves to step S6017.
In step S6017, the CPU 102 selects an unprocessed parallax information range as the next range to be processed. For example, the parallax information range to be processed is selected in raster direction order from the upper-left. The processing then returns to step S6003.
In step S6018, the CPU 102 outputs the Trimap recorded in the frame memory 111 to the exterior through the image terminal 109 or the network terminal 108 as the second Trimap. Note that the CPU 102 may record the second Trimap into the recording medium 112.
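The per-range reclassification of steps S6003 to S6015 can be sketched roughly as follows. It assumes, for illustration only, that each parallax information range maps to a contiguous run of pixels on one row of the first Trimap and that the ranges referred to as "left" and "right" are adjacent on the same row; the constants and function names are likewise assumptions.

```python
import numpy as np

FOREGROUND, BACKGROUND, UNKNOWN = 0, 1, 2

def refine_unknown_range(trimap_row, edge_row, start, end, left_class, right_class):
    """Reclassify one unknown parallax information range [start, end) using
    the edge flags and the classes of the neighbouring ranges, returning the
    refined row at pixel granularity (second-Trimap resolution)."""
    row = np.array(trimap_row, copy=True)
    edges = [x for x in range(start, end) if edge_row[x]]
    if not edges:                              # step S6004: no edge, keep as-is
        return row
    first_edge, last_edge = edges[0], edges[-1]
    # Step S6005: the edge pixels themselves remain the unknown region.
    # Steps S6006-S6010: pixels left of the edge follow the class of the
    # range on the left when that range is foreground or background.
    if left_class in (FOREGROUND, BACKGROUND):
        row[start:first_edge] = left_class
    # Steps S6011-S6015: pixels right of the edge follow the class of the
    # range on the right when that range is foreground or background.
    if right_class in (FOREGROUND, BACKGROUND):
        row[last_edge + 1:end] = right_class
    return row
```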
The pixel that corresponds to the boundary between the background and the subject is determined to correspond to an edge by the edge detection of step S6002, as indicated by the diagonal lines in the edge detection result in
As described above, according to Embodiment 60, by using an edge detection result of the image signal, the pixels in the unknown region can be reclassified into the foreground region, the background region, and the unknown region in finer units than the parallax information range, and a second Trimap in which the area of the unknown region is reduced can be generated. By reducing the area of the unknown region of the Trimap, the detection accuracy of the neural network that uses the Trimap to crop out the foreground and background can be improved.
When a subject such as a human body is shot as far down as the feet, the ground surface near where the feet touch the ground is at about the same distance as the subject's feet, and thus when a Trimap is generated from the distance information, the ground surface will be erroneously determined to be the foreground region.
Embodiment 70 will describe an example in which by detecting a foot part of the subject, a second Trimap is generated in which the ground surface, which was erroneously determined to be a foreground region at the same relative distance as the foot part of the subject, is reclassified as an unknown region or a background region.
In step S7001, the CPU 102 generates a first Trimap by performing the same processing as step S1003 to step S1008 described in Embodiment 10. The CPU 102 records the first Trimap into the frame memory 111.
In step S7002, the CPU 102 detects the feet of the human body by loading parameters for detecting the feet of a human body, recorded in the ROM 103, into the object detection unit 115, and causing the object detection unit 115 to process an image read out from the frame memory 111. The object detection unit 115 records, as part detection information in the RAM 104, two coordinates indicating the vertices of opposing corners of a rectangle encompassing the foot region detected in the image, with the horizontal direction of the image as the x-axis and the vertical direction as the y-axis, and the lower-left corner of the image as the coordinates (0,0).
Although the present embodiment describes a case where the object detection unit 115 is a neural network that outputs coordinates of the detected region, the object detection unit 115 may be another neural network that detects the skeleton of a human body.
In step S7003, the CPU 102 determines whether the part detection information is recorded in the RAM 104. If the part detection information is recorded in the RAM 104, the CPU 102 determines that the feet of the human body have been detected in the image, and the processing moves to step S7004. However, if no part detection information is recorded in the RAM 104, the CPU 102 determines that the feet of the human body have not been detected in the image, and the processing of the flowchart ends.
In step S7004, the CPU 102 reads out the first Trimap recorded in the frame memory 111 and the part detection information recorded in the RAM 104, and changes the inside of the rectangular region in the Trimap, indicated by the part detection information, to an unknown region. The processing performed in step S7004 will be described in detail later with reference to
In step S7005, the CPU 102 changes, to the background region, any region classified in the Trimap as the foreground region or the unknown region whose y coordinate is within the y-coordinate range of the rectangle indicated by the part detection information but whose x coordinate is outside the x-coordinate range of that rectangle. The CPU 102 records the Trimap changed in step S7004 and step S7005 into the frame memory 111. The processing performed in step S7005 will be described in detail later with reference to
In step S7006, the CPU 102 determines whether another instance of part detection information is recorded in the RAM 104. If another instance of part detection information is recorded in the RAM 104, the CPU 102 determines that the feet of another human body have been detected in the image, and the processing moves again to step S7004. If no part detection information is recorded in the RAM 104, the CPU 102 determines that the feet of another human body have not been detected in the image, and the processing moves to step S7007.
In step S7007, the CPU 102 outputs the Trimap recorded in the frame memory 111 to the exterior through the image terminal 109 or the network terminal 108 as the second Trimap. The processing then moves to the ending step. Note that the CPU 102 may record the second Trimap into the recording medium 112.
The processing of step S7004 will be described in detail with reference to
The processing of step S7005 will be described in detail with reference to
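A rough sketch of step S7004 and step S7005 is shown below. It assumes that the Trimap is an array indexed as [y, x] with the same orientation as the part detection information, that the foot rectangle is given as (x0, y0, x1, y1), and that the region constants are illustrative; none of these names come from the embodiment itself.

```python
import numpy as np

FOREGROUND, BACKGROUND, UNKNOWN = 0, 1, 2

def reclassify_around_feet(trimap, rect):
    """Sketch of steps S7004-S7005 for one detected foot rectangle."""
    x0, y0, x1, y1 = rect
    out = trimap.copy()
    # Step S7004: everything inside the rectangle becomes the unknown region.
    out[y0:y1 + 1, x0:x1 + 1] = UNKNOWN
    # Step S7005: rows that share the rectangle's y range but lie outside its
    # x range are forced to the background region where they were foreground
    # or unknown (the ground surface beside the feet).
    band = out[y0:y1 + 1, :]
    mask = (band == FOREGROUND) | (band == UNKNOWN)
    mask[:, x0:x1 + 1] = False          # keep the rectangle itself untouched
    band[mask] = BACKGROUND
    return out
```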
As described above, according to Embodiment 70, a second Trimap can be generated in which the ground surface, which was erroneously determined to be a foreground region at the same relative distance as the foot part of the subject, is reclassified as an unknown region or a background region.
The present embodiment has described an example of using a neural network that, by detecting the feet of a human body, reclassifies the ground surface that is in contact with the feet of the human body as an unknown region or a background region. If the subject is a car, a motorcycle, or the like, for example, the present embodiment can be applied by using a neural network that detects the tires that make contact with the ground surface. Likewise, the present embodiment can be applied for other subjects by using a neural network that detects parts of the other subjects that make contact with the ground surface.
Embodiment 70 described an example of generating a second Trimap in which a ground surface erroneously determined to be a foreground region is reclassified as an unknown region or a background region. However, the range of the ground surface that is erroneously determined to be a foreground region at the same distance as the subject is broader if the image processing apparatus 100 is tilted forward and narrower if the image processing apparatus 100 is tilted backward.
Embodiment 71 will describe an example of changing the range to be reclassified by referring to the tilt of the image processing apparatus 100 using information from an accelerometer for image stabilization built into the lens unit 106 when generating the second Trimap in which a ground surface erroneously determined to be a foreground region is reclassified as an unknown region or a background region.
The processing from step S7101 to step S7104 is the same as the processing from step S7001 to step S7004 described in Embodiment 70, and will therefore not be described here.
In step S7105, the CPU 102 reads out tilt information from the accelerometer of the lens unit 106. The tilt information is a numerical value that indicates whether the image processing apparatus 100 is tilted forward or backward. The CPU 102 determines a background region adjustment value t based on the tilt information. The background region adjustment value t is set to 0 if the image processing apparatus 100 is parallel to the ground surface, increases if the image processing apparatus 100 is tilted forward, and decreases if the image processing apparatus 100 is tilted backward.
In step S7106, the CPU 102 changes, to the background region, any region classified in the Trimap as the foreground region or the unknown region whose y coordinate is within the y-coordinate range of the rectangle indicated by the part detection information after that range has been extended upward and downward by the background region adjustment value t, but whose x coordinate is outside the x-coordinate range of the rectangle. The CPU 102 records the Trimap changed in step S7104 and step S7106 into the frame memory 111. The processing performed in step S7106 will be described in detail later with reference to
The processing from step S7107 to step S7108 is the same as the processing from step S7006 to step S7007 described in Embodiment 70, and will therefore not be described here.
The processing of step S7106 will be described in detail with reference to
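Under the same assumptions as the previous sketch, step S7106 differs only in that the y-coordinate band of the rectangle is first extended by the background region adjustment value t (a negative t, corresponding to a backward tilt, narrows the band).

```python
import numpy as np

FOREGROUND, BACKGROUND, UNKNOWN = 0, 1, 2

def reclassify_with_tilt(trimap, rect, t):
    """Sketch of step S7106: like step S7005, but the y band of the foot
    rectangle is extended above and below by the adjustment value t before
    the outside-x region is forced to the background region."""
    x0, y0, x1, y1 = rect
    h = trimap.shape[0]
    y_lo = max(0, y0 - t)               # band extended (or narrowed) by t
    y_hi = min(h - 1, y1 + t)
    out = trimap.copy()
    out[y0:y1 + 1, x0:x1 + 1] = UNKNOWN          # step S7104 (as in S7004)
    band = out[y_lo:y_hi + 1, :]
    mask = (band == FOREGROUND) | (band == UNKNOWN)
    mask[:, x0:x1 + 1] = False
    band[mask] = BACKGROUND
    return out
```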
As described above, according to Embodiment 71, the range to be reclassified to the background region can be changed by referring to the tilt of the image processing apparatus 100 using information from an accelerometer for image stabilization built into the lens unit 106 when generating the second Trimap in which a ground surface erroneously determined to be a foreground region is reclassified as a background region.
As one embodiment, it is also possible to generate a Trimap using parallax information, a defocus amount, and the like that can be calculated by the CPU 102 based on the information obtained from the image plane phase detection sensor. In a situation where the aperture of the lens is changed during shooting, there is an issue in that the parallax information for each frame at the boundary between the foreground region and the background region also changes, resulting in a change in the boundary of the unknown region. The present embodiment will describe a configuration that addresses this issue.
A function through which the image processing apparatus 100 generates a Trimap based on parallax information will be described with reference to
The processing of step S8001 and step S8002 is the same as step S4001 and step S4004 in Embodiment 40, and will therefore not be described.
In step S8003, the image processing apparatus 100 (the CPU 102) generates the Trimap by performing the same processing as step S1003 to step S1008 described in Embodiment 10.
In step S8004, the image processing apparatus 100 determines whether the depth of field has been changed based on an amount of change in the F value. Note that the F value used in the determination of step S8004 may be replaced by a variable that makes it possible to calculate the focal length and the amount of light entering the lens unit 106. For example, the image processing apparatus 100 may perform a frame-by-frame comparison of an amount of change due to a T value or an H value, which are indicators calculated from the transmittance of the optical system. If there is a change in the F value, the processing moves to step S8006, whereas if there is no change in the F value, the processing moves to step S8008.
In step S8006, the image processing apparatus 100 refers to a table that defines a relationship between the F value and the threshold. This table is assumed to be stored in the image processing apparatus 100 (e.g., in the ROM 103).
In step S8007, the image processing apparatus 100 sets new thresholds (the foreground threshold and the background threshold) in the RAM 104 based on the table referenced in step S8006 and the current (post-change) F value.
In step S8008, the image processing apparatus 100 stores the thresholds (the foreground threshold and the background threshold) in association with the next frame.
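A minimal sketch of the threshold update in steps S8004 and S8006 to S8008 follows; the table contents and the nearest-entry lookup are assumptions for illustration, since the embodiment only states that a table relating the F value to the thresholds is held in the ROM 103.

```python
# F value: (foreground_threshold, background_threshold); assumed example values.
F_VALUE_THRESHOLD_TABLE = {
    2.0: (10, 40),
    4.0: (20, 60),
    8.0: (35, 90),
}

def thresholds_for_f_value(f_value):
    """Steps S8006-S8007: look up the nearest table entry for the F value."""
    nearest_f = min(F_VALUE_THRESHOLD_TABLE, key=lambda f: abs(f - f_value))
    return F_VALUE_THRESHOLD_TABLE[nearest_f]

def update_thresholds(prev_f, cur_f, prev_thresholds):
    """Steps S8004 and S8008: change the thresholds only when the F value
    changed; otherwise carry the previous thresholds over to the next frame."""
    if cur_f == prev_f:
        return prev_thresholds
    return thresholds_for_f_value(cur_f)
```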
The image processing apparatus realizes optimal image separation for each frame by repeating the processing from step S8001 to step S8008 each time a frame is obtained.
Note that a configuration may be employed in which the processing of step S8008 is performed only when, for example, the depth of field is changed, rather than for all consecutive frame images constituting a moving image. A method in which the processing of step S8004 to step S8008 is performed for every set number of frames, instead of for all consecutive frame images constituting a moving image, may also be employed.
Embodiment 80 realizes optimal image separation on a frame-by-frame basis when there is a change in the F value. An example of this is illustrated in
In the configuration of the present embodiment, under a condition that the entire subject 811 in
As described above, according to Embodiment 80, an effect can be expected in which the boundaries of the foreground region, the background region, and the unknown region can be appropriately identified even when the F value is changed by the aperture of the lens.
As one embodiment, it is also possible to generate a Trimap using parallax information, a defocus amount, and the like that can be calculated by the CPU 102 based on the information obtained from the image plane phase detection sensor.
The obtainment of the parallax information will be described first with reference to
Then, as illustrated in
In the present embodiment, a Trimap is generated by using the detected shift in the positions of the pixels between the A image signal and the B image signal. Based on the concepts of
The above processing will be described with reference to the flowchart in
First, in step S9001, the user shoots an image of a desired subject using the image processing apparatus 100. The image of the subject is received by the image capturing unit 107. In step S9002, the CPU 102 obtains image plane phase difference information from the image capturing unit 107 and detects a positional shift between the A image signal and the B image signal. The CPU 102 generates focus information from that information. In step S9003, if the CPU 102 determines that the positional shift between the A image signal and the B image signal for a given pixel of interest is small and the region is the in-focus region, the processing moves to step S9004, and that pixel is determined to be in the foreground region. On the other hand, if, in step S9005, the CPU 102 determines that the positional shift is large and the image is in a front focus state, the processing moves to step S9006, and that pixel is determined to be in the foreground region. This is because an object in front of the in-focus region is often the subject that the user desires, and is therefore kept as the foreground region. If, in step S9007, the CPU 102 determines that the positional shift between the A image signal and the B image signal for a given pixel of interest is large and the pixel is in a rear focus state, the processing moves to step S9008, and that pixel is determined to be in the background region. Furthermore, if the pixel is neither in the in-focus region, nor in the front focus region, nor in the rear focus region, the CPU 102 moves the processing to step S9009 and determines that the pixel is in the unknown region. In this example, the in-focus region and the front focus region are both foreground regions, and there is therefore no need to create an unknown region between them.
In step S9010, the CPU 102 temporarily stores the result of this processing in the frame memory 111. In step S9011, the CPU 102 determines whether the processing is complete for all pixels of the image capturing unit 107. If so, the processing moves to step S9012, the image is read out from the frame memory 111, the Trimap image is generated, and these items are output to the display unit 114 and the like.
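The per-pixel classification of steps S9003 to S9009 can be sketched as below; the sign convention for the defocus value and both threshold values are assumptions, not values from the embodiment.

```python
import numpy as np

FOREGROUND, BACKGROUND, UNKNOWN = 0, 1, 2

def classify_from_defocus(defocus, in_focus_thresh=1.0, rear_thresh=8.0):
    """Sketch of steps S9003-S9009.  `defocus` holds the signed A/B image
    shift per pixel (negative = front focus, positive = rear focus)."""
    trimap = np.full(defocus.shape, UNKNOWN, dtype=np.uint8)
    # In-focus pixels and front-focus pixels become the foreground region;
    # no unknown band is needed between them.
    trimap[defocus <= in_focus_thresh] = FOREGROUND
    # Strongly rear-focused pixels become the background region.
    trimap[defocus > rear_thresh] = BACKGROUND
    # Everything in between (moderate rear focus) stays the unknown region.
    return trimap
```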
As described above, the Trimap image can be generated using the focus information and the defocus amount that can be detected from the shift between the A image signal and the B image signal.
In Embodiment 90, the Trimap image was generated using the defocus amount, which is focus information. Embodiment 91 will describe a method for generating a Trimap image with even higher accuracy.
Furthermore, as an adjustment function, it may be possible to freely change the threshold setting of the boundary, and different adjustment resolutions can be provided for the front focus region and the rear focus region. This is illustrated in
The above processing will be described with reference to the flowchart in
In step S9102, the CPU 102 sets a zero point, which is the center in the in-focus region. This is a midpoint between the front focus region and the rear focus region, and the boundary separation processing is performed starting from this zero point.
In step S9103, the CPU 102 sets the adjustment resolution for the front focus region. In step S9104, the CPU 102 sets the adjustment resolution for the rear focus region. These adjustment resolutions are set based on the lens information of the lens unit 106 mounted as described earlier, and are set independently for each region.
In step S9105, when the user wishes to change the boundary threshold and starts operations using the operation unit 113, the CPU 102 displays, in the display unit 114, a screen pertaining to which region to set.
In step S9106, if the user selects the front focus region, the processing moves to step S9107, where the user can change the boundary threshold of the front focus region. On the other hand, if the user selects the rear focus region, the processing moves to step S9108, where the user can change the boundary threshold of the rear focus region.
In step S9109, the CPU 102 applies the boundary threshold that has been set. In step S9110, the CPU 102 displays the boundary threshold that has been set in the display unit 114 or the like to inform the user that the setting is complete. In step S9111, when the user completes the setting operation, the processing of this flowchart ends.
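A compact sketch of the Embodiment 91 settings follows: a zero point at the center of the in-focus region, with boundary thresholds that move in steps of independent front and rear adjustment resolutions. All field names and the numeric defaults are assumptions.

```python
from dataclasses import dataclass

@dataclass
class BoundarySettings:
    """Illustrative settings for Embodiment 91 (assumed names and values)."""
    zero_point: float = 0.0        # step S9102: center of the in-focus region
    front_resolution: float = 0.5  # step S9103: front-focus adjustment step
    rear_resolution: float = 1.0   # step S9104: rear-focus adjustment step
    front_steps: int = 2           # user-selected number of steps (step S9107)
    rear_steps: int = 4            # user-selected number of steps (step S9108)

    def front_threshold(self):
        # The front boundary sits `front_steps` resolution units in front of
        # the zero point.
        return self.zero_point - self.front_steps * self.front_resolution

    def rear_threshold(self):
        # The rear boundary sits `rear_steps` resolution units behind it.
        return self.zero_point + self.rear_steps * self.rear_resolution
```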
As described above, by having the user set a desired boundary threshold in the front focus region and the rear focus region and making the adjustment resolution of the threshold selective, an optimal Trimap image for the shooting state can be generated.
Note that the aforementioned adjustment resolution may be determined not only from model information of the lens, but also by holding a plurality of instances of information in the ROM 103 in advance as a table or the like and having the CPU 102 load that information into the RAM 104 or the like. Alternatively, the user may be allowed to set a desired adjustment resolution. It is also possible to flexibly change the adjustment resolution according to the state of the lens, such as the opening and closing state of the aperture, the operation speed of the focus lens, or the like. In addition, although the foregoing descriptions focused specifically on the front focus region and the rear focus region, the embodiment can also be implemented by adding the intermediate region (the unknown region).
Embodiment A0
When shooting a plurality of subjects, it may be necessary to have the plurality of subjects recognized as the foreground region of the Trimap. However, in the foregoing embodiments, it is possible that some of the subjects will be recognized as the background region when the distance between the subjects in the depth direction is too great. In light of this problem, the present embodiment will describe processing for generating a Trimap with all subjects set as the foreground region, even when there are a plurality of subjects.
In the present embodiment, the image processing apparatus 100 illustrated in
After this, the CPU 102 performs pattern matching with respect to the detected edge components, and extracts candidate groups for the eyes, the nose, the mouth, and the ears. Then, from the extracted eye candidate groups, the CPU 102 determines eye pairs that meet preset conditions (e.g., the distance between the two eyes, tilt, and the like) and narrows down the eye candidate groups to only groups having eye pairs. The CPU 102 then detects the face by associating the narrowed-down eye candidate groups with the other parts that form the corresponding face (the nose, mouth, and ears), and passing the image through a pre-set non-face condition filter. The CPU 102 outputs face information according to the face detection results and ends the processing. At this time, the CPU 102 stores features such as the number of faces in the RAM 104.
The Trimap generation processing according to Embodiment A0 will be described next with reference to the flowcharts in
In step SA003, the CPU 102 sets an internal variable N to 1 and sets an internal variable M to 1. In step SA004, the CPU 102 obtains the coordinates of an Nth face region from the image processing unit 105. In step SA005, the CPU 102 calculates an average defocus amount in the face region identified by the coordinates obtained in step SA004. In step SA006, the CPU 102 determines whether the average defocus amount calculated in step SA005 is less than or equal to a threshold. In other words, it is determined whether the average defocus amount in the face region is less than or equal to the threshold and the image is not blurred. If the average defocus amount is determined to be less than or equal to the threshold, the processing moves to step SA007, and if not, the processing moves to step SA013.
In step SA007, the CPU 102 sets parameters of a threshold for generating a Trimap according to the average defocus amount. The threshold here is a threshold for determining the foreground region, the background region, and the unknown region. In step SA008, the CPU 102 calculates an average relative distance in the face region identified by the coordinates obtained in step SA004.
In step SA009, the CPU 102 subtracts the average relative distance calculated in step SA008 from a relative distance of each pixel in a DepthMap (e.g., the distance information obtained by the process of step S1003 in
On the other hand, if it is determined in step SA006 that the average defocus amount is greater than the threshold, in step SA013, the CPU 102 decrements the value of the internal variable M by 1.
Following the processing of step SA010 or step SA013, in step SA011, the CPU 102 determines whether there are any unprocessed face regions. In other words, if the number of face regions obtained in step SA001 matches the internal variable N, the CPU 102 determines that there are no unprocessed face regions. If there is an unprocessed face region, the processing moves to step SA012. In step SA012, the CPU 102 increments the value of the internal variable N by 1, increments the value of the internal variable M by 1, and returns the processing to step SA004.
On the other hand, if it is determined that there are no unprocessed face regions in step SA011, in step SA014, the CPU 102 determines whether the internal variable M is 0. M = 0 means that there is no face region for which the average defocus amount was determined to be less than or equal to the threshold in step SA006. This is a case in which there is no need to generate a new DepthMap. If the internal variable M is determined not to be 0 in step SA014, the processing moves to step SA015.
In step SA015, the CPU 102 composites the M Trimaps generated in step SA010. This compositing is processing for generating a single Trimap by taking the logical OR of the regions determined to be the foreground region and the unknown region.
On the other hand, if the internal variable M is determined to be 0 in step SA014, or if it is determined that there is no face region in step SA002, in step SA016, the CPU 102 generates a Trimap based on the DepthMap.
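The per-face processing of steps SA008 to SA010 and the compositing of step SA015 can be sketched as follows. The symmetric |distance| classification, the foreground-over-unknown priority when compositing, and all function and constant names are assumptions made for illustration.

```python
import numpy as np

FOREGROUND, BACKGROUND, UNKNOWN = 0, 1, 2

def trimap_for_face(depth_map, face_rect, fg_thresh, bg_thresh):
    """Steps SA008-SA010 for one face: re-center the DepthMap on the average
    relative distance inside the face rectangle and classify each pixel."""
    x0, y0, x1, y1 = face_rect                            # face region coordinates
    face_avg = depth_map[y0:y1 + 1, x0:x1 + 1].mean()     # step SA008
    new_depth = np.abs(depth_map - face_avg)              # step SA009
    trimap = np.full(depth_map.shape, UNKNOWN, dtype=np.uint8)   # step SA010
    trimap[new_depth <= fg_thresh] = FOREGROUND
    trimap[new_depth > bg_thresh] = BACKGROUND
    return trimap

def composite_trimaps(trimaps):
    """Step SA015: merge the M per-face Trimaps by taking the logical OR of
    their foreground and unknown regions; the remainder stays background."""
    out = np.full(trimaps[0].shape, BACKGROUND, dtype=np.uint8)
    for tm in trimaps:
        out[tm == UNKNOWN] = UNKNOWN
    for tm in trimaps:
        out[tm == FOREGROUND] = FOREGROUND     # foreground takes priority
    return out
```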
As described above, according to Embodiment A0, a Trimap that takes each subject as a foreground region can be generated when there are a plurality of subjects in the image.
In Embodiment A0, there is a problem in that the processing for generating the same number of Trimaps as there are detected subjects takes a long time. In light of this problem, the present embodiment will describe processing for generating a Trimap with all subjects set as the foreground region, without generating a plurality of Trimaps, even when there are a plurality of subjects.
The Trimap generation processing according to Embodiment A1 will be described next with reference to the flowcharts in
First, the processing of step SA001 to step SA008 is the same as in
In step SA101, the CPU 102 stores the average calculated in step SA008 in the RAM 104 as an average of the Mth relative distance. The following processes from step SA011 to step SA014 are the same as in
Next, in step SA102, the CPU 102 calculates an average D of the averages of M relative distances stored in the RAM 104. In step SA103, the CPU 102 generates a new DepthMap by subtracting the average D calculated in step SA102 from the relative distance of each pixel. In step SA104, the CPU 102 sets parameters for the threshold of the unknown region determination processing according to the average of the M relative distances stored in the RAM 104 and the average D calculated in step SA102. In step SA105, the CPU 102 generates a Trimap based on the new DepthMap.
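Under the same assumptions as the previous sketch, Embodiment A1 re-centers the DepthMap only once, on the average D of the per-face averages, and generates a single Trimap (steps SA102 to SA105).

```python
import numpy as np

FOREGROUND, BACKGROUND, UNKNOWN = 0, 1, 2

def trimap_from_face_averages(depth_map, face_averages, fg_thresh, bg_thresh):
    """Steps SA102-SA105: one Trimap from a DepthMap re-centered on the
    average D of the M per-face average relative distances."""
    d = float(np.mean(face_averages))                 # step SA102: average D
    new_depth = np.abs(depth_map - d)                 # step SA103: new DepthMap
    trimap = np.full(depth_map.shape, UNKNOWN, dtype=np.uint8)   # step SA105
    trimap[new_depth <= fg_thresh] = FOREGROUND       # thresholds from step SA104
    trimap[new_depth > bg_thresh] = BACKGROUND
    return trimap
```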
As described above, according to Embodiment A1, when there are a plurality of subjects in the image, a Trimap that takes each subject as a foreground region can be generated.
Embodiment A1 has a problem in that when there is some object between subjects, what should originally be the background region is recognized as the foreground region. In light of this problem, the present embodiment will describe processing for generating a Trimap in which, even when there are a plurality of subjects, a part that should be taken as a background region is set as a background region when an object is present between the subjects.
The Trimap generation processing according to Embodiment A2 will be described next with reference to the flowcharts in
First, the order of the flow from step SA001 to step SA008 is the same as in
Next, in step SA202, the CPU 102 sets the M thresholds stored in the RAM 104 and the average of the relative distances as parameters for the threshold. In step SA203, the CPU 102 generates a Trimap using the DepthMap and the parameters set in step SA202. The processing performed in step SA203 will be described in detail later with reference to
Next, the processing of step SA203 will be described in detail with reference to the flowchart shown in
Next, in step SA303, the CPU 102 sets the parameters of an Ith threshold. In step SA304, the CPU 102 determines whether the Trimap data in the process of being generated is data classified as a foreground region. If it is determined that the data is not classified as a foreground region, the processing moves to step SA305.
In step SA305, the CPU 102 determines whether the distance information to the subject is within the range of the foreground threshold determined in step SA303. If this information is determined to be within the range of the foreground threshold, the processing moves to step SA306. In step SA306, the CPU 102 classifies a region for which the distance information is determined to be within the range of the foreground threshold in step SA305 as a foreground region, and performs processing for replacing the Trimap data of that region with the foreground region data.
On the other hand, if the information is determined to be outside the range of the foreground threshold in step SA305, the processing moves to step SA307. In step SA307, the CPU 102 determines whether the Trimap data in the process of being generated is data classified as an unknown region. If it is determined that the data is not classified as an unknown region, the processing moves to step SA308.
In step SA308, the CPU 102 determines whether the distance information to the subject is outside the range of the background threshold determined in step SA303. If the information is determined to be outside the range of the background threshold, the processing moves to step SA309. In step SA309, the CPU 102 classifies a region for which the distance information is determined to be outside the range of the background threshold in step SA308 as a background region, and performs processing for replacing the Trimap data of that region with the background region data.
On the other hand, if the information is determined to be within the range of the background threshold in step SA308, the processing moves to step SA310. In step SA310, the CPU 102 classifies a region for which the distance information is determined to be within the range of the background threshold in step SA308 as an unknown region, and performs processing for replacing the Trimap data of that region with the unknown region data.
On the other hand, if it is determined that the data is classified as an unknown region in step SA307, the processing moves to step SA311. Additionally, if it is determined that the Trimap data is classified as a foreground region in step SA304, the processing moves to step SA311.
In step SA311, the CPU 102 increments the value of the internal variable I by 1, and returns the processing to step SA302.
On the other hand, if it is determined that there are no unprocessed parameters in step SA302, the processing of this flowchart ends.
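The loop of steps SA302 to SA311 can be sketched as follows. The parameter layout (one average and one threshold pair per subject), the symmetric |distance| test, and the all-background initial state are assumptions for illustration.

```python
import numpy as np

FOREGROUND, BACKGROUND, UNKNOWN = 0, 1, 2

def trimap_multi_threshold(depth_map, per_subject_params):
    """Apply each subject's thresholds in turn (steps SA302-SA311): a pixel
    already classified as foreground is never demoted (step SA304), and a
    pixel already marked unknown is never pushed back to background by a
    later subject's background test (step SA307)."""
    trimap = np.full(depth_map.shape, BACKGROUND, dtype=np.uint8)
    for subject_avg, fg_range, bg_range in per_subject_params:
        dist = np.abs(depth_map - subject_avg)
        not_fg = trimap != FOREGROUND                              # step SA304
        trimap[not_fg & (dist <= fg_range)] = FOREGROUND           # steps SA305-SA306
        undecided = (trimap != FOREGROUND) & (trimap != UNKNOWN)   # step SA307
        outside_bg = dist > bg_range                               # step SA308
        trimap[undecided & outside_bg] = BACKGROUND                # step SA309
        trimap[undecided & ~outside_bg] = UNKNOWN                  # step SA310
    return trimap
```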
As described above, according to Embodiment A2, when there are a plurality of subjects in the image and an object is present between the subjects, the object can be taken as a background region, and a Trimap can be generated with only the subject as the foreground region.
The present embodiment will describe an example in which, when a plurality of subjects located at the same distance are shot, a Trimap that displays only a predetermined subject is generated by changing the distance information outside a selected region. The “predetermined subject” refers to a subject which the user wishes to display as a Trimap, and will be called a “subject of interest”.
In step SB101, the CPU 102 controls the object detection unit 115 to detect a subject in the image processed by the image processing unit 105. In the present embodiment, the processing for detecting a subject, performed by the object detection unit 115, outputs coordinate data as the processing result, and is, for example, deep learning using a neural network such as Single Shot MultiBox Detector (SSD) or You Only Look Once (YOLO). Based on the coordinate data obtained from the object detection unit 115, the CPU 102 superimposes a detection region, which indicates the region of the detected subject, onto the image processed by the image processing unit 105, and displays the resulting image in the display unit 114.
In step SB102, the user selects a detection region. Various selection methods may be employed here. For example, the user may select the detection region using a directional key of the operation unit 113 or the like. If the display unit 114 is a touch panel, a method in which the user makes the selection by directly touching a displayed detection region may be employed. Note that the number of selections is not limited to one. Based on the result of the selection made by the user, the CPU 102 superimposes the selected region, which indicates the detection region of the subject of interest, on the image processed in step SB101, and displays the resulting image in the display unit 114. The selected region is displayed using a bolder frame than the detection region, for example.
In step SB104, the CPU 102 determines, for each pixel of the image, whether the pixel is in the selected region. Specifically, the CPU 102 determines the coordinate positions of the selected region based on the coordinate data obtained from the object detection unit 115, and if the coordinate position of each pixel is within the range of the coordinate positions of the selected region, determines that that pixel is in the selected region. If the pixel is in the selected region, the processing moves to step SB103, and if not, the processing moves to step SB105.
In step SB105, the CPU 102 determines, for each pixel of the image, whether the pixel is in the background region. The classification of the foreground region, the background region, and the unknown region uses the same processing as that described in Embodiment 10, and will therefore not be described here. If the pixel is in the background region, the processing moves to step SB103, and if not, the processing moves to step SB106.
In step SB106, the CPU 102 adds a predetermined offset value to the distance information (relative distance) corresponding to a pixel outside the selected region. The offset value is a value large enough that the pixel is determined to be in the background region after the addition. Specifically, for example, if the range of the distance information is 0 to 255 and the range of 127 to 255 is determined to be the background region, providing 255 as the offset value causes all pixels outside the selected region to be determined to be in the background region. Note that when adding the offset value to the distance information, the result is clamped at a value of 255 to prevent overflow.
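A minimal sketch of the offset addition in step SB106 follows; `selected_mask` is an assumed boolean array marking pixels inside the selected region, and the already-background check of step SB105 is omitted because adding the offset to such pixels leaves them in the background range anyway.

```python
import numpy as np

def push_outside_selection_to_background(distance_map, selected_mask,
                                         offset=255, max_value=255):
    """Step SB106: add the offset to the distance information of every pixel
    outside the selected region, clamping at 255 to prevent overflow."""
    out = distance_map.astype(np.int32)
    out[~selected_mask] += offset
    return np.clip(out, 0, max_value).astype(distance_map.dtype)
```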
In step SB103, the CPU 102 generates the Trimap by performing the same processing as step S1003 to step S1008 described in Embodiment 10. The CPU 102 loads the generated Trimap into the frame memory 111, and outputs the Trimap to the display unit 114, the image terminal 109, or the network terminal 108. Note that the CPU 102 may record the Trimap into the recording medium 112.
As described above, according to the present embodiment, when shooting a plurality of subjects located at the same distance, a Trimap can be generated in which subjects aside from a subject of interest are not included in the foreground region, and only the subject of interest is displayed.
An example of generating a Trimap that displays only a subject of interest by changing the distance information outside the selected region was described with reference to
The present embodiment will describe an example in which, when a plurality of subjects located at the same distance are shot, a Trimap that displays only a subject of interest is generated by changing the color data of the Trimap outside a selected region.
In step SB204, the CPU 102 determines, for each pixel of the Trimap, whether the pixel is in the selected region. The determination processing is the same as the processing of step SB104 in
In step SB205, the CPU 102 determines, for each pixel of the Trimap, whether the pixel is in the background region. The classification of the foreground region, the background region, and the unknown region uses the same processing as that described in Embodiment 10, and will therefore not be described here. If the pixel is in the background region, the CPU 102 ends the processing of this flowchart, and if not, the CPU 102 moves the processing to step SB206.
In step SB206, the CPU 102 fills the color data of each pixel outside the selected region with a predetermined color corresponding to the background region. Specifically, for example, if the color corresponding to the background region is black, the CPU 102 fills the color data of the pixels outside the selected region with black.
The CPU 102 loads the processed Trimap into the frame memory 111, and outputs the Trimap to the display unit 114, the image terminal 109, or the network terminal 108. Note that the CPU 102 may record the Trimap into the recording medium 112.
As described above, according to the present embodiment, a Trimap that displays only the subject of interest can be generated without changing the distance information.
An example of generating a Trimap that displays only a subject of interest by changing the color data of the Trimap outside the selected region was described with reference to
The present embodiment will describe an example in which, when a plurality of subjects located at the same distance are shot, a Trimap that displays only a subject of interest is generated by changing the color data of the Trimap within a selected region.
In step SB304, the CPU 102 determines, for each pixel of the Trimap, whether the pixel is in the selected region. The determination method is the same as the processing of step SB104 in
In step SB305, the CPU 102 determines, for each pixel of the Trimap, whether the pixel is in the background region. The classification of the foreground region, the background region, and the unknown region uses the same processing as that described in Embodiment 10, and will therefore not be described here. If the pixel is in the background region, the CPU 102 ends the processing of this flowchart, and if not, the CPU 102 moves the processing to step SB306.
In step SB306, the CPU 102 fills the color data of each pixel within the selected region with a predetermined color corresponding to the background region. Note that the details of this processing are the same as step SB206 in
The CPU 102 loads the processed Trimap into the frame memory 111, and outputs the Trimap to the display unit 114, the image terminal 109, or the network terminal 108. Note that the CPU 102 may record the Trimap into the recording medium 112.
As described above, according to the present embodiment, a Trimap that displays only the subject of interest can be generated without displaying anything outside the selected region.
Outputting using Serial Digital Interface (SDI) is one method for outputting the generated Trimap to the exterior. As a method for superimposing the Trimap data on SDI, it is conceivable to convert the data into ancillary packets and multiplex those packets with an ancillary data region. However, trying to pack the Trimap data efficiently when generating the data may produce prohibited codes. In light of the above problem, the present embodiment will describe processing for mapping the data such that prohibited codes are not produced.
Stream generation processing according to Embodiment C0 will be described next with reference to the flowcharts in
In step SC002, the CPU 102 packs the Trimap data into data in which one word has 10 bits. The packing processing will be described in detail later. In step SC003, the CPU 102 generates a Y ancillary packet to be multiplexed with the Y data stream. In step SC004, the CPU 102 generates a C ancillary packet to be multiplexed with the C data stream. The processing for generating the Y ancillary packet and the C ancillary packet will be described in detail later. In step SC005, the CPU 102 multiplexes the Y ancillary packet and the C ancillary packet with the data stream. The ancillary packet multiplexing processing will be described in detail later. The processing in the flowchart in
Processing for packing the Trimap data into data having 10 bits for one word will be described next with reference to the flowcharts in
In step SC105, the CPU 102 determines whether the Trimap data of a Pth pixel is white data. In other words, the CPU 102 determines whether the Trimap data is 0x00. If the Trimap data is determined to be white data in step SC105, the processing moves to step SC106, and if not, the processing moves to step SC109.
In step SC106, the CPU 102 determines whether the value of the internal variable P is an even number. If the value is determined to be an even number, the processing moves to step SC107. In step SC107, the CPU 102 sets the white data to 0x00.
On the other hand, if the internal variable P is determined not to be an even number in step SC106, the processing moves to step SC108. In step SC108, the CPU 102 sets the white data to 0x11.
In step SC109, the CPU 102 assigns the Trimap data to bits I and I+1 of the Wth word.
In step SC110, the CPU 102 determines whether the internal variable I is 8. If the internal variable I is determined to be 8, the processing moves to step SC111. In step SC111, the CPU 102 sets the internal variable I to 0. In step SC112, the CPU 102 increments the internal variable W by 1.
On the other hand, if the internal variable I is determined not to be 8 in step SC110, the processing moves to step SC113. In step SC113, the CPU 102 increments the internal variable I by 2.
Next, in step SC114, the CPU 102 determines whether the current pixel (the Pth pixel) is the final pixel. In other words, the number of pixels in the valid image is 1,920, and thus the CPU 102 determines whether the internal variable P is 1919. If it is determined in step SC114 that the pixel is not the final pixel, the processing moves to step SC115. In step SC115, the CPU 102 increments the value of the internal variable P by 1, and returns the processing to step SC105.
On the other hand, if it is determined in step SC114 that the pixel is the final pixel, the processing moves to step SC116. In step SC116, the CPU 102 stores the one line's worth of word data in which the Trimap data is packed in the RAM 104. In step SC117, the CPU 102 determines whether the current line (an Lth line) is the final line. For example, for a progressive image, the number of valid image lines is 1,080, and thus the CPU 102 determines whether the internal variable L is 1,080. If it is determined that the line is not the final line, the processing moves to step SC118. In step SC118, the CPU 102 increments the value of the internal variable L by 1, and returns the processing to step SC102.
On the other hand, if the line is determined to be the final line in step SC117, the processing of this flowchart ends.
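The packing loop of steps SC101 to SC116 can be sketched for one line as follows. White pixels are assumed to carry the 2-bit code 0b00, and the even/odd substitution of steps SC105 to SC108 is interpreted here as alternating between the patterns 0b00 and 0b11 so that a word made up of consecutive white pixels never packs to an all-zero value, which would collide with a prohibited SDI code.

```python
def pack_trimap_line(pixels):
    """Pack one line of 2-bit Trimap codes into 10-bit words, five pixels
    per word (sketch of steps SC101-SC116)."""
    WHITE = 0b00
    words, word, bit = [], 0, 0
    for p, code in enumerate(pixels):
        if code == WHITE and p % 2 == 1:      # steps SC105-SC108
            code = 0b11
        word |= (code & 0b11) << bit          # step SC109
        if bit == 8:                          # steps SC110-SC112: word is full
            words.append(word)
            word, bit = 0, 0
        else:                                 # step SC113
            bit += 2
    if bit != 0:                              # flush a partially filled word
        words.append(word)
    return words
```

For a 1,920-pixel line this produces 384 ten-bit words, matching the word count assumed in step SC209.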
The processing for generating the ancillary packet will be described next with reference to the flowcharts in
In
Details of Status are illustrated in
In
First, in step SC201, the CPU 102 sets the internal variable L to 1. In step SC202, the CPU 102 sets the internal variable W to 0. In step SC203, the CPU 102 multiplexes the Ancillary Data Flag (ADF). In step SC204, the CPU 102 multiplexes the Data ID (DID). In step SC205, the CPU 102 multiplexes the Secondary Data ID (SDID). In step SC206, the CPU 102 multiplexes the Data Count (DC). In step SC207, the CPU 102 multiplexes the Line Number (LN). In step SC208, the CPU 102 multiplexes the Status.
In step SC209, the CPU 102 determines whether the word in which the Trimap data is packed is the final word. For example, if 5 pixels are packed per word, the number of words is 384. In other words, the CPU 102 determines whether the internal variable W is 384. If it is determined in step SC209 that the word is not the final word, the processing moves to step SC210. In step SC210, the CPU 102 determines whether to generate a Y ancillary. If it is determined that the Y ancillary is to be generated, the processing moves to step SC211. In step SC211, the CPU 102 reads out the data of the Wth word of the Lth line from the RAM 104 and multiplexes that data.
On the other hand, if it is determined in step SC210 that the Y ancillary is not to be generated (i.e., that a C ancillary is to be generated), the processing moves to step SC212. In step SC212, the CPU 102 multiplexes the data of the W+1-th word of the Lth line.
In step SC213, the CPU 102 increments the value of the internal variable W by 2, and returns the processing to step SC209.
On the other hand, if it is determined in step SC209 that the word is the final word, the processing moves to step SC214. In step SC214, the CPU 102 multiplexes the CS. In step SC215, the CPU 102 stores the generated ancillary packet in the RAM 104.
In step SC216, the CPU 102 determines whether the current line (i.e., the Lth line) is the final line. For example, for a progressive image, the number of valid image lines is 1,080, and thus the CPU 102 determines whether the internal variable L is 1,080. If it is determined that the line is not the final line, the processing moves to step SC217. In step SC217, the CPU 102 increments the value of the internal variable L by 1, and returns the processing to step SC202.
On the other hand, if the line is determined to be the final line in step SC216, the processing of this flowchart ends.
The processing for multiplexing the ancillary packets will be described next with reference to the flowchart in
In step SC303, the CPU 102 determines whether the Pth pixel is a position where an ancillary packet is multiplexed. For example, the ancillary can be multiplexed from the 1,928th pixel in
In step SC304, the CPU 102 reads out the data to be multiplexed on the Pth pixel in the Y ancillary packet of the Lth line from the RAM 104 and multiplexes that data. In step SC305, the CPU 102 reads out the data to be multiplexed on the Pth pixel in the C ancillary packet of the Lth line from the RAM 104 and multiplexes that data.
Next, in step SC306, the CPU 102 determines whether the current pixel (the Pth pixel) is the final pixel. For example, the number of pixels in one line is 2,200, and thus the CPU 102 determines whether the internal variable P is 2199. If it is determined in step SC306 that the pixel is not the final pixel, the processing moves to step SC307. In step SC307, the CPU 102 increments the value of the internal variable P by 1, and returns the processing to step SC303.
On the other hand, if it is determined in step SC306 that the pixel is the final pixel, the processing moves to step SC308. In step SC308, the CPU 102 determines whether the current line (the Lth line) is the final line. For example, for a progressive image, the number of valid image lines is 1,080, and thus the CPU 102 determines whether the internal variable L is 1,080. If it is determined that the line is not the final line, the processing moves to step SC309. In step SC309, the CPU 102 increments the value of the internal variable L by 1, and returns the processing to step SC302.
On the other hand, if the line is determined to be the final line in step SC308, the processing of this flowchart ends.
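The per-line multiplexing loop of steps SC303 to SC309 can be sketched as follows, under the simplifying assumptions that each line carries exactly one Y ancillary packet and one C ancillary packet and that the packets are written word by word starting at the 1,928th pixel of a 2,200-pixel line; the surrounding SDI stream layout is not modeled.

```python
# Minimal sketch of the per-line ancillary multiplexing loop (steps SC303-SC309).
# Pixel counts and the ancillary start position follow the examples in the text.

PIXELS_PER_LINE = 2200   # total samples per line (example from the text)
ANC_START = 1928         # first pixel where an ancillary can be multiplexed (example from the text)
NUM_LINES = 1080         # valid image lines for a progressive image

def multiplex_ancillary(y_packets, c_packets):
    """y_packets / c_packets: one ancillary packet (list of 10-bit words) per line."""
    stream = []
    for l in range(1, NUM_LINES + 1):
        y_line = [0x040] * PIXELS_PER_LINE   # Y samples of line l (blanking level as filler)
        c_line = [0x200] * PIXELS_PER_LINE   # C samples of line l (blanking level as filler)
        for p in range(PIXELS_PER_LINE):                   # steps SC303, SC306, SC307
            idx = p - ANC_START
            if 0 <= idx < len(y_packets[l - 1]):
                y_line[p] = y_packets[l - 1][idx]          # step SC304: Y ancillary word
            if 0 <= idx < len(c_packets[l - 1]):
                c_line[p] = c_packets[l - 1][idx]          # step SC305: C ancillary word
        stream.append((y_line, c_line))                    # steps SC308/SC309: proceed to the next line
    return stream
```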
As described above, according to Embodiment C0, Trimap data can be output from SDI by packing the Trimap data and generating and multiplexing SDI ancillary packets.
Embodiment C0 has a problem in that, when an attempt is made to output a plurality of pieces of Trimap data, the ancillary region is insufficient and the data cannot be transmitted. In light of this problem, the present embodiment (Embodiment C1) describes processing for mapping a plurality of pieces of Trimap data such that the prohibited codes are not produced.
A structure of a 3G-SDI data stream when the framerate is 29.97 fps will be described. In the present embodiment, the image processing apparatus 100 transmits moving image data according to the SDI standard. Specifically, the image processing apparatus 100 complies with SMPTE ST 425-1 and allocates each instance of pixel data by applying the R′G′B′+A 10-bit multiplexing structure of SMPTE ST 372. Any desired data may be multiplexed on the A channel, and thus in the present embodiment, the image processing apparatus 100 multiplexes and transmits a plurality of pieces of Trimap data.
The processing according to Embodiment C1 will be described next with reference to the flowcharts in the corresponding figures.
In step SC701, the CPU 102 sets the internal variable L for counting lines to 1. In step SC702, the CPU 102 sets the internal variable P for counting pixels to 0. In step SC703, the CPU 102 sets the internal variable N for counting the Trimap to 1. In step SC704, the CPU 102 obtains a Trimap maximum number Nmax.
In step SC705, the CPU 102 determines whether the Trimap data of the Pth pixel of the Nth Trimap is white data. If it is determined that the Trimap data is white data, the processing moves to step SC706, and if not, the processing moves to step SC709. In step SC706, the CPU 102 determines whether the internal variable N is an odd number. If the value is determined to be an odd number, the processing moves to step SC707. In step SC707, the CPU 102 sets the white data to 0x00.
On the other hand, if the internal variable N is determined to be an even number in step SC706, the processing moves to step SC708. In step SC708, the CPU 102 sets the white data to 0x11.
Next, in step SC709, the CPU 102 assigns the data to the (N*2)th bit and the (N*2+1)th bit of the A channel of the Pth pixel. In step SC710, the CPU 102 determines whether the internal variable N is equal to Nmax. If it is determined that N is not equal to Nmax, the processing moves to step SC711. In step SC711, the CPU 102 increments the value of the internal variable N by 1, and returns the processing to step SC705.
On the other hand, if it is determined in step SC710 that N is equal to Nmax, the processing moves to step SC712. In step SC712, the CPU 102 determines whether the current pixel (the Pth pixel) is the final pixel. For example, the number of pixels in the valid image is 1,920, and thus the CPU 102 determines whether the internal variable P is 1919. If it is determined in step SC712 that the pixel is not the final pixel, the processing moves to step SC713. In step SC713, the CPU 102 increments the value of the internal variable P by 1, and returns the processing to step SC703.
On the other hand, if it is determined in step SC712 that the pixel is the final pixel, the processing moves to step SC714. In step SC714, the CPU 102 stores the one line's worth of A channel data. In step SC715, the CPU 102 determines whether the current line (the Lth line) is the final line. For example, for a progressive image, the number of valid image lines is 1,080, and thus the CPU 102 determines whether the internal variable L is 1,080. If it is determined that the line is not the final line, the processing moves to step SC716. In step SC716, the CPU 102 increments the value of the internal variable L by 1, and returns the processing to step SC702.
On the other hand, if the line is determined to be the final line in step SC715, the processing of this flowchart ends.
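The per-pixel mapping of steps SC703 to SC713 can be sketched as follows. The alternation of the white data between 0x00 and 0x11 for odd and even N follows steps SC706 to SC708, so that a pixel at which every Trimap is white does not produce a prohibited all-zero or all-one code; the 2-bit codes assumed for the non-white (gray and black) Trimap values, and the unused bits 0 and 1 of the A channel, are illustrative assumptions.

```python
# Minimal sketch of packing Nmax Trimaps into the 10-bit A channel of one pixel
# (steps SC705-SC709). Non-white codes are assumptions; bit positions follow step SC709.

WHITE, GRAY, BLACK = "white", "gray", "black"
NON_WHITE_CODE = {GRAY: 0b01, BLACK: 0b10}  # assumed codes for unknown/background

def pack_a_channel(trimaps, pixel_index):
    """trimaps: list of Nmax Trimap lines; returns the 10-bit A-channel word for one pixel."""
    a_word = 0
    for n, trimap in enumerate(trimaps, start=1):        # N = 1 .. Nmax (steps SC703, SC710, SC711)
        value = trimap[pixel_index]
        if value == WHITE:
            code = 0b00 if n % 2 == 1 else 0b11           # steps SC707/SC708: alternate white data
        else:
            code = NON_WHITE_CODE[value]
        a_word |= (code & 0b11) << (n * 2)                # step SC709: bits N*2 and N*2+1
    return a_word

# Example: two Trimaps that are both white at a pixel yield 0b0000110000 (0x030),
# which is not a prohibited all-zero/all-one code.
assert pack_a_channel([[WHITE], [WHITE]], 0) == 0b0000110000
```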
In the present embodiment as well, the CPU 102 may generate the ancillary packets described in Embodiment C0. In the present embodiment, the CPU 102 multiplexes the packed Trimap data onto the A channel, and there is thus no need to include TrimapData in the ancillary packets. Additionally, the CPU 102 only needs to multiplex one ancillary packet anywhere in the region where an ancillary can be multiplexed.
Note that although the present embodiment describes a case of a single transmission path, the configuration is not limited thereto, and a configuration in which a plurality of transmission paths are prepared and the Trimap data is output using a different transmission path than that used for the image may be employed. Additionally, the transmission technique is not limited to SDI, and may be any transmission technique capable of image transmission, such as HDMI (registered trademark), DisplayPort (registered trademark), USB, or LAN, and a plurality of transmission paths may be prepared by combining these techniques.
Note that when a reduced Trimap is generated, the CPU 102 may output the reduced data as-is, or may duplicate the same data multiple times so as to fill the SDI format size.
As described above, according to Embodiment C1, a plurality of pieces of Trimap data can be output from SDI by packing the plurality of pieces of Trimap data and multiplexing the data on the A channel of SDI.
The foregoing embodiments are merely specific examples, and different embodiments can be combined as appropriate. For example, Embodiment 1 through Embodiment C1 can be partially combined and carried out in such a form. The configuration may also be such that the user selects a function from a menu display of the image processing apparatus 100 to execute the corresponding control.
Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
This application claims the benefit of Japanese Patent Application No. 2021-040695, filed Mar. 12, 2021 which is hereby incorporated by reference herein in its entirety.