The present invention relates to a metadata adding apparatus which adds metadata to an image captured by an imaging apparatus, and a metadata adding method.
Conventionally, many apparatuses and methods for classifying and managing captured images according to subject matter have been proposed. Among them is a captured image processing apparatus which classifies captured images according to object by means of image analysis (for example, see Patent Reference 1). In that apparatus, still image data captured by a digital camera or the like are automatically classified and managed according to object.
In many situations, there arises a need to classify captured images according to object. This need is not limited to still images. In a live sports program in which videos from cameras placed at plural locations are broadcast, for example, it may be desired to extract video portions relating to a certain decisive moment from plural video data and to edit them so that the edited portions are broadcast continuously as videos of the same object taken at different angles (multi-angle videos).
Patent Reference 1: JP-A-2004-356984 (page 6, FIG. 1)
However, the conventional classification based on image analysis requires a large processing load. Therefore, it is not realistic to apply such classification to the purpose of classifying and extracting, from videos each configured by plural image frames, the video portions in which the same object is captured. For example, consider videos each configured by 30 image frames per second. In the case where predetermined videos are classified and extracted from videos each having a length of 60 seconds which are taken by three cameras, image analysis of 60×30×3=5,400 frames is required.
In the conventional classification based on image analysis, moreover, a correcting process is necessary for images in which the object is captured in different manners, i.e., in which the angle and size of the object differ. Therefore, the recognition accuracy is sometimes poor. In the above example of a live sports program, the cameras are placed at different positions, and hence the object is always captured in different manners. Also from this point of view, it is difficult to classify and extract arbitrary portions of videos by image analysis.
For example, consider the case where, in a broadcast of a baseball game, a scene in which a certain player hits a home run is to be broadcast continuously as videos from various angles. In such a case, conventionally, an editing work is required in which the respective videos are searched manually, i.e., visually, and the pertinent portions are extracted and connected to one another.
The invention has been made in view of the above-discussed conventional circumstances. It is an object of the invention to provide a metadata adding apparatus and method in which search and extraction of images obtained by capturing the same region are enabled to be performed at low load and in an easy manner.
The apparatus for adding metadata of the invention is a metadata adding apparatus which adds the metadata to images captured by an imaging apparatus, and includes: a sensing information acquiring unit for acquiring sensor information relating to a capturing condition of the imaging apparatus; a focus-plane deriving unit for deriving a position of a focus plane which is an imaging plane of the captured image, based on the acquired sensor information; and a metadata adding unit for adding the derived position of the focus plane as the metadata to the captured image. According to this configuration, the position of the focus plane is added as the metadata to the image, and the images are grouped on the basis of positional relationships of the focus planes. As compared with the conventional technique in which grouping is performed by image analysis, therefore, the processing load can be reduced. Consequently, search and extraction of images obtained by capturing the same region are enabled to be performed at low load and in an easy manner.
Furthermore, the metadata adding apparatus of the invention comprises: a grouping unit for grouping the images based on positional relationships among the focus planes; and an addition information recording unit for recording results of the grouping as addition information while correlating the addition information with the images. According to this configuration, the focus plane of each captured image is derived, and the images are grouped on the basis of positional relationships of the focus planes. As compared with the conventional technique in which grouping is performed by image analysis, therefore, the processing load can be reduced. Consequently, search and extraction of images obtained by capturing the same region are enabled to be performed at low load and in an easy manner.
Furthermore, in the metadata adding apparatus of the invention, the grouping unit groups images whose focus planes intersect with each other into the same group. According to this configuration, images can be grouped by means of calculation.
Furthermore, in the metadata adding apparatus of the invention, the grouping unit groups, based on a table which stores the positional relationships among the focus planes, the images having focus planes included in those positional relationships into the same group. According to this configuration, when the positions of the focus planes used for classifying images into the same group are determined in advance, images can be grouped without conducting calculations.
The method of adding metadata of the invention is a metadata adding method of adding metadata to an image captured by an imaging apparatus, and has: a sensing information acquiring step of acquiring sensor information relating to a capturing condition of the imaging apparatus; a focus-plane deriving step of deriving a position of a focus plane which is an imaging plane of the captured image, based on the acquired sensor information; and a metadata adding step of adding the derived position of the focus plane as the metadata to the captured image.
Furthermore, the metadata adding method of the invention has a grouping step of grouping the images based on positional relationships among the focus planes; and an addition information recording step of recording results of the grouping as addition information while correlating the addition information with the images.
In the metadata adding method of the invention, the grouping step groups images whose focus planes intersect with each other into the same group.
In the metadata adding method of the invention, the grouping step groups, based on a table which stores the positional relationships among the focus planes, the images having focus planes included in those positional relationships into the same group.
According to the invention, the positions of focus planes are added as metadata to images, and the images are grouped on the basis of positional relationships of the focus planes. As compared with the conventional technique in which grouping is performed by image analysis, therefore, the processing load can be reduced, and grouping of motion pictures which are obtained by capturing the same imaging region and same object can be realized at higher accuracy. Consequently, search and extraction of images obtained by capturing the same region are enabled to be performed at low load and in an easy manner.
Hereinafter, metadata adding apparatuses according to embodiments of the invention will be described in detail with reference to the accompanying drawings. In Embodiments 1 and 2, an example in which the metadata adding apparatus is implemented as a multi-angle information generating apparatus is shown, and, in Embodiment 3, an example in which the metadata adding apparatus is implemented as an addition information generating apparatus is shown.
The multi-angle information generating apparatus 10 includes a sensing metadata acquiring unit 101, a focus-plane metadata deriving unit 102, a grouping judging unit 103, and a multi-angle metadata recording unit 104.
The sensing metadata acquiring unit 101 acquires sensor information relating to capturing conditions of the imaging apparatuses 20. The sensing metadata acquiring unit 101 obtains sensing metadata relating to the position, azimuth, elevation angle, field angle, and focus distance of each of the imaging apparatuses via the database 30. In the embodiment, the sensing metadata are assumed to be generated by the imaging apparatuses 20. The internal structure of the imaging apparatuses 20, and the detail of the sensing metadata will be described later.
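For illustration only, one sensing-metadata record can be pictured as a flat structure holding the items named above together with the video identifier and video address described later; the following sketch uses Python field names chosen here for convenience, not defined by the embodiment.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class SensingMetadata:
    """One sensing-metadata record, generated per video address (illustrative layout)."""
    video_identifier: str                        # identifier correlating the video with its metadata
    video_address: int                           # frame address within the video
    camera_position: Tuple[float, float, float]  # position of the imaging apparatus
    azimuth_angle: float                         # horizontal direction of the optical axis (deg.)
    elevation_angle: float                       # vertical inclination of the optical axis (deg.)
    field_angle: float                           # full field angle 2*gamma (deg.)
    focus_distance: float                        # distance L to the focus point
```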
The focus-plane metadata deriving unit 102 derives focus planes, which are the imaging planes of the images captured by the imaging apparatuses 20, based on the obtained sensing metadata; that is, on the basis of the sensing metadata, it calculates as coordinate values the rectangles which indicate the capturing focus planes in the real space of each imaging apparatus 20. The focus-plane metadata will be described later in detail.
The grouping judging unit 103 groups images on the basis of positional relationships of the focus planes. While using the focus plane of each of the imaging apparatuses derived by the focus-plane metadata deriving unit 102, the grouping judging unit judges whether the images are obtained by capturing the same region or not, on the basis of predetermined judgment conditions.
The multi-angle metadata recording unit 104 records the results of the grouping as multi-angle information while correlating the information with the images; that is, it outputs and records the information correlated with the images judged to be obtained by capturing the same region, as multi-angle metadata, into the database 30. The multi-angle metadata will be described later in detail.
The multi-angle information generating apparatus 10 is connected to the database 30 which stores video data from the plural imaging apparatuses 20, produces the multi-angle metadata as information related to correlation of plural video data which are obtained by capturing the same object at the same time, on the basis of the sensing metadata obtained from the imaging apparatuses, and outputs the data to the database 30. The multi-angle video searching apparatus 40 which is connected to the database 30 can search video data on the basis of the multi-angle metadata.
Next, the imaging apparatuses will be described.
The CCD 202 is driven in synchronization with a timing signal generated by the timing signal generating unit 204 connected to the driving circuit 203, and outputs an image signal of an object image which is incident through the lens group 201, to the sampling unit 205.
The sampling unit 205 samples the image signals at a sampling rate which is specific to the CCD 202. The A/D converting unit 206 converts the image signal output from the CCD 202 to digital image data, and outputs the data to the video file generating unit 207.
The video address generating unit 208 starts to produce a video address in response to a signal from the timing signal generating unit 204. The video identifier generating unit 209 issues and adds an identifier (for example, a file name or an ID) which correlates a video with sensing metadata described later.
The machine information sensor 210 is configured by a GPS (Global Positioning System) receiver, a gyro sensor, an azimuth sensor, a range sensor, and a field angle sensor.
The GPS receiver receives radio waves from satellites to obtain distances from three or more artificial satellites the positions of which are previously known, whereby the three-dimensional position (latitude, longitude, altitude) of the GPS receiver itself can be obtained. When this function is used, it is possible to obtain the absolute position of the imaging apparatus on the earth.
The gyro sensor is generally called a three-axis acceleration sensor, and uses the gravity of the earth to detect the degree of acceleration in the direction of an axis as viewed from the sensor, i.e., the degree of inclination in the direction of an axis as a numerical value. When this function is used, it is possible to obtain the inclination (azimuth angle, elevation angle) of the imaging apparatus.
The azimuth sensor is generally called an electronic compass, and uses the magnetism of the earth to detect the direction of north, south, east, or west on the earth. When the gyro sensor is combined with the azimuth sensor, it is possible to indicate the absolute direction of the imaging apparatus on the earth.
The range sensor is a sensor which measures the distance to the object. The sensor emits an infrared ray or an ultrasonic wave from the imaging apparatus toward the object, and the distance from the imaging apparatus to the object, i.e., the focus distance at which focusing is to be obtained, can be known from the time which elapses until the imaging apparatus receives the reflection.
The field angle sensor can obtain the field angle from the focal length and the height of the CCD. The focal length can be obtained by measuring the distance between a lens and a light receiving portion, and the height of the light receiving portion is a value which is specific to the imaging apparatus.
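For reference, the two relations just mentioned can be written out as follows; the time-of-flight formula (distance = propagation speed × elapsed time / 2) and the pinhole relation (field angle = 2·arctan(CCD height / (2·focal length))) are the usual ones and are stated here as assumptions, since the text does not give them explicitly.

```python
import math

def distance_from_echo(elapsed_time_s: float, wave_speed_m_s: float = 343.0) -> float:
    """Range-sensor distance: the wave travels to the object and back, so the one-way
    distance (the focus distance) is speed * time / 2 (ultrasonic speed assumed)."""
    return wave_speed_m_s * elapsed_time_s / 2.0

def field_angle_deg(ccd_height_mm: float, focal_length_mm: float) -> float:
    """Full field angle obtained from the CCD height and the focal length (pinhole model)."""
    return math.degrees(2.0 * math.atan(ccd_height_mm / (2.0 * focal_length_mm)))

print(distance_from_echo(0.01))       # about 1.7 m for a 10 ms ultrasonic round trip
print(field_angle_deg(4.8, 2.4))      # 90 deg. when the CCD height is twice the focal length
```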
On the basis of an output request from the sensing metadata generating unit 211, the machine information sensor 210 outputs sensing information relating to the position of the imaging apparatus, the azimuth which will be used as a reference, the azimuth angle, the elevation angle, the field angle, and the focus distance, from the GPS receiver, the gyro sensor, the azimuth sensor, the range sensor, and the field angle sensor. The sensing metadata generating unit 211 obtains the sensing information from the machine information sensor 210 in accordance with a video address generating timing from the video address generating unit 208, produces the sensing metadata, and outputs the data to the recording unit 212. The machine information sensor 210 and the sensing metadata generating unit 211 start to operate in response to a signal from the timing signal generating unit 204.
The production and output of the sensing information are not related to the primary object of the present application, and therefore detailed description of the operation of the sensor is omitted.
The acquisition of the sensing information may be performed at the sampling rate (1/30 sec.) of the CCD, or may be performed every several frames.
In the case where photographing is performed indoors, or where the GPS sensor does not operate, the position information of the capturing place may be manually input. In this case, the position information, which is input through an inputting unit (not shown), is supplied to the machine information sensor.
Hereinafter, the sensing metadata generating operation of the imaging apparatus having the above-described configuration will be described.
First, when a predetermined switch on the main unit of the imaging apparatus is depressed or a similar operation is performed, a capturing start signal is received (step S101). Then, the imaging apparatus 20 starts a video recording process (step S102), and also starts a process of generating the sensing metadata (step S103). When the timing signal generating unit 204 receives a capturing end signal, the imaging apparatus 20 terminates the video recording process and the sensing metadata generating process (step S104).
The video recording process which is started in step S102, and the sensing metadata generating process which is started in step S103 will be described with reference to
A video electric signal from the CCD 202 is acquired (step S204), the sampling unit 205 performs sampling on the acquired signal (step S205), and the A/D converting unit 206 performs conversion to digital image data (step S206).
A video address generated by the video address generating unit 208 is acquired in response to an instruction command from the timing signal generating unit 204 (step S207), and a video file is generated by the video file generating unit 207 (step S208). Furthermore, the video identifier generated by the video identifier generating unit 209 is added (step S209), and the final video file is recorded into the recording unit 212 (step S210).
Next, the sensing metadata generating unit 211 records the camera position, the azimuth angle, the elevation angle, the field angle, and the focus distance together with the video identifier and video address which are acquired, produces and outputs the sensing metadata (step S305), and records the data into the recording unit 212 (step S306).
Next, a multi-angle information generating operation of the multi-angle information generating apparatus having the above-described configuration will be described.
First, the sensing metadata acquiring unit 101 of the multi-angle information generating apparatus 10 acquires all sensing metadata of a group of videos which are taken at the same time by the plural imaging apparatuses 20 (step S401). Next, the focus-plane metadata deriving unit 102 derives focus-plane metadata on the basis of the acquired sensing metadata (step S402).
Then, the focus-plane metadata deriving unit 102 determines whether the derivation of focus-plane metadata is completed for all of sensing metadata or not. If not completed, the operation of deriving focus-plane metadata in step S402 is repeated. By contrast, if the derivation of focus-plane metadata is completed for all of sensing metadata, the process then transfers to the operation of generating multi-angle metadata (step S403). Next, the grouping judging unit 103 produces multi-angle metadata on the basis of the focus-plane metadata acquired from the focus-plane metadata deriving unit 102 (step S404).
Finally, the multi-angle metadata recording unit 104 outputs the multi-angle metadata acquired from the grouping judging unit 103, toward the database 30 (step S405).
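The flow of steps S401 to S405 can be summarized by the sketch below. The input is assumed to be sensing metadata already grouped per video, derive_focus_plane stands for the focus-plane derivation described next, same_region stands for the grouping judgment, and the result is collapsed here to per-video sets of related identifiers for brevity (the embodiment records the partners per video address).

```python
from itertools import combinations

def generate_multi_angle_metadata(sensing_metadata_by_video, derive_focus_plane, same_region):
    """Sketch of steps S401-S405: derive a focus plane per sensing-metadata record,
    judge grouping per pair of videos, and record the related video identifiers."""
    # S402-S403: derive focus-plane metadata for every sensing-metadata record
    focus_planes = {video_id: [derive_focus_plane(record) for record in records]
                    for video_id, records in sensing_metadata_by_video.items()}

    multi_angle = {video_id: set() for video_id in focus_planes}

    # S404: grouping judgment on the N-th frames of each pair of videos (Embodiment 1)
    for (id_a, planes_a), (id_b, planes_b) in combinations(focus_planes.items(), 2):
        for plane_a, plane_b in zip(planes_a, planes_b):   # frames captured at the same time
            if same_region(plane_a, plane_b):
                multi_angle[id_a].add(id_b)
                multi_angle[id_b].add(id_a)

    # S405: the result is output toward the database, correlated with each video
    return multi_angle
```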
The operation of deriving focus-plane metadata in step S402 will be described with reference to
The flowchart of
In the case where, as shown in
Next, from the camera direction vector (e, f, g) and the camera position (a, b, c), the equation of the straight line passing through the camera position (a, b, c) and the focus point can be derived. When an intermediate parameter z is used, the equation of the straight line can be expressed as (ez, fz, gz). From the equation of the straight line, the coordinates which are on the straight line and which are separated by the distance L from the camera position (a, b, c) can be derived as the focus point. The expression to be solved is L=√((ez−a)²+(fz−b)²+(gz−c)²), and the intermediate parameter z is derived from this expression. When L=√((ez−a)²+(fz−b)²+(gz−c)²) is solved, z={(ae+bf+cg)±√((ae+bf+cg)²−(e²+f²+g²)(a²+b²+c²−L²))}/(e²+f²+g²) is obtained, and the focus point is attained by substituting the obtained z into (ez, fz, gz) (step S503).
The obtained focus point is expressed as (h, i, j). The equation of the focus plane can be derived from the normal vector (e, f, g) and the focus point (h, i, j). The equation of the focus plane is ex+fy+gz=eh+fi+gj (step S504).
From the field angle of 2γ deg., the distance from the camera position (a, b, c) to the boundary coordinates of the focus plane is L/cos γ. It can be said that the boundary coordinates are coordinates which exist on a sphere centered at the camera position (a, b, c) and having a radius of L/cos γ, and in the focus plane obtained above. The equation of the sphere centered at the camera position (a, b, c) and having a radius of L/cos γ is (x−a)²+(y−b)²+(z−c)²=(L/cos γ)².
The features of the plane to be captured by the camera, i.e., that a horizontal shift does not occur (namely, the height (z-axis) of the upper side of the plane is constant, and the height (z-axis) of the lower side is also constant), and that the ratio of the length and the width of the focus plane is fixed, are used as conditions for solving the equation. Since z is constant on each of the upper and lower sides, z can be set as the two values z1 and z2. From the above, the equations ex+fy+gz1=eh+fi+gj, ex+fy+gz2=eh+fi+gj, (x−a)²+(y−b)²+(z1−c)²=(L/cos γ)², and (x−a)²+(y−b)²+(z2−c)²=(L/cos γ)² are obtained.
When the four equations are solved, four boundary coordinates in which the values of x and y are expressed respectively by z1 and z2 can be derived. First, the case where z is z1, i.e., ex+fy+gz1=eh+fi+gj and (x−a)²+(y−b)²+(z1−c)²=(L/cos γ)², will be considered. For the sake of simplicity, eh+fi+gj−gz1=A, (z1−c)²=B, and (L/cos γ)²=C are set, and then ex+fy=A and (x−a)²+(y−b)²+B=C are obtained. When x is eliminated from the two equations and A−ea=D, e²(B−C)=E, e²+f²=F, −(2Df+2be²)=G, and D²+e²b²+E=H are set, Fy²+Gy+H=0 is obtained, and the value of y is y=(−G±√(G²−4FH))/(2F). Similarly, x=(A−f(−G±√(G²−4FH))/(2F))/e can be obtained. For the sake of simplicity, the obtained x and y are set as X1, Y1, X2, Y2, respectively.
Next, x and y are obtained also in the case where z is z2, i.e., ex+fy+gz2=eh+fi+gj and (x−a)²+(y−b)²+(z2−c)²=(L/cos γ)². The deriving method in the case of z2 is identical with that in the case of z1, and hence its description is omitted. The obtained x and y are set as X3, Y3, X4, Y4, respectively. Therefore, the four boundary coordinates are (X1, Y1, Z1), (X2, Y2, Z1), (X3, Y3, Z2), and (X4, Y4, Z2).
Since the ratio of the length and the width of the focus plane is fixed (here, length:width=P:Q), the relations "the length of the upper side : the length of the right side = P:Q" and "the length of the lower side : the length of the left side = P:Q" can be derived. Diagrammatically, (X1, Y1, Z1), (X2, Y2, Z1), (X3, Y3, Z2), and (X4, Y4, Z2) are set as the upper left (X1, Y1, Z1), the upper right (X2, Y2, Z1), the lower left (X3, Y3, Z2), and the lower right (X4, Y4, Z2). The length of the upper side=√((X1−X2)²+(Y1−Y2)²), the length of the right side=√((X2−X4)²+(Y2−Y4)²+(Z1−Z2)²), the length of the lower side=√((X3−X4)²+(Y3−Y4)²), and the length of the left side=√((X1−X3)²+(Y1−Y3)²+(Z1−Z2)²). Therefore, √((X1−X2)²+(Y1−Y2)²):√((X2−X4)²+(Y2−Y4)²+(Z1−Z2)²)=P:Q and √((X3−X4)²+(Y3−Y4)²):√((X1−X3)²+(Y1−Y3)²+(Z1−Z2)²)=P:Q are attained, and two equations can be obtained. The upper left (X1, Y1, Z1), the upper right (X2, Y2, Z1), the lower left (X3, Y3, Z2), and the lower right (X4, Y4, Z2) are values expressed by z1 and z2. When the replacements made for simplification are returned to the original expressions, therefore, simultaneous equations for z1 and z2 can be obtained from these two relations, and z1 and z2 can be obtained. The expressions for z1 and z2 are complicated, and hence their description is omitted. When the obtained z1 and z2 are substituted into the upper left (X1, Y1, Z1), the upper right (X2, Y2, Z1), the lower left (X3, Y3, Z2), and the lower right (X4, Y4, Z2), the boundary coordinates can be obtained. The obtained boundary coordinates are set as the upper left (k, l, m), the upper right (n, o, p), the lower left (q, r, s), and the lower right (t, u, v) (step S505).
Finally, the focus-plane metadata deriving unit 102 adds the calculated boundary coordinate information of the four points to the sensing metadata for each of the video addresses, to produce focus-plane metadata (step S506).
Hereinafter, the method of deriving the focus plane and the boundary coordinates will be described by actually using the sensing metadata of
Next, from the normal vector (−1, 0, 0) and the camera position (1, 0, 0), it is possible to obtain the equation of the straight line which has the direction of the normal vector (−1, 0, 0) and passes through the camera position (1, 0, 0). The equation of the straight line is y=0, z=0. The coordinates which are on the straight line and at which the focus distance from the camera position (1, 0, 0) is 1, i.e., the coordinates of the focus point, are (0, 0, 0), from the equation of the straight line y=0, z=0 and the focus distance=1.
Next, from the coordinates (0, 0, 0) of the focus point and the normal vector (−1, 0, 0), the equation of the focus plane is derived. From the coordinates (0, 0, 0) of the focus point and the normal vector (−1, 0, 0), the equation of the focus plane is x=0.
Since the field angle is 90 deg., the distance to the boundary coordinates on the focus plane is 1/cos 45°, i.e., √2. It can be said that the boundary coordinates exist on a sphere having a radius of √2 and centered at the camera position (1, 0, 0), and in the focus plane. The equation of the sphere having a radius of √2 and centered at the camera position (1, 0, 0) is (x−1)²+y²+z²=2. From the sphere equation (x−1)²+y²+z²=2 and the equation of the focus plane x=0, y²+z²=1 can be derived. When it is assumed that the screen size captured by the camera has a ratio of the length and width of 4:3, z=(4/3)y is obtained. When solving y²+z²=1 and z=(4/3)y, y=±3/5 and z=±4/5 can be derived. Therefore, the boundary coordinates are (0, 3/5, 4/5), (0, −3/5, 4/5), (0, −3/5, −4/5), and (0, 3/5, −4/5).
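As a numerical cross-check of this worked example, the sketch below reconstructs the boundary coordinates geometrically: the focus point is the camera position plus L times the unit direction, the in-plane distance from the focus point to each corner is L·tan γ, and that diagonal is split 4:3 between the vertical and horizontal in-plane directions as assumed above. The function name, the explicit world "up" vector, and this geometric shortcut are choices made here for illustration, not the notation of the embodiment.

```python
import numpy as np

def focus_plane_corners(cam_pos, cam_dir, focus_dist, field_angle_deg, ratio=(4, 3)):
    """Illustrative reconstruction of steps S501-S505: focus point and the four
    boundary coordinates of the focus plane."""
    p = np.asarray(cam_pos, dtype=float)
    d = np.asarray(cam_dir, dtype=float)
    d = d / np.linalg.norm(d)                       # unit optical axis = plane normal

    focus_point = p + focus_dist * d                # point at the focus distance L

    gamma = np.radians(field_angle_deg / 2.0)
    half_diag = focus_dist * np.tan(gamma)          # corner lies L/cos(gamma) from the camera,
                                                    # hence L*tan(gamma) from the focus point

    # In-plane axes: horizontal u, vertical v (no horizontal shift, as assumed in the text)
    up = np.array([0.0, 0.0, 1.0])
    u = np.cross(up, d); u = u / np.linalg.norm(u)
    v = np.cross(d, u)

    vert, horiz = ratio
    scale = half_diag / np.hypot(vert, horiz)
    hv, hu = vert * scale, horiz * scale            # half-height, half-width of the plane

    corners = [focus_point + sv * hv * v + su * hu * u
               for sv, su in ((1, -1), (1, 1), (-1, -1), (-1, 1))]
    return focus_point, corners

# Worked example above: camera (1, 0, 0), direction (-1, 0, 0), L = 1, field angle 90 deg.
# Expected boundary coordinates: (0, +-3/5, +-4/5).
print(focus_plane_corners((1, 0, 0), (-1, 0, 0), 1.0, 90.0))
```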
Next, the operation of generating multi-angle metadata in step S404 will be described with reference to
The grouping judging operation in step S603 will be described with reference to
Hereinafter, the grouping judging method will be described by actually using the focus-plane metadata of
Next, it is judged whether the intersection line of the plane equations is within the boundary coordinates or not. The boundary coordinates "(0, 3/5, 4/5), (0, −3/5, 4/5), (0, −3/5, −4/5), and (0, 3/5, −4/5)" and "(3/5, 0, 4/5), (−3/5, 0, 4/5), (−3/5, 0, −4/5), and (3/5, 0, −4/5)" of the two planes "x=0" and "y=0" express the boundary ranges −3/5≦x≦3/5, −3/5≦y≦3/5, and −4/5≦z≦4/5. The obtained intersection line "x=0, y=0" satisfies x=0 and y=0 for −4/5≦z≦4/5, and hence can be judged to be within these boundary ranges. Therefore, it is judged that the two focus planes intersect with each other, i.e., that the video data are obtained by capturing the same object. Then, the video identifier "543210" is added to the focus-plane metadata in which "Video identifier" is "012345", to generate multi-angle metadata. Similarly, the video identifier "012345" is added to the focus-plane metadata in which "Video identifier" is "543210", to generate multi-angle metadata.
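The judgment just walked through can be sketched as follows: the plane equation of each focus plane is recovered from its boundary coordinates, the intersection line of the two planes is computed, and the line is tested against both boundary ranges. The slab-clipping test and the function names are illustrative choices; a function of this kind could also serve as the same_region judgment in the pipeline sketch shown earlier.

```python
import numpy as np

def plane_from_corners(corners):
    """Plane (unit normal n and offset k with n.x = k) through a focus-plane rectangle."""
    c = np.asarray(corners, dtype=float)
    n = np.cross(c[1] - c[0], c[2] - c[0])
    n = n / np.linalg.norm(n)
    return n, float(np.dot(n, c[0]))

def capture_same_region(corners_a, corners_b, eps=1e-9):
    """Grouping judgment sketch: do the two focus planes intersect within both
    boundary ranges spanned by their boundary coordinates?"""
    n1, k1 = plane_from_corners(corners_a)
    n2, k2 = plane_from_corners(corners_b)

    direction = np.cross(n1, n2)                 # direction of the intersection line
    if np.linalg.norm(direction) < eps:
        return False                             # parallel planes: no intersection line

    # One point on the intersection line: n1.x = k1, n2.x = k2, direction.x = 0
    p0 = np.linalg.solve(np.vstack([n1, n2, direction]), np.array([k1, k2, 0.0]))

    # Clip the line against both boundary ranges (slab method); a surviving
    # parameter interval means the planes intersect within the boundaries.
    t_lo, t_hi = -np.inf, np.inf
    for corners in (corners_a, corners_b):
        c = np.asarray(corners, dtype=float)
        lo, hi = c.min(axis=0) - eps, c.max(axis=0) + eps
        for axis in range(3):
            d, o = direction[axis], p0[axis]
            if abs(d) < eps:
                if o < lo[axis] or o > hi[axis]:
                    return False
            else:
                t1, t2 = (lo[axis] - o) / d, (hi[axis] - o) / d
                t_lo, t_hi = max(t_lo, min(t1, t2)), min(t_hi, max(t1, t2))
    return t_lo <= t_hi

# The two focus planes x = 0 and y = 0 of the example are judged to capture the same region.
plane_x0 = [(0, 3/5, 4/5), (0, -3/5, 4/5), (0, -3/5, -4/5), (0, 3/5, -4/5)]
plane_y0 = [(3/5, 0, 4/5), (-3/5, 0, 4/5), (-3/5, 0, -4/5), (3/5, 0, -4/5)]
print(capture_same_region(plane_x0, plane_y0))   # True
```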
As described above, multi-angle metadata are recorded while being correlated with corresponding video data. By using multi-angle metadata, therefore, the multi-angle video searching apparatus 40 can search and extract video data which are obtained by capturing the same object at the same time.
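On the search side, the recorded identifier links are simply followed; the per-video layout below is the same simplified layout assumed in the pipeline sketch above and is not prescribed by the embodiment.

```python
def find_multi_angle_videos(multi_angle_metadata, video_identifier):
    """Return the identifiers of the videos recorded as capturing the same object
    at the same time as the given video (hypothetical metadata layout)."""
    return sorted(multi_angle_metadata.get(video_identifier, set()))

metadata = {"012345": {"543210"}, "543210": {"012345"}}
print(find_multi_angle_videos(metadata, "012345"))   # ['543210']
```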
In the embodiment, the configuration example in which the imaging apparatuses are separated from the multi-angle information generating apparatus has been described. Alternatively, the imaging apparatus may include a sensing metadata acquiring unit and a focus-plane metadata deriving unit.
In the embodiment, video data are correlated with various metadata by using a video identifier. Alternatively, various metadata may be converted into streams, and then multiplexed to video data, so that a video identifier is not used.
In the grouping judgment, the judgment may alternatively be performed in the following manner: the focus distance is extended or contracted in accordance with the depth of field, which is the range in front of and behind the object in which focusing appears to be attained, and a focus plane is calculated for each such focus distance.
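A minimal sketch of this variant, reusing the focus_plane_corners helper from the earlier sketch: the near and far limits of the depth of field are taken here as given offsets, which is an assumption made purely for illustration.

```python
def focus_planes_over_depth_of_field(cam_pos, cam_dir, focus_dist, field_angle_deg,
                                     near_offset, far_offset):
    """Calculate one focus plane per focus distance within the depth of field:
    the nominal distance and its contracted/extended variants."""
    distances = (focus_dist - near_offset, focus_dist, focus_dist + far_offset)
    return [focus_plane_corners(cam_pos, cam_dir, dist, field_angle_deg)
            for dist in distances]
```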
Therefore, the work burden in a case such as where multi-angle videos are edited can be remarkably reduced.
Next, an example in which, in the grouping judgment, the grouping judgment is performed under other judgment conditions will be described. The configurations of the multi-angle information generating apparatus and the multi-angle information generating system, and the procedure of the multi-angle information generating operation are identical with those of Embodiment 1, and hence their description is omitted.
In Embodiment 2, the grouping of images is performed on the basis of a table which stores position information of focus planes for grouping images into the same group. Namely, in Embodiment 2, the grouping judging unit 103 incorporates a table describing a grouping rule, and "judgment of existence in a predetermined region of a focus plane" is performed based on the table.
The grouping judging method will be described by actually using the focus-plane metadata of
The judgment on existence in a predetermined region may be performed depending on whether all of the focus-plane boundary coordinates exist in the region, or on whether at least one set of coordinates exists in the region.
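A table-driven judgment of this kind might look like the sketch below; the region name and the coordinate range stored in the table are placeholders, and the require_all flag switches between the "all coordinates" and "at least one set of coordinates" variants just mentioned.

```python
# Hypothetical grouping-rule table: region name -> (minimum corner, maximum corner)
GROUPING_TABLE = {
    "region_A": ((-1.0, -1.0, -1.0), (1.0, 1.0, 1.0)),
}

def region_of(corners, table=GROUPING_TABLE, require_all=True):
    """Return the first region of the table in which the focus-plane boundary
    coordinates exist (all of them, or at least one set)."""
    def inside(pt, lo, hi):
        return all(l <= v <= h for v, l, h in zip(pt, lo, hi))
    for name, (lo, hi) in table.items():
        hits = [inside(c, lo, hi) for c in corners]
        if all(hits) if require_all else any(hits):
            return name
    return None

plane = [(0, 3/5, 4/5), (0, -3/5, 4/5), (0, -3/5, -4/5), (0, 3/5, -4/5)]
print(region_of(plane))   # "region_A": images whose focus planes fall here share a group
```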
In the embodiment, the grouping rule may be changed in accordance with the situation. The table describing the grouping rule need not be disposed within the grouping judging unit; a configuration may be employed in which the table is disposed in an external database and the grouping judging unit refers to the external table.
The embodiment may be configured so that sensing metadata are generated only when the sensing information changes. In this configuration, the amount of data to be processed is reduced, and the processing speed can be improved. Actually, it is expected that adjacent image frames often have the same multi-angle information. Therefore, multi-angle metadata need not be generated for each image frame; multi-angle metadata having a data structure indicating only the corresponding relationships between a video address and multi-angle information may be generated instead. In this case as well, the amount of data to be processed is reduced, and the processing speed can be improved. Furthermore, multi-angle metadata may be generated not for each image frame but for each of the groups classified by the grouping judging unit. According to this configuration, the process of duplicately recording the same information into the metadata of the respective video data is reduced, and the processing speed can be improved.
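The first of these reductions, generating sensing metadata only when the sensing information changes, can be pictured as follows; the field names reuse the SensingMetadata sketch shown earlier and remain illustrative.

```python
def deduplicated_sensing_metadata(sensing_records):
    """Keep a sensing-metadata record only when its sensor values differ from the
    previously kept record, so unchanged frames add no new metadata."""
    kept, last_values = [], None
    for record in sensing_records:
        values = (record.camera_position, record.azimuth_angle, record.elevation_angle,
                  record.field_angle, record.focus_distance)
        if values != last_values:
            kept.append(record)
            last_values = values
    return kept
```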
The embodiment is configured so that the sensing metadata are generated by the imaging apparatus, but the invention is not restricted to this. For example, the sensing metadata may be obtained from outside the imaging apparatus.
In Embodiments 1 and 2, the example where images whose capturing is started at the same time by plural imaging apparatuses are grouped has been described. In this embodiment, an example where images which are captured at different times by a single imaging apparatus are grouped will be described. Namely, in Embodiments 1 and 2, the N-th frames of all video data are subjected to the judgment of whether the images are obtained by capturing the same region or not. By contrast, in this embodiment, the judgment is made on combinations of all frames of the video data.
The addition information generating apparatus 1010 includes a sensing metadata acquiring unit 1101, a focus-plane metadata deriving unit 1102, a grouping judging unit 1103, and a metadata recording unit 1104.
The sensing metadata acquiring unit 1101 acquires sensor information relating to capturing conditions of the imaging apparatus 1020. The sensing metadata acquiring unit 1101 obtains sensing metadata relating to the position, azimuth, elevation angle, field angle, and focus distance of the imaging apparatus via the database 1030. In the embodiment, the sensing metadata are assumed to be generated by the imaging apparatus 1020. The internal structure of the imaging apparatus 1020, and the details of the sensing metadata, will be described later.
The focus-plane metadata deriving unit 1102 derives focus planes which include images captured by the imaging apparatus 1020, based on the obtained sensing metadata, and calculates as coordinate values rectangles which indicate capturing focus planes in a real space of the imaging apparatus 1020, on the basis of the sensing metadata. The focus-plane metadata will be described later in detail.
The grouping judging unit 1103 groups images on the basis of positional relationships of the focus planes. While using the focus plane derived by the focus-plane metadata deriving unit 1102, the grouping judging unit judges whether the images are obtained by capturing the same region or not, on the basis of predetermined judgment conditions.
The metadata recording unit 1104 records the results of the grouping as addition information while correlating the information with the images; that is, it outputs and records the information correlated with the images judged to be obtained by capturing the same region, as addition metadata, into the database 1030. The addition metadata will be described later in detail.
The addition information generating apparatus 1010 is connected to the database 1030 which stores video data from the imaging apparatus 1020, produces the addition metadata as information related to plural video data which are obtained by capturing the same object, on the basis of the sensing metadata obtained from the imaging apparatus, and outputs the data to the database 1030. The video searching apparatus 1040 which is connected to the database 1030 can search video data on the basis of the addition metadata.
Next, the imaging apparatus will be described.
The CCD 1202 is driven in synchronization with a timing signal generated by the timing signal generating unit 1204 connected to the driving circuit 1203, and outputs an image signal of an object image which is incident through the lens group 1201, to the sampling unit 1205.
The sampling unit 1205 samples the image signal at a sampling rate which is specific to the CCD 1202. The A/D converting unit 1206 converts the image signal output from the CCD 1202 to digital image data, and outputs the data to the video file generating unit 1207.
The video address generating unit 1208 starts to produce a video address in response to a signal from the timing signal generating unit 1204. The video identifier generating unit 1209 issues and adds an identifier (for example, a file name or an ID) which correlates a video with sensing metadata described later.
The machine information sensor 1210 is configured by a GPS (Global Positioning System) receiver, a gyro sensor, an azimuth sensor, a range sensor, and a field angle sensor.
The GPS receiver receives radio waves from satellites to obtain distances from three or more artificial satellites the positions of which are previously known, whereby the three-dimensional position (latitude, longitude, altitude) of the GPS receiver itself can be obtained. When this function is used, it is possible to obtain the absolute position of the imaging apparatus on the earth.
The gyro sensor is generally called a three-axis acceleration sensor, and uses the gravity of the earth to detect the degree of acceleration in the direction of an axis as viewed from the sensor, i.e., the degree of inclination in the direction of an axis as a numerical value. When this function is used, it is possible to obtain the inclination (azimuth angle, elevation angle) of the imaging apparatus.
The azimuth sensor is generally called an electronic compass, and uses the magnetism of the earth to detect the direction of north, south, east, or west on the earth. When the gyro sensor is combined with the azimuth sensor, it is possible to indicate the absolute direction of the imaging apparatus on the earth.
The range sensor is a sensor which measures the distance to the object. The sensor emits an infrared ray or an ultrasonic wave from the imaging apparatus toward the object, and the distance from the imaging apparatus to the object, i.e., the focus distance at which focusing is to be obtained, can be known from the time which elapses until the imaging apparatus receives the reflection.
The field angle sensor can obtain the field angle from the focal length and the height of the CCD. The focal length can be obtained by measuring the distance between a lens and a light receiving portion, and the height of the light receiving portion is a value which is specific to the imaging apparatus.
On the basis of an output request from the sensing metadata generating unit 1211, the machine information sensor 1210 outputs sensing information relating to the position of the imaging apparatus, the azimuth which will be used as a reference, the azimuth angle, the elevation angle, the field angle, and the focus distance, from the GPS receiver, the gyro sensor, the azimuth sensor, the range sensor, and the field angle sensor. The sensing metadata generating unit 1211 obtains the sensing information from the machine information sensor 1210 in accordance with a video address generating timing from the video address generating unit 1208, produces the sensing metadata, and outputs the data to the recording unit 1212. The machine information sensor 1210 and the sensing metadata generating unit 1211 start to operate in response to a signal from the timing signal generating unit 1204.
The production and output of the sensing information are not related to the primary object of the present application, and therefore detailed description of the operation of the sensor is omitted.
The acquisition of the sensing information may be performed at the sampling rate (1/30 sec.) of the CCD, or may be performed every several frames.
In the case where capturing is performed indoors, or where the GPS sensor does not operate, the position information of the capturing place may be manually input. In this case, the position information, which is input through an inputting unit (not shown), is supplied to the machine information sensor.
Hereinafter, the sensing metadata generating operation of the imaging apparatus having the above-described configuration will be described.
First, when a predetermined switch on the main unit of the imaging apparatus is depressed or a similar operation is performed, a capturing start signal is received (step S1101). Then, the imaging apparatus 1020 starts a video recording process (step S1102), and also starts a process of generating the sensing metadata (step S1103). When the timing signal generating unit 1204 receives a capturing end signal, the imaging apparatus 1020 terminates the video recording process and the sensing metadata generating process (step S1104).
The video recording process which is started in step S1102, and the sensing metadata generating process which is started in step S1103 will be described with reference to
A video electric signal from the CCD 1202 is acquired (step S1204), the sampling unit 1205 performs sampling on the acquired signal (step S1205), and the A/D converting unit 1206 performs conversion to digital image data (step S1206).
A video address generated by the video address generating unit 1208 is acquired in response to an instruction command from the timing signal generating unit 1204 (step S1207), and a video file is generated by the video file generating unit 1207 (step S1208). Furthermore, the video identifier generated by the video identifier generating unit 1209 is added (step S1209), and the final video file is recorded into the recording unit 1212 (step S1210).
Next, the sensing metadata generating unit 1211 records the camera position, the azimuth angle, the elevation angle, the field angle, and the focus distance together with the video identifier and video address which are acquired, produces and outputs the sensing metadata (step S1305), and records the data into the recording unit 1212 (step S1306).
Next, an addition information generating operation of the addition information generating apparatus having the above-described configuration will be described.
First, the sensing metadata acquiring unit 1101 of the addition information generating apparatus 1010 acquires all sensing metadata of a group of videos which are taken by the imaging apparatus 1020 (step S1401). Next, the focus-plane metadata deriving unit 1102 derives focus-plane metadata on the basis of the acquired sensing metadata (step S1402).
Then, the focus-plane metadata deriving unit 1102 determines whether the derivation of focus-plane metadata is completed for all of sensing metadata or not. If not completed, the operation of deriving focus-plane metadata in step S1402 is repeated. By contrast, if the derivation of focus-plane metadata is completed for all of sensing metadata, the process then transfers to the operation of generating addition metadata (step S1403). Next, the grouping judging unit 1103 produces addition metadata on the basis of the focus-plane metadata acquired from the focus-plane metadata deriving unit 1102 (step S1404).
Finally, the metadata recording unit 1104 outputs the addition metadata acquired from the grouping judging unit 1103, toward the database 1030 (step S1405).
The operation of deriving focus-plane metadata in step S1402 will be described with reference to
The flowchart of
In the case where, as shown in
Next, from the camera direction vector (e, f, g) and the camera position (a, b, c), the equation of the straight line passing through the camera position (a, b, c) and the focus point can be derived. When an intermediate parameter z is used, the equation of the straight line can be expressed as (ez, fz, gz). From the equation of the straight line, the coordinates which are on the straight line and which are separated by the distance L from the camera position (a, b, c) can be derived as the focus point. The expression to be solved is L=√((ez−a)²+(fz−b)²+(gz−c)²), and the intermediate parameter z is derived from this expression. When L=√((ez−a)²+(fz−b)²+(gz−c)²) is solved, z={(ae+bf+cg)±√((ae+bf+cg)²−(e²+f²+g²)(a²+b²+c²−L²))}/(e²+f²+g²) is obtained, and the focus point is attained by substituting the obtained z into (ez, fz, gz) (step S1503).
The obtained focus point is expressed as (h, i, j). The equation of the focus plane can be derived from the normal vector (e, f, g) and the focus point (h, i, j). The equation of the focus plane is ex+fy+gz=eh+fi+gj (step S1504).
From the field angle of 2γ deg., the distance from the camera position (a, b, c) to the boundary coordinates of the focus plane is L/cos γ. It can be said that the boundary coordinates are coordinates which exist on a sphere centered at the camera position (a, b, c) and having a radius of L/cos γ, and in the focus plane obtained above. The equation of the sphere centered at the camera position (a, b, c) and having a radius of L/cos γ is (x−a)²+(y−b)²+(z−c)²=(L/cos γ)².
The features of the plane to be captured by the camera, i.e., that a horizontal shift does not occur (namely, the height (z-axis) of the upper side of the plane is constant, and the height (z-axis) of the lower side is also constant), and that the ratio of the length and the width of the focus plane is fixed, are used as conditions for solving the equation. Since z is constant on each of the upper and lower sides, z can be set as the two values z1 and z2. From the above, the equations ex+fy+gz1=eh+fi+gj, ex+fy+gz2=eh+fi+gj, (x−a)²+(y−b)²+(z1−c)²=(L/cos γ)², and (x−a)²+(y−b)²+(z2−c)²=(L/cos γ)² are obtained.
When the four equations are solved, four boundary coordinates in which the values of x and y are expressed respectively by z1 and z2 can be derived. First, the case where z is z1, i.e., ex+fy+gz1=eh+fi+gj and (x−a)²+(y−b)²+(z1−c)²=(L/cos γ)², will be considered. For the sake of simplicity, eh+fi+gj−gz1=A, (z1−c)²=B, and (L/cos γ)²=C are set, and then ex+fy=A and (x−a)²+(y−b)²+B=C are obtained. When x is eliminated from the two equations and A−ea=D, e²(B−C)=E, e²+f²=F, −(2Df+2be²)=G, and D²+e²b²+E=H are set, Fy²+Gy+H=0 is obtained, and the value of y is y=(−G±√(G²−4FH))/(2F). Similarly, x=(A−f(−G±√(G²−4FH))/(2F))/e can be obtained. For the sake of simplicity, the obtained x and y are set as X1, Y1, X2, Y2, respectively.
Next, x and y are obtained also in the case where z is z2, i.e., ex+fy+gz2=eh+fi+gj and (x−a)²+(y−b)²+(z2−c)²=(L/cos γ)². The deriving method in the case of z2 is identical with that in the case of z1, and hence its description is omitted. The obtained x and y are set as X3, Y3, X4, Y4, respectively. Therefore, the four boundary coordinates are (X1, Y1, Z1), (X2, Y2, Z1), (X3, Y3, Z2), and (X4, Y4, Z2).
Since the ratio of the length and the width of the focus plane is fixed (here, length:width=P:Q), the relations "the length of the upper side : the length of the right side = P:Q" and "the length of the lower side : the length of the left side = P:Q" can be derived. Diagrammatically, (X1, Y1, Z1), (X2, Y2, Z1), (X3, Y3, Z2), and (X4, Y4, Z2) are set as the upper left (X1, Y1, Z1), the upper right (X2, Y2, Z1), the lower left (X3, Y3, Z2), and the lower right (X4, Y4, Z2). The length of the upper side=√((X1−X2)²+(Y1−Y2)²), the length of the right side=√((X2−X4)²+(Y2−Y4)²+(Z1−Z2)²), the length of the lower side=√((X3−X4)²+(Y3−Y4)²), and the length of the left side=√((X1−X3)²+(Y1−Y3)²+(Z1−Z2)²). Therefore, √((X1−X2)²+(Y1−Y2)²):√((X2−X4)²+(Y2−Y4)²+(Z1−Z2)²)=P:Q and √((X3−X4)²+(Y3−Y4)²):√((X1−X3)²+(Y1−Y3)²+(Z1−Z2)²)=P:Q are attained, and two equations can be obtained. The upper left (X1, Y1, Z1), the upper right (X2, Y2, Z1), the lower left (X3, Y3, Z2), and the lower right (X4, Y4, Z2) are values expressed by z1 and z2. When the replacements made for simplification are returned to the original expressions, therefore, simultaneous equations for z1 and z2 can be obtained from these two relations, and z1 and z2 can be obtained. The expressions for z1 and z2 are complicated, and hence their description is omitted. When the obtained z1 and z2 are substituted into the upper left (X1, Y1, Z1), the upper right (X2, Y2, Z1), the lower left (X3, Y3, Z2), and the lower right (X4, Y4, Z2), the boundary coordinates can be obtained. The obtained boundary coordinates are set as the upper left (k, l, m), the upper right (n, o, p), the lower left (q, r, s), and the lower right (t, u, v) (step S1505).
Finally, the focus-plane metadata deriving unit 1102 adds the calculated boundary coordinate information of the four points to the sensing metadata for each of the video addresses, to produce focus-plane metadata (step S1506).
Hereinafter, the method of deriving the focus plane and the boundary coordinates will be described by actually using the sensing metadata of
Next, from the normal vector (−1, 0, 0) and the camera position (1, 0, 0), it is possible to obtain the equation of the straight line which has the direction of the normal vector (−1, 0, 0) and passes through the camera position (1, 0, 0). The equation of the straight line is y=0, z=0. The coordinates which are on the straight line and at which the focus distance from the camera position (1, 0, 0) is 1, i.e., the coordinates of the focus point, are (0, 0, 0), from the equation of the straight line y=0, z=0 and the focus distance=1.
Next, from the coordinates (0, 0, 0) of the focus point and the normal vector (−1, 0, 0), the equation of the focus plane is derived. From the coordinates (0, 0, 0) of the focus point and the normal vector (−1, 0, 0), the equation of the focus plane is x=0.
Since the field angle is 90 deg., the distance to the boundary coordinates on the focus plane is 1/cos 45°, i.e., √2. It can be said that the boundary coordinates exist on a sphere having a radius of √2 and centered at the camera position (1, 0, 0), and in the focus plane. The equation of the sphere having a radius of √2 and centered at the camera position (1, 0, 0) is (x−1)²+y²+z²=2. From the sphere equation (x−1)²+y²+z²=2 and the equation of the focus plane x=0, y²+z²=1 can be derived. When it is assumed that the screen size captured by the camera has a ratio of the length and width of 4:3, z=(4/3)y is obtained. When solving y²+z²=1 and z=(4/3)y, y=±3/5 and z=±4/5 can be derived. Therefore, the boundary coordinates are (0, 3/5, 4/5), (0, −3/5, 4/5), (0, −3/5, −4/5), and (0, 3/5, −4/5).
Next, the operation of generating addition metadata in step S1404 will be described with reference to
Next, the pattern number N of the combinations is initialized to 1 (step S1603), and the grouping judging unit 1103 executes the grouping judging operation on the N-th pattern to produce addition metadata (step S1604). Next, the grouping judging unit 1103 outputs the generated addition metadata to the metadata recording unit 1104 (step S1605). Then, N is incremented by 1 (step S1606), and the grouping judging unit 1103 judges whether the next combination pattern (N-th pattern) exists or not (step S1607). If the next combination pattern exists, the process returns to step S1604 and repeats the addition metadata generating operation. By contrast, if the next combination pattern does not exist, the addition metadata generating operation is ended.
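In contrast to Embodiment 1, where the N-th frames of the respective videos are compared with one another, the combination patterns enumerated here cover every pair of frames of the video data. A sketch of that enumeration is shown below; the dictionary keys of the focus-plane records are assumptions made for illustration.

```python
from itertools import combinations

def addition_metadata_pairs(focus_plane_records, same_region):
    """Enumerate every pair of frames (combination patterns 1..N) and return the pairs
    of video identifiers whose focus planes are judged to capture the same region."""
    pairs = []
    for record_a, record_b in combinations(focus_plane_records, 2):
        if same_region(record_a["corners"], record_b["corners"]):
            pairs.append((record_a["video_identifier"], record_b["video_identifier"]))
    return pairs
```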
The grouping judging operation in step S1604 will be described with reference to
Hereinafter, the grouping judging method will be described by actually using the focus-plane metadata of
Next, it is judged whether the intersection line of the plane equations is within the boundary coordinates or not. The boundary coordinates "(0, 3/5, 4/5), (0, −3/5, 4/5), (0, −3/5, −4/5), and (0, 3/5, −4/5)" and "(3/5, 0, 4/5), (−3/5, 0, 4/5), (−3/5, 0, −4/5), and (3/5, 0, −4/5)" of the two planes "x=0" and "y=0" express the boundary ranges −3/5≦x≦3/5, −3/5≦y≦3/5, and −4/5≦z≦4/5. The obtained intersection line "x=0, y=0" satisfies x=0 and y=0 for −4/5≦z≦4/5, and hence can be judged to be within these boundary ranges. Therefore, it is judged that the two focus planes intersect with each other, i.e., that the video data are obtained by capturing the same object. Then, the video identifier "543210" is added to the focus-plane metadata in which "Video identifier" is "012345", to generate addition metadata. Similarly, the video identifier "012345" is added to the focus-plane metadata in which "Video identifier" is "543210", to generate addition metadata.
As described above, the addition metadata are recorded while being correlated with the corresponding video data. By using the addition metadata, therefore, the video searching apparatus 1040 can search and extract video data which are obtained by capturing the same object at different times.
In the embodiment, the configuration example in which the imaging apparatus is separated from the addition information generating apparatus has been described. Alternatively, the imaging apparatus may include a sensing metadata acquiring unit and a focus-plane metadata deriving unit.
In the embodiment, video data are correlated with various metadata by using a video identifier. Alternatively, various metadata may be converted into streams, and then multiplexed to video data, so that a video identifier is not used.
In the grouping judgment, the judgment may alternatively be performed in the following manner: the focus distance is extended or contracted in accordance with the depth of field, which is the range in front of and behind the object in which focusing appears to be attained, and a focus plane is calculated for each such focus distance.
Therefore, videos which are taken by a single camera at different times can be grouped. When a photograph or video taken by an ordinary user is registered in the database, for example, it is automatically grouped according to the place where the object exists. Accordingly, the work burden in a case such as where videos are edited can be remarkably reduced.
In the above, the example in which images are grouped by using focus planes has been described. When focus-plane metadata are added to images, the invention can be applied to a use other than grouping of images.
While the invention has been described in detail with reference to the specific embodiments, it is obvious to those skilled in the art that various changes and modifications may be applied without departing from the spirit and scope of the invention.
The application is based on Japanese Patent Application (No. 2005-157179) filed May 30, 2005, and Japanese Patent Application (No. 2006-146909) filed May 26, 2006, and their disclosure is incorporated herein by reference.
According to the invention, when grouping of images is performed on the basis of positional relationships of focus planes by adding the positions of the focus planes as metadata, the processing load can be reduced as compared with the conventional technique in which grouping is performed by image analysis. Therefore, the invention has an effect that search and extraction of images obtained by capturing the same region are enabled to be performed at low load and in an easy manner, and is useful in a metadata adding apparatus which adds metadata to an image obtained by capturing by an imaging apparatus, a metadata adding method, and the like.
Number | Date | Country | Kind |
---|---|---|---|
2005-157179 | May 2005 | JP | national |
2006-146909 | May 2006 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2006/310782 | 5/30/2006 | WO | 00 | 11/29/2007 |