The present invention relates generally to the area of image processing, and especially to the identification of content with different format versions in a database.
It is a common image processing problem to identify rapidly a matching image in a database containing a potentially large number of content entities (programme, film, shot or the like), each comprising a potentially large number of images. Many different approaches have been tried. Some techniques have proved successful in the rapid identification of exact matches. In practical applications, however, there is often a need to identify a matching image where the candidate images have undergone processing to the extent that the match is no longer exact. For example, images may have been compressed or filtered and may have undergone luminance or colour processing. They may be in different formats or standards, possibly with different aspect ratios.
According to one aspect of the invention there is provided a method of image processing comprising receiving an image with a set of feature points characteristic of the image;
selecting each of the feature points in turn to be a selected feature point; identifying a number of neighbouring feature points associated with the selected feature point;
creating a first hash comprising information associated with a first pair of neighbouring feature points, comprising a first neighbouring feature point and a second neighbouring feature point, wherein the information associated with the first and second neighbouring feature points represents the relative location of the first and second neighbouring feature points compared to the selected feature point;
creating a second hash comprising information associated with a second pair of neighbouring feature points, comprising a third neighbouring feature point and a fourth neighbouring feature point, wherein the information associated with the third and fourth neighbouring feature points represents the relative location of the third and fourth neighbouring feature points compared to the selected feature point.

According to another aspect of the invention there is provided a method of identifying content in an entity comprising:
receiving an entity having a pre-calculated table associated with the entity, wherein the entity comprises a plurality of entity images, wherein the pre-calculated table has a row for every hash value possible, and wherein each row is populated by entity image identifiers associated with entity images in which the hash occurs, and further wherein multiple hashes are associated with each of a series of feature points characteristic of each entity image;
comparing a series of hashes representing one or more desired images with the pre-calculated table, and further wherein multiple hashes are associated with each of a series of feature points characteristic of the one or more desired images;
generating a histogram representing the number of matches between the series of hashes and each entity image, wherein the histogram comprises a column for each entity image with at least one hash match;
scoring each entity image according to the number of matches;
identifying possible candidate entity images, at least in part, from the entity image scores.
According to a further aspect of the present invention there is provided a method of image processing. The method may comprise receiving an image with a set of feature points characteristic of the image, deriving multiple hash values characteristic of the image based on multiple combinations of data sets, wherein each data set is associated with one of the feature points from the set of feature points;
wherein each hash is formed from a plurality of data fields, each data field corresponding to a characteristic of a feature point, to facilitate matching of similar, but non-identical, images.
One method of characterising an image is to use feature points from the image. A flow-diagram of an exemplary process for determining a set of feature points for an image according to an embodiment of the invention is shown in
In step 106 the pixel values of each tile are evaluated to find: the maximum-value pixel; the minimum-value pixel; and, the average pixel value for the tile. These values are then analysed to determine a set of candidate feature points. This can be done in a variety of ways and what follows is one example.
In step 108 the maximum value from the first tile is tested to see if it is higher than the maxima in the respective adjacent tiles. The edge tiles are excluded from this step as they do not have tiles adjacent to all of their sides. If it is, the process moves to step 110, in which the value of the respective maximum in the tile under test is stored, together with its location, as a candidate feature point. A ‘prominence’ parameter, indicative of the visual significance of the candidate feature point, is also stored. A suitable prominence parameter is the difference between the value of the maximum pixel and the average value of all the pixels in its tile.
In step 112 the pixel values of the tile are evaluated to find the respective minimum-value pixel for the tile, and if the minimum is lower than the minimum value for the adjacent tiles, the process moves to step 114 where the respective minimum value in the tile under test is stored, together with its location, as a candidate feature point. An associated prominence value, equal to the difference between the value of the minimum pixel and the average value of all the pixels in its tile is also stored.
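The tile-based detection of steps 106 to 114 can be sketched as follows. This is a minimal illustration, not the claimed method itself: the tile size, the eight-neighbour adjacency, and the use of tile coordinates in place of the exact pixel location are all simplifying assumptions.

```python
def tile_stats(image, tile_h, tile_w):
    """Return per-tile (max, min, mean) grids for a 2-D list of pixel values."""
    rows = len(image) // tile_h
    cols = len(image[0]) // tile_w
    maxg = [[0.0] * cols for _ in range(rows)]
    ming = [[0.0] * cols for _ in range(rows)]
    avgg = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            pixels = [image[r * tile_h + y][c * tile_w + x]
                      for y in range(tile_h) for x in range(tile_w)]
            maxg[r][c] = max(pixels)
            ming[r][c] = min(pixels)
            avgg[r][c] = sum(pixels) / len(pixels)
    return maxg, ming, avgg

def candidates(image, tile_h, tile_w):
    """Candidate feature points as (tile row, tile col, 'max'/'min', prominence)."""
    maxg, ming, avgg = tile_stats(image, tile_h, tile_w)
    rows, cols = len(maxg), len(maxg[0])
    out = []
    for r in range(1, rows - 1):           # edge tiles are excluded
        for c in range(1, cols - 1):
            neigh = [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0)]
            # a tile maximum higher than all adjacent maxima is a candidate
            if all(maxg[r][c] > maxg[nr][nc] for nr, nc in neigh):
                out.append((r, c, 'max', maxg[r][c] - avgg[r][c]))
            # a tile minimum lower than all adjacent minima is a candidate
            if all(ming[r][c] < ming[nr][nc] for nr, nc in neigh):
                out.append((r, c, 'min', avgg[r][c] - ming[r][c]))
    return out
```

Here the prominence is taken as the positive difference between the extremum and the tile average, in line with the description above.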
Once all non-image-edge tiles have been tested, the candidate feature points recorded in steps 110 and 114 are sorted according to their prominence values; and candidates with low prominence are discarded to reduce the number of feature point to a required number—say 36 feature point for the image.
It is also helpful to sort the candidate feature points within defined regions within the image. For example the image can be divided into four quadrants and the candidates in each quadrant sorted separately. A minimum and a maximum number of feature points per quadrant can be set, subject to achieving the required total number of feature points for the image. For example, if the candidates for a particular quadrant all have very low prominence, the two highest prominence candidates can be selected and additional lower prominence candidates selected in one or more other quadrants so as to achieve the required total number. This process is illustrated at step 118. Once the required number of feature points has been identified, the process ends.
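The quadrant-balanced selection of step 118 can be sketched as below. The candidate format `(quadrant, prominence, point)` is an illustrative assumption, and only the per-quadrant minimum is enforced here; a full implementation would also cap each quadrant at its maximum.

```python
def select_feature_points(cands, total=36, per_quadrant_min=2):
    """cands: list of (quadrant, prominence, point). Returns chosen points."""
    by_q = {}
    for q, prom, pt in cands:
        by_q.setdefault(q, []).append((prom, pt))
    chosen, leftover = [], []
    for items in by_q.values():
        items.sort(key=lambda t: t[0], reverse=True)   # most prominent first
        chosen += [pt for _, pt in items[:per_quadrant_min]]
        leftover += items[per_quadrant_min:]
    # top up with the most prominent remaining candidates, whatever quadrant
    leftover.sort(key=lambda t: t[0], reverse=True)
    chosen += [pt for _, pt in leftover[:max(0, total - len(chosen))]]
    return chosen
```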
An image can thus be characterised by a set of feature point data where the data set comprises at least the position of each feature point within the image and whether the feature point is a maximum value pixel or a minimum value pixel. In television images the positions of the feature points can be expressed as Cartesian co-ordinates in the form of scan-line numbers, counting from the top of the image, and position along the line, expressed as a count of samples from the start of the line. If the image has fewer or more than two dimensions then the positions of the feature points will be defined with fewer or more co-ordinates. For example, feature points characterising a single-channel audio stream would comprise a count of audio samples from the start of the image and a maximum/minimum identifier.
It is an advantage that each determination of a feature point depends only on the values of the pixels from a small part of the image (i.e. the tile being evaluated and its contiguous neighbours). This means that it is not essential to have all the pixels of the image simultaneously accessible in the feature point identification process, with a consequent reduction in the need for data storage.
The identification of a feature point is not heavily dependent upon the luminance or contrast of an image, and therefore the use of feature points to match similar versions of the same content may be advantageous. However, one issue is that different screen ratios may lead to some feature points being removed from some versions of the content. It is also possible that in different resolutions the relative position of a feature point will shift slightly, or the point may be removed entirely as the image is better resolved. Therefore a method is needed of characterising an image using identified feature points that mitigates the disadvantages they also present.
One way to do this is to form hashes from the feature points. For example, for each of the identified feature points, a number of neighbouring feature points to the selected feature point can be identified. These may be the closest feature points in the image to the selected feature point, or they may be feature points that are within a specified area, or the most prominent feature points may be used. Once identified, these neighbouring feature points can be grouped into pairs. Each of these pairs can be used to form an individual hash. The series of hashes from each point can be used to characterise the image.
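The grouping into pairs can be sketched as follows, using the closest-feature-points option described above. Euclidean distance as the nearness measure is an assumption; the text leaves the neighbour criterion open.

```python
from itertools import combinations
from math import hypot

def neighbour_pairs(points, k=3):
    """points: list of (x, y). Yields (selected point, (neighbour_a, neighbour_b)).

    For each selected feature point, its k nearest feature points are
    found and every pair of them is formed; each pair yields one hash.
    """
    for sel in points:
        others = sorted((p for p in points if p != sel),
                        key=lambda p: hypot(p[0] - sel[0], p[1] - sel[1]))
        for pair in combinations(others[:k], 2):
            yield sel, pair
```

With k = 3 neighbours per selected point this produces three pairs, and hence three hashes, per feature point, matching the worked example later in the text.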
An example of an image which has had feature points identified is shown in
Each hash includes information regarding the relative position of each of the neighbouring feature points to the selected feature point. These relative positions may be quantised into two or more regions. For example if eight regions are selected then the angles around the selected feature point may be split into eight equally sized portions. The neighbouring feature points will each be located in one of these regions and a hash that includes two neighbouring feature points will reflect which relative regions they are located in.
Other information about the neighbouring feature points and the selected feature point may be included in the hash. Making the hash more specific in this way will tend to reduce the number of false positives. Such additional information may include whether the feature points are maxima or minima, or whether the feature points are in parts of the image that have an orientation that is generally vertical or horizontal. The determination of the orientation may identify a dominant gradient. In one example the orientation is based on comparing the maximum or minimum with the values of pixels a pre-set number of pixels away, both horizontally and vertically. The orientation may be determined by calculating the absolute difference between the feature-point pixel and the pixel vertically above it, adding the absolute difference between the feature-point pixel and the pixel vertically below it, repeating this for the horizontal directions, and comparing the two sums to determine whether the orientation is horizontal or vertical. Which half of the image the selected feature point resides in may also be of interest, along with the prominence that the feature points have against their local area in the image. This can be determined, for example, from the absolute difference between the maximum (or minimum) and the average of the tile from which it comes.
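The orientation test described above can be sketched as follows. The offset of two pixels and the choice of which label the larger sum receives are illustrative assumptions; the text fixes neither.

```python
def orientation(image, y, x, offset=2):
    """Label the dominant gradient at (y, x) by comparing the summed
    absolute differences to pixels a fixed offset above/below against
    those a fixed offset left/right."""
    v = (abs(image[y][x] - image[y - offset][x])
         + abs(image[y][x] - image[y + offset][x]))
    h = (abs(image[y][x] - image[y][x - offset])
         + abs(image[y][x] - image[y][x + offset]))
    return 'vertical' if v > h else 'horizontal'
```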
Once these have been identified, according to whatever criteria, a first hash is created 608. To create the first hash a pair of neighbouring feature points are selected and the relative locations of these neighbouring feature points, relative to the selected feature point, are measured, as shown in
The next step is creating a second hash 610. To do so another pair of neighbouring feature points are selected and the relative locations of these neighbouring feature points are measured, relative to the selected feature point. The second pair can include one of the neighbouring feature points from the first pair, or it can be formed from two different neighbouring feature points. The second hash is then created in the same way as the first.
One embodiment of the hash creation may include identifying three, or potentially more, neighbouring feature points. If three neighbouring feature points are identified then only three pairs of them can be created. If 9 points are selected in each quadrant of the image, as may be advantageous in the feature point identification, then a large number of hashes will be created in order to characterise every image. This may be a large enough number of hashes to ensure that, unless the hashes lack any real detail, random matches will be few.
In an embodiment, eight possible relative locations can be used for both the first and the second neighbouring feature points. By using eight relative locations only three bits of information are required, but the information is still relatively specific. If too many possible relative locations are used, then different versions of the same content with different aspect ratios may have the neighbouring feature points falling into different relative locations. When selecting the number of possible locations it is important that the locations are not so generic that little information is provided, but not so specific that errors become likely.
In an example an image has 36 selected feature points, 9 for each quadrant of the image. For each of the selected feature points 3 neighbouring feature points may be identified. For one such selected feature point the neighbouring feature points identified may be points A, B and C. The relative positions between the points A, B and C and the selected feature point are then determined. These can be used to form hashes. The first hash may comprise the relative location of point A with respect to the selected feature point, as well as the relative location of point B with respect to the selected feature point. The second hash may comprise the relative location of point A with respect to the selected feature point, as well as the relative location of point C with respect to the selected feature point. A third hash could be created using the relative location of point B with respect to the selected feature point, and the relative location of point C with respect to the selected feature point. There may be additional information in the hashes. In this example the hashes each contain: a maxima/minima flag indicating whether the selected feature point is a maximum or a minimum; an orientation flag indicating whether the selected feature point is orientated more horizontally or vertically; a flag to show which half of the image the selected feature point is in; a prominence order code for the selected feature point and the neighbouring feature points; maxima/minima flags for the neighbouring feature points; and an orientation flag for each of the neighbouring feature points. The prominence order code indicates which of the selected feature point and neighbouring feature points is most prominent. It may also indicate which of the selected feature point and neighbouring points is next most prominent and least prominent. Each of the possible scenarios:
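The fields listed in this example can be packed into a single integer hash, as sketched below. The field widths and their order are illustrative assumptions; all that matters is that every hash uses the same fixed layout.

```python
def pack_hash(sector_a, sector_b,               # 3 bits each: relative regions
              sel_is_max, a_is_max, b_is_max,   # 1 bit each: maxima/minima flags
              sel_horiz, a_horiz, b_horiz,      # 1 bit each: orientation flags
              sel_left_half,                    # 1 bit: image-half flag
              prominence_order):                # 3 bits: e.g. 0-5 for 3! orderings
    """Pack the example's hash fields into one 16-bit integer."""
    fields = [(sector_a, 3), (sector_b, 3),
              (sel_is_max, 1), (a_is_max, 1), (b_is_max, 1),
              (sel_horiz, 1), (a_horiz, 1), (b_horiz, 1),
              (sel_left_half, 1), (prominence_order, 3)]
    h = 0
    for value, width in fields:
        h = (h << width) | (int(value) & ((1 << width) - 1))
    return h
```

With these widths the hash occupies 16 bits, so a pre-calculated table with one row per possible hash value has 65,536 rows.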
The next step comprises comparing a series of hashes, associated with one or more desired images, with the pre-calculated table 704. The desired images may be temporally adjacent to one another, or they may not be. For example, the desired images may form a sequence of images that periodically skips one or more temporally adjacent images. A sequence of desired images could be images 1, 2, 3, 4, 5 and 6; alternatively, only images 1, 3 and 5 could be used. The hashes corresponding to the desired images are associated with each of a series of feature points characteristic of the one or more desired images.
The next step comprises generating a histogram representing the number of matches for each entity image 706. The histogram comprises a column for each entity image with at least one match. Additionally it may have a column for every entity image that does not have a match. This can be generated by taking the rows corresponding to the hashes that appear in the one or more desired images and adding a match every time an entity image identifier for a specific image is present.
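The lookup and histogram generation of steps 704 and 706 can be sketched as follows. A dictionary mapping each hash value to its list of entity image identifiers stands in for the full row-per-hash-value table described above.

```python
from collections import Counter

def match_histogram(desired_hashes, table):
    """Count, per entity image, how many of the desired images' hashes
    occur in it. table: {hash value: [entity image identifiers]}."""
    hist = Counter()
    for h in desired_hashes:
        for image_id in table.get(h, []):  # row for this hash value
            hist[image_id] += 1
    return hist
```

Entity images absent from the result simply have zero matches, corresponding to the optional empty columns mentioned above.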
In another embodiment, more than one desired image may be used so that one most desired image can be matched more accurately. A most desired image is located in a sequence if hashes associated with other images from this sequence are also matched with the pre-calculated table. A single histogram can be used to collect all of the data if the entity image identifiers are altered by adding a temporal offset to the matched entity images. If this set of desired images comprises images 4, 5 and 6, where image 6 is the most desired image, then when hashes associated with image 4 are matched with the pre-calculated table the entity image identifiers of the matching entity images may be altered so that entity images that would match with hashes associated with desired image 6 can be found. For example, if the entity image identifiers are sequential and hashes associated with desired image 4 match with entity images 78, 82 and 96, these can be altered by adding 2 to each of them. This means that entity images 80, 84 and 98 are added as matches to the histogram. This is then repeated for hashes associated with desired image 5: if these match with entity images 35, 83 and 107 then these can be altered by adding one to them, so that entity images 36, 84 and 108 are added to the histogram. Hashes associated with desired image 6 are then matched; this returns results of entity images 9, 84 and 167, and these are added to the histogram. It is clear from the collection of data that entity image 84 is the most likely match with the most desired image, although this was not clear from matching hashes associated with the most desired image 6 alone. The temporal offset applied to matches with hashes associated with desired images that are not the most desired image may be the same as the temporal difference between the matched desired image and the most desired image.
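The temporal-offset accumulation in this worked example can be sketched as below, assuming sequential entity image identifiers as the example does.

```python
from collections import Counter

def offset_histogram(matches_by_desired, most_desired):
    """matches_by_desired: {desired image number: [matched entity image ids]}.

    Matches for desired images other than the most desired one are shifted
    by the temporal difference, so all evidence accumulates on the entity
    image aligned with the most desired image.
    """
    hist = Counter()
    for desired, entity_ids in matches_by_desired.items():
        offset = most_desired - desired    # temporal difference
        for eid in entity_ids:
            hist[eid + offset] += 1
    return hist
```

Running this on the figures above (images 4, 5 and 6, with image 6 most desired) makes entity image 84 the clear peak.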
Each entity image is then scored according to the number of matches. This score may be the number of matches. Alternatively it may be a score out of a pre-set number so that, regardless of the number of hashes being compared, there is always a comparable result. This would normalise the score regardless of the number of hashes associated with the one or more desired images. It is possible that some hashes can be weighted more heavily than others. This may be because they are associated with a specific selected feature point. Such a point may be particularly central, or prominent and therefore be a more reliable indicator of a match.
Entity images may then be identified as possible candidate entity images. This may be at least in part due to the entity image score that was calculated. There may be a score threshold, wherein if a score for an entity image is above the score threshold then it is considered a candidate entity image. Alternatively other characteristics of an entity image may also be considered when identifying candidates.
Further steps may include designating the highest scoring entity image as being a matching image to the one or more desired images.
Alternatively the pre-set score threshold may be used to identify the candidates and then further steps may be taken to reduce the list. One problem with using a threshold is that in a video many images within a temporal window are relatively similar, and will therefore have a similar number of hash matches. This means that the histogram will probably have broad bumps, rather than sharp peaks, as a significant number of images that are temporally close are all above a pre-set threshold. Therefore a further step may include ranking the entity images in order of score and then deleting from the list all of the images that are within a pre-selected temporal range of the highest-scoring image, so that they are no longer considered candidate entity images. This step is then repeated with the next highest-scoring entity image remaining on the list, until either there are no entity images left on the list or the bottom of the list is reached. The entity images that were not deleted are then still considered candidate images. A further, more extensive comparison test can then be used to find which entity image is a match. This may be performed by using a full fingerprint analysis technique.
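This peak-picking over the candidate list can be sketched as follows, assuming the entity image identifiers also serve as temporal positions (as in the sequential-identifier example above).

```python
def suppress_neighbours(scores, window):
    """scores: {entity image number: score}. Repeatedly keep the
    highest-scoring entity image and delete every other candidate within
    the temporal window around it; return the surviving candidates."""
    remaining = sorted(scores, key=lambda i: scores[i], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [i for i in remaining if abs(i - best) > window]
    return kept
```

Broad bumps in the histogram thus collapse to one surviving candidate per temporal neighbourhood, which can then go forward to the fuller fingerprint comparison.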
When calculating an entity image score for each entity image, each match may be weighted differently dependent upon a number of factors. For example, the desired image hash may be associated with a selected feature point. If the selected feature point is more prominent, or closer to the centre, then the weighting of the match may be different. Additionally, if one match is very common among the entity images this match may be weighted differently.
The pre-calculated tables are formed by recording the hashes that occur in each image in an entity. The hashes that occur in each entity image are calculated in the same way as described above for a normal image.
Therefore, forming a pre-calculated table comprises creating a plurality of hashes for each of a set of images in an entity. Then creating the table comprises forming a table with a row for every hash value possible, recording in which entity images each hash occurs and populating each row of the table with entity image identifiers associated with entity images in which the hash occurs.
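The table construction just described can be sketched as below. A dictionary holding only the populated rows stands in for the full table with a row for every possible hash value; `hash_fn` is a placeholder for whichever per-image hashing procedure is in use.

```python
def build_table(entity_images, hash_fn):
    """entity_images: {entity image id: image}. hash_fn(image) returns the
    image's hashes. Returns {hash value: [entity image ids containing it]}."""
    table = {}
    for image_id, image in entity_images.items():
        for h in set(hash_fn(image)):      # record each image once per hash
            table.setdefault(h, []).append(image_id)
    return table
```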
Number | Date | Country | Kind |
---|---|---|---|
1610664.3 | Jun 2016 | GB | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/GB2017/051772 | 6/16/2017 | WO | 00 |