The present disclosure relates to classifying materials using texture, and more particularly relates to classifying materials using captured texture images.
In the field of material classification, it is known to classify a material based on texture. For example, one approach trains a set of texture models of known material types using a set of predetermined texture images for the known material types. New texture images can then be classified by finding the closest matching texture model.
With respect to material classifiers based on texture, one problem that continues to confront the art is the issue of magnification. More precisely, if the classifier is expected to recognize materials at different magnifications, then the predetermined training texture images must include images captured at these magnifications. However, it is impractical (and sometimes impossible) to predetermine a set of training images that would work for all possible magnifications.
The foregoing situation is addressed by dynamically generating a collection of images of known materials in accordance with a magnification factor of an image of an unknown material, and matching the image of the unknown material against the dynamically generated collection of images.
Thus, in an example embodiment described herein, an unknown material is classified using texture. A database of predetermined images is accessed, each of a known material and each captured at a magnification factor that is relatively high. A query image of the unknown material is received. The query image is captured at a second magnification factor that is relatively lower than the magnification factors of the predetermined images. A collection of images of the known materials is dynamically generated at the second magnification factor. The received query image is matched against the dynamically generated collection of images, and the unknown material is classified in correspondence to a match between the received image and the dynamically generated collection of images.
By dynamically generating a collection of images of known materials in accordance with a magnification factor of an image of an unknown material, it is ordinarily possible to classify the material regardless of the magnification of the image, while reducing the need to predetermine training images at every possible magnification.
In further aspects of some representative embodiments, a statistical texture feature for the query image of the unknown material is derived, and statistical texture features corresponding to each of the images in the dynamically generated collection of images are dynamically derived. Matching of the received query image against the dynamically generated collection of images includes matching of the statistical texture feature for the received query image against the statistical texture features for the dynamically generated collection of images. In one aspect, the statistical texture features each include a histogram of textons.
In other aspects, the process of dynamically generating a collection of images includes, for each predetermined image in the database of predetermined images, determining a magnification ratio between the magnification factor for the predetermined image and the second magnification factor for the received query image, synthesizing from the predetermined image a novel image of a size at least the size of the received image multiplied by the magnification ratio, and reducing the size of the synthesized novel image by a factor equal to the magnification ratio.
In yet other aspects, reducing the size of the synthesized novel image includes downsampling or resampling. In another aspect, synthesizing a novel image from the predetermined image includes replication of a part of the predetermined image and quilting of the replicated parts of the images.
In still other aspects, a statistical texture feature is derived for the query image of the unknown material, and statistical texture features are dynamically derived for each of the reduced size synthesized novel images. Matching of the received query image against the dynamically generated collection of images includes matching of the statistical texture feature for the received query image against the statistical texture features for the reduced size synthesized novel images.
In still further aspects of representative embodiments, each predetermined image in the database of predetermined images shares a common first magnification factor that is relatively high as compared to the second magnification factor. In another aspect, the magnification factor of each predetermined image in the database of predetermined images is relatively high as compared against an expected range of magnification factors for query images of unknown materials.
In one aspect, the dynamically generated collection of images includes images corresponding to all or substantially all of the predetermined images in the database of predetermined images. In another aspect, the dynamically generated collection of images includes images corresponding to substantially fewer than all of the predetermined images in the database of predetermined images, and a subset of predetermined images is selected for inclusion in the dynamically generated collection of images. The selection is based at least in part on a screening criterion.
In an additional aspect, receiving the query image of the unknown material includes receiving a capture setting used by an image capture device, and further includes using the received capture setting to derive the second magnification factor. In one aspect, a magnification factor is an absolute magnification factor based on a ratio of physical sizes of an object and its captured image. In another aspect, a magnification factor is a relative magnification factor expressed with respect to a reference absolute magnification factor.
In still another aspect, the dynamic generation of the collection of images includes reducing the magnification of each corresponding one of the predetermined images to the second magnification factor.
This brief summary has been provided so that the nature of this disclosure may be understood quickly. A more complete understanding can be obtained by reference to the following detailed description and to the attached drawings.
As shown in
While
In
Computer 102 generally comprises a programmable general purpose computer having an operating system, such as Microsoft® Windows® or Apple® Mac OS® or LINUX, and which is programmed as described below so as to perform particular functions and, in effect, become a special purpose computer when performing these functions.
While
Computer 102 also includes computer-readable memory media, such as non-volatile memory 56 (shown in
Conveyor belt 103 facilitates movements of objects 104, 105, and 106 for imaging by image capture device 101, and, if necessary, onward for further processing based on the identification or classification of objects 104, 105 and 106.
RAM 115 interfaces with computer bus 114 so as to provide information stored in RAM 115 to CPU 110 during execution of the instructions in software programs, such as an operating system, application programs, image processing modules, and device drivers. More specifically, CPU 110 first loads computer-executable process steps from non-volatile memory 56, or another storage device, into a region of RAM 115. CPU 110 can then execute the loaded process steps from RAM 115. Data, such as a database of predetermined images or other information, can be stored in RAM 115 so that the data can be accessed by CPU 110 during the execution of the computer-executable software programs, to the extent that such software programs have a need to access and/or modify the data.
As also shown in
Non-volatile memory also stores image processing module 300. Image processing module 300 comprises computer-executable process steps for classifying an unknown material using texture. As shown in
The computer-executable process steps for image processing module 300 may be configured as part of operating system 118, as part of an output device driver in output device drivers 121, or as a stand-alone application program. Image processing module 300 may also be configured as a plug-in or dynamic link library (DLL) to the operating system, device driver or application program. It can be appreciated that the present disclosure is not limited to these embodiments and that the disclosed modules may be used in other environments.
In particular,
As shown in
Briefly, in
Thus, at run-time, there is a dynamic generation of a collection of images of known materials, based on the magnification factor encountered in a query image of an unknown material. Features of the query image are matched against features of the images in the collection in order to classify the unknown material.
In more detail, in step 401, a query image of an unknown or unclassified material is received. The query image of the unknown material is captured (451) at a second magnification factor m, e.g., m=1:11, that is relatively lower than the magnification factor m0, e.g., m0=1:6, of the predetermined images.
For example, a query image can be received from image capture device 101 in
In step 402, the database of predetermined images (251) is accessed, for example from non-volatile memory 56. The predetermined images are each of a known material, and are each captured at a magnification factor that is relatively high. The predetermined images can be texture images captured for different illuminations and camera directions, but may have a magnification factor fixed at some predetermined known value m0, e.g., m0=1:6. The predetermined images are generally not used directly in training a material classifier, but rather are used to generate training images at a new magnification factor, as discussed below. In some embodiments, the predetermined images may have a fixed size, e.g., 200×200 pixels.
In one example embodiment, the magnification factor of each predetermined image in the database of predetermined images is relatively high as compared against an expected range of magnification factors for query images of unknown materials, e.g., an expected range of 1:21 to 1:11. This range can be estimated, for example, based on knowledge of expected sizes of objects 104, 105 and 106 and the capability of the image capture device 101. In another example embodiment, each predetermined image in the database of predetermined images shares a common first magnification factor, e.g., 1:6, that is relatively high as compared to a second magnification factor, e.g., 1:11, of a captured query image.
The magnification factor of each image in the database of predetermined images can be the same for all images in the database, and the choice of this common magnification factor can be made, for example, during the creation of the database. In one example embodiment, each image in the database of predetermined images is captured at the highest magnification factor possible, based on the capability of the image capture device used and the physical sizes of the known material samples. Thus, the process of capturing images for the database can be controlled so that the predetermined images in the database share a common first magnification factor. As mentioned above, this common first magnification factor is denoted m0.
In step 403, there is dynamic generation of a collection of images of the known materials at the second magnification factor (at which the query image is captured). Thus, new texture images of a larger size are dynamically synthesized (452) and then resampled (453).
In that regard, the current capture setting of an image capture device capturing the query image typically provides information about the second magnification factor m, e.g., m=1:11. Capture setting information, such as focusing distance, focal length or the like, can be directly communicated from image capture device 101 to receiving module 302, for example. As another example, when a query image is received at receiving module 302 from image capture device 101, metadata stored in the image may contain the capture setting. For example, Exif (Exchangeable image file format) tags may contain information such as focusing distance, focal length or the like. A magnification factor may be either an absolute magnification factor or a relative magnification factor. Knowledge of the focusing distance and focal length can be used to derive an absolute magnification factor, which is based on a ratio of physical sizes of an object and its captured image. In another example, if information about the focusing distance or focal length is not available, less precise capture setting information such as a zoom factor of a zoom lens, e.g., 3×, 5×, etc., can still be used to derive a relative magnification factor, which is expressed with respect to a reference absolute magnification factor. Thus, in one example, receiving the query image of the unknown material includes receiving a capture setting used by an image capture device, and further includes using the received capture setting to derive the second magnification factor.
Meanwhile, the same information regarding absolute and/or relative magnification factor can also be captured during creation of the database of predetermined images, so as to obtain the first magnification factor m0 for the images in the database.
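As an illustration of deriving a magnification factor from capture settings, the following minimal sketch uses the thin-lens relation, under which the absolute magnification is f/(s − f) for focal length f and object distance s measured from the lens. The function names are hypothetical, the thin-lens model is an assumption (a real camera may report focusing distance from the sensor plane), and treating a zoom factor as a simple multiple of a reference setting is likewise only an approximation.

```python
def absolute_magnification(focal_length_mm: float, object_distance_mm: float) -> float:
    """Thin-lens magnification f / (s - f); assumes s is measured from the lens."""
    if object_distance_mm <= focal_length_mm:
        raise ValueError("object must lie beyond the focal length")
    return focal_length_mm / (object_distance_mm - focal_length_mm)


def relative_magnification(zoom_factor: float, reference_zoom: float = 1.0) -> float:
    """Magnification relative to a reference capture setting (assumed linear in zoom)."""
    if reference_zoom <= 0:
        raise ValueError("reference zoom must be positive")
    return zoom_factor / reference_zoom


# A 50 mm lens focused on an object 600 mm away gives a magnification of 1:11.
m = absolute_magnification(50.0, 600.0)
print(f"absolute magnification = 1:{1.0 / m:.1f}")
print(f"relative magnification at 3x zoom = {relative_magnification(3.0)}")
```

The same functions could be applied when the database is created, so that m0 and m are expressed in the same terms.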
According to the disclosure, the predetermined texture images accessed in step 402 are used to dynamically generate new texture images at a new magnification factor, as shown in
Thus, knowing the second magnification factor of the query image m and the magnification factor of the predetermined database image m0, a magnification ratio between the magnification factor for the predetermined image and the second magnification factor can be determined as m0/m. In addition, using the knowledge of the size of the received image (H0×W0), a target size for the new texture image 702 (H×W) which is at least the size of the received image multiplied by the magnification ratio can be determined. The new texture image ordinarily will be a larger size than the query image. In particular, as discussed above, m0 is generally chosen to be a higher magnification than the magnification m of the query image so that the magnification ratio m0/m is larger than one. Therefore, the size H×W of the new synthesized image will ordinarily be larger than the size of the query image. In one example simply for purposes of illustration, H0=W0=200, m0=1:6, and m=1:11, so that
If we choose H=W=367 pixels, then the above conditions on H and W are satisfied. Another valid choice is H=W=417 pixels, for example.
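The size condition described above, namely H ≥ H0·(m0/m) and W ≥ W0·(m0/m), can be checked numerically. The sketch below is illustrative only; the function names are hypothetical, and exact fractions are used so that the ratio 11/6 is not subject to rounding.

```python
from fractions import Fraction
from math import ceil


def magnification_ratio(m0: Fraction, m: Fraction) -> Fraction:
    """Ratio m0/m between the database and query magnification factors."""
    return m0 / m


def min_target_size(h0: int, w0: int, ratio: Fraction) -> tuple[int, int]:
    """Smallest integer size (H, W) with H >= h0 * ratio and W >= w0 * ratio."""
    return ceil(h0 * ratio), ceil(w0 * ratio)


m0 = Fraction(1, 6)    # database magnification factor 1:6
m = Fraction(1, 11)    # query magnification factor 1:11
ratio = magnification_ratio(m0, m)         # 11/6
H, W = min_target_size(200, 200, ratio)    # (367, 367)
print(ratio, H, W)
```

With H0=W0=200 the bound is 200 × 11/6 ≈ 366.7, so 367 is the smallest admissible integer size, and 417 also satisfies the condition.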
For the synthesis, any texture synthesis algorithm may be used, for example an algorithm based on image quilting. In that case, synthesizing a novel image from the predetermined image comprises replication of a part of the predetermined image and quilting of the replicated parts of the image. As can be seen in the example in
The relatively large synthesized texture image may optionally be cropped to a cropped image 703 at the target size, e.g., 367×367. For example, the synthesized image may not be at the target size because the texture synthesis algorithm used may produce a synthesized texture image of a different size, e.g., 417×417, that is larger than the target size, e.g., 367×367. In this case, an optional step of cropping may be applied to obtain a synthesized texture image of size 367×367. Note that, whether or not cropping is applied, the resulting synthesized texture image is at the same magnification factor as the predetermined database image.
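The synthesis and optional cropping can be sketched as follows. This is a deliberately simplified stand-in for image quilting: patches are replicated from random positions of the source image and placed side by side, without the overlap matching and seam selection that a full quilting algorithm would use. The function names, the patch size, and the list-of-rows image representation are assumptions made for illustration.

```python
import random


def synthesize_texture(source, target_h, target_w, patch=32, seed=0):
    """Tile randomly chosen source patches until the target size is covered.

    Simplified: no overlap blending or minimum-error seams between patches.
    The result may be larger than the target size.
    """
    rng = random.Random(seed)
    src_h, src_w = len(source), len(source[0])
    patch = min(patch, src_h, src_w)
    rows = -(-target_h // patch)  # ceiling division
    cols = -(-target_w // patch)
    out = [[0] * (cols * patch) for _ in range(rows * patch)]
    for r in range(rows):
        for c in range(cols):
            y = rng.randint(0, src_h - patch)
            x = rng.randint(0, src_w - patch)
            for dy in range(patch):
                out[r * patch + dy][c * patch:(c + 1) * patch] = source[y + dy][x:x + patch]
    return out


def crop(image, height, width):
    """Crop the top-left height x width region."""
    return [row[:width] for row in image[:height]]


# Hypothetical 200x200 source texture (a simple repeating pattern).
source = [[(x * 7 + y * 13) % 256 for x in range(200)] for y in range(200)]
large = synthesize_texture(source, 367, 367, patch=50)
cropped = crop(large, 367, 367)
print(len(large), len(large[0]), len(cropped), len(cropped[0]))
```

Because neither step rescales pixels, the cropped result remains at the magnification factor of the source image.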
The synthesized new texture images may then be reduced in size by a factor equal to the magnification ratio m0/m. This may comprise downsampling or resampling to produce downsampled synthesized texture image 704. As an example shown in
Thus, according to this example embodiment, the process of dynamically generating a collection of images includes, for each predetermined image in the database of predetermined images, determining a magnification ratio between the magnification factor for the predetermined image and the second magnification factor for the received query image, synthesizing from the predetermined image a novel image of a size at least the size of the received image multiplied by the magnification ratio, and reducing the size of the synthesized novel image by a factor equal to the magnification ratio. In one aspect, reducing the size of the synthesized novel image includes downsampling, whereas in another aspect, reducing the size of the synthesized novel image includes resampling.
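The size reduction step can be sketched with a plain bilinear resampler. This is an assumption for illustration; the text above permits any downsampling or resampling method, and a production implementation would ordinarily low-pass filter before reducing size to limit aliasing. The function name and the list-of-rows image representation are hypothetical.

```python
def reduce_size(image, ratio):
    """Resample an image so that each side shrinks by the factor `ratio` (> 1)."""
    in_h, in_w = len(image), len(image[0])
    out_h = max(1, int(in_h / ratio))
    out_w = max(1, int(in_w / ratio))
    out = []
    for i in range(out_h):
        # Map the centre of output pixel i back into input coordinates.
        y = max(0.0, min((i + 0.5) * ratio - 0.5, in_h - 1))
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = max(0.0, min((j + 0.5) * ratio - 0.5, in_w - 1))
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


# A 367x367 synthesized image reduced by m0/m = 11/6 returns to about 200x200.
synthesized = [[float((x + y) % 256) for x in range(367)] for y in range(367)]
reduced = reduce_size(synthesized, 11 / 6)
print(len(reduced), len(reduced[0]))
```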
The downsampled synthesized texture images 704, i.e., the dynamically generated collection of images at the second magnification factor, are then used as training images to calculate texture models or texton histograms to compare to corresponding texture models or texton histograms of the query image, as discussed below.
Thus, returning to
In one example, the dynamically generated collection of images includes images corresponding to all or substantially all of the predetermined images in the database of predetermined images. For example, the number of predetermined database images might be small enough so that dynamically generating a new image for each predetermined image is not overly burdensome on CPU 110 or memory 115.
In an example embodiment comprising the machine vision application to recycling shown in
It is also possible to narrow the dynamically generated collection of images to a subset, so as to reduce processing requirements. Accordingly, in another aspect, the dynamically generated collection of images includes images corresponding to substantially fewer than all of the predetermined images in the database of predetermined images, and a subset of predetermined images is selected for inclusion in the dynamically generated collection of images. The selection is based at least in part on a screening criterion. For example, the database of predetermined images might cover a wide range of materials, and the screening criterion is based on a specific application, such as recycling. In other words, only materials that are valuable for recycling are selected for inclusion in the dynamically generated collection of images.
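A screening criterion of this kind amounts to filtering the database before any synthesis is performed. In the following sketch, the record type, the material names, and the set of materials considered valuable for recycling are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseEntry:
    material: str   # known material label
    image_id: str   # identifier of the predetermined image


def screen(entries, allowed_materials):
    """Keep only entries whose material passes the screening criterion."""
    allowed = set(allowed_materials)
    return [entry for entry in entries if entry.material in allowed]


database = [
    DatabaseEntry("aluminum", "img-001"),
    DatabaseEntry("wood", "img-002"),
    DatabaseEntry("copper", "img-003"),
]
# Hypothetical criterion: only these materials matter for the application.
subset = screen(database, {"aluminum", "copper"})
print([entry.image_id for entry in subset])
```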
Returning again to
In more detail, the statistical texture features are texton histograms that are calculated (455) based on a predetermined texton dictionary (454). A texton can be thought of as an elementary building block of texture, although a precise definition is tied to a particular implementation, e.g., using filter banks, as explained below. A texton dictionary is a collection of specially chosen filter bank response vectors that collectively are sufficient for building a filter bank response vector of any arbitrary texture. A texton dictionary is typically predetermined and trained from a relatively large set of texture images, which may or may not include the database of predetermined images 251.
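A texton histogram can be sketched as follows, assuming that each pixel's filter bank response vector is labeled with its nearest texton (by Euclidean distance) and that the label counts are normalized to sum to one. The function names and the toy two-dimensional vectors are illustrative assumptions, not details taken from this disclosure.

```python
def nearest_texton(vector, dictionary):
    """Index of the dictionary texton closest to `vector` in squared distance."""
    best_index, best_distance = 0, float("inf")
    for index, texton in enumerate(dictionary):
        distance = sum((a - b) ** 2 for a, b in zip(vector, texton))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


def texton_histogram(responses, dictionary):
    """Normalized frequency of each texton among the response vectors."""
    counts = [0] * len(dictionary)
    for vector in responses:
        counts[nearest_texton(vector, dictionary)] += 1
    total = len(responses)
    if total == 0:
        return [0.0] * len(dictionary)
    return [count / total for count in counts]


dictionary = [(0.0, 0.0), (10.0, 10.0)]
responses = [(1.0, 0.0), (9.0, 9.0), (10.0, 11.0), (0.0, 1.0)]
print(texton_histogram(responses, dictionary))
```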
A general process of generating a texton dictionary will now be described with respect to
A sufficiently large set of texture images 501 of different materials is chosen. Referring to
Returning to
A K-means clustering 503 is applied to determine the cluster centers, which are taken as the textons. The textons constitute the texton dictionary 454. In an example embodiment, the number K of cluster centers, or textons, is chosen to be K=600. In other words, there are 600 textons in the texton dictionary. More generally, this number can be in the range from tens to hundreds or more, depending on the requirement of the texton dictionary to represent arbitrary texture.
In that regard, it is possible to perform K-means clustering on several sets of texture images, where each process produces a set of textons. The final texton dictionary 454 is then taken to be a collection of all the textons determined.
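The clustering step can be sketched with a small, dependency-free K-means. The function names, the fixed iteration count, the random initialization, and the toy response vectors are assumptions for illustration; in practice a library implementation would normally be used, with K on the order of the values mentioned above.

```python
import random


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain K-means; returns k cluster centres as lists of floats."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])),
            )
            clusters[nearest].append(v)
        for c, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def build_texton_dictionary(response_sets, k_per_set):
    """Cluster each set of response vectors and pool all centres as textons."""
    dictionary = []
    for responses in response_sets:
        dictionary.extend(kmeans(responses, k_per_set))
    return dictionary


set_a = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
set_b = [(5.0, 0.0), (5.0, 1.0), (-5.0, 0.0), (-5.0, 1.0)]
textons = build_texton_dictionary([set_a, set_b], k_per_set=2)
print(len(textons))
```

Pooling the centres from each set mirrors the option, noted above, of forming the final dictionary as the collection of all textons determined.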
Returning to
Accordingly, and referring now to
Returning to
where h=(h1, h2, ..., hN) and k=(k1, k2, ..., kN) are the histograms to be compared, and N is the number of bins of the histograms. Of course, other methods of matching histograms are possible.
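One widely used way to compare two N-bin histograms h and k is the chi-square distance, sketched below. The choice of this particular measure is an assumption made for illustration, since the formula itself is not reproduced in this text; the function name is hypothetical.

```python
def chi_square_distance(h, k):
    """Chi-square distance 0.5 * sum((h_i - k_i)^2 / (h_i + k_i)) over all bins."""
    if len(h) != len(k):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for h_i, k_i in zip(h, k):
        denominator = h_i + k_i
        if denominator > 0:  # bins empty in both histograms contribute nothing
            total += (h_i - k_i) ** 2 / denominator
    return 0.5 * total


print(chi_square_distance([0.5, 0.5], [0.5, 0.5]))
print(chi_square_distance([1.0, 0.0], [0.0, 1.0]))
```

For normalized histograms this distance ranges from zero for identical histograms to one for histograms with no bins in common.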
In some aspects, a preprocessing step may be performed on the texture images. For example, since most images are captured in the RGB space, color information can be discarded by converting all images to grayscale. Typically, small cropped images, such as images of 200×200 pixels, can be used. Also, to reduce the effect of global (uniform) illumination, the texture images can be normalized to zero mean and unit variance. The filters in the filter bank can also be L1-normalized so that different axes of the response space have the same scale. Finally, the response vectors F can be normalized by:
a scaling motivated by Weber's law.
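The preprocessing steps can be sketched as follows. The grayscale weights are the common ITU-R BT.601 luma coefficients, and the Weber-style scaling F ← F·log(1 + L/0.03)/L, with L the Euclidean norm of F, is the form used in the texture classification literature; both are assumptions here, since this text does not reproduce the exact formulas. All function names are hypothetical.

```python
import math


def to_grayscale(rgb_image):
    """Discard colour using BT.601 luma weights (an assumed choice)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_image]


def normalize_image(gray):
    """Shift and scale an image to zero mean and unit variance."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = math.sqrt(variance) or 1.0  # avoid division by zero on flat images
    return [[(p - mean) / std for p in row] for row in gray]


def l1_normalize_filter(kernel):
    """Scale a filter kernel so its absolute values sum to one."""
    norm = sum(abs(v) for row in kernel for v in row) or 1.0
    return [[v / norm for v in row] for row in kernel]


def weber_normalize(response, eps=0.03):
    """Assumed Weber-style contrast scaling of a filter response vector."""
    length = math.sqrt(sum(v * v for v in response))
    if length == 0:
        return list(response)
    scale = math.log(1 + length / eps) / length
    return [v * scale for v in response]


rgb = [[(255, 255, 255), (0, 0, 0)], [(0, 0, 0), (255, 255, 255)]]
normalized = normalize_image(to_grayscale(rgb))
print(normalized)
print(weber_normalize([3.0, 4.0]))
```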
Thus, in this example embodiment, a statistical texture feature for the query image of the unknown material is derived, and statistical texture features corresponding to each of the images in the dynamically generated collection of images are dynamically derived. Matching of the received query image against the dynamically generated collection of images includes matching of the statistical texture features for the received query image against the statistical texture features for the dynamically generated collection of images. In this example, the statistical texture features each include a histogram of textons.
In an example where the synthesized novel images are reduced in size, a statistical texture feature is derived for the query image of the unknown material, and statistical texture features are dynamically derived for each of the reduced size synthesized novel images. Matching of the received query image against the dynamically generated collection of images includes matching of the statistical texture features for the received query image against the statistical texture features for the reduced size synthesized novel images.
In step 405, the unknown material is identified or classified in correspondence to a match between the received image and the dynamically generated collection of images.
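The classification step can be sketched as a nearest-neighbor search over the features of the dynamically generated collection. The chi-square helper, the function names, and the toy three-bin histograms with their material labels are illustrative assumptions; any histogram distance could be substituted.

```python
def chi_square(h, k):
    """Chi-square distance between two histograms (assumed matching measure)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h, k) if a + b > 0)


def classify(query_histogram, training_histograms, distance=chi_square):
    """Return the (label, distance) of the closest training histogram."""
    if not training_histograms:
        raise ValueError("no training histograms supplied")
    best_label, best_distance = None, float("inf")
    for label, histogram in training_histograms:
        d = distance(query_histogram, histogram)
        if d < best_distance:
            best_label, best_distance = label, d
    return best_label, best_distance


# Hypothetical texton histograms of images generated at the query magnification.
training = [
    ("aluminum", [0.7, 0.2, 0.1]),
    ("wood", [0.1, 0.3, 0.6]),
]
label, score = classify([0.6, 0.3, 0.1], training)
print(label)
```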
The following table depicts a confusion matrix resulting from an exemplary run in which a material classifier dynamically scaled from magnification factor 1:6 to magnification factor 1:11 was applied to actual samples captured at magnification factor 1:11. Each row corresponds to one of eight materials that are being targeted for classification. The columns correspond to the results of the classification. It can be seen that, in general, the recognition rate is very high.
By dynamically generating a collection of images of known materials in accordance with a magnification factor of an image of an unknown material, it is ordinarily possible to classify the material regardless of the magnification factor of the image, while reducing the need to predetermine training images at every possible magnification factor.
<Other Embodiments>
According to other embodiments contemplated by the present disclosure, example embodiments may include a computer processor such as a single core or multi-core central processing unit (CPU) or micro-processing unit (MPU), which is constructed to realize the functionality described above. The computer processor might be incorporated in a stand-alone apparatus or in a multi-component apparatus, or might comprise multiple computer processors which are constructed to work together to realize such functionality. The computer processor or processors execute a computer-executable program (sometimes referred to as computer-executable instructions or computer-executable code) to perform some or all of the above-described functions. The computer-executable program may be pre-stored in the computer processor(s), or the computer processor(s) may be functionally connected for access to a non-transitory computer-readable storage medium on which the computer-executable program or program steps are stored. For these purposes, access to the non-transitory computer-readable storage medium may be a local access such as by access via a local memory bus structure, or may be a remote access such as by access via a wired or wireless network or Internet. The computer processor(s) may thereafter be operated to execute the computer-executable program or program steps to perform functions of the above-described embodiments.
According to still further embodiments contemplated by the present disclosure, example embodiments may include methods in which the functionality described above is performed by a computer processor such as a single core or multi-core central processing unit (CPU) or micro-processing unit (MPU). As explained above, the computer processor might be incorporated in a stand-alone apparatus or in a multi-component apparatus, or might comprise multiple computer processors which work together to perform such functionality. The computer processor or processors execute a computer-executable program (sometimes referred to as computer-executable instructions or computer-executable code) to perform some or all of the above-described functions. The computer-executable program may be pre-stored in the computer processor(s), or the computer processor(s) may be functionally connected for access to a non-transitory computer-readable storage medium on which the computer-executable program or program steps are stored. Access to the non-transitory computer-readable storage medium may form part of the method of the embodiment. For these purposes, access to the non-transitory computer-readable storage medium may be a local access such as by access via a local memory bus structure, or may be a remote access such as by access via a wired or wireless network or Internet. The computer processor(s) is/are thereafter operated to execute the computer-executable program or program steps to perform functions of the above-described embodiments.
The non-transitory computer-readable storage medium on which a computer-executable program or program steps are stored may be any of a wide variety of tangible storage devices which are constructed to retrievably store data, including, for example, any of a flexible disk (floppy disk), a hard disk, an optical disk, a magneto-optical disk, a compact disc (CD), a digital versatile disc (DVD), micro-drive, a read only memory (ROM), random access memory (RAM), erasable programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), dynamic random access memory (DRAM), video RAM (VRAM), a magnetic tape or card, optical card, nanosystem, molecular memory integrated circuit, redundant array of independent disks (RAID), a nonvolatile memory card, a flash memory device, a storage of distributed computing systems and the like. The storage medium may be a function expansion unit removably inserted in and/or remotely accessed by the apparatus or system for use with the computer processor(s).
This disclosure has provided a detailed description with respect to particular representative embodiments. It is understood that the scope of the appended claims is not limited to the above-described embodiments and that various changes and modifications may be made without departing from the scope of the claims.
Number | Date | Country | |
---|---|---|---|
20140341436 A1 | Nov 2014 | US |