The disclosure relates generally to computer systems and applications and, more particularly, to methods and systems for discovering styles via color and pattern co-occurrence.
For a perfect fashionable look each part of an ensemble should blend well with the other parts. Not all combinations of “shirt-tie-suit” or “top-skirt” or “dress-bag” will result in a fashionable look. Further, stylish combinations change over time as new trends develop. Although some dress combos may be easier to come up with, in general it can be daunting to mix and match elegant colors and patterns and come up with perfectly fashionable combinations.
Methods and systems for discovering styles via color and pattern co-occurrence are disclosed. According to one embodiment, a computer-implemented method for building a style graph comprises collecting a set of fashion images, selecting at least one subset within the set of fashion images, the subset comprising at least one image containing a fashion item, and computing a set of segments by segmenting the at least one image into at least one dress segment selected from a group consisting of a dress, a bag, a shoe, a piece of jewelry, a shirt, a suit, a tie, a top, a skirt, and a fashion accessory. Color and pattern representations of the set of segments are computed by using a color analysis method and a pattern analysis method respectively and a graph is created wherein each graph node corresponds to one of a color representation or a pattern representation computed for the set of segments, and weights of edges between nodes of the graph indicate a degree of how the corresponding colors or patterns complement each other in a fashion sense.
The systems, methods, features and advantages of the invention will be or will become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the accompanying claims. It is also intended that the invention is not limited to require the details of the example embodiments.
The accompanying drawings, which are included as part of the present specification, illustrate the presently preferred embodiment and, together with the general description given above and the detailed description of the preferred embodiment given below, serve to explain and teach the principles of the present invention.
It should be noted that the figures are not necessarily drawn to scale and that elements of similar structures or functions are generally represented by like reference numerals for illustrative purposes throughout the figures. It also should be noted that the figures are only intended to facilitate the description of the various embodiments described herein. The figures do not necessarily describe every aspect of the teachings disclosed herein and do not limit the scope of the claims.
The present system and method discover dress styles by analyzing the color, pattern and co-occurrence statistics in fashion images and videos available on the web and/or from dedicated sources such as fashion shows and the purchase or viewership logs of fashion vendors. Styles are determined by asking the question “what goes with what” wherein “what” represents features such as colors and patterns. A style graph is built to capture which colors and patterns of first fashion items pair with which colors and patterns of other, secondary fashion items. Once the style graph is built, it can be harnessed for a myriad of applications, from completing the look (e.g. given a shirt, selecting an appropriate tie) to discovering fashion trends hitting the blogosphere.
Dress styles are more or less governed by colors, patterns and their combinations (what color/pattern goes with what color/pattern) and in the Internet era it is not hard to find representative examples of various styles (e.g. from dress/fashion images and videos available on the web, fashion shows, purchase logs of fashion vendors).
For a computing device, interpreting the color mix in a given dress image or video frame is not as easy as it may seem at first glance. Traditional solid colors such as red, green, yellow, and blue are not the only ones to be recognized and interpreted. Colors have a myriad of variations (e.g. light blue, sky blue, deep sea blue, navy blue). Further, a myriad of complex color mixtures can be produced by combining these colors in various proportions, sometimes visible as patterns and sometimes not. Image pixels are represented in a given format such as RGB or HSV. The present disclosure utilizes HSV as the preferred color space in describing a preferred embodiment; however, other formats are applicable within the scope of the present system. Each pixel is represented as a vector in a 3-dimensional Euclidean space, and each of the three coordinate values lies in the interval [0,1], representing Hue, Saturation, and Value respectively. In these representations, there are infinitely many possible colors, as a pixel color may take any value from 0 to 1 along each of the H, S, and V dimensions.
A given HSV vector can be mapped to one of the K colors by computing the distance (e.g. Euclidean distance) between the given 3-D vector and each of the K vectors and choosing the closest one. The dress patch color representation module 204 uses this mapping logic to compute a color representation for any given image. Given an image, a patch in the image, or a set of pixels in the image, each pixel is represented in HSV and is mapped to one of the K colors. A histogram of the K colors is computed by counting the pixels that are mapped to each color. The resulting color histogram is a K-dimensional vector whose ith coordinate is the fraction of pixels mapped to the ith color. The color histogram serves as a color description of the given image.
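The mapping and histogram computation described above can be illustrated with a minimal Python sketch. The function names, the three-color palette, and the sample pixels are hypothetical and chosen only for illustration; the sketch treats hue as a plain coordinate and uses the Euclidean distance, as the text describes.

```python
import math

def nearest_color(pixel, palette):
    """Index of the palette color closest to `pixel` (Euclidean distance in HSV)."""
    return min(range(len(palette)), key=lambda k: math.dist(pixel, palette[k]))

def color_histogram(pixels, palette):
    """K-dimensional histogram: entry k is the fraction of pixels mapped to color k."""
    counts = [0] * len(palette)
    for p in pixels:
        counts[nearest_color(p, palette)] += 1
    total = len(pixels)
    return [c / total for c in counts] if total else [0.0] * len(palette)

# Hypothetical palette with K = 3 colors, each an (H, S, V) vector in [0, 1]^3.
palette = [(0.0, 1.0, 1.0), (0.33, 1.0, 1.0), (0.66, 1.0, 1.0)]
# Hypothetical pixels of an image patch, already converted to HSV.
pixels = [(0.02, 0.9, 0.9), (0.65, 0.8, 1.0), (0.7, 1.0, 0.9), (0.6, 0.9, 0.8)]
histogram = color_histogram(pixels, palette)
```

In this example one pixel maps to the first palette color and three map to the third, so the histogram places a quarter of the mass on the first coordinate and three quarters on the third.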
Given two images, their similarity by color is computed from the similarity of the corresponding color histograms according to a distance metric that is computed by the color similarity algorithm 206. In one embodiment, a naive distance metric that could be used is the Euclidean distance (i.e. L2 metric) between the histogram vectors (or their L2 normalized versions). However, this naive metric does not provide a robust perceptual measure of similarity.
According to one embodiment, an inventive distance metric as part of the present disclosure is defined as follows. First, the pairwise Euclidean distance between each pair of the K colors is computed (recall that each of these K colors is actually a 3-D HSV vector). For pair (i,j) the pairwise Euclidean distance is denoted by d_{i,j}. For each of the K colors, all colors can be ranked according to this distance, with the minimum distance (i.e. most similar) first. The top L of the ranked K colors are selected for each color i=1, 2, . . . , K and are called the top neighbors of the color i. A bipartite graph with K left nodes and K right nodes is created. The ith left node is assigned a weight equal to the value of the ith coordinate in the first color histogram. Similarly, the jth right node is assigned a weight equal to the value of the jth coordinate in the second color histogram. Each left node i is connected to the L right nodes corresponding to its top neighbors, and each such edge (i,j) is assigned a weight equal to d_{i,j}.
Each color is similar to each of its top neighbors up to a penalty, and the present system distributes the weight of the color in the first histogram (i.e. a left node) to its top neighbors in the second histogram (i.e. the corresponding L right nodes); each such mapping along edge (i,j) is assigned a penalty equal to d_{i,j} per unit of matched weight. If some part of the color is left unmatched to its top neighbors, it is given a penalty of d_max (which is a value greater than all pairwise distances d_{i,j}, for example d_max=2 in one preferred embodiment). Further, no node i on the left or right may be matched to its neighbors such that the sum of the matched portions from all the neighbors exceeds the weight of node i. The problem is to distribute the left node weights to their top neighbors so as to minimize the sum of all these penalties. The problem can be formulated as a linear program whose solution gives the minimum penalty matching. Similarly, a minimum penalty matching is also computed by interchanging the roles of the first and the second color histograms. The average of these two values defines the distance between the two color histograms and therefore between the corresponding two images. Linear programming problems have polynomial-time algorithms, the best being O(n^2.38), where n is the input size. This may still not always be feasible if distances must be computed for a huge collection of images. So, an algorithm that finds an approximate minimum penalty matching may alternatively be used first to compute a smaller number of most similar images, which are then refined by the original LP-based algorithm.
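The disclosure does not specify the approximate algorithm. As one illustration only, the following Python sketch uses a simple greedy heuristic: edges are visited in order of increasing distance and each carries as much weight as both endpoints still allow. Because any feasible matching costs at least as much as the optimum, the result is an upper bound on the minimum penalty of the linear program, not its exact solution. All function names and the parameter defaults are hypothetical, and each color is assumed to count as its own nearest neighbor (distance 0).

```python
import math

def top_neighbors(palette, L):
    """For each color i, the indices of its L closest palette colors (itself included)."""
    K = len(palette)
    return [sorted(range(K), key=lambda j: math.dist(palette[i], palette[j]))[:L]
            for i in range(K)]

def greedy_penalty(h1, h2, palette, L, d_max=2.0):
    """Greedy upper bound on the minimum penalty for distributing h1's weight onto h2."""
    neighbors = top_neighbors(palette, L)
    # All allowed edges (i, j), cheapest first.
    edges = sorted((math.dist(palette[i], palette[j]), i, j)
                   for i in range(len(palette)) for j in neighbors[i])
    left, right = list(h1), list(h2)
    penalty = 0.0
    for d, i, j in edges:
        moved = min(left[i], right[j])  # respect the remaining weight on both nodes
        left[i] -= moved
        right[j] -= moved
        penalty += d * moved
    # Weight that could not be matched to any top neighbor pays d_max.
    return penalty + d_max * sum(left)

def approx_histogram_distance(h1, h2, palette, L, d_max=2.0):
    """Average of the two directional penalties, mirroring the symmetric definition."""
    return 0.5 * (greedy_penalty(h1, h2, palette, L, d_max)
                  + greedy_penalty(h2, h1, palette, L, d_max))
```

A heuristic of this kind could serve as the inexpensive first pass, with the exact linear program applied only to the shortlisted images.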
The space of color histograms (i.e. color-mixes) is quantized by the color histogram space quantization module 208 into M bins using techniques similar to those of the color space quantization module 202 but using the distance metric computed by the color similarity algorithm 206. These M bins comprise the color histogram basis 212. Given an image (or a part of the image), its color histogram is mapped to the closest of these M bins (the corresponding K-dimensional vectors) according to the color similarity algorithm 206. Each of these M bins is referred to as a gColor (for global color/overall color of the image patch). A pair of gColors is called a gColor-bigram or a color bigram.
Color histograms for each image are computed as explained previously for color analysis module 110. It is noted that the color histogram can also be understood as a probability distribution. The entropy of each of the color histograms (i.e. the probability distributions) is then computed. For each image, the entropy of its color histogram is referred to herein as its color entropy. The images whose color entropies are not within a threshold of the color entropy of the input image are either pruned or ranked lower in the pattern similarity ranking.
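The color entropy and the pruning step can be sketched in Python as follows. The function names are hypothetical, the logarithm base (2, giving entropy in bits) is an assumption since the text does not specify one, and the sketch shows only the pruning option rather than the lower-ranking option.

```python
import math

def color_entropy(hist):
    """Shannon entropy (in bits) of a color histogram viewed as a probability distribution."""
    return -sum(p * math.log2(p) for p in hist if p > 0)

def filter_by_entropy(query_hist, candidate_hists, threshold):
    """Indices of candidates whose color entropy is within `threshold` of the query's."""
    target = color_entropy(query_hist)
    return [i for i, h in enumerate(candidate_hists)
            if abs(color_entropy(h) - target) <= threshold]
```

A solid-colored patch has entropy 0, while a patch split evenly among four colors has entropy 2 bits, so the filter separates plain garments from busy, multi-colored ones before the more expensive pattern comparison.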
As in the case of the space of color-mixes, the space of patterns includes infinite possibilities. The space of patterns is quantized into P bins using techniques similar to those used in the case of color-mixes. Given an image (or a part of the image), its HOG vector and color entropy are computed and mapped to the closest of these P bins (the corresponding R-dimensional HOG vectors), with an appropriate penalty if the color entropies are not consistent. Each of these P bins is referred to as a pattern. A pair of patterns is called a pattern-bigram. It is noted that oriented gradients are the features discovered by pattern features discovery module 402. The R-dimensional Euclidean vector is the corresponding descriptor of dress patch pattern descriptor module 404. The L2 distance modulated by the color entropy is the pattern descriptor similarity metric 406, and pattern descriptor space quantization module 408 quantizes those descriptors into P bins to obtain the pattern basis 410. This particular pattern feature serves well to identify various types of horizontal, vertical, and oblique stripes, wrinkles and folds.
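The mapping of a descriptor to its closest pattern bin can be sketched in Python as follows. The text does not state how the color entropy modulates the L2 distance, so the additive penalty used here, the `entropy_weight` parameter, and the representation of each bin as a (HOG vector, color entropy) pair are all assumptions made purely for illustration.

```python
import math

def nearest_pattern(hog, entropy, pattern_bins, entropy_weight=1.0):
    """Index of the closest pattern bin.

    Each bin is a (hog_vector, color_entropy) pair. The score is the L2 distance
    between HOG vectors plus a penalty proportional to the entropy gap; the
    additive form of the penalty is an assumption of this sketch.
    """
    def score(pattern_bin):
        bin_hog, bin_entropy = pattern_bin
        return math.dist(hog, bin_hog) + entropy_weight * abs(entropy - bin_entropy)
    return min(range(len(pattern_bins)), key=lambda p: score(pattern_bins[p]))
```

With a penalty of this kind, a patch whose gradients resemble one bin but whose color entropy differs markedly from it can be assigned to a different bin with more consistent entropy.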
According to one embodiment, the pattern feature discovery module 402 can learn, in an unsupervised manner from a large collection of fashion images, which characteristics of dress color patches actually describe a pattern. In one embodiment, a given image patch is segmented into various parts that are uniform in color (for example, using color similarity algorithm 206).
Clustering based on this feature and the corresponding descriptor gives rise to various pattern-rich clusters such as polka dots (indicated by circles), floral (indicated by flower shapes), and stripes (indicated by thick lines). Since this clustering is meaningful in terms of dress patterns, shapes of uniform color segments can indeed be used as a pattern feature, and the polar distribution of the points of a shape can indeed be used as a pattern descriptor.
Thus, modules of
The semi-automatic technique consists of a web, desktop or mobile interface 610 where an image is provided to a human user, who segments the various parts and tags them with a predefined set of dress part types (e.g. tie, shirt, suit, top, skirt, bag). The automatic technique utilizes the color and pattern understanding described previously in this disclosure for modules 110 and 112 to compute the contiguous parts 604 in the image according to color or pattern similarity and to segment the image into large contiguous parts. The segments are then combined in step 612 and tested using dress part classifiers 608 to determine whether they can pass as a dress part. Further, a human detection algorithm as well as a face detection algorithm may also be used to facilitate the dress segmentation process. Exemplary outputs of exemplary dress segmentation modules are shown in
A variety of recommendation systems can be built on top of the style graph and therefore can harness the co-occurrence statistics of various colors and patterns. Given a database of dress images, popular styles can be discovered by choosing the dress part combinations that have high strength in the style graph corresponding to the database. This can be further used to discover a particular celebrity look by considering the dress images only for the given celebrity. Also, by using the dress images from recent fashion shows and building the corresponding style graph, the trending styles can be essentially discovered by selecting the dress part combinations that have high strength in the style graph.
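The discovery of popular styles can be sketched in Python as follows. The sketch assumes that edge strength is the raw co-occurrence count of two quantized features within the same outfit; this is an illustrative simplification, since the text here does not define how edge weights are computed. The function names and the dictionary-based outfit representation are likewise hypothetical.

```python
from collections import Counter

def build_style_graph(outfits):
    """Count how often two (part_type, feature) nodes co-occur in the same outfit.

    `outfits` is a list of dicts mapping a dress part type (e.g. "shirt") to the
    quantized color or pattern label computed for that segment.
    """
    graph = Counter()
    for outfit in outfits:
        nodes = sorted(outfit.items())
        for a in range(len(nodes)):
            for b in range(a + 1, len(nodes)):
                graph[(nodes[a], nodes[b])] += 1
    return graph

def popular_styles(graph, top_n=3):
    """The top_n strongest edges, i.e. the most frequent combinations."""
    return graph.most_common(top_n)
```

Restricting `outfits` to the images of one celebrity, or to images from recent fashion shows, yields the celebrity-look and trending-style variants described above without changing the code.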
A data storage device 2227 such as a magnetic disk or optical disc and its corresponding drive may also be coupled to computer system 2200 for storing information and instructions. Architecture 2200 can also be coupled to a second I/O bus 2250 via an I/O interface 2230. A plurality of I/O devices may be coupled to I/O bus 2250, including a display device 2243, an input device (e.g., an alphanumeric input device 2242 and/or a cursor control device 2241).
The communication device 2240 allows for access to other computers (servers or clients) via a network. The communication device 2240 may comprise one or more modems, network interface cards, wireless network interfaces or other well known interface devices, such as those used for coupling to Ethernet, token ring, or other types of networks.
In the description above, for purposes of explanation only, specific nomenclature is set forth to provide a thorough understanding of the present disclosure. However, it will be apparent to one skilled in the art that these specific details are not required to practice the teachings of the present disclosure.
Some portions of the detailed descriptions herein are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the below discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
The present disclosure also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk, including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
The algorithms presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems, computer servers, or personal computers may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. It will be appreciated that a variety of programming languages may be used to implement the teachings of the disclosure as described herein.
Moreover, the various features of the representative examples and the dependent claims may be combined in ways that are not specifically and explicitly enumerated in order to provide additional useful embodiments of the present teachings. It is also expressly noted that all value ranges or indications of groups of entities disclose every possible intermediate value or intermediate entity for the purpose of original disclosure, as well as for the purpose of restricting the claimed subject matter.
It is also expressly noted that the dimensions and the shapes of the components shown in the figures are designed to help to understand how the present teachings are practiced, but not intended to limit the dimensions and the shapes shown in the examples.
Systems and methods for discovering styles via color and pattern co-occurrence have been disclosed. It is understood that the embodiments described herein are for the purpose of elucidation and should not be considered as limiting the subject matter of the disclosure. Various modifications, uses, substitutions, combinations, improvements, and methods of production that do not depart from the scope or spirit of the present invention would be evident to a person skilled in the art.
The present application claims benefit of and priority to U.S. Provisional Application Ser. No. 61/459,063 titled “METHODS AND SYSTEMS FOR DISCOVERING STYLES VIA COLOR AND PATTERN CO-OCCURENCE” filed Dec. 6, 2010 which is hereby incorporated by reference herein in its entirety.
Number | Date | Country
---|---|---
61459063 | Dec 2010 | US