Shape-based retrieval of three-dimensional data (i.e., 3D shape searching) has attracted great interest in a variety of research fields, including computer vision, mechanical engineering, artifact searching, molecular biology, and chemistry. 3D shape searching techniques retrieve virtual objects from a database of 3D objects based on the integral similarity of the virtual objects.
Techniques for 3D shape searching include techniques based on global attributes, manufacturing attribute recognition, graphs, histograms, product information, and 3D object-recognition. Many of these techniques convert objects into attribute vectors or relational data structures, such as graphs or trees, in order to determine object similarity.
Histogram-based 3D shape searching techniques sample data points on a surface of a 3D object and extract characteristics from the sampled points. The extracted characteristics are organized in a histogram, or distribution, based on frequency of occurrence. A histogram is a graphical display of frequencies of occurrence. Histogram-based 3D shape searching techniques compare multiple objects by applying a distribution test function to the histograms that represent the objects.
Histogram-based 3D shape searching techniques include a shape distributions method. This method uses a shape function to sample the global geometric properties of a 3D object. These geometric properties are organized into a histogram, or shape distribution, based on frequency of occurrence. 3D shape searching techniques are described in additional detail in Osada, R. et al., Shape Distributions, 21 ACM Transactions on Graphics 807 (2002), which is incorporated herein by reference in its entirety. Among other benefits, the shape distributions method is a robust method for discriminating between objects despite the presence of arbitrary translations, rotations, scales, mirrors, and/or other scale or aspect differences.
While the shape distributions method is both simpler and more robust than many 3D shape searching techniques, different objects may have similar shape distributions. Moreover, 3D shape searching techniques, including the shape distributions method, do not measure object attributes other than shape. That is, these techniques measure spatial attributes only, and fail to capture non-spatial attributes, such as physical, chemical, and/or dynamic object attributes. As a result, 3D shape searching techniques cannot distinguish between similarly shaped objects having different non-spatial attributes.
Accordingly, techniques for distinguishing among objects that have similar shapes but different non-spatial attributes, such as physical, chemical, and/or dynamic attributes, are desired to better recognize objects, in addition to distinguishing among non-physical and/or non-object models. The techniques should apply to large data sets, while keeping computational costs feasible.
Methods and systems for discriminating between multi-dimensional models using difference distributions are described herein. In some embodiments, the system receives one or more models for which difference distribution histograms are to be generated. A model is a virtual object, pattern, phenomenon, behavior, event, data set, or other entity having multiple attributes, including at least one non-spatial attribute. In some embodiments, a model has both spatial attributes and non-spatial attributes. Non-spatial attributes include physical, chemical, dynamic, and/or other attributes. Physical attributes include, for example, material, density, luminance, and color. Chemical attributes include, for example, molecule type, element, and charge. In addition, physical, chemical, and/or other non-spatial attributes may vary dynamically over time.
Once the models have been received, the system selects a sampling function to be applied to the received models. A sampling function measures the difference between two or more data samples from a model with regard to a parameter including, but not limited to, distance, area, or volume. For example, a sampling function may measure the distance between data sample A, a random point on the surface of the model, and data sample B, a fixed point, such as the center of mass of the model. The selected sampling function is applied to multiple groups of two or more data samples (e.g., multiple pairs of data samples) from each received model to generate a difference distribution histogram for that model.
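The sampling step described above can be sketched as follows. This is a minimal illustration, not the described system: it assumes a model is given as a list of (x, y, z) surface points with equal weights, and the function names, bin count, and sample count are hypothetical.

```python
import math
import random

def center_of_mass(points):
    """Mean position of a list of (x, y, z) points (equal weights assumed)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def difference_histogram(points, num_samples=1000, num_bins=64, seed=0):
    """Normalized histogram of distances between random samples A and fixed point B."""
    rng = random.Random(seed)
    b = center_of_mass(points)  # data sample B: a fixed point
    # Data sample A: a randomly chosen point from the model, repeated num_samples times.
    distances = [math.dist(rng.choice(points), b) for _ in range(num_samples)]
    max_d = max(distances) or 1.0  # avoid division by zero for degenerate models
    counts = [0] * num_bins
    for d in distances:
        idx = min(int(d / max_d * num_bins), num_bins - 1)
        counts[idx] += 1
    # Frequencies of occurrence, normalized so that the bins sum to 1.
    return [c / num_samples for c in counts]
```

Normalizing by the largest observed distance is one possible choice; it makes histograms of differently scaled models directly comparable.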
Once multiple difference distribution histograms have been generated to represent multiple models, the similarity of the difference distribution histograms—and thus the models—is determined. In some embodiments, the system receives two or more difference distribution histograms for comparison. In some embodiments, at least one of the difference distribution histograms is stored in a database. For example, the system may receive one or more difference distribution histograms that are to be matched against a database of multiple predefined models. In some embodiments, at least one of the difference distribution histograms is a target specified in a fitness function for a genetic algorithm or machine learning search, to be compared against the difference distribution histograms generated from one or more candidate models. Once the difference distribution histograms have been received, the system selects a distribution test function, which measures the similarity of two or more histograms. The selected distribution test function is applied to the received difference distribution histograms to measure the similarity of the histograms.
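Matching a received histogram against a database of predefined models can be sketched as a ranking by similarity score. The sketch below assumes histograms are equal-length lists of frequencies and uses a simple sum of absolute differences as a stand-in for the distribution test function; the names `l1_distance` and `rank_models` are hypothetical.

```python
def l1_distance(h1, h2):
    """Sum of absolute bin-wise differences (a stand-in distribution test)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def rank_models(query, database):
    """Return (model name, score) pairs, most similar (lowest score) first."""
    scores = [(name, l1_distance(query, hist)) for name, hist in database.items()]
    return sorted(scores, key=lambda item: item[1])
```

In a genetic algorithm, the same score could serve as the fitness term, with the target histogram taking the place of the query.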
Among other benefits, the technology described herein distinguishes among models that have similar shapes but different non-spatial attributes. The described technology also distinguishes among models having only non-spatial attributes. In addition, the described technology offers a general and versatile approach for recognition, analysis, and classification of data patterns. The technology described herein has a variety of applications, including, but not limited to, genetic simulations, text classification, weather and natural disaster prediction, biometric identification and authentication, enemy military tactics and strategy analysis prediction, target acquisition, image intelligence analysis, terrorist activity, medical diagnoses, decryption pattern analysis, and/or a variety of other applications. For example, the described technology may be used to determine model fitness in a genetic simulation. In some embodiments, a genetic algorithm uses difference distributions to compare a modeled object and a target object to determine comparable profiles. The genetic algorithm may make one or more determinations based on whether the difference distribution of the modeled object is sufficiently similar to that of the target object. For example, the genetic algorithm may keep, replace, discard, modify, or take other action regarding the modeled object based on the similarity determination. A suitable genetic algorithm is described in additional detail in copending U.S. patent application Ser. No. 11/234,413, entitled METHOD, SYSTEM AND APPARATUS FOR VIRTUAL MODELING OF BIOLOGICAL TISSUE WITH ADAPTIVE EMERGENT FUNCTIONALITY, filed on Sep. 23, 2005; and U.S. patent application Ser. No. 12/554,870, entitled SYSTEMS AND METHODS FOR CELL-CENTRIC SIMULATION OF BIOLOGICAL EVENTS AND CELL-BASED MODELS PRODUCED THEREFROM, filed on Sep. 4, 2009, which are hereby incorporated by reference in their entirety.
Various embodiments of the technology will now be described. The following description provides specific details for a thorough understanding and an enabling description of these embodiments. One skilled in the art will understand, however, that the described technology may be practiced without many of these details. Additionally, some well-known structures or functions may not be shown or described in detail, so as to avoid unnecessarily obscuring the relevant description of the various embodiments. The terminology used in the description presented below is intended to be interpreted in its broadest reasonable manner, even though it is being used in conjunction with a detailed description of certain specific embodiments of the technology.
The described technology can also be practiced in distributed computing environments, where tasks or modules are performed by remote processing devices, which are linked through a communications network, such as a LAN, WAN, or the Internet. In a distributed computing environment, program modules or sub-routines may be located in both local and remote memory storage devices. In addition, those skilled in the art will recognize that portions of the described technology may reside on a server computer, while corresponding portions reside on a client computer.
The computing system 100 of
The input devices 102 may include a keyboard and/or a pointing device such as a mouse. Other input devices may include a microphone, joystick, pen, stylus, game pad, scanner, and/or other input device. The data storage devices 104 may include any type of tangible computer-readable media, including magnetically or optically readable computer discs, hard-wired or preprogrammed chips (e.g., EEPROM semiconductor chips), nanotechnology memory, biological memory, and/or other data storage media. Data may be stored in a data storage device 104 according to one or more data structures encompassed within the scope of the described technology. Alternatively or additionally, computer implemented instructions, data structures, screen displays, and other data related to the technology may be distributed over the Internet or over other networks (including wireless networks) via the optional network connection 110 and/or optional wireless transceiver 112, on a propagated signal on a propagation medium (e.g., an electromagnetic wave(s), a sound wave, etc.) over a period of time. In some implementations, the data may be provided on any analog or digital network (e.g., a packet switched, circuit switched, or other network scheme).
Aspects of the described technology may be practiced in a variety of other computing environments, such as that depicted by
At least one server computer 208, coupled to the Internet or World Wide Web (“Web”) 206, performs many or all of the functions for receiving, routing, and storing of electronic messages, such as web pages, audio signals, and electronic images. While the Internet is shown, a private network, such as an intranet, may be preferred in some applications. The network may have a client-server architecture, in which a computer is dedicated to serving other client computers, or it may have other architectures, such as a peer-to-peer architecture, in which one or more computers serve simultaneously as servers and clients. A database 210 or databases, coupled to the server computer(s), stores many of the web pages and much of the content exchanged between the user computers. The server computer(s), including the database(s), may employ security measures to inhibit malicious attacks on the system, and to preserve integrity of the messages and data stored therein (e.g., firewall systems, secure socket layers (SSL), password protection schemes, and/or encryption).
The server computer 208 may include a server engine 212, a web page management component 214, a content management component 216, and a database management component 218. The server engine performs basic processing and operating system level tasks. The web page management component handles creation and display or routing of web pages. Users may access the server computer by means of a URL associated therewith. The content management component handles most of the functions in the embodiments described herein. The database management component handles storage and retrieval tasks with respect to the database, queries to the database, and storage of data.
The described technology distinguishes among multi-dimensional models using difference distributions. A model is a virtual object, pattern, phenomenon, behavior, event, data set, or other entity having multiple attributes, including at least one non-spatial attribute. Non-spatial attributes include, but are not limited to, physical, chemical, and/or dynamic attributes of the model. Physical attributes include, for example, material, density, luminance, and color. Chemical attributes include, for example, molecule type, indicant, and sensitivity. In addition, physical, chemical, and/or other non-spatial attributes may vary dynamically over time. For example, the chemical attributes of a genetic model may vary over the duration of a simulation.
In some embodiments, a model has both spatial attributes and non-spatial attributes. Spatial attributes include the x-, y-, and/or z-coordinates of the model. For example, in some embodiments, a model is a three-dimensional or other spatial model generated by a genetic simulation, a medical diagnosis system, a weather or natural disaster system, and/or any other information system and/or algorithm.
A. Generating Difference Distribution Histograms
At a block 305, the process 300 receives one or more models for which difference distribution histograms are to be generated. The models may be provided by a modeling and/or information system, a user, and/or in another manner. Sample models are described in reference to example 1 (bug) and example 2 (ellipse).
At a block 310, the process 300 selects a sampling function to be applied to the received models to generate the difference distribution histograms. A sampling function measures the difference between two or more data samples from a model with regard to a parameter including, but not limited to, distance, area, or volume. A variety of sampling functions may be selected for application to the models. The sampling functions described herein are provided for illustrative purposes only, and are not intended to limit the described technology. One skilled in the art will appreciate that a variety of other sampling functions may be used. In addition, although a single sampling function is applied to each model in the illustrated embodiment, in other embodiments multiple sampling functions are applied to each model.
In some embodiments, the sampling function incorporates both continuous and nominal attributes of a model, while in other embodiments, the sampling function (or functions) separates the continuous and nominal attributes. An attribute is a nominal attribute if it is assigned one or more distinct values. For example, color is a nominal attribute if it may be assigned values such as blue, red, green, and yellow. Nominal values may be assigned associated numerical values, such as 1 (blue), 2 (red), 3 (green), and 4 (yellow). An attribute is continuous if it may be assigned a value corresponding to any real number along a given number line. For example, position is a continuous attribute if it may be assigned any real number value along a given axis. However, position is a nominal attribute if it may be assigned distinct values such as left, center, and right.
In some embodiments, a sampling function that generates a heterogeneous distance based on differences of continuous and nominal values (herein referred to as “HDCN”) is applied to the models. This sampling function incorporates both the continuous and nominal attributes of a model, as previously described. An example of an HDCN sampling function is provided in equations (1)-(4):
A and B represent two data samples selected from a model. Each sample comprises n attributes. Equation (1), d(Ai,Bi), represents the distance between A and B in reference to the i-th attribute of the data samples. If the i-th attribute is a nominal attribute, equation (2) is applied to calculate the distance between the attributes. binNom is set to 0 if the nominal attributes have the same value, or to 1 if the nominal attributes have different values. If the i-th attribute is a continuous attribute, equation (3) is applied to calculate the distance between the attributes. normCont represents the normalized distance between the continuous attributes. max_i represents the maximum distance for the i-th continuous attribute of the model. max_i normalizes the distance between each pair of samples, such that the distance for each attribute will not exceed 1. The overall distance is defined based on a Euclidean distance function represented by equation (4):
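Equations (1)-(4) are not reproduced in this text. One form consistent with the definitions given above is shown below as an assumption rather than as the original notation; in particular, the name HDCN(A, B) for the overall distance and the absolute-difference form of normCont are inferred from the surrounding description.

```latex
\begin{align}
d(A_i, B_i) &=
  \begin{cases}
    \mathrm{binNom}(A_i, B_i)   & \text{if the } i\text{-th attribute is nominal} \\
    \mathrm{normCont}(A_i, B_i) & \text{if the } i\text{-th attribute is continuous}
  \end{cases} \tag{1} \\
\mathrm{binNom}(A_i, B_i) &=
  \begin{cases}
    0 & \text{if } A_i = B_i \\
    1 & \text{otherwise}
  \end{cases} \tag{2} \\
\mathrm{normCont}(A_i, B_i) &= \frac{\lvert A_i - B_i \rvert}{\max\nolimits_i} \tag{3} \\
\mathrm{HDCN}(A, B) &= \sqrt{\sum_{i=1}^{n} d(A_i, B_i)^2} \tag{4}
\end{align}
```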
Data samples A and B may be selected in a variety of manners. For example, A may be a random point on the surface of the model, while B is a fixed point. As another example, A and B may both be random points on the surface of the model. In the illustrated embodiments, A is a random point on the surface of the model, while B is the center of mass of the model (i.e., a fixed point). In other embodiments, three or more samples are selected. For example, three or four random points on the surface of the object may be selected, and the area or volume between the points measured. Moreover, although the illustrated embodiments select points on the surface of a model, one skilled in the art will appreciate that other embodiments may select points anywhere within the model, not necessarily on the surface of the model.
In some embodiments, the value of a nominal attribute for a fixed data point B is assigned a constant value. In the illustrated embodiments, the constant value of the color attribute is assigned the value of red (2), as described in additional detail herein. In other embodiments, the constant value of the color attribute is assigned the color value that has a maximum number of neighbors from a fixed data point B. Neighbors are described in additional detail herein. One skilled in the art will appreciate that the constant value of a nominal attribute for a fixed data point may be determined in a variety of other ways.
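A minimal sketch of an HDCN-style distance follows, under the assumption that each sample is a sequence of attribute values, that nominal attributes are identified by index, and that the maximum distance for each continuous attribute is supplied by the caller. The function name `hdcn` and its parameters are hypothetical.

```python
import math

def hdcn(a, b, nominal, max_range):
    """Heterogeneous distance over continuous and nominal attributes.

    a, b:       sequences of n attribute values
    nominal:    set of indices i whose attribute is nominal
    max_range:  dict mapping each continuous index i to its maximum distance
    """
    total = 0.0
    for i, (ai, bi) in enumerate(zip(a, b)):
        if i in nominal:
            d = 0.0 if ai == bi else 1.0  # same value -> 0, different -> 1
        elif max_range[i] == 0:
            d = 0.0  # attribute never varies, so it contributes nothing
        else:
            d = abs(ai - bi) / max_range[i]  # normalized so it does not exceed 1
        total += d * d
    return math.sqrt(total)  # Euclidean combination of per-attribute distances
```

For example, with A = (3.0, 4.0, 3) as a green surface point and B = (0.0, 0.0, 2) as a fixed point assigned the constant color red, `hdcn(A, B, {2}, {0: 6.0, 1: 8.0})` combines two normalized continuous distances of 0.5 with a nominal distance of 1.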
In some embodiments, a sampling function that generates a heterogeneous distance with an extension to nominal values (herein referred to as “HDEN”) is selected and applied to the model. Like the HDCN sampling function, the HDEN sampling function incorporates both continuous and nominal attributes. However, while the HDCN sampling function is generally dominated by the continuous attributes, the HDEN sampling function typically captures more information about nominal attributes. Rather than simply assigning a value of 0 or 1 to the nominal attribute, the HDEN sampling function generates and compares distances within a local geometric landscape surrounding the data points for each discrete value of the nominal attribute. Accordingly, the HDEN sampling function generally facilitates improved discrimination between models having different nominal attribute values. An example of an HDEN sampling function is provided in equations (5)-(7):
As previously described, A and B represent two data samples selected from a model. Each sample comprises n attributes. Equation (6) represents an extension to nominal values, defined as the distance between A and B in reference to the j-th attribute of the data samples. de(NAj,NBj) is the normalized difference between the number of neighbors of A that have the j-th value of the nominal attribute and the number of neighbors of B that have the j-th value of the nominal attribute. Each nominal attribute has m discrete values. Equation (7) calculates the distance between A and B by combining equation (4) (the HDCN sampling function) and equation (6) (the extension to the nominal values).
In some embodiments, the number of neighbors having a specific nominal value for a fixed data point B is assigned a constant value. In the illustrated embodiments, the constant value for the number of neighbors having a specific color value is zero. In other embodiments, the constant value is assigned based on the number of neighbors of the fixed data point B having the specific nominal value (according to a particular radius ratio). One skilled in the art will appreciate that the constant value may be determined in a variety of other ways.
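The neighbor-based extension can be sketched as shown below. Because equations (5)-(7) are not reproduced in this text, the sketch rests on explicit assumptions: samples are dictionaries with hypothetical `pos` and `color` keys, the per-value difference is normalized by a caller-supplied maximum neighbor count, and the extension is combined with the HDCN distance as a root of summed squares. None of these choices is confirmed by the description.

```python
import math

def neighbor_counts(sample, points, radius, values):
    """Count the neighbors of `sample` within `radius`, per nominal value."""
    counts = {v: 0 for v in values}
    for p in points:
        if p is not sample and math.dist(p["pos"], sample["pos"]) <= radius:
            counts[p["color"]] += 1
    return counts

def hden(hdcn_distance, counts_a, counts_b, max_neighbors):
    """Combine an HDCN distance with normalized neighbor-count differences."""
    ext = 0.0
    for v in counts_a:
        diff = abs(counts_a[v] - counts_b.get(v, 0))
        de = diff / max_neighbors if max_neighbors else 0.0  # assumed normalization
        ext += de * de
    return math.sqrt(hdcn_distance ** 2 + ext)  # assumed combination rule
```

For a fixed data point B, `counts_b` can be a dictionary of zeros, matching the constant value of zero used in the illustrated embodiments.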
In some embodiments, a sampling function that generates multiple one-dimensional difference distributions (herein referred to as “MODD”) is applied to the model. This sampling function separates continuous and nominal attributes of a model, as previously described. An example of a MODD sampling function is provided in equations (8)-(10):
As previously described, A and B represent two data samples selected from a model. C represents the number of continuous attributes of the model. Equation (8) is applied to the continuous attributes of the model, while equation (9) is applied to the nominal attributes. Equation (8) calculates the distance between the continuous attributes of A and B. The distance for each data sample is computed and a corresponding histogram is generated. Equation (9) defines a nominal attribute distance as the number of neighbors having the k-th value of a nominal attribute, where the sample itself holds the j-th value of the nominal attribute. If the number of discrete values for a nominal attribute is N, then N² sub-histograms are generated based on the fixed values of j and k. All sub-histograms are then concatenated, to facilitate comparison between models.
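The construction of the concatenated nominal sub-histograms can be sketched as follows. The sketch assumes that neighbor counts per nominal value have already been computed for each sample, that counts are binned linearly up to a caller-supplied maximum, and that zero counts are recorded in the first bin; the function and parameter names are hypothetical.

```python
def modd_nominal_histogram(self_values, neighbor_counts, values, num_bins, max_count):
    """Concatenate N*N sub-histograms indexed by (self value j, neighbor value k).

    self_values[s]      nominal value j held by sample s
    neighbor_counts[s]  dict: nominal value k -> number of neighbors of s with value k
    values              the N discrete values of the nominal attribute
    """
    index = {v: i for i, v in enumerate(values)}
    n = len(values)
    hist = [0] * (n * n * num_bins)
    for j, counts in zip(self_values, neighbor_counts):
        for k in values:
            c = counts.get(k, 0)
            # Linear binning of the neighbor count into [0, num_bins - 1].
            b = min(c * num_bins // (max_count + 1), num_bins - 1)
            # Sub-histogram (j, k) occupies its own contiguous block of bins.
            offset = (index[j] * n + index[k]) * num_bins
            hist[offset + b] += 1
    return hist
```

Laying out the sub-histograms in a fixed (j, k) order means that two models can be compared bin by bin without any alignment step.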
An example average difference score for comparing models according to the MODD sampling function is defined by equation (10):
DiffScore=w1*Sc+w2*Sn (10)
Sc represents a difference score for continuous attributes, while Sn represents a difference score for nominal attributes. w1 and w2 denote weights that may be adjusted according to different application requirements. In some embodiments, the weights are equal, such that the continuous and nominal difference scores are evenly distributed, while in other embodiments, the weights are different. Compared to the HDCN and HDEN sampling functions, the MODD sampling function tends to better isolate continuous and nominal attributes, facilitating discrimination between models with complex attributes.
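Equation (10) translates directly into a one-line function. The default of equal weights (0.5 each) is an assumption chosen for illustration; the description leaves the weights application-dependent.

```python
def diff_score(s_c, s_n, w1=0.5, w2=0.5):
    """Weighted difference score: w1 * Sc (continuous) + w2 * Sn (nominal)."""
    return w1 * s_c + w2 * s_n
```

Raising `w2` relative to `w1` would emphasize differences in nominal attributes such as color over differences in continuous attributes such as position.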
Returning to
As previously described in reference to
In the illustrated embodiment, the value of the nominal attribute (color) may be blue, red, green, or yellow. In a clockwise manner from the top right quadrant of the graph 405, Bug0 has legs that are green, green, green, green, red, red, red, and red. Bug1 depicted by graph 410 has legs that are green, green, green, red, green, red, red, and red. Bug2 depicted by graph 415 has legs that are green, green, red, red, green, green, red, and red. Bug3 depicted by graph 420 has legs that are green, red, green, red, green, red, green, and red.
Once models such as those depicted in
a. HDCN Sampling Function
In some embodiments, the HDCN sampling function is applied to the models. As previously described, in some embodiments, the value of a nominal attribute for a fixed data point B is assigned a constant value. In the illustrated embodiment, the color attribute for data point B is assigned a constant value of 2 (red). This value is selected based on an assignment of the value 1 to the color blue; the value 2 to the color red; the value 3 to the color green; and the value 4 to the color yellow. Because colors 1 and 4 (blue and yellow) do not vary among the models (i.e., only the colors 2 and 3 (red and green) of the legs vary), selecting a constant value of 2 is representative.
The histograms in each column correspond to the same model. Histograms 502, 510, 518, 526, and 534 correspond to Bug0 depicted by graph 405; histograms 504, 512, 520, 528, and 536 correspond to Bug1 depicted by graph 410; histograms 506, 514, 522, 530, and 538 correspond to Bug2 depicted by graph 415; and histograms 508, 516, 524, 532, and 540 correspond to Bug3 depicted by graph 420.
The histograms in each row correspond to a common radius ratio. The radius ratio is a multiplier for determining a neighborhood from which the data samples are to be selected. The radius ratio is a percentage of the distance between the maximum and minimum spatial distance of a model. In the illustrated embodiment, the radius ratio is selected from the set comprised of 0.01, 0.05, 0.10, 0.30, and 0.50. For example, the radius ratio of 0.01 indicates that data samples are to be selected from a neighborhood that is 1% of the distance between the maximum and minimum spatial distance of a model. One skilled in the art will appreciate that a variety of other radius ratios may be used.
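One way to turn a radius ratio into a neighborhood is sketched below. The description does not specify how the maximum and minimum spatial distances are measured; this sketch assumes they are the largest and smallest pairwise distances between the model's points, and the function names are hypothetical.

```python
import math
from itertools import combinations

def neighborhood_radius(points, radius_ratio):
    """Radius as a fraction of (max - min) pairwise distance (assumed reading)."""
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    return radius_ratio * (max(dists) - min(dists))

def neighbors(center, points, radius):
    """All points other than `center` that lie within `radius` of it."""
    return [p for p in points if p != center and math.dist(p, center) <= radius]
```

Computing all pairwise distances is quadratic in the number of points, so a large model would likely need a bounding-box estimate or a spatial index instead.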
In
b. HDEN Sampling Function
In some embodiments, the HDEN sampling function is applied to the models.
The histograms in each row correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, 0.30, and 0.50. Histograms 602-608 correspond to the radius ratio of 0.01; histograms 610-616 correspond to the radius ratio of 0.05; histograms 618-624 correspond to the radius ratio of 0.10; histograms 626-632 correspond to the radius ratio of 0.30; and histograms 634-640 correspond to the radius ratio of 0.50.
The histograms in each row correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, 0.30, and 0.50. Histograms 702-708 correspond to the radius ratio of 0.01; histograms 710-716 correspond to the radius ratio of 0.05; histograms 718-724 correspond to the radius ratio of 0.10; histograms 726-732 correspond to the radius ratio of 0.30; and histograms 734-740 correspond to the radius ratio of 0.50.
The histograms in each row correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, 0.30, and 0.50. Histograms 802-808 correspond to the radius ratio of 0.01; histograms 810-816 correspond to the radius ratio of 0.05; histograms 818-824 correspond to the radius ratio of 0.10; histograms 826-832 correspond to the radius ratio of 0.30; and histograms 834-840 correspond to the radius ratio of 0.50.
The histograms in each row correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, 0.30, and 0.50. Histograms 902-908 correspond to the radius ratio of 0.01; histograms 910-916 correspond to the radius ratio of 0.05; histograms 918-924 correspond to the radius ratio of 0.10; histograms 926-932 correspond to the radius ratio of 0.30; and histograms 934-940 correspond to the radius ratio of 0.50.
The previously described HDCN and HDEN sampling functions incorporate both the continuous and nominal attributes of a model. When the continuous and nominal attributes are incorporated together, these attributes may interfere with each other to some degree. For example, because the continuous and nominal attributes are not treated separately by the sampling function, they may be conflated to a certain extent. In addition, as more dimensions are measured by the sampling function, the dimensions may wholly or partially cancel each other out. Accordingly, in some embodiments, a sampling function (or functions) is applied that separates the continuous and nominal attributes of a model.
c. MODD Sampling Function
In some embodiments, the MODD sampling function is applied to the models.
The histograms in each row correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, 0.30, and 0.50. Histograms 1002-1008 correspond to the radius ratio of 0.01; histograms 1010-1016 correspond to the radius ratio of 0.05; histograms 1018-1024 correspond to the radius ratio of 0.10; histograms 1026-1032 correspond to the radius ratio of 0.30; and histograms 1034-1040 correspond to the radius ratio of 0.50.
Because there are four distinct nominal attribute values in the illustrated embodiment, the number of concatenated bins for each model is 1024 (4² × 64 bins). Bins 0-256 represent the self color of 1 (blue) and neighboring colors of 1 (blue), 2 (red), 3 (green), and 4 (yellow), respectively. Bins 257-512 represent the self color of 2 (red) and neighboring colors of 1 (blue), 2 (red), 3 (green), and 4 (yellow). Bins 513-768 and bins 769-1024 are similar, except that the self color is 3 (green) and 4 (yellow), respectively.
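The layout of the concatenated histogram can be checked with a short sketch that maps a bin index back to its self color, neighboring color, and position within the sub-histogram. Zero-based indexing is assumed here, so the block for self color 1 covers indices 0-255 rather than the ranges quoted above; the function name `locate_bin` is hypothetical.

```python
def locate_bin(index, num_values=4, bins_per_sub=64):
    """Map a zero-based concatenated bin index to (self color, neighbor color, local bin)."""
    pair, local = divmod(index, bins_per_sub)          # which sub-histogram, which bin in it
    self_idx, neighbor_idx = divmod(pair, num_values)  # sub-histograms ordered by (j, k)
    return self_idx + 1, neighbor_idx + 1, local       # colors are numbered from 1

TOTAL_BINS = 4 ** 2 * 64  # 16 sub-histograms of 64 bins each
```

Under these assumptions, index 256 is the first bin of the block for self color 2 (red) with neighboring color 1 (blue).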
The sub-histograms 1102-1132 in each row correspond to one-fourth of the generated histograms. Sub-histograms 1102-1108 correspond to the nominal attribute value of 1 (blue); sub-histograms 1110-1116 correspond to the nominal attribute value of 2 (red); sub-histograms 1118-1124 correspond to the nominal attribute value of 3 (green); and sub-histograms 1126-1132 correspond to the nominal attribute value of 4 (yellow). These sub-histograms are concatenated to generate a single histogram. For example, sub-histograms 1102, 1110, 1118, and 1126 are concatenated to generate a single histogram representing Bug0 depicted by graph 405, and so on.
The sub-histograms 1202-1232 in each row correspond to one-fourth of the generated histograms. Sub-histograms 1202-1208 correspond to the nominal attribute value of 1 (blue); sub-histograms 1210-1216 correspond to the nominal attribute value of 2 (red); sub-histograms 1218-1224 correspond to the nominal attribute value of 3 (green); and sub-histograms 1226-1232 correspond to the nominal attribute value of 4 (yellow). These sub-histograms are concatenated to generate a single histogram. For example, sub-histograms 1202, 1210, 1218, and 1226 are concatenated to generate the single histogram 1010 of
The sub-histograms 1302-1332 in each row correspond to one-fourth of the generated histograms. Sub-histograms 1302-1308 correspond to the nominal attribute value of 1 (blue); sub-histograms 1310-1316 correspond to the nominal attribute value of 2 (red); sub-histograms 1318-1324 correspond to the nominal attribute value of 3 (green); and sub-histograms 1326-1332 correspond to the nominal attribute value of 4 (yellow). These sub-histograms are concatenated to generate a single histogram. For example, sub-histograms 1302, 1310, 1318, and 1326 are concatenated to generate a single histogram representing Bug0 depicted by graph 405, and so on.
As previously described, a sampling function is selected and applied to the models to generate difference distribution histograms representing the models. As in example 1 (bug), a variety of sampling functions may be applied to the model, including the HDCN, HDEN, and MODD sampling functions described herein.
a. HDCN Sampling Function
In some embodiments, the HDCN sampling function is applied to the models.
The histograms in each row 1535-1555 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, 0.30, and 0.50. Histograms in row 1535 correspond to the radius ratio of 0.01; histograms in row 1540 correspond to the radius ratio of 0.05; histograms in row 1545 correspond to the radius ratio of 0.10; histograms in row 1550 correspond to the radius ratio of 0.30; and histograms in row 1555 correspond to the radius ratio of 0.50.
b. HDEN Sampling Function
In some embodiments, the HDEN sampling function is applied to the models.
The histograms in each row 1635-1655 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, 0.30, and 0.50. Histograms in row 1635 correspond to the radius ratio of 0.01; histograms in row 1640 correspond to the radius ratio of 0.05; histograms in row 1645 correspond to the radius ratio of 0.10; histograms in row 1650 correspond to the radius ratio of 0.30; and histograms in row 1655 correspond to the radius ratio of 0.50.
The histograms in each row 1735-1755 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, 0.30, and 0.50. Histograms in row 1735 correspond to the radius ratio of 0.01; histograms in row 1740 correspond to the radius ratio of 0.05; histograms in row 1745 correspond to the radius ratio of 0.10; histograms in row 1750 correspond to the radius ratio of 0.30; and histograms in row 1755 correspond to the radius ratio of 0.50.
The histograms in each row 1835-1855 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, 0.30, and 0.50. Histograms in row 1835 correspond to the radius ratio of 0.01; histograms in row 1840 correspond to the radius ratio of 0.05; histograms in row 1845 correspond to the radius ratio of 0.10; histograms in row 1850 correspond to the radius ratio of 0.30; and histograms in row 1855 correspond to the radius ratio of 0.50.
The histograms in each row 1935-1955 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, 0.30, and 0.50. Histograms in row 1935 correspond to the radius ratio of 0.01; histograms in row 1940 correspond to the radius ratio of 0.05; histograms in row 1945 correspond to the radius ratio of 0.10; histograms in row 1950 correspond to the radius ratio of 0.30; and histograms in row 1955 correspond to the radius ratio of 0.50.
c. MODD Sampling Function
In some embodiments, the MODD sampling function is applied to the models.
The histograms in each row 2035-2055 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, 0.30, and 0.50. Histograms in row 2035 correspond to the radius ratio of 0.01; histograms in row 2040 correspond to the radius ratio of 0.05; histograms in row 2045 correspond to the radius ratio of 0.10; histograms in row 2050 correspond to the radius ratio of 0.30; and histograms in row 2055 correspond to the radius ratio of 0.50.
B. Measuring the Similarity of Multiple Difference Distribution Histograms
Once multiple difference distribution histograms have been generated to represent multiple models, the similarity of the difference distribution histograms—and thus the models—is determined.
At a block 350, the process 345 receives two or more difference distribution histograms for comparison. The histograms may be provided by the system, a modeling and/or information system, a user, and/or in another manner. In some embodiments, at least one of the difference distribution histograms is stored in a database, such as a database stored on a data storage device 104 (
At a block 355, the process 345 selects a distribution test function to be applied to the received difference distribution histograms to measure the similarity of the histograms. A variety of distribution test functions may be applied to the difference distribution histograms to determine similarity, including several distribution test functions well known in the field of statistics. Suitable distribution test functions include, but are not limited to, the chi-square test (herein referred to as “chi”), the Bhattacharyya distance (herein referred to as “bha”), and/or a Minkowski norm (herein referred to as “pdf”). The distribution test functions described herein are provided for illustrative purposes only, and are not intended to limit the described technology. One skilled in the art will appreciate that a variety of other distribution test functions may be used.
In some embodiments, a chi test function is applied to the difference distribution histograms. The chi test function is provided by equation (11):
In equation (11), f and g represent two difference distribution histograms for comparison. For each bin, a comparison is made between the number of events observed (i.e., measurements made) in f and the number of events observed in g. In some embodiments, for the distribution test functions described herein, a large distance value indicates a low probability that the difference distribution histograms represent the same model; a small distance value indicates a high probability that the difference distribution histograms represent the same model.
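For reference, a chi-square distance between two histograms can be sketched under an assumption: the form below is the one used for comparing shape distributions in Osada et al. (cited above), and it is assumed here rather than confirmed to match equation (11). The symbols f and g are the two difference distribution histograms, as in the preceding paragraph.

$$D(f,g) = \int \frac{(f-g)^{2}}{f+g}$$

Under this assumed form, each bin contributes the squared difference between the two histograms, scaled by their combined mass in that bin, so identical histograms yield a distance of zero.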
In some embodiments, a bha test function is applied to the difference distribution histograms. The bha test function is provided by equation (12):
$$D(f,g) = 1 - \int \sqrt{fg} \tag{12}$$
In some embodiments, a pdf test function is applied to the difference distribution histograms. A pdf test function is provided by equation (13):
$$D(f,g) = \left(\int |f-g|^{N}\right)^{1/N} \tag{13}$$
Where the exponent N equals 1, the pdf test function (herein referred to as “pdfL1”) is provided by equation (14):
$$D(f,g) = \int |f-g| \tag{14}$$
Where the exponent N equals 2, the pdf test function (herein referred to as “pdfL2”) is defined by equation (15):
$$D(f,g) = \left(\int |f-g|^{2}\right)^{1/2} \tag{15}$$
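The distribution test functions named above can be illustrated with a minimal Python sketch over binned histograms, where each integral becomes a sum over bins. Several points are assumptions made for illustration only: the function names, the normalization of each histogram to unit total mass, the skipping of bins that are empty in both histograms, and the chi-square form (taken from Osada et al., since the chi equation is not reproduced here). The Minkowski norm is written in its standard form, with the 1/N root applied to the whole sum.

```python
import math


def normalize(hist):
    """Scale bin counts to sum to 1 (assumption: histograms are compared as distributions)."""
    total = float(sum(hist))
    if total <= 0:
        raise ValueError("histogram must contain at least one observation")
    return [count / total for count in hist]


def _paired_bins(f, g):
    """Return normalized (f_i, g_i) pairs; both histograms must use the same bins."""
    if len(f) != len(g):
        raise ValueError("histograms must have the same number of bins")
    return list(zip(normalize(f), normalize(g)))


def chi(f, g):
    """Chi-square distance, assumed form: sum of (f - g)^2 / (f + g) over bins."""
    # Bins that are empty in both histograms are skipped to avoid dividing by zero.
    return sum((a - b) ** 2 / (a + b) for a, b in _paired_bins(f, g) if a + b > 0)


def bha(f, g):
    """Bhattacharyya-style distance: 1 - sum of sqrt(f * g) over bins."""
    return 1.0 - sum(math.sqrt(a * b) for a, b in _paired_bins(f, g))


def pdf(f, g, n=1):
    """Minkowski L_n norm of the difference: (sum of |f - g|^n) ^ (1 / n)."""
    if n < 1:
        raise ValueError("exponent n must be at least 1")
    return sum(abs(a - b) ** n for a, b in _paired_bins(f, g)) ** (1.0 / n)


# Hypothetical lookup table keyed by the short names used in the text.
DISTRIBUTION_TESTS = {
    "chi": chi,
    "bha": bha,
    "pdfL1": lambda f, g: pdf(f, g, 1),
    "pdfL2": lambda f, g: pdf(f, g, 2),
}
```

With this sketch, a call such as `DISTRIBUTION_TESTS["pdfL1"](hist_a, hist_b)` returns a distance that is zero for identical histograms and grows as the histograms diverge, consistent with the reading that a large distance indicates a low probability that the histograms represent the same model.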
Returning to
C. Difference Score Landscapes
In some embodiments, the similarity of difference distribution histograms is measured according to one or more difference score landscapes. Difference score landscapes may be used instead of or in addition to difference distribution histograms to determine model similarity.
Landscapes in each row 4325-4340 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, and 0.50. Landscapes in row 4325 correspond to the radius ratio of 0.01; landscapes in row 4330 correspond to the radius ratio of 0.05; landscapes in row 4335 correspond to the radius ratio of 0.10; and landscapes in row 4340 correspond to the radius ratio of 0.50.
Landscapes in each column 4405-4420 correspond to a comparison between a common pair of models. Landscapes in column 4405 correspond to a comparison between Bug0 and itself; landscapes in column 4410 correspond to a comparison between Bug1 and Bug0; landscapes in column 4415 correspond to a comparison between Bug2 and Bug0; and landscapes in column 4420 correspond to a comparison between Bug3 and Bug0.
Landscapes in each row 4425-4440 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, and 0.50. Landscapes in row 4425 correspond to the radius ratio of 0.01; landscapes in row 4430 correspond to the radius ratio of 0.05; landscapes in row 4435 correspond to the radius ratio of 0.10; and landscapes in row 4440 correspond to the radius ratio of 0.50.
Landscapes in each row 4525-4540 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, and 0.50. Landscapes in row 4525 correspond to the radius ratio of 0.01; landscapes in row 4530 correspond to the radius ratio of 0.05; landscapes in row 4535 correspond to the radius ratio of 0.10; and landscapes in row 4540 correspond to the radius ratio of 0.50.
Landscapes in each row 4625-4640 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, and 0.50. Landscapes in row 4625 correspond to the radius ratio of 0.01; landscapes in row 4630 correspond to the radius ratio of 0.05; landscapes in row 4635 correspond to the radius ratio of 0.10; and landscapes in row 4640 correspond to the radius ratio of 0.50.
Landscapes in each row 4725-4740 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, and 0.50. Landscapes in row 4725 correspond to the radius ratio of 0.01; landscapes in row 4730 correspond to the radius ratio of 0.05; landscapes in row 4735 correspond to the radius ratio of 0.10; and landscapes in row 4740 correspond to the radius ratio of 0.50.
Landscapes in each row 4825-4840 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, and 0.50. Landscapes in row 4825 correspond to the radius ratio of 0.01; landscapes in row 4830 correspond to the radius ratio of 0.05; landscapes in row 4835 correspond to the radius ratio of 0.10; and landscapes in row 4840 correspond to the radius ratio of 0.50.
Landscapes in each row 4925-4940 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, and 0.50. Landscapes in row 4925 correspond to the radius ratio of 0.01; landscapes in row 4930 correspond to the radius ratio of 0.05; landscapes in row 4935 correspond to the radius ratio of 0.10; and landscapes in row 4940 correspond to the radius ratio of 0.50.
Landscapes in each row 5135-5150 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, and 0.50. Landscapes in row 5135 correspond to the radius ratio of 0.01; landscapes in row 5140 correspond to the radius ratio of 0.05; landscapes in row 5145 correspond to the radius ratio of 0.10; and landscapes in row 5150 correspond to the radius ratio of 0.50.
Landscapes in each row 5235-5250 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, and 0.50. Landscapes in row 5235 correspond to the radius ratio of 0.01; landscapes in row 5240 correspond to the radius ratio of 0.05; landscapes in row 5245 correspond to the radius ratio of 0.10; and landscapes in row 5250 correspond to the radius ratio of 0.50.
Landscapes in each row 5335-5350 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, and 0.50. Landscapes in row 5335 correspond to the radius ratio of 0.01; landscapes in row 5340 correspond to the radius ratio of 0.05; landscapes in row 5345 correspond to the radius ratio of 0.10; and landscapes in row 5350 correspond to the radius ratio of 0.50.
Landscapes in each row 5435-5450 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, and 0.50. Landscapes in row 5435 correspond to the radius ratio of 0.01; landscapes in row 5440 correspond to the radius ratio of 0.05; landscapes in row 5445 correspond to the radius ratio of 0.10; and landscapes in row 5450 correspond to the radius ratio of 0.50.
Landscapes in each row 5535-5550 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, and 0.50. Landscapes in row 5535 correspond to the radius ratio of 0.01; landscapes in row 5540 correspond to the radius ratio of 0.05; landscapes in row 5545 correspond to the radius ratio of 0.10; and landscapes in row 5550 correspond to the radius ratio of 0.50.
Landscapes in each row 5635-5650 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, and 0.50. Landscapes in row 5635 correspond to the radius ratio of 0.01; landscapes in row 5640 correspond to the radius ratio of 0.05; landscapes in row 5645 correspond to the radius ratio of 0.10; and landscapes in row 5650 correspond to the radius ratio of 0.50.
Landscapes in each row 5735-5750 correspond to a common radius ratio, selected from the set comprised of 0.01, 0.05, 0.10, and 0.50. Landscapes in row 5735 correspond to the radius ratio of 0.01; landscapes in row 5740 correspond to the radius ratio of 0.05; landscapes in row 5745 correspond to the radius ratio of 0.10; and landscapes in row 5750 correspond to the radius ratio of 0.50.
D. Analysis of Results
i. Difference Distributions
In the illustrated embodiments, the HDCN sampling function performs more effectively in discriminating between the ellipses than in discriminating between the bugs. The difference in the effectiveness of HDCN is due in part to the differences in the volume of each color in the models. For the bugs, each color has the same volume for each bug. For the ellipses, each color has a different volume. Ellipse1, Ellipse3, and Ellipse4 have colors with approximately the same proportions of volume, while Ellipse0, Ellipse2, and Ellipse5 have colors with different proportions of volume. As a result, as depicted in
On the other hand, as depicted in
Accordingly, for models that have similar color patterns, color distributions, and overall model volumes, the HDCN sampling function may reflect the similarity of a general pattern between models. However, depending on the models, the choice of the constant color value for the fixed data point B may affect the resultant difference scores. For example, in the illustrated embodiments, if blue were selected as the constant color value, the HDCN sampling function would not detect the differences between the ellipses or between the bugs, as blue is not a color that varies between either type of model.
In cases where the HDCN sampling function does not discriminate effectively between models, the HDEN and MODD sampling functions generally discriminate more effectively. The effectiveness of the HDEN and MODD sampling functions is generally radius ratio dependent. As illustrated by
As illustrated by
Each of the sampling functions described herein may be applied in a variety of circumstances. In some embodiments, the system selects the sampling function that is best suited to the circumstances. For example, among other circumstances, the HDCN sampling function is applicable for comparing a general pattern of nominal attributes according to a suitable constant nominal value. Among other circumstances, the HDEN and MODD sampling functions are applicable to discriminate between complex nominal attributes. In some embodiments, the HDEN and MODD sampling functions achieve improved performance where only the nominal attributes that distinguish the models are included.
ii. Radius Ratios
The HDEN and MODD sampling functions are radius-sensitive, while the HDCN sampling function is not. As illustrated in
As illustrated by
iii. Distribution Test Functions
As illustrated by
iv. Shape Generation, Number of Samples, and Number of Bins
As illustrated by
From the foregoing, it will be appreciated that specific embodiments of the technology have been described herein for purposes of illustration, but that various modifications may be made without deviating from the spirit and scope of the described technology. For example, those skilled in the art will appreciate that a variety of sampling functions, distribution test functions, and/or other equations and/or algorithms other than those described herein may be implemented in accordance with the technology described herein. Those skilled in the art will further appreciate that the depicted flow diagrams may be altered in a variety of ways. For example, the order of the blocks may be rearranged, blocks may be performed in parallel, blocks may be omitted, or other blocks may be included. Accordingly, the described technology is not limited except as by the appended claims.
This application claims priority to and incorporates by reference in its entirety U.S. Provisional Patent Application No. 61/209,972, entitled SYSTEMS AND METHODS FOR CELL-CENTRIC SIMULATION OF BIOLOGICAL EVENTS AND CELL-BASED MODELS PRODUCED THEREFROM, filed on Mar. 11, 2009; and U.S. Provisional Application No. 61/313,074, entitled DISCRIMINATION BETWEEN MULTI-DIMENSIONAL MODELS USING DIFFERENCE DISTRIBUTIONS, filed concurrently herewith (attorney docket no. 43332-8001.US06). In addition, this application claims priority to and incorporates by reference in their entirety copending U.S. patent application Ser. No. 11/234,413, entitled METHOD, SYSTEM AND APPARATUS FOR VIRTUAL MODELING OF BIOLOGICAL TISSUE WITH ADAPTIVE EMERGENT FUNCTIONALITY, filed on Sep. 23, 2005; and copending U.S. patent application Ser. No. 12/554,870, entitled SYSTEMS AND METHODS FOR CELL-CENTRIC SIMULATION OF BIOLOGICAL EVENTS AND CELL-BASED MODELS PRODUCED THEREFROM, filed on Sep. 4, 2009.
The U.S. Government has a paid-up license in this invention and the right in limited circumstances to require the patent owner to license others on reasonable terms as provided for by the terms of Contracts DAMD17-02-2-0049 and W81XWH-08-2-0003 as awarded by the US Army Medical Research Acquisition Activity (USAMRAA).
Number | Date | Country
---|---|---
61209972 | Mar 2009 | US
61313074 | Mar 2010 | US