Cluster analysis aims to discover patterns and homogeneity in data. It reveals subgroups within a study population, and it has applications in many fields. For example, in psychology, the clusters can be groups of patients who may benefit from a specific set of treatments. Many clustering techniques require the data to have certain characteristics: some, for example, require continuous variables, and the data often need preprocessing before the analysis can be applied. This project will develop a series of new clustering techniques suited to challenging data sets, such as those that are high-dimensional, contain missing values or outliers, or include non-continuous variables, without requiring preprocessing. Novel statistical approaches and software packages will be produced and made available to general users. Undergraduate students will be directly involved in the research project and, together with graduate students, will be trained to conduct research in data analysis. Many more students will be involved through class projects, and the research outcomes will enrich the content of several courses offered at the institution.<br/><br/>A widely used approach to cluster analysis is model-based clustering. It assumes that a population is a mixture of subpopulations, each of which can be represented by a density function. A variety of clustering methods and algorithms exist; however, they still have several limitations. Outliers and missing data can distort the clustering results, and the large number of parameters makes the techniques impractical for high-dimensional data sets. Moreover, many algorithms assume continuous data and are not readily adaptable to discrete, binary, or categorical data, or to a mixture of continuous and categorical variables. This is a major limitation because, in many fields, such as medicine, biology, and marketing, the data exhibit all of these characteristics.
In this project, new clustering techniques based on non-Gaussian model-based clustering will be developed to overcome the limitations of current methods with respect to cluster shape, outliers, missing data, dimensionality, and data type. The novel methods will provide greater flexibility in detecting skewed clusters and greater robustness to outliers and missing data. Implicit and explicit dimension reduction techniques will be used to handle high-dimensional data, and latent class models will be adopted to deal with mixed-type data. The project will also include a study of indices for selecting the number of clusters, as well as a thorough comparison with existing methods on real and simulated data, giving users guidelines on which model to use based on the goals of their analysis and the challenges in their data.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.