This Small Business Innovation Research (SBIR) Phase I project addresses the problem of learning predictive models of individual choice behavior from sparse information on the behavior of any single individual. The intellectual merit of the project lies in developing a novel, parsimonious view of this problem by modeling choice behavior as a distribution over permutations of alternatives, and in making this view implementable at scale. A unit of data in this paradigm is a single comparison between two alternatives. Data of this sort can be derived in a variety of contexts ranging from product reviews to transaction data. Although this modeling viewpoint is parsimonious, computing exactly with such models, or even representing them, is intractable. The project will focus on developing approximate solutions that, in the spirit of recent advances in high-dimensional statistics, exploit the potential of sparse approximations to such models. Given the vast quantities of data available to build such models, it will be important for the algorithms developed to be amenable to parallelization in a manner reminiscent of the Map/Reduce computational paradigm. The algorithms developed will fit this paradigm, with key algorithmic steps decomposing across data collected for a single individual. In summary, this project will develop a massively parallelizable approach to modeling individual choice behavior using unstructured data from a variety of sources.

The broader impact/commercial potential of this project rests in enabling the emerging, all-pervasive transition from 'search' to 'discovery'. This transition can be witnessed in sectors ranging from e-commerce to offline retail to the matching of impressions to advertisers on demand-side platforms. The key stumbling block in this transition is the seeming requirement to build attribute-rich models for a given context, as opposed to taking a black-box approach. The approach taken in this project is of the latter variety. As a concrete example, the task of merchandising requires an offline retailer to decide on the right assortment of products to carry in segments ranging from toothpaste to clothing; the approach here will power such decision making in an entirely data-driven fashion. In a different direction, serving ads based on models that capture a surfer's preferences across the various silos of products and topics on the web can be enabled at scale and with incredible granularity using the approach here. The level of granularity made possible by the approach here cannot be achieved with 'parametric', attribute-driven approaches. In summary, the tools developed in this project have the potential to do for 'discovery' what the PageRank algorithm did for search.