III: Small: Collaborative Research: Cost-Efficient Sampling and Estimation from Large-Scale Networks

Information

NSF Award
1908375

Owner

Florida Institute of Technology, Inc.

Award Id
1908375
Award Effective Date
10/1/2019
Award Expiration Date
9/30/2022
Award Amount
$ 249,999.00
Award Instrument
Standard Grant

Sampling and estimating structural information from large-scale networks or graphs has been central to our understanding of network dynamics and its rich set of applications. Markov Chain Monte Carlo (MCMC) has been the key enabler for a broad range of graph sampling tasks, including estimating the properties of large graphs, sampling the corpus of documents indexed by search engines, sampling records from hidden databases behind Web forms, identifying subgraphs with certain characteristics, and frequent graph pattern matching. Despite the versatile applications of MCMC methods and their customized algorithms for analyzing graph-structured data in various forms, critical challenges and limitations remain in the literature centered on MCMC methods. One is the 'cost' consumption and constraints associated with the sampling operation, which limit the total number of samples obtained and negatively affect the accuracy of any estimator based on those samples. Another limitation is that recent advances in MCMC, especially those built upon favorable non-reversible Markov chains, cannot be leveraged for the various large-graph sampling tasks, due to their required global knowledge of the underlying state space, lack of distributed implementation, unconstrained state space, as well as the simplified cost assumption. The goal of this research is to fully exploit the potential of a set of crawling samplers by making the samplers adaptive and possibly interactive on a properly constructed graph domain, to transcend the current status quo in a wide range of graph sampling tasks.
Specifically, the project aims to: (i) build a theoretical framework to construct a suite of cost-efficient sampling policies by optimally balancing the tradeoff between sample quality and quantity under challenged access environments with a given cost budget, (ii) design a class of adaptive random walks by fully exploiting past information to achieve minimal temporal correlations over the obtained samples and by controlling the random walks collectively to enable maximal space exploration, and (iii) extend the standard MCMC toolkits toward faster and more cost-efficient exploration of feasible subgraphs/configurations and computing/optimization on a graph, along with extensive validation to create practical and usable solutions. This research has a high potential impact on a vast range of multi-disciplinary applications, including sampling large-scale graphs for statistical inference and efficient estimation, and randomized algorithms for combinatorial optimization in various disciplines, where the standard MCMC methods have been dominant but have also constrained our understanding. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
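
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a standard MCMC crawling sampler: a Metropolis-Hastings random walk that targets the uniform distribution over nodes and uses the visited nodes to estimate the average degree. It is a textbook technique, not the project's method. The toy graph `adj`, the function names, and the assumption that each walk step costs one unit of a fixed `budget` are all illustrative.

```python
import random

# Hypothetical toy graph as an adjacency list (undirected, connected).
# Degrees are 4, 2, 2, 1, 1, so the true average degree is 2.0.
adj = {
    0: [1, 2, 3, 4],
    1: [0, 2],
    2: [0, 1],
    3: [0],
    4: [0],
}

def mh_random_walk(adj, start, budget, rng):
    """Metropolis-Hastings random walk targeting the uniform node distribution.

    Assumption: each step costs one unit, so `budget` is the number of steps.
    Only local information (the neighbor lists of the current and the
    proposed node) is used, as in a crawling sampler.
    """
    current = start
    samples = []
    for _ in range(budget):
        proposal = rng.choice(adj[current])
        # Accept with probability min(1, deg(current) / deg(proposal)),
        # which removes the bias of a simple random walk toward hubs.
        accept_prob = min(1.0, len(adj[current]) / len(adj[proposal]))
        if rng.random() < accept_prob:
            current = proposal
        samples.append(current)
    return samples

def estimate_average_degree(adj, samples):
    """Plain sample mean of node degrees over the (uniform-target) samples."""
    return sum(len(adj[v]) for v in samples) / len(samples)

if __name__ == "__main__":
    rng = random.Random(0)
    samples = mh_random_walk(adj, start=0, budget=20000, rng=rng)
    print(estimate_average_degree(adj, samples))
```

Consecutive samples from such a walk are correlated, and rejected proposals still consume budget. This is the kind of quality-versus-quantity tradeoff under a cost constraint that the abstract describes.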

Program Officer
Sylvia Spengler
Min Amd Letter Date
9/7/2019
Max Amd Letter Date
9/7/2019
ARRA Amount

Institutions

Name
Florida Institute of Technology
City
MELBOURNE
State
FL
Country
United States
Address
150 W UNIVERSITY BLVD
Postal Code
32901-6975
Phone Number
3216748000

Investigators

First Name
Chul-Ho
Last Name
Lee
Email Address
clee@fit.edu
Start Date
9/7/2019 12:00:00 AM

Program Element

Text
Info Integration & Informatics
Code
7364

Program Reference

Text
INFO INTEGRATION & INFORMATICS
Code
7364

Text
SMALL PROJECT
Code
7923
