Research Topics

Research Summary: Our research interests are in developing the principles and practice of adaptive and robust machine learning. Some recent highlights include (i) scalable, distributed, and fault-tolerant machine learning, and (ii) metric elicitation: selecting more effective machine learning metrics via human interaction. Our applied research is primarily in cognitive neuroimaging and biomedical imaging. Some recent highlights include (i) generative models for biological images, and (ii) estimation and analysis of time-varying brain graphs.

Probabilistic Graphical Models for Spatio-temporal Data

Spatio-temporal data are ubiquitous in science and engineering applications. We are pursuing a variety of techniques for modeling such datasets, mainly using probabilistic graphical models and other graph-based analyses. We primarily use these tools to enable the scientific analysis of, and predictive modeling from, brain networks. Of particular interest are novel methods that address issues of confounding and multimodality.

Partially linear additive Gaussian graphical models
Sinong Geng, Minhao Yan, Mladen Kolar, and Sanmi Koyejo.
International Conference on Machine Learning (ICML) pages 2180-2190, 2019
Bayesian structure learning for dynamic brain connectivity
Michael Riis Andersen, Lars Kai Hansen, Ole Winther, Russell A. Poldrack, and Oluwasanmi Koyejo.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2018
The Dynamics of Functional Brain Networks: Integrated Network States during Cognitive Task Performance
J.M. Shine, P.G. Bissett, P.T. Bell, O. Koyejo, J.H. Balsters, K.J. Gorgolewski, C.A. Moodie and R. A. Poldrack
Neuron (2016)
[url] [arXiv] [code]
Temporal metastates are associated with differential patterns of time-resolved connectivity, network topology, and attention
James M. Shine, Oluwasanmi Koyejo, and Russell A. Poldrack
Proceedings of the National Academy of Sciences (2016): 201604898.
[url] [arXiv] [code]

Distributed and Robust Machine Learning

Distributed data centers and devices such as smart cars, smartphones, wearable devices, and smart sensors increasingly collect massive and diverse data. As a result, there is growing interest in training machine learning models jointly across data centers without explicitly sharing data. Along similar lines, there is a trend towards on-device training of machine learning models jointly across edge devices. However, despite some obvious benefits, distributed training (and federated learning) creates new challenges for private and secure machine learning, as distributed devices are more susceptible to new forms of privacy and security attacks. We are developing novel algorithmic and computational approaches to ensure the privacy and security of federated and distributed machine learning.
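
To illustrate why security matters in distributed training, the following minimal sketch compares two ways of combining worker gradients when one worker is faulty or malicious (Byzantine). The gradient values, the function name `aggregate`, and the choice of a coordinate-wise median are assumptions made for illustration only; this is a textbook robust-aggregation rule, not the method of any paper listed in this section, and simple rules of this kind have known failure modes.

```python
from statistics import mean, median

def aggregate(gradients, reducer):
    """Combine per-worker gradient vectors coordinate by coordinate."""
    return [reducer(coords) for coords in zip(*gradients)]

# Toy data (made up for illustration): four honest workers report similar
# gradients, and one Byzantine worker reports an arbitrary vector.
honest = [[1.0, -2.0], [1.1, -1.9], [0.9, -2.1], [1.0, -2.0]]
byzantine = [[1000.0, 1000.0]]
gradients = honest + byzantine

mean_update = aggregate(gradients, mean)      # dragged far away by the outlier
median_update = aggregate(gradients, median)  # stays close to the honest workers

print("mean:  ", mean_update)
print("median:", median_update)
```

A single corrupted worker is enough to move the averaged update arbitrarily far, while the coordinate-wise median stays within the range of the honest values in this toy case.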

Zeno: Robust Asynchronous SGD with an Arbitrary Number of Byzantine Workers
Cong Xie, Sanmi Koyejo, and Indranil Gupta.
Asynchronous Federated Optimization
Cong Xie, Sanmi Koyejo, and Indranil Gupta.
Fall of empires: Breaking Byzantine-tolerant SGD by inner product manipulation
Cong Xie, Sanmi Koyejo, and Indranil Gupta.
Conference on Uncertainty in Artificial Intelligence (UAI), 2019
Practical distributed learning: Secure machine learning with communication-efficient local updates
Cong Xie, Oluwasanmi Koyejo, and Indranil Gupta.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD)

Metric Elicitation

Given a learning problem with real-world tradeoffs, which metric (equivalently, cost function or loss function) should the model be trained to optimize? Selecting a suitable metric for real-world applications of machine learning remains an open problem, as default metrics such as classification accuracy often do not capture the tradeoffs relevant to downstream decision-making. Unfortunately, there is limited formal guidance in the machine learning literature on how to select appropriate metrics. We are developing formal interactive strategies by which a practitioner may discover which metric to optimize, so that the chosen metric reflects user or expert preferences.
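
The idea of recovering a metric from interaction can be sketched on a toy problem. The example below assumes, purely for illustration, that the expert's hidden metric is linear in the true positive rate and true negative rate, that achievable classifiers lie on a quarter-circle tradeoff curve, and that the expert is simulated by a function answering pairwise comparisons. All names (`rates`, `make_oracle`, `elicit`) are hypothetical, and this is a simplified sketch rather than the algorithm from the papers listed below.

```python
import math

def rates(theta):
    """Toy tradeoff curve: (TPR, TNR) of the classifier indexed by theta."""
    return math.cos(theta), math.sin(theta)

def make_oracle(theta_star):
    """Simulated expert with hidden metric a*TPR + b*TNR."""
    a, b = math.cos(theta_star), math.sin(theta_star)

    def prefers_first(theta_1, theta_2):
        tpr1, tnr1 = rates(theta_1)
        tpr2, tnr2 = rates(theta_2)
        return a * tpr1 + b * tnr1 > a * tpr2 + b * tnr2

    return prefers_first

def elicit(prefers_first, num_queries=40):
    """Ternary search using only pairwise comparisons between classifiers."""
    lo, hi = 0.0, math.pi / 2
    for _ in range(num_queries):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if prefers_first(m1, m2):
            hi = m2  # the preferred classifier lies left of m2
        else:
            lo = m1  # the preferred classifier lies right of m1
    theta = (lo + hi) / 2
    # On this curve, the best classifier's rates equal the metric weights.
    return math.cos(theta), math.sin(theta)

hidden_theta = 0.9  # unknown to the elicitation procedure
a_hat, b_hat = elicit(make_oracle(hidden_theta))
print("estimated weights:", a_hat, b_hat)
```

The search works here because the hidden metric is unimodal along the toy curve, so each comparison discards a third of the remaining candidates. The expert is never asked to state numerical weights, only to compare two classifiers.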

Multiclass Performance Metric Elicitation
Gaurush Hiranandani, Shant Boodaghians, Ruta Mehta, and Oluwasanmi Koyejo.
Neural Information Processing Systems (NeurIPS), 2019
Performance metric elicitation from pairwise classifier comparisons
Gaurush Hiranandani, Shant Boodaghians, Ruta Mehta, and Oluwasanmi Koyejo.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2019

Generative Models for Biological Images

Data in scientific and commercial disciplines are increasingly characterized by high dimensions and relatively few samples. For such cases, a priori knowledge gleaned from experts and experimental evidence is invaluable for recovering meaningful models. Generative models are ideal for such knowledge-driven, low-data settings. We are developing a variety of generative models for biological imaging data and exploring novel applications of these models. We are also developing novel variational inference techniques that lead to scalable and accurate inference, particularly for high-dimensional structured problems.

Synthetic Power Analyses: Empirical Evaluation and Application to Cognitive Neuroimaging
Peiye Zhuang, Bliss Chapman, Ran Li, and Sanmi Koyejo
Asilomar Conference on Signals, Systems, and Computers (Asilomar), 2019
FMRI data augmentation via synthesis
Peiye Zhuang, Alexander Schwing, and Oluwasanmi Koyejo
International Symposium on Biomedical Imaging (ISBI), 2019
Max-sliced Wasserstein distance and its use for GANs
Ishan Deshpande, Yuan-Ting Hu, Ruoyu Sun, Ayis Pyrros, Sanmi Koyejo, Zhizhen Zhao, David Forsyth, and Alexander Schwing.
Conference on Computer Vision and Pattern Recognition (CVPR), 2019
On prior distributions and approximate inference for structured variables
Oluwasanmi Koyejo, Rajiv Khanna, Joydeep Ghosh, and Russell A Poldrack
Advances in Neural Information Processing Systems (NIPS) 27, 2014

Learning with Complex Metrics

Real-world machine learning often requires complex evaluation metrics, many of which are non-decomposable (e.g., AUC, F-measure), in contrast to decomposable metrics such as accuracy, which can be computed as an empirical average over examples. Indeed, non-decomposability is the primary source of difficulty in designing efficient algorithms that can optimize complex metrics. We study predictive methods from first principles and derive novel, efficient, and statistically consistent algorithms that result in improved empirical performance.
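
The distinction between decomposable and non-decomposable metrics can be checked numerically. The following minimal sketch uses made-up labels and predictions (an assumption for illustration) and splits them into two equal halves: accuracy on the full data equals the average of the per-half accuracies, whereas the F-measure does not.

```python
def accuracy(y_true, y_pred):
    """Fraction of correct predictions: an average of per-example terms."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """F-measure: a ratio of counts, not an average of per-example terms."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy labels and predictions (made up for illustration).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
halves = [(y_true[:4], y_pred[:4]), (y_true[4:], y_pred[4:])]

acc_full = accuracy(y_true, y_pred)
acc_avg = sum(accuracy(t, p) for t, p in halves) / len(halves)

f1_full = f1_score(y_true, y_pred)
f1_avg = sum(f1_score(t, p) for t, p in halves) / len(halves)

print("accuracy: full =", acc_full, " average of halves =", acc_avg)
print("F1:       full =", f1_full, " average of halves =", f1_avg)
```

Because the F-measure cannot be written as a sum of per-example losses, standard tools that rely on such a sum, such as minibatch averaging, do not apply to it directly.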

On the Consistency of Top-k Surrogate Losses
Forest Yang and Sanmi Koyejo.
Binary Classification with Karmic, Threshold-Quasi-Concave Metrics
Bowei Yan, Oluwasanmi Koyejo, Kai Zhong and Pradeep Ravikumar.
International Conference on Machine Learning (ICML), 2018
Consistency Analysis for Binary Classification Revisited
Krzysztof Dembczynski, Wojciech Kotlowski, Oluwasanmi Koyejo, and Nagarajan Natarajan.
International Conference on Machine Learning (ICML), 2017
Consistent binary classification with generalized performance metrics
Oluwasanmi Koyejo*, Nagarajan Natarajan*, Pradeep Ravikumar, and Inderjit Dhillon
Advances in Neural Information Processing Systems (NIPS) 27, 2014 (Spotlight)

Learning with Aggregated Data

Existing work in spatiotemporal data analysis often assumes that data are available as individual measurements. However, for many applications in econometrics, financial forecasting, and healthcare, data are often available only as aggregates. Data aggregation presents severe mathematical challenges to learning and inference, and a naive application of standard techniques is susceptible to the ecological fallacy. We have shown that in some cases, the aggregation has only a mild effect on model estimates. For other cases, we are developing a variety of tools that enable provably accurate predictive modeling with aggregated data while avoiding unnecessary and error-prone data reconstruction.
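
The ecological fallacy mentioned above can be reproduced on a small synthetic dataset. In the sketch below, the data, the group structure, and the helper `slope` are all assumptions made for illustration: within every group the individual-level relationship between x and y is negative, yet a regression fit naively to the group averages suggests a positive relationship.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Synthetic individual-level data in three groups (made up for illustration).
groups = {
    "A": ([1, 2, 3], [3, 2, 1]),
    "B": ([4, 5, 6], [6, 5, 4]),
    "C": ([7, 8, 9], [9, 8, 7]),
}

# Individual-level trend inside each group.
within_slopes = {name: slope(xs, ys) for name, (xs, ys) in groups.items()}

# Trend estimated from aggregates only (one mean per group).
mean_x = [sum(xs) / len(xs) for xs, _ in groups.values()]
mean_y = [sum(ys) / len(ys) for _, ys in groups.values()]
aggregate_slope = slope(mean_x, mean_y)

print("within-group slopes:", within_slopes)
print("slope from group means:", aggregate_slope)
```

Conclusions drawn from the aggregated slope would have the wrong sign for every individual group, which is why aggregated data call for methods designed with the aggregation process in mind.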

Frequency Domain Predictive Modeling with Aggregated Data
Avradeep Bhowmik, Joydeep Ghosh, Oluwasanmi Koyejo
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), 2017
Sparse parameter recovery from aggregated data
Avradeep Bhowmik, Joydeep Ghosh, and Oluwasanmi Koyejo
International Conference on Machine Learning (ICML), 2016

Funding: We gratefully acknowledge generous funding support from the National Science Foundation, National Institutes of Health, Google AI, DARPA, Jump ARCHES, CCBGM, Onmilife, and the Mayo Clinic & Illinois Alliance. Our research is also supported by generous computing support from Microsoft Azure, Intel AI, Amazon Web Services, Google Cloud, and NCSA Blue Waters. We have also received funding from the Olga G. Nalbandov Lecture Fund to support our outreach efforts.