Peter Carbonetto, Ph.D.
Research Assistant Professor
Dept. of Human Genetics
University of Chicago

pcarbo -at- uchicago -dot- edu

I'm a Research Assistant Professor at the University of Chicago. Previously, I was a Staff Scientist at Ancestry; a postdoc and HFSP fellow working with Matthew Stephens and Abraham Palmer in the Dept. of Human Genetics at the University of Chicago; and a graduate student with Nando de Freitas in the Laboratory for Computational Intelligence at UBC. My work focuses on developing new quantitative approaches to advance the study of global genetic variation in health and disease.


Mailing address:
CLSC 414 / 920 E 58th St / Chicago, IL 60637

GitHub | Research Network Profile | Google Scholar | LinkedIn

Publications

Eric Weine, Peter Carbonetto and Matthew Stephens. Accelerated dimensionality reduction of single-cell RNA sequencing data with fastglmpca. Bioinformatics, volume 40, August 2024. R package

Yusha Liu, Peter Carbonetto, Michihiro Takahama, Adam Gruenbaum, Dongyue Xie, Nicolas Chevrier and Matthew Stephens. A flexible model for correlated count data, with application to multicondition differential expression analyses of single-cell RNA sequencing data. Annals of Applied Statistics, volume 18, September 2024. R package

Youngseok Kim, Wei Wang, Peter Carbonetto and Matthew Stephens. A flexible empirical Bayes approach to multiple linear regression and connections with penalized regression. Journal of Machine Learning Research, volume 25, 2024. R package

Michihiro Takahama, Ashwini Patil, Gabriella Richey, Denis Cipurko, Katherine Johnson, Peter Carbonetto, Madison Plaster, Surya Pandey, Katerina Cheronis, Tatsuki Ueda, Adam Gruenbaum, Tadafumi Kawamoto, Matthew Stephens and Nicolas Chevrier. A pairwise cytokine code explains the organism-wide response to sepsis. Nature Immunology, volume 25, January 2024.

Peter Carbonetto, Kaixuan Luo, Abhishek Sarkar, Anthony Hung, Karl Tayeb, Sebastian Pott and Matthew Stephens. GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership. Genome Biology, volume 24, October 2023. R package

Fabio Morgante, Peter Carbonetto, Gao Wang, Yuxin Zou, Abhishek Sarkar and Matthew Stephens. A flexible empirical Bayes approach to multivariate multiple regression, and its improved accuracy in predicting multi-tissue gene expression from genotypes. PLoS Genetics, volume 19, July 2023. R package | code & data

Yuxin Zou, Peter Carbonetto, Gao Wang and Matthew Stephens. Fine-mapping from summary data with the "Sum of Single Effects" model. PLoS Genetics, volume 18, July 2022. R package | code & data

Selene M. Clay, Nathan Schoettler, Andrew M. Goldstein, Peter Carbonetto, Matthew Dapas, Matthew C. Altman, Mario G. Rosasco, James E. Gern, Daniel J. Jackson, Hae Kyung Im, Matthew Stephens, Dan L. Nicolae and Carole Ober. Fine-mapping studies distinguish genetic risks for childhood- and adult-onset asthma in the HLA region. Genome Medicine 14:55, May 2022.

Zhengrong Xing, Peter Carbonetto and Matthew Stephens. Flexible signal denoising via flexible empirical Bayes shrinkage. Journal of Machine Learning Research 22(93), pages 1-28, June 2021. R package | code & data

Gao Wang, Abhishek Sarkar, Peter Carbonetto and Matthew Stephens. A simple new approach to variable selection in regression, with application to genetic fine mapping. Journal of the Royal Statistical Society, Series B volume 82, pages 1273-1300, July 2020. R package | code & data

Youngseok Kim, Peter Carbonetto, Matthew Stephens and Mihai Anitescu. A fast algorithm for maximum likelihood estimation of mixture proportions using sequential quadratic programming. Journal of Computational and Graphical Statistics volume 29, pages 261-273, 2020. R package | code & data

John D. Blischak, Peter Carbonetto and Matthew Stephens. Creating and sharing reproducible research code the workflowr way. F1000Research 8:1749, 2019. R package

Luís Felipe Ventorim Ferrão, Romário Gava Ferrão, Maria Amélia Gava Ferrão, Aymbiré Fonseca, Peter Carbonetto, Matthew Stephens and Antonio Augusto Franco Garcia. Accurate genomic prediction of Coffea canephora in multiple environments using whole-genome statistical models. Heredity volume 122, pages 261-275, March 2019.

Sarah Urbut, Gao Wang, Peter Carbonetto and Matthew Stephens. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nature Genetics volume 51, pages 187-195, January 2019. R package | code & data

Ana I. Hernandez Cordero, Peter Carbonetto, Gioia Riboni Verri, Jennifer Gregory, David Vandenbergh, Joe Gyekis, David Blizard and Arimantas Lionikas. Replication and discovery of musculoskeletal QTLs in LG/J and SM/J advanced intercross lines. Physiological Reports 6: e13561, February 2018.

Peter Carbonetto, Xiang Zhou and Matthew Stephens. varbvs: fast variable selection for large-scale regression. arXiv:1709.06597. code

Eunjung Han*, Peter Carbonetto*, Ross Curtis, Yong Wang, Julie Granka, Jake Byrnes, Keith Noto, Amir Kermany, Natalie Myres, Mathew Barber, Kristin Rand, Shiya Song, Theodore Roman, Erin Battat, Eyal Elyashiv, Harendra Guturu, Eurie Hong, Kenneth Chahine and Catherine Ball. Clustering of 770 thousand genomes reveals post-colonial population structure of North America. Nature Communications 8: 14238, February 2017. (* indicates shared first authorship)

Laura Sittig, Peter Carbonetto, Kyle Engel, Kathleen Krauss, Camila Barrios-Camacho and Abraham Palmer. Genetic background limits generalizability of genotype-phenotype relationships. Neuron volume 91, pages 1253-1259, September 2016. perspective | code & data

Clarissa Parker*, Shyam Gopalakrishnan*, Peter Carbonetto*, Natalia Gonzales, Emily Leung, Yeonhee Park, Emmanuel Aryee, Joe Davis, David Blizard, Cheryl Ackert-Bicknell, Arimantas Lionikas, Jonathan Pritchard and Abraham Palmer. Genome-wide association study of behavioral, physiological and gene expression traits in outbred CFW mice. Nature Genetics volume 48, pages 919-926, August 2016. (* indicates shared first authorship) code | data

Laura Sittig, Peter Carbonetto, Kyle Engel, Kate Krauss and Abraham Palmer. Integration of genome-wide association and extant brain expression QTL identifies candidate genes influencing prepulse inhibition in inbred F1 mice. Genes, Brain and Behavior, volume 15, pages 260-270, February 2016. code and data

Luisa Pallares, Peter Carbonetto, Shyam Gopalakrishnan, Clarissa Parker, Cheryl Ackert-Bicknell, Abraham Palmer and Diethard Tautz. Mapping of craniofacial traits in outbred mice identifies major developmental genes involved in shape determination. PLoS Genetics, volume 11, November 2015. code and data

Clarissa Parker*, Peter Carbonetto*, Greta Sokoloff, Yeonhee Park, Mark Abney and Abraham Palmer. High-resolution genetic mapping of complex traits from a combined analysis of F2 and advanced intercross mice. Genetics, volume 198, pages 103-116, September 2014. (* indicates shared first authorship) code

Peter Carbonetto, Riyan Cheng, Joseph Gyekis, Clarissa Parker, David Blizard, Abraham Palmer and Arimantas Lionikas. Discovery and refinement of muscle weight QTLs in B6 x D2 advanced intercross mice. Physiological Genomics, volume 46, pages 571-582, August 2014. code

Peter Carbonetto and Matthew Stephens. Integrated enrichment analysis of variants and pathways in genome-wide association studies indicates central role for IL-2 signaling genes in type 1 diabetes, and cytokine signaling genes in Crohn's disease. PLoS Genetics, volume 9, October 2013. Pubmed | HFSP article | code

Xiang Zhou, Peter Carbonetto and Matthew Stephens. Polygenic modeling with Bayesian sparse linear mixed models. PLoS Genetics, volume 9, February 2013.

Peter Carbonetto and Matthew Stephens. Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies. Bayesian Analysis, volume 7, March 2012, pages 73-108. code

Matthew Hoffman, Peter Carbonetto, Nando de Freitas and Arnaud Doucet. Inference strategies for solving SMDPs. NIPS Workshop on Probabilistic Approaches for Robotics and Control, December 2009.

Peter Carbonetto, Matthew King and Firas Hamze. A stochastic approximation method for inference in probabilistic graphical models. Neural Information Processing Systems 23, December 2009.

Peter Carbonetto, Mark Schmidt and Nando de Freitas. An interior-point stochastic approximation method and an L1-regularized delta rule. Neural Information Processing Systems 22, December 2008. (Note: the proof of asymptotic convergence that was originally published as an appendix in the original paper has a major flaw; the convergence proof remains an open question.) slides | code

Peter Carbonetto, Gyuri Dorkò, Cordelia Schmid, Hendrik Kück and Nando de Freitas. Learning to recognize objects with little supervision. International Journal of Computer Vision, volume 77, May 2008, pages 219-237.

Peter Carbonetto and Nando de Freitas. Conditional mean field. Neural Information Processing Systems 19, December 2006, pages 201-208.

Peter Carbonetto, Jacek Kisynski, Nando de Freitas and David Poole. Nonparametric Bayesian Logic. 21st Conference on Uncertainty in Artificial Intelligence, July 2005, pages 85-93. This revision corrects a mistake in Fig. 5.

Peter Carbonetto, Gyuri Dorkò and Cordelia Schmid. Bayesian learning for weakly supervised object classification. Technical Report, INRIA Rhône-Alpes, July 2004.

Peter Carbonetto, Nando de Freitas and Kobus Barnard. A Statistical Model for General Contextual Object Recognition. 8th European Conference on Computer Vision, May 2004, part I, pages 350-362.

Hendrik Kück, Peter Carbonetto and Nando de Freitas. A Constrained Semi-Supervised Learning Approach to Data Association. 8th European Conference on Computer Vision, May 2004, part III, pages 1-12.

Peter Carbonetto and Nando de Freitas. Why can't José read? The problem of learning semantic associations in a robot environment. Human Language Technology Conference Workshop on Learning Word Meaning from Non-Linguistic Data, June 2003.

Peter Carbonetto, Nando de Freitas, Paul Gustafson and Natalie Thompson. Bayesian feature weighting for unsupervised learning, with application to object recognition. Workshop on Artificial Intelligence and Statistics, January 2003.

Patents

Eunjung Han, Ross E. Curtis and Peter Carbonetto. Discovering population structure from patterns of identity-by-descent, April 10, 2018, U.S. Patent 9,940,433.

Theses

New probabilistic inference algorithms that harness the strengths of variational and Monte Carlo methods. Ph.D. thesis, University of British Columbia, August 2009.

Unsupervised Statistical Models for General Object Recognition. Masters thesis, University of British Columbia, August 2003.

Code

Admixture. A simple EM implementation of the ADMIXTURE model in R, plus extensions.

Variational inference for Bayesian variable selection in MATLAB and R. Companion code to my Bayesian Analysis (2012) paper. Includes routines for computing variational estimates of posterior statistics, and demonstrates how to run the full variational inference procedure for Bayesian variable selection in linear and logistic regression.

MATLAB code for on-line L1 regularization. Companion code to my research paper appearing at the 2008 NIPS conference (see below for data). Includes MATLAB functions for learning linear regressors and classifiers subject to L1 regularization, which acts as a form of feature selection. L1-regularized linear regression is also known in the statistics community as the LASSO. The software package includes implementations of both batch learning and on-line learning, in which the model parameters are adjusted at each iteration by looking at only a single training example. This software is licensed under the CC-GNU GPL version 2.0 or later.
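
For readers new to the idea, here is a minimal Python sketch of on-line learning with an L1 penalty. It is not the interior-point stochastic approximation method from the paper, and it is not a port of the MATLAB package. It uses a simpler, well-known alternative: a stochastic gradient step on one training example followed by soft-thresholding. The function names, the toy data and the default step size and penalty are all assumptions made for illustration.

```python
import random


def soft_threshold(value, amount):
    """Shrink value toward zero by amount, stopping at zero."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def online_lasso(examples, num_features, penalty=0.1, step=0.05,
                 passes=20, seed=0):
    """Fit an L1-penalized linear regressor, one training example at a time.

    examples is a list of (features, target) pairs. This is a plain
    stochastic proximal-gradient update (an assumption of this sketch),
    not the interior-point method described in the NIPS paper.
    """
    rng = random.Random(seed)
    weights = [0.0] * num_features
    order = list(range(len(examples)))
    for _ in range(passes):
        rng.shuffle(order)
        for i in order:
            features, target = examples[i]
            prediction = sum(w * x for w, x in zip(weights, features))
            residual = prediction - target
            # Gradient step on the squared error of this single example,
            # then soft-thresholding, which applies the L1 penalty.
            weights = [
                soft_threshold(w - step * residual * x, step * penalty)
                for w, x in zip(weights, features)
            ]
    return weights


def make_toy_data(num_examples=200, seed=1):
    """Hypothetical toy data: only the first of three features matters."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_examples):
        x = [rng.gauss(0, 1) for _ in range(3)]
        y = 2.0 * x[0] + rng.gauss(0, 0.1)
        data.append((x, y))
    return data
```

Calling online_lasso(make_toy_data(), 3) should give a first weight close to 2 (shrunk slightly by the penalty) and keep the other two weights near zero, which is the feature-selection effect described above.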

Semi-supervised classification using a Bayesian kernel machine and data association constraints. Matlab implementation of the MCMC algorithms for simulating the Bayesian data association models described in the ECCV 2004 paper and the INRIA tech report (the data association model with hard group constraints), and Learning to classify individuals based on group statistics by Kuck and de Freitas (data association with group statistics). For a much more stable implementation in C, go here.

Gaussian belief propagation. Matlab code for running belief propagation on Gaussian Markov random fields.
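
As a rough illustration of what such code computes, the following Python sketch runs scalar Gaussian belief propagation in information form. It is not a translation of the Matlab code. It assumes the model is given as a symmetric precision matrix J and a potential vector h; these names and the function name are chosen for this sketch only. On tree-structured graphs the resulting means and variances are exact. On graphs with cycles the iteration may fail to converge, and when it does converge the variances are in general only approximate.

```python
def gaussian_bp(J, h, num_iters=50):
    """Marginal means and variances of a Gaussian Markov random field.

    J is a symmetric precision matrix (list of lists) and h is the
    potential vector, so the density is proportional to
    exp(-x'Jx/2 + h'x). Both names are assumptions of this sketch.
    """
    n = len(h)
    neighbors = [[j for j in range(n) if j != i and J[i][j] != 0.0]
                 for i in range(n)]
    edges = [(i, j) for i in range(n) for j in neighbors[i]]
    # One message per directed edge, stored as (precision, potential).
    prec_msg = {e: 0.0 for e in edges}
    pot_msg = {e: 0.0 for e in edges}
    for _ in range(num_iters):
        new_prec, new_pot = {}, {}
        for i, j in edges:
            # Combine node i's own terms with the messages coming into i
            # from every neighbor except the recipient j.
            prec = J[i][i] + sum(prec_msg[(k, i)]
                                 for k in neighbors[i] if k != j)
            pot = h[i] + sum(pot_msg[(k, i)]
                             for k in neighbors[i] if k != j)
            new_prec[(i, j)] = -J[i][j] ** 2 / prec
            new_pot[(i, j)] = -J[i][j] * pot / prec
        prec_msg, pot_msg = new_prec, new_pot
    means, variances = [], []
    for i in range(n):
        prec = J[i][i] + sum(prec_msg[(k, i)] for k in neighbors[i])
        pot = h[i] + sum(pot_msg[(k, i)] for k in neighbors[i])
        means.append(pot / prec)
        variances.append(1.0 / prec)
    return means, variances
```

For a three-node chain with J = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]] and h = [1, 0, 1], the exact solution of J * mean = h is a mean of 1 at every node, and the sketch can be checked against it.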

Image Translation. Matlab package for generic object recognition using statistical translation models. See my Masters thesis for more information.

Feature Weighting using Shrinkage Priors. Matlab code for running EM on a mixture of Gaussians with Bayesian feature weighting priors. Used for the paper Bayesian feature weighting for unsupervised learning.

Multiple dispatch. An implementation of multiple dispatch in Java using the ELIDE framework. See here for the project report.

Data

TREC2005. Spam filtering data in MATLAB format. Used to evaluate my on-line logistic regression learning algorithm in the paper An interior-point stochastic approximation method and an L1-regularized delta rule. This data set was originally created by Gordon Cormack and Thomas Lynam as part of the 2005 TREC Spam Filter Evaluation Tool Kit, and contains data from 92,189 emails. The open source software SpamBayes was used to extract features from the emails. By downloading and using this data, you accept the terms of agreement for use of the 2005 TREC public spam corpus.

Corel. Object recognition data used for my Masters thesis and the paper A Statistical Model for General Contextual Object Recognition. Contains manual segmentations for evaluation and extracted features. The Image Translation package contains code for reading the data into Matlab.

Robomedia. Object recognition data used for the Why can't José read? paper. Contains manual segmentations for evaluation and extracted features. The Image Translation package contains code for reading the data into Matlab.

Face detection. Training data for robust object detection using the AdaBoost algorithm, as formalized by Viola and Jones. Includes Matlab code for reading the data. The project report is available here.

Other work

MATLAB interface for PARDISO. PARDISO is a publicly available software library for solving large, sparse linear systems. It is particularly useful as a subroutine for interior-point methods. I designed a small interface so that the PARDISO solver is easily incorporated into your MATLAB programs.

MATLAB class for limited-memory BFGS. This little MATLAB class I wrote encapsulates all the functionality of limited-memory quasi-Newton methods. It is particularly well-suited for solving constrained optimization problems; I illustrate how it is used within a primal-dual interior-point method for solving a constrained optimization problem that arises in maximum likelihood estimation. See here for more details on installing and using this software.
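
The core of any limited-memory BFGS implementation is the two-loop recursion, which applies an approximate inverse Hessian to the gradient using only a few stored update pairs. The Python sketch below shows that recursion inside a basic unconstrained minimizer with a backtracking line search. It is an illustration under simplifying assumptions, not the interface of the MATLAB class, and it leaves out the interior-point machinery mentioned above. All function and parameter names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lbfgs_direction(grad, s_list, y_list):
    """Two-loop recursion: approximate -H * grad from stored (s, y) pairs."""
    q = list(grad)
    coeffs = []
    # First loop: newest pair to oldest.
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        coeffs.append((rho, alpha))
    # Scale by an initial inverse-Hessian guess from the newest pair.
    if s_list:
        gamma = dot(s_list[-1], y_list[-1]) / dot(y_list[-1], y_list[-1])
        q = [gamma * qi for qi in q]
    # Second loop: oldest pair to newest.
    for (s, y), (rho, alpha) in zip(zip(s_list, y_list), reversed(coeffs)):
        beta = rho * dot(y, q)
        q = [qi + (alpha - beta) * si for qi, si in zip(q, s)]
    return [-qi for qi in q]


def lbfgs_minimize(f, grad_f, x0, memory=5, max_iter=100, tol=1e-8):
    """Minimize a smooth function f, keeping at most `memory` update pairs."""
    x = list(x0)
    g = grad_f(x)
    s_list, y_list = [], []
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            break
        d = lbfgs_direction(g, s_list, y_list)
        # Backtracking line search with a sufficient-decrease test.
        t, fx, slope = 1.0, f(x), dot(g, d)
        while t > 1e-12 and (
            f([xi + t * di for xi, di in zip(x, d)]) > fx + 1e-4 * t * slope
        ):
            t *= 0.5
        x_new = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad_f(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        # Keep only pairs with positive curvature, so that the search
        # direction remains a descent direction.
        if dot(s, y) > 1e-12:
            s_list.append(s)
            y_list.append(y)
            if len(s_list) > memory:
                s_list.pop(0)
                y_list.pop(0)
        x, g = x_new, g_new
    return x
```

As a quick check, minimizing f(x) = (x1 - 1)^2 + 10 (x2 + 2)^2 from the origin should return a point close to (1, -2). Memory use grows with the number of stored pairs rather than with the square of the problem dimension, which is the main reason to prefer the limited-memory variant on large problems.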

Intuition behind primal-dual interior-point methods for linear and quadratic programming. I'm quite aware of the fact that there are probably a hundred textbooks published every year that contain an introduction to linear programming, and there are many introductory presentations on interior-point methods. But I find they are all lacking in providing the key intuition. So I've written a short 7-page document which I'm confident fills a tiny bit of the void.

MATLAB code for solving constrained, convex programs. I wrote a simple, easy-to-use MATLAB function for minimizing a convex objective subject to convex inequality constraints. It uses a primal-dual interior-point method with a suitable merit function for ensuring global convergence (which is useful when it is not desirable to compute the Newton step using the full Hessian of the objective).

MATLAB code for second-order cone programming. I also implemented a simple primal-dual interior-point method in MATLAB for solving second-order cone programs. At each iteration, the solver follows the Newton search direction and makes sure that the iterates remain feasible (they satisfy all the inequality constraints).

MATLAB interface for IPOPT. IPOPT is a fantastic, new open source software package written in C++ for solving optimization problems with nonlinear objectives and subject to nonlinear constraints. IPOPT is short for Interior Point Optimizer. I've developed an interface so that IPOPT can be easily called from the MATLAB programming environment. You can download the current version of IPOPT from the project website.

Notes on probabilistic decoding of parity check matrices. A review of the basic concepts behind low-density parity check codes, and how to come up with a simple and reasonable method for probabilistic decoding. Assumes some familiarity with concepts in statistical machine learning and optimization.

A MATLAB interface for L-BFGS-B, a solver for bound-constrained nonlinear optimization problems that uses quasi-Newton updates with a limited-memory approximation to the Hessian.

A non-rigorous derivation of a variational upper bound on the log-partition function in eight parts. This is a brief exposé of Martin Wainwright's derivation of a convex alternative to generalized belief propagation (resulting in the so-called tree-reweighted belief propagation algorithm). The intent is to present the main mathematical steps in the derivation while keeping the presentation as "light" as possible.

Installing IPOPT on Mac OS X. Some of my experiences.

Creating, compiling and linking MATLAB executables (MEX files). A tutorial.

A lesson in measure theory and change of variables. A technical note illustrating and explaining the subtleties in deriving a correct kernel for the snooker move used in population Monte Carlo.

Project webpage for Learning to recognize objects with little supervision.

How to partition and format an external hard drive for Mac OS X.
