New Directions

Machine learning can be broadly understood as the science of prediction. Recent algorithmic advances have greatly reduced the difficulty of training black-box predictors on large amounts of data.

Predictions alone are not always sufficient. One also needs a means to understand and capture the uncertainty behind them. This can be achieved by:

  1. Requiring the black-box machine learning model to output both a prediction and its uncertainty.
  2. Opening the black box to understand how the machine reached its conclusions.

I aim to tackle these issues head-on.

Loss Functions for Uncertainty Estimation

Many machine learning methods proceed by minimizing a loss function over a provided training set. Much work has been done on designing loss functions with good statistical and computational properties.

A natural extension is to augment the loss function, penalizing predictors that do not accurately report their uncertainty. This work aims to provide new loss functions for estimating uncertainty.
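As a rough illustration of a loss that penalizes misreported uncertainty, the sketch below uses the standard Gaussian negative log-likelihood. This is a textbook loss chosen for illustration, not one of the new loss functions this work aims to provide; the function names, the synthetic data, and the three candidate values of sigma are all assumptions made for the example.

```python
import math
import random


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under a Gaussian with mean mu and std sigma."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)


def mean_nll(ys, mu, sigma):
    """Average loss of a predictor that always reports (mu, sigma)."""
    return sum(gaussian_nll(y, mu, sigma) for y in ys) / len(ys)


# Synthetic targets (an assumption for illustration): mean 0, true std 2.
random.seed(0)
ys = [random.gauss(0.0, 2.0) for _ in range(5000)]

# All three predictors report the same mean; they differ only in the
# uncertainty they claim.
honest = mean_nll(ys, mu=0.0, sigma=2.0)
overconfident = mean_nll(ys, mu=0.0, sigma=0.5)
underconfident = mean_nll(ys, mu=0.0, sigma=8.0)

print(f"honest:         {honest:.3f}")
print(f"overconfident:  {overconfident:.3f}")
print(f"underconfident: {underconfident:.3f}")
```

The three predictors make identical point predictions, so a plain squared-error loss could not tell them apart; here the predictor that reports the true spread receives the lowest loss.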

Buzzwords: Convex Optimization, Robust Statistics, Down with the Bootstrap.

Opening the Black Box with Machine Teaching

Computers can extract patterns from very large data sets in a fraction of the time it takes humans. For example, performing a regression with thousands of relevant features and millions of training examples takes seconds on my laptop!

Machine teaching provides a means to understand these patterns. In much the same way a professor distills hundreds (if not thousands!) of academic papers into an undergraduate curriculum, machine teaching provides a reduced training set that contains the same information as the larger corpus analyzed by the machine.

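A great example is the notion of a support vector: the decision boundary of a support vector machine is determined entirely by its support vectors, typically a small subset of the training set. In this work, I aim to generalize this notion to other procedures and problems, with a key focus on firm statistical and computational guarantees.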
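To make the support vector idea concrete, here is a minimal sketch on the simplest case I can think of: separable data on a line, with negatives to the left of positives. The max-margin threshold then depends only on the two closest points of opposite class, so retraining on those two points alone recovers the same classifier. The function name and the synthetic data are assumptions for illustration, and this toy case is not the generalization the work is after.

```python
import random


def max_margin_threshold(points):
    """Max-margin threshold for separable 1-D data (negatives left of positives).

    Returns the threshold and the two points that determine it, which play
    the role of support vectors.
    """
    largest_neg = max(x for x, label in points if label == -1)
    smallest_pos = min(x for x, label in points if label == +1)
    if largest_neg >= smallest_pos:
        raise ValueError("data is not separable in the assumed orientation")
    threshold = (largest_neg + smallest_pos) / 2
    return threshold, [(largest_neg, -1), (smallest_pos, +1)]


# Synthetic, well-separated training set (an assumption for illustration).
random.seed(0)
full_set = [(random.uniform(-5.0, -1.0), -1) for _ in range(1000)]
full_set += [(random.uniform(1.0, 5.0), +1) for _ in range(1000)]

# Train on all 2000 points, then again on the 2-point teaching set.
threshold_full, teaching_set = max_margin_threshold(full_set)
threshold_small, _ = max_margin_threshold(teaching_set)

print(len(full_set), "points ->", len(teaching_set), "points")
print("same classifier:", threshold_full == threshold_small)
```

The other 1998 points can be discarded without changing the learned rule, which is the sense in which the reduced set contains the same information as the full one.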

Buzzwords: Compression Bounds, Generalized Support Vectors, Clustering (with a purpose!).