New frameworks for learning

It struck me the other day that we often get ideas, many ridiculously futile but some brilliant, and it’s worth having a place to discuss these ideas along with thoughts on other people’s ideas. So, I’ll attempt to discuss ideas in machine learning that I find interesting.

I’ll bootstrap the first set of posts by discussing papers that suggest new frameworks for learning, which, in some sense, is the most exciting kind of work. The search for solutions to known problems can be immensely challenging and rewarding, but the payoff can be even greater if one can identify problems that are well-motivated, reasonable to solve, and that sufficiently change the way we think about previous problems.

Side note: I may not have provided a link to the first paper that introduces a given framework, but at least the framework is still relatively new, and I provide an example that sufficiently introduces it. Enough of that; here goes:

Andreas Maurer has a relatively recent paper on transfer bounds for linear feature learning. The framework has the flavor of learning to learn, and the goal is very exciting: establishing learning-theoretic bounds for this setting.

Here’s the setup. Suppose we play a game. You are given a sequence of problems. Each problem is composed of a set of samples (x_i, y_i), and across all problems the samples live in the same space. Each problem has an unknown corresponding distribution from which its samples were drawn.

I first draw a problem by drawing a distribution \mu from a distribution \rho over problem distributions (that is, \rho is a distribution over measures, so each sampled \mu is a random measure). Then, for the drawn problem \mu, I draw m iid samples (x_1,y_1), \ldots, (x_m,y_m) from \mu, so the sample as a whole is distributed as \mu^m. I do this, say, k times for a total of k problem instances.
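To make the two-stage draw concrete, here is a minimal simulation sketch. Everything specific in it is my own assumption for illustration and not something from the paper: I pretend each problem \mu is a noisy linear regression task identified by a weight vector, and that \rho is a standard Gaussian over those weight vectors. The function names are hypothetical too.

```python
import numpy as np

def draw_problem(rng, dim=5):
    """Draw a problem mu from a hypothetical environment rho.

    Assumption: a problem is identified by a weight vector w, and rho is a
    standard Gaussian over such vectors.
    """
    return rng.standard_normal(dim)

def draw_samples(rng, w, m, noise=0.1):
    """Draw m iid samples (x_i, y_i) from the problem identified by w."""
    X = rng.standard_normal((m, w.shape[0]))
    y = X @ w + noise * rng.standard_normal(m)
    return X, y

def draw_training_problems(rng, k, m, dim=5):
    """Repeat the two-stage draw k times: first mu ~ rho, then m samples from mu."""
    problems = []
    for _ in range(k):
        w = draw_problem(rng, dim)
        problems.append(draw_samples(rng, w, m))
    return problems

rng = np.random.default_rng(0)
train = draw_training_problems(rng, k=20, m=10)
```

Note that the learner only ever sees the (X, y) pairs in `train`; the weight vectors that identify the problems stay hidden, just as the distributions \mu are unknown in the game.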

At test time, I draw a test problem: a new realization of a problem, obtained by drawing a problem along with m training instances. You deliver an estimator f: \mathcal{X} \rightarrow \mathcal{Y} that hopefully benefitted from the previous training problems, and I compute the loss of f on unseen test data for the test problem. What guarantees can you make on the generalization error on test problems, given what you learned about learning on the training problems?
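One way to write down the quantity being asked about, in notation that is my own assumption and not necessarily the paper's: let A be the procedure that maps a training sample S of size m to an estimator f = A(S), and let \ell be a loss function on \mathcal{Y} \times \mathcal{Y}. The expected loss on a freshly drawn test problem is then, roughly,

```latex
R(A) \;=\; \mathbb{E}_{\mu \sim \rho}\;
           \mathbb{E}_{S \sim \mu^m}\;
           \mathbb{E}_{(x,y) \sim \mu}
           \Big[ \ell\big( A(S)(x),\, y \big) \Big]
```

Here the outer expectation is over the choice of test problem, the middle one over that problem's m training instances, and the inner one over unseen test data from the same problem. A transfer bound would control a quantity of this kind using only what was observed on the k training problems.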

To summarize and be a bit more precise about this particular paper, the goal here is to use a collection of training problems to find an embedding of the input data such that, when given a new problem, simple estimators can be guaranteed to work well according to some criterion (low test error).
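Here is a toy sketch of that idea. To be clear, this is not the algorithm or the analysis from the paper; it is a simple stand-in I made up to show what "learn an embedding from training problems, then run a simple estimator on a new problem" can look like. I assume every problem is a linear regression task whose weight vector lies in a shared low-dimensional subspace, estimate that subspace from per-problem ridge fits via an SVD, and then compare ridge regression on the raw and embedded inputs for a new problem. All names and parameter values are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, reg=1e-2):
    """Plain ridge regression; returns the weight vector."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(n), X.T @ y)

def draw_task_weights(rng, B):
    """Assumed environment: weight vectors lie in the column span of B."""
    return B @ rng.standard_normal(B.shape[1])

def draw_samples(rng, w, m, noise=0.05):
    """Draw m iid samples from the linear task identified by w."""
    X = rng.standard_normal((m, w.shape[0]))
    return X, X @ w + noise * rng.standard_normal(m)

def learn_embedding(tasks, d):
    """Stack per-task ridge weights and keep their top-d left singular vectors."""
    W = np.column_stack([fit_ridge(X, y) for X, y in tasks])
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :d]  # shape (n, d); x is embedded as U[:, :d].T @ x

rng = np.random.default_rng(0)
n, d, m, k = 30, 2, 15, 200  # input dim, embedding dim, samples per task, tasks

# Hidden shared subspace (orthonormal columns), unknown to the learner.
B, _ = np.linalg.qr(rng.standard_normal((n, d)))

# Training problems: each one is a fresh task with m samples.
train_tasks = [draw_samples(rng, draw_task_weights(rng, B), m) for _ in range(k)]
T = learn_embedding(train_tasks, d)

# Test problem: a new task, m training samples, plus held-out data for evaluation.
w_test = draw_task_weights(rng, B)
X_tr, y_tr = draw_samples(rng, w_test, m)
X_te, y_te = draw_samples(rng, w_test, 1000)

w_raw = fit_ridge(X_tr, y_tr)      # simple estimator on raw inputs
w_emb = fit_ridge(X_tr @ T, y_tr)  # same estimator on embedded inputs

mse_raw = np.mean((X_te @ w_raw - y_te) ** 2)
mse_emb = np.mean((X_te @ T @ w_emb - y_te) ** 2)
print(f"test MSE, raw inputs:      {mse_raw:.4f}")
print(f"test MSE, embedded inputs: {mse_emb:.4f}")
```

The point of the toy is that with m smaller than the input dimension, no single problem has enough data to pin down its own weight vector, but many problems together can reveal the shared structure, after which a simple estimator needs far fewer samples on a new problem.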

