Introduction

Neural Information Processing Systems (NeurIPS) is nowadays one of the foremost conferences on machine learning, with thousands of attendees and accepted contributions (papers, talks, tutorials, social events, panels, demonstrations, challenges and so on). It’s a wonderful event, powered by a loving and ever-growing community. Its size can also be somewhat overwhelming: if, like me, you have even a passing interest in more than one of the presented topics, you need to allocate your time carefully.

It’s hard to do justice to the whole conference in a few sentences. What we can notice from a great height is that machine learning has come of age, and new applications that act either as economic enablers or as instruments of oppression appear on a daily basis. The NeurIPS community has expanded to include many more voices than just the technical contributions, and that’s a very good thing.

Personally, what I’m most interested in falls roughly into two categories: AI as a social and economic enabler, and the study of intelligence per se.

  • Data programming, weak supervision, learning on relational data, and federated learning all fall in this bin; on one hand, they let us make sense of complex, real-world signals and hopefully make better decisions. On the other, they promise to empower entities smaller than the usual tech titans by redistributing the economic returns of data ownership.

  • The study of intelligence for its own sake is as old as philosophy itself, but we’re nowhere close to even formulating a well-posed question. However, careful cognitive experiments and generative, neuro-symbolic approaches like those advocated by Tenenbaum and collaborators are a source of great insight and of better questions, which is ultimately what matters in science.

In what follows I collect some loose notes on my 2021 NeurIPS experience as an attendee and author, as well as all the links I could record on the fly. I could only attend a fraction of the live events, but fortunately all the talks are recorded for posterity.


December 6

Tutorial : Pay Attention to What You Need: Do Structural Priors Still Matter in the Age of Billion Parameter Models?

The recent empirical success and theoretical systematization of graph learning is making us reflect on the role of structure (relationships, grammar, ontologies, syntax) in inference. The field of AI is also coming full circle (once again) and re-assessing “symbolic” approaches after their premature demise at the hands of connectionist methods in the late ’90s. This tutorial touches on a few fundamentals, such as the connections between group theory and differential geometry (presented by Seb Racaniere), and gives perspective on their role in building better domain-specific inference models for point clouds, relational data, graphs and so on.
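
To make “structural prior” concrete, here is a tiny sketch of my own (not from the tutorial) of the simplest such prior, permutation invariance: a point-cloud embedding built with sum pooling gives the same output no matter how the points are ordered. The `embed` function is a hypothetical stand-in for a learned feature map.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(points, w):
    """Per-point feature map followed by sum pooling; summing over the
    points makes the representation independent of their ordering."""
    return np.tanh(points @ w).sum(axis=0)

points = rng.normal(size=(5, 3))   # a toy point cloud: 5 points in R^3
w = rng.normal(size=(3, 8))        # weights of the hypothetical feature map

perm = rng.permutation(len(points))
assert np.allclose(embed(points, w), embed(points[perm], w))
```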

Tutorial : Machine Learning and Statistics for Climate Science

slides

Unfortunately I couldn’t attend this tutorial, but it’s very dear to my heart, so I intend to return to it at a later time.


December 7

Tutorial : Self-Supervised Learning: Self-Prediction and Contrastive Learning

slides

Self-supervision is a very interesting family of techniques for producing data labels via a generative process: the dataset is “augmented” by distorting the data instances in various ways, e.g. rotating or blurring images, or masking parts of a sentence in the case of text. Classifiers are then trained on this augmented data, which has been shown to improve performance on a number of downstream tasks.

How to produce data augmentations that are meaningful for the data and task at hand remains one of the open research questions.
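
For concreteness, here is a minimal sketch of a text augmentation in the masking family (my own illustration, not code from the tutorial); the masking rate is an arbitrary, BERT-inspired choice.

```python
import random

def mask_tokens(sentence, mask_rate=0.15, mask_token="[MASK]", seed=None):
    """Randomly hide a fraction of tokens; the hidden tokens become the
    prediction targets of the self-supervised task."""
    rng = random.Random(seed)
    tokens = sentence.split()
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[i] = tok        # what the model should reconstruct
            tokens[i] = mask_token
    return " ".join(tokens), targets

masked, targets = mask_tokens("self supervision turns raw data into labels",
                              mask_rate=0.3, seed=1)
print(masked)   # the distorted sentence, some tokens replaced by [MASK]
print(targets)  # position -> original token: labels we got for free
```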

Joint Affinity poster session

This was the first poster session of the conference; I will only say that while GatherTown is a nice platform for simulating the spatially-distributed conversations of an in-person conference, it is almost too accurate a replica: you waste minutes just walking your avatar around a venue that is sadly mostly empty. I think navigation and visual summarization could still be improved.

Keynote : How Duolingo Uses AI to Assess, Engage and Teach Better

Luis von Ahn is one of the founders of Duolingo, a mobile app for learning languages. He showed how the app makes heavy use of A/B testing and systematically analyzes all learning sessions to produce a comfortable yet challenging learning experience, with spaced repetition, well-placed nudges and so on. They have an ML team that has made fundamental advances in active learning and multi-armed bandits.
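
Since bandits came up, here is a toy epsilon-greedy sketch of how one might pick between two lesson variants online. This is entirely my illustration: the variant names and engagement rates are made up, and this is not Duolingo’s actual system.

```python
import random

# Hypothetical lesson variants with unknown "session completed" rates.
TRUE_RATES = {"variant_a": 0.52, "variant_b": 0.60}

def run(n_rounds=10_000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    counts = {arm: 0 for arm in TRUE_RATES}    # times each variant served
    wins = {arm: 0 for arm in TRUE_RATES}      # completed sessions
    for _ in range(n_rounds):
        if rng.random() < epsilon or 0 in counts.values():
            arm = rng.choice(list(TRUE_RATES))                    # explore
        else:
            arm = max(counts, key=lambda a: wins[a] / counts[a])  # exploit
        counts[arm] += 1
        if rng.random() < TRUE_RATES[arm]:     # simulated user response
            wins[arm] += 1
    return counts

print(run())  # the better variant ends up being served far more often
```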


December 8

Social : BigScience

BigScience is a year-long collaborative experiment on large multilingual models and datasets, aka the “LHC of NLP”. I was not directly involved, but I find it a very meaningful initiative for mobilizing academia and society towards understanding and improving natural language processing for everybody.


December 9

Keynote : Optimal Transport: Past, Present, and Future

Optimal transport (“what is the optimal way to transport one distribution into another?”) is one of those topics that seemingly pop up everywhere, from operations research to biology to high-dimensional learning. Alessio Figalli recently won a Fields Medal for work on the subject, and he gave a good (albeit somewhat dry) overview of the history and applications of the optimal transport problem.
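
To make the question concrete, here is a small sketch (mine, not from the keynote) of the discrete Kantorovich formulation solved as an ordinary linear program with scipy; the two point masses and the squared-distance cost are arbitrary example data.

```python
import numpy as np
from scipy.optimize import linprog

# Source and target discrete distributions on the real line.
x = np.array([0.0, 1.0, 2.0]); a = np.array([0.5, 0.3, 0.2])
y = np.array([0.5, 1.5]);      b = np.array([0.6, 0.4])
C = (x[:, None] - y[None, :]) ** 2   # cost of moving unit mass x_i -> y_j

n, m = C.shape
A_eq = np.zeros((n + m, n * m))
for i in range(n):                   # each source mass is fully shipped
    A_eq[i, i * m:(i + 1) * m] = 1.0
for j in range(m):                   # each target mass is fully received
    A_eq[n + j, j::m] = 1.0

res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]))
print(res.x.reshape(n, m))           # the optimal transport plan
print(res.fun)                       # the optimal transport cost
```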


December 11

Generative modeling

  • Rozen - Moser Flow: Divergence-based Generative Modeling on Manifolds

A recent breakthrough: a universal density approximator for general manifolds (spheres, tori, etc.), demonstrated with examples from climate science.


December 13 - Workshops 1

New Frontiers in Federated Learning: Privacy, Fairness, Robustness, Personalization and Data Ownership

  • Pentland - Building a New Economy: Federated Learning and Beyond

    Data ownership, individual and collective. Value of individual data points vs data in aggregate

    Knowing flows (money, people) allows better planning

    Community-owned data (real-time census) : employment, disease control, public transit, economic growth

    NRECA : cooperatives (not companies or the state) built 56% of US electric grid

    Atlas of opportunity

Differentiable programming

Automatic differentiation has become ubiquitous: many languages adopt it either as a library or as a first-class primitive. As a result, complex physical models have become differentiable (in some sense), and the workshop showcased the effect of this on molecular dynamics, climate and weather prediction, and more.
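
As a minimal illustration of what differentiable physical models buy you (my own sketch, not an example from the workshop), jax.grad can differentiate straight through a toy Euler-integrated simulation with respect to a physical parameter:

```python
import jax

def final_height(drag, n_steps=100, dt=0.01):
    """Euler integration of a falling object with linear drag.
    Plain Python control flow, yet fully differentiable under JAX."""
    v, h = 0.0, 10.0
    for _ in range(n_steps):
        v = v + dt * (-9.81 - drag * v)
        h = h + dt * v
    return h

# Sensitivity of the simulated outcome to the drag coefficient:
print(jax.grad(final_height)(0.5))
```

The same mechanism scales to real solvers, which is what makes gradient-based calibration of physical models possible.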


December 14 - Workshops 2

Data-centric AI

  • Ratner, Re - The future of data-centric AI

    I’ve been very intrigued by the Snorkel project and by weak labeling/supervision in general since learning about them a few years back. Their approach, based on programmatic labeling functions (see the toy sketch below), will likely prove popular and effective in the private sector, and in particular in highly-regulated industries where moving data is hard or impossible.
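
Here is a toy sketch of the labeling-function idea. It is my own simplification: real Snorkel learns the accuracies of the functions with a generative label model rather than taking a majority vote, and the functions below are made-up spam heuristics.

```python
import numpy as np

ABSTAIN, HAM, SPAM = -1, 0, 1

# Programmatic labeling functions: cheap, heuristic, individually noisy.
def lf_mentions_winner(text): return SPAM if "winner" in text.lower() else ABSTAIN
def lf_contains_url(text):    return SPAM if "http" in text else ABSTAIN
def lf_very_short(text):      return HAM if len(text.split()) < 4 else ABSTAIN

LFS = [lf_mentions_winner, lf_contains_url, lf_very_short]

def weak_label(texts):
    """Apply every labeling function, then combine votes by majority."""
    L = np.array([[lf(t) for lf in LFS] for t in texts])
    labels = []
    for row in L:
        votes = row[row != ABSTAIN]
        labels.append(int(np.bincount(votes, minlength=2).argmax())
                      if len(votes) else ABSTAIN)
    return L, np.array(labels)

L, y = weak_label(["WINNER!! claim your prize at http://x.example",
                   "see you at lunch"])
print(y)  # weak labels per example; -1 where every function abstained
```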

Advances in programming languages and neurosymbolic systems - AIPLANS’21

This workshop’s ambitious program was to bring together researchers from the AI and programming languages communities, and create a dialogue around the interplay of domain-specific languages and mechanical reasoning. The talks spanned formal logic, machine learning techniques, theorem proving and more, showing what’s possible when machines are made to collaborate with humans.

I submitted a short paper to AIPLANS (Staged compilation of tensor expressions), with an early experiment on applying meta-programming techniques to a tensor contraction DSL for Haskell.