Research

I’m currently interested in interpretable NLP applied to language models for program synthesis.

Natural Language Annotations for Reasoning about Program Semantics

EMNLP (Findings) 2023

We propose a dataset and protocol for annotating programs with natural language predicates at a finer granularity than code comments and without relying on internal compiler representations.

StarCoder: may the source be with you!

Transactions on Machine Learning Research (12/2023)

The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities, and fast large-batch inference enabled by multi-query attention.