Research

I have a broad interest in machine learning, optimization and programming languages.

Currently, I’m interested in interpretable NLP as applied to language models for program synthesis.

Natural Language Annotations for Reasoning about Program Semantics

EMNLP (Findings) 2023

We propose a dataset and protocol for annotating programs with natural language predicates at a finer granularity than code comments and without relying on internal compiler representations.

StarCoder: may the source be with you!

Transactions on Machine Learning Research (12/2023)

The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention.