Research
I’m currently interested in interpretable NLP as applied to program synthesis language models.
Natural Language Annotations for Reasoning about Program Semantics
EMNLP (Findings) 2023
We propose a dataset and protocol for annotating programs with natural language predicates at a finer granularity than code comments and without relying on internal compiler representations.
StarCoder: may the source be with you!
Transactions on Machine Learning Research (12/2023)
The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B-parameter models with an 8K context length, infilling capabilities, and fast large-batch inference enabled by multi-query attention.