Finding the Core of an expression using Template Haskell and a custom GHC Core plugin
Introduction
GHC is a wonderful compiler platform for writing compilers and languages. In addition to Haskell offering convenient syntactic abstractions for creating domain-specific languages, the language itself and the internals of the compiler can be extended in many ways, which let users come up with mind-bending innovations in scientific computing, testing and code editing, among many other examples.
The compiler offers a plugin system that lets users customize various aspects of the syntax analysis, typechecking and compilation phases, without having to rebuild the compiler itself.
While writing a GHC plugin that lets the user analyze and transform the Core representation of certain Haskell expressions, I found myself in need of a specific bit of machinery: how can the user tell the compiler which expression to look for? Moreover, how to map the names of user-defined terms to the internal representation used by the compiler?
It turns out inspection-testing
provides this functionality as part of its user interface, and I will document it here both to consolidate its details in my head and so that others might learn from it in the future.
This post will also introduce concepts from both the ghc
and template-haskell
libraries as needed, so it should be useful to those who, like me, had zero experience in compiler internals until the other day.
Note on reproducibility : here I’m referring to GHC 9.0.1, some modules changed paths since GHC series 8. I’ve only omitted a few easy imports and definitions from base
, which you can fill in as an exercise ;)
So, let’s dive into the compiler !
Finding the Name
of declarations with template Haskell
A template-haskell Name
represents .. the name of declarations, expressions etc. in the syntax tree.
Resolving a top-level declaration into its Name
requires a little bit of metaprogramming, enabled by the {-# LANGUAGE TemplateHaskell #-}
extension. With that, we can use the special syntax with a single or double quote to refer to values or types respectively (made famous by lens
in makeLenses
‘‘Foo).
Passing Name
s to later stages of the compilation pipeline
This is half of the trick: the Name
we just found (and any other metadata that might be interesting to our plugin), is packed and serialized into a GHC ANNotation via liftData
, which is inserted as a new top-level declaration by a template Haskell action (i.e. a function that returns in the Q
monad).
Annotations can also be attached by the user to declarations, types and modules, but this method does so programmatically.
The resulting function has a type signature similar to this : Name -> Q [Dec]
, i.e. given a Name
it will produce a list of new declarations Dec
at compile time.
If we’re only interested in attaching a Name
to the annotation, we just need :
Picking out our annotation from within the plugin
The other half of the trick takes place within the plugin, so we’ll need to import a bunch of modules from ghc
-the-library :
First, we need a function that looks up all the annotations from the module internals (aptly named ModGuts
in ghc) and attempts to decode them via their Data interface. Here we are using a custom Target
type defined above, which could carry additional metadata.
Next, we need to map template-haskell
names to the internal GHC namespace, thNameToGhcName
to the rescue. If the name can be resolved, lookupTHName
will return the corresponding Core Expr
ession (i.e. the abstract syntax tree corresponding to the name we picked in the beginning).
A custom GHC Core plugin
As noted above, a GHC plugin can customize many aspects of the compilation process. Here we are interested in the compiler phase that produces Core IR, so we’ll only have to modify the installCoreToDos
field of the defaultPlugin
value provided by ghc
by providing our own version :
As a minimal example, let’s pretty-print the Core expression corresponding to the Name
we just found:
All that’s left is to package printCore
into our custom implementation of installCoreToDos
:
Here it’s important to stress that install
appends our plugin pass to the ones received as input from the upstream compilation pipeline.
Another crucial detail : the name string of the plugin as specified in CoreDoPluginPass
must be the full module name where the plugin
value is declared.
Trying out our plugin
A GHC plugin can be imported as any other Haskell library in the build-depends
section of the Cabal file. While developing a plugin, one should ensure that the test hs-srcs-dirs
directory is distinct from that under which the plugin source is defined, so as not to form an import loop.
With this, we can declare a minimal module that imports the TH helper inspect
and the plugin as well. Important to note that MyPlugin
in the -fplugin
option is the name of the Cabal module in which GHC will look for the plugin :: Plugin
value (the entry point to our plugin).
The output of our custom compiler pass will be interleaved with the rest of the GHC output as part of a clean recompile (e.g. stack clean && stack build
):
If you squint a bit, you can still see the structure of the original expression \x y = sqrt x + y
, enriched with additional annotations.
For example:
-
the
Dmd=<...>
parts are strictness/demand annotations computed for each variable -
the
Double -> Double -> Double
expression has been made strict (thecase
branch expressions), and its parameters have been “unboxed” (D#
stands for “unboxed double”, i.e. containing the value itself, not a pointer to it) -
correspondingly, both the addition and square root operators have been specialized to those operating on unboxed doubles.
That’s it! We’ve customized the compiler (without breaking it !), how cool is that?
Credits
First and foremost a big shout out to all the contributors to GHC who have built a truly remarkable piece of technology we can all enjoy, learn from and be productive with.
Of course, Joachim Breitner for inspection-testing
[1]. I still remember the distinct feeling of my brain expanding against my skull after seeing his presentation on this at Zurihac 2017.
Next, I’d like to thank the good folks in the Haskell community for kindly giving exhaustive answers to my questions on r/haskell, the ZuriHac discord server and stackoverflow. Li-yao Xia, Matthew Pickering, Daniel Diaz, David Feuer and others. Matthew has also written a paper [2] and curated a list of references on GHC plugins , which I refer to extensively.
Mark Karpov deserves lots of credit too for writing an excellent reference on template Haskell [3] with lots of worked examples, go and check it out.
References
1) J. Breitner, inspection-testing
2) M. Pickering et al, Working with source plugins
3) M. Karpov, Template Haskell tutorial