Introducing GHC whole program compiler (GHC-WPC)

I'd like to introduce the GHC whole program compiler, or GHC-WPC for short. GHC-WPC is an extended GHC that exports the STG IR of each module, along with linking metadata, during the regular compilation process. From the cabal/stack tooling side it is a regular GHC that compiles modules and links executables with its standard incremental compilation pipeline. But besides the normal compilation process it also leaves enough information on disk for external tools to build the same application in an alternative way.

I call the exported STG IR External STG, or Ext-STG, and it has its own tooling that lives outside of the GHC codebase. This is a deliberate decision that allows keeping the regular Haskell development methods, e.g. using any library from Hackage or relying on stack. If you are not a GHC developer you might be surprised, but hacking on GHC imposes quite a few inconvenient restrictions, e.g. you have to rely on a tiny set of foundation libraries. I believe these restrictions might kill experiments, or at least kill the passion of exploring and implementing ideas in flow.

The root repository of the project is GHC-whole-program-compiler-project, which contains the External STG tooling and, as a submodule, the slightly modified GHC with a modified Cabal library. The modifications are minimal:
GHC is modified to write STG IR into .o_stgbin files at module compilation time, and at link time it writes the project linker and dependency information into .ghc_stgapp files.
The Cabal library is modified to install .o_stgbin and .ghc_stgapp files along with .hi and .a binary files.
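
To make the roles of these files concrete, here is a minimal sketch that sorts file paths into the artifact kinds named above. Only the file extensions come from the text; the `Artifact` type, the `classify` function and the sample file names are hypothetical and are not part of the GHC-WPC tooling.

```haskell
import Data.List (isSuffixOf)

-- Hypothetical classification of the build artifacts mentioned in the text.
data Artifact
  = StgModule   -- .o_stgbin: STG IR written at module compilation time
  | StgApp      -- .ghc_stgapp: linker and dependency information written at link time
  | Interface   -- .hi: regular GHC interface file
  | Archive     -- .a: regular static library
  | Other
  deriving (Eq, Show)

-- Decide the artifact kind purely from the file name suffix.
classify :: FilePath -> Artifact
classify path
  | ".o_stgbin"   `isSuffixOf` path = StgModule
  | ".ghc_stgapp" `isSuffixOf` path = StgApp
  | ".hi"         `isSuffixOf` path = Interface
  | ".a"          `isSuffixOf` path = Archive
  | otherwise                       = Other

-- The file names below are made up for illustration.
main :: IO ()
main = mapM_ report ["Main.o_stgbin", "app.ghc_stgapp", "Lib.hi", "libHSfoo.a", "README"]
  where
    report p = putStrLn (p ++ " -> " ++ show (classify p))
```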

The External STG tooling consists of the gen-exe, gen-obj and ext-stg CLI tools. Ext-stg is a developer tool to pretty print Ext-STG IR. The main component is gen-exe, which can compile a working executable from .ghc_stgapp files. It is the compiler driver: it collects the STG IR modules of the application and of the Haskell packages it depends on, together with the C library dependencies.
So gen-exe has access to the whole program STG IR and it does a simple live function analysis on it. Then it passes the STG IRs and the liveness analysis results to its gen-obj workers to generate object code for each STG module.
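
The idea behind a simple live function analysis can be sketched as a reachability pass over a reference graph. This is only an illustration under stated assumptions, not the gen-exe implementation: the `Name` and `RefGraph` types, the function names and the sample bindings are all hypothetical, and real STG bindings carry far more structure than a list of referenced names.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical, simplified stand-in for a top-level binding name.
type Name = String

-- Hypothetical whole program view: each top-level binding maps to the
-- top-level bindings it references.
type RefGraph = Map.Map Name [Name]

-- Collect every binding reachable from the given roots (worklist traversal).
liveSet :: RefGraph -> [Name] -> Set.Set Name
liveSet graph = go Set.empty
  where
    go seen [] = seen
    go seen (n : rest)
      | n `Set.member` seen = go seen rest
      | otherwise =
          let refs = Map.findWithDefault [] n graph
          in go (Set.insert n seen) (refs ++ rest)

-- Drop every binding that is not reachable from the roots.
pruneDead :: RefGraph -> [Name] -> RefGraph
pruneDead graph roots = Map.filterWithKey (\n _ -> n `Set.member` live) graph
  where
    live = liveSet graph roots

-- Made-up sample program: one helper is never referenced from Main.main.
example :: RefGraph
example = Map.fromList
  [ ("Main.main",        ["Lib.render", "Lib.parse"])
  , ("Lib.render",       ["Lib.escape"])
  , ("Lib.parse",        [])
  , ("Lib.escape",       [])
  , ("Lib.unusedHelper", ["Lib.escape"])
  ]

main :: IO ()
main = mapM_ putStrLn (Map.keys (pruneDead example ["Main.main"]))
```

In this sketch the only root is the program entry point, so a binding survives only if some chain of references leads to it from there. A per-module code generator could then skip every binding outside the live set.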

It is well known that GHC generates large binaries. This can be mitigated with GHC’s `-split-sections` option, but the whole program dead code elimination step reduces the binary size even further. E.g. the size of the Idris compiler compiled with GHC and GHC-WPC is:

All in all, the GHC based whole program compiler pipeline works and you can try it out. It is based on GHC 8.11.0, so due to base library API changes not all programs will compile, but I put together a repository of working sample programs (Idris, Agda, Pandoc). It is stack based and should work out of the box on 64-bit Linux machines. Just clone the ghc-wpc-sample-projects repository and follow the readme in the repository.

GHC-WPC and Ext-STG can be useful in many projects, basically any GHC backend or multiplatform related project like GHCJS, Asterius, or GRIN. I also believe they could be useful for research compiler/IR projects that need access to large real world programs as input data, like Gibbon.

External STG originally lived in the GHC-GRIN repository, but I decided to split the GRIN Compiler project into two parts: 1) the GHC related part and 2) the experimental GRIN optimizer.

The goal of the GHC related part is to improve the GHC Haskell ecosystem in a way that is also beneficial for other projects. GHC-WPC belongs to this category. The real benefit of this approach is that any small improvement will be immediately accessible to the whole Haskell community.

I also plan to implement an Ext-STG interpreter that could run any Haskell program. It will allow you to observe the runtime behaviour of Haskell applications, so it will be easy to trace execution paths or track the used resources. I prefer to implement debuggers and profilers on top of an interpreter first because that is much simpler to develop. It is better to get a pattern match error than a segfault.

But before the STG interpreter I'll start a UI and tooling related project. Its name is Haskell Code Spot. It will help to spot the misbehaving parts of Haskell applications. Basically it will be a data visualization tool (based on web technology) that presents the raw data from the RTS eventlog and GHC static analyses in a convenient and intuitive way. I'm really looking forward to working on this because this time the focus will be on creative coding and UI, not on the complex details of the GHC codebase and compiler engineering.