This time I'd like to tell the story of why and how I learned to build web-UI-based compiler tools. More specifically, I'll talk about the haskell code spot project, a runtime profile data visualizer for Haskell.
GHC eventlog was the other technology that I discovered in 2017. I have to admit that I almost never profile my Haskell programs, which is a bad habit. Recompiling the whole project for profiling is just too cumbersome, and GHC's profiling and debugging tooling is ridiculously basic and rough. In fact, I believe this is one of the main blockers of Haskell's industrial adoption when it comes to software maintenance.
Eventlog is a subsystem of the GHC runtime system (RTS) that can emit a binary log of various runtime events (profiling, thread scheduling, GC work, memory allocation, etc.). It is the next generation of GHC's profiling output format: a backward- and forward-compatible binary format that encodes more information than the still dominant plain-text table formats, which show memory allocation and cost centre CPU usage.
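For readers who have never produced an eventlog, here is a minimal sketch of a program that emits one. It uses `traceMarkerIO` and `traceEventIO` from `Debug.Trace` in base to add user events next to the RTS's own events. The build and run flags in the comments (`-eventlog`, `-rtsopts`, `+RTS -l`) are the commonly used ones, but the details vary between GHC versions, so treat them as an assumption to check against your GHC's user guide.

```haskell
-- Build and run (flags vary by GHC version; older GHCs need -eventlog):
--   ghc -eventlog -rtsopts Main.hs
--   ./Main +RTS -l -RTS
-- By default the RTS then writes Main.eventlog.
module Main where

import Debug.Trace (traceEventIO, traceMarkerIO)

-- A small workload that gives the RTS something to log (allocation, GC).
work :: Int -> Int
work n = sum (map (* 2) [1 .. n])

main :: IO ()
main = do
  -- A marker event: a named point in time on the eventlog timeline.
  traceMarkerIO "work start"
  let result = work 1000000
  -- Force the result first, then emit a free-form user event.
  result `seq` traceEventIO "work done"
  print result
```

Without `+RTS -l` the trace calls are effectively no-ops, so the same binary can run with or without logging.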
While the plain-text (.hp and .prof) formats are human readable, they contain only a fraction, or a projection, of the statistics collected at runtime. In short, they are lossy formats, and the .prof format in particular was designed for human consumption. In contrast, eventlog is intentionally just a container format, rich with information and intended for use by various third-party tools.
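To make the "lossy projection" point concrete, here is a toy sketch. The `Sample` type is a hypothetical simplification (a timestamp, a cost centre name and a tick count), not the real .prof or eventlog layout. Folding the samples into per-cost-centre totals, roughly what a .prof table shows, yields the same table for two runs with very different timelines, so the time dimension cannot be recovered from the table.

```haskell
module Main where

import qualified Data.Map.Strict as Map

-- Hypothetical, simplified sample: (timestamp in ns, cost centre name, ticks).
type Sample = (Integer, String, Int)

-- A .prof-style projection: total ticks per cost centre, timestamps dropped.
profTable :: [Sample] -> Map.Map String Int
profTable samples = Map.fromListWith (+) [(cc, t) | (_, cc, t) <- samples]

-- Two different runs: "parse" happens early in one and late in the other.
runA, runB :: [Sample]
runA = [(1, "parse", 5), (2, "eval", 3), (3, "eval", 2)]
runB = [(1, "eval", 3), (2, "eval", 2), (3, "parse", 5)]

main :: IO ()
main = do
  print (profTable runA)
  -- Both runs produce the same table: the timeline is lost.
  print (profTable runA == profTable runB)
```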
With all this knowledge about eventlog and web tech, I imagined a tool that visualizes the available information intuitively. I plan to reach this goal through iterative development.
The first feature I wanted was to show the profiling data directly on the source code. On the UI side, this requires a library to display syntax-highlighted Haskell source code and charting libraries for data visualization. Luckily, web technology already provides easy-to-use solutions for these requirements. In fact, there are too many libraries to choose from, and it takes quite a bit of work to evaluate the most popular and robust ones, form an opinion and decide on the final technology stack.
I did some research by reading many blog posts, all titled something like 'Top 10 JS charting libraries', and did the same for JS source code syntax highlighter and editor libraries. I really enjoyed the abundance of reviews and forum comments; they helped me narrow down the selection. However, I did not want to settle on a single library without trying all the promising ones. So in the end I used the D3, C3 and Plotly.js charting libraries, and CodeMirror to show source code in haskell code spot.
D3 is the most popular low-level (highly customizable) charting library and is widely used in data journalism. There are excellent written and video tutorials and courses for it on the web. I learned D3 from a 13-hour video course; it took a few days to complete, but it was fun. C3 and Plotly.js are high-level charting libraries with a simple API, and under the hood both use D3 as their drawing backend. On the one hand they are only slightly customizable, but on the other hand they need only a few lines of code to produce a cool-looking interactive chart. This makes C3 and Plotly.js excellent for prototyping.
For the source code view component I also wanted something lightweight and simple. I checked the ACE editor and monaco-editor, the editor component that VS Code also uses. CodeMirror has the simplest API of all of these, and I only needed the source view features. The other libraries are primarily designed as feature-rich editors, so they would be too heavyweight for a simple syntax highlighting use case.
Regarding CSS, I had only basic knowledge, but luckily I found an excellent YouTube channel that mainly teaches CSS and HTML. From its videos I learned the most important topics, like CSS units, selectors, flexbox, grid and a few other odds and ends. This gave me enough confidence to start my project with the following technology stack: Svelte, CSS, HTML, D3 and CodeMirror.
You might ask why I don't use Elm, PureScript or GHCJS for web programming. The short answer is that 'copy-paste programming' would not work with them. The web is full of useful code samples for everyday web development problems, which lowers the barrier to entry for beginners starting their projects. When the sample's syntax does not match the project's language, you have to be an expert to translate the JS/HTML/CSS sample into your typed functional DSL, and that breaks the development flow.
Now let’s see how haskell code spot works.
Its architecture is simple: it consists of a server written in Haskell using Scotty and a frontend written in Svelte. The frontend is a thin client that visualizes the data it receives from the server. The server has two kinds of data sources:
1. eventlog files (.eventlog)
2. GHC-WPC project files (.ghc_stgapp)
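The thin-client architecture can be sketched as a small routing layer. The route names (`/events`, `/source/<module>`) and the `Route` and `DataSource` types below are hypothetical, not haskell code spot's actual API; a real server would register such routes with Scotty, which this base-only sketch leaves out. The point is only that every frontend request is answered from one of the two data sources listed above.

```haskell
module Main where

-- Hypothetical requests the thin client might send (not the real endpoints).
data Route
  = GetEvents                -- the (optionally filtered) eventlog messages
  | GetModuleSource String   -- the Haskell source of one module, by name
  deriving (Eq, Show)

-- The two kinds of data source the server reads from.
data DataSource = EventlogFile | StgAppFile
  deriving (Eq, Show)

-- Split "/source/Data.Map" into ["source", "Data.Map"].
splitPath :: String -> [String]
splitPath = filter (not . null) . go
  where
    go s = case break (== '/') s of
      (seg, [])       -> [seg]
      (seg, _ : rest) -> seg : go rest

parseRoute :: [String] -> Maybe Route
parseRoute ["events"]          = Just GetEvents
parseRoute ["source", modName] = Just (GetModuleSource modName)
parseRoute _                   = Nothing

-- Events come from the .eventlog file; module sources are located via .ghc_stgapp.
sourceFor :: Route -> DataSource
sourceFor GetEvents           = EventlogFile
sourceFor (GetModuleSource _) = StgAppFile

main :: IO ()
main = mapM_ (print . fmap (\r -> (r, sourceFor r)) . parseRoute . splitPath)
  ["/events", "/source/Data.Map", "/unknown"]
```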
The eventlog is parsed with the ghc-events Haskell library, converted to JSON with Aeson and then sent to the frontend on request. The backend does not change the eventlog message format at all; apart from some optional message filtering, it does not transform the eventlog messages. The backend can also look up the source code of the inspected project's modules. This feature relies on GHC-WPC, because the vanilla GHC pipeline does not save the location of the project's source code. GHC-WPC generates a .ghc_stgapp file for the executable that contains path information for all the project dependencies. The frontend can then query the source code of a module by name, and the server resolves it using the .ghc_stgapp file.
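Below is a minimal sketch of the "optional filtering, then JSON" step. To stay self-contained it uses a simplified stand-in for ghc-events' `Event` type and a hand-rolled JSON encoder in place of Aeson. The constructor names mirror the eventlog messages discussed in this post, but the JSON field names (`time`, `tag`, `bytes`) are made up for illustration.

```haskell
module Main where

import Data.List (intercalate)
import Data.Word (Word64)

-- Simplified stand-in for ghc-events' event payloads (the real type has many
-- more constructors and fields).
data Payload
  = HeapLive Word64
  | HeapSize Word64
  | HeapAllocated Word64
  | GCWork
  deriving (Eq, Show)

-- Timestamp plus message-specific payload.
data Event = Event { evTime :: Word64, evPayload :: Payload }
  deriving (Eq, Show)

-- Message type name, used both for filtering and as the JSON tag.
tag :: Payload -> String
tag (HeapLive _)      = "HeapLive"
tag (HeapSize _)      = "HeapSize"
tag (HeapAllocated _) = "HeapAllocated"
tag GCWork            = "GCWork"

-- Optional filtering: keep only the requested message types (empty = keep all).
filterEvents :: [String] -> [Event] -> [Event]
filterEvents [] evs     = evs
filterEvents wanted evs = [e | e <- evs, tag (evPayload e) `elem` wanted]

-- Hand-rolled JSON for this sketch only; the server described above uses Aeson.
toJson :: Event -> String
toJson (Event t p) =
  "{\"time\":" ++ show t ++ ",\"tag\":\"" ++ tag p ++ "\"" ++ bytes p ++ "}"
  where
    bytes (HeapLive b)      = ",\"bytes\":" ++ show b
    bytes (HeapSize b)      = ",\"bytes\":" ++ show b
    bytes (HeapAllocated b) = ",\"bytes\":" ++ show b
    bytes GCWork            = ""

encodeEvents :: [Event] -> String
encodeEvents evs = "[" ++ intercalate "," (map toJson evs) ++ "]"

main :: IO ()
main = putStrLn (encodeEvents (filterEvents ["HeapLive"] sample))
  where
    sample = [Event 100 (HeapLive 2048), Event 150 GCWork, Event 200 (HeapSize 4096)]
```

Since the payloads are passed through unchanged, the frontend sees the same information the eventlog contains, minus whatever was filtered out.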
In the first iteration of this project I only wanted a simple visualization of a subset of the eventlog messages. The memory-usage-related messages looked like the simplest option. Each eventlog message has a timestamp relative to the application start time and a message-specific payload. The payload of the HeapLive, HeapSize and HeapAllocated messages is an integer that gives the memory consumption in bytes. It was trivial to plot the timestamps and byte counts as a line chart in D3, C3 and Plotly.js; I literally copy-pasted the code samples from the charting libraries' documentation.
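Preparing the memory messages for a line chart is little more than a projection to (time, size) pairs. This sketch assumes a simplified event triple rather than the real ghc-events types. It converts the eventlog's nanosecond timestamps to seconds and bytes to MiB (the unit choice is this sketch's own); the resulting list of pairs is the kind of x/y series a D3, C3 or Plotly.js line chart takes.

```haskell
module Main where

import Data.Word (Word64)

-- Simplified stand-ins for the three memory messages (payload in bytes).
data MemKind = HeapLive | HeapSize | HeapAllocated
  deriving (Eq, Show)

-- (timestamp in nanoseconds since program start, message kind, bytes)
type MemEvent = (Word64, MemKind, Word64)

-- One line-chart series: x = seconds since start, y = MiB.
series :: MemKind -> [MemEvent] -> [(Double, Double)]
series kind evs =
  [ (fromIntegral t / 1e9, fromIntegral b / (1024 * 1024))
  | (t, k, b) <- evs, k == kind ]

main :: IO ()
main = do
  let evs = [ (500000000, HeapSize, 4194304)
            , (500000000, HeapLive, 1048576)
            , (1000000000, HeapLive, 2097152) ]
  print (series HeapLive evs)
```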
The eventlog also marks when the GC does some work (GCWork), and in profiling mode it records the current cost centre stack. The cost centre stack messages (HeapProfSampleCostCentre, ProfSampleCostCentre) carry an array of 32-bit cost centre identifiers. The cost centres themselves are defined in separate messages: each cost centre definition (HeapProfCostCentre) identifies the source module and a source location. It is then up to the consuming tool to look up the module's source code.
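Resolving a sampled cost centre stack means joining the array of ids against the separately emitted definitions. In this sketch `CostCentre` is a hypothetical record modelled loosely on HeapProfCostCentre (the field names are the sketch's own), stacks are plain lists instead of whatever container a parser returns, and the example ids and source locations are invented. `stackModules` lists the modules whose source the UI would then need to fetch.

```haskell
module Main where

import Data.Either (rights)
import Data.List (nub)
import qualified Data.Map.Strict as Map
import Data.Word (Word32)

-- Hypothetical cost centre definition: id, label, module, source location.
data CostCentre = CostCentre
  { ccId     :: Word32
  , ccLabel  :: String
  , ccModule :: String
  , ccSrcLoc :: String
  } deriving (Eq, Show)

-- Index the definitions by id, since they arrive as separate messages.
ccIndex :: [CostCentre] -> Map.Map Word32 CostCentre
ccIndex ccs = Map.fromList [(ccId cc, cc) | cc <- ccs]

-- Resolve a sampled stack of 32-bit ids. Unknown ids are kept as Left, so a
-- truncated or corrupted log does not abort the whole view.
resolveStack :: Map.Map Word32 CostCentre -> [Word32] -> [Either Word32 CostCentre]
resolveStack idx = map (\i -> maybe (Left i) Right (Map.lookup i idx))

-- Distinct modules referenced by the resolvable part of a stack.
stackModules :: Map.Map Word32 CostCentre -> [Word32] -> [String]
stackModules idx = nub . map ccModule . rights . resolveStack idx

-- Invented example definitions.
exampleIndex :: Map.Map Word32 CostCentre
exampleIndex = ccIndex
  [ CostCentre 1 "main" "Main" "Main.hs:5:1-20"
  , CostCentre 2 "parse" "Parser" "Parser.hs:10:1-30"
  ]

main :: IO ()
main = do
  mapM_ print (resolveStack exampleIndex [2, 1, 7])
  print (stackModules exampleIndex [2, 1, 7])
```

Keeping unknown ids as `Left` rather than failing is a deliberate choice: a viewer that degrades gracefully is more useful when the eventlog itself may be damaged.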
GHC recently introduced .hie files to support advanced editor tooling. A .hie file contains a simplified version of the AST, including the original module source code and detailed type information, so .hie files had exactly what I needed. Unfortunately, the format is not stable across GHC versions, which sadly makes .hie files useless in this case. Instead, I extended GHC-WPC to save the module's Haskell source into the external STG binary. This is acceptable as a temporary solution; in the future I'll do better by introducing a generic container (.zip) file for each module that can store custom data. Then I could store the module's Haskell source in that container along with the external STG IR.
Unfortunately, I also encountered a problem with eventlog: it turned out that the last two GHC releases (8.8 and 8.10) generate corrupted eventlog files due to a race condition bug in the RTS. To me, this indicates that eventlog is barely used in the real world.
Finally, I'd like to mention some related projects. The first one is eventlog2html, a fancier web-based version of the old-school hp2ps tool. It generates an HTML file from an eventlog file that looks much better than a static PDF and is interactive to some extent.
While eventlog2html generates a static HTML page, haskell code spot has a client-server architecture, which gives much more freedom for fancy data visualization.
The other project is ghc-debug, which aims to enable remote debugging of GHC-compiled programs. Its core idea is to add server functionality to the GHC runtime system, so that other programs like profilers and debuggers can connect and query the program's memory structure in the finest detail. The idea is awesome, but the project is at an early stage of development and requires a development branch of GHC.
It would be extremely useful to have a usable Haskell debugger and live program profiler. I'd indeed like to work on these, but with a slightly different approach from ghc-debug's. I'd prefer to implement an STG interpreter first and build a working debugger and profiler for it in the first development iteration. IMO this is a much easier approach, and it also allows more in-depth inspection of program properties, like value lifetimes. I favour the interpreter approach for the prototyping phase. My motto is that it is better to have a pattern match error than a segmentation fault.