Why D is a good choice for writing a language

@b4asile, 2018/08/15

Introduction

D is a programming language that is appreciated for its powerful templates, its meta-programming features and the parts of its standard library related to algorithms and ranges. At first glance, none of this is really related to writing compilers, which are often based on OOP or at least on structured programming.

D is a multi-paradigm language and it allows writing in an OOP fashion, even if this doesn't unleash its beauty. I'll explain why D is a viable solution, using my experience of writing STYX as an example.

Designing the grammar with a PEG

In the past I had already written an awful scripting language called LEOFUNS (Linear Evaluator Of FUnction Stacks). It was during this project that I first heard of the tools and libraries used to design or automate the lexing and parsing stages of a compiler: BISON, FLEX, YACC, etc. At that time I only wrote in Object Pascal (the Delphi dialect, to be more accurate). There was a YACC port, which I never managed to use.

Fortunately things have changed since then: packrat parsers have appeared and become popular. The registry of third-party D packages contains a PEG implementation called Pegged. It is easy to use and well documented, and it has nothing to do with the dusty tools mentioned before. It has allowed me to start writing the real parser by hand, while having a concise document as my plan, which I call the formal grammar.

[Image: a PEG is used as the reference document to maintain the parser written in D]

The formal grammar itself can be tested in Pegged. The test consists of feeding some source code to an automatically generated parser. The AST is then displayed in a web browser, which makes it very easy to check whether the grammar is broken.

[Image: HTML produced by Pegged, showing quickly whether the grammar is broken]

Once I had started using a D library, Pegged, I didn't want to use a second language in the project, even though the grammar is separated from the compiler, as a sub-project standing in its own folder.

Inline unit tests and coverage

Overview

The D language allows unit tests to be defined next to the code they test. A simple compiler switch (-unittest) can be used to run them through features of the D runtime. The compiler's reflection system can also be used to run the tests in a more personalized way. A test is the equivalent of a free function, but with a special syntax:

    // The most simple D unit test
    unittest
    {
        assert(true);    
    }  

In addition, the code can be instrumented to measure the coverage when the unittests are run. This way, many aspects of my compiler can be tested without a test suite based on external files (although this will be necessary in the future).
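As a rough illustration of the "more personalized way" of running tests, the sketch below walks over the unittest blocks of a module with __traits(getUnitTests) and calls them by hand. The module name demo, the function twice and the helper runModuleTests are made up for the example and are not taken from STYX. It assumes a build with something like dmd -unittest -cov, where -cov additionally writes a coverage listing after the run.

```d
module demo;

import core.runtime : Runtime;
import std.stdio : writeln;

// Hypothetical function under test.
int twice(int x)
{
    return 2 * x;
}

unittest
{
    assert(twice(2) == 4);
}

unittest
{
    assert(twice(-1) == -2);
}

shared static this()
{
    // Replace the default test runner with one that does nothing,
    // so that the tests are only run by runModuleTests() below.
    Runtime.moduleUnitTester = { return true; };
}

// Calls every unittest block of this module and returns how many were called.
// The list is empty when the program is not compiled with -unittest.
size_t runModuleTests()
{
    size_t count;
    foreach (test; __traits(getUnitTests, demo))
    {
        test();
        ++count;
    }
    return count;
}

void main()
{
    writeln(runModuleTests(), " test(s) run");
}
```

A custom runner like this is where filtering by module or reporting in a particular format could be added.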

Application

Since a compiler is about transforming source code, most of the unit tests call a test function with at least a string containing some STYX source code. Depending on the compiler feature being tested, optional parameters follow, for example some other code when the test is about rewriting or formatting the AST. The test functions take advantage of D's special keywords __LINE__ and __FILE_FULL_PATH__ to replace the default assert expression. This gives precise error messages that point to the STYX code being tested and not to the compiler code. These messages are even clickable in the IDE widget dedicated to the output streams of the compiler and of the target programs.

[Image: a custom function that replaces D's 'assert'. Expansion of the special keywords allows precise error messages]

[Image: the error message is a STYX error, not a D error]
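The idea can be sketched as follows. This is not the actual STYX helper: bracesBalanced stands in for a real compiler check, and checkSource and assertSourceOk are hypothetical names. The point is only that __FILE_FULL_PATH__ and __LINE__, used as default arguments, are evaluated where the helper is called, so the message points at the line holding the tested source string.

```d
import std.format : format;
import std.stdio : stderr;

// Hypothetical stand-in for a compiler check:
// the source is accepted when its braces are balanced.
bool bracesBalanced(string source)
{
    int depth;
    foreach (char c; source)
    {
        if (c == '{')
            ++depth;
        else if (c == '}' && --depth < 0)
            return false;
    }
    return depth == 0;
}

// Returns null on success, otherwise a message located at the caller's
// file and line, because the default arguments expand at the call site.
string checkSource(string source,
    string file = __FILE_FULL_PATH__, size_t line = __LINE__)
{
    if (bracesBalanced(source))
        return null;
    return format("%s(%d): error, unbalanced braces in the tested source",
        file, line);
}

// Replacement for a plain assert: prints the located message, then fails.
void assertSourceOk(string source,
    string file = __FILE_FULL_PATH__, size_t line = __LINE__)
{
    const msg = checkSource(source, file, line);
    if (msg.length)
    {
        stderr.writeln(msg);
        assert(false, msg);
    }
}

unittest
{
    // "{ }" is a placeholder, not real STYX code.
    assertSourceOk("{ }");
}

void main()
{
}
```

The file(line) prefix is the shape that many editors recognize as a location, which is presumably what makes such messages clickable.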

Interaction with the editor is also important to reach 100% coverage. The tests for a specific compiler module (e.g. lexer, parser, version processor, etc.) can be run independently from the other modules because the compiler is also available as a static library. Note that STYX as a library is not compiled with the unit tests, otherwise all of them would be run each time.

[Image: interaction with the editor is important in order to 'fight' until 100% coverage is reached by the tests]

Online tests

D is supported by the most popular continuous integration services. Typically, a combination of TravisCI + CodeCov is extremely easy to set up. This makes it possible to accept pull requests with a guard against regressions.

Garbage collector

Although D is sometimes criticized because of its GC, its default memory management system is not a problem in a compiler. Compilers are single-shot programs and, in most cases, memory management doesn't matter. If at some point it does, it will still be possible to turn the GC off before a phase and force a collection once the phase is finished, before starting the next one. For now only new is used and we don't care if an instance ends up unused (typically after a parser error due to invalid code).
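Turning the GC off around a phase could look like the sketch below, which relies on GC.disable, GC.enable and GC.collect from core.memory. The Node class, the withPausedGC helper and the parsePhase function are hypothetical and only stand for "a phase that allocates a lot". STYX does not do this at the moment.

```d
import core.memory : GC;

// Hypothetical AST node.
class Node
{
    Node[] children;
}

// Runs a phase with automatic collections paused,
// then re-enables the GC and forces one collection.
void withPausedGC(scope void delegate() phase)
{
    GC.disable();
    scope (exit)
    {
        GC.enable();
        GC.collect();
    }
    phase();
}

// Hypothetical phase: allocates nodeCount nodes with plain `new`.
size_t parsePhase(size_t nodeCount)
{
    Node[] nodes;
    withPausedGC(() {
        foreach (i; 0 .. nodeCount)
            nodes ~= new Node;
    });
    return nodes.length;
}

void main()
{
    assert(parsePhase(1000) == 1000);
}
```

Note that GC.disable only suppresses automatic collections. The runtime may still collect if it has to, for example when memory runs out.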

In short: so far so good, even if it's true that STYX as a library is only used to run individual tests. More serious uses, and I am thinking especially of an auto-completion daemon with an undefined running time, may reveal unexpected memory issues.

Future and conclusion

The project has reached a point where D's testing facilities will be used less, but they have been useful to get a solid foundation. In the future I expect that D's C FFI will be useful, for example for the backend. Once again I'll count on third-party packages, such as libfirm-d or LLVM-d.

4 replies

@b4asile: First post on Putchar.org, wow!

17 Aug

Oh hey BBasile :D

Just dropping a link to my programming language compiler made in D:
https://github.com/beast-lang/beast-dragon

19 Aug

There’s also Volt

30 Aug

Why not OCaml though? FP + OOP + official LLVM bindings + as a bonus you can compile it to js.

I haven’t written anything but toy programs in ocaml though, so I’ll be interested in comparison if anyone can share their experience.