Eigen compiler suite compared to LLVM etc.

Added by Rochus Keller over 1 year ago

I was made aware of the Eigen Compiler Suite today (https://github.com/rochus-keller/Oberon/discussions/55). This looks like a magnificent project and an incredible amount of work, thanks for sharing it!

Now that I have gained a first insight into the documentation and the source code, please allow me to ask you the following questions.

  1. Functionally, the comparison with LLVM and Clang is apparent. Where do you see the strengths of Eigen compared to the technologies mentioned?
  2. With around 65kSLOC C++ (~90kSLOC including assembler and def files) the project seems closer to e.g. QBE (https://c9x.me/compile/) or MIR (https://github.com/vnmakarov/mir) than to LLVM. Would it make sense to position Eigen as a flexible compiler backend, i.e. as an alternative to the technologies mentioned? Where do you see the strengths compared to QBE and MIR?
  3. The development of Eigen seems to have started around 2020. Can you disclose where Eigen is used, and how far away the project is from an official release?
  4. There is an AMD64 backend. Are there plans for an x86 backend?

Replies (39)

RE: Eigen compiler suite compared to LLVM etc. - Added by Rochus Keller over 1 year ago

Meanwhile I was able to make some modifications so that it compiles with the MSVC 2014 compiler.

Unfortunately it raises an assertion in amd64generator Context::Emit IsValid(instruction) on the first run (the same happens with MSVC 2022).

When I compile and run with MinGW on Windows or GCC on Linux with the given changes, everything works. I should step through on both architectures in parallel to find out what's different.

RE: Eigen compiler suite compared to LLVM etc. - Added by Rochus Keller over 1 year ago

since obcode targets a virtual platform which differs from the target of cdamd32.

What changes were necessary to obcode so it generates IR compatible with cdamd32? Or the other way round: how would I have to change cdamd32, so it works with IR generated by obcode? My main motivation for this: I'm looking for a way to generate example IR from code fragments to better understand IR and meta data. I tried cppcode, but I'm immediately hitting the limits (e.g. passing a struct by value to a function seems not to be supported).

RE: Eigen compiler suite compared to LLVM etc. - Added by Florian Negele over 1 year ago

What changes were necessary to obcode so it generates IR compatible with cdamd32? Or the other way round: how would I have to change cdamd32, so it works with IR generated by obcode?

As said before, both compilers use an abstraction called layout which describes the type layout of the target platform. The x86 back-end defines its layout in tools/amd64generator.cpp line 212, while the intermediate code back-end uses the standard layout in tools/cdgenerator.hpp line 38. You could change that line to resemble the layout for your desired target:

Layout layout {{4, 1, 4}, {8, 4, 4}, 4, 4, {0, 4, 8}, true};

I tried cppcode, but I'm immediately hitting the limits (e.g. passing a struct by value to a function seems not to be supported).

Maybe you are better off with the Oberon front-end. I would suggest using the sandbox for studying code fragments.

RE: Eigen compiler suite compared to LLVM etc. - Added by Rochus Keller over 1 year ago

I would suggest using the sandbox for studying code fragments.

Wow, there is even an online-version of the compiler, great. It's really amazing what infrastructure you have created.

As said before ... You could change that line to resemble the layout for your desired target

I am aware of this, but it was too abstract for my current level of knowledge about ECS; I am glad for the example.

RE: Eigen compiler suite compared to LLVM etc. - Added by Florian Negele over 1 year ago

I am glad for the example.

Please change line 39 as well:

Platform platform {layout, false};

RE: Eigen compiler suite compared to LLVM etc. - Added by Florian Negele over 1 year ago

In addition, the function was already available before, e.g. via swap in the STL, or via Boost.Move (which also demonstrates well that even the new "move semantics" is syntactic sugar).

Setting personal preference aside, swapping is just one of many applications of move semantics which conceptually closed the last remaining gap of the resource management of C++. Before that, you had to resort to copying in a lot of cases and hope for some optimisation to kick in like for example when returning a value from a function. And moveable but non-copyable types like std::unique_ptr could not be properly defined at all. Boost.Move just emulates rvalues using templates and requires a lot of explicit macro invocations to achieve the same result. Saying this demonstrates that it is syntactic sugar is a bit of a stretch considering almost all other C++ features could be implemented in C using a lot of macro trickery as well.
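To make the point about movable but non-copyable types concrete, here is a minimal C++11 sketch. The class name Buffer and the helper functions are hypothetical and exist only for illustration; they are not taken from the ECS sources.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical resource handle: it can be moved, but not copied.
class Buffer {
public:
    explicit Buffer(std::size_t size) : data_(new int[size]()), size_(size) {}

    // The unique_ptr member already deletes the implicit copy operations;
    // the move operations are defaulted explicitly to document the intent.
    Buffer(Buffer&&) noexcept = default;
    Buffer& operator=(Buffer&&) noexcept = default;

    bool empty() const { return data_ == nullptr; }
    std::size_t size() const { return empty() ? 0 : size_; }

private:
    std::unique_ptr<int[]> data_;
    std::size_t size_;
};

static_assert(!std::is_copy_constructible<Buffer>::value,
              "Buffer must not be copyable");
static_assert(std::is_move_constructible<Buffer>::value,
              "Buffer must be movable");

// Returning by value moves (or elides) the local object; it never copies.
Buffer make_buffer(std::size_t size) {
    Buffer result(size);
    return result;
}

// A standard container can hold move-only elements.
std::vector<Buffer> collect() {
    std::vector<Buffer> buffers;
    buffers.push_back(make_buffer(4));   // temporary is moved in
    Buffer named(8);
    buffers.push_back(std::move(named)); // explicit move of a named object
    return buffers;
}
```

Without rvalue references, a type like this could neither be returned by value nor stored in a standard container unless it was also made copyable.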

RE: Eigen compiler suite compared to LLVM etc. - Added by Rochus Keller over 1 year ago

Saying this demonstrates that it is syntactic sugar is a bit of a stretch

Isn't providing a more pleasing syntax for a given feature to the programmer the major purpose of syntactic sugar? The question is rather how much additional dependency and complexity we are willing to accept for it (remember the scope of the specification has effectively doubled from 98/03 to 11 and it took many years until all relevant compilers were reasonably compatible, and it didn't really get any better with further versions). Personally, I value portability and compiler-independence more than a slightly nicer syntax.

RE: Eigen compiler suite compared to LLVM etc. - Added by Florian Negele over 1 year ago

Isn't providing a more pleasing syntax for a given feature to the programmer the major purpose of syntactic sugar?

Would you consider the built-in bool type syntactic sugar as well?

remember the scope of the specification has effectively doubled from 98/03 to 11

Just to be fair, the majority of additions affected the standard library. The language proper grew by just about a third.

RE: Eigen compiler suite compared to LLVM etc. - Added by Rochus Keller over 1 year ago

Would you consider the built-in bool type syntactic sugar as well?

It was already part of C++98. The question is more relevant for C, where it is debatable; personally I don't need a bool type in C, but I have added one to my languages since it makes the intention explicit and supports better type checking.

the majority of additions affected the standard library.

Which is all relevant if I want to implement a compiler or assess/migrate an existing code base, and there is a strong interdependence between syntax, semantics and standard library in C++, which became much larger with C++11 ff (e.g. with iterator loops). So we cannot simply separate the core language from the standard library, as it might be the case with other languages.

Anyway, there is no need for us to be of the same opinion here. It's your code and it's your decision what language version you prefer. I was able to solve my problem by migrating the parts relevant to me, which was feasible thanks to the highly structured and consistent architecture of your components. At the moment I'm still looking for the ideal existing code base that could easily be used as a C (maybe even C++11) frontend for Eigen. chibicc is nice, but fails e.g. with my generated C99 benchmark.

RE: Eigen compiler suite compared to LLVM etc. - Added by Florian Negele over 1 year ago

I don't think this is a matter of opinion. Rvalue references cannot be syntactic sugar because they allow expressing things like move semantics and perfect forwarding that were not possible before their introduction: "Rvalue references were introduced into C++0x to provide a mechanism for capturing an rvalue temporary (which could previously be done in C++ using traditional lvalue references to const) and allowing modification of its value (which could not)." [N3055]
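As an illustration of perfect forwarding, here is a minimal sketch. The names Sink and make are hypothetical and chosen only for this example. The factory passes its argument on with the original value category, so the callee can distinguish lvalues from rvalues.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical type that records which constructor overload was selected.
struct Sink {
    std::string value;
    bool moved;

    explicit Sink(const std::string& s) : value(s), moved(false) {}
    explicit Sink(std::string&& s) : value(std::move(s)), moved(true) {}
};

// Generic factory: Arg&& is a forwarding reference, and std::forward
// preserves whether the caller passed an lvalue or an rvalue.
template <typename T, typename Arg>
T make(Arg&& arg) {
    return T(std::forward<Arg>(arg));
}
```

Calling make<Sink>(s) with a named string s selects the copying constructor and leaves s untouched, while make<Sink>(std::move(s)) or make<Sink>(std::string("tmp")) selects the moving one. A C++03 factory taking const Arg& would always have picked the copying overload.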

RE: Eigen compiler suite compared to LLVM etc. - Added by Rochus Keller about 1 year ago

In case you're interested:

I now have the second C compiler, which is based on Cparser instead of Chibicc, ready to run Hennessy.

Please find attached the Hennessy measurement results. Chibicc was extremely slow, so I started to implement a backend for Cparser (see https://github.com/rochus-keller/EiGen/tree/cparser/cparser), which has a better quality than Chibicc up-front, but which also gave more work because I had to generate the IR directly from the AST. Just this switch of the frontend yielded a factor-of-two speed-up (my Cparser backend implementation is pretty similar and has the same room for improvement as my Chibicc backend implementation, so the speed-up is likely due to the better input).

Cparser - in contrast to Chibicc - still segfaults with the Are-we-fast-yet benchmark; my Chibicc version turned out to be 6.5 times slower than GCC -O2. Cparser will also likely be significantly faster in this benchmark, but will take some time to debug.

RE: Eigen compiler suite compared to LLVM etc. - Added by Florian Negele about 1 year ago

Interesting results. I wouldn't have expected a big difference because generating intermediate code from C should be quite straightforward.

RE: Eigen compiler suite compared to LLVM etc. - Added by Rochus Keller about 1 year ago

Actually I just took it as a fact and don't understand yet what exactly causes the factor two. I just studied other benchmarks and saw that cparser/libfirm were as fast as GCC whereas chibicc was 70% worse. I also did measurements with the cparser/libfirm original implementation, both optimized and unoptimized, and was surprised to see that libfirm apparently only makes the cparser output 25% faster (see https://github.com/libfirm/libfirm/issues/37). From that I concluded that the cparser output itself must be much faster than the chibicc output, which is now confirmed (at least on microbenchmark level).

RE: Eigen compiler suite compared to LLVM etc. - Added by Rochus Keller about 1 year ago

Meanwhile I managed to make the majority of Are-we-fast-yet work with Cparser/EiGen; these measurements also confirm that Cparser generates significantly faster code than Chibicc (factor 2.3 geomean so far). Here are the detailed results: https://github.com/rochus-keller/Oberon/blob/master/testcases/Are-we-fast-yet/Are-we-fast-yet_results.ods.
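For readers unfamiliar with the term: a geomean factor is the geometric mean of the per-benchmark runtime ratios. The following is only a sketch of that calculation; the function name geomean is hypothetical, and the measured values from the spreadsheet are not reproduced here.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark ratios, e.g. runtime_A / runtime_B.
// Computed via logarithms to avoid overflow of the running product.
// Precondition (assumed): ratios is non-empty and all entries are positive.
double geomean(const std::vector<double>& ratios) {
    double log_sum = 0.0;
    for (double r : ratios) {
        log_sum += std::log(r);
    }
    return std::exp(log_sum / static_cast<double>(ratios.size()));
}
```

Unlike the arithmetic mean, the geometric mean treats a factor-two slowdown and a factor-two speed-up as cancelling each other out, which is why it is commonly used to summarize benchmark ratios.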

Interestingly, Cparser with the EiGen backend even compiles faster than Chibicc and GCC. The resulting executable is still 40% slower than TCC and 50% slower than GCC without optimizations; there is still much room for improvement in codegen.cpp, but that's currently not my priority.

I will now try to make the remaining two benchmarks run and then eventually continue to implement my Micron backend.
