Looking for a decently large demo project
Added by Rochus Keller over 1 year ago
I'm looking for a decently large demo project. I want to generate a large number of IR files to better understand the IR, and to create and validate an EBNF grammar of the IR syntax.
For this purpose I downloaded the Oberon System V2 Ceres source files and converted them to UTF-8. See the attached ZIP. I also added a dependency graph of the modules.
Then I added an empty implementation to Kernel.Def and was successfully able to compile it with
ecsd -i obcode Kernel.Def.
My intention was to go up the dependency tree and compile one module after the other.
When trying FileDir.Mod I got errors:
ecsd -i obcode FileDir.Mod
FileDir.Mod:119:26: error: assigning 'DirMark' of signed integer type 'SIGNED64' to variable of signed integer type 'LONGINT'
a.m := N; a.mark := DirMark;
^
FileDir.Mod:152:14: error: assigning 'DirMark' of signed integer type 'SIGNED64' to variable of signed integer type 'LONGINT'
a.mark := DirMark; a.m := 1; a.p0 := oldroot; a.e[0] := U;
Is this to be expected? How can I configure the compiler that LONGINT is SIGNED32 instead of SIGNED64?
Or can you recommend another, better suited demo project which I could use for my purpose?
Attachment: OberonSystemV2.zip (144 KB)
Replies (42)
RE: Looking for a decently large demo project - Added by Florian Negele over 1 year ago
Is this to be expected? How can I configure the compiler that LONGINT is SIGNED32 instead of SIGNED64?
Yes, this is to be expected. LONGINT already is SIGNED32, see Table 7.2, but the value of DirMark is bigger than MAX(LONGINT). It seems that the original compiler accepted this, but it should correctly have assigned the next fitting integer type if available. You can use the identity operation SHORT (or more explicitly LONGINT) in the definition of DirMark and HeaderMark to get the original integer type with possible truncation:
DirMark* = SHORT (9B1EA38DH);
HeaderMark* = LONGINT (9BA71D86H);
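The range problem described in this reply can be checked with a minimal C++ sketch. It is not ECS code: the helper names fits_longint and truncate_to_longint are hypothetical, LONGINT is assumed to be a 32-bit two's-complement type as stated in the reply, and the truncation is assumed to keep the low 32 bits of the constant.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helpers for illustration only; they are not part of the ECS.

// The two constants from FileDir.Mod, written as unsigned 64-bit values.
constexpr std::uint64_t DirMark = 0x9B1EA38D;
constexpr std::uint64_t HeaderMark = 0x9BA71D86;

// True if the value can be stored in a signed 32-bit LONGINT without loss.
inline bool fits_longint(std::uint64_t value) {
    return value <= static_cast<std::uint64_t>(
        std::numeric_limits<std::int32_t>::max());
}

// Assumed truncation: keep the low 32 bits and read them as two's complement.
inline std::int32_t truncate_to_longint(std::uint64_t value) {
    const std::uint32_t low = static_cast<std::uint32_t>(value);
    // Map the upper half of the unsigned range onto negative values explicitly,
    // which avoids relying on implementation-defined narrowing before C++20.
    return low <= 0x7FFFFFFFu
        ? static_cast<std::int32_t>(low)
        : static_cast<std::int32_t>(static_cast<std::int64_t>(low) - 4294967296LL);
}
```

Under these assumptions both constants exceed MAX(LONGINT), so a compiler that picks the next fitting integer type gives them a 64-bit type, and the truncated values come out negative (DirMark becomes -1692490867).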
Or can you recommend another, better suited demo project which I could use for my purpose?
I don't think there are many "large" Oberon code bases other than the Oberon System itself. I have successfully compiled the compilers from the original Project Oberon and its successor hosted on https://projectoberon.net/, but it required some minor source code changes like the one above due to undocumented incompatibilities.
RE: Looking for a decently large demo project - Added by Florian Negele over 1 year ago
I want to generate a large number of IR files to better understand the IR, and to create and validate an EBNF grammar of the IR syntax.
Regarding the grammar, the test suites cdcheck.tst and in particular cdrun.tst located in the tests directory contain all possible combinations of instructions and operands. Section 8.2.4 and following define the overall syntax of the generic assembly language used to express intermediate code. The syntax definition of the actual intermediate code operands is given in Section 23.3. I hope this helps.
RE: Looking for a decently large demo project - Added by Rochus Keller over 1 year ago
You can use the identity operation SHORT (or more explicitly LONGINT) in the definition of DirMark
Ok, thanks. So I will try to modify the source code till it compiles. Let me see how far I can get.
from the original Project Oberon
So your compiler can also cope with Oberon-07? I will try with the Project Oberon system then, maybe it's easier.
define the overall syntax of the generic assembly language
Thanks, I've read the sections more than once. But there are a lot of details which catch me now since I'm writing the IR code generator. I was actually thinking about writing something like https://mapping-high-level-constructs-to-llvm-ir.readthedocs.io/en/latest/ for Eigen.
RE: Looking for a decently large demo project - Added by Florian Negele over 1 year ago
So your compiler can also cope with Oberon-07?
No, sorry for the confusion. The compiler understands Oberon-2 only, but except for some minor changes the languages are more or less compatible.
But there are a lot of details which catch me now since I'm writing the IR code generator.
I see. The intermediate code is a rather low-level abstraction; maybe Table 23.2 shows best what the capabilities of the instruction set are, since it groups its operations by function. You can always use the sandbox for examples of high-level construct mapping, have a look at its implementation in the tools/*emitter.cpp files, or ask here if there are problems.
RE: Looking for a decently large demo project - Added by Rochus Keller over 1 year ago
Thanks. I know the table and it is indeed useful. I also use the sandbox from time to time, but especially if I'm trying with C examples it is usually quite tricky to find one which compiles. That was one of the reasons why I'm still trying to integrate your backend with one of the lean open source C compilers available. But unfortunately this is a chicken or egg problem, and at the moment I think I should rather make progress with my own compiler.
Meanwhile I tried to compile some more files of the v2 system, but it seems I would have to change quite a lot. The semantics your compiler implements seem to be too far away from Wirth's original notions (which is OK, but since it's advertised as Oberon-2 compatible, I hoped for less resistance). Anyway, I will eventually get there with sufficient time and patience ;-)
RE: Looking for a decently large demo project - Added by Florian Negele over 1 year ago
if I'm trying with C examples it is usually quite tricky to find one which compiles.
Yes, that is unfortunately true. The Oberon front-end on the other hand is pretty stable, and except maybe for pointer arithmetic there is hardly any C construct that cannot be expressed directly in Oberon.
The semantics your compiler implements seem to be too far away from Wirth's original notions
Can you give me some examples?
since it's advertised as Oberon-2 compatible, I hoped for less resistance
Please be aware that the compiler is written against specification, not implementation. Of course, there may always be some oversights but my test suite has revealed a lot of instances where the original compilers are not compatible either. I think that the code you are trying to compile probably struggles with these issues and is thus not conformant to begin with.
RE: Looking for a decently large demo project - Added by Rochus Keller over 1 year ago
Can you give me some examples?
I'm not sure about the validity of these "examples", but if I compile a code base implemented in Oberon 90 or Oberon-2 known to compile, any compiler error looks like a potential example candidate. For example, when I compile Files.Mod, I get
Files.Mod:286:42: error: incompatible result value
BEGIN RETURN LONG(f.aleng)*SS + f.bleng - HS
^
Files.Mod:329:25: error: assigning 'SYSTEM.ADR (x)' of unsigned integer type 'UNSIGNED32' to variable of signed integer type 'LONGINT'
BEGIN dst := SYSTEM.ADR(x);
^
Files.Mod:334:36: error: assigning 'SYSTEM.ADR (r.buf^.data.B) + r.bpos' of unsigned integer type 'UNSIGNED32' to variable of signed integer type 'LONGINT'
src := SYSTEM.ADR(r.buf.data.B) + r.bpos; m := r.bpos + n;
^
Files.Mod:387:25: error: assigning 'SYSTEM.ADR (x)' of unsigned integer type 'UNSIGNED32' to variable of signed integer type 'LONGINT'
BEGIN src := SYSTEM.ADR(x);
^
Files.Mod:392:55: error: assigning 'SYSTEM.ADR (r.buf^.data.B) + r.bpos' of unsigned integer type 'UNSIGNED32' to variable of signed integer type 'LONGINT'
r.buf.mod := TRUE; dst := SYSTEM.ADR(r.buf.data.B) + r.bpos; m := r.bpos + n;
^
Files.Mod:413:40: error: incompatible result value
BEGIN RETURN LONG(r.apos)*SS + r.bpos - HS
the compiler is written against specification
From my humble point of view, the Oberon specification is by far not precise enough, which is not surprising, since one of the goals seems to have been that it is only 16 pages long. Therefore, to understand the missing parts, the reference compiler and Oberon System code somehow has to be seen as part of the language specification. In contrast to the Oberon-07 spec, Oberon 90 and 2 also include features like the VAR ARRAY OF BYTE trick. In the specification of Oberon+, the latter is e.g. explicitly excluded, also the SYSTEM module. But don't feel obliged to change anything; this is just my opinion.
RE: Looking for a decently large demo project - Added by Florian Negele over 1 year ago
For example, when I compile Files.Mod
Thank you. These are all examples of two issues that are underspecified such that the code makes assumptions and is thus non-portable: The first error is due to the assumption that INTEGER is a 16-bit type. The language report does not specify any type size, it only defines type relations. The next errors deal with the SYSTEM module for which the report explicitly states that modules importing it "are inherently non-portable". Here, the code assumes that addresses are (signed) 32-bit values. I tried to mitigate such portability issues by introducing the type SYSTEM.ADDRESS which has the correct bit width on any platform, but also is intentionally unsigned such that assignments to LONGINT are at least flagged like in this case.
From my humble point of view, the Oberon specification is by far not precise enough, which is not surprising, since one of the goals seems to have been that it is only 16 pages long.
I completely agree. Niklaus Wirth however told me once that the compiler should not be considered a reference implementation. Neither the original one nor that of Oberon-07. My experience is that it is quite hard to write Oberon code that is portable not only across platforms but also across compilers for the same platform. You may have more luck with modules further up the import tree, but for low-level modules all bets are off.
RE: Looking for a decently large demo project - Added by Rochus Keller over 1 year ago
mitigate such portability issues by introducing the type SYSTEM.ADDRESS
Which doesn't avoid issues with existing code. As a naive Oberon user I would simply assume that a compiler claiming compatibility with Oberon-2 would just compile existing code without modifications.
Niklaus Wirth however told me once that the compiler should not be considered a reference implementation.
Well, he hardly ever had this specific use case; he implemented the compiler and evolved the language in doing so, and the specification came after the fact; he didn't depend on a precise specification; we, on the other hand, start with the specification; or we just take the original compiler and modify it, as most Oberon compilers have done.
But anyway, since you have migrated the original Oberon-07 compiler code to your compiler, you could maybe provide it as an example to play with.
RE: Looking for a decently large demo project - Added by Florian Negele over 1 year ago
Which doesn't avoid issues with existing code.
Of course not. But it would have, had it only been in the report. The type LENGTH for example is another extension for that matter. Unfortunately, my proposals were always dismissed and deemed unnecessary.
Well, he hardly ever had this specific use case; he implemented the compiler and evolved the language in doing so, and the specification came after the fact
I agree, but what he meant to say was that not only the compiler writer but also the user, even a naive one, should adhere to the language report only. Quote from a personal email: "Please do not consider my compiler as the definition of Oberon, and in case of doubt read the Report. It is the privilege of a report to leave certain constructs undefined. In this case a user should simply avoid using the (undefined) feature." This is hard to achieve in general and for such low-level code as your existing source probably next to impossible, so I am really surprised you assumed that you could just recompile it.
RE: Looking for a decently large demo project - Added by Rochus Keller over 1 year ago
what he meant to say was that not only the compiler writer but also the user, even a naive one, should adhere to the language report only
Sounds like an academic delusion; he presumably never tried it himself.
so I am really surprised you assumed that you could just recompile it.
Compiling most of this code is not a big issue; it just is not expected to work because it is full of assumptions about the hardware; but that's no problem for the present case.
I actually migrated both the Project Oberon System and the System 3 to a version which doesn't use SYSTEM (nor any of the other fancy tricks) and was able to demonstrate that most of the system could be developed with regular language features, just with a few language extensions.
But anyway, my goal is still to find a decently large code base compatible with your compiler.
RE: Looking for a decently large demo project - Added by Florian Negele over 1 year ago
I actually migrated both the Project Oberon System and the System 3 to a version which doesn't use SYSTEM (nor any of the other fancy tricks) and was able to demonstrate that most of the system could be developed with regular language features, just with a few language extensions.
That is really great! I also had to constantly remind a lot of people to only use SYSTEM when absolutely necessary and show proper alternatives.
RE: Looking for a decently large demo project - Added by Rochus Keller over 1 year ago
Here are the repositories, in case you're interested: https://github.com/rochus-keller/OberonSystem (different concepts on different branches) and https://github.com/rochus-keller/OberonSystem3
And attached an example where I just transpiled Wirth's Project Oberon System to C++ (not caring whether it actually would run or not) using my OberonViewer.
RE: Looking for a decently large demo project - Added by Florian Negele over 1 year ago
Interesting, thank you. How are things like garbage collection handled in these migrated projects? Does your compiler support it by default?
RE: Looking for a decently large demo project - Added by Rochus Keller over 1 year ago
How are things like garbage collection handled in these migrated projects?
Oberon+ generates ECMA-335 or LuaJIT IR (the latter currently not maintained), or C99 code. Both ECMA-335 and LuaJIT have an integrated GC, which I use. The C99 code uses the Boehm conservative GC, which is good enough for this purpose. Here is an example of the generated source code with a Readme.txt how to build it: http://software.rochus-keller.ch/OberonSystem_generated_C_source.zip. I actually wanted to extend the C99 transpiler for a long time to also generate code for MPS (see https://github.com/Ravenbrook/mps), but I didn't find the time yet and it's not that urgent.
Does your compiler support it by default?
For Oberon+ yes; but my forthcoming Micron language (which will use the Eigen backend) neither requires nor has a GC.
RE: Looking for a decently large demo project - Added by Florian Negele over 1 year ago
I had a look at the Micron language repository and it looks like you are about to generate intermediate code as plain text. Since it is written in C++, have you thought about using the ECS to generate the intermediate code representation for you? The file tools/code.cpp can be used as a library which allows representing sections and instructions as data structures:
using namespace ECS::Code;
Sections sections;
auto type = Unsigned {1};
auto& section = sections.emplace_back (Section::Code, "main");
section.instructions.emplace_back (MOV {Reg {type, R0}, Mem {type, "variable"}});
The file tools/cdemitter.cpp abstracts most of the actual code emission behind a simple interface and provides automatic register allocation, offset patching, constant folding, and further optimisations. Instructions also provide input and output stream operators so you can still generate text from the instructions if required, see tools/cdgenerator.cpp.
RE: Looking for a decently large demo project - Added by Rochus Keller over 1 year ago
have you thought about using the ECS to generate the intermediate code representation
The Micron IL (MIL) is on a higher level and is supposed to have both a text and a binary representation (the latter will be similar to ECMA-335); I will also use the IL for compile-time and generic code as well as for symbol files. MIL is also suited for C generation, and doesn't bind the compiler to a specific backend.
The file tools/code.cpp can be used as a library which allows representing sections and instructions ... tools/cdemitter.cpp
I've noticed and even recently migrated the cdemitter to my backported version. But currently I'm just generating the text files myself and will likely build some more tools to handle the IR files.
provides automatic register allocation
In that context I actually asked myself why you provide exactly 8 general purpose registers, and not e.g. an arbitrary number of temporaries and let the backend do the allocation and spilling? But apparently you have implemented this in cdemitter; I will take a closer look at this.
RE: Looking for a decently large demo project - Added by Rochus Keller over 1 year ago
PS: the MIL specification is actually in this repository: https://github.com/micron-language/specification
RE: Looking for a decently large demo project - Added by Florian Negele over 1 year ago
In that context I actually asked myself why you provide exactly 8 general purpose registers, and not e.g. an arbitrary number of temporaries and let the backend do the allocation and spilling?
The intermediate code emitter does not spill but reuses registers as soon as they are no longer in use. The corresponding abstraction is called SmartOperand and, by the way, uses move semantics for the automatic acquisition and release of registers. Except for pathologically complex expressions, eight registers have proven to be enough in practice. This is also the limiting number for most hardware, since 32-bit architectures require two hardware registers to represent a 64-bit virtual register, and most of them do not provide more than 16 registers. The AMD64 architecture in 32-bit mode is the only one with fewer registers, which is why the complicated spilling is only done in the code generator of that particular back-end.
The emitter does, however, save registers on the stack when calling functions so that registers are not clobbered during argument evaluation; the corresponding abstraction is called RestoreRegisterState.
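The idea of acquiring and releasing registers through move semantics can be sketched in a few lines of C++. This is only an illustration of the general technique under stated assumptions, not the ECS implementation: RegisterPool and RegisterHandle are hypothetical names, the pool simply hands out the lowest free index of eight registers, and nothing is spilled.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical sketch; these types are not the ECS SmartOperand.
constexpr std::size_t kRegisterCount = 8;
constexpr std::size_t kNoRegister = kRegisterCount;  // marks "owns nothing"

class RegisterPool {
public:
    // Returns the lowest free register index, or kNoRegister if all are taken.
    std::size_t acquire() {
        for (std::size_t i = 0; i != kRegisterCount; ++i)
            if (!used_.test(i)) { used_.set(i); return i; }
        return kNoRegister;
    }
    void release(std::size_t index) {
        assert(index < kRegisterCount && used_.test(index));
        used_.reset(index);
    }
    std::size_t in_use() const { return used_.count(); }
private:
    std::bitset<kRegisterCount> used_;
};

// Owns at most one register; moving transfers ownership, destruction frees it.
class RegisterHandle {
public:
    explicit RegisterHandle(RegisterPool& pool)
        : pool_(&pool), index_(pool.acquire()) {}
    RegisterHandle(RegisterHandle&& other) noexcept
        : pool_(other.pool_), index_(other.index_) {
        other.index_ = kNoRegister;  // the source no longer owns the register
    }
    RegisterHandle& operator=(RegisterHandle&& other) noexcept {
        if (this != &other) {
            reset();  // free the register currently held, if any
            pool_ = other.pool_;
            index_ = other.index_;
            other.index_ = kNoRegister;
        }
        return *this;
    }
    RegisterHandle(const RegisterHandle&) = delete;
    RegisterHandle& operator=(const RegisterHandle&) = delete;
    ~RegisterHandle() { reset(); }

    bool valid() const { return index_ != kNoRegister; }
    std::size_t index() const { return index_; }
private:
    void reset() {
        if (valid()) { pool_->release(index_); index_ = kNoRegister; }
    }
    RegisterPool* pool_;
    std::size_t index_;
};
```

In this sketch a handle that goes out of scope returns its register to the pool at once, so the operand of a finished subexpression can be reused by the next one. Moving a handle hands the register to the result operand without acquiring a second one.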
RE: Looking for a decently large demo project - Added by Rochus Keller over 1 year ago
Ok, thanks. I will therefore have a closer look at your Oberon emitter.
RE: Looking for a decently large demo project - Added by Rochus Keller over 1 year ago
and to create and validate an EBNF grammar of the IR syntax.
Here is the grammar, in case you're interested: https://github.com/rochus-keller/EiGen/blob/master/ir_grammar/IntermediateCode.ebnf
I was able to successfully validate it with all cod files I came across so far. The grammar especially considers the compound instructions and helps me to better understand the IR and what to generate.
RE: Looking for a decently large demo project - Added by Rochus Keller over 1 year ago
PS: btw. the grammar could actually be LL with two LL exceptions, but the "rec" and "func" instructions rely on explicit instruction counts to find the end of the compound, and their bodies cause ambiguities in an LL grammar; but so far in the examples I ran this was no issue. It would be easy to avoid these ambiguities and enable an LL grammar by introducing an "end" instruction, just in case.
RE: Looking for a decently large demo project - Added by Florian Negele over 1 year ago
While I appreciate your effort to increase understanding, I don't think the intermediate code representation lends itself to be represented by a grammar apart from the syntax defined in the manual: A "subsequently" required instruction or type declaration does not necessarily mean "immediately following". A source code location and a type declaration for example can appear in any order and intermixed with other instructions. The same holds for assembly directives which can even appear on any line.
RE: Looking for a decently large demo project - Added by Rochus Keller over 1 year ago
And yet it works. Of all the theoretically possible variants, your compilers always seem to choose the same ones (at least I haven't come across any other variants yet). My grammar may only represent a subset of the possible variants, but it makes it easier to understand and use the IR (at least for me).
RE: Looking for a decently large demo project - Added by Florian Negele over 1 year ago
This may be true for compilers but please bear in mind that users may provide intermediate code as well, for example using SYSTEM.CODE.