First try with ARM MCU bare metal

Added by Runar Tenfjord about 1 year ago

Based on the rpi2brun.asm runtime, I tried to generate code for
an ARM Cortex-M0 and run the result on the QEMU ARM emulator
through the ARM semihosting debug interface.

I use the following files.

test.mod:

MODULE Test;

IMPORT SYSTEM;

(** A Boolean value indicating whether the last output operation was successful. *)
VAR Done-: BOOLEAN;

PROCEDURE ^ Putchar ["putchar"] (character: INTEGER): INTEGER;

(** Prints a character value to the standard output stream. *)
PROCEDURE Char (value: CHAR);
BEGIN Done := Putchar (ORD (value)) = ORD (value);
END Char;

(** Prints a string value to the standard output stream. *)
PROCEDURE String (value-: ARRAY OF CHAR);
VAR i: LENGTH; char: CHAR;
BEGIN FOR i := 0 TO LEN (value) - 1 DO char := value[i]; IF char = 0X THEN RETURN END; Char (char) END;
END String;

BEGIN
    String('test123')
END Test.

armt32semihostrun.asm:

.header _header
    .required
    .origin    0x00001000

    ldr    r0, 0x00 ; dummy code

.data ram
    .required
    .origin    0x20000000 ; ram start
    .reserve 0x1000 ; how to define size?

; vector table
.const bootflash
    .required
    .origin    0x0000000
    .qbyte 0x20004000 ; end of ram, start of stack, should be able to calculate this
    .qbyte 0x00001000 ; flash start of code
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle  
    .reserve 0xFC0 ; how to define size easier?

; standard handle
.code handle
loop:
    b loop

; standard stderr variable
.const stderr
    .alignment    4
    .qbyte    0

; standard stdin variable
.const stdin
    .alignment    4
    .qbyte    0

; standard stdout variable
.const stdout
    .alignment    4
    .qbyte    0

; standard abort function
.code abort

    ldr    r0, 0x18
    mov r1, 0x20002
    bkpt 0xab
    bx lr

; standard _Exit function
.code _Exit

    ldr    r0, 0x18
    mov r1, 0x20002
    bkpt 0xab
    bx lr

; standard fclose function
.code fclose

    bx lr

; standard fgetc function
.code fgetc

    ldr    r0, 0x07
    mov    r1, 0
    bkpt 0xab
    bx lr

; standard fputc function
.code fputc

    ldr    r0, 0x03
    mov    r1, sp
    add r1, 4
    bkpt 0xab
    bx lr

; standard putchar function
.code putchar

    ldr    r0, 0x03
    mov    r1, sp
    bkpt 0xab
    ldr    r0, 1
    bx lr

; standard free function
.code free

    bx    lr

; standard malloc function
.code malloc

    ldr    r2, [pc, offset (heap)]
    ldr    r0, [r2, 0]
    ldr    r3, [sp, 0]
    add    r3, r3, r0
    str    r3, [r2, 0]
    bx    lr

heap:    .qbyte    @_heap_start

; heap start
.data _heap_start

    .alignment    4
    .qbyte    0x20000000

; last section
.trailer _trailer

    .required
    .alignment    4

Makefile:

OB := obarmt32
AS := armt32asm
LK := linkmem
QEMU := qemu-system-arm
DEVICE := microbit
ADR := 0x00000000

build: test.rom

test.rom: test.obf armt32semihostrun.obf
    $(LK) $^

test.obf: test.mod
    $(OB) $<

armt32semihostrun.obf: armt32semihostrun.asm
    $(AS) $<

run: test.rom
    $(QEMU) -M $(DEVICE) -semihosting -nographic -device loader,file=$<,addr=$(ADR)

test.elf: test.mod
    llvm-objcopy -I binary -O elf32-littlearm --rename-section=.data=.text,code $< $@

dis: test.elf
    llvm-objdump -d $<

clean:
    rm -f *.sym *.hex *.elf *.obf *.ram *.rom *.map *.lst *.dbg

.PHONY: clean run dis

The build is OK. When I run the generated .rom file in QEMU,
the stack pointer and PC seem to be correct, confirming that
it was loaded correctly, but it ends with a hard fault:

qemu-system-arm -M microbit -semihosting -nographic -device loader,file=test.rom,addr=0x00000000
qemu: fatal: Lockup: can't escalate 3 to HardFault (current priority -1)

R00=00000000 R01=00000000 R02=00000000 R03=00000000
R04=00000000 R05=00000000 R06=00000000 R07=00000000
R08=00000000 R09=00000000 R10=00000000 R11=00000000
R12=00000000 R13=20003fe0 R14=fffffff9 R15=000010fc
XPSR=40000003 -Z-- A handler
FPSCR: 00000000

The .map file seems fine, with the code placed after 0x1000 and
the variables placed in RAM.

I could not find out how to disassemble the raw binary and therefore
used llvm for this.

Looking at the disassembled binary file, the code looks messed up.
Feeding the hex dump to an online disassembler suggests that it is
Thumb code encoded as big endian, where it should perhaps be
little endian.

Is this an endianness issue, or am I doing something else wrong here?


Replies (17)

RE: First try with ARM MCU bare metal - Added by Runar Tenfjord about 1 year ago

The disassembler line should read:

test.elf: test.rom
    llvm-objcopy -I binary -O elf32-littlearm --rename-section=.data=.text,code $< $@

RE: First try with ARM MCU bare metal - Added by Florian Negele about 1 year ago

> Is this an endianness issue, or am I doing something else wrong here?

I do not know the details of the emulated machine and its boot process so I cannot really help here. While it seems that the stack pointer is initialised correctly, putting a loop: b loop at the start of the _header section does not change the behaviour, so it seems that there is something executed prior to that or there is indeed some encoding issue causing possibly nested interrupts.

> I could not find out how to disassemble the raw binary and therefore used llvm for this.

For disassembling, you can create an object file from a raw file by embedding it in an assembly file using the .embed "test.rom" directive.

> .reserve 0x1000 ; how to define size?

There is no need to define the size, as the linker places all data sections after the fixed ram section. The _trailer section is empty and is placed behind all other data sections, so its address can be used to compute the size of the required memory.

> .reserve 0xFC0 ; how to define size easier?

Use the .pad 0x1000 directive to pad the section up to the required offset.

RE: First try with ARM MCU bare metal - Added by Runar Tenfjord about 1 year ago

A reduced test case which works with GCC.

test.asm:

.cpu cortex-m0
.thumb

.thumb_func
.global _start
_start:
stacktop: .word 0x20004000
.word run
.word handle
.word handle
.word handle
.word handle
.word handle
.word handle
.word handle
.word handle
.word handle
.word handle
.word handle
.word handle
.word handle
.word handle

.thumb_func
run:
    ldr    r0,=0x04
    ldr r1,=string
    bkpt 0xab
    ldr    r0,=0x18
    ldr r1,=0x20026
    bkpt 0xab
loop:
    b loop

.thumb_func
handle:   b .

.section .rodata
    .align  2
string: .ascii "testing123\0" 

.end

Makefile:

ARM := /c/GNUArmEmbeddedToolchain/bin/arm-none-eabi
AFLAGS := --warn --fatal-warnings -mcpu=cortex-m0
QEMU := qemu-system-arm
DEVICE := microbit
ADR := 0x00000000

build: test.rom

test.rom: test.o test.ld
    $(ARM)-ld -o test.elf -T test.ld test.o
    $(ARM)-objdump -D test.elf > test.lst
    $(ARM)-objcopy test.elf test.rom -O binary

test.o: test.asm
    $(ARM)-as $(AFLAGS) test.asm -o test.o

run: test.rom
    $(QEMU) -M $(DEVICE) -semihosting -nographic -device loader,file=$<,addr=$(ADR)

clean:
    rm -f *.o *.elf *.lst *.rom

.PHONY: clean run

This prints the string 'testing123' and exits with an exit code of 0.

RE: First try with ARM MCU bare metal - Added by Runar Tenfjord about 1 year ago

The same test case with ECS.

test.asm:

.header _header
    .required
    .origin    0x00000100
    ldr    r0, [pc, offset (SYS_WRITE0)]
    ldr    r1, [pc, offset (string)]
    bkpt 0xab
    ldr    r0, [pc, offset (angel_SWIreason_ReportException)]
    ldr    r1, [pc, offset (ADP_Stopped_ApplicationExit)]
    bkpt 0xab
j:
    b j
SYS_WRITE0: .qbyte    @_SYS_WRITE0
string: .qbyte    @_string
angel_SWIreason_ReportException: .qbyte @_angel_SWIreason_ReportException
ADP_Stopped_ApplicationExit: .qbyte @_ADP_Stopped_ApplicationExit

.const _SYS_WRITE0
    .alignment    4
    .qbyte 0x04

.const _angel_SWIreason_ReportException
    .alignment    4
    .qbyte 0x18

.const _ADP_Stopped_ApplicationExit
    .alignment    4
    .qbyte 0x20026

.const _string
    .alignment    4
    .byte "testing123", 0

.data ram
    .required
    .origin    0x20000000 ; ram start
    .reserve 0x1000

; vector table
.const bootflash
    .required
    .origin    0x0000000
    .qbyte 0x20001000 ; end of ram, start of stack, should be able to calculate this
    .qbyte 0x00000100 ; flash start of code
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle
    .qbyte @handle

; standard handle
.code handle
loop:
    b loop

; main
.code main
loop:
    b loop

; last section
.trailer _trailer

    .required
    .alignment    4

Makefile:

OB := obarmt32
AS := armt32asm
DAS := armt32dism
LK := linkmem
QEMU := qemu-system-arm
DEVICE := microbit # m0
ADR := 0x00000000

build: test.rom

test.rom: test.obf
    $(LK) $^

test.obf: test.asm
    $(AS) $<

run: test.rom
    $(QEMU) -M $(DEVICE) -semihosting -nographic -device loader,file=$<,addr=$(ADR)

dis.obf: dis.asm test.rom
    $(AS) $<

dis: dis.obf
    $(DAS) $<

clean:
    rm -f *.sym *.hex *.elf *.obf *.ram *.rom *.map *.lst *.dbg

.PHONY: clean run dis

This fails with:

qemu-system-arm -M microbit  -semihosting -nographic -device loader,file=test.rom,addr=0x00000000
qemu: fatal: Lockup: can't escalate 3 to HardFault (current priority -1)

R00=00000000 R01=00000000 R02=00000000 R03=00000000
R04=00000000 R05=00000000 R06=00000000 R07=00000000
R08=00000000 R09=00000000 R10=00000000 R11=00000000
R12=00000000 R13=20000fe0 R14=fffffff9 R15=00000042
XPSR=40000003 -Z-- A handler
FPSCR: 00000000

Looking at the disassembly, it seems ECS emits the ldr.w instruction,
which is a 32-bit instruction, whereas GCC generates the normal ldr instruction.

According to Wikipedia, the Cortex-M0 supports:
  • Thumb-1 (most), missing CBZ, CBNZ, IT
  • Thumb-2 (some), only BL, DMB, DSB, ISB, MRS, MSR

I tried to change the CPU to a Cortex-M4, which should support all of
Thumb-1 and Thumb-2, but it still failed.

So the cause is probably some illegal instruction or unaligned access.
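
One way to check the ldr.w suspicion without reading a full disassembly is to scan the code bytes for 32-bit Thumb encodings. The Python sketch below is only an illustration: the function name find_wide_thumb and the sample bytes are hypothetical, and it assumes the input contains instructions only. Literal pools and the vector table would be misread as code, so the range has to be restricted by hand. It relies on the Thumb rule that a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction.

```python
import struct

def find_wide_thumb(code: bytes, base: int = 0):
    """Return the addresses of halfwords that start a 32-bit Thumb encoding."""
    wide = []
    offset = 0
    while offset + 2 <= len(code):
        # Thumb instructions are stored as little-endian halfwords.
        (halfword,) = struct.unpack_from("<H", code, offset)
        if halfword >> 11 in (0b11101, 0b11110, 0b11111):
            wide.append(base + offset)
            offset += 4  # skip the second halfword of the wide instruction
        else:
            offset += 2
    return wide

# Hypothetical sample: nop (16-bit), ldr.w r0, [pc, 4] (32-bit), bkpt 0xab (16-bit)
sample = bytes.fromhex("00bf" "dff80400" "abbe")
print([hex(a) for a in find_wide_thumb(sample, base=0x100)])  # ['0x102']
```

A hit is not automatically an error on a Cortex-M0, since BL, DMB, DSB, ISB, MRS and MSR are 32-bit encodings it does support. Each reported address still has to be checked against the disassembly.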

RE: First try with ARM MCU bare metal - Added by Florian Negele about 1 year ago

Is there some documentation about the specifics of the boot process and memory map of the emulated MCU?

RE: First try with ARM MCU bare metal - Added by Runar Tenfjord about 1 year ago

QEMU info page:

The chip emulated is the Nordic nRF51822, which has a
Cortex-M0 core with peripherals added by Nordic:

I believe the real chip ships with firmware that is not included
in the QEMU model, which starts out blank.

Cortex-M0 Technical Reference Manual:

Armv6-M Architecture Reference Manual:

RE: First try with ARM MCU bare metal - Added by Florian Negele about 1 year ago

Thanks for the information. Please note that the compiler does not support 16-bit only instructions at the moment. In order to make sure that the assembler uses only 16-bit instructions, use the .n suffix. Also, the boot vector seems to need odd target addresses to select the T32 instruction set. The rom file contains code sections while the .ram file contains all other sections which is why the bootflash memory needs to be marked as code.
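
The remark about odd target addresses can be checked on a ROM image before it is loaded. The following Python sketch is an illustration under assumptions: the function name check_vector_table, the 16-entry table size (taken from the listings in this thread) and the handler address 0x1100 are all hypothetical. It reads the little-endian words at the start of the image and reports the entries whose bit 0 is clear.

```python
import struct

def check_vector_table(image: bytes, entries: int = 16):
    """Return (initial_sp, indices of vectors whose Thumb bit is clear)."""
    words = struct.unpack_from("<%dI" % entries, image, 0)
    initial_sp = words[0]  # word 0 is the initial stack pointer, not a vector
    even = [i for i, w in enumerate(words[1:], start=1) if (w & 1) == 0]
    return initial_sp, even

# Hypothetical image resembling the failing table: reset vector 0x00001000 is even.
bad = struct.pack("<16I", 0x20004000, 0x00001000, *([0x00001100] * 14))
sp, even = check_vector_table(bad)
print(hex(sp), even)
```

Reserved entries in a real table may legitimately be zero, so an even entry is only suspicious for vectors that can actually be taken, such as reset and HardFault.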

RE: First try with ARM MCU bare metal - Added by Florian Negele about 1 year ago

The following code seems to work but requires the attached patch to compile:

.code vector
    .required
    .origin 0x00000000

    .qbyte    0x20004000
    .qbyte @main + 1

    #repeat 15
        .qbyte    @handle + 1
    #endrep

.code main
    .alignment    4

    ldr.n    r0, offset (SYS_WRITE0) + offset (SYS_WRITE0) % 4
    ldr.n    r1, offset (text) + offset (text) % 4
    bkpt.n    0xab

    ldr.n    r0, offset (SYS_EXIT) + offset (SYS_EXIT) % 4
    ldr.n    r1, offset (ADP_Stopped_ApplicationExit) + offset (ADP_Stopped_ApplicationExit) % 4
    bkpt.n    0xab

    .align    4
SYS_WRITE0:    .qbyte    0x04
SYS_EXIT:    .qbyte    0x18
ADP_Stopped_ApplicationExit: .qbyte    0x20026
text:    .qbyte    @text

.code text
    .byte    "hello\n", 0

.code handle
loop:    b    loop
Attachment: ldr.patch (866 Bytes)

RE: First try with ARM MCU bare metal - Added by Runar Tenfjord about 1 year ago

Excellent. One step further. Now it is working with the armt32 assembler.
Apparently the BX instruction and the vectors use bit 0 to indicate the Thumb instruction set.

Based on this, I updated the test code for Oberon to use the Cortex-M4 CPU,
which supports Thumb-1 and Thumb-2, as I understand that the Oberon
compiler creates mixed code.

I tested the assembler code with the Cortex-M4 CPU and it works as expected.
I believe all Cortex-M types are backward compatible with Thumb-1 code.

armt32semihostrun.asm:

; Only works on Cortex-M profile ARMv6-M and ARMv7-M
; Uses only Thumb1 16bit instructions to support Cortex-M0/M0+
; Flash/memory origin and size must be changed to values for target device.
.code vector
    .required
    .origin 0x00000000 ; flash

    .qbyte 0x20004000 ; stack = ram top
    .qbyte @start + 1 ; +1 for Thumb flag

    #repeat 15
        .qbyte    @handle + 1 ; +1 for Thumb flag
    #endrep

.code handle
    .alignment    4

    loop:    b.n    loop

.code start
    .alignment    4

    nop.n

; last section
.trailer _trailer

.data ram
    .required
    .origin    0x20000000 ; ram start

; standard abort function
.code abort
    .alignment    4

    ldr.n    r0, offset (SYS_EXIT) + offset (SYS_EXIT) % 4
    ldr.n    r1, offset (ADP_Stopped_ApplicationExit) + offset (ADP_Stopped_ApplicationExit) % 4
    bkpt.n   0xab
loop:
    b.n    loop

    .align    4
SYS_EXIT:    .qbyte    0x18
ADP_Stopped_ApplicationExit: .qbyte    0x20026

; standard _Exit function
.code _Exit
    .alignment    4

    bl       @abort

; standard getchar function
.code getchar
    .alignment    4

    ldr.n    r0, offset (SYS_READC) + offset (SYS_READC) % 4
    mov      r1, 0x00
    bkpt.n   0xab
    bx.n     lr

    .align    4
SYS_READC:    .qbyte    0x07

; standard free function
.code free
    .alignment    4

    bx.n    lr

; standard malloc function
.code malloc
    .alignment    4

    ldr.n   r2, offset (heap) + offset (heap) % 4
    ldr.n    r0, [r2, 0]
    ldr.n    r3, [sp, 0]
    add.n    r3, r3, r0
    str.n    r3, [r2, 0]
    bx.n    lr

heap:    .qbyte    @_heap_start

; heap start
.data _heap_start

    .alignment    4
    .qbyte    0x20000000 ; ram start

; standard putchar function
.code putchar
    .alignment    4

    ldr.n    r0, offset (SYS_WRITEC) + offset (SYS_WRITEC) % 4
    ldr.n     r1, [sp, 0]
    bkpt.n   0xab
    mov      r0, 0x01
    bx.n     lr

    .align    4
SYS_WRITEC:    .qbyte    0x03

test.mod:

MODULE Test;

IMPORT SYSTEM;

VAR Done-: BOOLEAN;

PROCEDURE ^ Putchar ["putchar"] (character: INTEGER): INTEGER;

BEGIN
    Done := Putchar (ORD ('x')) = ORD ('x');
END Test.

Makefile:

OB := obarmt32
AS := armt32asm
DAS := armt32dism
LK := linkmem
QEMU := qemu-system-arm
DEVICE := mps2-an386 # cortex-m4
ADR := 0x00000000

build: test.rom

test.rom: test.obf armt32semihostrun.obf
    $(LK) $^

test.obf: test.mod
    $(OB) $<

armt32semihostrun.obf: armt32semihostrun.asm
    $(AS) $<

run: test.rom
    $(QEMU) -M $(DEVICE) -semihosting -nographic -device loader,file=$<,addr=$(ADR)

dis.obf: dis.asm test.rom
    $(AS) $<

dis: dis.obf
    $(DAS) $<

clean:
    rm -f *.sym *.hex *.elf *.obf *.ram *.rom *.map *.lst *.dbg

.PHONY: clean run dis

It runs, but without the expected output.
Is this perhaps due to the BX instruction being given an even address?

RE: First try with ARM MCU bare metal - Added by Florian Negele about 1 year ago

By jumping to start, your boot vector skips the module initialisation and flows off into abort, see the generated map file. Start at the end of the vector instead:

.qbyte    extent (@vector) + 1

The SYS_WRITEC operation requires a pointer to the character rather than the character itself:

mov    r1, sp

RE: First try with ARM MCU bare metal - Added by Runar Tenfjord about 1 year ago

Thank you very much for your time fixing this.

I guess now Oberon code should just work with
NEW, DISPOSE and TRACE covered by the runtime code.

There is an error in the putchar code.
It should return the character printed:

    ldr.n      r0, offset (SYS_WRITEC) + offset (SYS_WRITEC) % 4
    mov      r1, sp
    bkpt.n   0xab
    ldr.n     r0, [r1]
    bx.n     lr

I believe this runtime should now work with a large number of MCUs.

I will test this code with real hardware when I get back to the
office. The nice thing here is that the semihost interface works
through the programming probe with gdb or similar tools.
There are many more features in this semihost interface, but
I have not needed anything other than printing.

This also makes it possible to do automatic testing through
QEMU, as the printed output can be redirected and checked on
the host.
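
Such a host-side check could look roughly like the following Python sketch. It is only an outline under assumptions: the function names qemu_command and run_and_check are hypothetical, the QEMU options are the ones from the Makefiles in this thread, and the default machine is the one from the latest Makefile. Which host stream the semihosting console ends up on may depend on the QEMU configuration, so the sketch merges stdout and stderr before comparing.

```python
import subprocess

def qemu_command(rom, machine="mps2-an386", addr="0x00000000",
                 qemu="qemu-system-arm"):
    """Build the same QEMU command line as the 'run' target in the Makefile."""
    return [qemu, "-M", machine, "-semihosting", "-nographic",
            "-device", "loader,file=%s,addr=%s" % (rom, addr)]

def run_and_check(rom, expected, timeout=10.0, **options):
    """Run the ROM in QEMU and report whether the expected text was printed."""
    try:
        result = subprocess.run(qemu_command(rom, **options),
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,  # merge both streams
                                text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False  # the program never exited through semihosting
    return expected in result.stdout
```

A test driver would then call something like run_and_check("test.rom", "x") for each test image. The timeout matters because a ROM that ends in an endless loop instead of a semihosting exit would otherwise block the whole run.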

It would also be nice to have a way to run code in RAM only
for testing, to avoid wearing out the flash. I think that
is possible by first flashing a ROM which redirects start to
RAM and then having a second "ROM" which is placed in RAM.

The only thing I can think of that could be a problem is interrupt
handlers in Oberon. As these are run directly by the hardware,
the procedures cannot operate on the stack. We would then need
some kind of "naked" functions similar to LLVM/GCC.

RE: First try with ARM MCU bare metal - Added by Florian Negele about 1 year ago

Thanks for the fix. For code running in RAM only, you can use the ordinary binary file linker and have an empty vector at the desired target address. For "naked procedures", you could define assembler stubs that set up a fake stack in order to call Oberon procedures. The stubs need to store and restore registers and return from the interrupt with special instructions anyway.

RE: First try with ARM MCU bare metal - Added by Runar Tenfjord about 1 year ago

I have now updated the runtime and moved the semihost functionality to
an Oberon module which replaces the dummy routines in the runtime at
link time. I think this is a cleaner solution as it keeps the runtime general.

I was also able to use the interrupts from an Oberon module.
It turns out that on the ARMv7-M platform the stack is set up in hardware
when an ISR is called, so there is no issue.

Attached is some test code with this functionality which
implements an SVC call trap and prints the information
pushed to the stack at the call location.

The only thing is that I am not able to find the SVC value in the
XPSR register. Otherwise it is working as expected.

I see that the language defined traps are using a HLT instruction.
I do not think I am able to catch this in the runtime. Perhaps
this should be a replaceable function in the runtime?

Ideally I would like to use an SVC call with R0 set to the
trap number. Then I could handle this in the runtime and, when in
deployed mode, be able to hard reset the MCU.

Also attached is the general runtime armv7mrun.asm.

Is there perhaps a way to reuse this in the stm32f4run.asm code
and just replace the empty ISR block in armv7mrun.asm?

RE: First try with ARM MCU bare metal - Added by Runar Tenfjord about 1 year ago

After testing directly on hardware I found some issues:

  • QEMU does not complain about non-aligned access to memory, but the MCU triggers
    a hard fault trap. It turns out the malloc code in the runtime needs to be updated here.
    The free list allocator also needs to be updated.
  • Semihosting is a really expensive operation. Printing character by character is really
    slow, and I had to implement some line buffering.
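
The line buffering idea can be sketched as follows. This is not the code from the attached runtime, only a Python illustration with hypothetical names (LineBuffer, write_line, capacity). Characters are collected and handed over in one piece when a newline arrives or the buffer is full. On the target, the hand-over would be a single string write such as SYS_WRITE0 (0x04) instead of one SYS_WRITEC (0x03) call per character.

```python
class LineBuffer:
    """Collect characters and emit them one line (or one full buffer) at a time."""

    def __init__(self, write_line, capacity=80):
        self._write_line = write_line  # stand-in for the expensive host call
        self._capacity = capacity
        self._chars = []

    def put(self, char):
        self._chars.append(char)
        if char == "\n" or len(self._chars) >= self._capacity:
            self.flush()

    def flush(self):
        if self._chars:
            self._write_line("".join(self._chars))
            self._chars = []

calls = []
out = LineBuffer(calls.append)
for ch in "test123\n":
    out.put(ch)
print(len(calls))  # 1 host call instead of 8
```

Any remaining partial line has to be flushed explicitly before the program exits, otherwise the last output is lost.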

Attached is the updated runtime code with a fix for aligned memory access and clearing
of RAM on startup.

The alignment issue on the free list allocator was a one line fix:

- Heap := ADR(heapStart);
+ Heap := ADDRESS(SET32(ADR(heapStart) + 3) * (-SET32(3))); (* round up address to next word *)
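
The set expression in the fixed line is an intersection with the complement of {0, 1}, which is the usual add-then-mask idiom for rounding up to a word boundary. A small Python sketch of the same arithmetic, with a hypothetical helper name align_up and example addresses chosen for illustration:

```python
def align_up(address, alignment=4):
    """Round address up to the next multiple of a power-of-two alignment."""
    mask = alignment - 1
    if alignment <= 0 or alignment & mask:
        raise ValueError("alignment must be a power of two")
    # Adding the mask steps past the boundary unless already aligned,
    # clearing the low bits then lands exactly on the boundary.
    return (address + mask) & ~mask

print(hex(align_up(0x20000001)))  # 0x20000004
print(hex(align_up(0x20000004)))  # 0x20000004, already aligned
```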

Now I am able to run about 600 tests directly on hardware as well, without any issue.

There are some failures related to REAL support, but I will look at these at a later
stage (emulated, non-FPU) as it is not needed now.

RE: First try with ARM MCU bare metal - Added by Runar Tenfjord 7 months ago

Regarding using GDB with the runtime.

In most examples I find online, the ELF format is used in order
to supply extra information to GDB. I do not think I can use the DWARF
information without an ELF file. At least I could not find information on this.

Is it possible to create this with the linkbin command for a
runtime with fixed memory locations, e.g. a bare metal target?

I tried to convert my runtime following armt32linuxrun.asm, adding the
ELF header information from this file excluding the DLL sections, but just ended
up with a very large binary file of 800 MB.

I followed the link instructions at: https://ecs.openbrace.org/manual/manualse13.html#x19-370003.6

RE: First try with ARM MCU bare metal - Added by Florian Negele 7 months ago

The debugging information linked together with a program currently depends on the linker to resolve the addresses of debugging symbols. Getting the address of a symbol from an external source like the map file is not supported at the moment, sorry.

RE: First try with ARM MCU bare metal - Added by Runar Tenfjord 6 months ago

Thanks for the quick reply and clarification on this subject.
