Forums » Programming with the ECS » Oberon »
Optimization opportunity in string routines
Added by Runar Tenfjord about 1 year ago
There is a nice trick which can be used to check for zero byte in a "word" by some clever bit manipulation.
This allow for an impressive optimization of some typical string routines.
Example:
MODULE Test; IMPORT SYSTEM; TYPE WORD = SYSTEM.ADDRESS; WSET = SYSTEM.SET; CONST WORDSIZE = SIZE(WORD); NULLMASK1 = SEL(WORDSIZE = 2, 0101H, SEL(WORDSIZE = 4, 01010101H, 0101010101010101H)); NULLMASK2 = SEL(WORDSIZE = 2, 8080H, SEL(WORDSIZE = 4, 80808080H, 8080808080808080H)); PROCEDURE Length(str-: ARRAY OF CHAR) : LENGTH; VAR word: WORD; adr : SYSTEM.ADDRESS; i: LENGTH; BEGIN i := 0; (* Check block by block *) adr := SYSTEM.ADR(str[0]); LOOP IF (i > (LEN(str) - WORDSIZE)) THEN EXIT END; SYSTEM.GET(adr + i, word); (* Ref.: https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord *) IF (WSET(word - NULLMASK1) * (-WSET(word)) * WSET(NULLMASK2)) # {} THEN EXIT END; INC(i, WORDSIZE) END; (* Find position in last block *) WHILE (i < LEN(str)) & (str[i] # 00X) DO INC(i) END; RETURN i END Length; BEGIN TRACE(Length("testing123")) END Test.
On my computer this yields 2x improvement on short strings (<16 characters) and up to 6x improvement on long strings.
This is attractive for Oberon as strings are not raw pointers and is therefore aligned in memory.
I believe this is simple enough to be not obfuscate the code like many libraries does (try reading the source code of glibc) and
yields an impressive improvement.
Replies (3)
RE: Optimization opportunity in string routines - Added by Florian Negele about 1 year ago
Thank you for the interesting trick. Unfortunately, strings are not aligned in my implementation.
RE: Optimization opportunity in string routines - Added by Runar Tenfjord about 1 year ago
Aha. I need to slightly modify the code then.
(The XDS compiler where I used this code has aligned storage for arrays and records)
The code would then read something like:
MODULE Test; IMPORT SYSTEM; TYPE WORD = SYSTEM.ADDRESS; WSET = SYSTEM.SET; CONST WORDSIZE = SIZE(WORD); NULLMASK1 = SEL(WORDSIZE = 2, 0101H, SEL(WORDSIZE = 4, 01010101H, 0101010101010101H)); NULLMASK2 = SEL(WORDSIZE = 2, 8080H, SEL(WORDSIZE = 4, 80808080H, 8080808080808080H)); PROCEDURE Length* (str-: ARRAY OF CHAR) : LENGTH; VAR word: WORD; adr : SYSTEM.ADDRESS; i: LENGTH; BEGIN i := 0; (* Check block by block *) adr := SYSTEM.ADR(str[0]); (* Ensure we are WORD aligned *) WHILE (i < LEN(str)) & (adr MOD WORDSIZE # 0) DO IF str[i] = 00X THEN RETURN i END; INC(i); END; LOOP IF (i > (LEN(str) - WORDSIZE)) THEN EXIT END; SYSTEM.GET(adr + i, word); (* Ref.: https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord *) IF (WSET(word - NULLMASK1) * (-WSET(word)) * WSET(NULLMASK2)) # {} THEN EXIT END; INC(i, WORDSIZE) END; (* Find position in last block *) WHILE (i < LEN(str)) & (str[i] # 00X) DO INC(i) END; RETURN i END Length; BEGIN TRACE(Length("testing123")) END Test.
This is actually giving better performance for longer strings than before
probably due to the proper aligned access to memory.
The smallest string is slightly slower due to more code to process (1.5x vs 2.0x).
RE: Optimization opportunity in string routines - Added by Runar Tenfjord about 1 year ago
After some research it seems that for a modern desktop CPU the alignment does not matter much
anymore. Many of the string routines can then be easily optimized with this method.
However for ARM 32 bit MCU unaligned access is not supported.
Probably similar for other MCU's.
The line should be:
WHILE (i < LEN(str)) & ((adr + i) MOD WORDSIZE # 0) DO