C Standard and GCC strike back

I promised to send a nice Trident screenshot as soon as I managed to get it working. Here it is: Trident preferences in all their glory, with listed USB devices and bits of the log messages. Enjoy!

But wait, what does the title mean? Well, it just means we have a problem which has to be addressed. The good news is that x86 is not affected; the bad news is that both ARM and M68K might suffer. To explain the problem, let's have a look at the reason for the crash in Trident. There is a very tiny bit of code there:

chunkid = AROS_LONG2BE(currptr[0]);
chunklen = (AROS_LONG2BE(currptr[1]) + 9) & ~1UL;

It does look good, doesn't it? currptr is a pointer to ULONG, and here we fetch the ID of the chunk and its size. So far so good. If you remember that chunks in IFF files are aligned on a 2-byte boundary, you may ask whether there is an alignment issue. But wait, the CPU in the RasPi is not an ancient pre-ARMv6 core without support for unaligned accesses, so there should be no issue. You can check the AROS bootstrap source to confirm that unaligned access is enabled. You can even try reading each field separately to confirm that everything is fine.

But the code above crashes all of a sudden and does not allow Trident to start. What happens here?

In the code we fetch two consecutive ULONG values from an array or memory region. They are stored in local variables which are used a few lines later, which means the compiler has allocated a register for each of them. And here it comes, this is the disassembled part of the code:

ldm     r5, {r3, r4}
cmp     r3, r8
movne   r2, #0, 0
andeq   r2, sl, #1, 0
add     r4, r4, #9, 0

Look only at the first instruction, the rest is not important here. The compiler decided to optimise the code and fetches both chunkid and chunklen at once, using one instruction. The variables are assigned as follows: currptr in r5, chunkid in r3, chunklen in r4. The ldm opcode loads multiple consecutive words from memory into the registers listed in curly braces. Together with its counterpart stm, it is heavily used on ARM for storing/restoring register contents on function entry/exit or to implement a more efficient memory copy function. It requires, however, the memory address to be aligned to at least a 4-byte boundary even if unaligned memory accesses are enabled. IFF says chunks are aligned to 2 bytes. See the problem?

Why does gcc do that? Because it can, the C standard allows it to do so (Source: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf, Section 6.2.8 Alignment of objects):

Complete object types have alignment requirements which place restrictions on the addresses at which objects of that type may be allocated. An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated.

GCC may therefore assume that a pointer to ULONG is aligned at least to the alignment requirement of ULONG, which on ARM equals sizeof(ULONG), i.e. a 4-byte boundary, or better.
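To make the alignment assumption concrete, here is a minimal sketch in plain C. The names is_aligned and load_u32_unaligned are hypothetical and not part of AROS, and uint32_t stands in for ULONG so the snippet compiles on its own. It shows how to check whether an address satisfies a given alignment, and one way to read 32 bits (in host byte order) without promising the compiler anything about alignment:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if p is a multiple of `align` bytes. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Hypothetical helper: reads 32 bits in host byte order from any address.
 * memcpy takes plain byte pointers, so the compiler may not assume that
 * p is 4-byte aligned and has to emit a load that is safe either way. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

An IFF chunk starting at offset 2 of a 4-byte aligned buffer passes is_aligned(ptr, 2) but fails is_aligned(ptr, 4), which is exactly the case where dereferencing it as a ULONG pointer gives the compiler licence to emit something like ldm.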

What to do now? Well, the way data is fetched from IFF files has to be changed. Globally, in the entire AROS. One possibility is to write a helper function like this:

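static inline ULONG GetLONGBE(void *address)
{
   UBYTE *ptr = (UBYTE*)address;
   ULONG retval = 0;

   /* Cast each byte to ULONG before shifting, so that ptr[0] << 24 cannot overflow a signed int */
   retval = ((ULONG)ptr[0] << 24) | ((ULONG)ptr[1] << 16) | ((ULONG)ptr[2] << 8) | (ULONG)ptr[3];

   return retval;
}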

Here, the data is passed in through a void * pointer and read byte by byte through UBYTE *, which has no alignment requirement. On big-endian machines this function can be translated to a single CPU instruction fetching the data in a safe, non-crashing way. On LE machines it can be compiled to either one or two CPU instructions:

movl (%rdi), %eax
bswapl %eax

or, if compiled for an Intel CPU supporting movbe:

movbel (%rdi), %eax
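Applied to the two lines from Trident, the helper could be used roughly as follows. This is only a sketch: struct ChunkHeader and ReadChunkHeader are hypothetical names, the typedefs stand in for the AROS types so the snippet compiles on its own, and the helper is a const-qualified variant of the one above. The length expression is the one from the original code; arithmetically it yields the 8-byte chunk header plus the data length rounded up to an even number.

```c
#include <stdint.h>

/* Stand-ins for the AROS types, assumed here for a standalone build. */
typedef uint32_t ULONG;
typedef uint8_t  UBYTE;

/* Byte-wise big-endian load, safe for any address. */
static inline ULONG GetLONGBE(const void *address)
{
    const UBYTE *ptr = (const UBYTE *)address;

    return ((ULONG)ptr[0] << 24) | ((ULONG)ptr[1] << 16) |
           ((ULONG)ptr[2] << 8)  |  (ULONG)ptr[3];
}

/* Hypothetical container for the two values read in Trident. */
struct ChunkHeader
{
    ULONG id;   /* chunk ID, e.g. 'N','A','M','E' packed big-endian */
    ULONG len;  /* (data length + 9) & ~1, as in the original code */
};

/* Hypothetical reader: chunk may sit on any 2-byte boundary. */
static struct ChunkHeader ReadChunkHeader(const UBYTE *chunk)
{
    struct ChunkHeader h;

    h.id  = GetLONGBE(chunk);
    h.len = (GetLONGBE(chunk + 4) + 9) & ~(ULONG)1;

    return h;
}
```

For a chunk with 5 bytes of data this gives (5 + 9) & ~1 = 14, that is 8 bytes of header plus 6 bytes of padded data. Because the bytes are read one at a time, the compiler has no grounds to merge the two loads into a single ldm.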

So, my friends, time to fix almost the entire AROS now :) Either by systematically checking file by file or by trying to start program after program and waiting for a crash.
