[gate-users] bug in ECAT6 IO routines due to aliasing
Kris Thielemans
kris.thielemans at csc.mrc.ac.uk
Fri Sep 10 16:19:24 CEST 2004
Hi all,
after 2 days of work, I found the reason why STIR ECAT6 IO is broken on
Solaris with gcc 3.3 or later. I'm forwarding to some more people, as it's a
bug in other ECAT libraries as well.
The reason is that (in get_vax_float and hostftovaxf) we are using
constructs such as
unsigned short * bufr;
// some stuff with bufr
return *(float *)bufr;
This results in undefined behaviour. With gcc 3.4.1, the above gives
nonsensical results as soon as the optimisation level is 2 or higher. At
first I thought that it was a compiler bug, but it's listed on the gcc page
as a non-bug. They refer to the following page for more info:
http://mail-index.netbsd.org/tech-kern/2003/08/11/0001.html
Here is the most relevant part of that email (slightly edited by me):
-------------
The ISO specification (...) specifies that the result is *undefined* when you
dereference a pointer that points to an object of a different
(incompatible) type.
There are cases where you wish to access the same memory as different
types:
float f = 2.718f;
printf("The memory word has value 0x%08x\n", *((int*)&f));
You cannot do that in ISO C, but gcc has an *extension* in that it
considers memory in unions as having multiple types, so the following
will work in gcc (but is not guaranteed to work in other compilers!)
union {
int i;
float f;
} u;
u.f = 2.718;
printf("The memory word has value 0x%08x\n", u.i);
-------------
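For reference, the union variant from the quoted email could be wrapped in a small function along the following lines. This is only a sketch: the name float_bits_via_union is made up for illustration, and it assumes that float and unsigned int have the same size. As the quoted email says, it relies on gcc's treatment of unions and may not carry over to other compilers.

```c
#include <assert.h>

/* Hypothetical helper (not part of STIR, LLN or xmedcon): return the bit
   pattern of a float by writing one union member and reading the other.
   Assumes sizeof(float) == sizeof(unsigned int). */
static unsigned int float_bits_via_union(float value)
{
    union {
        unsigned int i;
        float f;
    } u;

    assert(sizeof u.i == sizeof u.f);
    u.f = value;   /* store as float */
    return u.i;    /* read back the same bytes as an integer */
}
```

On a machine with IEEE 754 floats, calling this with 1.0f should give 0x3f800000.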
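My own way of looking at this is that a float* can have other alignment
rules than a short*, and that the optimiser is allowed to assume that
pointers to different types never point to the same memory (aliasing). (I
find it rather stupid that ISO C (and ANSI C++) call the result undefined
but still allow you to write it, but fine.)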
The email above seems to suggest the following work-around:
unsigned short * bufr;
// some stuff with bufr
float tmp;
memcpy(&tmp, bufr, sizeof(float));
return tmp;
The reason that this works is that memcpy copies byte by byte, so it has
neither alignment nor aliasing problems.
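Written out as complete functions, the memcpy work-around might look roughly like this. The names float_from_shorts and float_to_shorts are hypothetical and are not the actual get_vax_float or hostftovaxf code; the sketch only shows the copy in both directions and does no VAX format conversion.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: read a float whose bytes are stored at the start
   of a buffer of unsigned shorts, without casting the pointer. */
static float float_from_shorts(const unsigned short *bufr)
{
    float tmp;

    assert(bufr != NULL);
    memcpy(&tmp, bufr, sizeof tmp);   /* byte copy, no float* dereference */
    return tmp;
}

/* Hypothetical helper: the reverse direction, storing the bytes of a
   float into a buffer of unsigned shorts. The buffer must have room for
   sizeof(float) bytes. */
static void float_to_shorts(float value, unsigned short *bufr)
{
    assert(bufr != NULL);
    memcpy(bufr, &value, sizeof value);
}
```

Storing a float with float_to_shorts and reading it back with float_from_shorts should return the same bit pattern at any optimisation level.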
The same error occurs in the LLN matrix library (no surprise, as STIR ECAT6
code is derived from CTI code, and so is the LLN matrix code). xmedcon uses
the union trick apparently, but note that the page above claims this is not
portable.
Worrying of course is that the pointer casting stuff is used all over the
place in the ECAT IO routines (for example, swaw() etc.). This really would
have to be rewritten as well. However, my tests indicate that this is not
necessary yet...
Painful...
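As a rough idea of what such a rewrite could look like, here is a sketch of a word swap that never casts between pointer types. It is not the CTI swaw() code: the name swap_words_in_place and its interface are invented for illustration, and it assumes that the job is to exchange the two 16-bit halves of every 4-byte group.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: exchange the two 2-byte halves of every complete
   4-byte group in a block of memory. The data is only touched through
   unsigned char pointers and memcpy, which may alias any object type. */
static void swap_words_in_place(void *data, size_t nbytes)
{
    unsigned char *p = data;
    size_t i;

    assert(data != NULL);
    for (i = 0; i + 4 <= nbytes; i += 4) {
        unsigned char half[2];

        memcpy(half, p + i, 2);        /* save first half  */
        memcpy(p + i, p + i + 2, 2);   /* second -> first  */
        memcpy(p + i + 2, half, 2);    /* saved  -> second */
    }
}
```

It can be called directly on a float or an int, e.g. swap_words_in_place(&x, sizeof x), so the caller does not need a short* cast either.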
I have attached a small test program that I used to play around. I stripped
the whole thing down as much as possible, and only then checked the gcc
web-site :-(. You will see that on Solaris with gcc 3.4.1:
# this will work
gcc -O -o test-aliasing test-aliasing.c ; ./test-aliasing
# this will not work
gcc -O2 -o test-aliasing test-aliasing.c ; ./test-aliasing
# this will not work in a different way!
gcc -O3 -o test-aliasing test-aliasing.c ; ./test-aliasing
Other things will happen on other systems, e.g. gcc 3.3 on NT (using cygwin)
fails the test only with -O2.
If you add -DCORRECT it should all work.
Thanks to Tim Borgeaud for helping me to track this down.
Kris Thielemans
(kris.thielemans at imperial.ac.uk)
Hammersmith Imanet (formerly IRSL)
Cyclotron Building
Hammersmith Hospital
Du Cane Road
London W12 ONN, United Kingdom
Phone on : +44 (020)8383 3731
FAX on : +44 (020)8383 2029
web site address:
http://www.hammersmithimanet.com/~kris
-------------------------------------------
NOTE: My inbox has a SPAM filter that automatically throws away suspect
messages. If you expect a reply and don't get one, your message might have
been wrongly classified.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test-aliasing.c
Type: application/octet-stream
Size: 924 bytes
Desc: not available
URL: <http://lists.opengatecollaboration.org/mailman/private/gate-users/attachments/20040910/9aa45df6/attachment.obj>