2.6.2-rc2 sha256.c is miscompiled with gcc 3.2.3.
(User space is RedHat 3ES.)
Just an observation:
gcc 3.2.3 is miscompiling sha256.c when using
-O2 -march=pentium3
or pentium4.
gcc 3.3.x is ok, or the problem disappears
if I use arch=i686 or reduce the optimisation level to -O2.
Sympthoms are all the sha256 test vectors fail.
If I extract the guts of the file to compile in user space
the same problem occurs.
On Fri, 30 Jan 2004, R CHAN wrote:
> 2.6.2-rc2 sha256.c is miscompiled with gcc 3.2.3.
> (User space is RedHat 3ES.)
>
> Just an observation:
>
> gcc 3.2.3 is miscompiling sha256.c when using
> -O2 -march=pentium3
> or pentium4.
>
> gcc 3.3.x is ok, or the problem disappears
> if I use arch=i686 or reduce the optimisation level to -O2.
>
> Sympthoms are all the sha256 test vectors fail.
> If I extract the guts of the file to compile in user space
> the same problem occurs.
Have you noticed if this happens for any of the other crypto algorithms?
- James
--
James Morris
<[email protected]>
Has this been demonstrated on *any* system/arch using GCC 3.2.3 (or other
3.2 series) or is it limited in scope to the description below? Does it
seem to do this with other implementations (other than sha256.c) or other
kernels? Just trying to get an idea if this is a complier optimization bug
or something much more limited in scope.
My personal lab is currently being unboxed and I won't be able to run my own
tests for another week or so. (apologies in advance)
In any case, this is *extremely* serious from a number of angles.
Kind Regards,
-dsp
-----Original Message-----
From: [email protected]
[mailto:[email protected]]On Behalf Of James Morris
Sent: Friday, January 30, 2004 9:40 AM
To: R CHAN
Cc: [email protected]; David S. Miller
Subject: Re: [CRYPTO]: Miscompiling sha256.c by gcc 3.2.3 and arch
pentium3,4
On Fri, 30 Jan 2004, R CHAN wrote:
> 2.6.2-rc2 sha256.c is miscompiled with gcc 3.2.3.
> (User space is RedHat 3ES.)
>
> Just an observation:
>
> gcc 3.2.3 is miscompiling sha256.c when using
> -O2 -march=pentium3
> or pentium4.
>
> gcc 3.3.x is ok, or the problem disappears
> if I use arch=i686 or reduce the optimisation level to -O2.
>
> Sympthoms are all the sha256 test vectors fail.
> If I extract the guts of the file to compile in user space
> the same problem occurs.
Have you noticed if this happens for any of the other crypto algorithms?
- James
--
James Morris
<[email protected]>
On Fri, Jan 30, 2004 at 10:04:00AM -0500, Dave Paris wrote:
> Has this been demonstrated on *any* system/arch using GCC 3.2.3 (or other
> 3.2 series) or is it limited in scope to the description below? Does it
> seem to do this with other implementations (other than sha256.c) or other
> kernels? Just trying to get an idea if this is a complier optimization bug
> or something much more limited in scope.
>
> My personal lab is currently being unboxed and I won't be able to run my own
> tests for another week or so. (apologies in advance)
>
> In any case, this is *extremely* serious from a number of angles.
GCC handling of automatic const variables with initializers is very fragile.
There have been numerous bugs in it in the past and last one has been fixed
just yesterday (on GCC trunk only for the time being). It will be
eventually backported once it gets some more testing on GCC mainline.
The problematic line in sha256.c is:
static void sha256_final(void* ctx, u8 *out)
{
...
const u8 padding[64] = { 0x80, };
where if you are unlucky, scheduler will with various different GCC versions
on various architectures reorder instructions so that store of 0x80 into the
struct is before clearing of the 64 bytes.
On the other side, doing this in sha256.c seems quite inefficient, IMHO much
better would be to have there static u8 padding[64] = { 0x80 };
because that will not mean clearing 64 bytes and writing another one on top
of it every time the routine is executed.
Jakub
I've narrowed the problem - it seems to be a specific optimisation bug with
gcc-3.2.3-24 that comes with RedHat 3ES. The problem is not kernel
related (sorry!)
as the guts of the file exhibits the same problem when compiled in user
space.
(All the non-sha256 test vectors in tcrypt.ko pass BTW.)
The older gcc32 that comes with Fedora Cora 1 and the gcc33 there get it
right.
For the actual files please check bug #114610 in
https://bugzilla.redhat.com.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=114610
- the bug report includes sha256.c, tcrypt.c, and types.h necessary to
demonstrate
the problem in user space.
Dave Paris wrote:
>Has this been demonstrated on *any* system/arch using GCC 3.2.3 (or other
>3.2 series) or is it limited in scope to the description below? Does it
>seem to do this with other implementations (other than sha256.c) or other
>kernels? Just trying to get an idea if this is a complier optimization bug
>or something much more limited in scope.
>
>My personal lab is currently being unboxed and I won't be able to run my own
>tests for another week or so. (apologies in advance)
>
>In any case, this is *extremely* serious from a number of angles.
>
>Kind Regards,
>-dsp
>
>-----Original Message-----
>From: [email protected]
>[mailto:[email protected]]On Behalf Of James Morris
>Sent: Friday, January 30, 2004 9:40 AM
>To: R CHAN
>Cc: [email protected]; David S. Miller
>Subject: Re: [CRYPTO]: Miscompiling sha256.c by gcc 3.2.3 and arch
>pentium3,4
>
>
>On Fri, 30 Jan 2004, R CHAN wrote:
>
>
>
>>2.6.2-rc2 sha256.c is miscompiled with gcc 3.2.3.
>>(User space is RedHat 3ES.)
>>
>>Just an observation:
>>
>>gcc 3.2.3 is miscompiling sha256.c when using
>>-O2 -march=pentium3
>>or pentium4.
>>
>>gcc 3.3.x is ok, or the problem disappears
>>if I use arch=i686 or reduce the optimisation level to -O2.
>>
>>Sympthoms are all the sha256 test vectors fail.
>>If I extract the guts of the file to compile in user space
>>the same problem occurs.
>>
>>
>
>Have you noticed if this happens for any of the other crypto algorithms?
>
>
>
>- James
>--
>James Morris
><[email protected]>
>
>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/
>
>
>
>
>
>
James Morris wrote:
> Have you noticed if this happens for any of the other crypto algorithms?
Just as a reminder, there is still an issue with extreme stack usage
of some of the algorithms, depending on compiler version and
flags.
The worst I have seen was around 16kb for twofish_setkey on 64 bit
s390 with gcc-3.1 (iirc). Right now, I get up to 4kb for this
function with gcc-3.3.1, which probably works but is definitely
a bad sign. I've seen this as well on other architectures (iirc
on x86_64), but not as severe.
Other algorithms are bad as well, these are the top scores from
J?rn Engel's checkstack.pl (s390 64bit 2.6.1 gcc-3.3.1):
0x00000a twofish_setkey: lay %r15,-3960(%r15)
0x0026fc aes_decrypt: lay %r15,-1168(%r15)
0x000c0c aes_encrypt: lay %r15,-1000(%r15)
0x00000e sha512_transform: lay %r15,-936(%r15)
0x001292 test_deflate: lay %r15,-784(%r15)
0x0028a2 cast6_decrypt: lay %r15,-696(%r15)
0x00d1a0 twofish_encrypt: lay %r15,-664(%r15)
0x001b34 setkey: lay %r15,-656(%r15)
0x00e2b0 twofish_decrypt: lay %r15,-624(%r15)
0x000c9e cast6_encrypt: lay %r15,-600(%r15)
0x000014 sha1_transform: lay %r15,-504(%r15)
^
This is the stack size --------|
Arnd <><
On Fri, 30 Jan 2004, Jakub Jelinek wrote:
> The problematic line in sha256.c is:
> static void sha256_final(void* ctx, u8 *out)
> {
> ...
> const u8 padding[64] = { 0x80, };
>
> where if you are unlucky, scheduler will with various different GCC versions
> on various architectures reorder instructions so that store of 0x80 into the
> struct is before clearing of the 64 bytes.
>
> On the other side, doing this in sha256.c seems quite inefficient, IMHO much
> better would be to have there static u8 padding[64] = { 0x80 };
> because that will not mean clearing 64 bytes and writing another one on top
> of it every time the routine is executed.
Proposed patch below. I think sha512 would have been ok, but might as
well make them the same.
R Chan, please test and let us know if it fixes the problem for you.
- James
--
James Morris
<[email protected]>
diff -urN -X dontdiff linux-2.6.2-rc2-mm2.o/crypto/sha256.c linux-2.6.2-rc2-mm2.w/crypto/sha256.c
--- linux-2.6.2-rc2-mm2.o/crypto/sha256.c 2003-09-27 20:50:15.000000000 -0400
+++ linux-2.6.2-rc2-mm2.w/crypto/sha256.c 2004-01-30 11:27:26.465531792 -0500
@@ -295,7 +295,7 @@
u8 bits[8];
unsigned int index, pad_len, t;
int i, j;
- const u8 padding[64] = { 0x80, };
+ static u8 padding[64] = { 0x80, };
/* Save number of bits */
t = sctx->count[0];
diff -urN -X dontdiff linux-2.6.2-rc2-mm2.o/crypto/sha512.c linux-2.6.2-rc2-mm2.w/crypto/sha512.c
--- linux-2.6.2-rc2-mm2.o/crypto/sha512.c 2003-09-27 20:51:04.000000000 -0400
+++ linux-2.6.2-rc2-mm2.w/crypto/sha512.c 2004-01-30 11:32:16.182488120 -0500
@@ -249,7 +249,7 @@
{
struct sha512_ctx *sctx = ctx;
- const static u8 padding[128] = { 0x80, };
+ static u8 padding[128] = { 0x80, };
u32 t;
u64 t2;
On Fri, 30 Jan 2004, Arnd Bergmann wrote:
> James Morris wrote:
> > Have you noticed if this happens for any of the other crypto algorithms?
>
> Just as a reminder, there is still an issue with extreme stack usage
> of some of the algorithms, depending on compiler version and
> flags.
>
> The worst I have seen was around 16kb for twofish_setkey on 64 bit
> s390 with gcc-3.1 (iirc). Right now, I get up to 4kb for this
> function with gcc-3.3.1, which probably works but is definitely
> a bad sign. I've seen this as well on other architectures (iirc
> on x86_64), but not as severe.
>
> Other algorithms are bad as well, these are the top scores from
> J?rn Engel's checkstack.pl (s390 64bit 2.6.1 gcc-3.3.1):
>
> 0x00000a twofish_setkey: lay %r15,-3960(%r15)
> 0x0026fc aes_decrypt: lay %r15,-1168(%r15)
> 0x000c0c aes_encrypt: lay %r15,-1000(%r15)
> 0x00000e sha512_transform: lay %r15,-936(%r15)
> 0x001292 test_deflate: lay %r15,-784(%r15)
> 0x0028a2 cast6_decrypt: lay %r15,-696(%r15)
> 0x00d1a0 twofish_encrypt: lay %r15,-664(%r15)
> 0x001b34 setkey: lay %r15,-656(%r15)
> 0x00e2b0 twofish_decrypt: lay %r15,-624(%r15)
> 0x000c9e cast6_encrypt: lay %r15,-600(%r15)
> 0x000014 sha1_transform: lay %r15,-504(%r15)
I'm not seeing anything like this on x86 (e.g. stack usage is 8 bytes),
and can't see an obvious way to fix the problem for your compiler.
- James
--
James Morris
<[email protected]>
On Fri, Jan 30, 2004 at 11:35:20AM -0500, James Morris wrote:
> - const u8 padding[64] = { 0x80, };
> + static u8 padding[64] = { 0x80, };
The RedHat bug suggests 'static const' as the appropriate replacement.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=114610#c4
Unfortunately that probably means an extra 64 bytes of text, rather than
the 10 or so bytes of instructions to do the memset and store. Ideally
padding[] would be allocated in BSS rather than text or the stack (and
initialized with { 0x80, } at runtime), but I guess you can't have
everything.
-andy
Jakub said something like...
> GCC handling of automatic const variables with initializers is very fragile.
> There have been numerous bugs in it in the past and last one has been fixed
> just yesterday (on GCC trunk only for the time being). It will be
> eventually backported once it gets some more testing on GCC mainline.
>
> The problematic line in sha256.c is:
> static void sha256_final(void* ctx, u8 *out)
> {
> ...
> const u8 padding[64] = { 0x80, };
Verry interesting. Good work Jakub.
JLC
--
http://www.certainkey.com
Suite 4560 CTTC
1125 Colonel By Dr.
Ottawa ON, K1S 5B6
On Fri, 30 Jan 2004 11:57:20 -0500 (EST) James Morris <[email protected]> wrote:
| On Fri, 30 Jan 2004, Arnd Bergmann wrote:
|
| > James Morris wrote:
| > > Have you noticed if this happens for any of the other crypto algorithms?
| >
| > Just as a reminder, there is still an issue with extreme stack usage
| > of some of the algorithms, depending on compiler version and
| > flags.
| >
| > The worst I have seen was around 16kb for twofish_setkey on 64 bit
| > s390 with gcc-3.1 (iirc). Right now, I get up to 4kb for this
| > function with gcc-3.3.1, which probably works but is definitely
| > a bad sign. I've seen this as well on other architectures (iirc
| > on x86_64), but not as severe.
| >
| > Other algorithms are bad as well, these are the top scores from
| > J?rn Engel's checkstack.pl (s390 64bit 2.6.1 gcc-3.3.1):
| >
| > 0x00000a twofish_setkey: lay %r15,-3960(%r15)
| > 0x0026fc aes_decrypt: lay %r15,-1168(%r15)
| > 0x000c0c aes_encrypt: lay %r15,-1000(%r15)
| > 0x00000e sha512_transform: lay %r15,-936(%r15)
| > 0x001292 test_deflate: lay %r15,-784(%r15)
| > 0x0028a2 cast6_decrypt: lay %r15,-696(%r15)
| > 0x00d1a0 twofish_encrypt: lay %r15,-664(%r15)
| > 0x001b34 setkey: lay %r15,-656(%r15)
| > 0x00e2b0 twofish_decrypt: lay %r15,-624(%r15)
| > 0x000c9e cast6_encrypt: lay %r15,-600(%r15)
| > 0x000014 sha1_transform: lay %r15,-504(%r15)
|
| I'm not seeing anything like this on x86 (e.g. stack usage is 8 bytes),
| and can't see an obvious way to fix the problem for your compiler.
Here's what I see on x86 and gcc 3.2 (for Linux 2.6.1).
What Linux version of the code are you looking at?
$0x180,%esp: c01fd5aa <aes_encrypt+4/1750>
$0x1b0,%esp: c01fecfa <aes_decrypt+4/17da>
$0x230,%esp: c0206acd <test_deflate+b/2f8>
$0x10c,%esp: c0205c90 <test_hmac+6/4fc>
$0x1fc,%esp: c01e9ed8 <sha1_transform+4/178a>
$0x120,%esp: c01eb842 <sha256_transform+6/1ef0>
$0x384,%esp: c01ed988 <sha512_transform+4/17e8>
$0x10c,%esp: c01f1c90 <twofish_setkey+4/7480>
--
~Randy
kernel-janitors project: http://janitor.kernelnewbies.org/
On Fri, 30 Jan 2004, Randy.Dunlap wrote:
> Here's what I see on x86 and gcc 3.2 (for Linux 2.6.1).
> What Linux version of the code are you looking at?
>
>
> $0x180,%esp: c01fd5aa <aes_encrypt+4/1750>
> $0x1b0,%esp: c01fecfa <aes_decrypt+4/17da>
> $0x230,%esp: c0206acd <test_deflate+b/2f8>
> $0x10c,%esp: c0205c90 <test_hmac+6/4fc>
> $0x1fc,%esp: c01e9ed8 <sha1_transform+4/178a>
> $0x120,%esp: c01eb842 <sha256_transform+6/1ef0>
> $0x384,%esp: c01ed988 <sha512_transform+4/17e8>
> $0x10c,%esp: c01f1c90 <twofish_setkey+4/7480>
>
2.6.2-rc2-mm2 with gcc version 3.3.1 20030915 (Red Hat Linux 3.3.1-5)
- James
--
James Morris
<[email protected]>
On Fri, 30 Jan 2004, James Morris wrote:
> On Fri, 30 Jan 2004, Randy.Dunlap wrote:
>
> > Here's what I see on x86 and gcc 3.2 (for Linux 2.6.1).
> > What Linux version of the code are you looking at?
> >
> >
> > $0x180,%esp: c01fd5aa <aes_encrypt+4/1750>
> > $0x1b0,%esp: c01fecfa <aes_decrypt+4/17da>
> > $0x230,%esp: c0206acd <test_deflate+b/2f8>
> > $0x10c,%esp: c0205c90 <test_hmac+6/4fc>
> > $0x1fc,%esp: c01e9ed8 <sha1_transform+4/178a>
> > $0x120,%esp: c01eb842 <sha256_transform+6/1ef0>
> > $0x384,%esp: c01ed988 <sha512_transform+4/17e8>
> > $0x10c,%esp: c01f1c90 <twofish_setkey+4/7480>
> >
>
> 2.6.2-rc2-mm2 with gcc version 3.3.1 20030915 (Red Hat Linux 3.3.1-5)
>
Actually, 24 bytes (not 8):
c02120b0 <twofish_setkey>:
c02120b0: 55 push %ebp
c02120b1: 57 push %edi
c02120b2: 56 push %esi
c02120b3: 53 push %ebx
c02120b4: 83 ec 18 sub $0x18,%esp
- James
--
James Morris
<[email protected]>
On Fri, Jan 30, 2004 at 11:14:07AM -0600, Andy Isaacson wrote:
> On Fri, Jan 30, 2004 at 11:35:20AM -0500, James Morris wrote:
> > - const u8 padding[64] = { 0x80, };
> > + static u8 padding[64] = { 0x80, };
>
> The RedHat bug suggests 'static const' as the appropriate replacement.
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=114610#c4
>
> Unfortunately that probably means an extra 64 bytes of text, rather than
> the 10 or so bytes of instructions to do the memset and store. Ideally
> padding[] would be allocated in BSS rather than text or the stack (and
> initialized with { 0x80, } at runtime), but I guess you can't have
> everything.
Or you can use
u8 padding[64] = { 0x80 };
if you really want to initialize it at runtime and want to work around the
compiler bug. It shouldn't be any less efficient than
const u8 padding[64] = { 0x80 };
since it is used just once, passed to non-inlined function.
Jakub
On Fri, 30 Jan 2004 11:35:20 -0500 (EST)
James Morris <[email protected]> wrote:
> Proposed patch below. I think sha512 would have been ok, but might as
> well make them the same.
>
> R Chan, please test and let us know if it fixes the problem for you.
I'm putting this into my tree(s), thanks James.