2012-02-15 03:58:39

by David Miller

[permalink] [raw]
Subject: sha-512...


FYI, I just started seeing this on sparc32 after all those
sha512 "optimizations":

crypto/sha512_generic.c: In function 'sha512_transform':
crypto/sha512_generic.c:135:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]


2012-02-15 04:01:30

by Herbert Xu

[permalink] [raw]
Subject: Re: sha-512...

On Tue, Feb 14, 2012 at 10:58:33PM -0500, David Miller wrote:
>
> FYI, I just started seeing this on sparc32 after all those
> sha512 "optimizations":
>
> crypto/sha512_generic.c: In function 'sha512_transform':
> crypto/sha512_generic.c:135:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]

Is that with the latest patch applied?

crypto: sha512 - Avoid stack bloat on i386

If so then this is not good. What was the original stack usage,
i.e., if you revert to the original percpu code?

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2012-02-15 05:11:20

by David Miller

[permalink] [raw]
Subject: Re: sha-512...

From: Herbert Xu <[email protected]>
Date: Wed, 15 Feb 2012 15:01:28 +1100

> On Tue, Feb 14, 2012 at 10:58:33PM -0500, David Miller wrote:
>>
>> FYI, I just started seeing this on sparc32 after all those
>> sha512 "optimizations":
>>
>> crypto/sha512_generic.c: In function 'sha512_transform':
>> crypto/sha512_generic.c:135:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>
> Is that with the latest patch applied?
>
> crypto: sha512 - Avoid stack bloat on i386
>
> If so then this is not good.

Yes. And, of course, with that commit reverted it's even worse.
Reverting it makes the stack frame twice as large.

> What was the original stack usage, i.e., if you revert to the
> original percpu code?

If I revert:

commit 3a92d687c8015860a19213e3c102cad6b722f83c
commit 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd
commit 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3
commit 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d

the stackframe goes down to 888 bytes.

More detailed, the progression is:

master 1136
revert 3a92d687c8015860a19213e3c102cad6b722f83c 2408
revert 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd 2408
revert 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3 1520
revert 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d 888

2012-02-15 05:16:12

by Herbert Xu

[permalink] [raw]
Subject: Re: sha-512...

On Wed, Feb 15, 2012 at 12:11:13AM -0500, David Miller wrote:
>
> > On Tue, Feb 14, 2012 at 10:58:33PM -0500, David Miller wrote:
> >>
> >> FYI, I just started seeing this on sparc32 after all those
> >> sha512 "optimizations":
> >>
> >> crypto/sha512_generic.c: In function 'sha512_transform':
> >> crypto/sha512_generic.c:135:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]
> >
> > Is that with the latest patch applied?
> >
> > crypto: sha512 - Avoid stack bloat on i386
> >
> > If so then this is not good.
>
> Yes. And, of course, with that commit reverted it's even worse.
> Reverting it makes the stack frame twice as large.
>
> > What was the original stack usage, i.e., if you revert to the
> > original percpu code?
>
> If I revert:
>
> commit 3a92d687c8015860a19213e3c102cad6b722f83c
> commit 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd
> commit 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3
> commit 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d
>
> the stackframe goes down to 888 bytes.
>
> More detailed, the progression is:
>
> master 1136
> revert 3a92d687c8015860a19213e3c102cad6b722f83c 2408
> revert 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd 2408
> revert 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3 1520
> revert 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d 888

OK, so we grew by 1136 - 888 = 248. Keep in mind that 128 of
that is expected since we moved W onto the stack.

I guess we could go back to the percpu solution, what do you
think?

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2012-02-15 05:23:59

by David Miller

[permalink] [raw]
Subject: Re: sha-512...

From: Herbert Xu <[email protected]>
Date: Wed, 15 Feb 2012 16:16:08 +1100

> OK, so we grew by 1136 - 888 = 248. Keep in mind that 128 of
> that is expected since we moved W onto the stack.

Right.

> I guess we could go back to the percpu solution, what do you
> think?

I'm not entirely sure, we might have to.

sha512 is notorious for generating terrible code with gcc on 32-bit
targets, so... The sha512 test in the glibc testsuite tends to
timeout on 32-bit sparc. :-)

2012-02-15 19:27:58

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: sha-512...

On Wed, Feb 15, 2012 at 12:23:52AM -0500, David Miller wrote:
> From: Herbert Xu <[email protected]>
> Date: Wed, 15 Feb 2012 16:16:08 +1100
>
> > OK, so we grew by 1136 - 888 = 248. Keep in mind that 128 of
> > that is expected since we moved W onto the stack.
>
> Right.
>
> > I guess we could go back to the percpu solution, what do you
> > think?
>
> I'm not entirely sure, we might have to.
>
> sha512 is notorious for generating terrible code with gcc on 32-bit
> targets, so... The sha512 test in the glibc testsuite tends to
> timeout on 32-bit sparc. :-)

Cherrypicking ror64() commit largely fixes the issue (on sparc-defconfig):

00000000 <sha512_transform>:
0: 9d e3 bc 78 save %sp, -904, %sp

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
b85a088f15f2070b7180735a231012843a5ac96c
"crypto: sha512 - use standard ror64()"

2012-02-15 21:00:24

by David Miller

[permalink] [raw]
Subject: Re: sha-512...

From: Alexey Dobriyan <[email protected]>
Date: Wed, 15 Feb 2012 22:27:52 +0300

> On Wed, Feb 15, 2012 at 12:23:52AM -0500, David Miller wrote:
>> From: Herbert Xu <[email protected]>
>> Date: Wed, 15 Feb 2012 16:16:08 +1100
>>
>> > OK, so we grew by 1136 - 888 = 248. Keep in mind that 128 of
>> > that is expected since we moved W onto the stack.
>>
>> Right.
>>
>> > I guess we could go back to the percpu solution, what do you
>> > think?
>>
>> I'm not entirely sure, we might have to.
>>
>> sha512 is notorious for generating terrible code with gcc on 32-bit
>> targets, so... The sha512 test in the glibc testsuite tends to
>> timeout on 32-bit sparc. :-)
>
> Cherrypicking ror64() commit largely fixes the issue (on sparc-defconfig):
>
> 00000000 <sha512_transform>:
> 0: 9d e3 bc 78 save %sp, -904, %sp
>
> git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
> b85a088f15f2070b7180735a231012843a5ac96c
> "crypto: sha512 - use standard ror64()"

I'm happy with a solution that involves pushing this change to Linus's
tree, it's pretty clear why it helps so much although I'm disappointed
that gcc can't se that the u64 shift argument passed in is always a
constant and therefore way within the range of a 32-bit value, ho hum
:-)

In fact, in my tree, this change brings the stack allocation instruction
down to:

save %sp, -824, %sp !

which is actually BETTER than what the old per-cpu code got:

save %sp, -984, %sp !

Therefore I highly recommend we apply that ror() change to Linus's
tree now. :-)

2012-02-16 00:16:45

by Herbert Xu

[permalink] [raw]
Subject: Re: sha-512...

On Wed, Feb 15, 2012 at 04:00:10PM -0500, David Miller wrote:
>
> In fact, in my tree, this change brings the stack allocation instruction
> down to:
>
> save %sp, -824, %sp !
>
> which is actually BETTER than what the old per-cpu code got:
>
> save %sp, -984, %sp !
>
> Therefore I highly recommend we apply that ror() change to Linus's
> tree now. :-)

Great, I'll push that out today.

Thanks,!
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt