2004-10-01 19:33:55

by Denis Vlasenko

Subject: [PATCH] small sha512 cleanup

Looks like open-coded be64_to_cpu.
GCC produces rather poor code for this.
be64_to_cpu produces asm()s which are ~4 times shorter.

Compile-tested only.

I am not sure whether input can be 64bit-unaligned.
If it indeed can be, replace:

((u64*)(input))[I] -> get_unaligned( ((u64*)(input))+I )
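
For illustration only, here is a minimal userspace sketch of what an alignment-safe big-endian 64-bit load has to do. The helper name load_be64 is hypothetical and is not a kernel API; in the kernel the same effect would come from combining get_unaligned() with the byte-order conversion, as suggested above.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): read a big-endian 64-bit
 * word from a buffer that may not be 8-byte aligned.
 */
static uint64_t load_be64(const void *p)
{
	unsigned char b[8];
	uint64_t v = 0;
	int i;

	/* Byte-wise copy: valid for any alignment of p. */
	memcpy(b, p, sizeof(b));

	/* Most significant byte comes first in big-endian order. */
	for (i = 0; i < 8; i++)
		v = (v << 8) | b[i];

	return v;
}
```

Calling load_be64(input + 8 * I) would then stand in for the cast-and-index form regardless of how input is aligned.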
--
vda


Attachments:
(No filename) (299.00 B)
sha512.c.diff (0.98 kB)

2004-10-01 20:51:26

by David Miller

Subject: Re: [PATCH] reduce sha512_transform() stack usage, speedup

On Fri, 1 Oct 2004 23:38:11 +0300
Denis Vlasenko wrote:

> WARNING: compile tested only.

You can't claim a "speed up" if you only compile test your
changes. Neither can you expect us to apply patches in
such a case.

It's not that difficult to load the tcrypt module and make
sure all the tests for the module you're changing still
pass.

2004-10-01 20:51:28

by Denis Vlasenko

Subject: [PATCH] reduce sha512_transform() stack usage, speedup

On top of previous:

Patch moves large temporary u64 W[80]
from stack to ctx struct:

* reduces stack usage by 640 bytes
* saves one 640-byte memset() per sha512_transform()
(we still do it after *all* iterations are done)
* quite unexpectedly saves 1.6k of code on i386
because stack offsets now fit into 8 bits,
so many stack addressing insns got 3 bytes shorter:

# size sha512.o.org sha512.o
   text    data     bss     dec     hex filename
   8281     372       0    8653    21cd sha512.o.org
   6649     372       0    7021    1b6d sha512.o

# objdump -d sha512.o.org | cut -b9- >sha512.d.org
# objdump -d sha512.o | cut -b9- >sha512.d
# diff -u sha512.d.org sha512.d
[snip]
: 8b 4b 28 mov 0x28(%ebx),%ecx
: 8b 5b 2c mov 0x2c(%ebx),%ebx
-: 89 8d 44 fd ff ff mov %ecx,0xfffffd44(%ebp)
-: 89 9d 48 fd ff ff mov %ebx,0xfffffd48(%ebp)
-: 89 9d f4 fc ff ff mov %ebx,0xfffffcf4(%ebp)
+: 89 4d c4 mov %ecx,0xffffffc4(%ebp)
+: 89 5d c8 mov %ebx,0xffffffc8(%ebp)
+: 89 9d 64 ff ff ff mov %ebx,0xffffff64(%ebp)
: 8b 5d 08 mov 0x8(%ebp),%ebx
-: 89 8d f0 fc ff ff mov %ecx,0xfffffcf0(%ebp)
+: 89 8d 60 ff ff ff mov %ecx,0xffffff60(%ebp)
: 8b 42 30 mov 0x30(%edx),%eax
: 8b 52 34 mov 0x34(%edx),%edx
[snip]
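
A rough sketch of the idea follows. It is not the actual patch, and every name in it (sha512_ctx_sketch, schedule_sketch, wipe_sketch) is made up for illustration. The message schedule W[80] lives in the context instead of on the stack, so the per-block function has a small frame, and the buffer is wiped once after the last block instead of after every block. Only the schedule is shown; the 80 compression rounds are omitted.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical context layout: W[] moved here from the transform's stack. */
struct sha512_ctx_sketch {
	uint64_t state[8];
	uint64_t W[80];		/* 640 bytes that no longer sit on the stack */
};

static uint64_t ror64_sketch(uint64_t x, unsigned int n)
{
	return (x >> n) | (x << (64 - n));	/* n must be in 1..63 */
}

/* Fill ctx->W from one 128-byte block (SHA-512 message schedule only). */
static void schedule_sketch(struct sha512_ctx_sketch *ctx, const uint8_t *block)
{
	uint64_t *W = ctx->W;
	int t, i;

	/* W[0..15]: big-endian load, byte by byte, so alignment is irrelevant. */
	for (t = 0; t < 16; t++) {
		W[t] = 0;
		for (i = 0; i < 8; i++)
			W[t] = (W[t] << 8) | block[8 * t + i];
	}

	/* W[16..79]: standard SHA-512 expansion. */
	for (t = 16; t < 80; t++) {
		uint64_t s0 = ror64_sketch(W[t - 15], 1) ^
			      ror64_sketch(W[t - 15], 8) ^ (W[t - 15] >> 7);
		uint64_t s1 = ror64_sketch(W[t - 2], 19) ^
			      ror64_sketch(W[t - 2], 61) ^ (W[t - 2] >> 6);
		W[t] = s1 + W[t - 7] + s0 + W[t - 16];
	}
	/* The compression rounds using W[] and ctx->state would go here. */
}

/* Called once after the last block, instead of a memset per transform. */
static void wipe_sketch(struct sha512_ctx_sketch *ctx)
{
	memset(ctx->W, 0, sizeof(ctx->W));
}
```

With W[] out of the frame, the remaining locals sit within 128 bytes of %ebp, which is why the compiler can use the short 8-bit displacement encoding seen in the diff above.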

WARNING: compile tested only.
--
vda


Attachments:
(No filename) (1.46 kB)
sha512.c.W.patch (1.17 kB)

2004-10-01 21:33:03

by Denis Vlasenko

Subject: Re: [PATCH] reduce sha512_transform() stack usage, speedup

On Friday 01 October 2004 23:43, David S. Miller wrote:
> On Fri, 1 Oct 2004 23:38:11 +0300
> Denis Vlasenko wrote:
>
> > WARNING: compile tested only.
>
> You can't claim a "speed up" if you only compile test your
> changes. Neither can you expect us to apply patches in
> such a case.

Speedup is rather tiny, most probably not measurable.
Patch optimizes out some memsets, otherwise code
practically did not change.

> It's not that difficult to load the tcrypt module and make
> sure all the tests for the module you're changing still
> pass.

Done:

testing sha384
test 1:
cb00753f45a35e8bb5a03d699ac65007272c32ab0eded1631a8b605a43ff5bed8086072ba1e7cc2358baeca134c825a7
pass
test 2:
3391fdddfc8dc7393707a65b1b4709397cf8b1d162af05abfe8f450de5f36bc6b0455a8520bc4e6f5fe95b1fe3c8452b
pass
test 3:
09330c33f71147e83d192fc782cd1b4753111b173b3b05d22fa08086e3b0f712fcc7c71a557e2db966c3e9fa91746039
pass
test 4:
3d208973ab3508dbbd7e2c2862ba290ad3010e4978c198dc4d8fd014e582823a89e16f9b2a7bbc1ac938e2d199e8bea4
pass
testing sha384 across pages
test 1:
3d208973ab3508dbbd7e2c2862ba290ad3010e4978c198dc4d8fd014e582823a89e16f9b2a7bbc1ac938e2d199e8bea4
pass

testing sha512
test 1:
ddaf35a193617abacc417349ae20413112e6fa4e89a97ea20a9eeee64b55d39a2192992a274fc1a836ba3c23a3feebbd454d4423643ce80e2a9ac94fa54ca49f
pass
test 2:
204a8fc6dda82f0a0ced7beb8e08a41657c16ef468b228a8279be331a703c33596fd15c13b1b07f9aa1d3bea57789ca031ad85c7a71dd70354ec631238ca3445
pass
test 3:
8e959b75dae313da8cf4f72814fc143f8f7779c6eb9f7fa17299aeadb6889018501d289e4900f7e4331b99dec4b5433ac7d329eeb6dd26545e96e55b874be909
pass
test 4:
930d0cefcb30ff1133b6898121f1cf3d27578afcafe8677c5257cf069911f75d8f5831b56ebfda67b278e66dff8b84fe2b2870f742a580d8edb41987232850c9
pass
testing sha512 across pages
test 1:
930d0cefcb30ff1133b6898121f1cf3d27578afcafe8677c5257cf069911f75d8f5831b56ebfda67b278e66dff8b84fe2b2870f742a580d8edb41987232850c9
pass

Please consider applying.
--
vda