2020-10-21 15:18:36

by Arvind Sankar

Subject: Re: [PATCH v2 6/6] crypto: lib/sha - Combine round constants and message schedule

On Tue, Oct 20, 2020 at 09:36:00PM +0000, David Laight wrote:
> From: Arvind Sankar
> > Sent: 20 October 2020 21:40
> >
> > Putting the round constants and the message schedule arrays together in
> > one structure saves one register, which can be a significant benefit on
> > register-constrained architectures. On x86-32 (tested on Broadwell
> > Xeon), this gives a 10% performance benefit.
>
> I'm actually stunned it makes that much difference.
> The object code must be truly horrid (before and after).
>
> There are probably other strange tweaks that give a similar
> improvement.
>
> David
>

Hm yes, I took a closer look at the generated code, and gcc seems to be
doing something completely braindead. Before this change, it actually
copies 8 words at a time from SHA256_K onto the stack, and uses those
stack temporaries for the calculation. So this patch is giving a benefit
just because it only does the copy once instead of every time around the
loop.
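
To make the two layouts concrete, here is a minimal C sketch (not the
kernel code). The names K8, struct k_and_w, sum_separate and sum_combined
are made up for illustration, and the table is cut down to the first
eight round constants. With separate arrays the loop body needs two base
addresses; with the combined struct a single pointer reaches both, and
the constants are copied next to the schedule once, up front.

```c
#include <assert.h>
#include <stdint.h>

/* First 8 SHA-256 round constants; truncated for brevity. */
static const uint32_t K8[8] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
    0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
};

/* Layout A: constants and schedule are separate objects (two bases). */
static uint32_t sum_separate(const uint32_t *k, const uint32_t *w, int i)
{
    return k[i] + w[i];
}

/* Layout B (hypothetical struct): one base pointer reaches both arrays. */
struct k_and_w {
    uint32_t k[8];
    uint32_t w[8];
};

static uint32_t sum_combined(const struct k_and_w *s, int i)
{
    return s->k[i] + s->w[i];
}

/* Both layouts must produce the same K[i] + W[i] term for every i. */
static void check_layouts_agree(void)
{
    uint32_t w[8];
    struct k_and_w s;

    for (int i = 0; i < 8; i++) {
        w[i] = 0x01010101u * (uint32_t)i;  /* dummy schedule words */
        s.k[i] = K8[i];                    /* constants copied once */
        s.w[i] = w[i];
    }
    for (int i = 0; i < 8; i++)
        assert(sum_separate(K8, w, i) == sum_combined(&s, i));
}
```

Whether a given compiler actually saves a register with layout B depends
on the target and the optimizer; the sketch only shows that the two forms
compute the same value.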

It doesn't even really need a register to hold SHA256_K: since this isn't
PIC code, it could just access the table directly as SHA256_K(%ecx) if it
multiplied the loop counter i by 4.
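
As a rough C-level picture of that addressing form (a sketch only; K4,
load_at_byte_offset and load_k are hypothetical names), the table's
address is a link-time constant in non-PIC code, so the only run-time
input is the byte offset i * 4, which plays the role of %ecx here:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First 4 SHA-256 round constants; truncated for brevity. */
static const uint32_t K4[4] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
};

/* Load one 32-bit word at (table base + byte offset). */
static uint32_t load_at_byte_offset(const uint32_t *table, size_t byte_off)
{
    uint32_t v;

    memcpy(&v, (const unsigned char *)table + byte_off, sizeof(v));
    return v;
}

/* Scale the loop counter by the word size to get the byte offset. */
static uint32_t load_k(int i)
{
    return load_at_byte_offset(K4, (size_t)i * sizeof(uint32_t));
}
```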