by Jason A. Donenfeld


Subject: Re: [PATCH crypto-next v2 1/3] crypto: poly1305 - add new 32 and 64-bit generic versions

Hi Eric,

On Fri, Dec 13, 2019 at 4:28 AM Eric Biggers <[email protected]> wrote:
> Now, it's possible that the performance gain outweighs this, and I too would
> like to have the C implementation of Poly1305 be faster. So if you'd like to
> argue for the performance gain, fine, and if there's a significant performance
> gain I don't have an objection. But I'm not sure why you're at the same time
> trying to argue that *adding* an extra implementation somehow makes the code
> easier to audit and doesn't add complexity...

Sorry, I don't mean to be confusing, but I clearly haven't written
very well. There are two things being discussed here, 32-bit and
64-bit, rather than just one. Let me clarify:

- The motivation for the 64-bit version is primarily performance. Its
performance isn't really in dispute. It's significant and good. I'll
put this in the commit message of the next series I send out.

- The motivation for the 32-bit version is primarily to have code that
can be compared line by line to the 64-bit version, in order to make
auditing easier given the situation with two implementations and also
for general cleanliness. I think there's enormous value in having the
other implementation be "parallel". Rather than two totally different
and foreign implementations, we have two related and comparable ones.
That's a good thing. As a *side note*, it might also be slightly
faster than the one it replaces, which is great and all I guess, but
it's not the primary motivation for the 32-bit version.
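To make the "parallel" idea concrete, here is a minimal sketch, written in the style of poly1305-donna (an assumption; the patch's own code is not reproduced here). It shows how the same 130-bit Poly1305 value is loaded as five 26-bit limbs in a 32-bit implementation and as three limbs of 44, 44 and 42 bits in a 64-bit one. Each step in one version has a direct counterpart in the other, which is what makes side-by-side review practical. The names `split26`, `split44`, `join26`, `join44` and `check_roundtrip` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read 8 little-endian bytes as a 64-bit integer. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;
	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* 32-bit style: 16-byte block plus the 2^128 "hibit" (0 or 1) as five 26-bit limbs. */
static void split26(const uint8_t m[16], uint32_t hibit, uint32_t h[5])
{
	uint64_t lo = load_le64(m), hi = load_le64(m + 8);

	h[0] = (uint32_t)(lo & 0x3ffffff);                        /* bits   0..25  */
	h[1] = (uint32_t)((lo >> 26) & 0x3ffffff);                /* bits  26..51  */
	h[2] = (uint32_t)(((lo >> 52) | (hi << 12)) & 0x3ffffff); /* bits  52..77  */
	h[3] = (uint32_t)((hi >> 14) & 0x3ffffff);                /* bits  78..103 */
	h[4] = (uint32_t)((hi >> 40) | ((uint64_t)hibit << 24));  /* bits 104..129 */
}

/* 64-bit style: the same value as limbs of 44, 44 and 42 bits. */
static void split44(const uint8_t m[16], uint32_t hibit, uint64_t h[3])
{
	uint64_t lo = load_le64(m), hi = load_le64(m + 8);

	h[0] = lo & 0xfffffffffffULL;                        /* bits  0..43  */
	h[1] = ((lo >> 44) | (hi << 20)) & 0xfffffffffffULL; /* bits 44..87  */
	h[2] = (hi >> 24) | ((uint64_t)hibit << 40);         /* bits 88..129 */
}

/* Reassemble 26-bit limbs into the low 64 bits, next 64 bits, and bit 128. */
static void join26(const uint32_t h[5], uint64_t *lo, uint64_t *hi, uint32_t *top)
{
	*lo = (uint64_t)h[0] | ((uint64_t)h[1] << 26) | ((uint64_t)h[2] << 52);
	*hi = ((uint64_t)h[2] >> 12) | ((uint64_t)h[3] << 14) | ((uint64_t)h[4] << 40);
	*top = h[4] >> 24;
}

/* The same reassembly for the 44/44/42-bit limbs. */
static void join44(const uint64_t h[3], uint64_t *lo, uint64_t *hi, uint32_t *top)
{
	*lo = h[0] | (h[1] << 44);
	*hi = (h[1] >> 20) | (h[2] << 24);
	*top = (uint32_t)(h[2] >> 40);
}

/* Split one input both ways and assert both limb sets encode the same bits. */
static void check_roundtrip(const uint8_t m[16], uint32_t hibit)
{
	uint32_t a[5];
	uint64_t b[3], lo26, hi26, lo44, hi44;
	uint32_t top26, top44;

	split26(m, hibit, a);
	split44(m, hibit, b);
	join26(a, &lo26, &hi26, &top26);
	join44(b, &lo44, &hi44, &top44);
	assert(lo26 == lo44 && hi26 == hi44 && top26 == top44);
	assert(lo26 == load_le64(m) && hi26 == load_le64(m + 8) && top26 == hibit);
}
```

The limb width follows the native multiplier: products of 26-bit limbs can be accumulated in 64-bit integers, while products of 44-bit limbs need 128-bit (`unsigned __int128`) accumulators. Having fewer limbs means fewer multiplications per block (9 instead of 25 for the schoolbook product), which is broadly where a 64-bit version gets its speed.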

Does that make sense? That's why I appear to simultaneously be arguing
that performance matters and doesn't matter. The motivation for the
64-bit version is performance. The motivation for the 32-bit version
is cleanliness. Two things, which are related.

I'll make this clear in the commit message of the next series I send.
Sorry again for being confusing.

Jason

2019-12-15 17:14:25

by Andy Polyakov


Subject: Re: [PATCH crypto-next v2 2/3] crypto: x86_64/poly1305 - add faster implementations

>> * It removes the existing SSE2 code path. Most likely not that much of
>> an issue due to the new AVX variant.
>
> It's not clear that that sse2 code is even faster than the x86_64
> scalar code in the new implementation, actually. Either way,
> regardless of that, in spite of the previous sentence, I don't think
> it really matters, based on the chips we care about targeting.

There is a remark about this in the commentary section. SSE2 was faster
on P4 and early Core processors, but on non-Intel and contemporary
non-AVX-capable processors, most notably the Atom family, scalar x86_64
*is* the fastest option. As for scalar performance on legacy Intel
processors, in my measurements omitting SSE2 meant a ~33% loss on the
oldest P4, and less on not-as-old ones. [Just in case, the situation is
naturally different on 32-bit systems. From a coverage vs. performance
viewpoint, SSE2+AVX2 is arguably the more suitable mix in the 32-bit
case; AVX makes less sense there, because its gain over SSE2 is not
impressive enough.]

Cheers.