2022-07-23 16:56:45

by Jason A. Donenfeld

Subject: arc4random - are you sure we want these?

Hi glibc developers,

I learned about the addition of the arc4random functions in glibc this
morning, thanks to Phoronix. I wish somebody would have CC'd me into
those discussions before it got committed, but here we are.

I really wonder whether this is a good idea, whether this is something
that glibc wants, and whether it's a design worth committing to in the
long term.

Firstly, for what use cases does this actually help? As of recent
changes to the Linux kernels -- now backported all the way to 4.9! --
getrandom() and /dev/urandom are extremely fast and operate over per-cpu
states locklessly. Sure you avoid a syscall by doing that in userspace,
but does it really matter? Who exactly benefits from this?

Seen that way, it seems like a lot of complexity for nothing, and
complexity that will lead to bugs and various oversights eventually.

For example, the kernel reseeds itself when virtual machines fork using
an identifier passed to the kernel via ACPI. It also reseeds itself on
system resume, both from ordinary S3 sleep but also, more importantly,
from hibernation. And in general, being the arbiter of entropy, the
kernel is much better poised to determine when it makes sense to reseed.

Glibc, on the other hand, can employ some heuristics and make some
decisions -- on fork, after 16 MiB, and the like -- but in general these
are lacking, compared to the much wider array of information the kernel
has.

You miss out on this with arc4random, and if that information _is_ to be
exported to userspace somehow in the future, it would be awfully nice to
design the userspace interface alongside the kernel one.

For that reason, past discussion of having some random number generation
in userspace libcs has geared toward doing this in the vDSO, somehow,
where the kernel can be part and parcel of that effort.

Seen from this perspective, going with OpenBSD's older paradigm might be
rather limiting. Why not work together, between the kernel and libc, to
see if we can come up with something better, before settling on an
interface with semantics that are hard to walk back later?

As-is, it's hard to recommend that anybody really use these functions.
Just keep using getrandom(2), which has mostly favorable semantics.

Yes, I get it: it's fun to make a random number generator, and so lots
of projects figure out some way to make yet another one somewhere
somehow. But the tendency to do so feels like a weird computer tinkerer
disease rather than something that has ever helped the overall ecosystem.

So I'm wondering: who actually needs this, and why? What's the
performance requirement like, and why is getrandom(2) insufficient? And
is this really the best approach to take? If this is something needed,
how would you feel about working together on a vDSO approach instead? Or
maybe nobody actually needs this in the first place?

And secondly, is there any way that glibc can *not* do this, or has that
ship fully sailed, and I really missed out by not being part of that
discussion whenever it was happening?

Thanks,
Jason


2022-07-23 23:02:34

by Jason A. Donenfeld

Subject: Re: arc4random - are you sure we want these?

Hi Cristian,

On Sat, Jul 23, 2022 at 03:04:36PM -0400, Cristian Rodríguez wrote:
> On Linux, just making this interface call "something" from the vDSO that
>
> - does not block.
> - cannot ever fail or if it does indeed need to bail out it kills the
> calling thread as last resort.
>
> (if neither of those can be provided, we are back to square one)
>
> will be beyond awesome because it could be usable everywhere,
> including the dynamic linker, malloc, or whatever else.
> The question is: is there at least an experimental patch available
> with a hope of being accepted?

Doesn't getrandom() already basically have this quality? If you call
getrandom(0), it'll block until the RNG is initialized once (which now
happens pretty reliably early on in boot). If you call
getrandom(GRND_INSECURE), it will skip that blocking. Both mechanisms are
reliable and available on all current kernel.org stable kernels.
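
As a rough illustration of the two modes just described, here is a minimal
sketch in C. It is not taken from this thread: fill_random() and the two
wrappers are hypothetical helper names, and the fallback #define for
GRND_INSECURE assumes the value 0x0004 from the Linux UAPI header, for libc
headers that do not provide the flag yet.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Assumption: older libc headers may lack this flag. 0x0004 is the value
 * used by the Linux UAPI header <linux/random.h>. */
#ifndef GRND_INSECURE
#define GRND_INSECURE 0x0004
#endif

/* Hypothetical helper: fill buf with len bytes using the given flags.
 * Returns 0 on success, -1 on error (errno is set by getrandom). */
static int fill_random(void *buf, size_t len, unsigned int flags)
{
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = getrandom(p, len, flags);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Blocks only until the kernel RNG has been initialized once. */
static int fill_random_secure(void *buf, size_t len)
{
    return fill_random(buf, len, 0);
}

/* Never blocks; the bytes may be weak if the RNG is not yet seeded.
 * Kernels that predate the flag reject it with EINVAL. */
static int fill_random_early(void *buf, size_t len)
{
    return fill_random(buf, len, GRND_INSECURE);
}
```

A caller that needs key material would use fill_random_secure(); the
GRND_INSECURE variant only makes sense for things like hash table seeds
during very early boot.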

Is there something about these you don't like and think needs fixing? I'm
open to suggestions on how to further improve that interface if it has a
notable shortcoming.

If somebody has a compelling performance case that's widespread and
can't be fixed in the kernel alone, I wouldn't be averse to vDSOing it.
But such an undertaking would probably be contingent on doing this with
the glibc developers, rather than trying to retroactively bandaid an
addition that shipped broken with a documentation cop-out.

Jason

2022-07-23 23:02:34

by Jason A. Donenfeld

Subject: Re: arc4random - are you sure we want these?

Hi Adhemerval,

Thanks for your reply.

On Sat, Jul 23, 2022 at 02:39:29PM -0300, Adhemerval Zanella Netto wrote:
> > Firstly, for what use cases does this actually help? As of recent
> > changes to the Linux kernels -- now backported all the way to 4.9! --
> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu
> > states locklessly. Sure you avoid a syscall by doing that in userspace,
> > but does it really matter? Who exactly benefits from this?
>
> Mainly performance, since glibc exports both getrandom and getentropy.

Okay so your motivation is performance. But can you tell me what your
performance goals actually are? All kernel.org stable kernels from 4.9
and upwards now have really fast per-cpu lockless implementations of
getrandom() and /dev/urandom. If your goal is performance, I would be
very, very interested to find out a circumstance where this is
insufficient.

> There was some discussion on the mailing list and we also decided to
> explicitly state in our documentation that this is not a CSRNG.

Okay that's all the more reason why this is a completely garbage
endeavor. Sorry for the strong language, but the last thing anybody
needs is another PRNG that's "half way" between being good for crypto
and not. If it's not good for crypto, people will use it anyway,
especially since you're winking at them saying, "oh but actually
chacha20 is fine technically so....", and then fast-forward a few years
when you realize you can lean on your non-crypto commitment and make
things different. Never underestimate the power of a poorly defined
function definition. If your goal isn't to make a real CSPRNG, why make
this kind of thing at all?

And it's especially ridiculous since the OpenBSD arc4random *is* used
for crypto. So now you've really muddied the waters. (And naturally the
OpenBSD arc4random was done in conjunction with their kernel
development, since the same people work on both, which isn't what's
happened here.)

So your "it's a CSPRNG wink wink but the documentation says not, so
actually we're off the hook for doing this well" is a cop-out that will
lead to trouble.

Going back to my original point: what are the performance requirements
that point toward a userspace RNG being required here? If it's not
actually necessary, then let's not do this. If it is necessary for some
legitimate widespread reason, then let's do this right, and actually
make something you're comfortable calling cryptographically secure. And
let's get this right from the beginning, so that the new interface
doesn't come with all sorts of caveats, "this is safe for glibc ≥
4.3.2.1 only", or whatever else.

Again, I'm not averse to the general concept. I just haven't seen
anything really justifying adding the complexity for it. And then
assuming that justification does exist somewhere, this approach doesn't
seem to be a particularly well planned one. As soon as you find yourself
reaching for the "documentation cop-out", something has gone amiss.

> The vDSO approach would be a good thing, and if the kernel provides it
> I think it would be feasible to wire up arc4random to use it when the
> underlying kernel supports it.

So if you justify the performance requirement, wouldn't it make more
sense to just back getrandom() itself with a vDSO call? So that way,
kernels with that get bits faster (but by how much, really? c'mon...),
and kernels without it have things as normal as possible.

If your concern is instances in which getrandom() can fail, I'd like to
hear what those concerns are so that interface can be fixed and
improved.

> But in the end, if we are clear about it in the documentation
> and provide an alternative for users who are aware of the limitation, I do
> not think it is a bad decision.

This really strikes me as an almost comically ominous expectation.
Design interfaces that don't have dangerous pitfalls. While
documentation might somehow technically absolve you of responsibility,
it doesn't actually help make the ecosystem safer by providing optimal
interfaces that don't have cop-outs.

Anyway, to reiterate:

- Can you show me some concerning performance numbers on the current
batch of kernel.org stable kernels, and the use cases for which those
numbers are concerning, and how widespread you think those use cases
are?

- If this really *is* necessary for some reason, can we do it well out
of the gate, with good coordination between kernel and userland,
instead of half-assing it initially and covering that up with a
documentation note?

Jason

2022-07-23 23:28:16

by Adhemerval Zanella

Subject: Re: arc4random - are you sure we want these?



On 23/07/22 13:25, Jason A. Donenfeld wrote:
> [Resending to right address.]
>
> Hi glibc developers,
>
> I learned about the addition of the arc4random functions in glibc this
> morning, thanks to Phoronix. I wish somebody would have CC'd me into
> those discussions before it got committed, but here we are.

Florian sent the initial version about four years ago on libc-alpha
([email protected]). This is the mailing list used for glibc
development, RFCs, and general discussion.

>
> I really wonder whether this is a good idea, whether this is something
> that glibc wants, and whether it's a design worth committing to in the
> long term.

I think so; this is something developers have been asking us for since
2007 [1], and it is used on and has been ported to multiple OSes (OpenBSD,
FreeBSD, Mac OS X).

>
> Firstly, for what use cases does this actually help? As of recent
> changes to the Linux kernels -- now backported all the way to 4.9! --
> getrandom() and /dev/urandom are extremely fast and operate over per-cpu
> states locklessly. Sure you avoid a syscall by doing that in userspace,
> but does it really matter? Who exactly benefits from this?

Mainly performance, since glibc exports both getrandom and getentropy.
There was some discussion on the mailing list and we also decided to
explicitly state in our documentation that this is not a CSRNG.

>
> Seen that way, it seems like a lot of complexity for nothing, and
> complexity that will lead to bugs and various oversights eventually.
>
> For example, the kernel reseeds itself when virtual machines fork using
> an identifier passed to the kernel via ACPI. It also reseeds itself on
> system resume, both from ordinary S3 sleep but also, more importantly,
> from hibernation. And in general, being the arbiter of entropy, the
> kernel is much better poised to determine when it makes sense to reseed.
>
> Glibc, on the other hand, can employ some heuristics and make some
> decisions -- on fork, after 16 MiB, and the like -- but in general these
> are lacking, compared to the much wider array of information the kernel
> has.
>
> You miss out on this with arc4random, and if that information _is_ to be
> exported to userspace somehow in the future, it would be awfully nice to
> design the userspace interface alongside the kernel one.
>
> For that reason, past discussion of having some random number generation
> in userspace libcs has geared toward doing this in the vDSO, somehow,
> where the kernel can be part and parcel of that effort.
>
> Seen from this perspective, going with OpenBSD's older paradigm might be
> rather limiting. Why not work together, between the kernel and libc, to
> see if we can come up with something better, before settling on an
> interface with semantics that are hard to walk back later?

Mainly because there are some programs out there that can still benefit
from a widespread interface instead of relying on a not-yet-implemented
interface that will only be available in a future kernel. But at the same
time, nothing prevents us from using a vDSO-like interface, improving our
implementation with better heuristics, or even using a different cipher
algorithm.

There is even some discussion of making arc4random fall back to getrandom
if a tunable is set or if the kernel is configured in some strict manner.

>
> As-is, it's hard to recommend that anybody really use these functions.
> Just keep using getrandom(2), which has mostly favorable semantics.
>
> Yes, I get it: it's fun to make a random number generator, and so lots
> of projects figure out some way to make yet another one somewhere
> somehow. But the tendency to do so feels like a weird computer tinkerer
> disease rather than something that has ever helped the overall ecosystem.

I did not add it because it was 'fun', nor was I trying to be clever here;
my initial plan was to use a de facto implementation based on OpenBSD's,
exactly to avoid the pitfalls of trying to come up with a new RNG scheme.

>
> So I'm wondering: who actually needs this, and why? What's the
> performance requirement like, and why is getrandom(2) insufficient? And
> is this really the best approach to take? If this is something needed,
> how would you feel about working together on a vDSO approach instead? Or
> maybe nobody actually needs this in the first place?

The vDSO approach would be a good thing, and if the kernel provides it
I think it would be feasible to wire up arc4random to use it when the
underlying kernel supports it. OpenBSD, for instance, has a feature to
instruct the kernel to provide random data directly to an ELF segment [4],
and they use it to seed various libc hardening features (way more versatile
than AT_RANDOM and more fail-proof than getrandom, as we saw in some
environments).

>
> And secondly, is there any way that glibc can *not* do this, or has that
> ship fully sailed, and I really missed out by not being part of that
> discussion whenever it was happening?

Well, we have in fact been discussing adding arc4random since Florian's
initial proposal [2], roughly 4 years ago, and the initial bug report asking
for it is from 15 years ago.

I still think it is a good addition to provide arc4random, for the same
reason we are proposing to add strlcpy [3]: developers still use
such interfaces, source-code compatibility with the BSDs might
help developers avoid rolling out their own implementations (even if some
developers agree that these are not the best interfaces), and focusing on
one implementation might improve the general ecosystem. As Paul noted,
coreutils has its own RNG, while having an arc4random-like interface
might free it from doing so (at least on glibc systems).

But in the end, if we are clear about it in the documentation
and provide an alternative for users who are aware of the limitation, I do
not think it is a bad decision.

>
> Thanks,
> Jason

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=4417
[2] https://sourceware.org/pipermail/libc-alpha/2018-March/092081.html
[3] https://sourceware.org/pipermail/libc-alpha/2022-June/140093.html
[4] https://github.com/openbsd/src/blob/master/libexec/ld.so/SPECS.randomdata

2022-07-23 23:52:57

by Jason A. Donenfeld

Subject: arc4random - are you sure we want these?

[Resending to right address.]


2022-07-24 16:51:50

by Cristian Rodríguez

Subject: Re: arc4random - are you sure we want these?

On Sat, Jul 23, 2022 at 6:59 PM Jason A. Donenfeld <[email protected]> wrote:

> Doesn't getrandom() already basically have this quality?

In current kernels, yes. Problems with old kernels remain. The syscall
overhead being too high for some use cases is still a problem;
if that were overcome, it could be used for literally everything,
including simulations and other things.

2022-07-24 22:20:33

by Jason A. Donenfeld

Subject: Re: arc4random - are you sure we want these?

Hi Cristian,

On Sun, Jul 24, 2022 at 12:23:43PM -0400, Cristian Rodríguez wrote:
> On Sat, Jul 23, 2022 at 6:59 PM Jason A. Donenfeld <[email protected]> wrote:
>
> > Doesn't getrandom() already basically have this quality?
>
> In current kernels, yes. Problems with old kernels remain.

Can you outline specifically which kernels you think those are and what
problems you think there are? And how does arc4random as currently
implemented do away with those problems?

I kind of suspect you don't have something specific in mind...

> The syscall
> overhead being too high for some use cases is still a remaining
> problem,

Really? Do you have any numbers? I would be very surprised to hear that
this is affecting things that intend to use arc4random as a substitute.
Could you give me specifics on this? Again, this sounds made up in the
absence of something real, widespread, and particular.

> if that were overcome, it could be used for literally everything,
> including simulations and other stuff.

You mentioned simulations, but actually simulations are one thing where
you want repeatable randomness -- something insecure with a seed that
gives a good distribution and is extremely fast, so that you can repeat
your simulation with the same data if need be. For this there are various
LFSRs and such that work fine and are well explored. But that's not what
getrandom() is, nor arc4random().
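
To make the contrast concrete, here is a minimal sketch of the seeded,
repeatable kind of generator meant here: a toy 16-bit Galois LFSR. The
struct and function names are hypothetical, and a real simulation would
want a generator with far more state than 16 bits. The point is only that
the same seed always replays the same sequence, which getrandom() by design
never does.

```c
#include <stdint.h>

/* Toy 16-bit Galois LFSR with tap mask 0xB400
 * (x^16 + x^14 + x^13 + x^11 + 1), a textbook maximal-length
 * configuration. Predictable by construction: never use it for secrets. */
struct lfsr16 {
    uint16_t state;
};

static void lfsr16_seed(struct lfsr16 *g, uint16_t seed)
{
    /* The all-zero state is a fixed point, so substitute a nonzero seed. */
    g->state = seed ? seed : 0xACE1u;
}

static uint16_t lfsr16_next(struct lfsr16 *g)
{
    unsigned lsb = g->state & 1u;

    g->state >>= 1;
    if (lsb)
        g->state ^= 0xB400u;
    return g->state;
}
```

Seeding two generators with the same value yields identical streams, so a
simulation run can be reproduced exactly by recording the seed.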

More generally speaking, there are well-defined RNGs that are for
simulations and take seeds, and there are well-defined RNGs that are
sufficient for crypto, and then there's a massive valley of ill-defined
junk in between that people keep shooting themselves in the foot with.

The fact that you won't even call arc4random cryptographically secure
(according to Adhemerval's comment) indicates to me that something has
gone wrong here.

So, please, I urge you to put the brakes on this a little bit. Come up
with numbers. Let's lay out the interfaces and properties we want. And
then we'll see what we can draw up together.

But now I'm just repeating myself. See my earlier reply here:
https://lore.kernel.org/linux-crypto/[email protected]/

Jason

2022-07-25 00:25:28

by Jason A. Donenfeld

Subject: Re: arc4random - are you sure we want these?

Hi Paul,

Sorry I missed your reply earlier. I'm not a subscriber so I missed this
as I somehow fell out of the CC.

On Sat, Jul 23, 2022 at 05:18:05PM +0000, Paul Eggert wrote:
> On 7/23/22 09:25, Jason A. Donenfeld via Libc-alpha wrote:
> > it's hard to recommend that anybody really use these functions.
> > Just keep using getrandom(2), which has mostly favorable semantics.
>
> Yes, that's what I plan to do in GNU projects like Coreutils and Emacs.
>
> Although I don't recommend arc4random, I suppose it was added for
> source-code compatibility with the BSDs (I wasn't involved in the decision).

Source code compatibility isn't exactly a bad goal. But according to
Adhemerval you don't plan on this being a secure thing -- hence
the note to that effect in the documentation that he mentioned -- so it
seems like a maybe-okay goal gone bad. But, anyway, if the goal is just basic
source code compatibility, back it with simple calls to getrandom() to
start, and if later there are performance issues (big if!), we can look
into vDSO tricks and such to speed that up. There's no need to add a
whole new huge fraught mechanism for that.
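
A minimal sketch of what "backing it with simple calls to getrandom()"
could look like follows. The compat_* names are hypothetical stand-ins for
the arc4random family and are not glibc code; the rejection loop in
compat_random_uniform() is the usual way to avoid modulo bias.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical stand-in for an arc4random_buf-style entry point. Such an
 * interface has no way to report failure, so this sketch aborts instead. */
void compat_random_buf(void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = getrandom(p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            abort();           /* e.g. ENOSYS on a kernel without getrandom */
        }
        p += n;
        len -= (size_t)n;
    }
}

uint32_t compat_random_u32(void)
{
    uint32_t v;

    compat_random_buf(&v, sizeof v);
    return v;
}

/* Uniform value in [0, bound) using rejection sampling. */
uint32_t compat_random_uniform(uint32_t bound)
{
    if (bound < 2)
        return 0;

    /* 2^32 mod bound, computed in 32-bit arithmetic; draws below this
     * threshold are rejected so every residue is equally likely. */
    uint32_t min = (0u - bound) % bound;

    for (;;) {
        uint32_t r = compat_random_u32();
        if (r >= min)
            return r % bound;
    }
}
```

Every call here costs a syscall, which is exactly the overhead that would
have to be measured before reaching for anything more elaborate.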

> > is there any way that glibc can *not* do this, or has that
> > ship fully sailed
>
> It hasn't fully sailed since we haven't done a release.

Well that's good. I'd recommend just backing it out until it can be done
in a way that glibc developers feel comfortable calling safe (and others
too, of course, but at the very least you don't want to start out making
something you feel the need to warn about in the documentation).

> That's a bit harsh. Coreutils still has its own random number generator
> because it needed to be portable to a bunch of platforms and there was
> no standard. Eventually we'll rip it out but there's no rush. Having
> written much of that code I can reliably assert that it was not fun.

I'm happy to help with this if you need. I recently cleaned up some
similar-sounding code in systemd for their uses; random-util.c there
might be of interest.

Jason

2022-07-25 00:29:13

by Paul Eggert

Subject: Re: arc4random - are you sure we want these?

On 7/23/22 09:25, Jason A. Donenfeld via Libc-alpha wrote:
> it's hard to recommend that anybody really use these functions.
> Just keep using getrandom(2), which has mostly favorable semantics.

Yes, that's what I plan to do in GNU projects like Coreutils and Emacs.

Although I don't recommend arc4random, I suppose it was added for
source-code compatibility with the BSDs (I wasn't involved in the decision).

> is there any way that glibc can *not* do this, or has that
> ship fully sailed

It hasn't fully sailed since we haven't done a release.

> it's fun to make a random number generator, and so lots
> of projects figure out some way to make yet another one somewhere
> somehow.

That's a bit harsh. Coreutils still has its own random number generator
because it needed to be portable to a bunch of platforms and there was
no standard. Eventually we'll rip it out but there's no rush. Having
written much of that code I can reliably assert that it was not fun.

2022-07-25 10:29:11

by Florian Weimer

Subject: Re: arc4random - are you sure we want these?

* Cristian Rodríguez:

> On Sat, Jul 23, 2022 at 12:25 PM Jason A. Donenfeld via Libc-alpha
> <[email protected]> wrote:
>
>> For that reason, past discussion of having some random number generation
>> in userspace libcs has geared toward doing this in the vDSO, somehow,
>> where the kernel can be part and parcel of that effort.
>
> On Linux, just making this interface call "something" from the vDSO that
>
> - does not block.
> - cannot ever fail or if it does indeed need to bail out it kills the
> calling thread as last resort.
>
> (if neither of those can be provided, we are back to square one)
>
> will be beyond awesome because it could be usable everywhere,
> including the dynamic linker, malloc, or whatever else.
> The question is: is there at least an experimental patch available
> with a hope of being accepted?

I agree that this would be nice, but we'd likely have to donate
thread-specific data for kernel use, and that's currently totally
vaporware.

The “cannot ever fail” part is impossible to achieve due to old kernels
and seccomp filters. Low-level userspace needs to paper over it in some
way, so that applications don't have to deal with it.

Thanks,
Florian

2022-07-25 10:29:11

by Florian Weimer

Subject: Re: arc4random - are you sure we want these?

* Jason A. Donenfeld via Libc-alpha:

> I really wonder whether this is a good idea, whether this is something
> that glibc wants, and whether it's a design worth committing to in the
> long term.

Do you object to the interface, or the implementation?

The implementation can be improved easily enough at a later date.

> Firstly, for what use cases does this actually help? As of recent
> changes to the Linux kernels -- now backported all the way to 4.9! --
> getrandom() and /dev/urandom are extremely fast and operate over per-cpu
> states locklessly. Sure you avoid a syscall by doing that in userspace,
> but does it really matter? Who exactly benefits from this?

getrandom may be fast for bulk generation. It's not that great for
generating a few bits here and there. For example, shuffling a
1,000-element array takes 18 microseconds with arc4random_uniform in
glibc, and 255 microseconds with the naïve getrandom-based
implementation (with slightly biased results; measured on an Intel
i9-10900T, Fedora's kernel-5.18.11-100.fc35.x86_64).
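
For readers who want to reproduce this kind of comparison, the following is
a rough sketch of what a naive getrandom-based shuffle might look like. It
is an assumption about the shape of such a benchmark, not the code that was
actually measured, and the function names are hypothetical. The % reduction
is the source of the slight bias mentioned above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>

/* Signature chosen to match arc4random_uniform(), so either can be used. */
typedef uint32_t (*uniform_fn)(uint32_t bound);

/* Naive variant: one getrandom() syscall per draw, reduced with %, which
 * is slightly biased whenever bound does not divide 2^32. */
static uint32_t naive_getrandom_uniform(uint32_t bound)
{
    uint32_t r;

    for (;;) {
        ssize_t n = getrandom(&r, sizeof r, 0);
        if (n == (ssize_t)sizeof r)
            break;
        if (n < 0 && errno == EINTR)
            continue;          /* interrupted by a signal: retry */
        abort();
    }
    return r % bound;
}

/* Fisher-Yates shuffle: position i-1 is swapped with a uniformly chosen
 * position j in [0, i). */
static void shuffle(int *a, size_t n, uniform_fn uniform)
{
    for (size_t i = n; i > 1; i--) {
        size_t j = uniform((uint32_t)i);
        int tmp = a[i - 1];

        a[i - 1] = a[j];
        a[j] = tmp;
    }
}
```

On a libc that provides arc4random_uniform(), passing it as the callback
gives the other side of the comparison; a 1,000-element shuffle then makes
999 draws either way, so the difference is the per-draw cost.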

> You miss out on this with arc4random, and if that information _is_ to be
> exported to userspace somehow in the future, it would be awfully nice to
> design the userspace interface alongside the kernel one.

What is the kernel interface you are talking about? From an interface
standpoint, arc4random_buf and getrandom are very similar, with the main
difference being that arc4random_buf cannot report failure (except by
terminating the process).

> Seen from this perspective, going with OpenBSD's older paradigm might be
> rather limiting. Why not work together, between the kernel and libc, to
> see if we can come up with something better, before settling on an
> interface with semantics that are hard to walk back later?

Historically, kernel developers were not interested in solving some of
the hard problems (especially early seeding) that prevent the use of
getrandom during early userspace stages.

> As-is, it's hard to recommend that anybody really use these functions.
> Just keep using getrandom(2), which has mostly favorable semantics.

Some applications still need to run in configurations where getrandom is
not available (either because the kernel is too old, or because it has
been disabled via seccomp).

> Yes, I get it: it's fun to make a random number generator, and so lots
> of projects figure out some way to make yet another one somewhere
> somehow. But the tendency to do so feels like a weird computer tinkerer
> disease rather than something that has ever helped the overall ecosystem.

The performance numbers suggest that we benefit from buffering in user
space. It might not be necessary to implement expansion in userspace.
getrandom (or /dev/urandom) with a moderately-sized buffer could be
sufficient.
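
As a sketch of that buffering-only idea (no userspace expansion, just
amortizing the syscall), something like the following could work. The
random_pool type, its functions, and the 256-byte size are all hypothetical
choices for illustration, not glibc's implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/random.h>
#include <sys/types.h>

#define POOL_SIZE 256   /* arbitrary illustrative size */

struct random_pool {
    unsigned char bytes[POOL_SIZE];
    size_t used;        /* bytes already handed out; POOL_SIZE means empty */
};

/* A pool starts out empty so that the first request triggers a refill. */
#define RANDOM_POOL_INIT { .used = POOL_SIZE }

static void pool_refill(struct random_pool *p)
{
    size_t got = 0;

    while (got < POOL_SIZE) {
        ssize_t n = getrandom(p->bytes + got, POOL_SIZE - got, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            abort();
        }
        got += (size_t)n;
    }
    p->used = 0;
}

/* Copy len bytes out of the pool, refilling whenever it runs dry. */
static void pool_get(struct random_pool *p, void *out, size_t len)
{
    unsigned char *dst = out;

    while (len > 0) {
        if (p->used == POOL_SIZE)
            pool_refill(p);

        size_t take = POOL_SIZE - p->used;
        if (take > len)
            take = len;

        memcpy(dst, p->bytes + p->used, take);
        /* Best effort: do not keep bytes that were already handed out. */
        memset(p->bytes + p->used, 0, take);
        p->used += take;
        dst += take;
        len -= take;
    }
}
```

This sketch is deliberately incomplete: it is not thread-safe, a forked
child would replay the parent's buffered bytes, and nothing here notices a
VM fork or resume. Those gaps are the kind of thing a real implementation
has to deal with.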

But that's an implementation detail, and something we can revisit later.
If we get vDSO acceleration for getrandom (maybe using the userspace
thread-specific data donation we discussed for rseq), we might
eventually do away with the buffering in glibc. Again, this is an
implementation detail we can change easily enough.

Thanks,
Florian

2022-07-25 11:08:02

by Jason A. Donenfeld

Subject: Re: arc4random - are you sure we want these?

Hi Florian,

On Mon, Jul 25, 2022 at 12:11:27PM +0200, Florian Weimer wrote:
> > I really wonder whether this is a good idea, whether this is something
> > that glibc wants, and whether it's a design worth committing to in the
> > long term.
>
> Do you object to the interface, or the implementation?
>
> The implementation can be improved easily enough at a later date.

Sort of both, as I don't think it's wise to commit to the former without
a good idea of the full ideal space of the latter, and very clearly from
reading that discussion, that hasn't been explored.

In particular, Adhemerval has said you won't be committing to making
arc4random suitable for crypto, going so far as to mention it's not a
CSPRNG in the documentation. As I described in my reply to him (please
read that), the "documentation cop-out" will lead to tears inevitably.
Not only is that dangerous and bad to do alone, but it severely muddies
the waters with what other operating systems suggest about its permitted
use cases.

Here's that email for reference:
https://lore.kernel.org/linux-crypto/[email protected]/

If you're going to ship an interface that people *will* use for
sensitive things -- especially considering Paul's comment about the intent
being "source code compatibility" -- then you must not ship it knowingly
broken by design. There's no amount of documentation papering that makes
this okay. Until you know how to implement it well, don't ship the
interface. And maybe in the process of trying to implement it well,
you'll find something suboptimal about the interface that can be
fixed.

> > Firstly, for what use cases does this actually help? As of recent
> > changes to the Linux kernels -- now backported all the way to 4.9! --
> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu
> > states locklessly. Sure you avoid a syscall by doing that in userspace,
> > but does it really matter? Who exactly benefits from this?
>
> getrandom may be fast for bulk generation. It's not that great for
> generating a few bits here and there. For example, shuffling a
> 1,000-element array takes 18 microseconds with arc4random_uniform in
> glibc, and 255 microseconds with the naïve getrandom-based
> implementation (with slightly biased results; measured on an Intel
> i9-10900T, Fedora's kernel-5.18.11-100.fc35.x86_64).

So maybe we should look into vDSO'ing getrandom(), if this is a problem
for real use cases, and you find that these sorts of things are
widespread in real code?

> > You miss out on this with arc4random, and if that information _is_ to be
> > exported to userspace somehow in the future, it would be awfully nice to
> > design the userspace interface alongside the kernel one.
>
> What is the kernel interface you are talking about? From an interface
> standpoint, arc4random_buf and getrandom are very similar, with the main
> difference is that arc4random_buf cannot report failure (except by
> terminating the process).

Referring to information above about reseeding. So in this case it would
be some form of a generation counter most likely. There's also been some
discussion about exporting some aspect of the vmgenid counter to
userspace.

> > Seen from this perspective, going with OpenBSD's older paradigm might be
> > rather limiting. Why not work together, between the kernel and libc, to
> > see if we can come up with something better, before settling on an
> > interface with semantics that are hard to walk back later?
>
> Historically, kernel developers were not interested in solving some of
> the hard problems (especially early seeding) that prevent the use of
> getrandom during early userspace stages.

I really don't know what you're talking about here. I understood you up
until the opening parenthesis, and initially thought to reply, "but I am
interested! let's work together" or something, but then you mentioned
getrandom()'s issues with early userspace, and I became confused. If you
use getrandom(GRND_INSECURE), it won't block and you'll get bytes even
before the rng has seeded. If you use getrandom(0), the kernel's RNG
will use jitter to seed itself ASAP so it doesn't block forever (on
platforms where that's possible, anyhow). Both of these qualities mostly
predate my heavy involvement. So your statement confuses me. But with
that said, if you do find some lack of interest on something you think
is important, please give me a try, and maybe you'll have better luck. I
very much am interested in solving longstanding problems in this domain.

> > As-is, it's hard to recommend that anybody really use these functions.
> > Just keep using getrandom(2), which has mostly favorable semantics.
>
> Some applications still need to run in configurations where getrandom is
> not available (either because the kernel is too old, or because it has
> been disabled via seccomp).

I don't quite understand this. People without getrandom() typically
fall back to using /dev/urandom. "But what if FD in derp derp mountns
derp rlimit derp explosion derp?!" Yes, sure, which is why getrandom()
came about. But doesn't arc4random() fall back to using /dev/urandom in
this exact same way? I don't see how arc4random() really changes the
equation here, except that maybe I should amend my statement to say,
"Just keep using getrandom(2) or /dev/urandom, which has mostly
favorable semantics." (After all, I didn't see any wild-n-crazy fallback
to AT_RANDOM like what systemd does with random-util.c:
https://github.com/systemd/systemd/blob/main/src/basic/random-util.c )

Seen in that sense, as I wrote to Paul, if you're after arc4random for
source code compatibility -- or because you simply like its non-failing
interface and want to commit to that no matter the costs whatsoever --
then you could start by making that a light shim around getrandom()
(falling back to /dev/urandom, I guess), and then we can look into ways
of accelerating getrandom() for new kernels. This way you don't ship
something broken out of the gate, and there's still room for
improvement. Though I would still note that committing to the interface
early like this comes with some concern.

> The performance numbers suggest that we benefit from buffering in user
> space.

The question is whether it's safe and advisable to buffer this way in
userspace. Does userspace have the right information now of when to
discard the buffer and get a new one? I suspect it does not.

> But that's an implementation detail, and something we can revisit later.

No, these are not mere implementation details. When Adhemerval is
talking about warning people in the documentation that this shouldn't be
used for crypto, that should be a wake up call that something is really
off here. Don't ship things you know are broken, and then call that an
"implementation detail" that can be hedged with "documentation".

If a new function, extra_deluxe_memset(), occasionally wrote a 0x41
somewhere unexpected, you'd laugh if somebody called that a mere
implementation detail and suggested you just slap a warning in the
documentation and call it a day.

Jason

2022-07-25 12:44:05

by Florian Weimer

Subject: Re: arc4random - are you sure we want these?

* Jason A. Donenfeld via Libc-alpha:

> Hi Florian,
>
> On Mon, Jul 25, 2022 at 12:11:27PM +0200, Florian Weimer wrote:
>> > I really wonder whether this is a good idea, whether this is something
>> > that glibc wants, and whether it's a design worth committing to in the
>> > long term.
>>
>> Do you object to the interface, or the implementation?
>>
>> The implementation can be improved easily enough at a later date.
>
> Sort of both, as I don't think it's wise to commit to the former without
> a good idea of the full ideal space of the latter, and very clearly from
> reading that discussion, that hasn't been explored.

But we are only concerned with the application interface. Do we really
expect that to be different from arc4random_buf and its variants?

The interface between glibc and the kernel can be changed without
impacting applications.

> In particular, Adhemerval has said you won't be committing to making
> arc4random suitable for crypto, going so far as to mention it's not a
> CSPRNG in the documentation.

Below you suggest to use GRND_INSECURE to avoid deadlocks during
booting. It's documented in the UAPI header as “Return
non-cryptographic random bytes”. I assume it's broadly equivalent to
reading from /dev/urandom (which we need to support for backwards
compatibility, and currently use to avoid blocking). This means that we
cannot really document the resulting bits as cryptographically strong
from an application perspective because the kernel is not willing to
make this commitment.

>> > Firstly, for what use cases does this actually help? As of recent
>> > changes to the Linux kernels -- now backported all the way to 4.9! --
>> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu
>> > states locklessly. Sure you avoid a syscall by doing that in userspace,
>> > but does it really matter? Who exactly benefits from this?
>>
>> getrandom may be fast for bulk generation. It's not that great for
>> generating a few bits here and there. For example, shuffling a
>> 1,000-element array takes 18 microseconds with arc4random_uniform in
>> glibc, and 255 microseconds with the naïve getrandom-based
>> implementation (with slightly biased results; measured on an Intel
>> i9-10900T, Fedora's kernel-5.18.11-100.fc35.x86_64).
>
> So maybe we should look into vDSO'ing getrandom(), if this is a problem
> for real use cases, and you find that these sorts of things are
> widespread in real code?

We can investigate that, but it doesn't change the application
interface.

>> > You miss out on this with arc4random, and if that information _is_ to be
>> > exported to userspace somehow in the future, it would be awfully nice to
>> > design the userspace interface alongside the kernel one.
>>
>> What is the kernel interface you are talking about? From an interface
>> standpoint, arc4random_buf and getrandom are very similar, with the main
>> difference is that arc4random_buf cannot report failure (except by
>> terminating the process).
>
> Referring to information above about reseeding. So in this case it would
> be some form of a generation counter most likely. There's also been some
> discussion about exporting some aspect of the vmgenid counter to
> userspace.

We don't need any of that in userspace if the staging buffer is managed
by the kernel, which is why the thread-specific data donation is so
attractive as an approach. The kernel knows where all these buffers are
located and can invalidate them as needed.

>> > Seen from this perspective, going with OpenBSD's older paradigm might be
>> > rather limiting. Why not work together, between the kernel and libc, to
>> > see if we can come up with something better, before settling on an
>> > interface with semantics that are hard to walk back later?
>>
>> Historically, kernel developers were not interested in solving some of
>> the hard problems (especially early seeding) that prevent the use of
>> getrandom during early userspace stages.
>
> I really don't know what you're talking about here. I understood you up
> until the opening parenthesis, and initially thought to reply, "but I am
> interested! let's work together" or something, but then you mentioned
> getrandom()'s issues with early userspace, and I became confused. If you
> use getrandom(GRND_INSECURE), it won't block and you'll get bytes even
> before the rng has seeded. If you use getrandom(0), the kernel's RNG
> will use jitter to seed itself ASAP so it doesn't block forever (on
> platforms where that's possible, anyhow). Both of these qualities mostly
> predate my heavy involvement. So your statement confuses me. But with
> that said, if you do find some lack of interest on something you think
> is important, please give me a try, and maybe you'll have better luck. I
> very much am interested in solving longstanding problems in this domain.

I tried to de-escalate here, and clearly that didn't work. The context
here is that historically, working with the “random” kernel maintainers
has been very difficult for many groups of people. Many of us are tired
of those non-productive discussions. I forgot that this has recently
changed on the kernel side. I understand that it's taking years to
overcome these perceptions. glibc is still struggling with this, too.

Regarding the technical aspect, GRND_INSECURE is somewhat new-ish, but
as I wrote above, its UAPI documentation is a bit scary. Maybe it
would be possible to clarify this in the manual pages a bit? I *assume*
that if we are willing to read from /dev/urandom, we can use
GRND_INSECURE right away to avoid that fallback path on sufficiently new
kernels. But it would be nice to have confirmation.

>> > As-is, it's hard to recommend that anybody really use these functions.
>> > Just keep using getrandom(2), which has mostly favorable semantics.
>>
>> Some applications still need to run in configurations where getrandom is
>> not available (either because the kernel is too old, or because it has
>> been disabled via seccomp).
>
> I don't quite understand this. People without getrandom() typically
> fallback to using /dev/urandom. "But what if FD in derp derp mountns
> derp rlimit derp explosion derp?!" Yes, sure, which is why getrandom()
> came about. But doesn't arc4random() fallback to using /dev/urandom in
> this exact same way? I don't see how arc4random() really changes the
> equation here, except that maybe I should amend my statement to say,
> "Just keep using getrandom(2) or /dev/urandom, which has mostly
> favorable semantics." (After all, I didn't see any wild-n-crazy fallback
> to AT_RANDOM like what systemd does with random-util.c:
> https://github.com/systemd/systemd/blob/main/src/basic/random-util.c )

I had some patches with AT_RANDOM fallback, including overwriting
AT_RANDOM with output from the seeded PRNG. It's certainly messy. I
probably didn't bother to post these patches given how bizarre the whole
thing was. I did have fallback to CPU instructions, but that turned out
to be unworkable due to bugs in suspend on AMD CPUs (kernel or firmware,
unclear).

> Seen in that sense, as I wrote to Paul, if you're after arc4random for
> source code compatibility -- or because you simply like its non-failing
> interface and want to commit to that no matter the costs whatsoever --
> then you could start by making that a light shim around getrandom()
> (falling back to /dev/urandom, I guess), and then we can look into ways
> of accelerating getrandom() for new kernels. This way you don't ship
> something broken out of the gate, and there's still room for
> improvement. Though I would still note that committing to the interface
> early like this comes with some concern.

The ChaCha20 generator we currently have in the tree may not be
required, true. But this doesn't make what we have today “broken”, it's
merely overly complicated. And replacing that with a straight buffer
from getrandom does not change the external interface, so we can do this
any time we want.

>> The performance numbers suggest that we benefit from buffering in user
>> space.
>
> The question is whether it's safe and advisable to buffer this way in
> userspace. Does userspace have the right information now of when to
> discard the buffer and get a new one? I suspect it does not.

Not completely, no, but we can cover many cases. I do not currently see
a way around that if we want to promote arc4random_uniform(limit) as a
replacement for random() % limit.

>> But that's an implementation detail, and something we can revisit later.
>
> No, these are not mere implementation details. When Adhemerval is
> talking about warning people in the documentation that this shouldn't be
> used for crypto, that should be a wake up call that something is really
> off here. Don't ship things you know are broken, and then call that an
> "implementation detail" that can be hedged with "documentation".

Again, given the issues around GRND_INSECURE (the reason why it exists),
we do not have much choice on the glibc side. And these issues will be
there for the foreseeable future, whether glibc provides arc4random or
not.

Thanks,
Florian

2022-07-25 13:27:15

by Jeffrey Walton

Subject: Re: arc4random - are you sure we want these?

On Mon, Jul 25, 2022 at 7:08 AM Jason A. Donenfeld <[email protected]> wrote:
> ...
> > The performance numbers suggest that we benefit from buffering in user
> > space.
>
> The question is whether it's safe and advisable to buffer this way in
> userspace. Does userspace have the right information now of when to
> discard the buffer and get a new one? I suspect it does not.

I _think_ the sharp edge on userspace buffering is generator state.
Most generator threat models I have seen assume the attacker does not
know the generator's state. If buffering occurs in the application,
then it may be easier for an attacker to learn of the generator's
state. If buffering occurs in the kernel, then generator state should
be private from a userspace application's view.

Jeff

2022-07-25 13:47:55

by Jason A. Donenfeld

Subject: Re: arc4random - are you sure we want these?

Hi Florian,

On Mon, Jul 25, 2022 at 02:39:24PM +0200, Florian Weimer wrote:
> Below you suggest to use GRND_INSECURE to avoid deadlocks during
> booting. It's documented in the UAPI header as “Return
> non-cryptographic random bytes”. I assume it's broadly equivalent to
> reading from /dev/urandom (which we need to support for backwards
> compatibility, and currently use to avoid blocking). This means that we
> cannot really document the resulting bits as cryptographically strong
> from an application perspective because the kernel is not willing to
> make this commitment.
> Regarding the technical aspect, GRND_INSECURE is somewhat new-ish, but
> as I wrote above, its UAPI documentation is a bit scary. Maybe it
> would be possible to clarify this in the manual pages a bit? I *assume*
> that if we are willing to read from /dev/urandom, we can use
> GRND_INSECURE right away to avoid that fallback path on sufficiently new
> kernels. But it would be nice to have confirmation.

getrandom(GRND_INSECURE) is the same as getrandom(0), except before the
RNG is seeded, in which case the former will return ~garbage randomness
while the latter will block. The only current difference between
getrandom(GRND_INSECURE) and /dev/urandom is the latter will try for a
second to do the jitter entropy thing if the RNG isn't seeded yet.

I agree that the documentation around this is really bad. Actually, so
much of the documentation is out of date or confusing. Thanks for the
kick on this: I really do need to rewrite that / clean it up.

So with my random.c maintainer hat on: getrandom(GRND_INSECURE) will
return the same "quality" randomness as getrandom(0), except before
the RNG is initialized. I'll fix up the docs for that, but feel free to
refer to this statement ahead of that if you need.

Code-wise, the only relevant branch related to GRND_INSECURE is:

    if (!crng_ready() && !(flags & GRND_INSECURE)) {
        if (flags & GRND_NONBLOCK)
            return -EAGAIN;
        ret = wait_for_random_bytes();
        if (unlikely(ret))
            return ret;
    }

That means: if it's not ready, and you didn't pass _INSECURE, and you
didn't pass _NONBLOCK, then wait for the RNG to be ready, and error out
if that's interrupted by a signal. Other than that one block, it
continues on to do the same thing as getrandom(0).
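
Seen from the calling side, the three behaviors look roughly like this.
This is only a sketch: fill_with_mode() and the seed_mode enum are
hypothetical names, and the fallback #define assumes the UAPI value
0x0004 for libc headers that predate the flag.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

#ifndef GRND_INSECURE
#define GRND_INSECURE 0x0004	/* UAPI value; older libc headers lack it */
#endif

enum seed_mode {
	SEED_BLOCK,	/* flags == 0: wait until the RNG is ready */
	SEED_NONBLOCK,	/* GRND_NONBLOCK: fail with EAGAIN instead of waiting */
	SEED_INSECURE,	/* GRND_INSECURE: never wait, even before seeding */
};

/* Returns the number of bytes written, or -1 with errno set. Before the
 * RNG is ready, SEED_NONBLOCK fails with EAGAIN and SEED_BLOCK sleeps
 * (and may fail with EINTR). Kernels that do not know GRND_INSECURE
 * reject it with EINVAL. */
ssize_t fill_with_mode(void *buf, size_t len, enum seed_mode mode)
{
	unsigned int flags = 0;

	if (mode == SEED_NONBLOCK)
		flags = GRND_NONBLOCK;
	else if (mode == SEED_INSECURE)
		flags = GRND_INSECURE;
	return getrandom(buf, len, flags);
}
```

Once the RNG is initialized, all three modes take the same path and
return the same quality of output.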

With that said, however, I think it'd be nice if you used only blocking
randomness, and shoved the initialization problem at init systems and
bootloaders and such. In 5.20, for example, there'll be an x86 boot
protocol for GRUB and kexec and hypervisors and such to pass a seed, and
a device tree attribute for the same purpose has existed for a long
time. Proliferating "unsafe" /dev/urandom-style usage doesn't seem good
for the ecosystem at large. And I'm generally interested in seeing
progress on decades-long initialization-time seeding concerns.

> > Sort of both, as I don't think it's wise to commit to the former without
> > a good idea of the full ideal space of the latter, and very clearly from
> > reading that discussion, that hasn't been explored.
>
> But we are only concerned with the application interface. Do we really
> expect that to be different from arc4random_buf and its variants?
>
> The interface between glibc and the kernel can be changed without
> impacting applications.

I feel like you missed the whole thrust of my argument, in which I
caution against shipping something that's known-broken, particularly
when it pertains to something sensitive like generating secret keys.

Regarding the application interface: it's still unclear what's best
until we start trying to see what the implementation would look like.
Just to pick something floating around in my head now since reading your
last email: there seems to be some question about whether arc4random
should block or not. If it's used for crypto, it probably should. But
maybe you want an interface that doesn't. Perhaps that discussion leads
naturally to exposing a flag. Or not! And then there are related
questions about what the return value should be, if any. The point is
that the devil is often in the details with these things, and I worry
about putting the cart before the horse here.

> >> > You miss out on this with arc4random, and if that information _is_ to be
> >> > exported to userspace somehow in the future, it would be awfully nice to
> >> > design the userspace interface alongside the kernel one.
> >>
> >> What is the kernel interface you are talking about? From an interface
> >> standpoint, arc4random_buf and getrandom are very similar, with the main
> >> difference is that arc4random_buf cannot report failure (except by
> >> terminating the process).
> >
> > Referring to information above about reseeding. So in this case it would
> > be some form of a generation counter most likely. There's also been some
> > discussion about exporting some aspect of the vmgenid counter to
> > userspace.
>
> We don't need any of that in userspace if the staging buffer is managed
> by the kernel, which is why the thread-specific data donation is so
> attractive as an approach. The kernel knows where all these buffers are
> located and can invalidate them as needed.

There still might be a need for userspace to have that information, for
network protocol implementations that need to drop their ephemeral keys
on a virtual machine fork, for example. But that's kind of a different
discussion. For the purposes of a vDSO'd getrandom(), I agree that the
kernel managing a buffer that's just an opaque blob to userspace is
probably the best option.

> I tried to de-escalate here, and clearly that didn't work. The context
> here is that historically, working with the “random” kernel maintainers
> has been very difficult for many groups of people. Many of us are tired
> of those non-productive discussions. I forgot that this has recently
> changed on the kernel side. I understand that it's taking years to
> overcome these perceptions. glibc is still struggling with this, too.

Oh, I see what you're getting at. Yea, sure, things are potentially
different now. I'm eager to work on this, so if you're finding things
that are lacking, I'm all ears for fixing them.

> I had some patches with AT_RANDOM fallback, including overwriting
> AT_RANDOM with output from the seeded PRNG. It's certainly messy. I
> probably didn't bother to post these patches given how bizarre the whole
> thing was. I did have fallback to CPU instructions, but that turned out
> to be unworkable due to bugs in suspend on AMD CPUs (kernel or firmware,
> unclear).

Yea, it's kind of tricky as other things might be using AT_RANDOM also
and then you have a whole race issue and domain separation and whatnot.
The thing in systemd isn't really good for crypto -- no forward secrecy
and such -- but it's ostensibly better than random().
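
For readers unfamiliar with AT_RANDOM: it is 16 bytes that the kernel
places in the auxiliary vector at exec time. A minimal sketch of reading
it with getauxval(), where at_random_copy() is a hypothetical helper and
not what systemd or glibc actually do:

```c
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>

/* Copy the 16 bytes the kernel placed behind AT_RANDOM at exec time.
 * Returns 0 on success, -1 if the auxiliary vector entry is absent. */
int at_random_copy(unsigned char out[16])
{
	const unsigned char *p = (const unsigned char *)getauxval(AT_RANDOM);

	if (p == NULL)
		return -1;
	memcpy(out, p, 16);
	return 0;
}
```

Every caller in the process sees the same 16 bytes for the lifetime of
the process, which is why this is at best a last-resort seed and says
nothing about forward secrecy.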

> The ChaCha20 generator we currently have in the tree may not be
> required, true. But this doesn't make what we have today “broken”, it's
> merely overly complicated. And replacing that with a straight buffer
> from getrandom does not change the external interface, so we can do this
> any time we want.

Whether you use chacha20 in a fast key erasure construction, or you
buffer lots of bytes of getrandom() that you overwrite with zeros as you
use doesn't really matter in the sense that these are both just forms of
buffering. With the chacha20 one, you're reseeding after 16 megs, but of
course the state is smaller, but that doesn't matter. For purposes here,
we may as well treat that as buffering 16 megs of getrandom() output. My
concern with this buffering is that userspace doesn't know when to
invalidate the buffer. So a userspace that's using arc4random() for
crypto will potentially be missing something *important* that a
userspace who used getrandom() instead would have.
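
To make the concern concrete, here is a deliberately simplified sketch
of a userspace buffer that wipes bytes as it hands them out. Everything
in it is an assumption for illustration (struct pool, pool_take(), the
256-byte size, the pid-based fork check); it is not glibc's
implementation.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

#define POOL_SIZE 256	/* arbitrary size chosen for the sketch */

struct pool {
	unsigned char buf[POOL_SIZE];
	size_t used;	/* bytes already handed out (and wiped) */
	pid_t owner;	/* pid that filled the buffer */
	int valid;	/* 0 until the first refill */
};

static void pool_refill(struct pool *p)
{
	if (getrandom(p->buf, sizeof(p->buf), 0) != (ssize_t)sizeof(p->buf))
		abort();
	p->used = 0;
	p->owner = getpid();
	p->valid = 1;
}

/* Hand out len <= POOL_SIZE bytes, wiping them from the pool. */
void pool_take(struct pool *p, void *out, size_t len)
{
	volatile unsigned char *wipe;

	if (len > POOL_SIZE)
		abort();
	/* The only invalidation triggers visible here: never filled,
	 * not enough bytes left, or a different pid (a crude fork check).
	 * VM forks, resume from hibernation and kernel reseeds are
	 * invisible to this code. */
	if (!p->valid || p->owner != getpid() || POOL_SIZE - p->used < len)
		pool_refill(p);
	memcpy(out, p->buf + p->used, len);
	wipe = p->buf + p->used;
	for (size_t i = 0; i < len; i++)
		wipe[i] = 0;
	p->used += len;
}
```

The comment inside pool_take() is the whole point: the refill condition
can only test what the process itself can observe.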

When I brought this up with Adhemerval, his reply was that it doesn't
matter anyway because arc4random() is going to be documented as not for
cryptography. So it sounded like the author of it finds it worse too. So
yikes.

The whole point is that you shouldn't ship something sensitive that is
worse than what it will potentially replace, right out of the gate. Slow
down and get the thing right, and then ship it.

> Not completely, no, but we can cover many cases. I do not currently see
> a way around that if we want to promote arc4random_uniform(limit) as a
> replacement for random() % limit.

I agree that the rejection sampling is the most useful function being
added. Let's say, just for the sake of argument, that you instead added
`getrandom_u{64,32,16,8}_uniform(u_type limit, unsigned long flags)`
that expanded to doing `getrandom(&integer, flags)` and then rejection
sampling on that in a loop like usual. It wouldn't be super great, so
the first optimization would be to observe that the cost of 32 bytes and
the cost of 4 bytes is the same, so you just grab 32 bytes at a time,
which basically guarantees you'll get a good number when rejection
sampling. Alright, fine, but then maybe you want to use it for
shuffling, and then we have your syscall overhead measurements. But
that's where the vDSO approach comes into play for making it fast. Old
systems would have something work that's still safe. New systems would
have something work that's safe and fast. Nobody gets something less
safe. (As a sidenote, notice how my hypothetical API gives larger types
than arc4random_uniform's fixed u32, just sayin'.)
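
A sketch of the 32-bit variant of that hypothetical API, assuming the
usual "reject below 2^32 mod limit" approach and one 32-byte getrandom()
call per batch of eight candidates. The name sketch_u32_uniform() is
invented here, and the sketch simply aborts if getrandom() fails:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical name, following the getrandom_u32_uniform() idea above. */
uint32_t sketch_u32_uniform(uint32_t limit, unsigned int flags)
{
	uint32_t min;

	if (limit < 2)
		return 0;
	/* 2^32 mod limit: accepting candidates below this would make the
	 * low residues slightly more likely than the rest. */
	min = (0u - limit) % limit;
	for (;;) {
		uint32_t batch[8];	/* 32 bytes per syscall */

		if (getrandom(batch, sizeof(batch), flags) != (ssize_t)sizeof(batch))
			abort();
		for (size_t i = 0; i < 8; i++)
			if (batch[i] >= min)
				return batch[i] % limit;
	}
}
```

Each candidate is rejected with probability below one half, so a batch
of eight failing entirely is rare; the outer loop covers that case.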

Now, spitballing new APIs is kind of beside the point here, as there
are 100 different ways to bikeshed that, but what I'm trying to suggest
is that there's a way of adding what you want to libc without reducing
the quality of it for users, right from the beginning. So why not start
out conservatively?

Or, if you insist on providing these functions t o d a y, and won't heed
my warnings about designing the APIs alongside the implementations, then
just make them thin wrappers over getrandom(0) *without* doing fancy
buffering, and then optimizations later can improve it. That would be
the incremental approach, which wouldn't harm potential users. It also
wouldn't shut the door on doing the buffering: if the kernel
optimization improvements go nowhere, and you decide it's a lost cause,
you can always change the way it works later, and make that decision
then.
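
The thin-wrapper option could be sketched roughly as follows.
thin_random_buf() and thin_random_u32() are hypothetical names standing
in for arc4random_buf() and arc4random(); the sketch assumes getrandom()
is available and aborts on any failure other than EINTR, since the
interface has no error return.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical stand-in for arc4random_buf(): no userspace state. */
void thin_random_buf(void *buf, size_t len)
{
	unsigned char *p = buf;

	while (len > 0) {
		ssize_t n = getrandom(p, len, 0);

		if (n < 0 && errno == EINTR)
			continue;	/* interrupted by a signal; retry */
		if (n < 0)
			abort();	/* no way to report failure */
		p += n;
		len -= (size_t)n;
	}
}

/* Hypothetical stand-in for arc4random(). */
uint32_t thin_random_u32(void)
{
	uint32_t r;

	thin_random_buf(&r, sizeof(r));
	return r;
}
```

Because every call goes to the kernel, any later acceleration (vDSO or
otherwise) can be swapped in underneath without changing callers.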

Jason

2022-07-25 13:48:51

by Jason A. Donenfeld

Subject: Re: arc4random - are you sure we want these?

Hi Jeffrey,

Please keep [email protected] CC'd.

On Mon, Jul 25, 2022 at 09:25:58AM -0400, Jeffrey Walton wrote:
> On Mon, Jul 25, 2022 at 7:08 AM Jason A. Donenfeld <[email protected]> wrote:
> > ...
> > > The performance numbers suggest that we benefit from buffering in user
> > > space.
> >
> > The question is whether it's safe and advisable to buffer this way in
> > userspace. Does userspace have the right information now of when to
> > discard the buffer and get a new one? I suspect it does not.
>
> I _think_ the sharp edge on userspace buffering is generator state.
> Most generator threat models I have seen assume the attacker does not
> know the generator's state. If buffering occurs in the application,
> then it may be easier for an attacker to learn of the generator's
> state. If buffering occurs in the kernel, then generator state should
> be private from a userspace application's view.

I guess that's one concern, if you're worried about heartbleed-like
attacks, in which an undetected RNG state compromise might be easier to
pull off.

What I have in mind, though, are the various triggers and heuristics
that the kernel uses for when it needs to reseed. Userspace doesn't know
about these.

Jason

2022-07-25 14:01:09

by Cristian Rodríguez

Subject: Re: arc4random - are you sure we want these?

On Mon, Jul 25, 2022 at 9:44 AM Jason A. Donenfeld via Libc-alpha
<[email protected]> wrote:

> Or, if you insist on providing these functions t o d a y, and won't heed
> my warnings about designing the APIs alongside the implementations, then
> just make them thin wrappers over getrandom(0) *without* doing fancy
> buffering, and then optimizations later can improve it. That would be
> the incremental approach, which wouldn't harm potential users. It also
> wouldn't shut the door on doing the buffering: if the kernel
> optimization improvements go nowhere, and you decide it's a lost cause,
> you can always change the way it works later, and make that decision
> then.

My 2 CLP here, if that matters: I agree with this sentiment/approach.
Provide these functions for source compat, all of which just call
getrandom and abort on failure *for now*, and then a future iteration
can do something about the syscall overhead with kernel help.

2022-07-25 15:13:12

by Rich Felker

Subject: Re: arc4random - are you sure we want these?

On Mon, Jul 25, 2022 at 12:11:27PM +0200, Florian Weimer via Libc-alpha wrote:
> * Jason A. Donenfeld via Libc-alpha:
>
> > I really wonder whether this is a good idea, whether this is something
> > that glibc wants, and whether it's a design worth committing to in the
> > long term.
>
> Do you object to the interface, or the implementation?

That was *exactly* my first question too.

> The implementation can be improved easily enough at a later date.
>
> > Firstly, for what use cases does this actually help? As of recent
> > changes to the Linux kernels -- now backported all the way to 4.9! --
> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu
> > states locklessly. Sure you avoid a syscall by doing that in userspace,
> > but does it really matter? Who exactly benefits from this?
>
> getrandom may be fast for bulk generation. It's not that great for
> generating a few bits here and there. For example, shuffling a
> 1,000-element array takes 18 microseconds with arc4random_uniform in
> glibc, and 255 microseconds with the naïve getrandom-based
> implementation (with slightly biased results; measured on an Intel
> i9-10900T, Fedora's kernel-5.18.11-100.fc35.x86_64).
>
> > You miss out on this with arc4random, and if that information _is_ to be
> > exported to userspace somehow in the future, it would be awfully nice to
> > design the userspace interface alongside the kernel one.
>
> What is the kernel interface you are talking about? From an interface
> standpoint, arc4random_buf and getrandom are very similar, with the main
> difference is that arc4random_buf cannot report failure (except by
> terminating the process).
>
> > Seen from this perspective, going with OpenBSD's older paradigm might be
> > rather limiting. Why not work together, between the kernel and libc, to
> > see if we can come up with something better, before settling on an
> > interface with semantics that are hard to walk back later?
>
> Historically, kernel developers were not interested in solving some of
> the hard problems (especially early seeding) that prevent the use of
> getrandom during early userspace stages.
>
> > As-is, it's hard to recommend that anybody really use these functions.
> > Just keep using getrandom(2), which has mostly favorable semantics.
>
> Some applications still need to run in configurations where getrandom is
> not available (either because the kernel is too old, or because it has
> been disabled via seccomp).
>
> > Yes, I get it: it's fun to make a random number generator, and so lots
> > of projects figure out some way to make yet another one somewhere
> > somehow. But the tendency to do so feels like a weird computer tinkerer
> > disease rather than something that has ever helped the overall ecosystem.
>
> The performance numbers suggest that we benefit from buffering in user
> space. It might not be necessary to implement expansion in userspace.
> getrandom (or /dev/urandom) with a moderately-sized buffer could be
> sufficient.

FWIW I'd rather have a few kB of shareable entropy-expansion .text in
userspace than a few kB per process (or even per thread? >_<) of
nonshareable data any day.

> But that's an implementation detail, and something we can revisit later.
> If we get vDSO acceleration for getrandom (maybe using the userspace
> thread-specific data donation we discussed for rseq), we might
> eventually do away with the buffering in glibc. Again this is an
> implementation detail we can change easily enough.

Exactly.

FWIW I've been kinda waiting to see what glibc would do on this after
the posix_random proposal failed, before considering much what we
should do in musl, but the value I see in either is not as an
optimization but as honoring a well-known interface so we have fewer
applications doing their own stupid YOLO stuff trying to get secure
entropy and botching it. So far the best we have is getentropy but it
fails on old kernels. At some point musl will probably implement both
arc4random and getentropy with a secure fallback process for old
kernels -- certainly the fallback is needed for meeting the arc4random
contract and I'd like it in both places.

Rich

2022-07-25 16:09:18

by Rich Felker

Subject: Re: arc4random - are you sure we want these?

On Sat, Jul 23, 2022 at 02:39:29PM -0300, Adhemerval Zanella Netto via Libc-alpha wrote:
> On 23/07/22 13:25, Jason A. Donenfeld wrote:
> > Firstly, for what use cases does this actually help? As of recent
> > changes to the Linux kernels -- now backported all the way to 4.9! --
> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu
> > states locklessly. Sure you avoid a syscall by doing that in userspace,
> > but does it really matter? Who exactly benefits from this?
>
> Mainly performance, since glibc already exports both getrandom and
> getentropy. There was some discussion on the mailing list, and we also
> decided to explicitly state in our documentation that this is not a CSPRNG.

This is an extreme documentation/specification bug that *hurts*
portability and security. The core contract of the historical
arc4random function is that it *is* a CSPRNG. Having a function by
that name that's allowed not to be one means now all software using it
has to add detection for the broken glibc variant.

If the glibc implementation has flaws that actually make it not a
CSPRNG, this absolutely needs to be fixed. Not doing so is
irresponsible and will set everyone back a long ways.

If this is just a case of trying to be "cautious" about overpromising
things, the documentation needs fixed to specify that this is a
CSPRNG. I'm particularly worried about the wording "these still use a
Pseudo-Random generator and should not be used in cryptographic
contexts". *All* CSPRNGs are PRNGs. Being pseudo-random does not make
it not cryptographically safe. The safety depends on the original
source of the entropy and the practical irreversibility and other
cryptographic properties of the extension function. The fact that this
has been stated so poorly in the documentation really has me worried
that someone does not understand the issues. I haven't dug into the
list mails or actual code to determine to what extent that's the case,
but it's really, *really* worrying.

Rich

2022-07-25 16:09:47

by Adhemerval Zanella

Subject: Re: arc4random - are you sure we want these?



On 25/07/22 12:33, Rich Felker wrote:
>
> If this is just a case of trying to be "cautious" about overpromising
> things, the documentation needs fixed to specify that this is a
> CSPRNG. I'm particularly worried about the wording "these still use a
> Pseudo-Random generator and should not be used in cryptographic
> contexts". *All* CSPRNGs are PRNGs. Being pseudo-random does not make
> it not cryptographically safe. The safety depends on the original
> source of the entropy and the practical irreversibility and other
> cryptographic properties of the extension function. The fact that this
> has been stated so poorly in the documentation really has me worried
> that someone does not understand the issues. I haven't dug into the
> list mails or actual code to determine to what extent that's the case,
> but it's really, *really* worrying.

That's the main reason to avoid calling it a CSPRNG: neither Florian nor
I have enough security expertise to certify that the current scheme
actually meets all the requirements. It does follow OpenBSD's strategy
of a fast-key-erasure random-number generator, although all key-reseeding
strategies are basically heuristics.

If I understand Jason's argument correctly, unless we have a kernel API
that actually handles the buffer (so it can reseed or clear it when it
sees fit), there is no point in providing a CSPRNG in userspace; just use
getrandom instead.


2022-07-25 16:23:19

by Sandy Harris

Subject: Re: arc4random - are you sure we want these?

Rich Felker <[email protected]> wrote:

> This is an extreme documentation/specification bug that *hurts*
> portability and security. The core contract of the historical
> arc4random function is that it *is* a CSPRNG. Having a function by
> that name that's allowed not to be one means now all software using it
> has to add detection for the broken glibc variant.
>
> If the glibc implementation has flaws that actually make it not a
> CSPRNG, this absolutely needs to be fixed. Not doing so is
> irresponsible and will set everyone back a long ways.

Exactly!

2022-07-25 16:29:54

by Rich Felker

Subject: Re: arc4random - are you sure we want these?

On Mon, Jul 25, 2022 at 03:43:57PM +0200, Jason A. Donenfeld via Libc-alpha wrote:
> Hi Florian,
>
> On Mon, Jul 25, 2022 at 02:39:24PM +0200, Florian Weimer wrote:
> > Below you suggest to use GRND_INSECURE to avoid deadlocks during
> > booting. It's documented in the UAPI header as “Return
> > non-cryptographic random bytes”. I assume it's broadly equivalent to
> > reading from /dev/urandom (which we need to support for backwards
> > compatibility, and currently use to avoid blocking). This means that we
> > cannot really document the resulting bits as cryptographically strong
> > from an application perspective because the kernel is not willing to
> > make this commitment.
> > Regarding the technical aspect, GRND_INSECURE is somewhat new-ish, but
> > as I wrote above, its UAPI documentation is a bit scary. Maybe it
> > would be possible to clarify this in the manual pages a bit? I *assume*
> > that if we are willing to read from /dev/urandom, we can use
> > GRND_INSECURE right away to avoid that fallback path on sufficiently new
> > kernels. But it would be nice to have confirmation.
>
> getrandom(GRND_INSECURE) is the same as getrandom(0), except before the
> RNG is seeded, in which case the former will return ~garbage randomness
> while the latter will block. The only current difference between
> getrandom(GRND_INSECURE) and /dev/urandom is the latter will try for a
> second to do the jitter entropy thing if the RNG isn't seeded yet.
>
> I agree that the documentation around this is really bad. Actually, so
> much of the documentation is out of date or confusing. Thanks for the
> kick on this: I really do need to rewrite that / clean it up.
>
> So with my random.c maintainer hat on: getrandom(GRND_INSECURE) will
> return the same "quality" randomness as getrandom(0), except before
> the RNG is initialized. I'll fix up the docs for that, but feel free to
> refer to this statement ahead of that if you need.
>
> Code-wise, the only relevant branch related to GRND_INSECURE is:
>
>     if (!crng_ready() && !(flags & GRND_INSECURE)) {
>             if (flags & GRND_NONBLOCK)
>                     return -EAGAIN;
>             ret = wait_for_random_bytes();
>             if (unlikely(ret))
>                     return ret;
>     }
>
> That means: if it's not ready, and you didn't pass _INSECURE, and you
> didn't pass _NONBLOCK, then wait for the RNG to be ready, and error out
> if that's interrupted by a signal. Other than that one block, it
> continues on to do the same thing as getrandom(0).
>
> With that said, however, I think it'd be nice if you used only blocking
> randomness, and shove the initialization problem at init systems and
> bootloaders and such. In 5.20, for example, there'll be an x86 boot
> protocol for GRUB and kexec and hypervisors and such to pass a seed, and
> since a long time, there exists a device tree attribute for the same.
> Proliferating "unsafe" /dev/urandom-style usage doesn't seem good for
> the ecosystem at large. And I'm generally interested in seeing progress
> on the decades-long initialization-time seeding concerns.

arc4random's contract is supposed to be that it always succeeds and
always produces cryptographic output. It cannot use GRND_INSECURE or
other insecure fallback methods to avoid blocking. It has to block.
This function (inherently, in its contract) is not usable for early
boot stuff where one is pretending to want actual cryptographic
entropy but is just as happy getting some "high quality" non-CS stuff,
and thereby would be just as happy with rand() or likely even with
"42". Programs that will run in that context on Linux need to be
explicitly aware of the messy "early boot" situation and figure out
how they're going to handle it securely or if they even wanted CS
randomness to begin with. Fortunately virtually nothing has to do
that. On most (non-embedded) systems, init can just bring up a rw
filesystem with saved entropy on it early and load that, then provide
a fully-working environment to programs it invokes.
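
To make the "it has to block" requirement concrete, here is a minimal
sketch of a fill loop built on getrandom() with flags set to 0. The helper
name fill_blocking is hypothetical and this is not glibc or musl code; it
only illustrates the blocking behaviour described above and reports ENOSYS
and similar errors to the caller instead of choosing a fallback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper (not from glibc or musl): fill BUF with LEN bytes
   using getrandom() with flags == 0, which blocks until the kernel RNG
   has been initialized.  Returns 0 on success, -1 with errno set on
   failure (for example ENOSYS on a kernel without the syscall).  */
static int
fill_blocking (void *buf, size_t len)
{
  unsigned char *p = buf;

  assert (buf != NULL || len == 0);

  while (len > 0)
    {
      ssize_t r = getrandom (p, len, 0);
      if (r < 0)
        {
          if (errno == EINTR)
            continue;           /* Interrupted by a signal; retry.  */
          return -1;            /* Let the caller decide what to do.  */
        }
      if (r == 0)
        {
          errno = EIO;          /* Should not happen for len > 0.  */
          return -1;
        }
      p += r;                   /* Short read; keep going.  */
      len -= (size_t) r;
    }
  return 0;
}
```

A caller that wants the arc4random contract would treat a -1 return as
fatal (or try another blocking source) rather than silently degrading to
GRND_INSECURE-style output.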

> > I had some patches with AT_RANDOM fallback, including overwriting
> > AT_RANDOM with output from the seeded PRNG. It's certainly messy. I
> > probably didn't bother to post these patches given how bizarre the whole
> > thing was. I did have fallback to CPU instructions, but that turned out
> > to be unworkable due to bugs in suspend on AMD CPUs (kernel or firmware,
> > unclear).
>
> Yea, it's kind of tricky as other things might be using AT_RANDOM also
> and then you have a whole race issue and domain separation and whatnot.
> The thing in systemd isn't really good for crypto -- no forward secrecy
> and such -- but it's ostensibly better than random().

AT_RANDOM is unusable as a fallback here because it's equivalent to
GRND_INSECURE. It's silently broken at early boot time. In musl we're
likely going to end up using the legacy SYS_sysctl on pre-getrandom
kernels even though it spammed syslog just because it seems to be the
only way to get blocking secure entropy on those kernels.
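
For readers unfamiliar with AT_RANDOM: it is an auxiliary-vector entry
pointing at 16 bytes the kernel placed in the process image at exec time.
The sketch below only shows how those bytes are reached via getauxval();
the helper name copy_at_random and the AT_RANDOM_LEN macro are made up for
illustration. It says nothing about the quality of the bytes, which is
exactly the problem at early boot.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Illustration-only name; the kernel provides 16 bytes at AT_RANDOM.  */
#define AT_RANDOM_LEN 16

/* Hypothetical helper: copy the AT_RANDOM bytes into OUT.
   Returns 0 on success, -1 if the auxv entry is missing.  */
static int
copy_at_random (unsigned char out[AT_RANDOM_LEN])
{
  assert (out != NULL);

  /* getauxval returns 0 when the entry is not present.  */
  const void *p = (const void *) (uintptr_t) getauxval (AT_RANDOM);
  if (p == NULL)
    return -1;

  memcpy (out, p, AT_RANDOM_LEN);
  return 0;
}
```

Note that the value is fixed for the lifetime of the process and there is
no way to ask whether the kernel RNG was seeded when it was generated.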

Rich

2022-07-25 16:57:13

by Florian Weimer

Subject: Re: arc4random - are you sure we want these?

* Rich Felker:

> AT_RANDOM is unusable as a fallback here because it's equivalent to
> GRND_INSECURE. It's silently broken at early boot time. In musl we're
> likely going to end up using the legacy SYS_sysctl on pre-getrandom
> kernels even though it spammed syslog just because it seems to be the
> only way to get blocking secure entropy on those kernels.

Even pre-getrandom, sysctl was rarely enabled in kernel configurations
if I recall correctly. I doubt it is an option to avoid process
termination with old kernels/seccomp filters.

Thanks,
Florian

2022-07-25 16:58:09

by Florian Weimer

Subject: Re: arc4random - are you sure we want these?

* Rich Felker:

> On Sat, Jul 23, 2022 at 02:39:29PM -0300, Adhemerval Zanella Netto via Libc-alpha wrote:
>> On 23/07/22 13:25, Jason A. Donenfeld wrote:
>> > Firstly, for what use cases does this actually help? As of recent
>> > changes to the Linux kernels -- now backported all the way to 4.9! --
>> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu
>> > states locklessly. Sure you avoid a syscall by doing that in userspace,
>> > but does it really matter? Who exactly benefits from this?
>>
>> Mainly performance, since glibc already exports both getrandom and
>> getentropy. There was some discussion on the mailing list, and we also
>> decided to explicitly state in our documentation that this is not a CSPRNG.
>
> This is an extreme documentation/specification bug that *hurts*
> portability and security. The core contract of the historical
> arc4random function is that it *is* a CSPRNG. Having a function by
> that name that's allowed not to be one means now all software using it
> has to add detection for the broken glibc variant.
>
> If the glibc implementation has flaws that actually make it not a
> CSPRNG, this absolutely needs to be fixed. Not doing so is
> irresponsible and will set everyone back a long ways.

The core issue is that on some kernels/architectures, reading from
/dev/urandom can degrade to GRND_INSECURE (approximately), and while the
result is likely still unpredictable, not everyone would label that as a
CSPRNG.

If we document arc4random as a CSPRNG, this means that we would have to
ditch the fallback code and abort the process if the getrandom system
call is not available: when reading from /dev/urandom as a fallback, we
have no way of knowing if we are in any of the impacted execution
environments. Based on your other comments, it seems that you are
interested in such fallbacks, too, but I don't think you can actually
have both (CSPRNG + fallback).

And then there is the certification issue. We really want applications
that already use OpenSSL for other cryptography to use RAND_bytes
instead of arc4random. Likewise for GNUTLS and gnutls_rnd. What should
authors of those cryptographic libraries do? That's less clear, and really
depends on the constraints they operate in (e.g., they may target only a
subset of architectures and kernel versions).

Thanks,
Florian

2022-07-25 17:03:18

by Jason A. Donenfeld

Subject: Re: arc4random - are you sure we want these?

Hi Florian,

On Mon, Jul 25, 2022 at 06:40:54PM +0200, Florian Weimer wrote:
> The core issue is that on some kernels/architectures, reading from
> /dev/urandom can degrade to GRND_INSECURE (approximately), and while the
> result is likely still unpredictable, not everyone would label that as a
> CSPRNG.

On some old kernels (though I think not all?), you can poll on
/dev/random. This isn't perfect, as the ancient "non blocking pool"
initialized after the "blocking pool", but it's not too imperfect
either. Take a look at the previously linked random-util.c.

> If we document arc4random as a CSPRNG, this means that we would have to
> ditch the fallback code and abort the process if the getrandom system
> call is not available: when reading from /dev/urandom as a fallback, we
> have no way of knowing if we are in any of the impacted execution
> environments. Based on your other comments, it seems that you are
> interested in such fallbacks, too, but I don't think you can actually
> have both (CSPRNG + fallback).
>
> And then there is the certification issue. We really want applications
> that already use OpenSSL for other cryptography to use RAND_bytes
> instead of arc4random. Likewise for GNUTLS and gnutls_rnd. What should
> authors of those cryptographic libraries do? That's less clear, and really
> depends on the constraints they operate in (e.g., they may target only a
> subset of architectures and kernel versions).

I think all of this is yet another indication that there are some major
things to work out -- should we block or not? is buffering safe? is the
interface correct? -- and so we should just back out the arc4random
commit until this has been explored a bit more. We're not gaining
anything from rushing this, especially as a "source code compatibility"
thing, if there's not even agreement between OSes on what the function
does inside.

Jason

PS: please try to keep [email protected] CC'd. I've been
bouncing these manually when not, but it's hard to keep up with that.

2022-07-25 20:49:16

by Paul Eggert

Subject: Re: arc4random - are you sure we want these?

On 7/24/22 16:55, Jason A. Donenfeld wrote:

> Sorry I missed your reply earlier. I'm not a subscriber so I missed this
> as I somehow fell out of the CC.

Your email provider (Google) rejected email from cs.ucla.edu on the
grounds that its IP address 131.179.128.68 has a "very low reputation".
Google provided no way to appeal or fix the problem.

I am using "Reply All" for this message because Google likely won't
deliver it to you directly. Perhaps someone else can forward it to you
for me. (Sorry to bother the list.)

Perhaps this is a subtle way to encourage our department's faculty to
let Google manage our email. We've resisted so far, though.

2022-07-25 22:00:52

by Rich Felker

Subject: Re: arc4random - are you sure we want these?

On Mon, Jul 25, 2022 at 02:33:05PM -0400, Cristian Rodríguez via Libc-alpha wrote:
> On Mon, Jul 25, 2022 at 1:44 PM Rich Felker <[email protected]> wrote:
>
> > Then don't fallback to /dev/urandom.
>
> Those are my thoughts as well.. but __libc_fatal() if there is no
> usable getrandom syscall with the needed semantics, in short making
> this interface usable only when the kernel is.
>
> This is quite drastic, but probably the only sane way to go.

You can at least try the sysctl and possibly also /dev approaches and
only treat this as fatal as a last resort. If you can inspect
entropy_avail or poll /dev/random to determine that the pool is
initialized this is very safe, I think. And some research on distro
practices might uncover whether this should be believed to be
complete.
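
A minimal sketch of the entropy_avail inspection, assuming the usual procfs
location /proc/sys/kernel/random/entropy_avail (the helper name
read_entropy_avail is hypothetical). What threshold counts as "initialized"
is left to the caller, since the meaning of the number has changed across
kernel versions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read the kernel's entropy estimate (in bits)
   from PATH.  Returns the value, or -1 if the file cannot be opened
   or parsed.  */
static int
read_entropy_avail (const char *path)
{
  assert (path != NULL);

  FILE *f = fopen (path, "r");
  if (f == NULL)
    return -1;

  int bits = -1;
  int ok = (fscanf (f, "%d", &bits) == 1);
  fclose (f);

  return (ok && bits >= 0) ? bits : -1;
}
```

A fallback path could loop on this with a short sleep until the value
crosses whatever threshold it trusts, and only then read from
/dev/urandom.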

(Note: I know some folks have raised seccomp sandboxing as an issue
too, but unlike kernel which is sometimes locked in by legacy
hardware, bad seccomp filters are in principle always fixable and are
a form of user/admin error since it's not valid to make assumptions
about what syscalls libc needs.)

Rich

2022-07-25 22:01:46

by Cristian Rodríguez

Subject: Re: arc4random - are you sure we want these?

On Mon, Jul 25, 2022 at 1:44 PM Rich Felker <[email protected]> wrote:

> Then don't fallback to /dev/urandom.

Those are my thoughts as well.. but __libc_fatal() if there is no
usable getrandom syscall with the needed semantics, in short making
this interface usable only when the kernel is.

This is quite drastic, but probably the only sane way to go.

2022-07-25 23:15:05

by Rich Felker

Subject: Re: arc4random - are you sure we want these?

On Mon, Jul 25, 2022 at 06:40:54PM +0200, Florian Weimer via Libc-alpha wrote:
> * Rich Felker:
>
> > On Sat, Jul 23, 2022 at 02:39:29PM -0300, Adhemerval Zanella Netto via Libc-alpha wrote:
> >> On 23/07/22 13:25, Jason A. Donenfeld wrote:
> >> > Firstly, for what use cases does this actually help? As of recent
> >> > changes to the Linux kernels -- now backported all the way to 4.9! --
> >> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu
> >> > states locklessly. Sure you avoid a syscall by doing that in userspace,
> >> > but does it really matter? Who exactly benefits from this?
> >>
> >> Mainly performance, since glibc already exports both getrandom and
> >> getentropy. There was some discussion on the mailing list, and we also
> >> decided to explicitly state in our documentation that this is not a CSPRNG.
> >
> > This is an extreme documentation/specification bug that *hurts*
> > portability and security. The core contract of the historical
> > arc4random function is that it *is* a CSPRNG. Having a function by
> > that name that's allowed not to be one means now all software using it
> > has to add detection for the broken glibc variant.
> >
> > If the glibc implementation has flaws that actually make it not a
> > CSPRNG, this absolutely needs to be fixed. Not doing so is
> > irresponsible and will set everyone back a long ways.
>
> The core issue is that on some kernels/architectures, reading from
> /dev/urandom can degrade to GRND_INSECURE (approximately), and while the
> result is likely still unpredictable, not everyone would label that as a
> CSPRNG.

Then don't fallback to /dev/urandom. It's not even a failsafe fallback
anyway (ENFILE, EMFILE, sandboxes, etc.) so it can't safely be used
here. Instead use SYS_sysctl and poll for entropy_avail, looping until
it's ready. AFAICT this works reliably on all kernels as far back as
glibc supports (assuming nothing idiotic like intentionally patching
or configuring out random support, but then it's PEBKAC error, as no
distros did this).

Rich

2022-07-25 23:15:42

by Jason A. Donenfeld

Subject: [PATCH] arc4random: simplify design for better safety

Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.

This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc can work
together on optimizing that.

This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering
altogether. Future improvements may include userspace learning more from
the kernel about when to do that, which might make these sorts of
chacha20-based optimizations more possible. The current heuristic of 16
MiB is meaningless garbage that doesn't correspond to anything the
kernel might know about. So for now, let's just do something
conservative that we know is correct and won't lead to cryptographic
issues for users of this function.

This patch might be considered along the lines of, "optimization is the
root of all evil," in that the much more complex implementation it
replaces moves too fast without considering security implications,
whereas the incremental approach done here is a much safer way of going
about things. Once this lands, we can take our time in optimizing this
properly using new interplay between the kernel and userspace.

getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fall back to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties of getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" initialized
after the "blocking pool", not before, but it's the best approximation
we can do.

The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.

Cc: Adhemerval Zanella Netto <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: Cristian Rodríguez <[email protected]>
Cc: Paul Eggert <[email protected]>
Cc: [email protected]
Signed-off-by: Jason A. Donenfeld <[email protected]>
---
LICENSES | 23 -
include/stdlib.h | 3 -
stdlib/Makefile | 2 -
stdlib/arc4random.c | 205 ++-----
stdlib/arc4random.h | 48 --
stdlib/chacha20.c | 191 ------
stdlib/tst-arc4random-chacha20.c | 167 -----
sysdeps/aarch64/Makefile | 4 -
sysdeps/aarch64/chacha20-aarch64.S | 314 ----------
sysdeps/aarch64/chacha20_arch.h | 40 --
sysdeps/generic/chacha20_arch.h | 24 -
sysdeps/generic/tls-internal.c | 10 -
sysdeps/mach/hurd/_Fork.c | 2 -
sysdeps/nptl/_Fork.c | 2 -
.../powerpc/powerpc64/be/multiarch/Makefile | 4 -
.../powerpc64/be/multiarch/chacha20-ppc.c | 1 -
.../powerpc64/be/multiarch/chacha20_arch.h | 42 --
sysdeps/powerpc/powerpc64/power8/Makefile | 5 -
.../powerpc/powerpc64/power8/chacha20-ppc.c | 256 --------
.../powerpc/powerpc64/power8/chacha20_arch.h | 37 --
sysdeps/s390/s390-64/Makefile | 6 -
sysdeps/s390/s390-64/chacha20-s390x.S | 573 ------------------
sysdeps/s390/s390-64/chacha20_arch.h | 45 --
sysdeps/unix/sysv/linux/tls-internal.c | 10 -
sysdeps/x86_64/Makefile | 7 -
sysdeps/x86_64/chacha20-amd64-avx2.S | 328 ----------
sysdeps/x86_64/chacha20-amd64-sse2.S | 311 ----------
sysdeps/x86_64/chacha20_arch.h | 55 --
28 files changed, 52 insertions(+), 2663 deletions(-)
delete mode 100644 stdlib/arc4random.h
delete mode 100644 stdlib/chacha20.c
delete mode 100644 stdlib/tst-arc4random-chacha20.c
delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S
delete mode 100644 sysdeps/aarch64/chacha20_arch.h
delete mode 100644 sysdeps/generic/chacha20_arch.h
delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile
delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S
delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h
delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S
delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S
delete mode 100644 sysdeps/x86_64/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index cd04fb6e84..530893b1dc 100644
--- a/LICENSES
+++ b/LICENSES
@@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <[email protected]>
You should have received a copy of the GNU Lesser General Public
License along with this library; if not, see
<https://www.gnu.org/licenses/>. */
-
-sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S,
-sysdeps/x86_64/chacha20-amd64-avx2.S, and
-sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and
-sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt,
-with the following notices:
-
-Copyright (C) 2017-2019 Jussi Kivilinna <[email protected]>
-
-This file is part of Libgcrypt.
-
-Libgcrypt is free software; you can redistribute it and/or modify
-it under the terms of the GNU Lesser General Public License as
-published by the Free Software Foundation; either version 2.1 of
-the License, or (at your option) any later version.
-
-Libgcrypt is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-GNU Lesser General Public License for more details.
-
-You should have received a copy of the GNU Lesser General Public
-License along with this program; if not, see <https://www.gnu.org/licenses/>.
diff --git a/include/stdlib.h b/include/stdlib.h
index cae7f7cdf8..db51f4a4f6 100644
--- a/include/stdlib.h
+++ b/include/stdlib.h
@@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform;
libc_hidden_proto (__arc4random_uniform);
extern void __arc4random_buf_internal (void *buffer, size_t len)
attribute_hidden;
-/* Called from the fork function to reinitialize the internal cipher state
- in child process. */
-extern void __arc4random_fork_subprocess (void) attribute_hidden;

extern double __strtod_internal (const char *__restrict __nptr,
char **__restrict __endptr, int __group)
diff --git a/stdlib/Makefile b/stdlib/Makefile
index a900962685..f7b25c1981 100644
--- a/stdlib/Makefile
+++ b/stdlib/Makefile
@@ -246,7 +246,6 @@ tests := \
# tests

tests-internal := \
- tst-arc4random-chacha20 \
tst-strtod1i \
tst-strtod3 \
tst-strtod4 \
@@ -256,7 +255,6 @@ tests-internal := \
# tests-internal

tests-static := \
- tst-arc4random-chacha20 \
tst-secure-getenv \
# tests-static

diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
index 65547e79aa..23a4167987 100644
--- a/stdlib/arc4random.c
+++ b/stdlib/arc4random.c
@@ -1,4 +1,4 @@
-/* Pseudo Random Number Generator based on ChaCha20.
+/* Pseudo Random Number Generator
Copyright (C) 2022 Free Software Foundation, Inc.
This file is part of the GNU C Library.

@@ -16,61 +16,14 @@
License along with the GNU C Library; if not, see
<https://www.gnu.org/licenses/>. */

-#include <arc4random.h>
#include <errno.h>
#include <not-cancel.h>
#include <stdio.h>
#include <stdlib.h>
+#include <sys/poll.h>
#include <sys/mman.h>
#include <sys/param.h>
#include <sys/random.h>
-#include <tls-internal.h>
-
-/* arc4random keeps two counters: 'have' is the current valid bytes not yet
- consumed in 'buf' while 'count' is the maximum number of bytes until a
- reseed.
-
- Both the initial seed and reseed try to obtain entropy from the kernel
- and abort the process if none could be obtained.
-
- The state 'buf' improves the usage of the cipher calls, allowing to call
- optimized implementations (if the architecture provides it) and minimize
- function call overhead. */
-
-#include <chacha20.c>
-
-/* Called from the fork function to reset the state. */
-void
-__arc4random_fork_subprocess (void)
-{
- struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
- if (state != NULL)
- {
- explicit_bzero (state, sizeof (*state));
- /* Force key init. */
- state->count = -1;
- }
-}
-
-/* Return the current thread random state or try to create one if there is
- none available. In the case malloc can not allocate a state, arc4random
- will try to get entropy with arc4random_getentropy. */
-static struct arc4random_state_t *
-arc4random_get_state (void)
-{
- struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
- if (state == NULL)
- {
- state = malloc (sizeof (struct arc4random_state_t));
- if (state != NULL)
- {
- /* Force key initialization on first call. */
- state->count = -1;
- __glibc_tls_internal ()->rand_state = state;
- }
- }
- return state;
-}

static void
arc4random_getrandom_failure (void)
@@ -78,106 +31,67 @@ arc4random_getrandom_failure (void)
__libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n");
}

-static void
-arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen)
+void
+__arc4random_buf (void *p, size_t n)
{
- chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
-
- /* Mix optional user provided data. */
- if (rnd != NULL)
- {
- size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
- for (size_t i = 0; i < m; i++)
- state->buf[i] ^= rnd[i];
- }
-
- /* Immediately reinit for backtracking resistance. */
- chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
- explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
- state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-}
+ static bool have_getrandom = true, seen_initialized = false;
+ int fd;

-static void
-arc4random_getentropy (void *rnd, size_t len)
-{
- if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len)
+ if (n == 0)
return;

- int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom",
- O_RDONLY | O_CLOEXEC));
- if (fd != -1)
+ for (;;)
{
- uint8_t *p = rnd;
- uint8_t *end = p + len;
- do
- {
- ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
- if (ret <= 0)
- arc4random_getrandom_failure ();
- p += ret;
- }
- while (p < end);
-
- if (__close_nocancel (fd) == 0)
- return;
+ ssize_t l;
+
+ if (!have_getrandom)
+ break;
+
+ l = __getrandom_nocancel (p, n, 0);
+ if (l > 0)
+ {
+ if ((size_t) l == n)
+ return; /* Done reading, success. */
+ p = (uint8_t *) p + l;
+ n -= l;
+ continue; /* Interrupted by a signal; keep going. */
+ }
+ else if (l == 0)
+ arc4random_getrandom_failure (); /* Weird, should never happen. */
+ else if (errno == ENOSYS)
+ {
+ have_getrandom = false;
+ break; /* No syscall, so fallback to /dev/urandom. */
+ }
+ arc4random_getrandom_failure (); /* Unknown other error, should never happen. */
}
- arc4random_getrandom_failure ();
-}

-/* Check if the thread context STATE should be reseed with kernel entropy
- depending of requested LEN bytes. If there is less than requested,
- the state is either initialized or reseeded, otherwise the internal
- counter subtract the requested length. */
-static void
-arc4random_check_stir (struct arc4random_state_t *state, size_t len)
-{
- if (state->count <= len || state->count == -1)
+ if (!seen_initialized)
{
- uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
- arc4random_getentropy (rnd, sizeof rnd);
-
- if (state->count == -1)
- chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);
- else
- arc4random_rekey (state, rnd, sizeof rnd);
-
- explicit_bzero (rnd, sizeof rnd);
-
- /* Invalidate the buf. */
- state->have = 0;
- memset (state->buf, 0, sizeof state->buf);
- state->count = CHACHA20_RESEED_SIZE;
+ struct pollfd pfd = { .events = POLLIN };
+ pfd.fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
+ if (pfd.fd < 0)
+ arc4random_getrandom_failure ();
+ if (__poll(&pfd, 1, -1) < 0)
+ arc4random_getrandom_failure ();
+ if (__close_nocancel(pfd.fd) < 0)
+ arc4random_getrandom_failure ();
+ seen_initialized = true;
}
- else
- state->count -= len;
-}

-void
-__arc4random_buf (void *buffer, size_t len)
-{
- struct arc4random_state_t *state = arc4random_get_state ();
- if (__glibc_unlikely (state == NULL))
- {
- arc4random_getentropy (buffer, len);
- return;
- }
-
- arc4random_check_stir (state, len);
- while (len > 0)
+ fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY);
+ if (fd < 0)
+ arc4random_getrandom_failure ();
+ while (n)
{
- if (state->have > 0)
- {
- size_t m = MIN (len, state->have);
- uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
- memcpy (buffer, ks, m);
- explicit_bzero (ks, m);
- buffer += m;
- len -= m;
- state->have -= m;
- }
- if (state->have == 0)
- arc4random_rekey (state, NULL, 0);
+ ssize_t l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n));
+ if (l <= 0)
+ arc4random_getrandom_failure ();
+ p = (uint8_t *) p + l;
+ n -= l;
}
+ if (__close_nocancel (fd) < 0)
+ arc4random_getrandom_failure ();
}
libc_hidden_def (__arc4random_buf)
weak_alias (__arc4random_buf, arc4random_buf)
@@ -186,22 +100,7 @@ uint32_t
__arc4random (void)
{
uint32_t r;
-
- struct arc4random_state_t *state = arc4random_get_state ();
- if (__glibc_unlikely (state == NULL))
- {
- arc4random_getentropy (&r, sizeof (uint32_t));
- return r;
- }
-
- arc4random_check_stir (state, sizeof (uint32_t));
- if (state->have < sizeof (uint32_t))
- arc4random_rekey (state, NULL, 0);
- uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
- memcpy (&r, ks, sizeof (uint32_t));
- memset (ks, 0, sizeof (uint32_t));
- state->have -= sizeof (uint32_t);
-
+ __arc4random_buf(&r, sizeof(r));
return r;
}
libc_hidden_def (__arc4random)
diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h
deleted file mode 100644
index cd39389c19..0000000000
--- a/stdlib/arc4random.h
+++ /dev/null
@@ -1,48 +0,0 @@
-/* Arc4random definition used on TLS.
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#ifndef _CHACHA20_H
-#define _CHACHA20_H
-
-#include <stddef.h>
-#include <stdint.h>
-
-/* Internal ChaCha20 state. */
-#define CHACHA20_STATE_LEN 16
-#define CHACHA20_BLOCK_SIZE 64
-
-/* Maximum number bytes until reseed (16 MB). */
-#define CHACHA20_RESEED_SIZE (16 * 1024 * 1024)
-
-/* Internal arc4random buffer, used on each feedback step so offer some
- backtracking protection and to allow better used of vectorized
- chacha20 implementations. */
-#define CHACHA20_BUFSIZE (8 * CHACHA20_BLOCK_SIZE)
-
-_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE,
- "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE");
-
-struct arc4random_state_t
-{
- uint32_t ctx[CHACHA20_STATE_LEN];
- size_t have;
- size_t count;
- uint8_t buf[CHACHA20_BUFSIZE];
-};
-
-#endif
diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
deleted file mode 100644
index 2745a81315..0000000000
--- a/stdlib/chacha20.c
+++ /dev/null
@@ -1,191 +0,0 @@
-/* Generic ChaCha20 implementation (used on arc4random).
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <array_length.h>
-#include <endian.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <string.h>
-
-/* 32-bit stream position, then 96-bit nonce. */
-#define CHACHA20_IV_SIZE 16
-#define CHACHA20_KEY_SIZE 32
-
-#define CHACHA20_STATE_LEN 16
-
-/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final
- XOR of the keystream with the plaintext because the plaintext is a
- stream of zeros. */
-
-enum chacha20_constants
-{
- CHACHA20_CONSTANT_EXPA = 0x61707865U,
- CHACHA20_CONSTANT_ND_3 = 0x3320646eU,
- CHACHA20_CONSTANT_2_BY = 0x79622d32U,
- CHACHA20_CONSTANT_TE_K = 0x6b206574U
-};
-
-static inline uint32_t
-read_unaligned_32 (const uint8_t *p)
-{
- uint32_t r;
- memcpy (&r, p, sizeof (r));
- return r;
-}
-
-static inline void
-write_unaligned_32 (uint8_t *p, uint32_t v)
-{
- memcpy (p, &v, sizeof (v));
-}
-
-#if __BYTE_ORDER == __BIG_ENDIAN
-# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p))
-# define set_state(v) __builtin_bswap32 ((v))
-#else
-# define read_unaligned_le32(p) read_unaligned_32 ((p))
-# define set_state(v) (v)
-#endif
-
-static inline void
-chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv)
-{
- state[0] = CHACHA20_CONSTANT_EXPA;
- state[1] = CHACHA20_CONSTANT_ND_3;
- state[2] = CHACHA20_CONSTANT_2_BY;
- state[3] = CHACHA20_CONSTANT_TE_K;
-
- state[4] = read_unaligned_le32 (key + 0 * sizeof (uint32_t));
- state[5] = read_unaligned_le32 (key + 1 * sizeof (uint32_t));
- state[6] = read_unaligned_le32 (key + 2 * sizeof (uint32_t));
- state[7] = read_unaligned_le32 (key + 3 * sizeof (uint32_t));
- state[8] = read_unaligned_le32 (key + 4 * sizeof (uint32_t));
- state[9] = read_unaligned_le32 (key + 5 * sizeof (uint32_t));
- state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t));
- state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t));
-
- state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t));
- state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t));
- state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t));
- state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t));
-}
-
-static inline uint32_t
-rotl32 (unsigned int shift, uint32_t word)
-{
- return (word << (shift & 31)) | (word >> ((-shift) & 31));
-}
-
-static void
-state_final (const uint8_t *src, uint8_t *dst, uint32_t v)
-{
-#ifdef CHACHA20_XOR_FINAL
- v ^= read_unaligned_32 (src);
-#endif
- write_unaligned_32 (dst, v);
-}
-
-static inline void
-chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src)
-{
- uint32_t x0, x1, x2, x3, x4, x5, x6, x7;
- uint32_t x8, x9, x10, x11, x12, x13, x14, x15;
-
- x0 = state[0];
- x1 = state[1];
- x2 = state[2];
- x3 = state[3];
- x4 = state[4];
- x5 = state[5];
- x6 = state[6];
- x7 = state[7];
- x8 = state[8];
- x9 = state[9];
- x10 = state[10];
- x11 = state[11];
- x12 = state[12];
- x13 = state[13];
- x14 = state[14];
- x15 = state[15];
-
- for (int i = 0; i < 20; i += 2)
- {
-#define QROUND(_x0, _x1, _x2, _x3) \
- do { \
- _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); \
- _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); \
- _x0 = _x0 + _x1; _x3 = rotl32 (8, (_x0 ^ _x3)); \
- _x2 = _x2 + _x3; _x1 = rotl32 (7, (_x1 ^ _x2)); \
- } while(0)
-
- QROUND (x0, x4, x8, x12);
- QROUND (x1, x5, x9, x13);
- QROUND (x2, x6, x10, x14);
- QROUND (x3, x7, x11, x15);
-
- QROUND (x0, x5, x10, x15);
- QROUND (x1, x6, x11, x12);
- QROUND (x2, x7, x8, x13);
- QROUND (x3, x4, x9, x14);
- }
-
- state_final (&src[0], &dst[0], set_state (x0 + state[0]));
- state_final (&src[4], &dst[4], set_state (x1 + state[1]));
- state_final (&src[8], &dst[8], set_state (x2 + state[2]));
- state_final (&src[12], &dst[12], set_state (x3 + state[3]));
- state_final (&src[16], &dst[16], set_state (x4 + state[4]));
- state_final (&src[20], &dst[20], set_state (x5 + state[5]));
- state_final (&src[24], &dst[24], set_state (x6 + state[6]));
- state_final (&src[28], &dst[28], set_state (x7 + state[7]));
- state_final (&src[32], &dst[32], set_state (x8 + state[8]));
- state_final (&src[36], &dst[36], set_state (x9 + state[9]));
- state_final (&src[40], &dst[40], set_state (x10 + state[10]));
- state_final (&src[44], &dst[44], set_state (x11 + state[11]));
- state_final (&src[48], &dst[48], set_state (x12 + state[12]));
- state_final (&src[52], &dst[52], set_state (x13 + state[13]));
- state_final (&src[56], &dst[56], set_state (x14 + state[14]));
- state_final (&src[60], &dst[60], set_state (x15 + state[15]));
-
- state[12]++;
-}
-
-static void
-__attribute_maybe_unused__
-chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src,
- size_t bytes)
-{
- while (bytes >= CHACHA20_BLOCK_SIZE)
- {
- chacha20_block (state, dst, src);
-
- bytes -= CHACHA20_BLOCK_SIZE;
- dst += CHACHA20_BLOCK_SIZE;
- src += CHACHA20_BLOCK_SIZE;
- }
-
- if (__glibc_unlikely (bytes != 0))
- {
- uint8_t stream[CHACHA20_BLOCK_SIZE];
- chacha20_block (state, stream, src);
- memcpy (dst, stream, bytes);
- explicit_bzero (stream, sizeof stream);
- }
-}
-
-/* Get the architecture optimized version. */
-#include <chacha20_arch.h>
diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c
deleted file mode 100644
index 45ba54920d..0000000000
--- a/stdlib/tst-arc4random-chacha20.c
+++ /dev/null
@@ -1,167 +0,0 @@
-/* Basic tests for chacha20 cypher used in arc4random.
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <arc4random.h>
-#include <support/check.h>
-#include <sys/cdefs.h>
-
-/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random
- actual does. */
-#include <chacha20.c>
-
-static int
-do_test (void)
-{
- const uint8_t key[CHACHA20_KEY_SIZE] =
- {
- 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
- };
- const uint8_t iv[CHACHA20_IV_SIZE] =
- {
- 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
- };
- const uint8_t expected1[CHACHA20_BUFSIZE] =
- {
- 0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a,
- 0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d,
- 0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda,
- 0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f,
- 0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1,
- 0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07,
- 0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73,
- 0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69,
- 0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21,
- 0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8,
- 0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac,
- 0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6,
- 0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0,
- 0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3,
- 0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f,
- 0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e,
- 0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71,
- 0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3,
- 0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5,
- 0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34,
- 0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d,
- 0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c,
- 0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 0x6a, 0x1e, 0x6c, 0x31,
- 0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad,
- 0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0,
- 0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72,
- 0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16,
- 0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9,
- 0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a,
- 0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc,
- 0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2,
- 0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb,
- 0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10,
- 0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b,
- 0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2,
- 0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17,
- 0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca,
- 0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84,
- 0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3,
- 0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1,
- 0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7,
- 0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9,
- 0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3,
- 0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e,
- 0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb,
- 0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8,
- 0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb
- };
-
- const uint8_t expected2[CHACHA20_BUFSIZE] =
- {
- 0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36,
- 0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22,
- 0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50,
- 0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10,
- 0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 0xb8, 0x94, 0xbb, 0xef,
- 0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30,
- 0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c,
- 0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6,
- 0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e,
- 0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90,
- 0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3,
- 0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba,
- 0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4,
- 0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea,
- 0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a,
- 0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80,
- 0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2,
- 0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8,
- 0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce,
- 0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb,
- 0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9,
- 0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc,
- 0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86,
- 0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0,
- 0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4,
- 0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1,
- 0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4,
- 0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f,
- 0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57,
- 0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97,
- 0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef,
- 0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13,
- 0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65,
- 0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30,
- 0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6,
- 0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee,
- 0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c,
- 0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66,
- 0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a,
- 0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c,
- 0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65,
- 0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b,
- 0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88,
- 0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1,
- 0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0,
- 0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36,
- 0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18
- };
-
- /* Check with the expected internal arc4random keystream buffer. Some
- architecture optimizations expects a buffer with a minimum size which
- is a multiple of then ChaCha20 blocksize, so they might not be prepared
- to handle smaller buffers. */
-
- uint8_t output[CHACHA20_BUFSIZE];
-
- uint32_t state[CHACHA20_STATE_LEN];
- chacha20_init (state, key, iv);
-
- /* Check with the initial state. */
- uint8_t input[CHACHA20_BUFSIZE] = { 0 };
-
- chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
- TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE);
-
- /* And on the next round. */
- chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
- TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE);
-
- return 0;
-}
-
-#include <support/test-driver.c>
diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 7dfd1b62dd..17fb1c5b72 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -51,10 +51,6 @@ ifeq ($(subdir),csu)
gen-as-const-headers += tlsdesc.sym
endif

-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-aarch64
-endif
-
ifeq ($(subdir),gmon)
CFLAGS-mcount.c += -mgeneral-regs-only
endif
diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S
deleted file mode 100644
index cce5291c5c..0000000000
--- a/sysdeps/aarch64/chacha20-aarch64.S
+++ /dev/null
@@ -1,314 +0,0 @@
-/* Optimized AArch64 implementation of ChaCha20 cipher.
- Copyright (C) 2022 Free Software Foundation, Inc.
-
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-/* Copyright (C) 2017-2019 Jussi Kivilinna <[email protected]>
-
- This file is part of Libgcrypt.
-
- Libgcrypt is free software; you can redistribute it and/or modify
- it under the terms of the GNU Lesser General Public License as
- published by the Free Software Foundation; either version 2.1 of
- the License, or (at your option) any later version.
-
- Libgcrypt is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-/* Based on D. J. Bernstein reference implementation at
- http://cr.yp.to/chacha.html:
-
- chacha-regs.c version 20080118
- D. J. Bernstein
- Public domain. */
-
-#include <sysdep.h>
-
-/* Only LE is supported. */
-#ifdef __AARCH64EL__
-
-#define GET_DATA_POINTER(reg, name) \
- adrp reg, name ; \
- add reg, reg, :lo12:name
-
-/* 'ret' instruction replacement for straight-line speculation mitigation */
-#define ret_spec_stop \
- ret; dsb sy; isb;
-
-.cpu generic+simd
-
-.text
-
-/* register macros */
-#define INPUT x0
-#define DST x1
-#define SRC x2
-#define NBLKS x3
-#define ROUND x4
-#define INPUT_CTR x5
-#define INPUT_POS x6
-#define CTR x7
-
-/* vector registers */
-#define X0 v16
-#define X4 v17
-#define X8 v18
-#define X12 v19
-
-#define X1 v20
-#define X5 v21
-
-#define X9 v22
-#define X13 v23
-#define X2 v24
-#define X6 v25
-
-#define X3 v26
-#define X7 v27
-#define X11 v28
-#define X15 v29
-
-#define X10 v30
-#define X14 v31
-
-#define VCTR v0
-#define VTMP0 v1
-#define VTMP1 v2
-#define VTMP2 v3
-#define VTMP3 v4
-#define X12_TMP v5
-#define X13_TMP v6
-#define ROT8 v7
-
-/**********************************************************************
- helper macros
- **********************************************************************/
-
-#define _(...) __VA_ARGS__
-
-#define vpunpckldq(s1, s2, dst) \
- zip1 dst.4s, s2.4s, s1.4s;
-
-#define vpunpckhdq(s1, s2, dst) \
- zip2 dst.4s, s2.4s, s1.4s;
-
-#define vpunpcklqdq(s1, s2, dst) \
- zip1 dst.2d, s2.2d, s1.2d;
-
-#define vpunpckhqdq(s1, s2, dst) \
- zip2 dst.2d, s2.2d, s1.2d;
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \
- vpunpckhdq(x1, x0, t2); \
- vpunpckldq(x1, x0, x0); \
- \
- vpunpckldq(x3, x2, t1); \
- vpunpckhdq(x3, x2, x2); \
- \
- vpunpckhqdq(t1, x0, x1); \
- vpunpcklqdq(t1, x0, x0); \
- \
- vpunpckhqdq(x2, t2, x3); \
- vpunpcklqdq(x2, t2, x2);
-
-/**********************************************************************
- 4-way chacha20
- **********************************************************************/
-
-#define XOR(d,s1,s2) \
- eor d.16b, s2.16b, s1.16b;
-
-#define PLUS(ds,s) \
- add ds.4s, ds.4s, s.4s;
-
-#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \
- shl dst1.4s, src1.4s, #(c); \
- shl dst2.4s, src2.4s, #(c); \
- shl dst3.4s, src3.4s, #(c); \
- shl dst4.4s, src4.4s, #(c); \
- sri dst1.4s, src1.4s, #(32 - (c)); \
- sri dst2.4s, src2.4s, #(32 - (c)); \
- sri dst3.4s, src3.4s, #(32 - (c)); \
- sri dst4.4s, src4.4s, #(32 - (c));
-
-#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
- tbl dst1.16b, {src1.16b}, ROT8.16b; \
- tbl dst2.16b, {src2.16b}, ROT8.16b; \
- tbl dst3.16b, {src3.16b}, ROT8.16b; \
- tbl dst4.16b, {src4.16b}, ROT8.16b;
-
-#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
- rev32 dst1.8h, src1.8h; \
- rev32 dst2.8h, src2.8h; \
- rev32 dst3.8h, src3.8h; \
- rev32 dst4.8h, src4.8h;
-
-#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \
- PLUS(a1,b1); PLUS(a2,b2); \
- PLUS(a3,b3); PLUS(a4,b4); \
- XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \
- XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \
- ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4); \
- PLUS(c1,d1); PLUS(c2,d2); \
- PLUS(c3,d3); PLUS(c4,d4); \
- XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \
- XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \
- ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4) \
- PLUS(a1,b1); PLUS(a2,b2); \
- PLUS(a3,b3); PLUS(a4,b4); \
- XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \
- XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \
- ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4) \
- PLUS(c1,d1); PLUS(c2,d2); \
- PLUS(c3,d3); PLUS(c4,d4); \
- XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \
- XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \
- ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4) \
-
-.align 4
-L(__chacha20_blocks4_data_inc_counter):
- .long 0,1,2,3
-
-.align 4
-L(__chacha20_blocks4_data_rot8):
- .byte 3,0,1,2
- .byte 7,4,5,6
- .byte 11,8,9,10
- .byte 15,12,13,14
-
-.hidden __chacha20_neon_blocks4
-ENTRY (__chacha20_neon_blocks4)
- /* input:
- * x0: input
- * x1: dst
- * x2: src
- * x3: nblks (multiple of 4)
- */
-
- GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8))
- add INPUT_CTR, INPUT, #(12*4);
- ld1 {ROT8.16b}, [CTR];
- GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter))
- mov INPUT_POS, INPUT;
- ld1 {VCTR.16b}, [CTR];
-
-L(loop4):
- /* Construct counter vectors X12 and X13 */
-
- ld1 {X15.16b}, [INPUT_CTR];
- mov ROUND, #20;
- ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS];
-
- dup X12.4s, X15.s[0];
- dup X13.4s, X15.s[1];
- ldr CTR, [INPUT_CTR];
- add X12.4s, X12.4s, VCTR.4s;
- dup X0.4s, VTMP1.s[0];
- dup X1.4s, VTMP1.s[1];
- dup X2.4s, VTMP1.s[2];
- dup X3.4s, VTMP1.s[3];
- dup X14.4s, X15.s[2];
- cmhi VTMP0.4s, VCTR.4s, X12.4s;
- dup X15.4s, X15.s[3];
- add CTR, CTR, #4; /* Update counter */
- dup X4.4s, VTMP2.s[0];
- dup X5.4s, VTMP2.s[1];
- dup X6.4s, VTMP2.s[2];
- dup X7.4s, VTMP2.s[3];
- sub X13.4s, X13.4s, VTMP0.4s;
- dup X8.4s, VTMP3.s[0];
- dup X9.4s, VTMP3.s[1];
- dup X10.4s, VTMP3.s[2];
- dup X11.4s, VTMP3.s[3];
- mov X12_TMP.16b, X12.16b;
- mov X13_TMP.16b, X13.16b;
- str CTR, [INPUT_CTR];
-
-L(round2):
- subs ROUND, ROUND, #2
- QUARTERROUND4(X0, X4, X8, X12, X1, X5, X9, X13,
- X2, X6, X10, X14, X3, X7, X11, X15,
- tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
- QUARTERROUND4(X0, X5, X10, X15, X1, X6, X11, X12,
- X2, X7, X8, X13, X3, X4, X9, X14,
- tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
- b.ne L(round2);
-
- ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32;
-
- PLUS(X12, X12_TMP); /* INPUT + 12 * 4 + counter */
- PLUS(X13, X13_TMP); /* INPUT + 13 * 4 + counter */
-
- dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */
- dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */
- dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */
- dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */
- PLUS(X0, VTMP2);
- PLUS(X1, VTMP3);
- PLUS(X2, X12_TMP);
- PLUS(X3, X13_TMP);
-
- dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */
- dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */
- dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */
- dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */
- ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS];
- mov INPUT_POS, INPUT;
- PLUS(X4, VTMP2);
- PLUS(X5, VTMP3);
- PLUS(X6, X12_TMP);
- PLUS(X7, X13_TMP);
-
- dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */
- dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */
- dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */
- dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */
- dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */
- dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */
- PLUS(X8, VTMP2);
- PLUS(X9, VTMP3);
- PLUS(X10, X12_TMP);
- PLUS(X11, X13_TMP);
- PLUS(X14, VTMP0);
- PLUS(X15, VTMP1);
-
- transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2);
- transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2);
- transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2);
- transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2);
-
- subs NBLKS, NBLKS, #4;
-
- st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64
- st1 {X1.16b,X5.16b}, [DST], #32;
- st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64
- st1 {X10.16b,X14.16b}, [DST], #32;
- st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64;
-
- b.ne L(loop4);
-
- ret_spec_stop
-END (__chacha20_neon_blocks4)
-
-#endif
diff --git a/sysdeps/aarch64/chacha20_arch.h b/sysdeps/aarch64/chacha20_arch.h
deleted file mode 100644
index 37dbb917f1..0000000000
--- a/sysdeps/aarch64/chacha20_arch.h
+++ /dev/null
@@ -1,40 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <ldsodefs.h>
-#include <stdbool.h>
-
-unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst,
- const uint8_t *src, size_t nblks)
- attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
- size_t bytes)
-{
- _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
- "CHACHA20_BUFSIZE not multiple of 4");
- _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4,
- "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4");
-#ifdef __AARCH64EL__
- __chacha20_neon_blocks4 (state, dst, src,
- CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
- chacha20_crypt_generic (state, dst, src, bytes);
-#endif
-}
diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h
deleted file mode 100644
index 1b4559ccbc..0000000000
--- a/sysdeps/generic/chacha20_arch.h
+++ /dev/null
@@ -1,24 +0,0 @@
-/* Chacha20 implementation, generic interface for encrypt.
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
- size_t bytes)
-{
- chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c
index 8a0f37d509..b32b31b5a9 100644
--- a/sysdeps/generic/tls-internal.c
+++ b/sysdeps/generic/tls-internal.c
@@ -16,7 +16,6 @@
License along with the GNU C Library; if not, see
<https://www.gnu.org/licenses/>. */

-#include <stdlib/arc4random.h>
#include <string.h>
#include <tls-internal.h>

@@ -27,13 +26,4 @@ __glibc_tls_internal_free (void)
{
free (__tls_internal.strsignal_buf);
free (__tls_internal.strerror_l_buf);
-
- if (__tls_internal.rand_state != NULL)
- {
- /* Clear any lingering random state prior so if the thread stack is
- cached it won't leak any data. */
- explicit_bzero (__tls_internal.rand_state,
- sizeof (*__tls_internal.rand_state));
- free (__tls_internal.rand_state);
- }
}
diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c
index 667068c8cf..e60b86fab1 100644
--- a/sysdeps/mach/hurd/_Fork.c
+++ b/sysdeps/mach/hurd/_Fork.c
@@ -662,8 +662,6 @@ retry:
_hurd_malloc_fork_child ();
call_function_static_weak (__malloc_fork_unlock_child);

- call_function_static_weak (__arc4random_fork_subprocess);
-
/* Run things that want to run in the child task to set up. */
RUN_HOOK (_hurd_fork_child_hook, ());

diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
index 7dc02569f6..dd568992e2 100644
--- a/sysdeps/nptl/_Fork.c
+++ b/sysdeps/nptl/_Fork.c
@@ -43,8 +43,6 @@ _Fork (void)
self->robust_head.list = &self->robust_head;
INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head,
sizeof (struct robust_list_head));
-
- call_function_static_weak (__arc4random_fork_subprocess);
}
return pid;
}
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
deleted file mode 100644
index 8c75165f7f..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
+++ /dev/null
@@ -1,4 +0,0 @@
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
deleted file mode 100644
index cf9e735326..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
+++ /dev/null
@@ -1 +0,0 @@
-#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c>
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
deleted file mode 100644
index 08494dc045..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
+++ /dev/null
@@ -1,42 +0,0 @@
-/* PowerPC optimization for ChaCha20.
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
- const uint8_t *src, size_t nblks)
- attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
- const uint8_t *src, size_t bytes)
-{
- _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
- "CHACHA20_BUFSIZE not multiple of 4");
- _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
- "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
- unsigned long int hwcap = GLRO(dl_hwcap);
- unsigned long int hwcap2 = GLRO(dl_hwcap2);
- if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
- __chacha20_power8_blocks4 (state, dst, src,
- CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
- else
- chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile
index abb0aa3f11..71a59529f3 100644
--- a/sysdeps/powerpc/powerpc64/power8/Makefile
+++ b/sysdeps/powerpc/powerpc64/power8/Makefile
@@ -1,8 +1,3 @@
ifeq ($(subdir),string)
sysdep_routines += strcasestr-ppc64
endif
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
deleted file mode 100644
index 0bbdcb9363..0000000000
--- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
+++ /dev/null
@@ -1,256 +0,0 @@
-/* Optimized PowerPC implementation of ChaCha20 cipher.
- Copyright (C) 2022 Free Software Foundation, Inc.
-
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20
- Copyright (C) 2019 Jussi Kivilinna <[email protected]>
-
- This file is part of Libgcrypt.
-
- Libgcrypt is free software; you can redistribute it and/or modify
- it under the terms of the GNU Lesser General Public License as
- published by the Free Software Foundation; either version 2.1 of
- the License, or (at your option) any later version.
-
- Libgcrypt is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-#include <altivec.h>
-#include <endian.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <sys/cdefs.h>
-
-typedef vector unsigned char vector16x_u8;
-typedef vector unsigned int vector4x_u32;
-typedef vector unsigned long long vector2x_u64;
-
-#if __BYTE_ORDER == __BIG_ENDIAN
-static const vector16x_u8 le_bswap_const =
- { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
-#endif
-
-static inline vector4x_u32
-vec_rol_elems (vector4x_u32 v, unsigned int idx)
-{
-#if __BYTE_ORDER != __BIG_ENDIAN
- return vec_sld (v, v, (16 - (4 * idx)) & 15);
-#else
- return vec_sld (v, v, (4 * idx) & 15);
-#endif
-}
-
-static inline vector4x_u32
-vec_load_le (unsigned long offset, const unsigned char *ptr)
-{
- vector4x_u32 vec;
- vec = vec_vsx_ld (offset, (const uint32_t *)ptr);
-#if __BYTE_ORDER == __BIG_ENDIAN
- vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec,
- le_bswap_const);
-#endif
- return vec;
-}
-
-static inline void
-vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr)
-{
-#if __BYTE_ORDER == __BIG_ENDIAN
- vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec,
- le_bswap_const);
-#endif
- vec_vsx_st (vec, offset, (uint32_t *)ptr);
-}
-
-
-static inline vector4x_u32
-vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a)
-{
-#if __BYTE_ORDER == __BIG_ENDIAN
- static const vector16x_u8 swap32 =
- { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 };
- vector2x_u64 vec, add, sum;
-
- vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32);
- add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32);
- sum = vec + add;
- return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32);
-#else
- return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a));
-#endif
-}
-
-/**********************************************************************
- 4-way chacha20
- **********************************************************************/
-
-#define ROTATE(v1,rolv) \
- __asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv))
-
-#define PLUS(ds,s) \
- ((ds) += (s))
-
-#define XOR(ds,s) \
- ((ds) ^= (s))
-
-#define ADD_U64(v,a) \
- (v = vec_add_ctr_u64(v, a))
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0, x1, x2, x3) ({ \
- vector4x_u32 t1 = vec_mergeh(x0, x2); \
- vector4x_u32 t2 = vec_mergel(x0, x2); \
- vector4x_u32 t3 = vec_mergeh(x1, x3); \
- x3 = vec_mergel(x1, x3); \
- x0 = vec_mergeh(t1, t3); \
- x1 = vec_mergel(t1, t3); \
- x2 = vec_mergeh(t2, x3); \
- x3 = vec_mergel(t2, x3); \
- })
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2) \
- PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
- ROTATE(d1, rotate_16); ROTATE(d2, rotate_16); \
- PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
- ROTATE(b1, rotate_12); ROTATE(b2, rotate_12); \
- PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
- ROTATE(d1, rotate_8); ROTATE(d2, rotate_8); \
- PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
- ROTATE(b1, rotate_7); ROTATE(b2, rotate_7);
-
-unsigned int attribute_hidden
-__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src,
- size_t nblks)
-{
- vector4x_u32 counters_0123 = { 0, 1, 2, 3 };
- vector4x_u32 counter_4 = { 4, 0, 0, 0 };
- vector4x_u32 rotate_16 = { 16, 16, 16, 16 };
- vector4x_u32 rotate_12 = { 12, 12, 12, 12 };
- vector4x_u32 rotate_8 = { 8, 8, 8, 8 };
- vector4x_u32 rotate_7 = { 7, 7, 7, 7 };
- vector4x_u32 state0, state1, state2, state3;
- vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7;
- vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15;
- vector4x_u32 tmp;
- int i;
-
- /* Force preload of constants to vector registers. */
- __asm__ ("": "+v" (counters_0123) :: "memory");
- __asm__ ("": "+v" (counter_4) :: "memory");
- __asm__ ("": "+v" (rotate_16) :: "memory");
- __asm__ ("": "+v" (rotate_12) :: "memory");
- __asm__ ("": "+v" (rotate_8) :: "memory");
- __asm__ ("": "+v" (rotate_7) :: "memory");
-
- state0 = vec_vsx_ld (0 * 16, state);
- state1 = vec_vsx_ld (1 * 16, state);
- state2 = vec_vsx_ld (2 * 16, state);
- state3 = vec_vsx_ld (3 * 16, state);
-
- do
- {
- v0 = vec_splat (state0, 0);
- v1 = vec_splat (state0, 1);
- v2 = vec_splat (state0, 2);
- v3 = vec_splat (state0, 3);
- v4 = vec_splat (state1, 0);
- v5 = vec_splat (state1, 1);
- v6 = vec_splat (state1, 2);
- v7 = vec_splat (state1, 3);
- v8 = vec_splat (state2, 0);
- v9 = vec_splat (state2, 1);
- v10 = vec_splat (state2, 2);
- v11 = vec_splat (state2, 3);
- v12 = vec_splat (state3, 0);
- v13 = vec_splat (state3, 1);
- v14 = vec_splat (state3, 2);
- v15 = vec_splat (state3, 3);
-
- v12 += counters_0123;
- v13 -= vec_cmplt (v12, counters_0123);
-
- for (i = 20; i > 0; i -= 2)
- {
- QUARTERROUND2 (v0, v4, v8, v12, v1, v5, v9, v13)
- QUARTERROUND2 (v2, v6, v10, v14, v3, v7, v11, v15)
- QUARTERROUND2 (v0, v5, v10, v15, v1, v6, v11, v12)
- QUARTERROUND2 (v2, v7, v8, v13, v3, v4, v9, v14)
- }
-
- v0 += vec_splat (state0, 0);
- v1 += vec_splat (state0, 1);
- v2 += vec_splat (state0, 2);
- v3 += vec_splat (state0, 3);
- v4 += vec_splat (state1, 0);
- v5 += vec_splat (state1, 1);
- v6 += vec_splat (state1, 2);
- v7 += vec_splat (state1, 3);
- v8 += vec_splat (state2, 0);
- v9 += vec_splat (state2, 1);
- v10 += vec_splat (state2, 2);
- v11 += vec_splat (state2, 3);
- tmp = vec_splat( state3, 0);
- tmp += counters_0123;
- v12 += tmp;
- v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123);
- v14 += vec_splat (state3, 2);
- v15 += vec_splat (state3, 3);
- ADD_U64 (state3, counter_4);
-
- transpose_4x4 (v0, v1, v2, v3);
- transpose_4x4 (v4, v5, v6, v7);
- transpose_4x4 (v8, v9, v10, v11);
- transpose_4x4 (v12, v13, v14, v15);
-
- vec_store_le (v0, (64 * 0 + 16 * 0), dst);
- vec_store_le (v1, (64 * 1 + 16 * 0), dst);
- vec_store_le (v2, (64 * 2 + 16 * 0), dst);
- vec_store_le (v3, (64 * 3 + 16 * 0), dst);
-
- vec_store_le (v4, (64 * 0 + 16 * 1), dst);
- vec_store_le (v5, (64 * 1 + 16 * 1), dst);
- vec_store_le (v6, (64 * 2 + 16 * 1), dst);
- vec_store_le (v7, (64 * 3 + 16 * 1), dst);
-
- vec_store_le (v8, (64 * 0 + 16 * 2), dst);
- vec_store_le (v9, (64 * 1 + 16 * 2), dst);
- vec_store_le (v10, (64 * 2 + 16 * 2), dst);
- vec_store_le (v11, (64 * 3 + 16 * 2), dst);
-
- vec_store_le (v12, (64 * 0 + 16 * 3), dst);
- vec_store_le (v13, (64 * 1 + 16 * 3), dst);
- vec_store_le (v14, (64 * 2 + 16 * 3), dst);
- vec_store_le (v15, (64 * 3 + 16 * 3), dst);
-
- src += 4*64;
- dst += 4*64;
-
- nblks -= 4;
- }
- while (nblks);
-
- vec_vsx_st (state3, 3 * 16, state);
-
- return 0;
-}
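For readers comparing against the removed vector code, here is a minimal scalar sketch of what the QUARTERROUND2 macro and the round loop compute for a single block. The function names rotl32, quarter_round and double_round are illustrative only and are not taken from the patch. The rotation amounts (16, 12, 8, 7) and the column/diagonal index pattern are the ones visible in the macro invocations above.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits, for 0 < n < 32.  */
static inline uint32_t
rotl32 (uint32_t v, unsigned int n)
{
  return (v << n) | (v >> (32 - n));
}

/* One ChaCha quarter round on state words a, b, c, d.  */
static void
quarter_round (uint32_t x[16], int a, int b, int c, int d)
{
  x[a] += x[b]; x[d] ^= x[a]; x[d] = rotl32 (x[d], 16);
  x[c] += x[d]; x[b] ^= x[c]; x[b] = rotl32 (x[b], 12);
  x[a] += x[b]; x[d] ^= x[a]; x[d] = rotl32 (x[d], 8);
  x[c] += x[d]; x[b] ^= x[c]; x[b] = rotl32 (x[b], 7);
}

/* One double round: four column rounds, then four diagonal rounds.
   ChaCha20 runs ten of these, which is why the vector loop counts
   down from 20 in steps of 2.  */
static void
double_round (uint32_t x[16])
{
  quarter_round (x, 0, 4, 8, 12);
  quarter_round (x, 1, 5, 9, 13);
  quarter_round (x, 2, 6, 10, 14);
  quarter_round (x, 3, 7, 11, 15);
  quarter_round (x, 0, 5, 10, 15);
  quarter_round (x, 1, 6, 11, 12);
  quarter_round (x, 2, 7, 8, 13);
  quarter_round (x, 3, 4, 9, 14);
}
```

The vector versions perform the same arithmetic, but each vector register holds one state word from four independent blocks, so a single macro invocation advances all four blocks at once.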
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
deleted file mode 100644
index ded06762b6..0000000000
--- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
+++ /dev/null
@@ -1,37 +0,0 @@
-/* PowerPC optimization for ChaCha20.
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
- const uint8_t *src, size_t nblks)
- attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
- const uint8_t *src, size_t bytes)
-{
- _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
- "CHACHA20_BUFSIZE not multiple of 4");
- _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
- "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
- __chacha20_power8_blocks4 (state, dst, src,
- CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-}
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index 96c110f490..66ed844e68 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache
endif

endif # $(subdir) == elf
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += \
- chacha20-s390x \
- # sysdep_routines
-endif
diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S
deleted file mode 100644
index e38504d370..0000000000
--- a/sysdeps/s390/s390-64/chacha20-s390x.S
+++ /dev/null
@@ -1,573 +0,0 @@
-/* Optimized s390x implementation of ChaCha20 cipher.
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-/* chacha20-s390x.S - zSeries implementation of ChaCha20 cipher
-
- Copyright (C) 2020 Jussi Kivilinna <[email protected]>
-
- This file is part of Libgcrypt.
-
- Libgcrypt is free software; you can redistribute it and/or modify
- it under the terms of the GNU Lesser General Public License as
- published by the Free Software Foundation; either version 2.1 of
- the License, or (at your option) any later version.
-
- Libgcrypt is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-#include <sysdep.h>
-
-#ifdef HAVE_S390_VX_ASM_SUPPORT
-
-/* CFA expressions are used for pointing CFA and registers to
- * SP relative offsets. */
-# define DW_REGNO_SP 15
-
-/* Fixed length encoding used for integers for now. */
-# define DW_SLEB128_7BIT(value) \
- 0x00|((value) & 0x7f)
-# define DW_SLEB128_28BIT(value) \
- 0x80|((value)&0x7f), \
- 0x80|(((value)>>7)&0x7f), \
- 0x80|(((value)>>14)&0x7f), \
- 0x00|(((value)>>21)&0x7f)
-
-# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \
- .cfi_escape \
- 0x0f, /* DW_CFA_def_cfa_expression */ \
- DW_SLEB128_7BIT(11), /* length */ \
- 0x7f, /* DW_OP_breg15, rsp + constant */ \
- DW_SLEB128_28BIT(rsp_offs), \
- 0x06, /* DW_OP_deref */ \
- 0x23, /* DW_OP_plus_constu */ \
- DW_SLEB128_28BIT((cfa_depth)+160)
-
-.machine "z13+vx"
-.text
-
-.balign 16
-.Lconsts:
-.Lwordswap:
- .byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3
-.Lbswap128:
- .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-.Lbswap32:
- .byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
-.Lone:
- .long 0, 0, 0, 1
-.Ladd_counter_0123:
- .long 0, 1, 2, 3
-.Ladd_counter_4567:
- .long 4, 5, 6, 7
-
-/* register macros */
-#define INPUT %r2
-#define DST %r3
-#define SRC %r4
-#define NBLKS %r0
-#define ROUND %r1
-
-/* stack structure */
-
-#define STACK_FRAME_STD (8 * 16 + 8 * 4)
-#define STACK_FRAME_F8_F15 (8 * 8)
-#define STACK_FRAME_Y0_Y15 (16 * 16)
-#define STACK_FRAME_CTR (4 * 16)
-#define STACK_FRAME_PARAMS (6 * 8)
-
-#define STACK_MAX (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \
- STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \
- STACK_FRAME_PARAMS)
-
-#define STACK_F8 (STACK_MAX - STACK_FRAME_F8_F15)
-#define STACK_F9 (STACK_F8 + 8)
-#define STACK_F10 (STACK_F9 + 8)
-#define STACK_F11 (STACK_F10 + 8)
-#define STACK_F12 (STACK_F11 + 8)
-#define STACK_F13 (STACK_F12 + 8)
-#define STACK_F14 (STACK_F13 + 8)
-#define STACK_F15 (STACK_F14 + 8)
-#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15)
-#define STACK_CTR (STACK_Y0_Y15 - STACK_FRAME_CTR)
-#define STACK_INPUT (STACK_CTR - STACK_FRAME_PARAMS)
-#define STACK_DST (STACK_INPUT + 8)
-#define STACK_SRC (STACK_DST + 8)
-#define STACK_NBLKS (STACK_SRC + 8)
-#define STACK_POCTX (STACK_NBLKS + 8)
-#define STACK_POSRC (STACK_POCTX + 8)
-
-#define STACK_G0_H3 STACK_Y0_Y15
-
-/* vector registers */
-#define A0 %v0
-#define A1 %v1
-#define A2 %v2
-#define A3 %v3
-
-#define B0 %v4
-#define B1 %v5
-#define B2 %v6
-#define B3 %v7
-
-#define C0 %v8
-#define C1 %v9
-#define C2 %v10
-#define C3 %v11
-
-#define D0 %v12
-#define D1 %v13
-#define D2 %v14
-#define D3 %v15
-
-#define E0 %v16
-#define E1 %v17
-#define E2 %v18
-#define E3 %v19
-
-#define F0 %v20
-#define F1 %v21
-#define F2 %v22
-#define F3 %v23
-
-#define G0 %v24
-#define G1 %v25
-#define G2 %v26
-#define G3 %v27
-
-#define H0 %v28
-#define H1 %v29
-#define H2 %v30
-#define H3 %v31
-
-#define IO0 E0
-#define IO1 E1
-#define IO2 E2
-#define IO3 E3
-#define IO4 F0
-#define IO5 F1
-#define IO6 F2
-#define IO7 F3
-
-#define S0 G0
-#define S1 G1
-#define S2 G2
-#define S3 G3
-
-#define TMP0 H0
-#define TMP1 H1
-#define TMP2 H2
-#define TMP3 H3
-
-#define X0 A0
-#define X1 A1
-#define X2 A2
-#define X3 A3
-#define X4 B0
-#define X5 B1
-#define X6 B2
-#define X7 B3
-#define X8 C0
-#define X9 C1
-#define X10 C2
-#define X11 C3
-#define X12 D0
-#define X13 D1
-#define X14 D2
-#define X15 D3
-
-#define Y0 E0
-#define Y1 E1
-#define Y2 E2
-#define Y3 E3
-#define Y4 F0
-#define Y5 F1
-#define Y6 F2
-#define Y7 F3
-#define Y8 G0
-#define Y9 G1
-#define Y10 G2
-#define Y11 G3
-#define Y12 H0
-#define Y13 H1
-#define Y14 H2
-#define Y15 H3
-
-/**********************************************************************
- helper macros
- **********************************************************************/
-
-#define _ /*_*/
-
-#define START_STACK(last_r) \
- lgr %r0, %r15; \
- lghi %r1, ~15; \
- stmg %r6, last_r, 6 * 8(%r15); \
- aghi %r0, -STACK_MAX; \
- ngr %r0, %r1; \
- lgr %r1, %r15; \
- cfi_def_cfa_register(1); \
- lgr %r15, %r0; \
- stg %r1, 0(%r15); \
- cfi_cfa_on_stack(0, 0); \
- std %f8, STACK_F8(%r15); \
- std %f9, STACK_F9(%r15); \
- std %f10, STACK_F10(%r15); \
- std %f11, STACK_F11(%r15); \
- std %f12, STACK_F12(%r15); \
- std %f13, STACK_F13(%r15); \
- std %f14, STACK_F14(%r15); \
- std %f15, STACK_F15(%r15);
-
-#define END_STACK(last_r) \
- lg %r1, 0(%r15); \
- ld %f8, STACK_F8(%r15); \
- ld %f9, STACK_F9(%r15); \
- ld %f10, STACK_F10(%r15); \
- ld %f11, STACK_F11(%r15); \
- ld %f12, STACK_F12(%r15); \
- ld %f13, STACK_F13(%r15); \
- ld %f14, STACK_F14(%r15); \
- ld %f15, STACK_F15(%r15); \
- lmg %r6, last_r, 6 * 8(%r1); \
- lgr %r15, %r1; \
- cfi_def_cfa_register(DW_REGNO_SP);
-
-#define PLUS(dst,src) \
- vaf dst, dst, src;
-
-#define XOR(dst,src) \
- vx dst, dst, src;
-
-#define ROTATE(v1,c) \
- verllf v1, v1, (c)(0);
-
-#define WORD_ROTATE(v1,s) \
- vsldb v1, v1, v1, ((s) * 4);
-
-#define DST_8(OPER, I, J) \
- OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \
- OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J);
-
-/**********************************************************************
- round macros
- **********************************************************************/
-
-/**********************************************************************
- 8-way chacha20 ("vertical")
- **********************************************************************/
-
-#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
- x8,x9,x10,x11,x12,x13,x14,x15,\
- y0,y1,y2,y3,y4,y5,y6,y7,\
- y8,y9,y10,y11,y12,y13,y14,y15,\
- op1,op2,op3,op4,op5,op6,op7,op8,\
- op9,op10,op11,op12) \
- op1; \
- PLUS(x0, x1); PLUS(x4, x5); \
- PLUS(x8, x9); PLUS(x12, x13); \
- PLUS(y0, y1); PLUS(y4, y5); \
- PLUS(y8, y9); PLUS(y12, y13); \
- op2; \
- XOR(x3, x0); XOR(x7, x4); \
- XOR(x11, x8); XOR(x15, x12); \
- XOR(y3, y0); XOR(y7, y4); \
- XOR(y11, y8); XOR(y15, y12); \
- op3; \
- ROTATE(x3, 16); ROTATE(x7, 16); \
- ROTATE(x11, 16); ROTATE(x15, 16); \
- ROTATE(y3, 16); ROTATE(y7, 16); \
- ROTATE(y11, 16); ROTATE(y15, 16); \
- op4; \
- PLUS(x2, x3); PLUS(x6, x7); \
- PLUS(x10, x11); PLUS(x14, x15); \
- PLUS(y2, y3); PLUS(y6, y7); \
- PLUS(y10, y11); PLUS(y14, y15); \
- op5; \
- XOR(x1, x2); XOR(x5, x6); \
- XOR(x9, x10); XOR(x13, x14); \
- XOR(y1, y2); XOR(y5, y6); \
- XOR(y9, y10); XOR(y13, y14); \
- op6; \
- ROTATE(x1,12); ROTATE(x5,12); \
- ROTATE(x9,12); ROTATE(x13,12); \
- ROTATE(y1,12); ROTATE(y5,12); \
- ROTATE(y9,12); ROTATE(y13,12); \
- op7; \
- PLUS(x0, x1); PLUS(x4, x5); \
- PLUS(x8, x9); PLUS(x12, x13); \
- PLUS(y0, y1); PLUS(y4, y5); \
- PLUS(y8, y9); PLUS(y12, y13); \
- op8; \
- XOR(x3, x0); XOR(x7, x4); \
- XOR(x11, x8); XOR(x15, x12); \
- XOR(y3, y0); XOR(y7, y4); \
- XOR(y11, y8); XOR(y15, y12); \
- op9; \
- ROTATE(x3,8); ROTATE(x7,8); \
- ROTATE(x11,8); ROTATE(x15,8); \
- ROTATE(y3,8); ROTATE(y7,8); \
- ROTATE(y11,8); ROTATE(y15,8); \
- op10; \
- PLUS(x2, x3); PLUS(x6, x7); \
- PLUS(x10, x11); PLUS(x14, x15); \
- PLUS(y2, y3); PLUS(y6, y7); \
- PLUS(y10, y11); PLUS(y14, y15); \
- op11; \
- XOR(x1, x2); XOR(x5, x6); \
- XOR(x9, x10); XOR(x13, x14); \
- XOR(y1, y2); XOR(y5, y6); \
- XOR(y9, y10); XOR(y13, y14); \
- op12; \
- ROTATE(x1,7); ROTATE(x5,7); \
- ROTATE(x9,7); ROTATE(x13,7); \
- ROTATE(y1,7); ROTATE(y5,7); \
- ROTATE(y9,7); ROTATE(y13,7);
-
-#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\
- y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \
- QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
- x8,x9,x10,x11,x12,x13,x14,x15,\
- y0,y1,y2,y3,y4,y5,y6,y7,\
- y8,y9,y10,y11,y12,y13,y14,y15,\
- ,,,,,,,,,,,)
-
-#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \
- vmrhf tmp0, v0, v1; \
- vmrhf tmp1, v2, v3; \
- vmrlf tmp2, v0, v1; \
- vmrlf v3, v2, v3; \
- vmrhf tmpa, va, vb; \
- vmrhf tmpb, vc, vd; \
- vmrlf tmpc, va, vb; \
- vmrlf vd, vc, vd; \
- vpdi v0, tmp0, tmp1, 0; \
- vpdi v1, tmp0, tmp1, 5; \
- vpdi v2, tmp2, v3, 0; \
- vpdi v3, tmp2, v3, 5; \
- vpdi va, tmpa, tmpb, 0; \
- vpdi vb, tmpa, tmpb, 5; \
- vpdi vc, tmpc, vd, 0; \
- vpdi vd, tmpc, vd, 5;
-
-.balign 8
-.globl __chacha20_s390x_vx_blocks8
-ENTRY (__chacha20_s390x_vx_blocks8)
- /* input:
- * %r2: input
- * %r3: dst
- * %r4: src
- * %r5: nblks (multiple of 8)
- */
-
- START_STACK(%r8);
- lgr NBLKS, %r5;
-
- larl %r7, .Lconsts;
-
- /* Load counter. */
- lg %r8, (12 * 4)(INPUT);
- rllg %r8, %r8, 32;
-
-.balign 4
- /* Process eight chacha20 blocks per loop. */
-.Lloop8:
- vlm Y0, Y3, 0(INPUT);
-
- slgfi NBLKS, 8;
- lghi ROUND, (20 / 2);
-
- /* Construct counter vectors X12/X13 & Y12/Y13. */
- vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7);
- vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7);
- vrepf Y12, Y3, 0;
- vrepf Y13, Y3, 1;
- vaccf X5, Y12, X4;
- vaccf Y5, Y12, Y4;
- vaf X12, Y12, X4;
- vaf Y12, Y12, Y4;
- vaf X13, Y13, X5;
- vaf Y13, Y13, Y5;
-
- vrepf X0, Y0, 0;
- vrepf X1, Y0, 1;
- vrepf X2, Y0, 2;
- vrepf X3, Y0, 3;
- vrepf X4, Y1, 0;
- vrepf X5, Y1, 1;
- vrepf X6, Y1, 2;
- vrepf X7, Y1, 3;
- vrepf X8, Y2, 0;
- vrepf X9, Y2, 1;
- vrepf X10, Y2, 2;
- vrepf X11, Y2, 3;
- vrepf X14, Y3, 2;
- vrepf X15, Y3, 3;
-
- /* Store counters for blocks 0-7. */
- vstm X12, X13, (STACK_CTR + 0 * 16)(%r15);
- vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15);
-
- vlr Y0, X0;
- vlr Y1, X1;
- vlr Y2, X2;
- vlr Y3, X3;
- vlr Y4, X4;
- vlr Y5, X5;
- vlr Y6, X6;
- vlr Y7, X7;
- vlr Y8, X8;
- vlr Y9, X9;
- vlr Y10, X10;
- vlr Y11, X11;
- vlr Y14, X14;
- vlr Y15, X15;
-
- /* Update and store counter. */
- agfi %r8, 8;
- rllg %r5, %r8, 32;
- stg %r5, (12 * 4)(INPUT);
-
-.balign 4
-.Lround2_8:
- QUARTERROUND4_V8(X0, X4, X8, X12, X1, X5, X9, X13,
- X2, X6, X10, X14, X3, X7, X11, X15,
- Y0, Y4, Y8, Y12, Y1, Y5, Y9, Y13,
- Y2, Y6, Y10, Y14, Y3, Y7, Y11, Y15);
- QUARTERROUND4_V8(X0, X5, X10, X15, X1, X6, X11, X12,
- X2, X7, X8, X13, X3, X4, X9, X14,
- Y0, Y5, Y10, Y15, Y1, Y6, Y11, Y12,
- Y2, Y7, Y8, Y13, Y3, Y4, Y9, Y14);
- brctg ROUND, .Lround2_8;
-
- /* Store blocks 4-7. */
- vstm Y0, Y15, STACK_Y0_Y15(%r15);
-
- /* Load counters for blocks 0-3. */
- vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15);
-
- lghi ROUND, 1;
- j .Lfirst_output_4blks_8;
-
-.balign 4
-.Lsecond_output_4blks_8:
- /* Load blocks 4-7. */
- vlm X0, X15, STACK_Y0_Y15(%r15);
-
- /* Load counters for blocks 4-7. */
- vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15);
-
- lghi ROUND, 0;
-
-.balign 4
- /* Output four chacha20 blocks per loop. */
-.Lfirst_output_4blks_8:
- vlm Y12, Y15, 0(INPUT);
- PLUS(X12, Y0);
- PLUS(X13, Y1);
- vrepf Y0, Y12, 0;
- vrepf Y1, Y12, 1;
- vrepf Y2, Y12, 2;
- vrepf Y3, Y12, 3;
- vrepf Y4, Y13, 0;
- vrepf Y5, Y13, 1;
- vrepf Y6, Y13, 2;
- vrepf Y7, Y13, 3;
- vrepf Y8, Y14, 0;
- vrepf Y9, Y14, 1;
- vrepf Y10, Y14, 2;
- vrepf Y11, Y14, 3;
- vrepf Y14, Y15, 2;
- vrepf Y15, Y15, 3;
- PLUS(X0, Y0);
- PLUS(X1, Y1);
- PLUS(X2, Y2);
- PLUS(X3, Y3);
- PLUS(X4, Y4);
- PLUS(X5, Y5);
- PLUS(X6, Y6);
- PLUS(X7, Y7);
- PLUS(X8, Y8);
- PLUS(X9, Y9);
- PLUS(X10, Y10);
- PLUS(X11, Y11);
- PLUS(X14, Y14);
- PLUS(X15, Y15);
-
- vl Y15, (.Lbswap32 - .Lconsts)(%r7);
- TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7,
- Y9, Y10, Y11, Y12, Y13, Y14);
- TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15,
- Y9, Y10, Y11, Y12, Y13, Y14);
-
- vlm Y0, Y14, 0(SRC);
- vperm X0, X0, X0, Y15;
- vperm X1, X1, X1, Y15;
- vperm X2, X2, X2, Y15;
- vperm X3, X3, X3, Y15;
- vperm X4, X4, X4, Y15;
- vperm X5, X5, X5, Y15;
- vperm X6, X6, X6, Y15;
- vperm X7, X7, X7, Y15;
- vperm X8, X8, X8, Y15;
- vperm X9, X9, X9, Y15;
- vperm X10, X10, X10, Y15;
- vperm X11, X11, X11, Y15;
- vperm X12, X12, X12, Y15;
- vperm X13, X13, X13, Y15;
- vperm X14, X14, X14, Y15;
- vperm X15, X15, X15, Y15;
- vl Y15, (15 * 16)(SRC);
-
- XOR(Y0, X0);
- XOR(Y1, X4);
- XOR(Y2, X8);
- XOR(Y3, X12);
- XOR(Y4, X1);
- XOR(Y5, X5);
- XOR(Y6, X9);
- XOR(Y7, X13);
- XOR(Y8, X2);
- XOR(Y9, X6);
- XOR(Y10, X10);
- XOR(Y11, X14);
- XOR(Y12, X3);
- XOR(Y13, X7);
- XOR(Y14, X11);
- XOR(Y15, X15);
- vstm Y0, Y15, 0(DST);
-
- aghi SRC, 256;
- aghi DST, 256;
-
- clgije ROUND, 1, .Lsecond_output_4blks_8;
-
- clgijhe NBLKS, 8, .Lloop8;
-
-
- END_STACK(%r8);
- xgr %r2, %r2;
- br %r14;
-END (__chacha20_s390x_vx_blocks8)
-
-#endif /* HAVE_S390_VX_ASM_SUPPORT */
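Both the POWER8 and the s390x code removed above build per-block counters by adding a lane index to the low counter word and propagating the carry into the high word (vec_cmplt on POWER, vaccf on s390x). A rough scalar sketch of that step, assuming the 64-bit block counter is split across two 32-bit state words as in those routines, could look as follows. The function name counter_lanes is made up for illustration.

```c
#include <stdint.h>

/* Fill lo[i] and hi[i] with the low and high 32-bit halves of the
   64-bit value (base_hi:base_lo) + i, for i in [0, lanes).  */
static void
counter_lanes (uint32_t base_lo, uint32_t base_hi, unsigned int lanes,
	       uint32_t *lo, uint32_t *hi)
{
  for (unsigned int i = 0; i < lanes; i++)
    {
      lo[i] = base_lo + i;		/* Wraps modulo 2^32.  */
      hi[i] = base_hi + (lo[i] < i);	/* Carry if the low half wrapped.  */
    }
}
```

The comparison lo[i] < i is true exactly when the 32-bit addition overflowed, which is the same test the vector code expresses as a compare mask or a carry-out vector.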
diff --git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h
deleted file mode 100644
index 0c6abf77e8..0000000000
--- a/sysdeps/s390/s390-64/chacha20_arch.h
+++ /dev/null
@@ -1,45 +0,0 @@
-/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-#include <sys/auxv.h>
-
-unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst,
- const uint8_t *src, size_t nblks)
- attribute_hidden;
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
- size_t bytes)
-{
-#ifdef HAVE_S390_VX_ASM_SUPPORT
- _Static_assert (CHACHA20_BUFSIZE % 8 == 0,
- "CHACHA20_BUFSIZE not multiple of 8");
- _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
- "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
-
- if (GLRO(dl_hwcap) & HWCAP_S390_VX)
- {
- __chacha20_s390x_vx_blocks8 (state, dst, src,
- CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
- return;
- }
-#endif
- chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c
index 0326ebb767..c8a9ed2d40 100644
--- a/sysdeps/unix/sysv/linux/tls-internal.c
+++ b/sysdeps/unix/sysv/linux/tls-internal.c
@@ -16,7 +16,6 @@
License along with the GNU C Library; if not, see
<https://www.gnu.org/licenses/>. */

-#include <stdlib/arc4random.h>
#include <string.h>
#include <tls-internal.h>

@@ -26,13 +25,4 @@ __glibc_tls_internal_free (void)
struct pthread *self = THREAD_SELF;
free (self->tls_state.strsignal_buf);
free (self->tls_state.strerror_l_buf);
-
- if (self->tls_state.rand_state != NULL)
- {
- /* Clear any lingering random state prior so if the thread stack is
- cached it won't leak any data. */
- explicit_bzero (self->tls_state.rand_state,
- sizeof (*self->tls_state.rand_state));
- free (self->tls_state.rand_state);
- }
}
diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
index 1178475d75..c19bef2dec 100644
--- a/sysdeps/x86_64/Makefile
+++ b/sysdeps/x86_64/Makefile
@@ -5,13 +5,6 @@ ifeq ($(subdir),csu)
gen-as-const-headers += link-defines.sym
endif

-ifeq ($(subdir),stdlib)
-sysdep_routines += \
- chacha20-amd64-sse2 \
- chacha20-amd64-avx2 \
- # sysdep_routines
-endif
-
ifeq ($(subdir),gmon)
sysdep_routines += _mcount
# We cannot compile _mcount.S with -pg because that would create
diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S
deleted file mode 100644
index aefd1cdbd0..0000000000
--- a/sysdeps/x86_64/chacha20-amd64-avx2.S
+++ /dev/null
@@ -1,328 +0,0 @@
-/* Optimized AVX2 implementation of ChaCha20 cipher.
- Copyright (C) 2022 Free Software Foundation, Inc.
-
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-/* chacha20-amd64-avx2.S - AVX2 implementation of ChaCha20 cipher
-
- Copyright (C) 2017-2019 Jussi Kivilinna <[email protected]>
-
- This file is part of Libgcrypt.
-
- Libgcrypt is free software; you can redistribute it and/or modify
- it under the terms of the GNU Lesser General Public License as
- published by the Free Software Foundation; either version 2.1 of
- the License, or (at your option) any later version.
-
- Libgcrypt is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with this program; if not, see <https://www.gnu.org/licenses/>.
-*/
-
-/* Based on D. J. Bernstein reference implementation at
- http://cr.yp.to/chacha.html:
-
- chacha-regs.c version 20080118
- D. J. Bernstein
- Public domain. */
-
-#include <sysdep.h>
-
-#ifdef PIC
-# define rRIP (%rip)
-#else
-# define rRIP
-#endif
-
-/* register macros */
-#define INPUT %rdi
-#define DST %rsi
-#define SRC %rdx
-#define NBLKS %rcx
-#define ROUND %eax
-
-/* stack structure */
-#define STACK_VEC_X12 (32)
-#define STACK_VEC_X13 (32 + STACK_VEC_X12)
-#define STACK_TMP (32 + STACK_VEC_X13)
-#define STACK_TMP1 (32 + STACK_TMP)
-
-#define STACK_MAX (32 + STACK_TMP1)
-
-/* vector registers */
-#define X0 %ymm0
-#define X1 %ymm1
-#define X2 %ymm2
-#define X3 %ymm3
-#define X4 %ymm4
-#define X5 %ymm5
-#define X6 %ymm6
-#define X7 %ymm7
-#define X8 %ymm8
-#define X9 %ymm9
-#define X10 %ymm10
-#define X11 %ymm11
-#define X12 %ymm12
-#define X13 %ymm13
-#define X14 %ymm14
-#define X15 %ymm15
-
-#define X0h %xmm0
-#define X1h %xmm1
-#define X2h %xmm2
-#define X3h %xmm3
-#define X4h %xmm4
-#define X5h %xmm5
-#define X6h %xmm6
-#define X7h %xmm7
-#define X8h %xmm8
-#define X9h %xmm9
-#define X10h %xmm10
-#define X11h %xmm11
-#define X12h %xmm12
-#define X13h %xmm13
-#define X14h %xmm14
-#define X15h %xmm15
-
-/**********************************************************************
- helper macros
- **********************************************************************/
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0,x1,x2,x3,t1,t2) \
- vpunpckhdq x1, x0, t2; \
- vpunpckldq x1, x0, x0; \
- \
- vpunpckldq x3, x2, t1; \
- vpunpckhdq x3, x2, x2; \
- \
- vpunpckhqdq t1, x0, x1; \
- vpunpcklqdq t1, x0, x0; \
- \
- vpunpckhqdq x2, t2, x3; \
- vpunpcklqdq x2, t2, x2;
-
-/* 2x2 128-bit matrix transpose */
-#define transpose_16byte_2x2(x0,x1,t1) \
- vmovdqa x0, t1; \
- vperm2i128 $0x20, x1, x0, x0; \
- vperm2i128 $0x31, x1, t1, x1;
-
-/**********************************************************************
- 8-way chacha20
- **********************************************************************/
-
-#define ROTATE2(v1,v2,c,tmp) \
- vpsrld $(32 - (c)), v1, tmp; \
- vpslld $(c), v1, v1; \
- vpaddb tmp, v1, v1; \
- vpsrld $(32 - (c)), v2, tmp; \
- vpslld $(c), v2, v2; \
- vpaddb tmp, v2, v2;
-
-#define ROTATE_SHUF_2(v1,v2,shuf) \
- vpshufb shuf, v1, v1; \
- vpshufb shuf, v2, v2;
-
-#define XOR(ds,s) \
- vpxor s, ds, ds;
-
-#define PLUS(ds,s) \
- vpaddd s, ds, ds;
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\
- interleave_op1,interleave_op2,\
- interleave_op3,interleave_op4) \
- vbroadcasti128 .Lshuf_rol16 rRIP, tmp1; \
- interleave_op1; \
- PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
- ROTATE_SHUF_2(d1, d2, tmp1); \
- interleave_op2; \
- PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
- ROTATE2(b1, b2, 12, tmp1); \
- vbroadcasti128 .Lshuf_rol8 rRIP, tmp1; \
- interleave_op3; \
- PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
- ROTATE_SHUF_2(d1, d2, tmp1); \
- interleave_op4; \
- PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
- ROTATE2(b1, b2, 7, tmp1);
-
- .section .text.avx2, "ax", @progbits
- .align 32
-chacha20_data:
-L(shuf_rol16):
- .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
-L(shuf_rol8):
- .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
-L(inc_counter):
- .byte 0,1,2,3,4,5,6,7
-L(unsigned_cmp):
- .long 0x80000000
-
- .hidden __chacha20_avx2_blocks8
-ENTRY (__chacha20_avx2_blocks8)
- /* input:
- * %rdi: input
- * %rsi: dst
- * %rdx: src
- * %rcx: nblks (multiple of 8)
- */
- vzeroupper;
-
- pushq %rbp;
- cfi_adjust_cfa_offset(8);
- cfi_rel_offset(rbp, 0)
- movq %rsp, %rbp;
- cfi_def_cfa_register(rbp);
-
- subq $STACK_MAX, %rsp;
- andq $~31, %rsp;
-
-L(loop8):
- mov $20, ROUND;
-
- /* Construct counter vectors X12 and X13 */
- vpmovzxbd L(inc_counter) rRIP, X0;
- vpbroadcastd L(unsigned_cmp) rRIP, X2;
- vpbroadcastd (12 * 4)(INPUT), X12;
- vpbroadcastd (13 * 4)(INPUT), X13;
- vpaddd X0, X12, X12;
- vpxor X2, X0, X0;
- vpxor X2, X12, X1;
- vpcmpgtd X1, X0, X0;
- vpsubd X0, X13, X13;
- vmovdqa X12, (STACK_VEC_X12)(%rsp);
- vmovdqa X13, (STACK_VEC_X13)(%rsp);
-
- /* Load vectors */
- vpbroadcastd (0 * 4)(INPUT), X0;
- vpbroadcastd (1 * 4)(INPUT), X1;
- vpbroadcastd (2 * 4)(INPUT), X2;
- vpbroadcastd (3 * 4)(INPUT), X3;
- vpbroadcastd (4 * 4)(INPUT), X4;
- vpbroadcastd (5 * 4)(INPUT), X5;
- vpbroadcastd (6 * 4)(INPUT), X6;
- vpbroadcastd (7 * 4)(INPUT), X7;
- vpbroadcastd (8 * 4)(INPUT), X8;
- vpbroadcastd (9 * 4)(INPUT), X9;
- vpbroadcastd (10 * 4)(INPUT), X10;
- vpbroadcastd (11 * 4)(INPUT), X11;
- vpbroadcastd (14 * 4)(INPUT), X14;
- vpbroadcastd (15 * 4)(INPUT), X15;
- vmovdqa X15, (STACK_TMP)(%rsp);
-
-L(round2):
- QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X15,,,,)
- vmovdqa (STACK_TMP)(%rsp), X15;
- vmovdqa X8, (STACK_TMP)(%rsp);
- QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,,,,)
- QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,,,,)
- vmovdqa (STACK_TMP)(%rsp), X8;
- vmovdqa X15, (STACK_TMP)(%rsp);
- QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X15,,,,)
- sub $2, ROUND;
- jnz L(round2);
-
- vmovdqa X8, (STACK_TMP1)(%rsp);
-
- /* tmp := X15 */
- vpbroadcastd (0 * 4)(INPUT), X15;
- PLUS(X0, X15);
- vpbroadcastd (1 * 4)(INPUT), X15;
- PLUS(X1, X15);
- vpbroadcastd (2 * 4)(INPUT), X15;
- PLUS(X2, X15);
- vpbroadcastd (3 * 4)(INPUT), X15;
- PLUS(X3, X15);
- vpbroadcastd (4 * 4)(INPUT), X15;
- PLUS(X4, X15);
- vpbroadcastd (5 * 4)(INPUT), X15;
- PLUS(X5, X15);
- vpbroadcastd (6 * 4)(INPUT), X15;
- PLUS(X6, X15);
- vpbroadcastd (7 * 4)(INPUT), X15;
- PLUS(X7, X15);
- transpose_4x4(X0, X1, X2, X3, X8, X15);
- transpose_4x4(X4, X5, X6, X7, X8, X15);
- vmovdqa (STACK_TMP1)(%rsp), X8;
- transpose_16byte_2x2(X0, X4, X15);
- transpose_16byte_2x2(X1, X5, X15);
- transpose_16byte_2x2(X2, X6, X15);
- transpose_16byte_2x2(X3, X7, X15);
- vmovdqa (STACK_TMP)(%rsp), X15;
- vmovdqu X0, (64 * 0 + 16 * 0)(DST)
- vmovdqu X1, (64 * 1 + 16 * 0)(DST)
- vpbroadcastd (8 * 4)(INPUT), X0;
- PLUS(X8, X0);
- vpbroadcastd (9 * 4)(INPUT), X0;
- PLUS(X9, X0);
- vpbroadcastd (10 * 4)(INPUT), X0;
- PLUS(X10, X0);
- vpbroadcastd (11 * 4)(INPUT), X0;
- PLUS(X11, X0);
- vmovdqa (STACK_VEC_X12)(%rsp), X0;
- PLUS(X12, X0);
- vmovdqa (STACK_VEC_X13)(%rsp), X0;
- PLUS(X13, X0);
- vpbroadcastd (14 * 4)(INPUT), X0;
- PLUS(X14, X0);
- vpbroadcastd (15 * 4)(INPUT), X0;
- PLUS(X15, X0);
- vmovdqu X2, (64 * 2 + 16 * 0)(DST)
- vmovdqu X3, (64 * 3 + 16 * 0)(DST)
-
- /* Update counter */
- addq $8, (12 * 4)(INPUT);
-
- transpose_4x4(X8, X9, X10, X11, X0, X1);
- transpose_4x4(X12, X13, X14, X15, X0, X1);
- vmovdqu X4, (64 * 4 + 16 * 0)(DST)
- vmovdqu X5, (64 * 5 + 16 * 0)(DST)
- transpose_16byte_2x2(X8, X12, X0);
- transpose_16byte_2x2(X9, X13, X0);
- transpose_16byte_2x2(X10, X14, X0);
- transpose_16byte_2x2(X11, X15, X0);
- vmovdqu X6, (64 * 6 + 16 * 0)(DST)
- vmovdqu X7, (64 * 7 + 16 * 0)(DST)
- vmovdqu X8, (64 * 0 + 16 * 2)(DST)
- vmovdqu X9, (64 * 1 + 16 * 2)(DST)
- vmovdqu X10, (64 * 2 + 16 * 2)(DST)
- vmovdqu X11, (64 * 3 + 16 * 2)(DST)
- vmovdqu X12, (64 * 4 + 16 * 2)(DST)
- vmovdqu X13, (64 * 5 + 16 * 2)(DST)
- vmovdqu X14, (64 * 6 + 16 * 2)(DST)
- vmovdqu X15, (64 * 7 + 16 * 2)(DST)
-
- sub $8, NBLKS;
- lea (8 * 64)(DST), DST;
- lea (8 * 64)(SRC), SRC;
- jnz L(loop8);
-
- vzeroupper;
-
- /* eax zeroed by round loop. */
- leave;
- cfi_adjust_cfa_offset(-8)
- cfi_def_cfa_register(%rsp);
- ret;
- int3;
-END(__chacha20_avx2_blocks8)
diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S
deleted file mode 100644
index 351a1109c6..0000000000
--- a/sysdeps/x86_64/chacha20-amd64-sse2.S
+++ /dev/null
@@ -1,311 +0,0 @@
-/* Optimized SSE2 implementation of ChaCha20 cipher.
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-/* chacha20-amd64-ssse3.S - SSSE3 implementation of ChaCha20 cipher
-
- Copyright (C) 2017-2019 Jussi Kivilinna <[email protected]>
-
- This file is part of Libgcrypt.
-
- Libgcrypt is free software; you can redistribute it and/or modify
- it under the terms of the GNU Lesser General Public License as
- published by the Free Software Foundation; either version 2.1 of
- the License, or (at your option) any later version.
-
- Libgcrypt is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with this program; if not, see <https://www.gnu.org/licenses/>.
-*/
-
-/* Based on D. J. Bernstein reference implementation at
- http://cr.yp.to/chacha.html:
-
- chacha-regs.c version 20080118
- D. J. Bernstein
- Public domain. */
-
-#include <sysdep.h>
-#include <isa-level.h>
-
-#if MINIMUM_X86_ISA_LEVEL <= 2
-
-#ifdef PIC
-# define rRIP (%rip)
-#else
-# define rRIP
-#endif
-
-/* 'ret' instruction replacement for straight-line speculation mitigation */
-#define ret_spec_stop \
- ret; int3;
-
-/* register macros */
-#define INPUT %rdi
-#define DST %rsi
-#define SRC %rdx
-#define NBLKS %rcx
-#define ROUND %eax
-
-/* stack structure */
-#define STACK_VEC_X12 (16)
-#define STACK_VEC_X13 (16 + STACK_VEC_X12)
-#define STACK_TMP (16 + STACK_VEC_X13)
-#define STACK_TMP1 (16 + STACK_TMP)
-#define STACK_TMP2 (16 + STACK_TMP1)
-
-#define STACK_MAX (16 + STACK_TMP2)
-
-/* vector registers */
-#define X0 %xmm0
-#define X1 %xmm1
-#define X2 %xmm2
-#define X3 %xmm3
-#define X4 %xmm4
-#define X5 %xmm5
-#define X6 %xmm6
-#define X7 %xmm7
-#define X8 %xmm8
-#define X9 %xmm9
-#define X10 %xmm10
-#define X11 %xmm11
-#define X12 %xmm12
-#define X13 %xmm13
-#define X14 %xmm14
-#define X15 %xmm15
-
-/**********************************************************************
- helper macros
- **********************************************************************/
-
-/* 4x4 32-bit integer matrix transpose */
-#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \
- movdqa x0, t2; \
- punpckhdq x1, t2; \
- punpckldq x1, x0; \
- \
- movdqa x2, t1; \
- punpckldq x3, t1; \
- punpckhdq x3, x2; \
- \
- movdqa x0, x1; \
- punpckhqdq t1, x1; \
- punpcklqdq t1, x0; \
- \
- movdqa t2, x3; \
- punpckhqdq x2, x3; \
- punpcklqdq x2, t2; \
- movdqa t2, x2;
-
-/* fill xmm register with 32-bit value from memory */
-#define PBROADCASTD(mem32, xreg) \
- movd mem32, xreg; \
- pshufd $0, xreg, xreg;
-
-/**********************************************************************
- 4-way chacha20
- **********************************************************************/
-
-#define ROTATE2(v1,v2,c,tmp1,tmp2) \
- movdqa v1, tmp1; \
- movdqa v2, tmp2; \
- psrld $(32 - (c)), v1; \
- pslld $(c), tmp1; \
- paddb tmp1, v1; \
- psrld $(32 - (c)), v2; \
- pslld $(c), tmp2; \
- paddb tmp2, v2;
-
-#define XOR(ds,s) \
- pxor s, ds;
-
-#define PLUS(ds,s) \
- paddd s, ds;
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2) \
- PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
- ROTATE2(d1, d2, 16, tmp1, tmp2); \
- PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
- ROTATE2(b1, b2, 12, tmp1, tmp2); \
- PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
- ROTATE2(d1, d2, 8, tmp1, tmp2); \
- PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
- ROTATE2(b1, b2, 7, tmp1, tmp2);
-
- .section .text.sse2,"ax",@progbits
-
-chacha20_data:
- .align 16
-L(counter1):
- .long 1,0,0,0
-L(inc_counter):
- .long 0,1,2,3
-L(unsigned_cmp):
- .long 0x80000000,0x80000000,0x80000000,0x80000000
-
- .hidden __chacha20_sse2_blocks4
-ENTRY (__chacha20_sse2_blocks4)
- /* input:
- * %rdi: input
- * %rsi: dst
- * %rdx: src
- * %rcx: nblks (multiple of 4)
- */
-
- pushq %rbp;
- cfi_adjust_cfa_offset(8);
- cfi_rel_offset(rbp, 0)
- movq %rsp, %rbp;
- cfi_def_cfa_register(%rbp);
-
- subq $STACK_MAX, %rsp;
- andq $~15, %rsp;
-
-L(loop4):
- mov $20, ROUND;
-
- /* Construct counter vectors X12 and X13 */
- movdqa L(inc_counter) rRIP, X0;
- movdqa L(unsigned_cmp) rRIP, X2;
- PBROADCASTD((12 * 4)(INPUT), X12);
- PBROADCASTD((13 * 4)(INPUT), X13);
- paddd X0, X12;
- movdqa X12, X1;
- pxor X2, X0;
- pxor X2, X1;
- pcmpgtd X1, X0;
- psubd X0, X13;
- movdqa X12, (STACK_VEC_X12)(%rsp);
- movdqa X13, (STACK_VEC_X13)(%rsp);
-
- /* Load vectors */
- PBROADCASTD((0 * 4)(INPUT), X0);
- PBROADCASTD((1 * 4)(INPUT), X1);
- PBROADCASTD((2 * 4)(INPUT), X2);
- PBROADCASTD((3 * 4)(INPUT), X3);
- PBROADCASTD((4 * 4)(INPUT), X4);
- PBROADCASTD((5 * 4)(INPUT), X5);
- PBROADCASTD((6 * 4)(INPUT), X6);
- PBROADCASTD((7 * 4)(INPUT), X7);
- PBROADCASTD((8 * 4)(INPUT), X8);
- PBROADCASTD((9 * 4)(INPUT), X9);
- PBROADCASTD((10 * 4)(INPUT), X10);
- PBROADCASTD((11 * 4)(INPUT), X11);
- PBROADCASTD((14 * 4)(INPUT), X14);
- PBROADCASTD((15 * 4)(INPUT), X15);
- movdqa X11, (STACK_TMP)(%rsp);
- movdqa X15, (STACK_TMP1)(%rsp);
-
-L(round2_4):
- QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X11,X15)
- movdqa (STACK_TMP)(%rsp), X11;
- movdqa (STACK_TMP1)(%rsp), X15;
- movdqa X8, (STACK_TMP)(%rsp);
- movdqa X9, (STACK_TMP1)(%rsp);
- QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,X9)
- QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,X9)
- movdqa (STACK_TMP)(%rsp), X8;
- movdqa (STACK_TMP1)(%rsp), X9;
- movdqa X11, (STACK_TMP)(%rsp);
- movdqa X15, (STACK_TMP1)(%rsp);
- QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X11,X15)
- sub $2, ROUND;
- jnz L(round2_4);
-
- /* tmp := X15 */
- movdqa (STACK_TMP)(%rsp), X11;
- PBROADCASTD((0 * 4)(INPUT), X15);
- PLUS(X0, X15);
- PBROADCASTD((1 * 4)(INPUT), X15);
- PLUS(X1, X15);
- PBROADCASTD((2 * 4)(INPUT), X15);
- PLUS(X2, X15);
- PBROADCASTD((3 * 4)(INPUT), X15);
- PLUS(X3, X15);
- PBROADCASTD((4 * 4)(INPUT), X15);
- PLUS(X4, X15);
- PBROADCASTD((5 * 4)(INPUT), X15);
- PLUS(X5, X15);
- PBROADCASTD((6 * 4)(INPUT), X15);
- PLUS(X6, X15);
- PBROADCASTD((7 * 4)(INPUT), X15);
- PLUS(X7, X15);
- PBROADCASTD((8 * 4)(INPUT), X15);
- PLUS(X8, X15);
- PBROADCASTD((9 * 4)(INPUT), X15);
- PLUS(X9, X15);
- PBROADCASTD((10 * 4)(INPUT), X15);
- PLUS(X10, X15);
- PBROADCASTD((11 * 4)(INPUT), X15);
- PLUS(X11, X15);
- movdqa (STACK_VEC_X12)(%rsp), X15;
- PLUS(X12, X15);
- movdqa (STACK_VEC_X13)(%rsp), X15;
- PLUS(X13, X15);
- movdqa X13, (STACK_TMP)(%rsp);
- PBROADCASTD((14 * 4)(INPUT), X15);
- PLUS(X14, X15);
- movdqa (STACK_TMP1)(%rsp), X15;
- movdqa X14, (STACK_TMP1)(%rsp);
- PBROADCASTD((15 * 4)(INPUT), X13);
- PLUS(X15, X13);
- movdqa X15, (STACK_TMP2)(%rsp);
-
- /* Update counter */
- addq $4, (12 * 4)(INPUT);
-
- TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15);
- movdqu X0, (64 * 0 + 16 * 0)(DST)
- movdqu X1, (64 * 1 + 16 * 0)(DST)
- movdqu X2, (64 * 2 + 16 * 0)(DST)
- movdqu X3, (64 * 3 + 16 * 0)(DST)
- TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2);
- movdqa (STACK_TMP)(%rsp), X13;
- movdqa (STACK_TMP1)(%rsp), X14;
- movdqa (STACK_TMP2)(%rsp), X15;
- movdqu X4, (64 * 0 + 16 * 1)(DST)
- movdqu X5, (64 * 1 + 16 * 1)(DST)
- movdqu X6, (64 * 2 + 16 * 1)(DST)
- movdqu X7, (64 * 3 + 16 * 1)(DST)
- TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2);
- movdqu X8, (64 * 0 + 16 * 2)(DST)
- movdqu X9, (64 * 1 + 16 * 2)(DST)
- movdqu X10, (64 * 2 + 16 * 2)(DST)
- movdqu X11, (64 * 3 + 16 * 2)(DST)
- TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2);
- movdqu X12, (64 * 0 + 16 * 3)(DST)
- movdqu X13, (64 * 1 + 16 * 3)(DST)
- movdqu X14, (64 * 2 + 16 * 3)(DST)
- movdqu X15, (64 * 3 + 16 * 3)(DST)
-
- sub $4, NBLKS;
- lea (4 * 64)(DST), DST;
- lea (4 * 64)(SRC), SRC;
- jnz L(loop4);
-
- /* eax zeroed by round loop. */
- leave;
- cfi_adjust_cfa_offset(-8)
- cfi_def_cfa_register(%rsp);
- ret_spec_stop;
-END (__chacha20_sse2_blocks4)
-
-#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */
diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
deleted file mode 100644
index 6f3784e392..0000000000
--- a/sysdeps/x86_64/chacha20_arch.h
+++ /dev/null
@@ -1,55 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <isa-level.h>
-#include <ldsodefs.h>
-#include <cpu-features.h>
-#include <sys/param.h>
-
-unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst,
- const uint8_t *src, size_t nblks)
- attribute_hidden;
-unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
- const uint8_t *src, size_t nblks)
- attribute_hidden;
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
- size_t bytes)
-{
- _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0,
- "CHACHA20_BUFSIZE not multiple of 4 or 8");
- _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
- "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
-
-#if MINIMUM_X86_ISA_LEVEL > 2
- __chacha20_avx2_blocks8 (state, dst, src,
- CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
- const struct cpu_features* cpu_features = __get_cpu_features ();
-
- /* AVX2 version uses vzeroupper, so disable it if RTM is enabled. */
- if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
- && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !))
- __chacha20_avx2_blocks8 (state, dst, src,
- CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
- else
- __chacha20_sse2_blocks4 (state, dst, src,
- CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#endif
-}
--
2.35.1

2022-07-25 23:16:21

by Jason A. Donenfeld

Subject: Re: [PATCH] arc4random: simplify design for better safety

If you're just following along on the mailing list, without actively
trying to apply this to a glibc tree, that diff might be hard to read.
The meat of it is the below function implementation. Notably this is
basically the same as systemd's crypto_random_bytes() (which I recently
rewrote there).

void
__arc4random_buf (void *p, size_t n)
{
  static bool have_getrandom = true, seen_initialized = false;
  int fd;

  if (n == 0)
    return;

  for (;;)
    {
      ssize_t l;

      if (!have_getrandom)
        break;

      l = __getrandom_nocancel (p, n, 0);
      if (l > 0)
        {
          if ((size_t) l == n)
            return; /* Done reading, success. */
          p = (uint8_t *) p + l;
          n -= l;
          continue; /* Interrupted by a signal; keep going. */
        }
      else if (l == 0)
        arc4random_getrandom_failure (); /* Weird, should never happen. */
      else if (errno == ENOSYS)
        {
          have_getrandom = false;
          break; /* No syscall, so fallback to /dev/urandom. */
        }
      arc4random_getrandom_failure (); /* Unknown other error, should never happen. */
    }

  if (!seen_initialized)
    {
      struct pollfd pfd = { .events = POLLIN };
      pfd.fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
      if (pfd.fd < 0)
        arc4random_getrandom_failure ();
      if (__poll(&pfd, 1, -1) < 0)
        arc4random_getrandom_failure ();
      if (__close_nocancel(pfd.fd) < 0)
        arc4random_getrandom_failure ();
      seen_initialized = true;
    }

  fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY);
  if (fd < 0)
    arc4random_getrandom_failure ();
  while (n)
    {
      ssize_t l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n));
      if (l <= 0)
        arc4random_getrandom_failure ();
      p = (uint8_t *) p + l;
      n -= l;
    }
  if (__close_nocancel (fd) < 0)
    arc4random_getrandom_failure ();
}
libc_hidden_def (__arc4random_buf)
weak_alias (__arc4random_buf, arc4random_buf)

2022-07-25 23:33:28

by Jason A. Donenfeld

Subject: [PATCH v2] arc4random: simplify design for better safety

Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.

This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc can work
together on optimizing that.
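
For readers outside the glibc tree, the "basic loop wrapper" idea can be
sketched with only the public getrandom() interface from <sys/random.h>.
This is an illustration under assumptions, not the code in this patch:
fill_random() is a hypothetical name, it reports errors to the caller
instead of aborting, and it retries on EINTR, which the patch itself
treats differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper, not part of glibc: fill BUF with LEN random bytes
   using only getrandom(2).  Returns 0 on success, or -1 with errno set.  */
static int
fill_random (void *buf, size_t len)
{
  uint8_t *p = buf;

  assert (buf != NULL || len == 0);
  while (len > 0)
    {
      /* Flags of 0 block until the kernel RNG has been initialized.  */
      ssize_t l = getrandom (p, len, 0);
      if (l < 0)
        {
          if (errno == EINTR)
            continue;           /* Interrupted by a signal; try again.  */
          return -1;            /* E.g. ENOSYS on kernels without the syscall.  */
        }
      if (l == 0)
        {
          errno = EIO;          /* Should not happen for LEN > 0.  */
          return -1;
        }
      /* A short read is legal; advance and ask for the remainder.  */
      p += l;
      len -= (size_t) l;
    }
  return 0;
}
```

A caller would use it as `if (fill_random (key, sizeof key) < 0) abort ();`.
Nothing is buffered in userspace, so there is no state that could go
stale across fork, VM snapshot, or resume.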

This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering altogether.
Future improvements may include userspace learning more from
the kernel about when to do that, which might make these sorts of
chacha20-based optimizations more possible. The current heuristic of 16
MiB is meaningless garbage that doesn't correspond to anything the
kernel might know about. So for now, let's just do something
conservative that we know is correct and won't lead to cryptographic
issues for users of this function.

This patch might be considered along the lines of, "premature
optimization is the root of all evil," in that the much more complex
implementation it replaces moves too fast without considering security
implications, whereas the incremental approach done here is a much safer
way of going about things. Once this lands, we can take our time in
optimizing this properly using new interplay between the kernel and
userspace.

getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fall back to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties of getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" initialized
after the "blocking pool", not before, but it's the best approximation
we can do.
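
The fallback described above can be sketched on its own with public POSIX
calls. This is only an approximation under assumptions: the helper names
wait_for_rng_init() and urandom_fill() are hypothetical, errors are
returned rather than treated as fatal, and the "seen initialized" flag is
a plain unsynchronized static, as in the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait until /dev/random polls readable once, as a
   rough stand-in for "the kernel RNG is initialized".  */
static int
wait_for_rng_init (void)
{
  static int seen_initialized;  /* Unsynchronized; a sketch only.  */
  struct pollfd pfd = { .events = POLLIN };
  int ret;

  if (seen_initialized)
    return 0;
  pfd.fd = open ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY);
  if (pfd.fd < 0)
    return -1;
  do
    ret = poll (&pfd, 1, -1);   /* Block with no timeout.  */
  while (ret < 0 && errno == EINTR);
  close (pfd.fd);
  if (ret < 0)
    return -1;
  seen_initialized = 1;
  return 0;
}

/* Hypothetical helper: fill BUF with LEN bytes read from /dev/urandom.
   Returns 0 on success, -1 on failure.  */
static int
urandom_fill (void *buf, size_t len)
{
  uint8_t *p = buf;
  int fd;

  assert (buf != NULL || len == 0);
  if (wait_for_rng_init () < 0)
    return -1;
  fd = open ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY);
  if (fd < 0)
    return -1;
  while (len > 0)
    {
      ssize_t l = read (fd, p, len);
      if (l < 0 && errno == EINTR)
        continue;               /* Interrupted; retry the read.  */
      if (l <= 0)
        {
          close (fd);
          return -1;
        }
      p += l;
      len -= (size_t) l;
    }
  return close (fd);
}
```

The poll happens at most once per process in this sketch; every later
call only pays for the open, read, and close of /dev/urandom.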

The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.

Cc: Adhemerval Zanella Netto <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: Cristian Rodríguez <[email protected]>
Cc: Paul Eggert <[email protected]>
Cc: [email protected]
Signed-off-by: Jason A. Donenfeld <[email protected]>
---
LICENSES | 23 -
include/stdlib.h | 3 -
stdlib/Makefile | 2 -
stdlib/arc4random.c | 204 ++-----
stdlib/arc4random.h | 48 --
stdlib/chacha20.c | 191 ------
stdlib/tst-arc4random-chacha20.c | 167 -----
sysdeps/aarch64/Makefile | 4 -
sysdeps/aarch64/chacha20-aarch64.S | 314 ----------
sysdeps/aarch64/chacha20_arch.h | 40 --
sysdeps/generic/chacha20_arch.h | 24 -
sysdeps/generic/tls-internal.c | 10 -
sysdeps/mach/hurd/_Fork.c | 2 -
sysdeps/nptl/_Fork.c | 2 -
.../powerpc/powerpc64/be/multiarch/Makefile | 4 -
.../powerpc64/be/multiarch/chacha20-ppc.c | 1 -
.../powerpc64/be/multiarch/chacha20_arch.h | 42 --
sysdeps/powerpc/powerpc64/power8/Makefile | 5 -
.../powerpc/powerpc64/power8/chacha20-ppc.c | 256 --------
.../powerpc/powerpc64/power8/chacha20_arch.h | 37 --
sysdeps/s390/s390-64/Makefile | 6 -
sysdeps/s390/s390-64/chacha20-s390x.S | 573 ------------------
sysdeps/s390/s390-64/chacha20_arch.h | 45 --
sysdeps/unix/sysv/linux/tls-internal.c | 10 -
sysdeps/x86_64/Makefile | 7 -
sysdeps/x86_64/chacha20-amd64-avx2.S | 328 ----------
sysdeps/x86_64/chacha20-amd64-sse2.S | 311 ----------
sysdeps/x86_64/chacha20_arch.h | 55 --
28 files changed, 53 insertions(+), 2661 deletions(-)
delete mode 100644 stdlib/arc4random.h
delete mode 100644 stdlib/chacha20.c
delete mode 100644 stdlib/tst-arc4random-chacha20.c
delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S
delete mode 100644 sysdeps/aarch64/chacha20_arch.h
delete mode 100644 sysdeps/generic/chacha20_arch.h
delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile
delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S
delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h
delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S
delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S
delete mode 100644 sysdeps/x86_64/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index cd04fb6e84..530893b1dc 100644
--- a/LICENSES
+++ b/LICENSES
@@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <[email protected]>
You should have received a copy of the GNU Lesser General Public
License along with this library; if not, see
<https://www.gnu.org/licenses/>. */
-
-sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S,
-sysdeps/x86_64/chacha20-amd64-avx2.S, and
-sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and
-sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt,
-with the following notices:
-
-Copyright (C) 2017-2019 Jussi Kivilinna <[email protected]>
-
-This file is part of Libgcrypt.
-
-Libgcrypt is free software; you can redistribute it and/or modify
-it under the terms of the GNU Lesser General Public License as
-published by the Free Software Foundation; either version 2.1 of
-the License, or (at your option) any later version.
-
-Libgcrypt is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-GNU Lesser General Public License for more details.
-
-You should have received a copy of the GNU Lesser General Public
-License along with this program; if not, see <https://www.gnu.org/licenses/>.
diff --git a/include/stdlib.h b/include/stdlib.h
index cae7f7cdf8..db51f4a4f6 100644
--- a/include/stdlib.h
+++ b/include/stdlib.h
@@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform;
libc_hidden_proto (__arc4random_uniform);
extern void __arc4random_buf_internal (void *buffer, size_t len)
attribute_hidden;
-/* Called from the fork function to reinitialize the internal cipher state
- in child process. */
-extern void __arc4random_fork_subprocess (void) attribute_hidden;

extern double __strtod_internal (const char *__restrict __nptr,
char **__restrict __endptr, int __group)
diff --git a/stdlib/Makefile b/stdlib/Makefile
index a900962685..f7b25c1981 100644
--- a/stdlib/Makefile
+++ b/stdlib/Makefile
@@ -246,7 +246,6 @@ tests := \
# tests

tests-internal := \
- tst-arc4random-chacha20 \
tst-strtod1i \
tst-strtod3 \
tst-strtod4 \
@@ -256,7 +255,6 @@ tests-internal := \
# tests-internal

tests-static := \
- tst-arc4random-chacha20 \
tst-secure-getenv \
# tests-static

diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
index 65547e79aa..80c55cde63 100644
--- a/stdlib/arc4random.c
+++ b/stdlib/arc4random.c
@@ -1,4 +1,4 @@
-/* Pseudo Random Number Generator based on ChaCha20.
+/* Pseudo Random Number Generator
Copyright (C) 2022 Free Software Foundation, Inc.
This file is part of the GNU C Library.

@@ -16,61 +16,14 @@
License along with the GNU C Library; if not, see
<https://www.gnu.org/licenses/>. */

-#include <arc4random.h>
#include <errno.h>
#include <not-cancel.h>
#include <stdio.h>
#include <stdlib.h>
+#include <sys/poll.h>
#include <sys/mman.h>
#include <sys/param.h>
#include <sys/random.h>
-#include <tls-internal.h>
-
-/* arc4random keeps two counters: 'have' is the current valid bytes not yet
- consumed in 'buf' while 'count' is the maximum number of bytes until a
- reseed.
-
- Both the initial seed and reseed try to obtain entropy from the kernel
- and abort the process if none could be obtained.
-
- The state 'buf' improves the usage of the cipher calls, allowing to call
- optimized implementations (if the architecture provides it) and minimize
- function call overhead. */
-
-#include <chacha20.c>
-
-/* Called from the fork function to reset the state. */
-void
-__arc4random_fork_subprocess (void)
-{
- struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
- if (state != NULL)
- {
- explicit_bzero (state, sizeof (*state));
- /* Force key init. */
- state->count = -1;
- }
-}
-
-/* Return the current thread random state or try to create one if there is
- none available. In the case malloc can not allocate a state, arc4random
- will try to get entropy with arc4random_getentropy. */
-static struct arc4random_state_t *
-arc4random_get_state (void)
-{
- struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
- if (state == NULL)
- {
- state = malloc (sizeof (struct arc4random_state_t));
- if (state != NULL)
- {
- /* Force key initialization on first call. */
- state->count = -1;
- __glibc_tls_internal ()->rand_state = state;
- }
- }
- return state;
-}

static void
arc4random_getrandom_failure (void)
@@ -78,106 +31,70 @@ arc4random_getrandom_failure (void)
__libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n");
}

-static void
-arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen)
+void
+__arc4random_buf (void *p, size_t n)
{
- chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
+ static bool have_getrandom = true, seen_initialized = false;
+ int fd;

- /* Mix optional user provided data. */
- if (rnd != NULL)
- {
- size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
- for (size_t i = 0; i < m; i++)
- state->buf[i] ^= rnd[i];
- }
-
- /* Immediately reinit for backtracking resistance. */
- chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
- explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
- state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-}
-
-static void
-arc4random_getentropy (void *rnd, size_t len)
-{
- if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len)
+ if (n == 0)
return;

- int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom",
- O_RDONLY | O_CLOEXEC));
- if (fd != -1)
+ for (;;)
{
- uint8_t *p = rnd;
- uint8_t *end = p + len;
- do
- {
- ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
- if (ret <= 0)
- arc4random_getrandom_failure ();
- p += ret;
- }
- while (p < end);
+ ssize_t l;

- if (__close_nocancel (fd) == 0)
- return;
- }
- arc4random_getrandom_failure ();
-}
+ if (!have_getrandom)
+ break;

-/* Check if the thread context STATE should be reseed with kernel entropy
- depending of requested LEN bytes. If there is less than requested,
- the state is either initialized or reseeded, otherwise the internal
- counter subtract the requested length. */
-static void
-arc4random_check_stir (struct arc4random_state_t *state, size_t len)
-{
- if (state->count <= len || state->count == -1)
- {
- uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
- arc4random_getentropy (rnd, sizeof rnd);
-
- if (state->count == -1)
- chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);
- else
- arc4random_rekey (state, rnd, sizeof rnd);
-
- explicit_bzero (rnd, sizeof rnd);
-
- /* Invalidate the buf. */
- state->have = 0;
- memset (state->buf, 0, sizeof state->buf);
- state->count = CHACHA20_RESEED_SIZE;
+ l = __getrandom_nocancel (p, n, 0);
+ if (l > 0)
+ {
+ if ((size_t) l == n)
+ return; /* Done reading, success. */
+ p = (uint8_t *) p + l;
+ n -= l;
+ continue; /* Interrupted by a signal; keep going. */
+ }
+ else if (l == 0)
+ arc4random_getrandom_failure (); /* Weird, should never happen. */
+ else if (errno == ENOSYS)
+ {
+ have_getrandom = false;
+ break; /* No syscall, so fall back to /dev/urandom. */
+ }
+ arc4random_getrandom_failure (); /* Unknown error, should never happen. */
}
- else
- state->count -= len;
-}

-void
-__arc4random_buf (void *buffer, size_t len)
-{
- struct arc4random_state_t *state = arc4random_get_state ();
- if (__glibc_unlikely (state == NULL))
+ if (!seen_initialized)
{
- arc4random_getentropy (buffer, len);
- return;
+ struct pollfd pfd = { .events = POLLIN };
+ pfd.fd = TEMP_FAILURE_RETRY (
+ __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
+ if (pfd.fd < 0)
+ arc4random_getrandom_failure ();
+ if (__poll (&pfd, 1, -1) < 0)
+ arc4random_getrandom_failure ();
+ if (__close_nocancel (pfd.fd) < 0)
+ arc4random_getrandom_failure ();
+ seen_initialized = true;
}

- arc4random_check_stir (state, len);
- while (len > 0)
+ fd = TEMP_FAILURE_RETRY (
+ __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY));
+ if (fd < 0)
+ arc4random_getrandom_failure ();
+ do
{
- if (state->have > 0)
- {
- size_t m = MIN (len, state->have);
- uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
- memcpy (buffer, ks, m);
- explicit_bzero (ks, m);
- buffer += m;
- len -= m;
- state->have -= m;
- }
- if (state->have == 0)
- arc4random_rekey (state, NULL, 0);
+ ssize_t l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n));
+ if (l <= 0)
+ arc4random_getrandom_failure ();
+ p = (uint8_t *) p + l;
+ n -= l;
}
+ while (n);
+ if (__close_nocancel (fd) < 0)
+ arc4random_getrandom_failure ();
}
libc_hidden_def (__arc4random_buf)
weak_alias (__arc4random_buf, arc4random_buf)
@@ -186,22 +103,7 @@ uint32_t
__arc4random (void)
{
uint32_t r;
-
- struct arc4random_state_t *state = arc4random_get_state ();
- if (__glibc_unlikely (state == NULL))
- {
- arc4random_getentropy (&r, sizeof (uint32_t));
- return r;
- }
-
- arc4random_check_stir (state, sizeof (uint32_t));
- if (state->have < sizeof (uint32_t))
- arc4random_rekey (state, NULL, 0);
- uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
- memcpy (&r, ks, sizeof (uint32_t));
- memset (ks, 0, sizeof (uint32_t));
- state->have -= sizeof (uint32_t);
-
+ __arc4random_buf (&r, sizeof (r));
return r;
}
libc_hidden_def (__arc4random)
diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h
deleted file mode 100644
index cd39389c19..0000000000
--- a/stdlib/arc4random.h
+++ /dev/null
@@ -1,48 +0,0 @@
-/* Arc4random definition used on TLS.
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#ifndef _CHACHA20_H
-#define _CHACHA20_H
-
-#include <stddef.h>
-#include <stdint.h>
-
-/* Internal ChaCha20 state. */
-#define CHACHA20_STATE_LEN 16
-#define CHACHA20_BLOCK_SIZE 64
-
-/* Maximum number bytes until reseed (16 MB). */
-#define CHACHA20_RESEED_SIZE (16 * 1024 * 1024)
-
-/* Internal arc4random buffer, used on each feedback step so offer some
- backtracking protection and to allow better used of vectorized
- chacha20 implementations. */
-#define CHACHA20_BUFSIZE (8 * CHACHA20_BLOCK_SIZE)
-
-_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE,
- "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE");
-
-struct arc4random_state_t
-{
- uint32_t ctx[CHACHA20_STATE_LEN];
- size_t have;
- size_t count;
- uint8_t buf[CHACHA20_BUFSIZE];
-};
-
-#endif
diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
deleted file mode 100644
index 2745a81315..0000000000
--- a/stdlib/chacha20.c
+++ /dev/null
@@ -1,191 +0,0 @@
-/* Generic ChaCha20 implementation (used on arc4random).
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <array_length.h>
-#include <endian.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <string.h>
-
-/* 32-bit stream position, then 96-bit nonce. */
-#define CHACHA20_IV_SIZE 16
-#define CHACHA20_KEY_SIZE 32
-
-#define CHACHA20_STATE_LEN 16
-
-/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final
- XOR of the keystream with the plaintext because the plaintext is a
- stream of zeros. */
-
-enum chacha20_constants
-{
- CHACHA20_CONSTANT_EXPA = 0x61707865U,
- CHACHA20_CONSTANT_ND_3 = 0x3320646eU,
- CHACHA20_CONSTANT_2_BY = 0x79622d32U,
- CHACHA20_CONSTANT_TE_K = 0x6b206574U
-};
-
-static inline uint32_t
-read_unaligned_32 (const uint8_t *p)
-{
- uint32_t r;
- memcpy (&r, p, sizeof (r));
- return r;
-}
-
-static inline void
-write_unaligned_32 (uint8_t *p, uint32_t v)
-{
- memcpy (p, &v, sizeof (v));
-}
-
-#if __BYTE_ORDER == __BIG_ENDIAN
-# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p))
-# define set_state(v) __builtin_bswap32 ((v))
-#else
-# define read_unaligned_le32(p) read_unaligned_32 ((p))
-# define set_state(v) (v)
-#endif
-
-static inline void
-chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv)
-{
- state[0] = CHACHA20_CONSTANT_EXPA;
- state[1] = CHACHA20_CONSTANT_ND_3;
- state[2] = CHACHA20_CONSTANT_2_BY;
- state[3] = CHACHA20_CONSTANT_TE_K;
-
- state[4] = read_unaligned_le32 (key + 0 * sizeof (uint32_t));
- state[5] = read_unaligned_le32 (key + 1 * sizeof (uint32_t));
- state[6] = read_unaligned_le32 (key + 2 * sizeof (uint32_t));
- state[7] = read_unaligned_le32 (key + 3 * sizeof (uint32_t));
- state[8] = read_unaligned_le32 (key + 4 * sizeof (uint32_t));
- state[9] = read_unaligned_le32 (key + 5 * sizeof (uint32_t));
- state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t));
- state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t));
-
- state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t));
- state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t));
- state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t));
- state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t));
-}
-
-static inline uint32_t
-rotl32 (unsigned int shift, uint32_t word)
-{
- return (word << (shift & 31)) | (word >> ((-shift) & 31));
-}
-
-static void
-state_final (const uint8_t *src, uint8_t *dst, uint32_t v)
-{
-#ifdef CHACHA20_XOR_FINAL
- v ^= read_unaligned_32 (src);
-#endif
- write_unaligned_32 (dst, v);
-}
-
-static inline void
-chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src)
-{
- uint32_t x0, x1, x2, x3, x4, x5, x6, x7;
- uint32_t x8, x9, x10, x11, x12, x13, x14, x15;
-
- x0 = state[0];
- x1 = state[1];
- x2 = state[2];
- x3 = state[3];
- x4 = state[4];
- x5 = state[5];
- x6 = state[6];
- x7 = state[7];
- x8 = state[8];
- x9 = state[9];
- x10 = state[10];
- x11 = state[11];
- x12 = state[12];
- x13 = state[13];
- x14 = state[14];
- x15 = state[15];
-
- for (int i = 0; i < 20; i += 2)
- {
-#define QROUND(_x0, _x1, _x2, _x3) \
- do { \
- _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); \
- _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); \
- _x0 = _x0 + _x1; _x3 = rotl32 (8, (_x0 ^ _x3)); \
- _x2 = _x2 + _x3; _x1 = rotl32 (7, (_x1 ^ _x2)); \
- } while(0)
-
- QROUND (x0, x4, x8, x12);
- QROUND (x1, x5, x9, x13);
- QROUND (x2, x6, x10, x14);
- QROUND (x3, x7, x11, x15);
-
- QROUND (x0, x5, x10, x15);
- QROUND (x1, x6, x11, x12);
- QROUND (x2, x7, x8, x13);
- QROUND (x3, x4, x9, x14);
- }
-
- state_final (&src[0], &dst[0], set_state (x0 + state[0]));
- state_final (&src[4], &dst[4], set_state (x1 + state[1]));
- state_final (&src[8], &dst[8], set_state (x2 + state[2]));
- state_final (&src[12], &dst[12], set_state (x3 + state[3]));
- state_final (&src[16], &dst[16], set_state (x4 + state[4]));
- state_final (&src[20], &dst[20], set_state (x5 + state[5]));
- state_final (&src[24], &dst[24], set_state (x6 + state[6]));
- state_final (&src[28], &dst[28], set_state (x7 + state[7]));
- state_final (&src[32], &dst[32], set_state (x8 + state[8]));
- state_final (&src[36], &dst[36], set_state (x9 + state[9]));
- state_final (&src[40], &dst[40], set_state (x10 + state[10]));
- state_final (&src[44], &dst[44], set_state (x11 + state[11]));
- state_final (&src[48], &dst[48], set_state (x12 + state[12]));
- state_final (&src[52], &dst[52], set_state (x13 + state[13]));
- state_final (&src[56], &dst[56], set_state (x14 + state[14]));
- state_final (&src[60], &dst[60], set_state (x15 + state[15]));
-
- state[12]++;
-}
-
-static void
-__attribute_maybe_unused__
-chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src,
- size_t bytes)
-{
- while (bytes >= CHACHA20_BLOCK_SIZE)
- {
- chacha20_block (state, dst, src);
-
- bytes -= CHACHA20_BLOCK_SIZE;
- dst += CHACHA20_BLOCK_SIZE;
- src += CHACHA20_BLOCK_SIZE;
- }
-
- if (__glibc_unlikely (bytes != 0))
- {
- uint8_t stream[CHACHA20_BLOCK_SIZE];
- chacha20_block (state, stream, src);
- memcpy (dst, stream, bytes);
- explicit_bzero (stream, sizeof stream);
- }
-}
-
-/* Get the architecture optimized version. */
-#include <chacha20_arch.h>
diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c
deleted file mode 100644
index 45ba54920d..0000000000
--- a/stdlib/tst-arc4random-chacha20.c
+++ /dev/null
@@ -1,167 +0,0 @@
-/* Basic tests for chacha20 cypher used in arc4random.
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <arc4random.h>
-#include <support/check.h>
-#include <sys/cdefs.h>
-
-/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random
- actual does. */
-#include <chacha20.c>
-
-static int
-do_test (void)
-{
- const uint8_t key[CHACHA20_KEY_SIZE] =
- {
- 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
- };
- const uint8_t iv[CHACHA20_IV_SIZE] =
- {
- 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
- };
- const uint8_t expected1[CHACHA20_BUFSIZE] =
- {
- 0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a,
- 0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d,
- 0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda,
- 0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f,
- 0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1,
- 0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07,
- 0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73,
- 0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69,
- 0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21,
- 0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8,
- 0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac,
- 0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6,
- 0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0,
- 0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3,
- 0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f,
- 0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e,
- 0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71,
- 0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3,
- 0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5,
- 0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34,
- 0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d,
- 0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c,
- 0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 0x6a, 0x1e, 0x6c, 0x31,
- 0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad,
- 0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0,
- 0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72,
- 0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16,
- 0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9,
- 0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a,
- 0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc,
- 0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2,
- 0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb,
- 0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10,
- 0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b,
- 0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2,
- 0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17,
- 0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca,
- 0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84,
- 0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3,
- 0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1,
- 0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7,
- 0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9,
- 0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3,
- 0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e,
- 0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb,
- 0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8,
- 0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb
- };
-
- const uint8_t expected2[CHACHA20_BUFSIZE] =
- {
- 0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36,
- 0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22,
- 0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50,
- 0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10,
- 0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 0xb8, 0x94, 0xbb, 0xef,
- 0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30,
- 0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c,
- 0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6,
- 0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e,
- 0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90,
- 0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3,
- 0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba,
- 0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4,
- 0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea,
- 0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a,
- 0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80,
- 0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2,
- 0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8,
- 0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce,
- 0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb,
- 0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9,
- 0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc,
- 0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86,
- 0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0,
- 0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4,
- 0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1,
- 0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4,
- 0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f,
- 0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57,
- 0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97,
- 0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef,
- 0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13,
- 0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65,
- 0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30,
- 0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6,
- 0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee,
- 0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c,
- 0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66,
- 0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a,
- 0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c,
- 0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65,
- 0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b,
- 0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88,
- 0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1,
- 0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0,
- 0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36,
- 0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18
- };
-
- /* Check with the expected internal arc4random keystream buffer. Some
- architecture optimizations expects a buffer with a minimum size which
- is a multiple of then ChaCha20 blocksize, so they might not be prepared
- to handle smaller buffers. */
-
- uint8_t output[CHACHA20_BUFSIZE];
-
- uint32_t state[CHACHA20_STATE_LEN];
- chacha20_init (state, key, iv);
-
- /* Check with the initial state. */
- uint8_t input[CHACHA20_BUFSIZE] = { 0 };
-
- chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
- TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE);
-
- /* And on the next round. */
- chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
- TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE);
-
- return 0;
-}
-
-#include <support/test-driver.c>
diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 7dfd1b62dd..17fb1c5b72 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -51,10 +51,6 @@ ifeq ($(subdir),csu)
gen-as-const-headers += tlsdesc.sym
endif

-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-aarch64
-endif
-
ifeq ($(subdir),gmon)
CFLAGS-mcount.c += -mgeneral-regs-only
endif
diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S
deleted file mode 100644
index cce5291c5c..0000000000
--- a/sysdeps/aarch64/chacha20-aarch64.S
+++ /dev/null
@@ -1,314 +0,0 @@
-/* Optimized AArch64 implementation of ChaCha20 cipher.
- Copyright (C) 2022 Free Software Foundation, Inc.
-
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-/* Copyright (C) 2017-2019 Jussi Kivilinna <[email protected]>
-
- This file is part of Libgcrypt.
-
- Libgcrypt is free software; you can redistribute it and/or modify
- it under the terms of the GNU Lesser General Public License as
- published by the Free Software Foundation; either version 2.1 of
- the License, or (at your option) any later version.
-
- Libgcrypt is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-/* Based on D. J. Bernstein reference implementation at
- http://cr.yp.to/chacha.html:
-
- chacha-regs.c version 20080118
- D. J. Bernstein
- Public domain. */
-
-#include <sysdep.h>
-
-/* Only LE is supported. */
-#ifdef __AARCH64EL__
-
-#define GET_DATA_POINTER(reg, name) \
- adrp reg, name ; \
- add reg, reg, :lo12:name
-
-/* 'ret' instruction replacement for straight-line speculation mitigation */
-#define ret_spec_stop \
- ret; dsb sy; isb;
-
-.cpu generic+simd
-
-.text
-
-/* register macros */
-#define INPUT x0
-#define DST x1
-#define SRC x2
-#define NBLKS x3
-#define ROUND x4
-#define INPUT_CTR x5
-#define INPUT_POS x6
-#define CTR x7
-
-/* vector registers */
-#define X0 v16
-#define X4 v17
-#define X8 v18
-#define X12 v19
-
-#define X1 v20
-#define X5 v21
-
-#define X9 v22
-#define X13 v23
-#define X2 v24
-#define X6 v25
-
-#define X3 v26
-#define X7 v27
-#define X11 v28
-#define X15 v29
-
-#define X10 v30
-#define X14 v31
-
-#define VCTR v0
-#define VTMP0 v1
-#define VTMP1 v2
-#define VTMP2 v3
-#define VTMP3 v4
-#define X12_TMP v5
-#define X13_TMP v6
-#define ROT8 v7
-
-/**********************************************************************
- helper macros
- **********************************************************************/
-
-#define _(...) __VA_ARGS__
-
-#define vpunpckldq(s1, s2, dst) \
- zip1 dst.4s, s2.4s, s1.4s;
-
-#define vpunpckhdq(s1, s2, dst) \
- zip2 dst.4s, s2.4s, s1.4s;
-
-#define vpunpcklqdq(s1, s2, dst) \
- zip1 dst.2d, s2.2d, s1.2d;
-
-#define vpunpckhqdq(s1, s2, dst) \
- zip2 dst.2d, s2.2d, s1.2d;
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \
- vpunpckhdq(x1, x0, t2); \
- vpunpckldq(x1, x0, x0); \
- \
- vpunpckldq(x3, x2, t1); \
- vpunpckhdq(x3, x2, x2); \
- \
- vpunpckhqdq(t1, x0, x1); \
- vpunpcklqdq(t1, x0, x0); \
- \
- vpunpckhqdq(x2, t2, x3); \
- vpunpcklqdq(x2, t2, x2);
-
-/**********************************************************************
- 4-way chacha20
- **********************************************************************/
-
-#define XOR(d,s1,s2) \
- eor d.16b, s2.16b, s1.16b;
-
-#define PLUS(ds,s) \
- add ds.4s, ds.4s, s.4s;
-
-#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \
- shl dst1.4s, src1.4s, #(c); \
- shl dst2.4s, src2.4s, #(c); \
- shl dst3.4s, src3.4s, #(c); \
- shl dst4.4s, src4.4s, #(c); \
- sri dst1.4s, src1.4s, #(32 - (c)); \
- sri dst2.4s, src2.4s, #(32 - (c)); \
- sri dst3.4s, src3.4s, #(32 - (c)); \
- sri dst4.4s, src4.4s, #(32 - (c));
-
-#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
- tbl dst1.16b, {src1.16b}, ROT8.16b; \
- tbl dst2.16b, {src2.16b}, ROT8.16b; \
- tbl dst3.16b, {src3.16b}, ROT8.16b; \
- tbl dst4.16b, {src4.16b}, ROT8.16b;
-
-#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
- rev32 dst1.8h, src1.8h; \
- rev32 dst2.8h, src2.8h; \
- rev32 dst3.8h, src3.8h; \
- rev32 dst4.8h, src4.8h;
-
-#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \
- PLUS(a1,b1); PLUS(a2,b2); \
- PLUS(a3,b3); PLUS(a4,b4); \
- XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \
- XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \
- ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4); \
- PLUS(c1,d1); PLUS(c2,d2); \
- PLUS(c3,d3); PLUS(c4,d4); \
- XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \
- XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \
- ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4) \
- PLUS(a1,b1); PLUS(a2,b2); \
- PLUS(a3,b3); PLUS(a4,b4); \
- XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \
- XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \
- ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4) \
- PLUS(c1,d1); PLUS(c2,d2); \
- PLUS(c3,d3); PLUS(c4,d4); \
- XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \
- XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \
- ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4) \
-
-.align 4
-L(__chacha20_blocks4_data_inc_counter):
- .long 0,1,2,3
-
-.align 4
-L(__chacha20_blocks4_data_rot8):
- .byte 3,0,1,2
- .byte 7,4,5,6
- .byte 11,8,9,10
- .byte 15,12,13,14
-
-.hidden __chacha20_neon_blocks4
-ENTRY (__chacha20_neon_blocks4)
- /* input:
- * x0: input
- * x1: dst
- * x2: src
- * x3: nblks (multiple of 4)
- */
-
- GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8))
- add INPUT_CTR, INPUT, #(12*4);
- ld1 {ROT8.16b}, [CTR];
- GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter))
- mov INPUT_POS, INPUT;
- ld1 {VCTR.16b}, [CTR];
-
-L(loop4):
- /* Construct counter vectors X12 and X13 */
-
- ld1 {X15.16b}, [INPUT_CTR];
- mov ROUND, #20;
- ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS];
-
- dup X12.4s, X15.s[0];
- dup X13.4s, X15.s[1];
- ldr CTR, [INPUT_CTR];
- add X12.4s, X12.4s, VCTR.4s;
- dup X0.4s, VTMP1.s[0];
- dup X1.4s, VTMP1.s[1];
- dup X2.4s, VTMP1.s[2];
- dup X3.4s, VTMP1.s[3];
- dup X14.4s, X15.s[2];
- cmhi VTMP0.4s, VCTR.4s, X12.4s;
- dup X15.4s, X15.s[3];
- add CTR, CTR, #4; /* Update counter */
- dup X4.4s, VTMP2.s[0];
- dup X5.4s, VTMP2.s[1];
- dup X6.4s, VTMP2.s[2];
- dup X7.4s, VTMP2.s[3];
- sub X13.4s, X13.4s, VTMP0.4s;
- dup X8.4s, VTMP3.s[0];
- dup X9.4s, VTMP3.s[1];
- dup X10.4s, VTMP3.s[2];
- dup X11.4s, VTMP3.s[3];
- mov X12_TMP.16b, X12.16b;
- mov X13_TMP.16b, X13.16b;
- str CTR, [INPUT_CTR];
-
-L(round2):
- subs ROUND, ROUND, #2
- QUARTERROUND4(X0, X4, X8, X12, X1, X5, X9, X13,
- X2, X6, X10, X14, X3, X7, X11, X15,
- tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
- QUARTERROUND4(X0, X5, X10, X15, X1, X6, X11, X12,
- X2, X7, X8, X13, X3, X4, X9, X14,
- tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
- b.ne L(round2);
-
- ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32;
-
- PLUS(X12, X12_TMP); /* INPUT + 12 * 4 + counter */
- PLUS(X13, X13_TMP); /* INPUT + 13 * 4 + counter */
-
- dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */
- dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */
- dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */
- dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */
- PLUS(X0, VTMP2);
- PLUS(X1, VTMP3);
- PLUS(X2, X12_TMP);
- PLUS(X3, X13_TMP);
-
- dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */
- dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */
- dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */
- dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */
- ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS];
- mov INPUT_POS, INPUT;
- PLUS(X4, VTMP2);
- PLUS(X5, VTMP3);
- PLUS(X6, X12_TMP);
- PLUS(X7, X13_TMP);
-
- dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */
- dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */
- dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */
- dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */
- dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */
- dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */
- PLUS(X8, VTMP2);
- PLUS(X9, VTMP3);
- PLUS(X10, X12_TMP);
- PLUS(X11, X13_TMP);
- PLUS(X14, VTMP0);
- PLUS(X15, VTMP1);
-
- transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2);
- transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2);
- transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2);
- transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2);
-
- subs NBLKS, NBLKS, #4;
-
- st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64
- st1 {X1.16b,X5.16b}, [DST], #32;
- st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64
- st1 {X10.16b,X14.16b}, [DST], #32;
- st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64;
-
- b.ne L(loop4);
-
- ret_spec_stop
-END (__chacha20_neon_blocks4)
-
-#endif
diff --git a/sysdeps/aarch64/chacha20_arch.h b/sysdeps/aarch64/chacha20_arch.h
deleted file mode 100644
index 37dbb917f1..0000000000
--- a/sysdeps/aarch64/chacha20_arch.h
+++ /dev/null
@@ -1,40 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <ldsodefs.h>
-#include <stdbool.h>
-
-unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst,
- const uint8_t *src, size_t nblks)
- attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
- size_t bytes)
-{
- _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
- "CHACHA20_BUFSIZE not multiple of 4");
- _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4,
- "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4");
-#ifdef __AARCH64EL__
- __chacha20_neon_blocks4 (state, dst, src,
- CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
- chacha20_crypt_generic (state, dst, src, bytes);
-#endif
-}
diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h
deleted file mode 100644
index 1b4559ccbc..0000000000
--- a/sysdeps/generic/chacha20_arch.h
+++ /dev/null
@@ -1,24 +0,0 @@
-/* Chacha20 implementation, generic interface for encrypt.
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
- size_t bytes)
-{
- chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c
index 8a0f37d509..b32b31b5a9 100644
--- a/sysdeps/generic/tls-internal.c
+++ b/sysdeps/generic/tls-internal.c
@@ -16,7 +16,6 @@
License along with the GNU C Library; if not, see
<https://www.gnu.org/licenses/>. */

-#include <stdlib/arc4random.h>
#include <string.h>
#include <tls-internal.h>

@@ -27,13 +26,4 @@ __glibc_tls_internal_free (void)
{
free (__tls_internal.strsignal_buf);
free (__tls_internal.strerror_l_buf);
-
- if (__tls_internal.rand_state != NULL)
- {
- /* Clear any lingering random state prior so if the thread stack is
- cached it won't leak any data. */
- explicit_bzero (__tls_internal.rand_state,
- sizeof (*__tls_internal.rand_state));
- free (__tls_internal.rand_state);
- }
}
diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c
index 667068c8cf..e60b86fab1 100644
--- a/sysdeps/mach/hurd/_Fork.c
+++ b/sysdeps/mach/hurd/_Fork.c
@@ -662,8 +662,6 @@ retry:
_hurd_malloc_fork_child ();
call_function_static_weak (__malloc_fork_unlock_child);

- call_function_static_weak (__arc4random_fork_subprocess);
-
/* Run things that want to run in the child task to set up. */
RUN_HOOK (_hurd_fork_child_hook, ());

diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
index 7dc02569f6..dd568992e2 100644
--- a/sysdeps/nptl/_Fork.c
+++ b/sysdeps/nptl/_Fork.c
@@ -43,8 +43,6 @@ _Fork (void)
self->robust_head.list = &self->robust_head;
INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head,
sizeof (struct robust_list_head));
-
- call_function_static_weak (__arc4random_fork_subprocess);
}
return pid;
}
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
deleted file mode 100644
index 8c75165f7f..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
+++ /dev/null
@@ -1,4 +0,0 @@
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
deleted file mode 100644
index cf9e735326..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
+++ /dev/null
@@ -1 +0,0 @@
-#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c>
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
deleted file mode 100644
index 08494dc045..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
+++ /dev/null
@@ -1,42 +0,0 @@
-/* PowerPC optimization for ChaCha20.
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
- const uint8_t *src, size_t nblks)
- attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
- const uint8_t *src, size_t bytes)
-{
- _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
- "CHACHA20_BUFSIZE not multiple of 4");
- _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
- "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
- unsigned long int hwcap = GLRO(dl_hwcap);
- unsigned long int hwcap2 = GLRO(dl_hwcap2);
- if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
- __chacha20_power8_blocks4 (state, dst, src,
- CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
- else
- chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile
index abb0aa3f11..71a59529f3 100644
--- a/sysdeps/powerpc/powerpc64/power8/Makefile
+++ b/sysdeps/powerpc/powerpc64/power8/Makefile
@@ -1,8 +1,3 @@
ifeq ($(subdir),string)
sysdep_routines += strcasestr-ppc64
endif
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
deleted file mode 100644
index 0bbdcb9363..0000000000
--- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
+++ /dev/null
@@ -1,256 +0,0 @@
-/* Optimized PowerPC implementation of ChaCha20 cipher.
- Copyright (C) 2022 Free Software Foundation, Inc.
-
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20
- Copyright (C) 2019 Jussi Kivilinna <[email protected]>
-
- This file is part of Libgcrypt.
-
- Libgcrypt is free software; you can redistribute it and/or modify
- it under the terms of the GNU Lesser General Public License as
- published by the Free Software Foundation; either version 2.1 of
- the License, or (at your option) any later version.
-
- Libgcrypt is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-#include <altivec.h>
-#include <endian.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <sys/cdefs.h>
-
-typedef vector unsigned char vector16x_u8;
-typedef vector unsigned int vector4x_u32;
-typedef vector unsigned long long vector2x_u64;
-
-#if __BYTE_ORDER == __BIG_ENDIAN
-static const vector16x_u8 le_bswap_const =
- { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
-#endif
-
-static inline vector4x_u32
-vec_rol_elems (vector4x_u32 v, unsigned int idx)
-{
-#if __BYTE_ORDER != __BIG_ENDIAN
- return vec_sld (v, v, (16 - (4 * idx)) & 15);
-#else
- return vec_sld (v, v, (4 * idx) & 15);
-#endif
-}
-
-static inline vector4x_u32
-vec_load_le (unsigned long offset, const unsigned char *ptr)
-{
- vector4x_u32 vec;
- vec = vec_vsx_ld (offset, (const uint32_t *)ptr);
-#if __BYTE_ORDER == __BIG_ENDIAN
- vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec,
- le_bswap_const);
-#endif
- return vec;
-}
-
-static inline void
-vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr)
-{
-#if __BYTE_ORDER == __BIG_ENDIAN
- vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec,
- le_bswap_const);
-#endif
- vec_vsx_st (vec, offset, (uint32_t *)ptr);
-}
-
-
-static inline vector4x_u32
-vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a)
-{
-#if __BYTE_ORDER == __BIG_ENDIAN
- static const vector16x_u8 swap32 =
- { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 };
- vector2x_u64 vec, add, sum;
-
- vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32);
- add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32);
- sum = vec + add;
- return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32);
-#else
- return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a));
-#endif
-}
-
-/**********************************************************************
- 4-way chacha20
- **********************************************************************/
-
-#define ROTATE(v1,rolv) \
- __asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv))
-
-#define PLUS(ds,s) \
- ((ds) += (s))
-
-#define XOR(ds,s) \
- ((ds) ^= (s))
-
-#define ADD_U64(v,a) \
- (v = vec_add_ctr_u64(v, a))
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0, x1, x2, x3) ({ \
- vector4x_u32 t1 = vec_mergeh(x0, x2); \
- vector4x_u32 t2 = vec_mergel(x0, x2); \
- vector4x_u32 t3 = vec_mergeh(x1, x3); \
- x3 = vec_mergel(x1, x3); \
- x0 = vec_mergeh(t1, t3); \
- x1 = vec_mergel(t1, t3); \
- x2 = vec_mergeh(t2, x3); \
- x3 = vec_mergel(t2, x3); \
- })
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2) \
- PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
- ROTATE(d1, rotate_16); ROTATE(d2, rotate_16); \
- PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
- ROTATE(b1, rotate_12); ROTATE(b2, rotate_12); \
- PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
- ROTATE(d1, rotate_8); ROTATE(d2, rotate_8); \
- PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
- ROTATE(b1, rotate_7); ROTATE(b2, rotate_7);
-
-unsigned int attribute_hidden
-__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src,
- size_t nblks)
-{
- vector4x_u32 counters_0123 = { 0, 1, 2, 3 };
- vector4x_u32 counter_4 = { 4, 0, 0, 0 };
- vector4x_u32 rotate_16 = { 16, 16, 16, 16 };
- vector4x_u32 rotate_12 = { 12, 12, 12, 12 };
- vector4x_u32 rotate_8 = { 8, 8, 8, 8 };
- vector4x_u32 rotate_7 = { 7, 7, 7, 7 };
- vector4x_u32 state0, state1, state2, state3;
- vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7;
- vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15;
- vector4x_u32 tmp;
- int i;
-
- /* Force preload of constants to vector registers. */
- __asm__ ("": "+v" (counters_0123) :: "memory");
- __asm__ ("": "+v" (counter_4) :: "memory");
- __asm__ ("": "+v" (rotate_16) :: "memory");
- __asm__ ("": "+v" (rotate_12) :: "memory");
- __asm__ ("": "+v" (rotate_8) :: "memory");
- __asm__ ("": "+v" (rotate_7) :: "memory");
-
- state0 = vec_vsx_ld (0 * 16, state);
- state1 = vec_vsx_ld (1 * 16, state);
- state2 = vec_vsx_ld (2 * 16, state);
- state3 = vec_vsx_ld (3 * 16, state);
-
- do
- {
- v0 = vec_splat (state0, 0);
- v1 = vec_splat (state0, 1);
- v2 = vec_splat (state0, 2);
- v3 = vec_splat (state0, 3);
- v4 = vec_splat (state1, 0);
- v5 = vec_splat (state1, 1);
- v6 = vec_splat (state1, 2);
- v7 = vec_splat (state1, 3);
- v8 = vec_splat (state2, 0);
- v9 = vec_splat (state2, 1);
- v10 = vec_splat (state2, 2);
- v11 = vec_splat (state2, 3);
- v12 = vec_splat (state3, 0);
- v13 = vec_splat (state3, 1);
- v14 = vec_splat (state3, 2);
- v15 = vec_splat (state3, 3);
-
- v12 += counters_0123;
- v13 -= vec_cmplt (v12, counters_0123);
-
- for (i = 20; i > 0; i -= 2)
- {
- QUARTERROUND2 (v0, v4, v8, v12, v1, v5, v9, v13)
- QUARTERROUND2 (v2, v6, v10, v14, v3, v7, v11, v15)
- QUARTERROUND2 (v0, v5, v10, v15, v1, v6, v11, v12)
- QUARTERROUND2 (v2, v7, v8, v13, v3, v4, v9, v14)
- }
-
- v0 += vec_splat (state0, 0);
- v1 += vec_splat (state0, 1);
- v2 += vec_splat (state0, 2);
- v3 += vec_splat (state0, 3);
- v4 += vec_splat (state1, 0);
- v5 += vec_splat (state1, 1);
- v6 += vec_splat (state1, 2);
- v7 += vec_splat (state1, 3);
- v8 += vec_splat (state2, 0);
- v9 += vec_splat (state2, 1);
- v10 += vec_splat (state2, 2);
- v11 += vec_splat (state2, 3);
- tmp = vec_splat( state3, 0);
- tmp += counters_0123;
- v12 += tmp;
- v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123);
- v14 += vec_splat (state3, 2);
- v15 += vec_splat (state3, 3);
- ADD_U64 (state3, counter_4);
-
- transpose_4x4 (v0, v1, v2, v3);
- transpose_4x4 (v4, v5, v6, v7);
- transpose_4x4 (v8, v9, v10, v11);
- transpose_4x4 (v12, v13, v14, v15);
-
- vec_store_le (v0, (64 * 0 + 16 * 0), dst);
- vec_store_le (v1, (64 * 1 + 16 * 0), dst);
- vec_store_le (v2, (64 * 2 + 16 * 0), dst);
- vec_store_le (v3, (64 * 3 + 16 * 0), dst);
-
- vec_store_le (v4, (64 * 0 + 16 * 1), dst);
- vec_store_le (v5, (64 * 1 + 16 * 1), dst);
- vec_store_le (v6, (64 * 2 + 16 * 1), dst);
- vec_store_le (v7, (64 * 3 + 16 * 1), dst);
-
- vec_store_le (v8, (64 * 0 + 16 * 2), dst);
- vec_store_le (v9, (64 * 1 + 16 * 2), dst);
- vec_store_le (v10, (64 * 2 + 16 * 2), dst);
- vec_store_le (v11, (64 * 3 + 16 * 2), dst);
-
- vec_store_le (v12, (64 * 0 + 16 * 3), dst);
- vec_store_le (v13, (64 * 1 + 16 * 3), dst);
- vec_store_le (v14, (64 * 2 + 16 * 3), dst);
- vec_store_le (v15, (64 * 3 + 16 * 3), dst);
-
- src += 4*64;
- dst += 4*64;
-
- nblks -= 4;
- }
- while (nblks);
-
- vec_vsx_st (state3, 3 * 16, state);
-
- return 0;
-}
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
deleted file mode 100644
index ded06762b6..0000000000
--- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
+++ /dev/null
@@ -1,37 +0,0 @@
-/* PowerPC optimization for ChaCha20.
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
- const uint8_t *src, size_t nblks)
- attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
- const uint8_t *src, size_t bytes)
-{
- _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
- "CHACHA20_BUFSIZE not multiple of 4");
- _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
- "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
- __chacha20_power8_blocks4 (state, dst, src,
- CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-}
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index 96c110f490..66ed844e68 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache
endif

endif # $(subdir) == elf
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += \
- chacha20-s390x \
- # sysdep_routines
-endif
diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S
deleted file mode 100644
index e38504d370..0000000000
--- a/sysdeps/s390/s390-64/chacha20-s390x.S
+++ /dev/null
@@ -1,573 +0,0 @@
-/* Optimized s390x implementation of ChaCha20 cipher.
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-/* chacha20-s390x.S - zSeries implementation of ChaCha20 cipher
-
- Copyright (C) 2020 Jussi Kivilinna <[email protected]>
-
- This file is part of Libgcrypt.
-
- Libgcrypt is free software; you can redistribute it and/or modify
- it under the terms of the GNU Lesser General Public License as
- published by the Free Software Foundation; either version 2.1 of
- the License, or (at your option) any later version.
-
- Libgcrypt is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-#include <sysdep.h>
-
-#ifdef HAVE_S390_VX_ASM_SUPPORT
-
-/* CFA expressions are used for pointing CFA and registers to
- * SP relative offsets. */
-# define DW_REGNO_SP 15
-
-/* Fixed length encoding used for integers for now. */
-# define DW_SLEB128_7BIT(value) \
- 0x00|((value) & 0x7f)
-# define DW_SLEB128_28BIT(value) \
- 0x80|((value)&0x7f), \
- 0x80|(((value)>>7)&0x7f), \
- 0x80|(((value)>>14)&0x7f), \
- 0x00|(((value)>>21)&0x7f)
-
-# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \
- .cfi_escape \
- 0x0f, /* DW_CFA_def_cfa_expression */ \
- DW_SLEB128_7BIT(11), /* length */ \
- 0x7f, /* DW_OP_breg15, rsp + constant */ \
- DW_SLEB128_28BIT(rsp_offs), \
- 0x06, /* DW_OP_deref */ \
- 0x23, /* DW_OP_plus_constu */ \
- DW_SLEB128_28BIT((cfa_depth)+160)
-
-.machine "z13+vx"
-.text
-
-.balign 16
-.Lconsts:
-.Lwordswap:
- .byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3
-.Lbswap128:
- .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-.Lbswap32:
- .byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
-.Lone:
- .long 0, 0, 0, 1
-.Ladd_counter_0123:
- .long 0, 1, 2, 3
-.Ladd_counter_4567:
- .long 4, 5, 6, 7
-
-/* register macros */
-#define INPUT %r2
-#define DST %r3
-#define SRC %r4
-#define NBLKS %r0
-#define ROUND %r1
-
-/* stack structure */
-
-#define STACK_FRAME_STD (8 * 16 + 8 * 4)
-#define STACK_FRAME_F8_F15 (8 * 8)
-#define STACK_FRAME_Y0_Y15 (16 * 16)
-#define STACK_FRAME_CTR (4 * 16)
-#define STACK_FRAME_PARAMS (6 * 8)
-
-#define STACK_MAX (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \
- STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \
- STACK_FRAME_PARAMS)
-
-#define STACK_F8 (STACK_MAX - STACK_FRAME_F8_F15)
-#define STACK_F9 (STACK_F8 + 8)
-#define STACK_F10 (STACK_F9 + 8)
-#define STACK_F11 (STACK_F10 + 8)
-#define STACK_F12 (STACK_F11 + 8)
-#define STACK_F13 (STACK_F12 + 8)
-#define STACK_F14 (STACK_F13 + 8)
-#define STACK_F15 (STACK_F14 + 8)
-#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15)
-#define STACK_CTR (STACK_Y0_Y15 - STACK_FRAME_CTR)
-#define STACK_INPUT (STACK_CTR - STACK_FRAME_PARAMS)
-#define STACK_DST (STACK_INPUT + 8)
-#define STACK_SRC (STACK_DST + 8)
-#define STACK_NBLKS (STACK_SRC + 8)
-#define STACK_POCTX (STACK_NBLKS + 8)
-#define STACK_POSRC (STACK_POCTX + 8)
-
-#define STACK_G0_H3 STACK_Y0_Y15
-
-/* vector registers */
-#define A0 %v0
-#define A1 %v1
-#define A2 %v2
-#define A3 %v3
-
-#define B0 %v4
-#define B1 %v5
-#define B2 %v6
-#define B3 %v7
-
-#define C0 %v8
-#define C1 %v9
-#define C2 %v10
-#define C3 %v11
-
-#define D0 %v12
-#define D1 %v13
-#define D2 %v14
-#define D3 %v15
-
-#define E0 %v16
-#define E1 %v17
-#define E2 %v18
-#define E3 %v19
-
-#define F0 %v20
-#define F1 %v21
-#define F2 %v22
-#define F3 %v23
-
-#define G0 %v24
-#define G1 %v25
-#define G2 %v26
-#define G3 %v27
-
-#define H0 %v28
-#define H1 %v29
-#define H2 %v30
-#define H3 %v31
-
-#define IO0 E0
-#define IO1 E1
-#define IO2 E2
-#define IO3 E3
-#define IO4 F0
-#define IO5 F1
-#define IO6 F2
-#define IO7 F3
-
-#define S0 G0
-#define S1 G1
-#define S2 G2
-#define S3 G3
-
-#define TMP0 H0
-#define TMP1 H1
-#define TMP2 H2
-#define TMP3 H3
-
-#define X0 A0
-#define X1 A1
-#define X2 A2
-#define X3 A3
-#define X4 B0
-#define X5 B1
-#define X6 B2
-#define X7 B3
-#define X8 C0
-#define X9 C1
-#define X10 C2
-#define X11 C3
-#define X12 D0
-#define X13 D1
-#define X14 D2
-#define X15 D3
-
-#define Y0 E0
-#define Y1 E1
-#define Y2 E2
-#define Y3 E3
-#define Y4 F0
-#define Y5 F1
-#define Y6 F2
-#define Y7 F3
-#define Y8 G0
-#define Y9 G1
-#define Y10 G2
-#define Y11 G3
-#define Y12 H0
-#define Y13 H1
-#define Y14 H2
-#define Y15 H3
-
-/**********************************************************************
- helper macros
- **********************************************************************/
-
-#define _ /*_*/
-
-#define START_STACK(last_r) \
- lgr %r0, %r15; \
- lghi %r1, ~15; \
- stmg %r6, last_r, 6 * 8(%r15); \
- aghi %r0, -STACK_MAX; \
- ngr %r0, %r1; \
- lgr %r1, %r15; \
- cfi_def_cfa_register(1); \
- lgr %r15, %r0; \
- stg %r1, 0(%r15); \
- cfi_cfa_on_stack(0, 0); \
- std %f8, STACK_F8(%r15); \
- std %f9, STACK_F9(%r15); \
- std %f10, STACK_F10(%r15); \
- std %f11, STACK_F11(%r15); \
- std %f12, STACK_F12(%r15); \
- std %f13, STACK_F13(%r15); \
- std %f14, STACK_F14(%r15); \
- std %f15, STACK_F15(%r15);
-
-#define END_STACK(last_r) \
- lg %r1, 0(%r15); \
- ld %f8, STACK_F8(%r15); \
- ld %f9, STACK_F9(%r15); \
- ld %f10, STACK_F10(%r15); \
- ld %f11, STACK_F11(%r15); \
- ld %f12, STACK_F12(%r15); \
- ld %f13, STACK_F13(%r15); \
- ld %f14, STACK_F14(%r15); \
- ld %f15, STACK_F15(%r15); \
- lmg %r6, last_r, 6 * 8(%r1); \
- lgr %r15, %r1; \
- cfi_def_cfa_register(DW_REGNO_SP);
-
-#define PLUS(dst,src) \
- vaf dst, dst, src;
-
-#define XOR(dst,src) \
- vx dst, dst, src;
-
-#define ROTATE(v1,c) \
- verllf v1, v1, (c)(0);
-
-#define WORD_ROTATE(v1,s) \
- vsldb v1, v1, v1, ((s) * 4);
-
-#define DST_8(OPER, I, J) \
- OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \
- OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J);
-
-/**********************************************************************
- round macros
- **********************************************************************/
-
-/**********************************************************************
- 8-way chacha20 ("vertical")
- **********************************************************************/
-
-#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
- x8,x9,x10,x11,x12,x13,x14,x15,\
- y0,y1,y2,y3,y4,y5,y6,y7,\
- y8,y9,y10,y11,y12,y13,y14,y15,\
- op1,op2,op3,op4,op5,op6,op7,op8,\
- op9,op10,op11,op12) \
- op1; \
- PLUS(x0, x1); PLUS(x4, x5); \
- PLUS(x8, x9); PLUS(x12, x13); \
- PLUS(y0, y1); PLUS(y4, y5); \
- PLUS(y8, y9); PLUS(y12, y13); \
- op2; \
- XOR(x3, x0); XOR(x7, x4); \
- XOR(x11, x8); XOR(x15, x12); \
- XOR(y3, y0); XOR(y7, y4); \
- XOR(y11, y8); XOR(y15, y12); \
- op3; \
- ROTATE(x3, 16); ROTATE(x7, 16); \
- ROTATE(x11, 16); ROTATE(x15, 16); \
- ROTATE(y3, 16); ROTATE(y7, 16); \
- ROTATE(y11, 16); ROTATE(y15, 16); \
- op4; \
- PLUS(x2, x3); PLUS(x6, x7); \
- PLUS(x10, x11); PLUS(x14, x15); \
- PLUS(y2, y3); PLUS(y6, y7); \
- PLUS(y10, y11); PLUS(y14, y15); \
- op5; \
- XOR(x1, x2); XOR(x5, x6); \
- XOR(x9, x10); XOR(x13, x14); \
- XOR(y1, y2); XOR(y5, y6); \
- XOR(y9, y10); XOR(y13, y14); \
- op6; \
- ROTATE(x1,12); ROTATE(x5,12); \
- ROTATE(x9,12); ROTATE(x13,12); \
- ROTATE(y1,12); ROTATE(y5,12); \
- ROTATE(y9,12); ROTATE(y13,12); \
- op7; \
- PLUS(x0, x1); PLUS(x4, x5); \
- PLUS(x8, x9); PLUS(x12, x13); \
- PLUS(y0, y1); PLUS(y4, y5); \
- PLUS(y8, y9); PLUS(y12, y13); \
- op8; \
- XOR(x3, x0); XOR(x7, x4); \
- XOR(x11, x8); XOR(x15, x12); \
- XOR(y3, y0); XOR(y7, y4); \
- XOR(y11, y8); XOR(y15, y12); \
- op9; \
- ROTATE(x3,8); ROTATE(x7,8); \
- ROTATE(x11,8); ROTATE(x15,8); \
- ROTATE(y3,8); ROTATE(y7,8); \
- ROTATE(y11,8); ROTATE(y15,8); \
- op10; \
- PLUS(x2, x3); PLUS(x6, x7); \
- PLUS(x10, x11); PLUS(x14, x15); \
- PLUS(y2, y3); PLUS(y6, y7); \
- PLUS(y10, y11); PLUS(y14, y15); \
- op11; \
- XOR(x1, x2); XOR(x5, x6); \
- XOR(x9, x10); XOR(x13, x14); \
- XOR(y1, y2); XOR(y5, y6); \
- XOR(y9, y10); XOR(y13, y14); \
- op12; \
- ROTATE(x1,7); ROTATE(x5,7); \
- ROTATE(x9,7); ROTATE(x13,7); \
- ROTATE(y1,7); ROTATE(y5,7); \
- ROTATE(y9,7); ROTATE(y13,7);
-
-#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\
- y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \
- QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
- x8,x9,x10,x11,x12,x13,x14,x15,\
- y0,y1,y2,y3,y4,y5,y6,y7,\
- y8,y9,y10,y11,y12,y13,y14,y15,\
- ,,,,,,,,,,,)
-
-#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \
- vmrhf tmp0, v0, v1; \
- vmrhf tmp1, v2, v3; \
- vmrlf tmp2, v0, v1; \
- vmrlf v3, v2, v3; \
- vmrhf tmpa, va, vb; \
- vmrhf tmpb, vc, vd; \
- vmrlf tmpc, va, vb; \
- vmrlf vd, vc, vd; \
- vpdi v0, tmp0, tmp1, 0; \
- vpdi v1, tmp0, tmp1, 5; \
- vpdi v2, tmp2, v3, 0; \
- vpdi v3, tmp2, v3, 5; \
- vpdi va, tmpa, tmpb, 0; \
- vpdi vb, tmpa, tmpb, 5; \
- vpdi vc, tmpc, vd, 0; \
- vpdi vd, tmpc, vd, 5;
-
-.balign 8
-.globl __chacha20_s390x_vx_blocks8
-ENTRY (__chacha20_s390x_vx_blocks8)
- /* input:
- * %r2: input
- * %r3: dst
- * %r4: src
- * %r5: nblks (multiple of 8)
- */
-
- START_STACK(%r8);
- lgr NBLKS, %r5;
-
- larl %r7, .Lconsts;
-
- /* Load counter. */
- lg %r8, (12 * 4)(INPUT);
- rllg %r8, %r8, 32;
-
-.balign 4
- /* Process eight chacha20 blocks per loop. */
-.Lloop8:
- vlm Y0, Y3, 0(INPUT);
-
- slgfi NBLKS, 8;
- lghi ROUND, (20 / 2);
-
- /* Construct counter vectors X12/X13 & Y12/Y13. */
- vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7);
- vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7);
- vrepf Y12, Y3, 0;
- vrepf Y13, Y3, 1;
- vaccf X5, Y12, X4;
- vaccf Y5, Y12, Y4;
- vaf X12, Y12, X4;
- vaf Y12, Y12, Y4;
- vaf X13, Y13, X5;
- vaf Y13, Y13, Y5;
-
- vrepf X0, Y0, 0;
- vrepf X1, Y0, 1;
- vrepf X2, Y0, 2;
- vrepf X3, Y0, 3;
- vrepf X4, Y1, 0;
- vrepf X5, Y1, 1;
- vrepf X6, Y1, 2;
- vrepf X7, Y1, 3;
- vrepf X8, Y2, 0;
- vrepf X9, Y2, 1;
- vrepf X10, Y2, 2;
- vrepf X11, Y2, 3;
- vrepf X14, Y3, 2;
- vrepf X15, Y3, 3;
-
- /* Store counters for blocks 0-7. */
- vstm X12, X13, (STACK_CTR + 0 * 16)(%r15);
- vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15);
-
- vlr Y0, X0;
- vlr Y1, X1;
- vlr Y2, X2;
- vlr Y3, X3;
- vlr Y4, X4;
- vlr Y5, X5;
- vlr Y6, X6;
- vlr Y7, X7;
- vlr Y8, X8;
- vlr Y9, X9;
- vlr Y10, X10;
- vlr Y11, X11;
- vlr Y14, X14;
- vlr Y15, X15;
-
- /* Update and store counter. */
- agfi %r8, 8;
- rllg %r5, %r8, 32;
- stg %r5, (12 * 4)(INPUT);
-
-.balign 4
-.Lround2_8:
- QUARTERROUND4_V8(X0, X4, X8, X12, X1, X5, X9, X13,
- X2, X6, X10, X14, X3, X7, X11, X15,
- Y0, Y4, Y8, Y12, Y1, Y5, Y9, Y13,
- Y2, Y6, Y10, Y14, Y3, Y7, Y11, Y15);
- QUARTERROUND4_V8(X0, X5, X10, X15, X1, X6, X11, X12,
- X2, X7, X8, X13, X3, X4, X9, X14,
- Y0, Y5, Y10, Y15, Y1, Y6, Y11, Y12,
- Y2, Y7, Y8, Y13, Y3, Y4, Y9, Y14);
- brctg ROUND, .Lround2_8;
-
- /* Store blocks 4-7. */
- vstm Y0, Y15, STACK_Y0_Y15(%r15);
-
- /* Load counters for blocks 0-3. */
- vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15);
-
- lghi ROUND, 1;
- j .Lfirst_output_4blks_8;
-
-.balign 4
-.Lsecond_output_4blks_8:
- /* Load blocks 4-7. */
- vlm X0, X15, STACK_Y0_Y15(%r15);
-
- /* Load counters for blocks 4-7. */
- vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15);
-
- lghi ROUND, 0;
-
-.balign 4
- /* Output four chacha20 blocks per loop. */
-.Lfirst_output_4blks_8:
- vlm Y12, Y15, 0(INPUT);
- PLUS(X12, Y0);
- PLUS(X13, Y1);
- vrepf Y0, Y12, 0;
- vrepf Y1, Y12, 1;
- vrepf Y2, Y12, 2;
- vrepf Y3, Y12, 3;
- vrepf Y4, Y13, 0;
- vrepf Y5, Y13, 1;
- vrepf Y6, Y13, 2;
- vrepf Y7, Y13, 3;
- vrepf Y8, Y14, 0;
- vrepf Y9, Y14, 1;
- vrepf Y10, Y14, 2;
- vrepf Y11, Y14, 3;
- vrepf Y14, Y15, 2;
- vrepf Y15, Y15, 3;
- PLUS(X0, Y0);
- PLUS(X1, Y1);
- PLUS(X2, Y2);
- PLUS(X3, Y3);
- PLUS(X4, Y4);
- PLUS(X5, Y5);
- PLUS(X6, Y6);
- PLUS(X7, Y7);
- PLUS(X8, Y8);
- PLUS(X9, Y9);
- PLUS(X10, Y10);
- PLUS(X11, Y11);
- PLUS(X14, Y14);
- PLUS(X15, Y15);
-
- vl Y15, (.Lbswap32 - .Lconsts)(%r7);
- TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7,
- Y9, Y10, Y11, Y12, Y13, Y14);
- TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15,
- Y9, Y10, Y11, Y12, Y13, Y14);
-
- vlm Y0, Y14, 0(SRC);
- vperm X0, X0, X0, Y15;
- vperm X1, X1, X1, Y15;
- vperm X2, X2, X2, Y15;
- vperm X3, X3, X3, Y15;
- vperm X4, X4, X4, Y15;
- vperm X5, X5, X5, Y15;
- vperm X6, X6, X6, Y15;
- vperm X7, X7, X7, Y15;
- vperm X8, X8, X8, Y15;
- vperm X9, X9, X9, Y15;
- vperm X10, X10, X10, Y15;
- vperm X11, X11, X11, Y15;
- vperm X12, X12, X12, Y15;
- vperm X13, X13, X13, Y15;
- vperm X14, X14, X14, Y15;
- vperm X15, X15, X15, Y15;
- vl Y15, (15 * 16)(SRC);
-
- XOR(Y0, X0);
- XOR(Y1, X4);
- XOR(Y2, X8);
- XOR(Y3, X12);
- XOR(Y4, X1);
- XOR(Y5, X5);
- XOR(Y6, X9);
- XOR(Y7, X13);
- XOR(Y8, X2);
- XOR(Y9, X6);
- XOR(Y10, X10);
- XOR(Y11, X14);
- XOR(Y12, X3);
- XOR(Y13, X7);
- XOR(Y14, X11);
- XOR(Y15, X15);
- vstm Y0, Y15, 0(DST);
-
- aghi SRC, 256;
- aghi DST, 256;
-
- clgije ROUND, 1, .Lsecond_output_4blks_8;
-
- clgijhe NBLKS, 8, .Lloop8;
-
-
- END_STACK(%r8);
- xgr %r2, %r2;
- br %r14;
-END (__chacha20_s390x_vx_blocks8)
-
-#endif /* HAVE_S390_VX_ASM_SUPPORT */
diff --git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h
deleted file mode 100644
index 0c6abf77e8..0000000000
--- a/sysdeps/s390/s390-64/chacha20_arch.h
+++ /dev/null
@@ -1,45 +0,0 @@
-/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-#include <sys/auxv.h>
-
-unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst,
- const uint8_t *src, size_t nblks)
- attribute_hidden;
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
- size_t bytes)
-{
-#ifdef HAVE_S390_VX_ASM_SUPPORT
- _Static_assert (CHACHA20_BUFSIZE % 8 == 0,
- "CHACHA20_BUFSIZE not multiple of 8");
- _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
- "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
-
- if (GLRO(dl_hwcap) & HWCAP_S390_VX)
- {
- __chacha20_s390x_vx_blocks8 (state, dst, src,
- CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
- return;
- }
-#endif
- chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c
index 0326ebb767..c8a9ed2d40 100644
--- a/sysdeps/unix/sysv/linux/tls-internal.c
+++ b/sysdeps/unix/sysv/linux/tls-internal.c
@@ -16,7 +16,6 @@
License along with the GNU C Library; if not, see
<https://www.gnu.org/licenses/>. */

-#include <stdlib/arc4random.h>
#include <string.h>
#include <tls-internal.h>

@@ -26,13 +25,4 @@ __glibc_tls_internal_free (void)
struct pthread *self = THREAD_SELF;
free (self->tls_state.strsignal_buf);
free (self->tls_state.strerror_l_buf);
-
- if (self->tls_state.rand_state != NULL)
- {
- /* Clear any lingering random state prior so if the thread stack is
- cached it won't leak any data. */
- explicit_bzero (self->tls_state.rand_state,
- sizeof (*self->tls_state.rand_state));
- free (self->tls_state.rand_state);
- }
}
diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
index 1178475d75..c19bef2dec 100644
--- a/sysdeps/x86_64/Makefile
+++ b/sysdeps/x86_64/Makefile
@@ -5,13 +5,6 @@ ifeq ($(subdir),csu)
gen-as-const-headers += link-defines.sym
endif

-ifeq ($(subdir),stdlib)
-sysdep_routines += \
- chacha20-amd64-sse2 \
- chacha20-amd64-avx2 \
- # sysdep_routines
-endif
-
ifeq ($(subdir),gmon)
sysdep_routines += _mcount
# We cannot compile _mcount.S with -pg because that would create
diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S
deleted file mode 100644
index aefd1cdbd0..0000000000
--- a/sysdeps/x86_64/chacha20-amd64-avx2.S
+++ /dev/null
@@ -1,328 +0,0 @@
-/* Optimized AVX2 implementation of ChaCha20 cipher.
- Copyright (C) 2022 Free Software Foundation, Inc.
-
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-/* chacha20-amd64-avx2.S - AVX2 implementation of ChaCha20 cipher
-
- Copyright (C) 2017-2019 Jussi Kivilinna <[email protected]>
-
- This file is part of Libgcrypt.
-
- Libgcrypt is free software; you can redistribute it and/or modify
- it under the terms of the GNU Lesser General Public License as
- published by the Free Software Foundation; either version 2.1 of
- the License, or (at your option) any later version.
-
- Libgcrypt is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with this program; if not, see <https://www.gnu.org/licenses/>.
-*/
-
-/* Based on D. J. Bernstein reference implementation at
- http://cr.yp.to/chacha.html:
-
- chacha-regs.c version 20080118
- D. J. Bernstein
- Public domain. */
-
-#include <sysdep.h>
-
-#ifdef PIC
-# define rRIP (%rip)
-#else
-# define rRIP
-#endif
-
-/* register macros */
-#define INPUT %rdi
-#define DST %rsi
-#define SRC %rdx
-#define NBLKS %rcx
-#define ROUND %eax
-
-/* stack structure */
-#define STACK_VEC_X12 (32)
-#define STACK_VEC_X13 (32 + STACK_VEC_X12)
-#define STACK_TMP (32 + STACK_VEC_X13)
-#define STACK_TMP1 (32 + STACK_TMP)
-
-#define STACK_MAX (32 + STACK_TMP1)
-
-/* vector registers */
-#define X0 %ymm0
-#define X1 %ymm1
-#define X2 %ymm2
-#define X3 %ymm3
-#define X4 %ymm4
-#define X5 %ymm5
-#define X6 %ymm6
-#define X7 %ymm7
-#define X8 %ymm8
-#define X9 %ymm9
-#define X10 %ymm10
-#define X11 %ymm11
-#define X12 %ymm12
-#define X13 %ymm13
-#define X14 %ymm14
-#define X15 %ymm15
-
-#define X0h %xmm0
-#define X1h %xmm1
-#define X2h %xmm2
-#define X3h %xmm3
-#define X4h %xmm4
-#define X5h %xmm5
-#define X6h %xmm6
-#define X7h %xmm7
-#define X8h %xmm8
-#define X9h %xmm9
-#define X10h %xmm10
-#define X11h %xmm11
-#define X12h %xmm12
-#define X13h %xmm13
-#define X14h %xmm14
-#define X15h %xmm15
-
-/**********************************************************************
- helper macros
- **********************************************************************/
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0,x1,x2,x3,t1,t2) \
- vpunpckhdq x1, x0, t2; \
- vpunpckldq x1, x0, x0; \
- \
- vpunpckldq x3, x2, t1; \
- vpunpckhdq x3, x2, x2; \
- \
- vpunpckhqdq t1, x0, x1; \
- vpunpcklqdq t1, x0, x0; \
- \
- vpunpckhqdq x2, t2, x3; \
- vpunpcklqdq x2, t2, x2;
-
-/* 2x2 128-bit matrix transpose */
-#define transpose_16byte_2x2(x0,x1,t1) \
- vmovdqa x0, t1; \
- vperm2i128 $0x20, x1, x0, x0; \
- vperm2i128 $0x31, x1, t1, x1;
-
-/**********************************************************************
- 8-way chacha20
- **********************************************************************/
-
-#define ROTATE2(v1,v2,c,tmp) \
- vpsrld $(32 - (c)), v1, tmp; \
- vpslld $(c), v1, v1; \
- vpaddb tmp, v1, v1; \
- vpsrld $(32 - (c)), v2, tmp; \
- vpslld $(c), v2, v2; \
- vpaddb tmp, v2, v2;
-
-#define ROTATE_SHUF_2(v1,v2,shuf) \
- vpshufb shuf, v1, v1; \
- vpshufb shuf, v2, v2;
-
-#define XOR(ds,s) \
- vpxor s, ds, ds;
-
-#define PLUS(ds,s) \
- vpaddd s, ds, ds;
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\
- interleave_op1,interleave_op2,\
- interleave_op3,interleave_op4) \
- vbroadcasti128 .Lshuf_rol16 rRIP, tmp1; \
- interleave_op1; \
- PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
- ROTATE_SHUF_2(d1, d2, tmp1); \
- interleave_op2; \
- PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
- ROTATE2(b1, b2, 12, tmp1); \
- vbroadcasti128 .Lshuf_rol8 rRIP, tmp1; \
- interleave_op3; \
- PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
- ROTATE_SHUF_2(d1, d2, tmp1); \
- interleave_op4; \
- PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
- ROTATE2(b1, b2, 7, tmp1);
-
- .section .text.avx2, "ax", @progbits
- .align 32
-chacha20_data:
-L(shuf_rol16):
- .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
-L(shuf_rol8):
- .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
-L(inc_counter):
- .byte 0,1,2,3,4,5,6,7
-L(unsigned_cmp):
- .long 0x80000000
-
- .hidden __chacha20_avx2_blocks8
-ENTRY (__chacha20_avx2_blocks8)
- /* input:
- * %rdi: input
- * %rsi: dst
- * %rdx: src
- * %rcx: nblks (multiple of 8)
- */
- vzeroupper;
-
- pushq %rbp;
- cfi_adjust_cfa_offset(8);
- cfi_rel_offset(rbp, 0)
- movq %rsp, %rbp;
- cfi_def_cfa_register(rbp);
-
- subq $STACK_MAX, %rsp;
- andq $~31, %rsp;
-
-L(loop8):
- mov $20, ROUND;
-
- /* Construct counter vectors X12 and X13 */
- vpmovzxbd L(inc_counter) rRIP, X0;
- vpbroadcastd L(unsigned_cmp) rRIP, X2;
- vpbroadcastd (12 * 4)(INPUT), X12;
- vpbroadcastd (13 * 4)(INPUT), X13;
- vpaddd X0, X12, X12;
- vpxor X2, X0, X0;
- vpxor X2, X12, X1;
- vpcmpgtd X1, X0, X0;
- vpsubd X0, X13, X13;
- vmovdqa X12, (STACK_VEC_X12)(%rsp);
- vmovdqa X13, (STACK_VEC_X13)(%rsp);
-
- /* Load vectors */
- vpbroadcastd (0 * 4)(INPUT), X0;
- vpbroadcastd (1 * 4)(INPUT), X1;
- vpbroadcastd (2 * 4)(INPUT), X2;
- vpbroadcastd (3 * 4)(INPUT), X3;
- vpbroadcastd (4 * 4)(INPUT), X4;
- vpbroadcastd (5 * 4)(INPUT), X5;
- vpbroadcastd (6 * 4)(INPUT), X6;
- vpbroadcastd (7 * 4)(INPUT), X7;
- vpbroadcastd (8 * 4)(INPUT), X8;
- vpbroadcastd (9 * 4)(INPUT), X9;
- vpbroadcastd (10 * 4)(INPUT), X10;
- vpbroadcastd (11 * 4)(INPUT), X11;
- vpbroadcastd (14 * 4)(INPUT), X14;
- vpbroadcastd (15 * 4)(INPUT), X15;
- vmovdqa X15, (STACK_TMP)(%rsp);
-
-L(round2):
- QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X15,,,,)
- vmovdqa (STACK_TMP)(%rsp), X15;
- vmovdqa X8, (STACK_TMP)(%rsp);
- QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,,,,)
- QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,,,,)
- vmovdqa (STACK_TMP)(%rsp), X8;
- vmovdqa X15, (STACK_TMP)(%rsp);
- QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X15,,,,)
- sub $2, ROUND;
- jnz L(round2);
-
- vmovdqa X8, (STACK_TMP1)(%rsp);
-
- /* tmp := X15 */
- vpbroadcastd (0 * 4)(INPUT), X15;
- PLUS(X0, X15);
- vpbroadcastd (1 * 4)(INPUT), X15;
- PLUS(X1, X15);
- vpbroadcastd (2 * 4)(INPUT), X15;
- PLUS(X2, X15);
- vpbroadcastd (3 * 4)(INPUT), X15;
- PLUS(X3, X15);
- vpbroadcastd (4 * 4)(INPUT), X15;
- PLUS(X4, X15);
- vpbroadcastd (5 * 4)(INPUT), X15;
- PLUS(X5, X15);
- vpbroadcastd (6 * 4)(INPUT), X15;
- PLUS(X6, X15);
- vpbroadcastd (7 * 4)(INPUT), X15;
- PLUS(X7, X15);
- transpose_4x4(X0, X1, X2, X3, X8, X15);
- transpose_4x4(X4, X5, X6, X7, X8, X15);
- vmovdqa (STACK_TMP1)(%rsp), X8;
- transpose_16byte_2x2(X0, X4, X15);
- transpose_16byte_2x2(X1, X5, X15);
- transpose_16byte_2x2(X2, X6, X15);
- transpose_16byte_2x2(X3, X7, X15);
- vmovdqa (STACK_TMP)(%rsp), X15;
- vmovdqu X0, (64 * 0 + 16 * 0)(DST)
- vmovdqu X1, (64 * 1 + 16 * 0)(DST)
- vpbroadcastd (8 * 4)(INPUT), X0;
- PLUS(X8, X0);
- vpbroadcastd (9 * 4)(INPUT), X0;
- PLUS(X9, X0);
- vpbroadcastd (10 * 4)(INPUT), X0;
- PLUS(X10, X0);
- vpbroadcastd (11 * 4)(INPUT), X0;
- PLUS(X11, X0);
- vmovdqa (STACK_VEC_X12)(%rsp), X0;
- PLUS(X12, X0);
- vmovdqa (STACK_VEC_X13)(%rsp), X0;
- PLUS(X13, X0);
- vpbroadcastd (14 * 4)(INPUT), X0;
- PLUS(X14, X0);
- vpbroadcastd (15 * 4)(INPUT), X0;
- PLUS(X15, X0);
- vmovdqu X2, (64 * 2 + 16 * 0)(DST)
- vmovdqu X3, (64 * 3 + 16 * 0)(DST)
-
- /* Update counter */
- addq $8, (12 * 4)(INPUT);
-
- transpose_4x4(X8, X9, X10, X11, X0, X1);
- transpose_4x4(X12, X13, X14, X15, X0, X1);
- vmovdqu X4, (64 * 4 + 16 * 0)(DST)
- vmovdqu X5, (64 * 5 + 16 * 0)(DST)
- transpose_16byte_2x2(X8, X12, X0);
- transpose_16byte_2x2(X9, X13, X0);
- transpose_16byte_2x2(X10, X14, X0);
- transpose_16byte_2x2(X11, X15, X0);
- vmovdqu X6, (64 * 6 + 16 * 0)(DST)
- vmovdqu X7, (64 * 7 + 16 * 0)(DST)
- vmovdqu X8, (64 * 0 + 16 * 2)(DST)
- vmovdqu X9, (64 * 1 + 16 * 2)(DST)
- vmovdqu X10, (64 * 2 + 16 * 2)(DST)
- vmovdqu X11, (64 * 3 + 16 * 2)(DST)
- vmovdqu X12, (64 * 4 + 16 * 2)(DST)
- vmovdqu X13, (64 * 5 + 16 * 2)(DST)
- vmovdqu X14, (64 * 6 + 16 * 2)(DST)
- vmovdqu X15, (64 * 7 + 16 * 2)(DST)
-
- sub $8, NBLKS;
- lea (8 * 64)(DST), DST;
- lea (8 * 64)(SRC), SRC;
- jnz L(loop8);
-
- vzeroupper;
-
- /* eax zeroed by round loop. */
- leave;
- cfi_adjust_cfa_offset(-8)
- cfi_def_cfa_register(%rsp);
- ret;
- int3;
-END(__chacha20_avx2_blocks8)
diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S
deleted file mode 100644
index 351a1109c6..0000000000
--- a/sysdeps/x86_64/chacha20-amd64-sse2.S
+++ /dev/null
@@ -1,311 +0,0 @@
-/* Optimized SSE2 implementation of ChaCha20 cipher.
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-/* chacha20-amd64-ssse3.S - SSSE3 implementation of ChaCha20 cipher
-
- Copyright (C) 2017-2019 Jussi Kivilinna <[email protected]>
-
- This file is part of Libgcrypt.
-
- Libgcrypt is free software; you can redistribute it and/or modify
- it under the terms of the GNU Lesser General Public License as
- published by the Free Software Foundation; either version 2.1 of
- the License, or (at your option) any later version.
-
- Libgcrypt is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with this program; if not, see <https://www.gnu.org/licenses/>.
-*/
-
-/* Based on D. J. Bernstein reference implementation at
- http://cr.yp.to/chacha.html:
-
- chacha-regs.c version 20080118
- D. J. Bernstein
- Public domain. */
-
-#include <sysdep.h>
-#include <isa-level.h>
-
-#if MINIMUM_X86_ISA_LEVEL <= 2
-
-#ifdef PIC
-# define rRIP (%rip)
-#else
-# define rRIP
-#endif
-
-/* 'ret' instruction replacement for straight-line speculation mitigation */
-#define ret_spec_stop \
- ret; int3;
-
-/* register macros */
-#define INPUT %rdi
-#define DST %rsi
-#define SRC %rdx
-#define NBLKS %rcx
-#define ROUND %eax
-
-/* stack structure */
-#define STACK_VEC_X12 (16)
-#define STACK_VEC_X13 (16 + STACK_VEC_X12)
-#define STACK_TMP (16 + STACK_VEC_X13)
-#define STACK_TMP1 (16 + STACK_TMP)
-#define STACK_TMP2 (16 + STACK_TMP1)
-
-#define STACK_MAX (16 + STACK_TMP2)
-
-/* vector registers */
-#define X0 %xmm0
-#define X1 %xmm1
-#define X2 %xmm2
-#define X3 %xmm3
-#define X4 %xmm4
-#define X5 %xmm5
-#define X6 %xmm6
-#define X7 %xmm7
-#define X8 %xmm8
-#define X9 %xmm9
-#define X10 %xmm10
-#define X11 %xmm11
-#define X12 %xmm12
-#define X13 %xmm13
-#define X14 %xmm14
-#define X15 %xmm15
-
-/**********************************************************************
- helper macros
- **********************************************************************/
-
-/* 4x4 32-bit integer matrix transpose */
-#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \
- movdqa x0, t2; \
- punpckhdq x1, t2; \
- punpckldq x1, x0; \
- \
- movdqa x2, t1; \
- punpckldq x3, t1; \
- punpckhdq x3, x2; \
- \
- movdqa x0, x1; \
- punpckhqdq t1, x1; \
- punpcklqdq t1, x0; \
- \
- movdqa t2, x3; \
- punpckhqdq x2, x3; \
- punpcklqdq x2, t2; \
- movdqa t2, x2;
-
-/* fill xmm register with 32-bit value from memory */
-#define PBROADCASTD(mem32, xreg) \
- movd mem32, xreg; \
- pshufd $0, xreg, xreg;
-
-/**********************************************************************
- 4-way chacha20
- **********************************************************************/
-
-#define ROTATE2(v1,v2,c,tmp1,tmp2) \
- movdqa v1, tmp1; \
- movdqa v2, tmp2; \
- psrld $(32 - (c)), v1; \
- pslld $(c), tmp1; \
- paddb tmp1, v1; \
- psrld $(32 - (c)), v2; \
- pslld $(c), tmp2; \
- paddb tmp2, v2;
-
-#define XOR(ds,s) \
- pxor s, ds;
-
-#define PLUS(ds,s) \
- paddd s, ds;
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2) \
- PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
- ROTATE2(d1, d2, 16, tmp1, tmp2); \
- PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
- ROTATE2(b1, b2, 12, tmp1, tmp2); \
- PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
- ROTATE2(d1, d2, 8, tmp1, tmp2); \
- PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
- ROTATE2(b1, b2, 7, tmp1, tmp2);
-
- .section .text.sse2,"ax",@progbits
-
-chacha20_data:
- .align 16
-L(counter1):
- .long 1,0,0,0
-L(inc_counter):
- .long 0,1,2,3
-L(unsigned_cmp):
- .long 0x80000000,0x80000000,0x80000000,0x80000000
-
- .hidden __chacha20_sse2_blocks4
-ENTRY (__chacha20_sse2_blocks4)
- /* input:
- * %rdi: input
- * %rsi: dst
- * %rdx: src
- * %rcx: nblks (multiple of 4)
- */
-
- pushq %rbp;
- cfi_adjust_cfa_offset(8);
- cfi_rel_offset(rbp, 0)
- movq %rsp, %rbp;
- cfi_def_cfa_register(%rbp);
-
- subq $STACK_MAX, %rsp;
- andq $~15, %rsp;
-
-L(loop4):
- mov $20, ROUND;
-
- /* Construct counter vectors X12 and X13 */
- movdqa L(inc_counter) rRIP, X0;
- movdqa L(unsigned_cmp) rRIP, X2;
- PBROADCASTD((12 * 4)(INPUT), X12);
- PBROADCASTD((13 * 4)(INPUT), X13);
- paddd X0, X12;
- movdqa X12, X1;
- pxor X2, X0;
- pxor X2, X1;
- pcmpgtd X1, X0;
- psubd X0, X13;
- movdqa X12, (STACK_VEC_X12)(%rsp);
- movdqa X13, (STACK_VEC_X13)(%rsp);
-
- /* Load vectors */
- PBROADCASTD((0 * 4)(INPUT), X0);
- PBROADCASTD((1 * 4)(INPUT), X1);
- PBROADCASTD((2 * 4)(INPUT), X2);
- PBROADCASTD((3 * 4)(INPUT), X3);
- PBROADCASTD((4 * 4)(INPUT), X4);
- PBROADCASTD((5 * 4)(INPUT), X5);
- PBROADCASTD((6 * 4)(INPUT), X6);
- PBROADCASTD((7 * 4)(INPUT), X7);
- PBROADCASTD((8 * 4)(INPUT), X8);
- PBROADCASTD((9 * 4)(INPUT), X9);
- PBROADCASTD((10 * 4)(INPUT), X10);
- PBROADCASTD((11 * 4)(INPUT), X11);
- PBROADCASTD((14 * 4)(INPUT), X14);
- PBROADCASTD((15 * 4)(INPUT), X15);
- movdqa X11, (STACK_TMP)(%rsp);
- movdqa X15, (STACK_TMP1)(%rsp);
-
-L(round2_4):
- QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X11,X15)
- movdqa (STACK_TMP)(%rsp), X11;
- movdqa (STACK_TMP1)(%rsp), X15;
- movdqa X8, (STACK_TMP)(%rsp);
- movdqa X9, (STACK_TMP1)(%rsp);
- QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,X9)
- QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,X9)
- movdqa (STACK_TMP)(%rsp), X8;
- movdqa (STACK_TMP1)(%rsp), X9;
- movdqa X11, (STACK_TMP)(%rsp);
- movdqa X15, (STACK_TMP1)(%rsp);
- QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X11,X15)
- sub $2, ROUND;
- jnz L(round2_4);
-
- /* tmp := X15 */
- movdqa (STACK_TMP)(%rsp), X11;
- PBROADCASTD((0 * 4)(INPUT), X15);
- PLUS(X0, X15);
- PBROADCASTD((1 * 4)(INPUT), X15);
- PLUS(X1, X15);
- PBROADCASTD((2 * 4)(INPUT), X15);
- PLUS(X2, X15);
- PBROADCASTD((3 * 4)(INPUT), X15);
- PLUS(X3, X15);
- PBROADCASTD((4 * 4)(INPUT), X15);
- PLUS(X4, X15);
- PBROADCASTD((5 * 4)(INPUT), X15);
- PLUS(X5, X15);
- PBROADCASTD((6 * 4)(INPUT), X15);
- PLUS(X6, X15);
- PBROADCASTD((7 * 4)(INPUT), X15);
- PLUS(X7, X15);
- PBROADCASTD((8 * 4)(INPUT), X15);
- PLUS(X8, X15);
- PBROADCASTD((9 * 4)(INPUT), X15);
- PLUS(X9, X15);
- PBROADCASTD((10 * 4)(INPUT), X15);
- PLUS(X10, X15);
- PBROADCASTD((11 * 4)(INPUT), X15);
- PLUS(X11, X15);
- movdqa (STACK_VEC_X12)(%rsp), X15;
- PLUS(X12, X15);
- movdqa (STACK_VEC_X13)(%rsp), X15;
- PLUS(X13, X15);
- movdqa X13, (STACK_TMP)(%rsp);
- PBROADCASTD((14 * 4)(INPUT), X15);
- PLUS(X14, X15);
- movdqa (STACK_TMP1)(%rsp), X15;
- movdqa X14, (STACK_TMP1)(%rsp);
- PBROADCASTD((15 * 4)(INPUT), X13);
- PLUS(X15, X13);
- movdqa X15, (STACK_TMP2)(%rsp);
-
- /* Update counter */
- addq $4, (12 * 4)(INPUT);
-
- TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15);
- movdqu X0, (64 * 0 + 16 * 0)(DST)
- movdqu X1, (64 * 1 + 16 * 0)(DST)
- movdqu X2, (64 * 2 + 16 * 0)(DST)
- movdqu X3, (64 * 3 + 16 * 0)(DST)
- TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2);
- movdqa (STACK_TMP)(%rsp), X13;
- movdqa (STACK_TMP1)(%rsp), X14;
- movdqa (STACK_TMP2)(%rsp), X15;
- movdqu X4, (64 * 0 + 16 * 1)(DST)
- movdqu X5, (64 * 1 + 16 * 1)(DST)
- movdqu X6, (64 * 2 + 16 * 1)(DST)
- movdqu X7, (64 * 3 + 16 * 1)(DST)
- TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2);
- movdqu X8, (64 * 0 + 16 * 2)(DST)
- movdqu X9, (64 * 1 + 16 * 2)(DST)
- movdqu X10, (64 * 2 + 16 * 2)(DST)
- movdqu X11, (64 * 3 + 16 * 2)(DST)
- TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2);
- movdqu X12, (64 * 0 + 16 * 3)(DST)
- movdqu X13, (64 * 1 + 16 * 3)(DST)
- movdqu X14, (64 * 2 + 16 * 3)(DST)
- movdqu X15, (64 * 3 + 16 * 3)(DST)
-
- sub $4, NBLKS;
- lea (4 * 64)(DST), DST;
- lea (4 * 64)(SRC), SRC;
- jnz L(loop4);
-
- /* eax zeroed by round loop. */
- leave;
- cfi_adjust_cfa_offset(-8)
- cfi_def_cfa_register(%rsp);
- ret_spec_stop;
-END (__chacha20_sse2_blocks4)
-
-#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */
diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
deleted file mode 100644
index 6f3784e392..0000000000
--- a/sysdeps/x86_64/chacha20_arch.h
+++ /dev/null
@@ -1,55 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
- Copyright (C) 2022 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <isa-level.h>
-#include <ldsodefs.h>
-#include <cpu-features.h>
-#include <sys/param.h>
-
-unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst,
- const uint8_t *src, size_t nblks)
- attribute_hidden;
-unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
- const uint8_t *src, size_t nblks)
- attribute_hidden;
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
- size_t bytes)
-{
- _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0,
- "CHACHA20_BUFSIZE not multiple of 4 or 8");
- _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
- "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
-
-#if MINIMUM_X86_ISA_LEVEL > 2
- __chacha20_avx2_blocks8 (state, dst, src,
- CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
- const struct cpu_features* cpu_features = __get_cpu_features ();
-
- /* AVX2 version uses vzeroupper, so disable it if RTM is enabled. */
- if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
- && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !))
- __chacha20_avx2_blocks8 (state, dst, src,
- CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
- else
- __chacha20_sse2_blocks4 (state, dst, src,
- CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#endif
-}
--
2.35.1

2022-07-26 00:11:14

by Eric Biggers

Subject: Re: [PATCH v2] arc4random: simplify design for better safety

On Tue, Jul 26, 2022 at 01:28:10AM +0200, Jason A. Donenfeld wrote:
> Rather than buffering 16 MiB of entropy in userspace (by way of
> chacha20), simply call getrandom() every time.
>
> This approach is doubtlessly slower, for now, but trying to prematurely
> optimize arc4random appears to be leading toward all sorts of nasty
> properties and gotchas. Instead, this patch takes a much more
> conservative approach. The interface is added as a basic loop wrapper
> around getrandom(), and then later, the kernel and libc can work
> together on optimizing that.
>
> This prevents numerous issues in which userspace is unaware of when it
> really must throw away its buffer, since we avoid buffering
> altogether. Future improvements may include userspace learning more from
> the kernel about when to do that, which might make these sorts of
> chacha20-based optimizations more possible. The current heuristic of 16
> MiB is meaningless garbage that doesn't correspond to anything the
> kernel might know about. So for now, let's just do something
> conservative that we know is correct and won't lead to cryptographic
> issues for users of this function.
>
> This patch might be considered along the lines of, "optimization is the
> root of all evil," in that the much more complex implementation it
> replaces moves too fast without considering security implications,
> whereas the incremental approach done here is a much safer way of going
> about things. Once this lands, we can take our time in optimizing this
> properly using new interplay between the kernel and userspace.
>
> getrandom(0) is used, since that's the one that ensures the bytes
> returned are cryptographically secure. But on systems without it, we
> fall back to using /dev/urandom. This is unfortunate because it means
> opening a file descriptor, but there's not much of a choice. Secondly,
> as part of the fallback, in order to get more or less the same
> properties of getrandom(0), we poll on /dev/random, and if the poll
> succeeds at least once, then we assume the RNG is initialized. This is a
> rough approximation, as the ancient "non-blocking pool" initialized
> after the "blocking pool", not before, but it's the best approximation
> we can do.
>
> The motivation for including arc4random, in the first place, is to have
> source-level compatibility with existing code. That means this patch
> doesn't attempt to litigate the interface itself. It does, however,
> choose a conservative approach for implementing it.
>
> Cc: Adhemerval Zanella Netto <[email protected]>
> Cc: Florian Weimer <[email protected]>
> Cc: Cristian Rodríguez <[email protected]>
> Cc: Paul Eggert <[email protected]>
> Cc: [email protected]
> Signed-off-by: Jason A. Donenfeld <[email protected]>

This looks good to me.

There are still a few bits that need to be removed/updated. With a quick grep,
I found:

sysdeps/generic/tls-internal-struct.h: struct arc4random_state_t *rand_state;

sysdeps/unix/sysv/linux/tls-internal.h:/* Reset the arc4random TCB state on fork. *

NEWS: ... The functions use a pseudo-random number generator along with
NEWS: entropy from the kernel.


Also, the documentation in manual/math.texi should say that the randomness is
cryptographically secure.

- Eric

2022-07-26 01:20:32

by Mark Harris

Subject: Re: [PATCH v2] arc4random: simplify design for better safety

Jason A. Donenfeld wrote:
> + l = __getrandom_nocancel (p, n, 0);
> + if (l > 0)
> + {
> + if ((size_t) l == n)
> + return; /* Done reading, success. */
> + p = (uint8_t *) p + l;
> + n -= l;
> + continue; /* Interrupted by a signal; keep going. */
> + }
> + else if (l == 0)
> + arc4random_getrandom_failure (); /* Weird, should never happen. */
> + else if (errno == ENOSYS)
> + {
> + have_getrandom = false;
> + break; /* No syscall, so fallback to /dev/urandom. */
> + }
> + arc4random_getrandom_failure (); /* Unknown error, should never happen. */

Isn't EINTR also possible? Aborting in that case does not seem reasonable.

Also the __getrandom_nocancel function does not set errno on Linux; it
just returns INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags).
So unless that is changed, it doesn't look like this ENOSYS check will
detect old Linux kernels.
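
A minimal sketch of the kind of handling this implies, assuming a raw wrapper that reports failure as a negated errno value in its return value instead of through errno. The names raw_random_fn, fill_from_raw and scripted_source are hypothetical and exist only for illustration; they are not glibc interfaces, and the scripted source merely replays canned return values so the loop can be exercised without a kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for an errno-free syscall wrapper: returns the
   number of bytes written (> 0) or a negated errno value (< 0).  */
typedef ssize_t (*raw_random_fn) (void *buf, size_t len, void *ctx);

enum fill_result { FILL_OK, FILL_NO_SYSCALL, FILL_ERROR };

/* Fill BUF with N bytes, inspecting the return value instead of errno.  */
static enum fill_result
fill_from_raw (raw_random_fn fn, void *ctx, void *buf, size_t n)
{
  uint8_t *p = buf;
  while (n > 0)
    {
      ssize_t l = fn (p, n, ctx);
      if (l > 0)
        {
          /* Short read: advance and ask for the rest.  */
          p += l;
          n -= (size_t) l;
        }
      else if (l == -EINTR)
        continue;               /* Interrupted before any byte; retry.  */
      else if (l == -ENOSYS)
        return FILL_NO_SYSCALL; /* Caller falls back to /dev/urandom.  */
      else
        return FILL_ERROR;      /* Zero or an unexpected error.  */
    }
  return FILL_OK;
}

/* Scripted source for exercising the loop: replays canned return values.  */
struct script
{
  const ssize_t *ret;
  size_t pos;
};

static ssize_t
scripted_source (void *buf, size_t len, void *ctx)
{
  struct script *s = ctx;
  ssize_t r = s->ret[s->pos++];
  if (r > 0)
    {
      if ((size_t) r > len)
        r = (ssize_t) len;
      memset (buf, 0xAB, (size_t) r);
    }
  return r;
}
```

Feeding the loop a script of { -EINTR, 3, -EINTR, 5 } for an 8-byte request completes after four calls, while a script of { -ENOSYS } reports FILL_NO_SYSCALL so the caller can switch to the fallback path.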

> + struct pollfd pfd = { .events = POLLIN };
> + pfd.fd = TEMP_FAILURE_RETRY (
> + __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> + if (pfd.fd < 0)
> + arc4random_getrandom_failure ();
> + if (__poll (&pfd, 1, -1) < 0)
> + arc4random_getrandom_failure ();
> + if (__close_nocancel (pfd.fd) < 0)
> + arc4random_getrandom_failure ();

The TEMP_FAILURE_RETRY handles EINTR on open, but __poll can also
result in EINTR.
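
For illustration, a sketch of an EINTR-tolerant wait written against the public poll() rather than the internal __poll. The helper names wait_readable and demo_pipe_readable are made up for this example, and the pipe is only there so the behaviour can be checked without touching /dev/random.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Block until FD is readable, retrying when a signal interrupts poll.
   Returns 0 once POLLIN is reported, -1 on any other outcome.  */
static int
wait_readable (int fd)
{
  struct pollfd pfd = { .fd = fd, .events = POLLIN };
  int r;
  do
    r = poll (&pfd, 1, -1);
  while (r < 0 && errno == EINTR);
  if (r != 1 || (pfd.revents & POLLIN) == 0)
    return -1;
  return 0;
}

/* Demo: a pipe with one byte queued is reported as readable.  */
static int
demo_pipe_readable (void)
{
  int fds[2];
  if (pipe (fds) < 0)
    return -1;
  int ret = -1;
  if (write (fds[1], "x", 1) == 1)
    ret = wait_readable (fds[0]);
  close (fds[0]);
  close (fds[1]);
  return ret;
}
```

Because the timeout is infinite, simply re-issuing the call after EINTR loses nothing; with a finite timeout the remaining time would have to be recomputed on each retry.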


- Mark

2022-07-26 10:00:03

by Florian Weimer

Subject: Re: [PATCH v2] arc4random: simplify design for better safety

* Jason A. Donenfeld:

> + pfd.fd = TEMP_FAILURE_RETRY (
> + __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> + if (pfd.fd < 0)
> + arc4random_getrandom_failure ();
> + if (__poll (&pfd, 1, -1) < 0)
> + arc4random_getrandom_failure ();
> + if (__close_nocancel (pfd.fd) < 0)
> + arc4random_getrandom_failure ();

What happens if /dev/random is actually /dev/urandom? Will the poll
call fail?

I think we need a no-cancel variant of poll here, and we also need to
handle EINTR gracefully.

Performance-wise, my 1000 element shuffle benchmark runs about 14 times
slower without userspace buffering. (For comparison, just removing
ChaCha20 while keeping a 256-byte buffer makes it run roughly 25% slower
than current master.) Our random() implementation is quite slow, so
arc4random() as a replacement call is competitive. The unbuffered
version, not so much.
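
To make the amortization concrete, here is a toy model that only counts how often the entropy source is called; it is not the benchmark itself and says nothing about the safety questions raised in this thread. The names counting_source, source_fill and calls_for_draws are invented for the sketch, and the "source" returns zeros rather than random bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy entropy source that only counts how often it is asked for bytes.
   It stands in for a getrandom() call and is NOT random.  */
struct counting_source
{
  size_t calls;
};

static void
source_fill (struct counting_source *s, void *buf, size_t n)
{
  s->calls++;
  memset (buf, 0, n);
}

/* Return how many source calls DRAWS 32-bit draws cost.  BUFSIZE == 0
   means unbuffered; otherwise it must be a multiple of 4, at most 256.  */
static size_t
calls_for_draws (size_t draws, size_t bufsize)
{
  struct counting_source src = { 0 };
  uint8_t buf[256];
  size_t have = 0;

  for (size_t i = 0; i < draws; i++)
    {
      uint32_t v;
      if (bufsize == 0)
        source_fill (&src, &v, sizeof v);
      else
        {
          if (have < sizeof v)
            {
              /* Buffer exhausted: one call refills BUFSIZE bytes.  */
              source_fill (&src, buf, bufsize);
              have = bufsize;
            }
          memcpy (&v, buf + bufsize - have, sizeof v);
          have -= sizeof v;
        }
      (void) v;
    }
  return src.calls;
}
```

For 999 draws, roughly what a 1000-element Fisher-Yates shuffle needs if no sample is rejected, the model makes 999 source calls unbuffered and 16 with a 256-byte buffer.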

Running the benchmark, I see 40% of the time spent in chacha_permute in
the kernel, that is really quite odd. Why doesn't the system call
overhead dominate?

Thanks,
Florian

2022-07-26 10:27:08

by Jason A. Donenfeld

Subject: Re: [PATCH v2] arc4random: simplify design for better safety

Hi Eric,

On Mon, Jul 25, 2022 at 04:59:17PM -0700, Eric Biggers wrote:
> This looks good to me.
>
> There are still a few bits that need to be removed/updated. With a quick grep,
> I found:
>
> sysdeps/generic/tls-internal-struct.h: struct arc4random_state_t *rand_state;
>
> sysdeps/unix/sysv/linux/tls-internal.h:/* Reset the arc4random TCB state on fork. *
>
> NEWS: ... The functions use a pseudo-random number generator along with
> NEWS: entropy from the kernel.
>
>
> Also, the documentation in manual/math.texi should say that the randomness is
> cryptographically secure.

Thanks for the notes. I'll clean that all up in v3.

Jason

2022-07-26 10:52:50

by Jason A. Donenfeld

Subject: Re: [PATCH v2] arc4random: simplify design for better safety

Hi Mark,

On Mon, Jul 25, 2022 at 06:10:06PM -0700, Mark Harris wrote:
> Jason A. Donenfeld wrote:
> > + l = __getrandom_nocancel (p, n, 0);
> > + if (l > 0)
> > + {
> > + if ((size_t) l == n)
> > + return; /* Done reading, success. */
> > + p = (uint8_t *) p + l;
> > + n -= l;
> > + continue; /* Interrupted by a signal; keep going. */
> > + }
> > + else if (l == 0)
> > + arc4random_getrandom_failure (); /* Weird, should never happen. */
> > + else if (errno == ENOSYS)
> > + {
> > + have_getrandom = false;
> > + break; /* No syscall, so fallback to /dev/urandom. */
> > + }
> > + arc4random_getrandom_failure (); /* Unknown error, should never happen. */
>
> Isn't EINTR also possible? Aborting in that case does not seem reasonable.

Not in current kernels, where it always returns at least PAGE_SIZE bytes
before checking for pending signals. In older kernels, if there was a
signal pending at the top, it would do no work and return -ERESTARTSYS,
which I believe should then get restarted by glibc's syscaller? I might
be wrong about how restarts work though, so if you know better, please
let me know. TEMP_FAILURE_RETRY relies on errno, so that's not what we
want. I guess I can just add a case for it.

> Also the __getrandom_nocancel function does not set errno on Linux; it
> just returns INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags).
> So unless that is changed, it doesn't look like this ENOSYS check will
> detect old Linux kernels.

Thanks. It looks like INTERNAL_SYSCALL_CALL just returns the errno as-is
as a return value, right? I'll adjust the code to account for that.

> > + struct pollfd pfd = { .events = POLLIN };
> > + pfd.fd = TEMP_FAILURE_RETRY (
> > + __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> > + if (pfd.fd < 0)
> > + arc4random_getrandom_failure ();
> > + if (__poll (&pfd, 1, -1) < 0)
> > + arc4random_getrandom_failure ();
> > + if (__close_nocancel (pfd.fd) < 0)
> > + arc4random_getrandom_failure ();
>
> The TEMP_FAILURE_RETRY handles EINTR on open, but __poll can also
> result in EINTR.

Thanks. I'll surround the __poll in TEMP_FAILURE_RETRY.

Thank you for the review! v3 will have the above changes.

Jason

2022-07-26 11:10:59

by Jason A. Donenfeld

Subject: Re: [PATCH v2] arc4random: simplify design for better safety

Hi Florian,

On Tue, Jul 26, 2022 at 11:55:23AM +0200, Florian Weimer wrote:
> * Jason A. Donenfeld:
>
> > + pfd.fd = TEMP_FAILURE_RETRY (
> > + __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> > + if (pfd.fd < 0)
> > + arc4random_getrandom_failure ();
> > + if (__poll (&pfd, 1, -1) < 0)
> > + arc4random_getrandom_failure ();
> > + if (__close_nocancel (pfd.fd) < 0)
> > + arc4random_getrandom_failure ();
>
> What happens if /dev/random is actually /dev/urandom? Will the poll
> call fail?

Yes. I'm unsure if you're asking this because it'd be a nice
simplification to only have to open one fd, or because you're worried
about confusion. I don't think the confusion problem is one we should
take too seriously, but if you're concerned, we can always fstat and
check the maj/min. Seems a bit much, though.

> I think we need a no-cancel variant of poll here, and we also need to
> handle EINTR gracefully.

Thanks for the note about poll nocancel. I'll try to add this. I don't
totally know how to manage that plumbing, but I'll give it my best shot.

> Performance-wise, my 1000 element shuffle benchmark runs about 14 times
> slower without userspace buffering. (For comparison, just removing
> ChaCha20 while keeping a 256-byte buffer makes it run roughly 25% slower
> than current master.) Our random() implementation is quite slow, so
> arc4random() as a replacement call is competitive. The unbuffered
> version, not so much.

Yes, as mentioned, this is slower. But let's get something down first
that's *correct*, and then after we can start optimizing it. Let's not
prematurely optimize and create a problematic function that nobody
should use.

> Running the benchmark, I see 40% of the time spent in chacha_permute in
> the kernel, that is really quite odd. Why doesn't the system call
> overhead dominate?

Huh, that is interesting. I guess if you're reading 4 bytes for an
integer, it winds up computing a whole chacha block each time, with half
of it doing fast key erasure and half of it being returnable to the
caller. When we later figure out a safer way to buffer, ostensibly this
will go away. But for now, we really should not prematurely optimize.

I'll have v3 out shortly with your suggested fixes.

Jason

2022-07-26 11:11:23

by Florian Weimer

Subject: Re: [PATCH v2] arc4random: simplify design for better safety

* Jason A. Donenfeld:

> Not in current kernels, where it always returns at least PAGE_SIZE bytes
> before checking for pending signals. In older kernels, if there was a
> signal pending at the top, it would do no work and return -ERESTARTSYS,
> which I believe should then get restarted by glibc's syscaller?

glibc does not handle ERESTARTSYS, it's a kernel-internal error code
that's not exported in UAPI headers and must not leak to userspace
(except perhaps via ptrace). I believe restarts are handled in the
kernel signal code, by tweaking the program counter. Looking at that,
ERESTARTSYS gets translated to EINTR for !SA_RESTART system calls:

/* Are we from a system call? */
if (syscall_get_nr(current, regs) != -1) {
/* If so, check system call restarting.. */
switch (syscall_get_error(current, regs)) {
case -ERESTART_RESTARTBLOCK:
case -ERESTARTNOHAND:
regs->ax = -EINTR;
break;

case -ERESTARTSYS:
if (!(ksig->ka.sa.sa_flags & SA_RESTART)) {
regs->ax = -EINTR;
break;
}
fallthrough;
case -ERESTARTNOINTR:
regs->ax = regs->orig_ax;
regs->ip -= 2;
break;
}
}

(arch/x86/kernel/signal.c)
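
The userspace-visible effect of that switch can be sketched with a small experiment. The helper names below (on_alarm, read_across_signal) are made up for illustration. The program blocks in read() on an empty pipe, lets SIGALRM interrupt it, and has the handler write one byte to the pipe. With SA_RESTART the restarted read() then picks up that byte; without it, the caller sees EINTR.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static int wake_fd = -1;   /* Write end of the pipe, used by the handler.  */

static void
on_alarm (int sig)
{
  (void) sig;
  /* write() is async-signal-safe.  The byte is only consumed if the
     interrupted read() is restarted after the handler returns.  */
  ssize_t ignored = write (wake_fd, "x", 1);
  (void) ignored;
}

/* Block in read() on an empty pipe and let SIGALRM interrupt it.
   SA_FLAGS is 0 or SA_RESTART.  Returns what read() reported: 1 if the
   kernel restarted the call, -1 with errno == EINTR if it did not, or
   -2 if the setup itself failed.  */
static ssize_t
read_across_signal (int sa_flags)
{
  int p[2];
  char c;
  struct sigaction sa;
  struct itimerval tv;

  assert (sa_flags == 0 || sa_flags == SA_RESTART);
  if (pipe (p) != 0)
    return -2;
  wake_fd = p[1];

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = on_alarm;
  sa.sa_flags = sa_flags;
  sigemptyset (&sa.sa_mask);
  if (sigaction (SIGALRM, &sa, NULL) != 0)
    return -2;

  memset (&tv, 0, sizeof tv);
  tv.it_value.tv_usec = 100000;   /* Fire once, 100 ms from now.  */
  if (setitimer (ITIMER_REAL, &tv, NULL) != 0)
    return -2;

  ssize_t r = read (p[0], &c, 1);
  int saved_errno = errno;
  close (p[0]);
  close (p[1]);
  errno = saved_errno;
  return r;
}
```

Calling read_across_signal (SA_RESTART) and read_across_signal (0) in turn shows the two outcomes. Note that -ERESTARTSYS itself is never visible to the caller in either case.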

Thanks,
Florian

2022-07-26 11:15:36

by Florian Weimer

[permalink] [raw]
Subject: Re: [PATCH v2] arc4random: simplify design for better safety

* Jason A. Donenfeld:

> Hi Florian,
>
> On Tue, Jul 26, 2022 at 11:55:23AM +0200, Florian Weimer wrote:
>> * Jason A. Donenfeld:
>>
>> > + pfd.fd = TEMP_FAILURE_RETRY (
>> > + __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
>> > + if (pfd.fd < 0)
>> > + arc4random_getrandom_failure ();
>> > + if (__poll (&pfd, 1, -1) < 0)
>> > + arc4random_getrandom_failure ();
>> > + if (__close_nocancel (pfd.fd) < 0)
>> > + arc4random_getrandom_failure ();
>>
>> What happens if /dev/random is actually /dev/urandom? Will the poll
>> call fail?
>
> Yes. I'm unsure if you're asking this because it'd be a nice
> simplification to only have to open one fd, or because you're worried
> about confusion. I don't think the confusion problem is one we should
> take too seriously, but if you're concerned, we can always fstat and
> check the maj/min. Seems a bit much, though.

Turning /dev/random into /dev/urandom (e.g. with a symbolic link) used
to be the only way to get some applications working because they tried
to read from /dev/random at a higher rate than the system was estimating
entropy coming in. We may have to do something differently here if the
failing poll causes too much breakage.

>> Running the benchmark, I see 40% of the time spent in chacha_permute in
>> the kernel, that is really quite odd. Why doesn't the system call
>> overhead dominate?
>
> Huh, that is interesting. I guess if you're reading 4 bytes for an
> integer, it winds up computing a whole chacha block each time, with half
> of it doing fast key erasure and half of it being returnable to the
> caller. When we later figure out a safer way to buffer, ostensibly this
> will go away. But for now, we really should not prematurely optimize.

Yeah, I can't really argue against that, given that I said before that I
wasn't too worried about the implementation.

Thanks,
Florian

2022-07-26 11:31:08

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH v2] arc4random: simplify design for better safety

Hey Florian,

On Tue, Jul 26, 2022 at 01:12:28PM +0200, Florian Weimer wrote:
> >> What happens if /dev/random is actually /dev/urandom? Will the poll
> >> call fail?
> >
> > Yes. I'm unsure if you're asking this because it'd be a nice
> > simplification to only have to open one fd, or because you're worried
> > about confusion. I don't think the confusion problem is one we should
> > take too seriously, but if you're concerned, we can always fstat and
> > check the maj/min. Seems a bit much, though.
>
> Turning /dev/random into /dev/urandom (e.g. with a symbolic link) used
> to be the only way to get some applications working because they tried
> to read from /dev/random at a higher rate than the system was estimating
> entropy coming in. We may have to do something differently here if the
> failing poll causes too much breakage.

The "backup plan" would be to sleep-loop-read /proc/sys/kernel/random/entropy_avail
until it passes a certain threshold one time. This might also work on even older
kernels than the poll() trick. But that's pretty darn ugly, so it's not
obvious to me where the cut-off in frustration is, when we throw our
hands up and decide the ugliness is worth it compared to whatever
problems we happen to be facing at the time with the poll() technique.
But at least there is an alternative, should we need it.
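
A rough sketch of what such a sleep-loop-read could look like is below. The helper name, the 100 ms sleep interval, the retry limit, and the threshold are all placeholders rather than anything proposed for the patch, and real libc code would use raw nocancel open/read calls rather than stdio.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: poll PATH (normally
   "/proc/sys/kernel/random/entropy_avail") until the value read is at
   least THRESHOLD, sleeping 100 ms between attempts.  Gives up after
   MAX_TRIES attempts.  Returns 0 on success, -1 on timeout or if the
   file cannot be opened or parsed.  */
static int
wait_for_entropy (const char *path, long threshold, int max_tries)
{
  assert (path != NULL);
  for (int i = 0; i < max_tries; i++)
    {
      long avail;
      FILE *f = fopen (path, "r");
      if (f == NULL)
        return -1;
      int parsed = fscanf (f, "%ld", &avail);
      fclose (f);
      if (parsed != 1)
        return -1;
      if (avail >= threshold)
        return 0;   /* Threshold passed once; that is all we need.  */

      struct timespec ts = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };
      nanosleep (&ts, NULL);
    }
  return -1;
}
```

Taking the path as a parameter keeps the loop testable against an ordinary file.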

Jason

2022-07-26 11:36:45

by Adhemerval Zanella

[permalink] [raw]
Subject: Re: [PATCH v2] arc4random: simplify design for better safety



On 25/07/22 20:28, Jason A. Donenfeld wrote:
> Rather than buffering 16 MiB of entropy in userspace (by way of
> chacha20), simply call getrandom() every time.
>
> This approach is doubtlessly slower, for now, but trying to prematurely
> optimize arc4random appears to be leading toward all sorts of nasty
> properties and gotchas. Instead, this patch takes a much more
> conservative approach. The interface is added as a basic loop wrapper
> around getrandom(), and then later, the kernel and libc together can
> work together on optimizing that.
>
> This prevents numerous issues in which userspace is unaware of when it
> really must throw away its buffer, since we avoid buffering all
> together. Future improvements may include userspace learning more from
> the kernel about when to do that, which might make these sorts of
> chacha20-based optimizations more possible. The current heuristic of 16
> MiB is meaningless garbage that doesn't correspond to anything the
> kernel might know about. So for now, let's just do something
> conservative that we know is correct and won't lead to cryptographic
> issues for users of this function.
>
> This patch might be considered along the lines of, "optimization is the
> root of all evil," in that the much more complex implementation it
> replaces moves too fast without considering security implications,
> whereas the incremental approach done here is a much safer way of going
> about things. Once this lands, we can take our time in optimizing this
> properly using new interplay between the kernel and userspace.
>
> getrandom(0) is used, since that's the one that ensures the bytes
> returned are cryptographically secure. But on systems without it, we
> fallback to using /dev/urandom. This is unfortunate because it means
> opening a file descriptor, but there's not much of a choice. Secondly,
> as part of the fallback, in order to get more or less the same
> properties of getrandom(0), we poll on /dev/random, and if the poll
> succeeds at least once, then we assume the RNG is initialized. This is a
> rough approximation, as the ancient "non-blocking pool" initialized
> after the "blocking pool", not before, but it's the best approximation
> we can do.
>
> The motivation for including arc4random, in the first place, is to have
> source-level compatibility with existing code. That means this patch
> doesn't attempt to litigate the interface itself. It does, however,
> choose a conservative approach for implementing it.
>
> Cc: Adhemerval Zanella Netto <[email protected]>
> Cc: Florian Weimer <[email protected]>
> Cc: Cristian Rodríguez <[email protected]>
> Cc: Paul Eggert <[email protected]>
> Cc: [email protected]
> Signed-off-by: Jason A. Donenfeld <[email protected]>

There are some missing pieces, like the sysdeps/unix/sysv/linux/tls-internal.h comment
and the sysdeps/generic/tls-internal-struct.h generic piece (it is used on the Hurd build).
Maybe also change the NEWS to state this is not a CSPRNG, and we definitely need
to update the manual. Some comments below.


> ---
> LICENSES | 23 -
> include/stdlib.h | 3 -
> stdlib/Makefile | 2 -
> stdlib/arc4random.c | 204 ++-----
> stdlib/arc4random.h | 48 --
> stdlib/chacha20.c | 191 ------
> stdlib/tst-arc4random-chacha20.c | 167 -----
> sysdeps/aarch64/Makefile | 4 -
> sysdeps/aarch64/chacha20-aarch64.S | 314 ----------
> sysdeps/aarch64/chacha20_arch.h | 40 --
> sysdeps/generic/chacha20_arch.h | 24 -
> sysdeps/generic/tls-internal.c | 10 -
> sysdeps/mach/hurd/_Fork.c | 2 -
> sysdeps/nptl/_Fork.c | 2 -
> .../powerpc/powerpc64/be/multiarch/Makefile | 4 -
> .../powerpc64/be/multiarch/chacha20-ppc.c | 1 -
> .../powerpc64/be/multiarch/chacha20_arch.h | 42 --
> sysdeps/powerpc/powerpc64/power8/Makefile | 5 -
> .../powerpc/powerpc64/power8/chacha20-ppc.c | 256 --------
> .../powerpc/powerpc64/power8/chacha20_arch.h | 37 --
> sysdeps/s390/s390-64/Makefile | 6 -
> sysdeps/s390/s390-64/chacha20-s390x.S | 573 ------------------
> sysdeps/s390/s390-64/chacha20_arch.h | 45 --
> sysdeps/unix/sysv/linux/tls-internal.c | 10 -
> sysdeps/x86_64/Makefile | 7 -
> sysdeps/x86_64/chacha20-amd64-avx2.S | 328 ----------
> sysdeps/x86_64/chacha20-amd64-sse2.S | 311 ----------
> sysdeps/x86_64/chacha20_arch.h | 55 --
> 28 files changed, 53 insertions(+), 2661 deletions(-)
> delete mode 100644 stdlib/arc4random.h
> delete mode 100644 stdlib/chacha20.c
> delete mode 100644 stdlib/tst-arc4random-chacha20.c
> delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S
> delete mode 100644 sysdeps/aarch64/chacha20_arch.h
> delete mode 100644 sysdeps/generic/chacha20_arch.h
> delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile
> delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
> delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
> delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
> delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
> delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S
> delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h
> delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S
> delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S
> delete mode 100644 sysdeps/x86_64/chacha20_arch.h
>
> diff --git a/LICENSES b/LICENSES
> index cd04fb6e84..530893b1dc 100644
> --- a/LICENSES
> +++ b/LICENSES
> @@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <[email protected]>
> You should have received a copy of the GNU Lesser General Public
> License along with this library; if not, see
> <https://www.gnu.org/licenses/>. */
> -
> -sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S,
> -sysdeps/x86_64/chacha20-amd64-avx2.S, and
> -sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and
> -sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt,
> -with the following notices:
> -
> -Copyright (C) 2017-2019 Jussi Kivilinna <[email protected]>
> -
> -This file is part of Libgcrypt.
> -
> -Libgcrypt is free software; you can redistribute it and/or modify
> -it under the terms of the GNU Lesser General Public License as
> -published by the Free Software Foundation; either version 2.1 of
> -the License, or (at your option) any later version.
> -
> -Libgcrypt is distributed in the hope that it will be useful,
> -but WITHOUT ANY WARRANTY; without even the implied warranty of
> -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> -GNU Lesser General Public License for more details.
> -
> -You should have received a copy of the GNU Lesser General Public
> -License along with this program; if not, see <https://www.gnu.org/licenses/>.
> diff --git a/include/stdlib.h b/include/stdlib.h
> index cae7f7cdf8..db51f4a4f6 100644
> --- a/include/stdlib.h
> +++ b/include/stdlib.h
> @@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform;
> libc_hidden_proto (__arc4random_uniform);
> extern void __arc4random_buf_internal (void *buffer, size_t len)
> attribute_hidden;
> -/* Called from the fork function to reinitialize the internal cipher state
> - in child process. */
> -extern void __arc4random_fork_subprocess (void) attribute_hidden;
>
> extern double __strtod_internal (const char *__restrict __nptr,
> char **__restrict __endptr, int __group)
> diff --git a/stdlib/Makefile b/stdlib/Makefile
> index a900962685..f7b25c1981 100644
> --- a/stdlib/Makefile
> +++ b/stdlib/Makefile
> @@ -246,7 +246,6 @@ tests := \
> # tests
>
> tests-internal := \
> - tst-arc4random-chacha20 \
> tst-strtod1i \
> tst-strtod3 \
> tst-strtod4 \
> @@ -256,7 +255,6 @@ tests-internal := \
> # tests-internal
>
> tests-static := \
> - tst-arc4random-chacha20 \
> tst-secure-getenv \
> # tests-static
>
> diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
> index 65547e79aa..80c55cde63 100644
> --- a/stdlib/arc4random.c
> +++ b/stdlib/arc4random.c
> @@ -1,4 +1,4 @@
> -/* Pseudo Random Number Generator based on ChaCha20.
> +/* Pseudo Random Number Generator
> Copyright (C) 2022 Free Software Foundation, Inc.
> This file is part of the GNU C Library.
>
> @@ -16,61 +16,14 @@
> License along with the GNU C Library; if not, see
> <https://www.gnu.org/licenses/>. */
>
> -#include <arc4random.h>
> #include <errno.h>
> #include <not-cancel.h>
> #include <stdio.h>
> #include <stdlib.h>
> +#include <sys/poll.h>
> #include <sys/mman.h>
> #include <sys/param.h>
> #include <sys/random.h>
> -#include <tls-internal.h>
> -
> -/* arc4random keeps two counters: 'have' is the current valid bytes not yet
> - consumed in 'buf' while 'count' is the maximum number of bytes until a
> - reseed.
> -
> - Both the initial seed and reseed try to obtain entropy from the kernel
> - and abort the process if none could be obtained.
> -
> - The state 'buf' improves the usage of the cipher calls, allowing to call
> - optimized implementations (if the architecture provides it) and minimize
> - function call overhead. */
> -
> -#include <chacha20.c>
> -
> -/* Called from the fork function to reset the state. */
> -void
> -__arc4random_fork_subprocess (void)
> -{
> - struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
> - if (state != NULL)
> - {
> - explicit_bzero (state, sizeof (*state));
> - /* Force key init. */
> - state->count = -1;
> - }
> -}
> -
> -/* Return the current thread random state or try to create one if there is
> - none available. In the case malloc can not allocate a state, arc4random
> - will try to get entropy with arc4random_getentropy. */
> -static struct arc4random_state_t *
> -arc4random_get_state (void)
> -{
> - struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
> - if (state == NULL)
> - {
> - state = malloc (sizeof (struct arc4random_state_t));
> - if (state != NULL)
> - {
> - /* Force key initialization on first call. */
> - state->count = -1;
> - __glibc_tls_internal ()->rand_state = state;
> - }
> - }
> - return state;
> -}
>
> static void
> arc4random_getrandom_failure (void)
> @@ -78,106 +31,70 @@ arc4random_getrandom_failure (void)
> __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n");
> }
>
> -static void
> -arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen)
> +void
> +__arc4random_buf (void *p, size_t n)
> {
> - chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
> + static bool have_getrandom = true, seen_initialized = false;
> + int fd;

I think it should be reasonable to assume that the getrandom syscall will always
be supported, and using arc4random in an environment with getrandom filtered out
does not make much sense. We are trying to avoid adding these static syscall
checks where possible. Also, a plain load/store to set the static have_getrandom
is strictly a race condition, although it should not really matter (we use
relaxed load/store in such optimizations; check
sysdeps/unix/sysv/linux/mips/mips64/getdents64.c).

Also, does it make sense to fall back if we build for a kernel that should
always support getrandom?

>
> - /* Mix optional user provided data. */
> - if (rnd != NULL)
> - {
> - size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> - for (size_t i = 0; i < m; i++)
> - state->buf[i] ^= rnd[i];
> - }
> -
> - /* Immediately reinit for backtracking resistance. */
> - chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
> - explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> - state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> -}
> -
> -static void
> -arc4random_getentropy (void *rnd, size_t len)
> -{
> - if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len)
> + if (n == 0)
> return;
>
> - int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom",
> - O_RDONLY | O_CLOEXEC));
> - if (fd != -1)
> + for (;;)
> {
> - uint8_t *p = rnd;
> - uint8_t *end = p + len;
> - do
> - {
> - ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
> - if (ret <= 0)
> - arc4random_getrandom_failure ();
> - p += ret;
> - }
> - while (p < end);
> + ssize_t l;
>
> - if (__close_nocancel (fd) == 0)
> - return;
> - }
> - arc4random_getrandom_failure ();
> -}
> + if (!have_getrandom)
> + break;
>
> -/* Check if the thread context STATE should be reseed with kernel entropy
> - depending of requested LEN bytes. If there is less than requested,
> - the state is either initialized or reseeded, otherwise the internal
> - counter subtract the requested length. */
> -static void
> -arc4random_check_stir (struct arc4random_state_t *state, size_t len)
> -{
> - if (state->count <= len || state->count == -1)
> - {
> - uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
> - arc4random_getentropy (rnd, sizeof rnd);
> -
> - if (state->count == -1)
> - chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);
> - else
> - arc4random_rekey (state, rnd, sizeof rnd);
> -
> - explicit_bzero (rnd, sizeof rnd);
> -
> - /* Invalidate the buf. */
> - state->have = 0;
> - memset (state->buf, 0, sizeof state->buf);
> - state->count = CHACHA20_RESEED_SIZE;
> + l = __getrandom_nocancel (p, n, 0);

Do we need to worry about a potentially uncancellable blocking call here? I guess
using GRND_NONBLOCK does not really help.
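
For context on why GRND_NONBLOCK does not help much: with that flag, getrandom() fails with EAGAIN instead of blocking while the kernel RNG is still uninitialized, which leaves the caller with no bytes to return. A minimal sketch using the public getrandom() wrapper (not the internal __getrandom_nocancel; fill_random_nonblock is a hypothetical name) might look like:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper: fill BUF with LEN random bytes without ever
   blocking.  Returns 0 on success, or -1 with errno set; errno == EAGAIN
   means the kernel RNG has not been initialized yet.  */
static int
fill_random_nonblock (void *buf, size_t len)
{
  unsigned char *p = buf;

  assert (buf != NULL || len == 0);
  while (len > 0)
    {
      ssize_t l = getrandom (p, len, GRND_NONBLOCK);
      if (l < 0)
        {
          if (errno == EINTR)
            continue;   /* Interrupted before any bytes; retry.  */
          return -1;    /* EAGAIN, ENOSYS, ...: caller must decide.  */
        }
      if (l == 0)
        {
          errno = EIO;  /* Should not happen for LEN > 0.  */
          return -1;
        }
      p += l;
      len -= (size_t) l;
    }
  return 0;
}
```

The EAGAIN path is the awkward one: arc4random has no way to report failure, so the only options there would be to abort or to wait anyway.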

> + if (l > 0)
> + {
> + if ((size_t) l == n)

Do we need the cast here?

> + return; /* Done reading, success. */

Minor style issue: use a double space after the period.

> + p = (uint8_t *) p + l;
> + n -= l;
> + continue; /* Interrupted by a signal; keep going. */
> + }
> + else if (l == 0)
> + arc4random_getrandom_failure (); /* Weird, should never happen. */
> + else if (errno == ENOSYS)
> + {
> + have_getrandom = false;
> + break; /* No syscall, so fallback to /dev/urandom. */
> + }
> + arc4random_getrandom_failure (); /* Unknown error, should never happen. */
> }
> - else
> - state->count -= len;
> -}
>
> -void
> -__arc4random_buf (void *buffer, size_t len)
> -{
> - struct arc4random_state_t *state = arc4random_get_state ();
> - if (__glibc_unlikely (state == NULL))
> + if (!seen_initialized)
> {
> - arc4random_getentropy (buffer, len);
> - return;
> + struct pollfd pfd = { .events = POLLIN };
> + pfd.fd = TEMP_FAILURE_RETRY (
> + __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> + if (pfd.fd < 0)
> + arc4random_getrandom_failure ();
> + if (__poll (&pfd, 1, -1) < 0)
> + arc4random_getrandom_failure ();

As Florian said, we will need a non-cancellable poll here. Since you are passing
an infinite timeout, I think it would be simplest to just add a non-cancellable
wrapper like:

int __ppoll_noncancel_notimeout (struct pollfd *fds, nfds_t nfds)
{
#ifndef __NR_ppoll_time64
# define __NR_ppoll_time64 __NR_ppoll
#endif
return INLINE_SYSCALL_CALL (__NR_ppoll_time64, fds, nfds, NULL, NULL, 0);
}

So we don't need to handle the timeout for 64-bit time_t wrappers.

> + if (__close_nocancel (pfd.fd) < 0)
> + arc4random_getrandom_failure ();
> + seen_initialized = true;

I think we will need to use relaxed atomics, and maybe set the type to int (not sure
if the atomic wrappers work correctly on bool types on all architectures).
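
As a sketch of the idea, here is a one-way flag with relaxed accesses, written with C11 <stdatomic.h> as a stand-in for glibc's internal atomic macros. The function and variable names are hypothetical. Relaxed ordering is enough here only because no other data is published through the flag; at worst a thread repeats the poll once.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumption of this sketch: int atomics are always lock-free on the
   target, which is why int is used instead of bool.  */
static_assert (ATOMIC_INT_LOCK_FREE == 2, "int atomics must be lock-free");

/* 0 until the RNG has been seen initialized once, then 1 forever.  */
static atomic_int seen_initialized_flag = 0;

/* Return nonzero if some thread already observed an initialized RNG.  */
static int
rng_seen_initialized (void)
{
  return atomic_load_explicit (&seen_initialized_flag,
                               memory_order_relaxed);
}

/* Record that the RNG was observed initialized.  Safe to call from
   several threads at once, since every caller stores the same value.  */
static void
mark_rng_initialized (void)
{
  atomic_store_explicit (&seen_initialized_flag, 1, memory_order_relaxed);
}
```

The same pattern would apply to have_getrandom, with the initial value and the stored value swapped.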

> }
>
> - arc4random_check_stir (state, len);
> - while (len > 0)
> + fd = TEMP_FAILURE_RETRY (
> + __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> + if (fd < 0)
> + arc4random_getrandom_failure ();
> + do
> {
> - if (state->have > 0)
> - {
> - size_t m = MIN (len, state->have);
> - uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
> - memcpy (buffer, ks, m);
> - explicit_bzero (ks, m);
> - buffer += m;
> - len -= m;
> - state->have -= m;
> - }
> - if (state->have == 0)
> - arc4random_rekey (state, NULL, 0);
> + ssize_t l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n));
> + if (l <= 0)
> + arc4random_getrandom_failure ();
> + p = (uint8_t *) p + l;
> + n -= l;
> }
> + while (n);
> + if (__close_nocancel (fd) < 0)
> + arc4random_getrandom_failure ();
> }
> libc_hidden_def (__arc4random_buf)
> weak_alias (__arc4random_buf, arc4random_buf)
> @@ -186,22 +103,7 @@ uint32_t
> __arc4random (void)
> {
> uint32_t r;
> -
> - struct arc4random_state_t *state = arc4random_get_state ();
> - if (__glibc_unlikely (state == NULL))
> - {
> - arc4random_getentropy (&r, sizeof (uint32_t));
> - return r;
> - }
> -
> - arc4random_check_stir (state, sizeof (uint32_t));
> - if (state->have < sizeof (uint32_t))
> - arc4random_rekey (state, NULL, 0);
> - uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
> - memcpy (&r, ks, sizeof (uint32_t));
> - memset (ks, 0, sizeof (uint32_t));
> - state->have -= sizeof (uint32_t);
> -
> + __arc4random_buf (&r, sizeof (r));
> return r;
> }
> libc_hidden_def (__arc4random)
> diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h
> deleted file mode 100644
> index cd39389c19..0000000000
> --- a/stdlib/arc4random.h
> +++ /dev/null
> @@ -1,48 +0,0 @@
> -/* Arc4random definition used on TLS.
> - Copyright (C) 2022 Free Software Foundation, Inc.
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -#ifndef _CHACHA20_H
> -#define _CHACHA20_H
> -
> -#include <stddef.h>
> -#include <stdint.h>
> -
> -/* Internal ChaCha20 state. */
> -#define CHACHA20_STATE_LEN 16
> -#define CHACHA20_BLOCK_SIZE 64
> -
> -/* Maximum number bytes until reseed (16 MB). */
> -#define CHACHA20_RESEED_SIZE (16 * 1024 * 1024)
> -
> -/* Internal arc4random buffer, used on each feedback step so offer some
> - backtracking protection and to allow better used of vectorized
> - chacha20 implementations. */
> -#define CHACHA20_BUFSIZE (8 * CHACHA20_BLOCK_SIZE)
> -
> -_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE,
> - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE");
> -
> -struct arc4random_state_t
> -{
> - uint32_t ctx[CHACHA20_STATE_LEN];
> - size_t have;
> - size_t count;
> - uint8_t buf[CHACHA20_BUFSIZE];
> -};
> -
> -#endif
> diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
> deleted file mode 100644
> index 2745a81315..0000000000
> --- a/stdlib/chacha20.c
> +++ /dev/null
> @@ -1,191 +0,0 @@
> -/* Generic ChaCha20 implementation (used on arc4random).
> - Copyright (C) 2022 Free Software Foundation, Inc.
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -#include <array_length.h>
> -#include <endian.h>
> -#include <stddef.h>
> -#include <stdint.h>
> -#include <string.h>
> -
> -/* 32-bit stream position, then 96-bit nonce. */
> -#define CHACHA20_IV_SIZE 16
> -#define CHACHA20_KEY_SIZE 32
> -
> -#define CHACHA20_STATE_LEN 16
> -
> -/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final
> - XOR of the keystream with the plaintext because the plaintext is a
> - stream of zeros. */
> -
> -enum chacha20_constants
> -{
> - CHACHA20_CONSTANT_EXPA = 0x61707865U,
> - CHACHA20_CONSTANT_ND_3 = 0x3320646eU,
> - CHACHA20_CONSTANT_2_BY = 0x79622d32U,
> - CHACHA20_CONSTANT_TE_K = 0x6b206574U
> -};
> -
> -static inline uint32_t
> -read_unaligned_32 (const uint8_t *p)
> -{
> - uint32_t r;
> - memcpy (&r, p, sizeof (r));
> - return r;
> -}
> -
> -static inline void
> -write_unaligned_32 (uint8_t *p, uint32_t v)
> -{
> - memcpy (p, &v, sizeof (v));
> -}
> -
> -#if __BYTE_ORDER == __BIG_ENDIAN
> -# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p))
> -# define set_state(v) __builtin_bswap32 ((v))
> -#else
> -# define read_unaligned_le32(p) read_unaligned_32 ((p))
> -# define set_state(v) (v)
> -#endif
> -
> -static inline void
> -chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv)
> -{
> - state[0] = CHACHA20_CONSTANT_EXPA;
> - state[1] = CHACHA20_CONSTANT_ND_3;
> - state[2] = CHACHA20_CONSTANT_2_BY;
> - state[3] = CHACHA20_CONSTANT_TE_K;
> -
> - state[4] = read_unaligned_le32 (key + 0 * sizeof (uint32_t));
> - state[5] = read_unaligned_le32 (key + 1 * sizeof (uint32_t));
> - state[6] = read_unaligned_le32 (key + 2 * sizeof (uint32_t));
> - state[7] = read_unaligned_le32 (key + 3 * sizeof (uint32_t));
> - state[8] = read_unaligned_le32 (key + 4 * sizeof (uint32_t));
> - state[9] = read_unaligned_le32 (key + 5 * sizeof (uint32_t));
> - state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t));
> - state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t));
> -
> - state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t));
> - state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t));
> - state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t));
> - state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t));
> -}
> -
> -static inline uint32_t
> -rotl32 (unsigned int shift, uint32_t word)
> -{
> - return (word << (shift & 31)) | (word >> ((-shift) & 31));
> -}
> -
> -static void
> -state_final (const uint8_t *src, uint8_t *dst, uint32_t v)
> -{
> -#ifdef CHACHA20_XOR_FINAL
> - v ^= read_unaligned_32 (src);
> -#endif
> - write_unaligned_32 (dst, v);
> -}
> -
> -static inline void
> -chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src)
> -{
> - uint32_t x0, x1, x2, x3, x4, x5, x6, x7;
> - uint32_t x8, x9, x10, x11, x12, x13, x14, x15;
> -
> - x0 = state[0];
> - x1 = state[1];
> - x2 = state[2];
> - x3 = state[3];
> - x4 = state[4];
> - x5 = state[5];
> - x6 = state[6];
> - x7 = state[7];
> - x8 = state[8];
> - x9 = state[9];
> - x10 = state[10];
> - x11 = state[11];
> - x12 = state[12];
> - x13 = state[13];
> - x14 = state[14];
> - x15 = state[15];
> -
> - for (int i = 0; i < 20; i += 2)
> - {
> -#define QROUND(_x0, _x1, _x2, _x3) \
> - do { \
> - _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); \
> - _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); \
> - _x0 = _x0 + _x1; _x3 = rotl32 (8, (_x0 ^ _x3)); \
> - _x2 = _x2 + _x3; _x1 = rotl32 (7, (_x1 ^ _x2)); \
> - } while(0)
> -
> - QROUND (x0, x4, x8, x12);
> - QROUND (x1, x5, x9, x13);
> - QROUND (x2, x6, x10, x14);
> - QROUND (x3, x7, x11, x15);
> -
> - QROUND (x0, x5, x10, x15);
> - QROUND (x1, x6, x11, x12);
> - QROUND (x2, x7, x8, x13);
> - QROUND (x3, x4, x9, x14);
> - }
> -
> - state_final (&src[0], &dst[0], set_state (x0 + state[0]));
> - state_final (&src[4], &dst[4], set_state (x1 + state[1]));
> - state_final (&src[8], &dst[8], set_state (x2 + state[2]));
> - state_final (&src[12], &dst[12], set_state (x3 + state[3]));
> - state_final (&src[16], &dst[16], set_state (x4 + state[4]));
> - state_final (&src[20], &dst[20], set_state (x5 + state[5]));
> - state_final (&src[24], &dst[24], set_state (x6 + state[6]));
> - state_final (&src[28], &dst[28], set_state (x7 + state[7]));
> - state_final (&src[32], &dst[32], set_state (x8 + state[8]));
> - state_final (&src[36], &dst[36], set_state (x9 + state[9]));
> - state_final (&src[40], &dst[40], set_state (x10 + state[10]));
> - state_final (&src[44], &dst[44], set_state (x11 + state[11]));
> - state_final (&src[48], &dst[48], set_state (x12 + state[12]));
> - state_final (&src[52], &dst[52], set_state (x13 + state[13]));
> - state_final (&src[56], &dst[56], set_state (x14 + state[14]));
> - state_final (&src[60], &dst[60], set_state (x15 + state[15]));
> -
> - state[12]++;
> -}
> -
> -static void
> -__attribute_maybe_unused__
> -chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src,
> - size_t bytes)
> -{
> - while (bytes >= CHACHA20_BLOCK_SIZE)
> - {
> - chacha20_block (state, dst, src);
> -
> - bytes -= CHACHA20_BLOCK_SIZE;
> - dst += CHACHA20_BLOCK_SIZE;
> - src += CHACHA20_BLOCK_SIZE;
> - }
> -
> - if (__glibc_unlikely (bytes != 0))
> - {
> - uint8_t stream[CHACHA20_BLOCK_SIZE];
> - chacha20_block (state, stream, src);
> - memcpy (dst, stream, bytes);
> - explicit_bzero (stream, sizeof stream);
> - }
> -}
> -
> -/* Get the architecture optimized version. */
> -#include <chacha20_arch.h>
> diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c
> deleted file mode 100644
> index 45ba54920d..0000000000
> --- a/stdlib/tst-arc4random-chacha20.c
> +++ /dev/null
> @@ -1,167 +0,0 @@
> -/* Basic tests for chacha20 cypher used in arc4random.
> - Copyright (C) 2022 Free Software Foundation, Inc.
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -#include <arc4random.h>
> -#include <support/check.h>
> -#include <sys/cdefs.h>
> -
> -/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random
> - actual does. */
> -#include <chacha20.c>
> -
> -static int
> -do_test (void)
> -{
> - const uint8_t key[CHACHA20_KEY_SIZE] =
> - {
> - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> - };
> - const uint8_t iv[CHACHA20_IV_SIZE] =
> - {
> - 0x0, 0x0, 0x0, 0x0,
> - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> - };
> - const uint8_t expected1[CHACHA20_BUFSIZE] =
> - {
> - 0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a,
> - 0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d,
> - 0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda,
> - 0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f,
> - 0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1,
> - 0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07,
> - 0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73,
> - 0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69,
> - 0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21,
> - 0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8,
> - 0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac,
> - 0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6,
> - 0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0,
> - 0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3,
> - 0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f,
> - 0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e,
> - 0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71,
> - 0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3,
> - 0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5,
> - 0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34,
> - 0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d,
> - 0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c,
> - 0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 0x6a, 0x1e, 0x6c, 0x31,
> - 0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad,
> - 0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0,
> - 0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72,
> - 0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16,
> - 0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9,
> - 0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a,
> - 0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc,
> - 0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2,
> - 0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb,
> - 0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10,
> - 0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b,
> - 0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2,
> - 0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17,
> - 0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca,
> - 0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84,
> - 0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3,
> - 0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1,
> - 0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7,
> - 0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9,
> - 0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3,
> - 0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e,
> - 0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb,
> - 0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8,
> - 0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb
> - };
> -
> - const uint8_t expected2[CHACHA20_BUFSIZE] =
> - {
> - 0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36,
> - 0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22,
> - 0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50,
> - 0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10,
> - 0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 0xb8, 0x94, 0xbb, 0xef,
> - 0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30,
> - 0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c,
> - 0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6,
> - 0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e,
> - 0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90,
> - 0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3,
> - 0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba,
> - 0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4,
> - 0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea,
> - 0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a,
> - 0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80,
> - 0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2,
> - 0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8,
> - 0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce,
> - 0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb,
> - 0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9,
> - 0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc,
> - 0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86,
> - 0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0,
> - 0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4,
> - 0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1,
> - 0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4,
> - 0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f,
> - 0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57,
> - 0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97,
> - 0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef,
> - 0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13,
> - 0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65,
> - 0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30,
> - 0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6,
> - 0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee,
> - 0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c,
> - 0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66,
> - 0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a,
> - 0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c,
> - 0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65,
> - 0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b,
> - 0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88,
> - 0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1,
> - 0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0,
> - 0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36,
> - 0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18
> - };
> -
> - /* Check with the expected internal arc4random keystream buffer. Some
> - architecture optimizations expects a buffer with a minimum size which
> - is a multiple of then ChaCha20 blocksize, so they might not be prepared
> - to handle smaller buffers. */
> -
> - uint8_t output[CHACHA20_BUFSIZE];
> -
> - uint32_t state[CHACHA20_STATE_LEN];
> - chacha20_init (state, key, iv);
> -
> - /* Check with the initial state. */
> - uint8_t input[CHACHA20_BUFSIZE] = { 0 };
> -
> - chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
> - TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE);
> -
> - /* And on the next round. */
> - chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
> - TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE);
> -
> - return 0;
> -}
> -
> -#include <support/test-driver.c>
> diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
> index 7dfd1b62dd..17fb1c5b72 100644
> --- a/sysdeps/aarch64/Makefile
> +++ b/sysdeps/aarch64/Makefile
> @@ -51,10 +51,6 @@ ifeq ($(subdir),csu)
> gen-as-const-headers += tlsdesc.sym
> endif
>
> -ifeq ($(subdir),stdlib)
> -sysdep_routines += chacha20-aarch64
> -endif
> -
> ifeq ($(subdir),gmon)
> CFLAGS-mcount.c += -mgeneral-regs-only
> endif
> diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S
> deleted file mode 100644
> index cce5291c5c..0000000000
> --- a/sysdeps/aarch64/chacha20-aarch64.S
> +++ /dev/null
> @@ -1,314 +0,0 @@
> -/* Optimized AArch64 implementation of ChaCha20 cipher.
> - Copyright (C) 2022 Free Software Foundation, Inc.
> -
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -/* Copyright (C) 2017-2019 Jussi Kivilinna <[email protected]>
> -
> - This file is part of Libgcrypt.
> -
> - Libgcrypt is free software; you can redistribute it and/or modify
> - it under the terms of the GNU Lesser General Public License as
> - published by the Free Software Foundation; either version 2.1 of
> - the License, or (at your option) any later version.
> -
> - Libgcrypt is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> - GNU Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with this program; if not, see <https://www.gnu.org/licenses/>.
> - */
> -
> -/* Based on D. J. Bernstein reference implementation at
> - http://cr.yp.to/chacha.html:
> -
> - chacha-regs.c version 20080118
> - D. J. Bernstein
> - Public domain. */
> -
> -#include <sysdep.h>
> -
> -/* Only LE is supported. */
> -#ifdef __AARCH64EL__
> -
> -#define GET_DATA_POINTER(reg, name) \
> - adrp reg, name ; \
> - add reg, reg, :lo12:name
> -
> -/* 'ret' instruction replacement for straight-line speculation mitigation */
> -#define ret_spec_stop \
> - ret; dsb sy; isb;
> -
> -.cpu generic+simd
> -
> -.text
> -
> -/* register macros */
> -#define INPUT x0
> -#define DST x1
> -#define SRC x2
> -#define NBLKS x3
> -#define ROUND x4
> -#define INPUT_CTR x5
> -#define INPUT_POS x6
> -#define CTR x7
> -
> -/* vector registers */
> -#define X0 v16
> -#define X4 v17
> -#define X8 v18
> -#define X12 v19
> -
> -#define X1 v20
> -#define X5 v21
> -
> -#define X9 v22
> -#define X13 v23
> -#define X2 v24
> -#define X6 v25
> -
> -#define X3 v26
> -#define X7 v27
> -#define X11 v28
> -#define X15 v29
> -
> -#define X10 v30
> -#define X14 v31
> -
> -#define VCTR v0
> -#define VTMP0 v1
> -#define VTMP1 v2
> -#define VTMP2 v3
> -#define VTMP3 v4
> -#define X12_TMP v5
> -#define X13_TMP v6
> -#define ROT8 v7
> -
> -/**********************************************************************
> - helper macros
> - **********************************************************************/
> -
> -#define _(...) __VA_ARGS__
> -
> -#define vpunpckldq(s1, s2, dst) \
> - zip1 dst.4s, s2.4s, s1.4s;
> -
> -#define vpunpckhdq(s1, s2, dst) \
> - zip2 dst.4s, s2.4s, s1.4s;
> -
> -#define vpunpcklqdq(s1, s2, dst) \
> - zip1 dst.2d, s2.2d, s1.2d;
> -
> -#define vpunpckhqdq(s1, s2, dst) \
> - zip2 dst.2d, s2.2d, s1.2d;
> -
> -/* 4x4 32-bit integer matrix transpose */
> -#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \
> - vpunpckhdq(x1, x0, t2); \
> - vpunpckldq(x1, x0, x0); \
> - \
> - vpunpckldq(x3, x2, t1); \
> - vpunpckhdq(x3, x2, x2); \
> - \
> - vpunpckhqdq(t1, x0, x1); \
> - vpunpcklqdq(t1, x0, x0); \
> - \
> - vpunpckhqdq(x2, t2, x3); \
> - vpunpcklqdq(x2, t2, x2);
> -
> -/**********************************************************************
> - 4-way chacha20
> - **********************************************************************/
> -
> -#define XOR(d,s1,s2) \
> - eor d.16b, s2.16b, s1.16b;
> -
> -#define PLUS(ds,s) \
> - add ds.4s, ds.4s, s.4s;
> -
> -#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \
> - shl dst1.4s, src1.4s, #(c); \
> - shl dst2.4s, src2.4s, #(c); \
> - shl dst3.4s, src3.4s, #(c); \
> - shl dst4.4s, src4.4s, #(c); \
> - sri dst1.4s, src1.4s, #(32 - (c)); \
> - sri dst2.4s, src2.4s, #(32 - (c)); \
> - sri dst3.4s, src3.4s, #(32 - (c)); \
> - sri dst4.4s, src4.4s, #(32 - (c));
> -
> -#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
> - tbl dst1.16b, {src1.16b}, ROT8.16b; \
> - tbl dst2.16b, {src2.16b}, ROT8.16b; \
> - tbl dst3.16b, {src3.16b}, ROT8.16b; \
> - tbl dst4.16b, {src4.16b}, ROT8.16b;
> -
> -#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
> - rev32 dst1.8h, src1.8h; \
> - rev32 dst2.8h, src2.8h; \
> - rev32 dst3.8h, src3.8h; \
> - rev32 dst4.8h, src4.8h;
> -
> -#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \
> - PLUS(a1,b1); PLUS(a2,b2); \
> - PLUS(a3,b3); PLUS(a4,b4); \
> - XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \
> - XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \
> - ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4); \
> - PLUS(c1,d1); PLUS(c2,d2); \
> - PLUS(c3,d3); PLUS(c4,d4); \
> - XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \
> - XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \
> - ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4) \
> - PLUS(a1,b1); PLUS(a2,b2); \
> - PLUS(a3,b3); PLUS(a4,b4); \
> - XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \
> - XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \
> - ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4) \
> - PLUS(c1,d1); PLUS(c2,d2); \
> - PLUS(c3,d3); PLUS(c4,d4); \
> - XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \
> - XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \
> - ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4) \
> -
> -.align 4
> -L(__chacha20_blocks4_data_inc_counter):
> - .long 0,1,2,3
> -
> -.align 4
> -L(__chacha20_blocks4_data_rot8):
> - .byte 3,0,1,2
> - .byte 7,4,5,6
> - .byte 11,8,9,10
> - .byte 15,12,13,14
> -
> -.hidden __chacha20_neon_blocks4
> -ENTRY (__chacha20_neon_blocks4)
> - /* input:
> - * x0: input
> - * x1: dst
> - * x2: src
> - * x3: nblks (multiple of 4)
> - */
> -
> - GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8))
> - add INPUT_CTR, INPUT, #(12*4);
> - ld1 {ROT8.16b}, [CTR];
> - GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter))
> - mov INPUT_POS, INPUT;
> - ld1 {VCTR.16b}, [CTR];
> -
> -L(loop4):
> - /* Construct counter vectors X12 and X13 */
> -
> - ld1 {X15.16b}, [INPUT_CTR];
> - mov ROUND, #20;
> - ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS];
> -
> - dup X12.4s, X15.s[0];
> - dup X13.4s, X15.s[1];
> - ldr CTR, [INPUT_CTR];
> - add X12.4s, X12.4s, VCTR.4s;
> - dup X0.4s, VTMP1.s[0];
> - dup X1.4s, VTMP1.s[1];
> - dup X2.4s, VTMP1.s[2];
> - dup X3.4s, VTMP1.s[3];
> - dup X14.4s, X15.s[2];
> - cmhi VTMP0.4s, VCTR.4s, X12.4s;
> - dup X15.4s, X15.s[3];
> - add CTR, CTR, #4; /* Update counter */
> - dup X4.4s, VTMP2.s[0];
> - dup X5.4s, VTMP2.s[1];
> - dup X6.4s, VTMP2.s[2];
> - dup X7.4s, VTMP2.s[3];
> - sub X13.4s, X13.4s, VTMP0.4s;
> - dup X8.4s, VTMP3.s[0];
> - dup X9.4s, VTMP3.s[1];
> - dup X10.4s, VTMP3.s[2];
> - dup X11.4s, VTMP3.s[3];
> - mov X12_TMP.16b, X12.16b;
> - mov X13_TMP.16b, X13.16b;
> - str CTR, [INPUT_CTR];
> -
> -L(round2):
> - subs ROUND, ROUND, #2
> - QUARTERROUND4(X0, X4, X8, X12, X1, X5, X9, X13,
> - X2, X6, X10, X14, X3, X7, X11, X15,
> - tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
> - QUARTERROUND4(X0, X5, X10, X15, X1, X6, X11, X12,
> - X2, X7, X8, X13, X3, X4, X9, X14,
> - tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
> - b.ne L(round2);
> -
> - ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32;
> -
> - PLUS(X12, X12_TMP); /* INPUT + 12 * 4 + counter */
> - PLUS(X13, X13_TMP); /* INPUT + 13 * 4 + counter */
> -
> - dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */
> - dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */
> - dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */
> - dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */
> - PLUS(X0, VTMP2);
> - PLUS(X1, VTMP3);
> - PLUS(X2, X12_TMP);
> - PLUS(X3, X13_TMP);
> -
> - dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */
> - dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */
> - dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */
> - dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */
> - ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS];
> - mov INPUT_POS, INPUT;
> - PLUS(X4, VTMP2);
> - PLUS(X5, VTMP3);
> - PLUS(X6, X12_TMP);
> - PLUS(X7, X13_TMP);
> -
> - dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */
> - dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */
> - dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */
> - dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */
> - dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */
> - dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */
> - PLUS(X8, VTMP2);
> - PLUS(X9, VTMP3);
> - PLUS(X10, X12_TMP);
> - PLUS(X11, X13_TMP);
> - PLUS(X14, VTMP0);
> - PLUS(X15, VTMP1);
> -
> - transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2);
> - transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2);
> - transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2);
> - transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2);
> -
> - subs NBLKS, NBLKS, #4;
> -
> - st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64
> - st1 {X1.16b,X5.16b}, [DST], #32;
> - st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64
> - st1 {X10.16b,X14.16b}, [DST], #32;
> - st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64;
> -
> - b.ne L(loop4);
> -
> - ret_spec_stop
> -END (__chacha20_neon_blocks4)
> -
> -#endif
> diff --git a/sysdeps/aarch64/chacha20_arch.h b/sysdeps/aarch64/chacha20_arch.h
> deleted file mode 100644
> index 37dbb917f1..0000000000
> --- a/sysdeps/aarch64/chacha20_arch.h
> +++ /dev/null
> @@ -1,40 +0,0 @@
> -/* Chacha20 implementation, used on arc4random.
> - Copyright (C) 2022 Free Software Foundation, Inc.
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -#include <ldsodefs.h>
> -#include <stdbool.h>
> -
> -unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst,
> - const uint8_t *src, size_t nblks)
> - attribute_hidden;
> -
> -static void
> -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
> - size_t bytes)
> -{
> - _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
> - "CHACHA20_BUFSIZE not multiple of 4");
> - _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4,
> - "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4");
> -#ifdef __AARCH64EL__
> - __chacha20_neon_blocks4 (state, dst, src,
> - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -#else
> - chacha20_crypt_generic (state, dst, src, bytes);
> -#endif
> -}
> diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h
> deleted file mode 100644
> index 1b4559ccbc..0000000000
> --- a/sysdeps/generic/chacha20_arch.h
> +++ /dev/null
> @@ -1,24 +0,0 @@
> -/* Chacha20 implementation, generic interface for encrypt.
> - Copyright (C) 2022 Free Software Foundation, Inc.
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -static inline void
> -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
> - size_t bytes)
> -{
> - chacha20_crypt_generic (state, dst, src, bytes);
> -}
> diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c
> index 8a0f37d509..b32b31b5a9 100644
> --- a/sysdeps/generic/tls-internal.c
> +++ b/sysdeps/generic/tls-internal.c
> @@ -16,7 +16,6 @@
> License along with the GNU C Library; if not, see
> <https://www.gnu.org/licenses/>. */
>
> -#include <stdlib/arc4random.h>
> #include <string.h>
> #include <tls-internal.h>
>
> @@ -27,13 +26,4 @@ __glibc_tls_internal_free (void)
> {
> free (__tls_internal.strsignal_buf);
> free (__tls_internal.strerror_l_buf);
> -
> - if (__tls_internal.rand_state != NULL)
> - {
> - /* Clear any lingering random state prior so if the thread stack is
> - cached it won't leak any data. */
> - explicit_bzero (__tls_internal.rand_state,
> - sizeof (*__tls_internal.rand_state));
> - free (__tls_internal.rand_state);
> - }
> }
> diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c
> index 667068c8cf..e60b86fab1 100644
> --- a/sysdeps/mach/hurd/_Fork.c
> +++ b/sysdeps/mach/hurd/_Fork.c
> @@ -662,8 +662,6 @@ retry:
> _hurd_malloc_fork_child ();
> call_function_static_weak (__malloc_fork_unlock_child);
>
> - call_function_static_weak (__arc4random_fork_subprocess);
> -
> /* Run things that want to run in the child task to set up. */
> RUN_HOOK (_hurd_fork_child_hook, ());
>
> diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
> index 7dc02569f6..dd568992e2 100644
> --- a/sysdeps/nptl/_Fork.c
> +++ b/sysdeps/nptl/_Fork.c
> @@ -43,8 +43,6 @@ _Fork (void)
> self->robust_head.list = &self->robust_head;
> INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head,
> sizeof (struct robust_list_head));
> -
> - call_function_static_weak (__arc4random_fork_subprocess);
> }
> return pid;
> }
> diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
> deleted file mode 100644
> index 8c75165f7f..0000000000
> --- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
> +++ /dev/null
> @@ -1,4 +0,0 @@
> -ifeq ($(subdir),stdlib)
> -sysdep_routines += chacha20-ppc
> -CFLAGS-chacha20-ppc.c += -mcpu=power8
> -endif
> diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
> deleted file mode 100644
> index cf9e735326..0000000000
> --- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
> +++ /dev/null
> @@ -1 +0,0 @@
> -#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c>
> diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
> deleted file mode 100644
> index 08494dc045..0000000000
> --- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
> +++ /dev/null
> @@ -1,42 +0,0 @@
> -/* PowerPC optimization for ChaCha20.
> - Copyright (C) 2022 Free Software Foundation, Inc.
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -#include <stdbool.h>
> -#include <ldsodefs.h>
> -
> -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
> - const uint8_t *src, size_t nblks)
> - attribute_hidden;
> -
> -static void
> -chacha20_crypt (uint32_t *state, uint8_t *dst,
> - const uint8_t *src, size_t bytes)
> -{
> - _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
> - "CHACHA20_BUFSIZE not multiple of 4");
> - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
> - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
> -
> - unsigned long int hwcap = GLRO(dl_hwcap);
> - unsigned long int hwcap2 = GLRO(dl_hwcap2);
> - if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
> - __chacha20_power8_blocks4 (state, dst, src,
> - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> - else
> - chacha20_crypt_generic (state, dst, src, bytes);
> -}
> diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile
> index abb0aa3f11..71a59529f3 100644
> --- a/sysdeps/powerpc/powerpc64/power8/Makefile
> +++ b/sysdeps/powerpc/powerpc64/power8/Makefile
> @@ -1,8 +1,3 @@
> ifeq ($(subdir),string)
> sysdep_routines += strcasestr-ppc64
> endif
> -
> -ifeq ($(subdir),stdlib)
> -sysdep_routines += chacha20-ppc
> -CFLAGS-chacha20-ppc.c += -mcpu=power8
> -endif
> diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
> deleted file mode 100644
> index 0bbdcb9363..0000000000
> --- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
> +++ /dev/null
> @@ -1,256 +0,0 @@
> -/* Optimized PowerPC implementation of ChaCha20 cipher.
> - Copyright (C) 2022 Free Software Foundation, Inc.
> -
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20
> - Copyright (C) 2019 Jussi Kivilinna <[email protected]>
> -
> - This file is part of Libgcrypt.
> -
> - Libgcrypt is free software; you can redistribute it and/or modify
> - it under the terms of the GNU Lesser General Public License as
> - published by the Free Software Foundation; either version 2.1 of
> - the License, or (at your option) any later version.
> -
> - Libgcrypt is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> - GNU Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with this program; if not, see <https://www.gnu.org/licenses/>.
> - */
> -
> -#include <altivec.h>
> -#include <endian.h>
> -#include <stddef.h>
> -#include <stdint.h>
> -#include <sys/cdefs.h>
> -
> -typedef vector unsigned char vector16x_u8;
> -typedef vector unsigned int vector4x_u32;
> -typedef vector unsigned long long vector2x_u64;
> -
> -#if __BYTE_ORDER == __BIG_ENDIAN
> -static const vector16x_u8 le_bswap_const =
> - { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
> -#endif
> -
> -static inline vector4x_u32
> -vec_rol_elems (vector4x_u32 v, unsigned int idx)
> -{
> -#if __BYTE_ORDER != __BIG_ENDIAN
> - return vec_sld (v, v, (16 - (4 * idx)) & 15);
> -#else
> - return vec_sld (v, v, (4 * idx) & 15);
> -#endif
> -}
> -
> -static inline vector4x_u32
> -vec_load_le (unsigned long offset, const unsigned char *ptr)
> -{
> - vector4x_u32 vec;
> - vec = vec_vsx_ld (offset, (const uint32_t *)ptr);
> -#if __BYTE_ORDER == __BIG_ENDIAN
> - vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec,
> - le_bswap_const);
> -#endif
> - return vec;
> -}
> -
> -static inline void
> -vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr)
> -{
> -#if __BYTE_ORDER == __BIG_ENDIAN
> - vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec,
> - le_bswap_const);
> -#endif
> - vec_vsx_st (vec, offset, (uint32_t *)ptr);
> -}
> -
> -
> -static inline vector4x_u32
> -vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a)
> -{
> -#if __BYTE_ORDER == __BIG_ENDIAN
> - static const vector16x_u8 swap32 =
> - { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 };
> - vector2x_u64 vec, add, sum;
> -
> - vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32);
> - add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32);
> - sum = vec + add;
> - return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32);
> -#else
> - return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a));
> -#endif
> -}
> -
> -/**********************************************************************
> - 4-way chacha20
> - **********************************************************************/
> -
> -#define ROTATE(v1,rolv) \
> - __asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv))
> -
> -#define PLUS(ds,s) \
> - ((ds) += (s))
> -
> -#define XOR(ds,s) \
> - ((ds) ^= (s))
> -
> -#define ADD_U64(v,a) \
> - (v = vec_add_ctr_u64(v, a))
> -
> -/* 4x4 32-bit integer matrix transpose */
> -#define transpose_4x4(x0, x1, x2, x3) ({ \
> - vector4x_u32 t1 = vec_mergeh(x0, x2); \
> - vector4x_u32 t2 = vec_mergel(x0, x2); \
> - vector4x_u32 t3 = vec_mergeh(x1, x3); \
> - x3 = vec_mergel(x1, x3); \
> - x0 = vec_mergeh(t1, t3); \
> - x1 = vec_mergel(t1, t3); \
> - x2 = vec_mergeh(t2, x3); \
> - x3 = vec_mergel(t2, x3); \
> - })
> -
> -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2) \
> - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
> - ROTATE(d1, rotate_16); ROTATE(d2, rotate_16); \
> - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
> - ROTATE(b1, rotate_12); ROTATE(b2, rotate_12); \
> - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
> - ROTATE(d1, rotate_8); ROTATE(d2, rotate_8); \
> - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
> - ROTATE(b1, rotate_7); ROTATE(b2, rotate_7);
> -
> -unsigned int attribute_hidden
> -__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src,
> - size_t nblks)
> -{
> - vector4x_u32 counters_0123 = { 0, 1, 2, 3 };
> - vector4x_u32 counter_4 = { 4, 0, 0, 0 };
> - vector4x_u32 rotate_16 = { 16, 16, 16, 16 };
> - vector4x_u32 rotate_12 = { 12, 12, 12, 12 };
> - vector4x_u32 rotate_8 = { 8, 8, 8, 8 };
> - vector4x_u32 rotate_7 = { 7, 7, 7, 7 };
> - vector4x_u32 state0, state1, state2, state3;
> - vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7;
> - vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15;
> - vector4x_u32 tmp;
> - int i;
> -
> - /* Force preload of constants to vector registers. */
> - __asm__ ("": "+v" (counters_0123) :: "memory");
> - __asm__ ("": "+v" (counter_4) :: "memory");
> - __asm__ ("": "+v" (rotate_16) :: "memory");
> - __asm__ ("": "+v" (rotate_12) :: "memory");
> - __asm__ ("": "+v" (rotate_8) :: "memory");
> - __asm__ ("": "+v" (rotate_7) :: "memory");
> -
> - state0 = vec_vsx_ld (0 * 16, state);
> - state1 = vec_vsx_ld (1 * 16, state);
> - state2 = vec_vsx_ld (2 * 16, state);
> - state3 = vec_vsx_ld (3 * 16, state);
> -
> - do
> - {
> - v0 = vec_splat (state0, 0);
> - v1 = vec_splat (state0, 1);
> - v2 = vec_splat (state0, 2);
> - v3 = vec_splat (state0, 3);
> - v4 = vec_splat (state1, 0);
> - v5 = vec_splat (state1, 1);
> - v6 = vec_splat (state1, 2);
> - v7 = vec_splat (state1, 3);
> - v8 = vec_splat (state2, 0);
> - v9 = vec_splat (state2, 1);
> - v10 = vec_splat (state2, 2);
> - v11 = vec_splat (state2, 3);
> - v12 = vec_splat (state3, 0);
> - v13 = vec_splat (state3, 1);
> - v14 = vec_splat (state3, 2);
> - v15 = vec_splat (state3, 3);
> -
> - v12 += counters_0123;
> - v13 -= vec_cmplt (v12, counters_0123);
> -
> - for (i = 20; i > 0; i -= 2)
> - {
> - QUARTERROUND2 (v0, v4, v8, v12, v1, v5, v9, v13)
> - QUARTERROUND2 (v2, v6, v10, v14, v3, v7, v11, v15)
> - QUARTERROUND2 (v0, v5, v10, v15, v1, v6, v11, v12)
> - QUARTERROUND2 (v2, v7, v8, v13, v3, v4, v9, v14)
> - }
> -
> - v0 += vec_splat (state0, 0);
> - v1 += vec_splat (state0, 1);
> - v2 += vec_splat (state0, 2);
> - v3 += vec_splat (state0, 3);
> - v4 += vec_splat (state1, 0);
> - v5 += vec_splat (state1, 1);
> - v6 += vec_splat (state1, 2);
> - v7 += vec_splat (state1, 3);
> - v8 += vec_splat (state2, 0);
> - v9 += vec_splat (state2, 1);
> - v10 += vec_splat (state2, 2);
> - v11 += vec_splat (state2, 3);
> - tmp = vec_splat( state3, 0);
> - tmp += counters_0123;
> - v12 += tmp;
> - v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123);
> - v14 += vec_splat (state3, 2);
> - v15 += vec_splat (state3, 3);
> - ADD_U64 (state3, counter_4);
> -
> - transpose_4x4 (v0, v1, v2, v3);
> - transpose_4x4 (v4, v5, v6, v7);
> - transpose_4x4 (v8, v9, v10, v11);
> - transpose_4x4 (v12, v13, v14, v15);
> -
> - vec_store_le (v0, (64 * 0 + 16 * 0), dst);
> - vec_store_le (v1, (64 * 1 + 16 * 0), dst);
> - vec_store_le (v2, (64 * 2 + 16 * 0), dst);
> - vec_store_le (v3, (64 * 3 + 16 * 0), dst);
> -
> - vec_store_le (v4, (64 * 0 + 16 * 1), dst);
> - vec_store_le (v5, (64 * 1 + 16 * 1), dst);
> - vec_store_le (v6, (64 * 2 + 16 * 1), dst);
> - vec_store_le (v7, (64 * 3 + 16 * 1), dst);
> -
> - vec_store_le (v8, (64 * 0 + 16 * 2), dst);
> - vec_store_le (v9, (64 * 1 + 16 * 2), dst);
> - vec_store_le (v10, (64 * 2 + 16 * 2), dst);
> - vec_store_le (v11, (64 * 3 + 16 * 2), dst);
> -
> - vec_store_le (v12, (64 * 0 + 16 * 3), dst);
> - vec_store_le (v13, (64 * 1 + 16 * 3), dst);
> - vec_store_le (v14, (64 * 2 + 16 * 3), dst);
> - vec_store_le (v15, (64 * 3 + 16 * 3), dst);
> -
> - src += 4*64;
> - dst += 4*64;
> -
> - nblks -= 4;
> - }
> - while (nblks);
> -
> - vec_vsx_st (state3, 3 * 16, state);
> -
> - return 0;
> -}
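
For readers following the removed POWER8 routine above: it computes four
ChaCha20 blocks at once, one block per vector lane. A scalar sketch of
the same per-block computation may make the structure easier to see. It
assumes the layout the quoted code implies (sixteen 32-bit state words,
a 64-bit block counter in words 12 and 13, little-endian output). The
names rotl32, quarter_round and chacha20_block_ref are made up for this
illustration and are not glibc functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by c bits.  Shifting by 32 would be
   undefined, so the count must stay in 1..31.  */
static inline uint32_t
rotl32 (uint32_t v, unsigned int c)
{
  assert (c > 0 && c < 32);
  return (v << c) | (v >> (32 - c));
}

/* One ChaCha quarter round on four words of the working state.  */
static void
quarter_round (uint32_t x[16], int a, int b, int c, int d)
{
  x[a] += x[b]; x[d] ^= x[a]; x[d] = rotl32 (x[d], 16);
  x[c] += x[d]; x[b] ^= x[c]; x[b] = rotl32 (x[b], 12);
  x[a] += x[b]; x[d] ^= x[a]; x[d] = rotl32 (x[d], 8);
  x[c] += x[d]; x[b] ^= x[c]; x[b] = rotl32 (x[b], 7);
}

/* Produce one 64-byte keystream block, then advance the 64-bit block
   counter kept in state[12] (low half) and state[13] (high half).  */
static void
chacha20_block_ref (uint32_t state[16], uint8_t out[64])
{
  uint32_t x[16];
  memcpy (x, state, sizeof x);

  /* 20 rounds, taken two at a time.  */
  for (int i = 20; i > 0; i -= 2)
    {
      /* Column round.  */
      quarter_round (x, 0, 4, 8, 12);
      quarter_round (x, 1, 5, 9, 13);
      quarter_round (x, 2, 6, 10, 14);
      quarter_round (x, 3, 7, 11, 15);
      /* Diagonal round.  */
      quarter_round (x, 0, 5, 10, 15);
      quarter_round (x, 1, 6, 11, 12);
      quarter_round (x, 2, 7, 8, 13);
      quarter_round (x, 3, 4, 9, 14);
    }

  /* Feed-forward: add the input state, store words little-endian.  */
  for (int i = 0; i < 16; i++)
    {
      uint32_t w = x[i] + state[i];
      out[4 * i + 0] = (uint8_t) (w >> 0);
      out[4 * i + 1] = (uint8_t) (w >> 8);
      out[4 * i + 2] = (uint8_t) (w >> 16);
      out[4 * i + 3] = (uint8_t) (w >> 24);
    }

  /* 64-bit counter increment with carry into the high word.  */
  if (++state[12] == 0)
    ++state[13];
}
```

The quarter-round index pairs are the same ones passed to QUARTERROUND2
in the quoted loop. The sketch emits raw keystream only and makes no
attempt at constant-time or vectorized behaviour.
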
> diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
> deleted file mode 100644
> index ded06762b6..0000000000
> --- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
> +++ /dev/null
> @@ -1,37 +0,0 @@
> -/* PowerPC optimization for ChaCha20.
> - Copyright (C) 2022 Free Software Foundation, Inc.
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -#include <stdbool.h>
> -#include <ldsodefs.h>
> -
> -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
> - const uint8_t *src, size_t nblks)
> - attribute_hidden;
> -
> -static void
> -chacha20_crypt (uint32_t *state, uint8_t *dst,
> - const uint8_t *src, size_t bytes)
> -{
> - _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
> - "CHACHA20_BUFSIZE not multiple of 4");
> - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
> - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
> -
> - __chacha20_power8_blocks4 (state, dst, src,
> - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -}
> diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
> index 96c110f490..66ed844e68 100644
> --- a/sysdeps/s390/s390-64/Makefile
> +++ b/sysdeps/s390/s390-64/Makefile
> @@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache
> endif
>
> endif # $(subdir) == elf
> -
> -ifeq ($(subdir),stdlib)
> -sysdep_routines += \
> - chacha20-s390x \
> - # sysdep_routines
> -endif
> diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S
> deleted file mode 100644
> index e38504d370..0000000000
> --- a/sysdeps/s390/s390-64/chacha20-s390x.S
> +++ /dev/null
> @@ -1,573 +0,0 @@
> -/* Optimized s390x implementation of ChaCha20 cipher.
> - Copyright (C) 2022 Free Software Foundation, Inc.
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -/* chacha20-s390x.S - zSeries implementation of ChaCha20 cipher
> -
> - Copyright (C) 2020 Jussi Kivilinna <[email protected]>
> -
> - This file is part of Libgcrypt.
> -
> - Libgcrypt is free software; you can redistribute it and/or modify
> - it under the terms of the GNU Lesser General Public License as
> - published by the Free Software Foundation; either version 2.1 of
> - the License, or (at your option) any later version.
> -
> - Libgcrypt is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> - GNU Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with this program; if not, see <https://www.gnu.org/licenses/>.
> - */
> -
> -#include <sysdep.h>
> -
> -#ifdef HAVE_S390_VX_ASM_SUPPORT
> -
> -/* CFA expressions are used for pointing CFA and registers to
> - * SP relative offsets. */
> -# define DW_REGNO_SP 15
> -
> -/* Fixed length encoding used for integers for now. */
> -# define DW_SLEB128_7BIT(value) \
> - 0x00|((value) & 0x7f)
> -# define DW_SLEB128_28BIT(value) \
> - 0x80|((value)&0x7f), \
> - 0x80|(((value)>>7)&0x7f), \
> - 0x80|(((value)>>14)&0x7f), \
> - 0x00|(((value)>>21)&0x7f)
> -
> -# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \
> - .cfi_escape \
> - 0x0f, /* DW_CFA_def_cfa_expression */ \
> - DW_SLEB128_7BIT(11), /* length */ \
> - 0x7f, /* DW_OP_breg15, rsp + constant */ \
> - DW_SLEB128_28BIT(rsp_offs), \
> - 0x06, /* DW_OP_deref */ \
> - 0x23, /* DW_OP_plus_constu */ \
> - DW_SLEB128_28BIT((cfa_depth)+160)
> -
> -.machine "z13+vx"
> -.text
> -
> -.balign 16
> -.Lconsts:
> -.Lwordswap:
> - .byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3
> -.Lbswap128:
> - .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
> -.Lbswap32:
> - .byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
> -.Lone:
> - .long 0, 0, 0, 1
> -.Ladd_counter_0123:
> - .long 0, 1, 2, 3
> -.Ladd_counter_4567:
> - .long 4, 5, 6, 7
> -
> -/* register macros */
> -#define INPUT %r2
> -#define DST %r3
> -#define SRC %r4
> -#define NBLKS %r0
> -#define ROUND %r1
> -
> -/* stack structure */
> -
> -#define STACK_FRAME_STD (8 * 16 + 8 * 4)
> -#define STACK_FRAME_F8_F15 (8 * 8)
> -#define STACK_FRAME_Y0_Y15 (16 * 16)
> -#define STACK_FRAME_CTR (4 * 16)
> -#define STACK_FRAME_PARAMS (6 * 8)
> -
> -#define STACK_MAX (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \
> - STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \
> - STACK_FRAME_PARAMS)
> -
> -#define STACK_F8 (STACK_MAX - STACK_FRAME_F8_F15)
> -#define STACK_F9 (STACK_F8 + 8)
> -#define STACK_F10 (STACK_F9 + 8)
> -#define STACK_F11 (STACK_F10 + 8)
> -#define STACK_F12 (STACK_F11 + 8)
> -#define STACK_F13 (STACK_F12 + 8)
> -#define STACK_F14 (STACK_F13 + 8)
> -#define STACK_F15 (STACK_F14 + 8)
> -#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15)
> -#define STACK_CTR (STACK_Y0_Y15 - STACK_FRAME_CTR)
> -#define STACK_INPUT (STACK_CTR - STACK_FRAME_PARAMS)
> -#define STACK_DST (STACK_INPUT + 8)
> -#define STACK_SRC (STACK_DST + 8)
> -#define STACK_NBLKS (STACK_SRC + 8)
> -#define STACK_POCTX (STACK_NBLKS + 8)
> -#define STACK_POSRC (STACK_POCTX + 8)
> -
> -#define STACK_G0_H3 STACK_Y0_Y15
> -
> -/* vector registers */
> -#define A0 %v0
> -#define A1 %v1
> -#define A2 %v2
> -#define A3 %v3
> -
> -#define B0 %v4
> -#define B1 %v5
> -#define B2 %v6
> -#define B3 %v7
> -
> -#define C0 %v8
> -#define C1 %v9
> -#define C2 %v10
> -#define C3 %v11
> -
> -#define D0 %v12
> -#define D1 %v13
> -#define D2 %v14
> -#define D3 %v15
> -
> -#define E0 %v16
> -#define E1 %v17
> -#define E2 %v18
> -#define E3 %v19
> -
> -#define F0 %v20
> -#define F1 %v21
> -#define F2 %v22
> -#define F3 %v23
> -
> -#define G0 %v24
> -#define G1 %v25
> -#define G2 %v26
> -#define G3 %v27
> -
> -#define H0 %v28
> -#define H1 %v29
> -#define H2 %v30
> -#define H3 %v31
> -
> -#define IO0 E0
> -#define IO1 E1
> -#define IO2 E2
> -#define IO3 E3
> -#define IO4 F0
> -#define IO5 F1
> -#define IO6 F2
> -#define IO7 F3
> -
> -#define S0 G0
> -#define S1 G1
> -#define S2 G2
> -#define S3 G3
> -
> -#define TMP0 H0
> -#define TMP1 H1
> -#define TMP2 H2
> -#define TMP3 H3
> -
> -#define X0 A0
> -#define X1 A1
> -#define X2 A2
> -#define X3 A3
> -#define X4 B0
> -#define X5 B1
> -#define X6 B2
> -#define X7 B3
> -#define X8 C0
> -#define X9 C1
> -#define X10 C2
> -#define X11 C3
> -#define X12 D0
> -#define X13 D1
> -#define X14 D2
> -#define X15 D3
> -
> -#define Y0 E0
> -#define Y1 E1
> -#define Y2 E2
> -#define Y3 E3
> -#define Y4 F0
> -#define Y5 F1
> -#define Y6 F2
> -#define Y7 F3
> -#define Y8 G0
> -#define Y9 G1
> -#define Y10 G2
> -#define Y11 G3
> -#define Y12 H0
> -#define Y13 H1
> -#define Y14 H2
> -#define Y15 H3
> -
> -/**********************************************************************
> - helper macros
> - **********************************************************************/
> -
> -#define _ /*_*/
> -
> -#define START_STACK(last_r) \
> - lgr %r0, %r15; \
> - lghi %r1, ~15; \
> - stmg %r6, last_r, 6 * 8(%r15); \
> - aghi %r0, -STACK_MAX; \
> - ngr %r0, %r1; \
> - lgr %r1, %r15; \
> - cfi_def_cfa_register(1); \
> - lgr %r15, %r0; \
> - stg %r1, 0(%r15); \
> - cfi_cfa_on_stack(0, 0); \
> - std %f8, STACK_F8(%r15); \
> - std %f9, STACK_F9(%r15); \
> - std %f10, STACK_F10(%r15); \
> - std %f11, STACK_F11(%r15); \
> - std %f12, STACK_F12(%r15); \
> - std %f13, STACK_F13(%r15); \
> - std %f14, STACK_F14(%r15); \
> - std %f15, STACK_F15(%r15);
> -
> -#define END_STACK(last_r) \
> - lg %r1, 0(%r15); \
> - ld %f8, STACK_F8(%r15); \
> - ld %f9, STACK_F9(%r15); \
> - ld %f10, STACK_F10(%r15); \
> - ld %f11, STACK_F11(%r15); \
> - ld %f12, STACK_F12(%r15); \
> - ld %f13, STACK_F13(%r15); \
> - ld %f14, STACK_F14(%r15); \
> - ld %f15, STACK_F15(%r15); \
> - lmg %r6, last_r, 6 * 8(%r1); \
> - lgr %r15, %r1; \
> - cfi_def_cfa_register(DW_REGNO_SP);
> -
> -#define PLUS(dst,src) \
> - vaf dst, dst, src;
> -
> -#define XOR(dst,src) \
> - vx dst, dst, src;
> -
> -#define ROTATE(v1,c) \
> - verllf v1, v1, (c)(0);
> -
> -#define WORD_ROTATE(v1,s) \
> - vsldb v1, v1, v1, ((s) * 4);
> -
> -#define DST_8(OPER, I, J) \
> - OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \
> - OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J);
> -
> -/**********************************************************************
> - round macros
> - **********************************************************************/
> -
> -/**********************************************************************
> - 8-way chacha20 ("vertical")
> - **********************************************************************/
> -
> -#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
> - x8,x9,x10,x11,x12,x13,x14,x15,\
> - y0,y1,y2,y3,y4,y5,y6,y7,\
> - y8,y9,y10,y11,y12,y13,y14,y15,\
> - op1,op2,op3,op4,op5,op6,op7,op8,\
> - op9,op10,op11,op12) \
> - op1; \
> - PLUS(x0, x1); PLUS(x4, x5); \
> - PLUS(x8, x9); PLUS(x12, x13); \
> - PLUS(y0, y1); PLUS(y4, y5); \
> - PLUS(y8, y9); PLUS(y12, y13); \
> - op2; \
> - XOR(x3, x0); XOR(x7, x4); \
> - XOR(x11, x8); XOR(x15, x12); \
> - XOR(y3, y0); XOR(y7, y4); \
> - XOR(y11, y8); XOR(y15, y12); \
> - op3; \
> - ROTATE(x3, 16); ROTATE(x7, 16); \
> - ROTATE(x11, 16); ROTATE(x15, 16); \
> - ROTATE(y3, 16); ROTATE(y7, 16); \
> - ROTATE(y11, 16); ROTATE(y15, 16); \
> - op4; \
> - PLUS(x2, x3); PLUS(x6, x7); \
> - PLUS(x10, x11); PLUS(x14, x15); \
> - PLUS(y2, y3); PLUS(y6, y7); \
> - PLUS(y10, y11); PLUS(y14, y15); \
> - op5; \
> - XOR(x1, x2); XOR(x5, x6); \
> - XOR(x9, x10); XOR(x13, x14); \
> - XOR(y1, y2); XOR(y5, y6); \
> - XOR(y9, y10); XOR(y13, y14); \
> - op6; \
> - ROTATE(x1,12); ROTATE(x5,12); \
> - ROTATE(x9,12); ROTATE(x13,12); \
> - ROTATE(y1,12); ROTATE(y5,12); \
> - ROTATE(y9,12); ROTATE(y13,12); \
> - op7; \
> - PLUS(x0, x1); PLUS(x4, x5); \
> - PLUS(x8, x9); PLUS(x12, x13); \
> - PLUS(y0, y1); PLUS(y4, y5); \
> - PLUS(y8, y9); PLUS(y12, y13); \
> - op8; \
> - XOR(x3, x0); XOR(x7, x4); \
> - XOR(x11, x8); XOR(x15, x12); \
> - XOR(y3, y0); XOR(y7, y4); \
> - XOR(y11, y8); XOR(y15, y12); \
> - op9; \
> - ROTATE(x3,8); ROTATE(x7,8); \
> - ROTATE(x11,8); ROTATE(x15,8); \
> - ROTATE(y3,8); ROTATE(y7,8); \
> - ROTATE(y11,8); ROTATE(y15,8); \
> - op10; \
> - PLUS(x2, x3); PLUS(x6, x7); \
> - PLUS(x10, x11); PLUS(x14, x15); \
> - PLUS(y2, y3); PLUS(y6, y7); \
> - PLUS(y10, y11); PLUS(y14, y15); \
> - op11; \
> - XOR(x1, x2); XOR(x5, x6); \
> - XOR(x9, x10); XOR(x13, x14); \
> - XOR(y1, y2); XOR(y5, y6); \
> - XOR(y9, y10); XOR(y13, y14); \
> - op12; \
> - ROTATE(x1,7); ROTATE(x5,7); \
> - ROTATE(x9,7); ROTATE(x13,7); \
> - ROTATE(y1,7); ROTATE(y5,7); \
> - ROTATE(y9,7); ROTATE(y13,7);
> -
> -#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\
> - y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \
> - QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
> - x8,x9,x10,x11,x12,x13,x14,x15,\
> - y0,y1,y2,y3,y4,y5,y6,y7,\
> - y8,y9,y10,y11,y12,y13,y14,y15,\
> - ,,,,,,,,,,,)
> -
> -#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \
> - vmrhf tmp0, v0, v1; \
> - vmrhf tmp1, v2, v3; \
> - vmrlf tmp2, v0, v1; \
> - vmrlf v3, v2, v3; \
> - vmrhf tmpa, va, vb; \
> - vmrhf tmpb, vc, vd; \
> - vmrlf tmpc, va, vb; \
> - vmrlf vd, vc, vd; \
> - vpdi v0, tmp0, tmp1, 0; \
> - vpdi v1, tmp0, tmp1, 5; \
> - vpdi v2, tmp2, v3, 0; \
> - vpdi v3, tmp2, v3, 5; \
> - vpdi va, tmpa, tmpb, 0; \
> - vpdi vb, tmpa, tmpb, 5; \
> - vpdi vc, tmpc, vd, 0; \
> - vpdi vd, tmpc, vd, 5;
> -
> -.balign 8
> -.globl __chacha20_s390x_vx_blocks8
> -ENTRY (__chacha20_s390x_vx_blocks8)
> - /* input:
> - * %r2: input
> - * %r3: dst
> - * %r4: src
> - * %r5: nblks (multiple of 8)
> - */
> -
> - START_STACK(%r8);
> - lgr NBLKS, %r5;
> -
> - larl %r7, .Lconsts;
> -
> - /* Load counter. */
> - lg %r8, (12 * 4)(INPUT);
> - rllg %r8, %r8, 32;
> -
> -.balign 4
> - /* Process eight chacha20 blocks per loop. */
> -.Lloop8:
> - vlm Y0, Y3, 0(INPUT);
> -
> - slgfi NBLKS, 8;
> - lghi ROUND, (20 / 2);
> -
> - /* Construct counter vectors X12/X13 & Y12/Y13. */
> - vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7);
> - vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7);
> - vrepf Y12, Y3, 0;
> - vrepf Y13, Y3, 1;
> - vaccf X5, Y12, X4;
> - vaccf Y5, Y12, Y4;
> - vaf X12, Y12, X4;
> - vaf Y12, Y12, Y4;
> - vaf X13, Y13, X5;
> - vaf Y13, Y13, Y5;
> -
> - vrepf X0, Y0, 0;
> - vrepf X1, Y0, 1;
> - vrepf X2, Y0, 2;
> - vrepf X3, Y0, 3;
> - vrepf X4, Y1, 0;
> - vrepf X5, Y1, 1;
> - vrepf X6, Y1, 2;
> - vrepf X7, Y1, 3;
> - vrepf X8, Y2, 0;
> - vrepf X9, Y2, 1;
> - vrepf X10, Y2, 2;
> - vrepf X11, Y2, 3;
> - vrepf X14, Y3, 2;
> - vrepf X15, Y3, 3;
> -
> - /* Store counters for blocks 0-7. */
> - vstm X12, X13, (STACK_CTR + 0 * 16)(%r15);
> - vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15);
> -
> - vlr Y0, X0;
> - vlr Y1, X1;
> - vlr Y2, X2;
> - vlr Y3, X3;
> - vlr Y4, X4;
> - vlr Y5, X5;
> - vlr Y6, X6;
> - vlr Y7, X7;
> - vlr Y8, X8;
> - vlr Y9, X9;
> - vlr Y10, X10;
> - vlr Y11, X11;
> - vlr Y14, X14;
> - vlr Y15, X15;
> -
> - /* Update and store counter. */
> - agfi %r8, 8;
> - rllg %r5, %r8, 32;
> - stg %r5, (12 * 4)(INPUT);
> -
> -.balign 4
> -.Lround2_8:
> - QUARTERROUND4_V8(X0, X4, X8, X12, X1, X5, X9, X13,
> - X2, X6, X10, X14, X3, X7, X11, X15,
> - Y0, Y4, Y8, Y12, Y1, Y5, Y9, Y13,
> - Y2, Y6, Y10, Y14, Y3, Y7, Y11, Y15);
> - QUARTERROUND4_V8(X0, X5, X10, X15, X1, X6, X11, X12,
> - X2, X7, X8, X13, X3, X4, X9, X14,
> - Y0, Y5, Y10, Y15, Y1, Y6, Y11, Y12,
> - Y2, Y7, Y8, Y13, Y3, Y4, Y9, Y14);
> - brctg ROUND, .Lround2_8;
> -
> - /* Store blocks 4-7. */
> - vstm Y0, Y15, STACK_Y0_Y15(%r15);
> -
> - /* Load counters for blocks 0-3. */
> - vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15);
> -
> - lghi ROUND, 1;
> - j .Lfirst_output_4blks_8;
> -
> -.balign 4
> -.Lsecond_output_4blks_8:
> - /* Load blocks 4-7. */
> - vlm X0, X15, STACK_Y0_Y15(%r15);
> -
> - /* Load counters for blocks 4-7. */
> - vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15);
> -
> - lghi ROUND, 0;
> -
> -.balign 4
> - /* Output four chacha20 blocks per loop. */
> -.Lfirst_output_4blks_8:
> - vlm Y12, Y15, 0(INPUT);
> - PLUS(X12, Y0);
> - PLUS(X13, Y1);
> - vrepf Y0, Y12, 0;
> - vrepf Y1, Y12, 1;
> - vrepf Y2, Y12, 2;
> - vrepf Y3, Y12, 3;
> - vrepf Y4, Y13, 0;
> - vrepf Y5, Y13, 1;
> - vrepf Y6, Y13, 2;
> - vrepf Y7, Y13, 3;
> - vrepf Y8, Y14, 0;
> - vrepf Y9, Y14, 1;
> - vrepf Y10, Y14, 2;
> - vrepf Y11, Y14, 3;
> - vrepf Y14, Y15, 2;
> - vrepf Y15, Y15, 3;
> - PLUS(X0, Y0);
> - PLUS(X1, Y1);
> - PLUS(X2, Y2);
> - PLUS(X3, Y3);
> - PLUS(X4, Y4);
> - PLUS(X5, Y5);
> - PLUS(X6, Y6);
> - PLUS(X7, Y7);
> - PLUS(X8, Y8);
> - PLUS(X9, Y9);
> - PLUS(X10, Y10);
> - PLUS(X11, Y11);
> - PLUS(X14, Y14);
> - PLUS(X15, Y15);
> -
> - vl Y15, (.Lbswap32 - .Lconsts)(%r7);
> - TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7,
> - Y9, Y10, Y11, Y12, Y13, Y14);
> - TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15,
> - Y9, Y10, Y11, Y12, Y13, Y14);
> -
> - vlm Y0, Y14, 0(SRC);
> - vperm X0, X0, X0, Y15;
> - vperm X1, X1, X1, Y15;
> - vperm X2, X2, X2, Y15;
> - vperm X3, X3, X3, Y15;
> - vperm X4, X4, X4, Y15;
> - vperm X5, X5, X5, Y15;
> - vperm X6, X6, X6, Y15;
> - vperm X7, X7, X7, Y15;
> - vperm X8, X8, X8, Y15;
> - vperm X9, X9, X9, Y15;
> - vperm X10, X10, X10, Y15;
> - vperm X11, X11, X11, Y15;
> - vperm X12, X12, X12, Y15;
> - vperm X13, X13, X13, Y15;
> - vperm X14, X14, X14, Y15;
> - vperm X15, X15, X15, Y15;
> - vl Y15, (15 * 16)(SRC);
> -
> - XOR(Y0, X0);
> - XOR(Y1, X4);
> - XOR(Y2, X8);
> - XOR(Y3, X12);
> - XOR(Y4, X1);
> - XOR(Y5, X5);
> - XOR(Y6, X9);
> - XOR(Y7, X13);
> - XOR(Y8, X2);
> - XOR(Y9, X6);
> - XOR(Y10, X10);
> - XOR(Y11, X14);
> - XOR(Y12, X3);
> - XOR(Y13, X7);
> - XOR(Y14, X11);
> - XOR(Y15, X15);
> - vstm Y0, Y15, 0(DST);
> -
> - aghi SRC, 256;
> - aghi DST, 256;
> -
> - clgije ROUND, 1, .Lsecond_output_4blks_8;
> -
> - clgijhe NBLKS, 8, .Lloop8;
> -
> -
> - END_STACK(%r8);
> - xgr %r2, %r2;
> - br %r14;
> -END (__chacha20_s390x_vx_blocks8)
> -
> -#endif /* HAVE_S390_VX_ASM_SUPPORT */
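
A note on the DW_SLEB128_28BIT macro in the removed s390x file above: it
always emits four bytes, padding small values with continuation bytes,
which is why cfi_cfa_on_stack can hard-code an expression length of 11
(1 opcode + 4 + 1 opcode + 1 opcode + 4). The sketch below mirrors that
fixed-length encoding in C and checks it with an ordinary unsigned
LEB128 decoder. The function names are invented for the example, and
the decoder is a generic one, not something taken from glibc or
libgcrypt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode VALUE as exactly four LEB128 bytes, the way the
   DW_SLEB128_28BIT macro does.  Only 28 payload bits are available.  */
static void
leb128_pad4 (uint32_t value, uint8_t out[4])
{
  assert (value < (UINT32_C (1) << 28));
  out[0] = 0x80 | (value & 0x7f);          /* continuation bit set */
  out[1] = 0x80 | ((value >> 7) & 0x7f);   /* continuation bit set */
  out[2] = 0x80 | ((value >> 14) & 0x7f);  /* continuation bit set */
  out[3] = 0x00 | ((value >> 21) & 0x7f);  /* final byte */
}

/* Generic unsigned LEB128 decoder.  Stores the value in *RESULT and
   returns the number of bytes consumed.  */
static size_t
uleb128_decode (const uint8_t *p, uint64_t *result)
{
  uint64_t v = 0;
  unsigned int shift = 0;
  size_t n = 0;
  uint8_t byte;

  do
    {
      byte = p[n++];
      v |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
    }
  while (byte & 0x80);

  *result = v;
  return n;
}
```

One caveat: read as signed LEB128, bit 6 of the final byte is the sign
bit, so the padded form stays non-negative only for values below 2^27.
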
> diff --git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h
> deleted file mode 100644
> index 0c6abf77e8..0000000000
> --- a/sysdeps/s390/s390-64/chacha20_arch.h
> +++ /dev/null
> @@ -1,45 +0,0 @@
> -/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT
> - Copyright (C) 2022 Free Software Foundation, Inc.
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -#include <stdbool.h>
> -#include <ldsodefs.h>
> -#include <sys/auxv.h>
> -
> -unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst,
> - const uint8_t *src, size_t nblks)
> - attribute_hidden;
> -
> -static inline void
> -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
> - size_t bytes)
> -{
> -#ifdef HAVE_S390_VX_ASM_SUPPORT
> - _Static_assert (CHACHA20_BUFSIZE % 8 == 0,
> - "CHACHA20_BUFSIZE not multiple of 8");
> - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
> - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
> -
> - if (GLRO(dl_hwcap) & HWCAP_S390_VX)
> - {
> - __chacha20_s390x_vx_blocks8 (state, dst, src,
> - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> - return;
> - }
> -#endif
> - chacha20_crypt_generic (state, dst, src, bytes);
> -}
> diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c
> index 0326ebb767..c8a9ed2d40 100644
> --- a/sysdeps/unix/sysv/linux/tls-internal.c
> +++ b/sysdeps/unix/sysv/linux/tls-internal.c
> @@ -16,7 +16,6 @@
> License along with the GNU C Library; if not, see
> <https://www.gnu.org/licenses/>. */
>
> -#include <stdlib/arc4random.h>
> #include <string.h>
> #include <tls-internal.h>
>
> @@ -26,13 +25,4 @@ __glibc_tls_internal_free (void)
> struct pthread *self = THREAD_SELF;
> free (self->tls_state.strsignal_buf);
> free (self->tls_state.strerror_l_buf);
> -
> - if (self->tls_state.rand_state != NULL)
> - {
> - /* Clear any lingering random state prior so if the thread stack is
> - cached it won't leak any data. */
> - explicit_bzero (self->tls_state.rand_state,
> - sizeof (*self->tls_state.rand_state));
> - free (self->tls_state.rand_state);
> - }
> }
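
The hunk above drops a wipe-then-free of the per-thread random state.
The reason such code calls explicit_bzero rather than memset is that a
compiler may treat stores to memory that is about to be freed as dead
and remove them. A minimal portable sketch of the same pattern follows.
It swaps explicit_bzero for a volatile store loop so that it does not
depend on a glibc extension; secure_wipe and wipe_and_free are
hypothetical helper names, not glibc interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for explicit_bzero: the stores go through a volatile
   pointer, so the compiler must not discard them as dead.  */
static void
secure_wipe (void *p, size_t n)
{
  volatile unsigned char *vp = p;
  while (n--)
    *vp++ = 0;
}

/* Wipe the N-byte object at *PP, free it, and clear the caller's
   pointer so it cannot be reused by accident.  */
static void
wipe_and_free (void **pp, size_t n)
{
  assert (pp != NULL);
  if (*pp != NULL)
    {
      secure_wipe (*pp, n);
      free (*pp);
      *pp = NULL;
    }
}
```

Calling wipe_and_free on an already cleared pointer is a no-op, which
matches the NULL check in the removed code.
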
> diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
> index 1178475d75..c19bef2dec 100644
> --- a/sysdeps/x86_64/Makefile
> +++ b/sysdeps/x86_64/Makefile
> @@ -5,13 +5,6 @@ ifeq ($(subdir),csu)
> gen-as-const-headers += link-defines.sym
> endif
>
> -ifeq ($(subdir),stdlib)
> -sysdep_routines += \
> - chacha20-amd64-sse2 \
> - chacha20-amd64-avx2 \
> - # sysdep_routines
> -endif
> -
> ifeq ($(subdir),gmon)
> sysdep_routines += _mcount
> # We cannot compile _mcount.S with -pg because that would create
> diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S
> deleted file mode 100644
> index aefd1cdbd0..0000000000
> --- a/sysdeps/x86_64/chacha20-amd64-avx2.S
> +++ /dev/null
> @@ -1,328 +0,0 @@
> -/* Optimized AVX2 implementation of ChaCha20 cipher.
> - Copyright (C) 2022 Free Software Foundation, Inc.
> -
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -/* chacha20-amd64-avx2.S - AVX2 implementation of ChaCha20 cipher
> -
> - Copyright (C) 2017-2019 Jussi Kivilinna <[email protected]>
> -
> - This file is part of Libgcrypt.
> -
> - Libgcrypt is free software; you can redistribute it and/or modify
> - it under the terms of the GNU Lesser General Public License as
> - published by the Free Software Foundation; either version 2.1 of
> - the License, or (at your option) any later version.
> -
> - Libgcrypt is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> - GNU Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with this program; if not, see <https://www.gnu.org/licenses/>.
> -*/
> -
> -/* Based on D. J. Bernstein reference implementation at
> - http://cr.yp.to/chacha.html:
> -
> - chacha-regs.c version 20080118
> - D. J. Bernstein
> - Public domain. */
> -
> -#include <sysdep.h>
> -
> -#ifdef PIC
> -# define rRIP (%rip)
> -#else
> -# define rRIP
> -#endif
> -
> -/* register macros */
> -#define INPUT %rdi
> -#define DST %rsi
> -#define SRC %rdx
> -#define NBLKS %rcx
> -#define ROUND %eax
> -
> -/* stack structure */
> -#define STACK_VEC_X12 (32)
> -#define STACK_VEC_X13 (32 + STACK_VEC_X12)
> -#define STACK_TMP (32 + STACK_VEC_X13)
> -#define STACK_TMP1 (32 + STACK_TMP)
> -
> -#define STACK_MAX (32 + STACK_TMP1)
> -
> -/* vector registers */
> -#define X0 %ymm0
> -#define X1 %ymm1
> -#define X2 %ymm2
> -#define X3 %ymm3
> -#define X4 %ymm4
> -#define X5 %ymm5
> -#define X6 %ymm6
> -#define X7 %ymm7
> -#define X8 %ymm8
> -#define X9 %ymm9
> -#define X10 %ymm10
> -#define X11 %ymm11
> -#define X12 %ymm12
> -#define X13 %ymm13
> -#define X14 %ymm14
> -#define X15 %ymm15
> -
> -#define X0h %xmm0
> -#define X1h %xmm1
> -#define X2h %xmm2
> -#define X3h %xmm3
> -#define X4h %xmm4
> -#define X5h %xmm5
> -#define X6h %xmm6
> -#define X7h %xmm7
> -#define X8h %xmm8
> -#define X9h %xmm9
> -#define X10h %xmm10
> -#define X11h %xmm11
> -#define X12h %xmm12
> -#define X13h %xmm13
> -#define X14h %xmm14
> -#define X15h %xmm15
> -
> -/**********************************************************************
> - helper macros
> - **********************************************************************/
> -
> -/* 4x4 32-bit integer matrix transpose */
> -#define transpose_4x4(x0,x1,x2,x3,t1,t2) \
> - vpunpckhdq x1, x0, t2; \
> - vpunpckldq x1, x0, x0; \
> - \
> - vpunpckldq x3, x2, t1; \
> - vpunpckhdq x3, x2, x2; \
> - \
> - vpunpckhqdq t1, x0, x1; \
> - vpunpcklqdq t1, x0, x0; \
> - \
> - vpunpckhqdq x2, t2, x3; \
> - vpunpcklqdq x2, t2, x2;
> -
> -/* 2x2 128-bit matrix transpose */
> -#define transpose_16byte_2x2(x0,x1,t1) \
> - vmovdqa x0, t1; \
> - vperm2i128 $0x20, x1, x0, x0; \
> - vperm2i128 $0x31, x1, t1, x1;
> -
> -/**********************************************************************
> - 8-way chacha20
> - **********************************************************************/
> -
> -#define ROTATE2(v1,v2,c,tmp) \
> - vpsrld $(32 - (c)), v1, tmp; \
> - vpslld $(c), v1, v1; \
> - vpaddb tmp, v1, v1; \
> - vpsrld $(32 - (c)), v2, tmp; \
> - vpslld $(c), v2, v2; \
> - vpaddb tmp, v2, v2;
> -
> -#define ROTATE_SHUF_2(v1,v2,shuf) \
> - vpshufb shuf, v1, v1; \
> - vpshufb shuf, v2, v2;
> -
> -#define XOR(ds,s) \
> - vpxor s, ds, ds;
> -
> -#define PLUS(ds,s) \
> - vpaddd s, ds, ds;
> -
> -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\
> - interleave_op1,interleave_op2,\
> - interleave_op3,interleave_op4) \
> - vbroadcasti128 .Lshuf_rol16 rRIP, tmp1; \
> - interleave_op1; \
> - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
> - ROTATE_SHUF_2(d1, d2, tmp1); \
> - interleave_op2; \
> - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
> - ROTATE2(b1, b2, 12, tmp1); \
> - vbroadcasti128 .Lshuf_rol8 rRIP, tmp1; \
> - interleave_op3; \
> - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
> - ROTATE_SHUF_2(d1, d2, tmp1); \
> - interleave_op4; \
> - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
> - ROTATE2(b1, b2, 7, tmp1);
> -
> - .section .text.avx2, "ax", @progbits
> - .align 32
> -chacha20_data:
> -L(shuf_rol16):
> - .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
> -L(shuf_rol8):
> - .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
> -L(inc_counter):
> - .byte 0,1,2,3,4,5,6,7
> -L(unsigned_cmp):
> - .long 0x80000000
> -
> - .hidden __chacha20_avx2_blocks8
> -ENTRY (__chacha20_avx2_blocks8)
> - /* input:
> - * %rdi: input
> - * %rsi: dst
> - * %rdx: src
> - * %rcx: nblks (multiple of 8)
> - */
> - vzeroupper;
> -
> - pushq %rbp;
> - cfi_adjust_cfa_offset(8);
> - cfi_rel_offset(rbp, 0)
> - movq %rsp, %rbp;
> - cfi_def_cfa_register(rbp);
> -
> - subq $STACK_MAX, %rsp;
> - andq $~31, %rsp;
> -
> -L(loop8):
> - mov $20, ROUND;
> -
> - /* Construct counter vectors X12 and X13 */
> - vpmovzxbd L(inc_counter) rRIP, X0;
> - vpbroadcastd L(unsigned_cmp) rRIP, X2;
> - vpbroadcastd (12 * 4)(INPUT), X12;
> - vpbroadcastd (13 * 4)(INPUT), X13;
> - vpaddd X0, X12, X12;
> - vpxor X2, X0, X0;
> - vpxor X2, X12, X1;
> - vpcmpgtd X1, X0, X0;
> - vpsubd X0, X13, X13;
> - vmovdqa X12, (STACK_VEC_X12)(%rsp);
> - vmovdqa X13, (STACK_VEC_X13)(%rsp);
> -
> - /* Load vectors */
> - vpbroadcastd (0 * 4)(INPUT), X0;
> - vpbroadcastd (1 * 4)(INPUT), X1;
> - vpbroadcastd (2 * 4)(INPUT), X2;
> - vpbroadcastd (3 * 4)(INPUT), X3;
> - vpbroadcastd (4 * 4)(INPUT), X4;
> - vpbroadcastd (5 * 4)(INPUT), X5;
> - vpbroadcastd (6 * 4)(INPUT), X6;
> - vpbroadcastd (7 * 4)(INPUT), X7;
> - vpbroadcastd (8 * 4)(INPUT), X8;
> - vpbroadcastd (9 * 4)(INPUT), X9;
> - vpbroadcastd (10 * 4)(INPUT), X10;
> - vpbroadcastd (11 * 4)(INPUT), X11;
> - vpbroadcastd (14 * 4)(INPUT), X14;
> - vpbroadcastd (15 * 4)(INPUT), X15;
> - vmovdqa X15, (STACK_TMP)(%rsp);
> -
> -L(round2):
> - QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X15,,,,)
> - vmovdqa (STACK_TMP)(%rsp), X15;
> - vmovdqa X8, (STACK_TMP)(%rsp);
> - QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,,,,)
> - QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,,,,)
> - vmovdqa (STACK_TMP)(%rsp), X8;
> - vmovdqa X15, (STACK_TMP)(%rsp);
> - QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X15,,,,)
> - sub $2, ROUND;
> - jnz L(round2);
> -
> - vmovdqa X8, (STACK_TMP1)(%rsp);
> -
> - /* tmp := X15 */
> - vpbroadcastd (0 * 4)(INPUT), X15;
> - PLUS(X0, X15);
> - vpbroadcastd (1 * 4)(INPUT), X15;
> - PLUS(X1, X15);
> - vpbroadcastd (2 * 4)(INPUT), X15;
> - PLUS(X2, X15);
> - vpbroadcastd (3 * 4)(INPUT), X15;
> - PLUS(X3, X15);
> - vpbroadcastd (4 * 4)(INPUT), X15;
> - PLUS(X4, X15);
> - vpbroadcastd (5 * 4)(INPUT), X15;
> - PLUS(X5, X15);
> - vpbroadcastd (6 * 4)(INPUT), X15;
> - PLUS(X6, X15);
> - vpbroadcastd (7 * 4)(INPUT), X15;
> - PLUS(X7, X15);
> - transpose_4x4(X0, X1, X2, X3, X8, X15);
> - transpose_4x4(X4, X5, X6, X7, X8, X15);
> - vmovdqa (STACK_TMP1)(%rsp), X8;
> - transpose_16byte_2x2(X0, X4, X15);
> - transpose_16byte_2x2(X1, X5, X15);
> - transpose_16byte_2x2(X2, X6, X15);
> - transpose_16byte_2x2(X3, X7, X15);
> - vmovdqa (STACK_TMP)(%rsp), X15;
> - vmovdqu X0, (64 * 0 + 16 * 0)(DST)
> - vmovdqu X1, (64 * 1 + 16 * 0)(DST)
> - vpbroadcastd (8 * 4)(INPUT), X0;
> - PLUS(X8, X0);
> - vpbroadcastd (9 * 4)(INPUT), X0;
> - PLUS(X9, X0);
> - vpbroadcastd (10 * 4)(INPUT), X0;
> - PLUS(X10, X0);
> - vpbroadcastd (11 * 4)(INPUT), X0;
> - PLUS(X11, X0);
> - vmovdqa (STACK_VEC_X12)(%rsp), X0;
> - PLUS(X12, X0);
> - vmovdqa (STACK_VEC_X13)(%rsp), X0;
> - PLUS(X13, X0);
> - vpbroadcastd (14 * 4)(INPUT), X0;
> - PLUS(X14, X0);
> - vpbroadcastd (15 * 4)(INPUT), X0;
> - PLUS(X15, X0);
> - vmovdqu X2, (64 * 2 + 16 * 0)(DST)
> - vmovdqu X3, (64 * 3 + 16 * 0)(DST)
> -
> - /* Update counter */
> - addq $8, (12 * 4)(INPUT);
> -
> - transpose_4x4(X8, X9, X10, X11, X0, X1);
> - transpose_4x4(X12, X13, X14, X15, X0, X1);
> - vmovdqu X4, (64 * 4 + 16 * 0)(DST)
> - vmovdqu X5, (64 * 5 + 16 * 0)(DST)
> - transpose_16byte_2x2(X8, X12, X0);
> - transpose_16byte_2x2(X9, X13, X0);
> - transpose_16byte_2x2(X10, X14, X0);
> - transpose_16byte_2x2(X11, X15, X0);
> - vmovdqu X6, (64 * 6 + 16 * 0)(DST)
> - vmovdqu X7, (64 * 7 + 16 * 0)(DST)
> - vmovdqu X8, (64 * 0 + 16 * 2)(DST)
> - vmovdqu X9, (64 * 1 + 16 * 2)(DST)
> - vmovdqu X10, (64 * 2 + 16 * 2)(DST)
> - vmovdqu X11, (64 * 3 + 16 * 2)(DST)
> - vmovdqu X12, (64 * 4 + 16 * 2)(DST)
> - vmovdqu X13, (64 * 5 + 16 * 2)(DST)
> - vmovdqu X14, (64 * 6 + 16 * 2)(DST)
> - vmovdqu X15, (64 * 7 + 16 * 2)(DST)
> -
> - sub $8, NBLKS;
> - lea (8 * 64)(DST), DST;
> - lea (8 * 64)(SRC), SRC;
> - jnz L(loop8);
> -
> - vzeroupper;
> -
> - /* eax zeroed by round loop. */
> - leave;
> - cfi_adjust_cfa_offset(-8)
> - cfi_def_cfa_register(%rsp);
> - ret;
> - int3;
> -END(__chacha20_avx2_blocks8)
> diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S
> deleted file mode 100644
> index 351a1109c6..0000000000
> --- a/sysdeps/x86_64/chacha20-amd64-sse2.S
> +++ /dev/null
> @@ -1,311 +0,0 @@
> -/* Optimized SSE2 implementation of ChaCha20 cipher.
> - Copyright (C) 2022 Free Software Foundation, Inc.
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -/* chacha20-amd64-ssse3.S - SSSE3 implementation of ChaCha20 cipher
> -
> - Copyright (C) 2017-2019 Jussi Kivilinna <[email protected]>
> -
> - This file is part of Libgcrypt.
> -
> - Libgcrypt is free software; you can redistribute it and/or modify
> - it under the terms of the GNU Lesser General Public License as
> - published by the Free Software Foundation; either version 2.1 of
> - the License, or (at your option) any later version.
> -
> - Libgcrypt is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> - GNU Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with this program; if not, see <https://www.gnu.org/licenses/>.
> -*/
> -
> -/* Based on D. J. Bernstein reference implementation at
> - http://cr.yp.to/chacha.html:
> -
> - chacha-regs.c version 20080118
> - D. J. Bernstein
> - Public domain. */
> -
> -#include <sysdep.h>
> -#include <isa-level.h>
> -
> -#if MINIMUM_X86_ISA_LEVEL <= 2
> -
> -#ifdef PIC
> -# define rRIP (%rip)
> -#else
> -# define rRIP
> -#endif
> -
> -/* 'ret' instruction replacement for straight-line speculation mitigation */
> -#define ret_spec_stop \
> - ret; int3;
> -
> -/* register macros */
> -#define INPUT %rdi
> -#define DST %rsi
> -#define SRC %rdx
> -#define NBLKS %rcx
> -#define ROUND %eax
> -
> -/* stack structure */
> -#define STACK_VEC_X12 (16)
> -#define STACK_VEC_X13 (16 + STACK_VEC_X12)
> -#define STACK_TMP (16 + STACK_VEC_X13)
> -#define STACK_TMP1 (16 + STACK_TMP)
> -#define STACK_TMP2 (16 + STACK_TMP1)
> -
> -#define STACK_MAX (16 + STACK_TMP2)
> -
> -/* vector registers */
> -#define X0 %xmm0
> -#define X1 %xmm1
> -#define X2 %xmm2
> -#define X3 %xmm3
> -#define X4 %xmm4
> -#define X5 %xmm5
> -#define X6 %xmm6
> -#define X7 %xmm7
> -#define X8 %xmm8
> -#define X9 %xmm9
> -#define X10 %xmm10
> -#define X11 %xmm11
> -#define X12 %xmm12
> -#define X13 %xmm13
> -#define X14 %xmm14
> -#define X15 %xmm15
> -
> -/**********************************************************************
> - helper macros
> - **********************************************************************/
> -
> -/* 4x4 32-bit integer matrix transpose */
> -#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \
> - movdqa x0, t2; \
> - punpckhdq x1, t2; \
> - punpckldq x1, x0; \
> - \
> - movdqa x2, t1; \
> - punpckldq x3, t1; \
> - punpckhdq x3, x2; \
> - \
> - movdqa x0, x1; \
> - punpckhqdq t1, x1; \
> - punpcklqdq t1, x0; \
> - \
> - movdqa t2, x3; \
> - punpckhqdq x2, x3; \
> - punpcklqdq x2, t2; \
> - movdqa t2, x2;
> -
> -/* fill xmm register with 32-bit value from memory */
> -#define PBROADCASTD(mem32, xreg) \
> - movd mem32, xreg; \
> - pshufd $0, xreg, xreg;
> -
> -/**********************************************************************
> - 4-way chacha20
> - **********************************************************************/
> -
> -#define ROTATE2(v1,v2,c,tmp1,tmp2) \
> - movdqa v1, tmp1; \
> - movdqa v2, tmp2; \
> - psrld $(32 - (c)), v1; \
> - pslld $(c), tmp1; \
> - paddb tmp1, v1; \
> - psrld $(32 - (c)), v2; \
> - pslld $(c), tmp2; \
> - paddb tmp2, v2;
> -
> -#define XOR(ds,s) \
> - pxor s, ds;
> -
> -#define PLUS(ds,s) \
> - paddd s, ds;
> -
> -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2) \
> - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
> - ROTATE2(d1, d2, 16, tmp1, tmp2); \
> - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
> - ROTATE2(b1, b2, 12, tmp1, tmp2); \
> - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
> - ROTATE2(d1, d2, 8, tmp1, tmp2); \
> - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
> - ROTATE2(b1, b2, 7, tmp1, tmp2);
> -
> - .section .text.sse2,"ax",@progbits
> -
> -chacha20_data:
> - .align 16
> -L(counter1):
> - .long 1,0,0,0
> -L(inc_counter):
> - .long 0,1,2,3
> -L(unsigned_cmp):
> - .long 0x80000000,0x80000000,0x80000000,0x80000000
> -
> - .hidden __chacha20_sse2_blocks4
> -ENTRY (__chacha20_sse2_blocks4)
> - /* input:
> - * %rdi: input
> - * %rsi: dst
> - * %rdx: src
> - * %rcx: nblks (multiple of 4)
> - */
> -
> - pushq %rbp;
> - cfi_adjust_cfa_offset(8);
> - cfi_rel_offset(rbp, 0)
> - movq %rsp, %rbp;
> - cfi_def_cfa_register(%rbp);
> -
> - subq $STACK_MAX, %rsp;
> - andq $~15, %rsp;
> -
> -L(loop4):
> - mov $20, ROUND;
> -
> - /* Construct counter vectors X12 and X13 */
> - movdqa L(inc_counter) rRIP, X0;
> - movdqa L(unsigned_cmp) rRIP, X2;
> - PBROADCASTD((12 * 4)(INPUT), X12);
> - PBROADCASTD((13 * 4)(INPUT), X13);
> - paddd X0, X12;
> - movdqa X12, X1;
> - pxor X2, X0;
> - pxor X2, X1;
> - pcmpgtd X1, X0;
> - psubd X0, X13;
> - movdqa X12, (STACK_VEC_X12)(%rsp);
> - movdqa X13, (STACK_VEC_X13)(%rsp);
> -
> - /* Load vectors */
> - PBROADCASTD((0 * 4)(INPUT), X0);
> - PBROADCASTD((1 * 4)(INPUT), X1);
> - PBROADCASTD((2 * 4)(INPUT), X2);
> - PBROADCASTD((3 * 4)(INPUT), X3);
> - PBROADCASTD((4 * 4)(INPUT), X4);
> - PBROADCASTD((5 * 4)(INPUT), X5);
> - PBROADCASTD((6 * 4)(INPUT), X6);
> - PBROADCASTD((7 * 4)(INPUT), X7);
> - PBROADCASTD((8 * 4)(INPUT), X8);
> - PBROADCASTD((9 * 4)(INPUT), X9);
> - PBROADCASTD((10 * 4)(INPUT), X10);
> - PBROADCASTD((11 * 4)(INPUT), X11);
> - PBROADCASTD((14 * 4)(INPUT), X14);
> - PBROADCASTD((15 * 4)(INPUT), X15);
> - movdqa X11, (STACK_TMP)(%rsp);
> - movdqa X15, (STACK_TMP1)(%rsp);
> -
> -L(round2_4):
> - QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X11,X15)
> - movdqa (STACK_TMP)(%rsp), X11;
> - movdqa (STACK_TMP1)(%rsp), X15;
> - movdqa X8, (STACK_TMP)(%rsp);
> - movdqa X9, (STACK_TMP1)(%rsp);
> - QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,X9)
> - QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,X9)
> - movdqa (STACK_TMP)(%rsp), X8;
> - movdqa (STACK_TMP1)(%rsp), X9;
> - movdqa X11, (STACK_TMP)(%rsp);
> - movdqa X15, (STACK_TMP1)(%rsp);
> - QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X11,X15)
> - sub $2, ROUND;
> - jnz L(round2_4);
> -
> - /* tmp := X15 */
> - movdqa (STACK_TMP)(%rsp), X11;
> - PBROADCASTD((0 * 4)(INPUT), X15);
> - PLUS(X0, X15);
> - PBROADCASTD((1 * 4)(INPUT), X15);
> - PLUS(X1, X15);
> - PBROADCASTD((2 * 4)(INPUT), X15);
> - PLUS(X2, X15);
> - PBROADCASTD((3 * 4)(INPUT), X15);
> - PLUS(X3, X15);
> - PBROADCASTD((4 * 4)(INPUT), X15);
> - PLUS(X4, X15);
> - PBROADCASTD((5 * 4)(INPUT), X15);
> - PLUS(X5, X15);
> - PBROADCASTD((6 * 4)(INPUT), X15);
> - PLUS(X6, X15);
> - PBROADCASTD((7 * 4)(INPUT), X15);
> - PLUS(X7, X15);
> - PBROADCASTD((8 * 4)(INPUT), X15);
> - PLUS(X8, X15);
> - PBROADCASTD((9 * 4)(INPUT), X15);
> - PLUS(X9, X15);
> - PBROADCASTD((10 * 4)(INPUT), X15);
> - PLUS(X10, X15);
> - PBROADCASTD((11 * 4)(INPUT), X15);
> - PLUS(X11, X15);
> - movdqa (STACK_VEC_X12)(%rsp), X15;
> - PLUS(X12, X15);
> - movdqa (STACK_VEC_X13)(%rsp), X15;
> - PLUS(X13, X15);
> - movdqa X13, (STACK_TMP)(%rsp);
> - PBROADCASTD((14 * 4)(INPUT), X15);
> - PLUS(X14, X15);
> - movdqa (STACK_TMP1)(%rsp), X15;
> - movdqa X14, (STACK_TMP1)(%rsp);
> - PBROADCASTD((15 * 4)(INPUT), X13);
> - PLUS(X15, X13);
> - movdqa X15, (STACK_TMP2)(%rsp);
> -
> - /* Update counter */
> - addq $4, (12 * 4)(INPUT);
> -
> - TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15);
> - movdqu X0, (64 * 0 + 16 * 0)(DST)
> - movdqu X1, (64 * 1 + 16 * 0)(DST)
> - movdqu X2, (64 * 2 + 16 * 0)(DST)
> - movdqu X3, (64 * 3 + 16 * 0)(DST)
> - TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2);
> - movdqa (STACK_TMP)(%rsp), X13;
> - movdqa (STACK_TMP1)(%rsp), X14;
> - movdqa (STACK_TMP2)(%rsp), X15;
> - movdqu X4, (64 * 0 + 16 * 1)(DST)
> - movdqu X5, (64 * 1 + 16 * 1)(DST)
> - movdqu X6, (64 * 2 + 16 * 1)(DST)
> - movdqu X7, (64 * 3 + 16 * 1)(DST)
> - TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2);
> - movdqu X8, (64 * 0 + 16 * 2)(DST)
> - movdqu X9, (64 * 1 + 16 * 2)(DST)
> - movdqu X10, (64 * 2 + 16 * 2)(DST)
> - movdqu X11, (64 * 3 + 16 * 2)(DST)
> - TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2);
> - movdqu X12, (64 * 0 + 16 * 3)(DST)
> - movdqu X13, (64 * 1 + 16 * 3)(DST)
> - movdqu X14, (64 * 2 + 16 * 3)(DST)
> - movdqu X15, (64 * 3 + 16 * 3)(DST)
> -
> - sub $4, NBLKS;
> - lea (4 * 64)(DST), DST;
> - lea (4 * 64)(SRC), SRC;
> - jnz L(loop4);
> -
> - /* eax zeroed by round loop. */
> - leave;
> - cfi_adjust_cfa_offset(-8)
> - cfi_def_cfa_register(%rsp);
> - ret_spec_stop;
> -END (__chacha20_sse2_blocks4)
> -
> -#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */
> diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
> deleted file mode 100644
> index 6f3784e392..0000000000
> --- a/sysdeps/x86_64/chacha20_arch.h
> +++ /dev/null
> @@ -1,55 +0,0 @@
> -/* Chacha20 implementation, used on arc4random.
> - Copyright (C) 2022 Free Software Foundation, Inc.
> - This file is part of the GNU C Library.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -#include <isa-level.h>
> -#include <ldsodefs.h>
> -#include <cpu-features.h>
> -#include <sys/param.h>
> -
> -unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst,
> - const uint8_t *src, size_t nblks)
> - attribute_hidden;
> -unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
> - const uint8_t *src, size_t nblks)
> - attribute_hidden;
> -
> -static inline void
> -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
> - size_t bytes)
> -{
> - _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0,
> - "CHACHA20_BUFSIZE not multiple of 4 or 8");
> - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
> - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
> -
> -#if MINIMUM_X86_ISA_LEVEL > 2
> - __chacha20_avx2_blocks8 (state, dst, src,
> - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -#else
> - const struct cpu_features* cpu_features = __get_cpu_features ();
> -
> - /* AVX2 version uses vzeroupper, so disable it if RTM is enabled. */
> - if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> - && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !))
> - __chacha20_avx2_blocks8 (state, dst, src,
> - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> - else
> - __chacha20_sse2_blocks4 (state, dst, src,
> - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -#endif
> -}

2022-07-26 11:36:52

by Adhemerval Zanella

Subject: Re: [PATCH v2] arc4random: simplify design for better safety



On 26/07/22 08:20, Jason A. Donenfeld wrote:
> Hey Florian,
>
> On Tue, Jul 26, 2022 at 01:12:28PM +0200, Florian Weimer wrote:
>>>> What happens if /dev/random is actually /dev/urandom? Will the poll
>>>> call fail?
>>>
>>> Yes. I'm unsure if you're asking this because it'd be a nice
>>> simplification to only have to open one fd, or because you're worried
>>> about confusion. I don't think the confusion problem is one we should
>>> take too seriously, but if you're concerned, we can always fstat and
>>> check the maj/min. Seems a bit much, though.
>>
>> Turning /dev/random into /dev/urandom (e.g. with a symbolic link) used
>> to be the only way to get some applications working because they tried
>> to read from /dev/random at a higher rate than the system was estimating
>> entropy coming in. We may have to do something differently here if the
>> failing poll causes too much breakage.
>
> The "backup plan" would be to sleep-loop-read /proc/sys/kernel/random/entropy_avail
> until it passes a certain threshold one time. This might also work on even older
> kernels than the poll() trick. But that's pretty darn ugly, so it's not
> obvious to me where the cut-off in frustration is, when we throw our
> hands up and decide the ugliness is worth it compared to whatever
> problems we happen to be facing at the time with the poll() technique.
> But at least there is an alternative, should we need it.

I think the poll trick is way better, although I also think it is very Linux
specific. Should we move it to Linux sysdeps?

Reading /proc/sys/kernel/random/entropy_avail would require opening another file
descriptor, which I think we should avoid for arc4random if possible.

2022-07-26 11:57:43

by Jason A. Donenfeld

Subject: Re: [PATCH v2] arc4random: simplify design for better safety

Hi Adhemerval,

Thanks for your review.

On Tue, Jul 26, 2022 at 08:33:23AM -0300, Adhemerval Zanella Netto wrote:
> There are some missing pieces, like sysdeps/unix/sysv/linux/tls-internal.h comment,
> sysdeps/generic/tls-internal-struct.h generic piece (it is used on hurd build),
> maybe also change the NEWS to state this is not a CSPRNG, and we definitely need
> to update the manual. Some comments below.

I think Eric already pointed those out, and they're fixed in v3 now.
PTAL.

> > + static bool have_getrandom = true, seen_initialized = false;
> > + int fd;
>
> I think it should be reasonable to assume that the getrandom syscall will always be
> supported, and using arc4random in an environment with filtered getrandom does
> not make much sense. We are trying to avoid adding these static syscall checks
> where possible,

I don't know glibc's requirements for kernels, though I do know that
it'd be nice to not have to write this fallback code in every program I
write and just use libc's thing. So in that sense, having the fallback
to /dev/urandom makes arc4random_buf a lot more useful. But with that
said, yea, maybe we shouldn't care about old kernels? getrandom is now
quite old and the stable kernels on kernel.org all have it.

From my perspective, I don't have a strongly developed opinion on what
makes sense for glibc. If Florian agrees with you, I'll send a v+1 with
the fallback code removed. If it's contentious, maybe the fallback code
should stay in and we can slate it for removal on another day, when the
minimum glibc kernel version gets raised or something like that.

> also plain load/store to set the static have_getrandom
> is strictly a race condition, although it should not really matter (we use
> relaxed load/store in such optimizations; check
> sysdeps/unix/sysv/linux/mips/mips64/getdents64.c).

I was aware of the race but figured it didn't matter, since two racing
threads will both set it to the same result eventually. But I didn't
know about the convention of using those relaxed wrapper functions.
Thanks for the tip. I'll do that for v4.
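
As a rough illustration of that convention, here is a sketch using C11 <stdatomic.h> as a portable stand-in for glibc's internal atomic_load_relaxed and atomic_store_relaxed wrappers. The variable and function names are hypothetical. The flag only ever goes from true to false, so relaxed ordering is enough: racing threads all store the same value.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One-way flag: starts out true and may only ever be cleared.  */
static atomic_bool have_getrandom_flag = true;

/* Relaxed load: no ordering with other memory is needed, the flag
   is only a hint that avoids retrying a syscall known to be absent.  */
static bool
getrandom_believed_available (void)
{
  return atomic_load_explicit (&have_getrandom_flag, memory_order_relaxed);
}

/* Relaxed store: every racing thread writes the same value.  */
static void
note_getrandom_missing (void)
{
  atomic_store_explicit (&have_getrandom_flag, false, memory_order_relaxed);
}
```

Compared with a plain bool, this removes the formal data race without adding any fences on common architectures.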

> Also, does it make sense to fallback if we build for a kernel that should
> always support getrandom?

I guess only if syscall filtering is a concern. But if not, then maybe
yea? We could do this in a follow-up commit, or I could do this in v4.
Would `#if __LINUX_KERNEL_VERSION >` be the right mechanism to use here?
If so, I think the way I'd implement that would be:

diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
index 978bf9287f..a33d9ff2c5 100644
--- a/stdlib/arc4random.c
+++ b/stdlib/arc4random.c
@@ -44,8 +44,10 @@ __arc4random_buf (void *p, size_t n)
{
ssize_t l;

+#if __LINUX_KERNEL_VERSION < something
if (!atomic_load_relaxed (&have_getrandom))
break;
+#endif

l = __getrandom_nocancel (p, n, 0);
if (l > 0)
@@ -60,11 +62,13 @@ __arc4random_buf (void *p, size_t n)
arc4random_getrandom_failure (); /* Weird, should never happen. */
else if (l == -EINTR)
continue; /* Interrupted by a signal; keep going. */
+#if __LINUX_KERNEL_VERSION < something
else if (l == -ENOSYS)
{
atomic_store_relaxed (&have_getrandom, false);
break; /* No syscall, so fallback to /dev/urandom. */
}
+#endif
arc4random_getrandom_failure (); /* Unknown error, should never happen. */
}

And then arc4random_getrandom_failure() being a noreturn function would
make gcc optimize out the rest.

Does that seem like a good approach?

> > + l = __getrandom_nocancel (p, n, 0);
>
> Do we need to worry about a potentially uncancellable blocking call here? I guess
> using GRND_NONBLOCK does not really help.

No, generally not. Also, keep in mind that getrandom(0) will trigger
jitter entropy if the kernel isn't already initialized.
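
To make the difference concrete: with flags of 0, getrandom() blocks until the kernel RNG is initialized, while GRND_NONBLOCK makes it fail with EAGAIN instead. A minimal sketch of the non-blocking variant, using the public getrandom() wrapper from <sys/random.h> and a made-up helper name:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper.  Returns 1 if BUF was completely filled, 0 if
   the kernel RNG is not yet initialized (the call would have blocked),
   and -1 on any other outcome, including a short read.  */
static int
try_getrandom_nonblock (void *buf, size_t n)
{
  ssize_t l;
  do
    l = getrandom (buf, n, GRND_NONBLOCK);
  while (l < 0 && errno == EINTR);

  if (l < 0)
    return errno == EAGAIN ? 0 : -1;
  return (size_t) l == n ? 1 : -1;
}
```

A return of 0 only tells the caller that it would have to wait; it still has to decide what to do then, which is why GRND_NONBLOCK alone does not solve much here.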

>
> > + if (l > 0)
> > + {
> > + if ((size_t) l == n)
>
> Do we need the cast here?

Generally it's frowned upon to have implicit signed conversion, right? l
is signed while n is unsigned.
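
A small sketch of why the order of the checks matters (the helper name is made up for illustration). Once l > 0 has been established, converting it to size_t cannot change its value; without the sign check, an error value such as -1 would convert to SIZE_MAX.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Returns true only if the (possibly negative) syscall result L says
   that exactly N bytes were produced.  Checking the sign first means
   the cast never turns a negative error value into a huge size_t.  */
static bool
read_was_complete (ssize_t l, size_t n)
{
  return l > 0 && (size_t) l == n;
}
```

So the explicit cast is about making the conversion visible to the reader and to -Wsign-compare, not about changing behaviour.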

>
> > + return; /* Done reading, success. */
>
> Minor style issue: use double space before period.

I was really confused by this, and then opened up some other files and
saw you meant *after* period. :) Will do for v4.

> As Florian said we will need a non cancellable poll here. Since you are setting
> the timeout as undefined, I think it would be simple to just add a non cancellable
> wrapper as:
>
> int __ppoll_noncancel_notimeout (struct pollfd *fds, nfds_t nfds)
> {
> #ifndef __NR_ppoll_time64
> # define __NR_ppoll_time64 __NR_ppoll
> #endif
> return INLINE_SYSCALL_CALL (__NR_ppoll_time64, fds, nfds, NULL, NULL, 0);
> }
>
> So we don't need to handle the timeout for 64-bit time_t wrappers.

Oh that sounds like a good solution to the time64 situation. I'll do
that for v4... BUT, I already implemented possibly the wrong solution
for v3. Could you take a look at what I did there and confirm that it's
wrong? If so, then I'll do exactly what you suggested here.

Thanks again for the review,
Jason

2022-07-26 12:25:08

by Jason A. Donenfeld

Subject: Re: [PATCH v2] arc4random: simplify design for better safety

Hey again,

On Tue, Jul 26, 2022 at 01:54:23PM +0200, Jason A. Donenfeld wrote:
> > As Florian said we will need a non cancellable poll here. Since you are setting
> > the timeout as undefined, I think it would be simple to just add a non cancellable
> > wrapper as:
> >
> > int __ppoll_noncancel_notimeout (struct pollfd *fds, nfds_t nfds)
> > {
> > #ifndef __NR_ppoll_time64
> > # define __NR_ppoll_time64 __NR_ppoll
> > #endif
> > return INLINE_SYSCALL_CALL (__NR_ppoll_time64, fds, nfds, NULL, NULL, 0);
> > }
> >
> > So we don't need to handle the timeout for 64-bit time_t wrappers.
>
> Oh that sounds like a good solution to the time64 situation. I'll do
> that for v4... BUT, I already implemented possibly the wrong solution
> for v3. Could you take a look at what I did there and confirm that it's
> wrong? If so, then I'll do exactly what you suggested here.

Actually, forget my v3. What you're suggesting is also better because
it's ppoll, not poll, as poll isn't on all platforms. So I'll do things
exactly as you've described for v4.
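
Outside of glibc, the same idea can be sketched with the public syscall() function standing in for the internal INLINE_SYSCALL_CALL macro. PPOLL_NR and ppoll_notimeout are names made up for this example. With a NULL timeout no timespec is passed at all, so the 32-bit versus 64-bit time_t layout never comes into play.

```c
#include <poll.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Prefer the 64-bit time_t syscall number where the headers define
   one; otherwise use plain ppoll.  */
#ifdef SYS_ppoll_time64
# define PPOLL_NR SYS_ppoll_time64
#else
# define PPOLL_NR SYS_ppoll
#endif

/* Hypothetical wrapper: wait on FDS with no timeout and no change to
   the signal mask.  Returns the number of ready descriptors, or -1
   with errno set.  */
static int
ppoll_notimeout (struct pollfd *fds, nfds_t nfds)
{
  return (int) syscall (PPOLL_NR, fds, nfds, NULL, NULL, (size_t) 0);
}
```

For example, after writing one byte into a pipe, calling ppoll_notimeout on the read end with POLLIN requested returns 1.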

Jason

2022-07-26 12:33:04

by Jason A. Donenfeld

Subject: Re: [PATCH v2] arc4random: simplify design for better safety

On Tue, Jul 26, 2022 at 01:54:23PM +0200, Jason A. Donenfeld wrote:
> > Also, does it make sense to fallback if we build for a kernel that should
> > always support getrandom?
>
> I guess only if syscall filtering is a concern. But if not, then maybe
> yea? We could do this in a follow-up commit, or I could do this in v4.
> Would `#if __LINUX_KERNEL_VERSION >` be the right mechanism to use here?
> If so, I think the way I'd implement that would be:
>
> [...]
>
> And then arc4random_getrandom_failure() being a noreturn function would
> make gcc optimize out the rest.
>
> Does that seem like a good approach?

It actually winds up looking a bit more like the below. Let me know if
you want that in v4.

diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
index c0f132ea9b..8fcf41e7de 100644
--- a/stdlib/arc4random.c
+++ b/stdlib/arc4random.c
@@ -43,7 +43,7 @@ __arc4random_buf (void *p, size_t n)
{
ssize_t l;

- if (!atomic_load_relaxed (&have_getrandom))
+ if (!__ASSUME_GETRANDOM && !atomic_load_relaxed (&have_getrandom))
break;

l = __getrandom_nocancel (p, n, 0);
@@ -59,7 +59,7 @@ __arc4random_buf (void *p, size_t n)
arc4random_getrandom_failure (); /* Weird, should never happen. */
else if (l == -EINTR)
continue; /* Interrupted by a signal; keep going. */
- else if (l == -ENOSYS)
+ else if (!__ASSUME_GETRANDOM && l == -ENOSYS)
{
atomic_store_relaxed (&have_getrandom, false);
break; /* No syscall, so fallback to /dev/urandom. */
diff --git a/sysdeps/unix/sysv/linux/kernel-features.h b/sysdeps/unix/sysv/linux/kernel-features.h
index 74adc3956b..75d5f953d4 100644
--- a/sysdeps/unix/sysv/linux/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/kernel-features.h
@@ -236,4 +236,11 @@
# define __ASSUME_FUTEX_LOCK_PI2 0
#endif

+/* The getrandom() syscall was added in 3.17. */
+#if __LINUX_KERNEL_VERSION >= 0x031100
+# define __ASSUME_GETRANDOM 1
+#else
+# define __ASSUME_GETRANDOM 0
+#endif
+
#endif /* kernel-features.h */
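
In case the constant looks odd: 0x031100 is 3.17.0 packed one byte per component, since 17 is 0x11. A tiny sketch of that encoding, with a helper name invented for the example:

```c
#include <stdint.h>

/* Hypothetical helper mirroring how the version constant is packed:
   one byte each for major, minor and patch level.  */
static uint32_t
kernel_version_code (unsigned major, unsigned minor, unsigned patch)
{
  return ((uint32_t) major << 16) | ((uint32_t) minor << 8) | patch;
}
```

Because __ASSUME_GETRANDOM is a compile-time constant, the `!__ASSUME_GETRANDOM && ...` conditions fold to false when building for 3.17 or later, and the compiler can drop the fallback branches.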

2022-07-26 12:36:29

by Adhemerval Zanella

Subject: Re: [PATCH v2] arc4random: simplify design for better safety



On 26/07/22 08:54, Jason A. Donenfeld wrote:
> Hi Adhemerval,
>
> Thanks for your review.
>
> On Tue, Jul 26, 2022 at 08:33:23AM -0300, Adhemerval Zanella Netto wrote:
>> There are some missing pieces, like sysdeps/unix/sysv/linux/tls-internal.h comment,
>> sysdeps/generic/tls-internal-struct.h generic piece (it is used on hurd build),
>> maybe also change the NEWS to state this is not a CSPRNG, and we definitely need
>> to update the manual. Some comments below.
>
> I think Eric already pointed those out, and they're fixed in v3 now.
> PTAL.
>
>>> + static bool have_getrandom = true, seen_initialized = false;
>>> + int fd;
>>
>> I think it should be reasonable to assume that the getrandom syscall will always be
>> supported, and using arc4random in an environment with filtered getrandom does
>> not make much sense. We are trying to avoid adding these static syscall checks
>> where possible,
>
> I don't know glibc's requirements for kernels, though I do know that
> it'd be nice to not have to write this fallback code in every program I
> write and just use libc's thing. So in that sense, having the fallback
> to /dev/urandom makes arc4random_buf a lot more useful. But with that
> said, yea, maybe we shouldn't care about old kernels? getrandom is now
> quite old and the stable kernels on kernel.org all have it.

We do not enforce the kernel version anymore, although we still support
--enable-kernel=x.y, which changes which syscalls glibc internally assumes to
be available (so there is no need for a fallback in that case).

So the question is whether we need the fallback code for --enable-kernel=3.17.
If the kernel is returning ENOSYS in this case (and assuming you are running on a
kernel newer than 3.17) it means some syscall filtering, and I am not sure
we should need to actually handle it. The main idea of adding this minor
optimization is that once we increase the minimum supported kernel we can
clean this code up.

>
> From my perspective, I don't have a strongly developed opinion on what
> makes sense for glibc. If Florian agrees with you, I'll send a v+1 with
> the fallback code removed. If it's contentious, maybe the fallback code
> should stay in and we can slate it for removal on another day, when the
> minimum glibc kernel version gets raised or something like that.

I think the fallback code makes sense since the minimum kernel we still
support is 3.2, although I am not sure how getrandom and/or /dev/urandom
will behave on such old kernels.

>
>> also plain load/store to set the static have_getrandom
>> is strictly a race condition, although it should not really matter (we use
>> relaxed load/store in such optimizations; check
>> sysdeps/unix/sysv/linux/mips/mips64/getdents64.c).
>
> I was aware of the race but figured it didn't matter, since two racing
> threads will both set it to the same result eventually. But I didn't
> know about the convention of using those relaxed wrapper functions.
> Thanks for the tip. I'll do that for v4.
>
>> Also, does it make sense to fallback if we build for a kernel that should
>> always support getrandom?
>
> I guess only if syscall filtering is a concern. But if not, then maybe
> yea? We could do this in a follow-up commit, or I could do this in v4.
> Would `#if __LINUX_KERNEL_VERSION >` be the right mechanism to use here?
> If so, I think the way I'd implement that would be:
>
> diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
> index 978bf9287f..a33d9ff2c5 100644
> --- a/stdlib/arc4random.c
> +++ b/stdlib/arc4random.c
> @@ -44,8 +44,10 @@ __arc4random_buf (void *p, size_t n)
> {
> ssize_t l;
>
> +#if __LINUX_KERNEL_VERSION < something
> if (!atomic_load_relaxed (&have_getrandom))
> break;
> +#endif
>
> l = __getrandom_nocancel (p, n, 0);
> if (l > 0)
> @@ -60,11 +62,13 @@ __arc4random_buf (void *p, size_t n)
> arc4random_getrandom_failure (); /* Weird, should never happen. */
> else if (l == -EINTR)
> continue; /* Interrupted by a signal; keep going. */
> +#if __LINUX_KERNEL_VERSION < something
> else if (l == -ENOSYS)
> {
> atomic_store_relaxed (&have_getrandom, false);
> break; /* No syscall, so fallback to /dev/urandom. */
> }
> +#endif
> arc4random_getrandom_failure (); /* Unknown error, should never happen. */
> }
>
> And then arc4random_getrandom_failure() being a noreturn function would
> make gcc optimize out the rest.
>
> Does that seem like a good approach?

I think so, although since __LINUX_KERNEL_VERSION is Linux-only, that should
be moved to sysdeps/unix/sysv/linux.

Usually we do this as a wrapper (static inline or hidden symbol), with the generic
implementation in sysdeps/generic or include, and Linux redefining it in its own
folder.

We also use __ASSUME macros (check sysdeps/unix/sysv/linux/kernel-features.h);
it should be something like __ASSUME_GETRANDOM (we did not have a use for it
because we do not want a fallback for the getrandom implementation).

So I would add something like:

sysdeps/unix/sysv/linux/arc4random_impl.h


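/* Returns 1 if the buffer was filled, 0 if the caller should fall back,
   and -1 on failure.  */
static inline int getentropy_arch (void *p, size_t n)
{
  for (;;)
    {
      ssize_t l = __getrandom_nocancel (p, n, 0);
      if (l > 0)
        {
          if ((size_t) l == n)
            return 1;
          /* Short read; fetch the remainder.  */
          p = (uint8_t *) p + l;
          n -= l;
          continue;
        }
      else if (l == 0)
        return -1;
      else if (l == -EINTR)
        continue;

#if !__ASSUME_GETRANDOM
      if (l == -ENOSYS)
        return 0;
#endif
      return -1;
    }
}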

And on stdlib/arc4random.c:

void
__arc4random_buf (void *p, size_t n)
{
  if (n == 0)
    return;

  int s = getentropy_arch (p, n);
  if (s > 0)
    return;
  if (s < 0)
    arc4random_getrandom_failure ();

  /* Fallback.  */
}

>
>>> + l = __getrandom_nocancel (p, n, 0);
>>
>> Do we need to worry about a potentially uncancellable blocking call here? I guess
>> using GRND_NONBLOCK does not really help.
>
> No, generally not. Also, keep in mind that getrandom(0) will trigger
> jitter entropy if the kernel isn't already initialized.

Maybe add a comment stating it.

>
>>
>>> + if (l > 0)
>>> + {
>>> + if ((size_t) l == n)
>>
>> Do we need the cast here?
>
> Generally it's frowned upon to have implicit signed conversion, right? l
> is signed while n is unsigned.

Good question, I don't think we enforce it in fact.

>
>>
>>> + return; /* Done reading, success. */
>>
>> Minor style issue: use double space before period.
>
> I was really confused by this, and then opened up some other files and
> saw you meant *after* period. :) Will do for v4.

Yeah, I meant after indeed.

>
>> As Florian said we will need a non cancellable poll here. Since you are setting
>> the timeout as undefined, I think it would be simple to just add a non cancellable
>> wrapper as:
>>
>> int __ppoll_noncancel_notimeout (struct pollfd *fds, nfds_t nfds)
>> {
>> #ifndef __NR_ppoll_time64
>> # define __NR_ppoll_time64 __NR_ppoll
>> #endif
>> return INLINE_SYSCALL_CALL (__NR_ppoll_time64, fds, nfds, NULL, NULL, 0);
>> }
>>
>> So we don't need to handle the timeout for 64-bit time_t wrappers.
>
> Oh that sounds like a good solution to the time64 situation. I'll do
> that for v4... BUT, I already implemented possibly the wrong solution
> for v3. Could you take a look at what I did there and confirm that it's
> wrong? If so, then I'll do exactly what you suggested here.
>
> Thanks again for the review,
> Jason

2022-07-26 12:57:16

by Jason A. Donenfeld

Subject: Re: [PATCH v2] arc4random: simplify design for better safety

Hi Adhemerval,

On Tue, Jul 26, 2022 at 09:34:57AM -0300, Adhemerval Zanella Netto wrote:
> kernel newer than 3.17) it means some syscall filtering, and I am not sure
> we should need to actually handle it.

One thing to keep in mind is that people who use CUSE-based /dev/urandom
implementations might not like this, as it means they'd also have to
intercept getrandom() rather than just ENOSYS'ing it. But maybe that's
fine. I don't know of anyone actually doing this in the real world at
the moment.

Jason

2022-07-26 13:13:39

by Adhemerval Zanella

Subject: Re: [PATCH v2] arc4random: simplify design for better safety



On 26/07/22 09:47, Jason A. Donenfeld wrote:
> Hi Adhemerval,
>
> On Tue, Jul 26, 2022 at 09:34:57AM -0300, Adhemerval Zanella Netto wrote:
>> kernel newer than 3.17) it means some syscall filtering, and I am not sure
>> we should need to actually handle it.
>
> One thing to keep in mind is that people who use CUSE-based /dev/urandom
> implementations might not like this, as it means they'd also have to
> intercept getrandom() rather than just ENOSYS'ing it. But maybe that's
> fine. I don't know of anyone actually doing this in the real world at
> the moment.
>

I think it is a fair assumption that if you are trying to implement your own
character device in userland, you should know the implications for the
environment. From glibc's standpoint, and I would say for this whole thread,
we should assume that getrandom is the de facto API for entropy.

2022-07-26 14:39:46

by Yann Droneaud

Subject: Overwriting AT_RANDOM after use (was Re: arc4random - are you sure we want these?)

Hi,

Le 25/07/2022 à 14:39, Florian Weimer a écrit :

> * Jason A. Donenfeld via Libc-alpha:

>> (After all, I didn't see any wild-n-crazy fallback
>> to AT_RANDOM like what systemd does with random-util.c:
>> https://github.com/systemd/systemd/blob/main/src/basic/random-util.c )
> I had some patches with AT_RANDOM fallback, including overwriting
> AT_RANDOM with output from the seeded PRNG. It's certainly messy. I
> probably didn't bother to post these patches given how bizarre the whole
> thing was.


It's not that bizarre as I have some patches too: I tried to harden the
way stack_chk_guard and pointer_chk_guard were computed.
Those values are currently generated from slices of AT_RANDOM by the loader.


But I've seen programs in the wild reusing AT_RANDOM, thus possibly
leaking the stack_chk_guard and pointer_chk_guard values.


Having a proper (CS)PRNG in the loader, initialized from AT_RANDOM, that
overwrites AT_RANDOM (with fresh entropy if possible) after
initialization, would protect programs that abuse AT_RANDOM's purpose.
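
To make the reuse concrete, here is a minimal sketch of how a program can
read the AT_RANDOM bytes on Linux with glibc's getauxval(). The helper
name copy_at_random is hypothetical. The point is only that every caller
in the process sees the same 16 bytes, which are the bytes the loader
slices the guard values from.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Hypothetical helper: copy the 16 bytes the kernel placed at AT_RANDOM
   into out.  Returns 0 on success, -1 if the auxiliary vector has no
   AT_RANDOM entry.  */
static int
copy_at_random (uint8_t out[16])
{
  const void *p = (const void *) (uintptr_t) getauxval (AT_RANDOM);
  if (p == NULL)
    return -1;
  memcpy (out, p, 16);
  return 0;
}
```

Calling this twice yields identical bytes, so a program that hands them
out as "random" data may be disclosing material related to the guards.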


Regards.


--

Yann Droneaud

OPTEYA


2022-07-26 14:43:23

by Yann Droneaud

[permalink] [raw]
Subject: Re: arc4random - are you sure we want these?

Hi,

On 25/07/2022 at 14:39, Florian Weimer wrote:
> * Jason A. Donenfeld via Libc-alpha:
>>> The performance numbers suggest that we benefit from buffering in user
>>> space.
>> The question is whether it's safe and advisable to buffer this way in
>> userspace. Does userspace have the right information now of when to
>> discard the buffer and get a new one? I suspect it does not.
> Not completely, no, but we can cover many cases. I do not currently see
> a way around that if we want to promote arc4random_uniform(limit) as a
> replacement for random() % limit.

+1

That's the reason I reviewed the implementation positively: for me,
arc4random is not about generating secret keys but small integers.
I want to be able to divert developers from
    srand(time(NULL));
    identifier = rand() % 33;
to
    identifier = arc4random_uniform(33);

Safe, fast, and reasonably secure.
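
For readers wondering why `rand() % 33` is discouraged beyond the weak
seed: 2^32 is not a multiple of 33, so a plain modulo favours the low
residues slightly. A minimal sketch of the usual rejection-sampling fix
follows. The names uniform_below, u32_source and counter_source are
hypothetical, and this is an illustration of the technique, not glibc's
actual arc4random_uniform code.

```c
#include <stdint.h>

/* Hypothetical generator type: returns 32 uniformly distributed bits.  */
typedef uint32_t (*u32_source) (void *ctx);

/* Return a value in [0, limit) without modulo bias by rejecting the
   lowest (2^32 mod limit) outputs of the generator.  */
static uint32_t
uniform_below (uint32_t limit, u32_source next, void *ctx)
{
  if (limit < 2)
    return 0;
  /* (2^32 - limit) mod limit equals 2^32 mod limit in 32-bit math.  */
  uint32_t threshold = (uint32_t) (-limit) % limit;
  for (;;)
    {
      uint32_t r = next (ctx);
      if (r >= threshold)
        return r % limit;
    }
}

/* Deterministic stand-in generator for illustration only: returns
   0, 1, 2, ... from a counter passed as ctx.  */
static uint32_t
counter_source (void *ctx)
{
  uint32_t *c = ctx;
  return (*c)++;
}
```

With limit 33 the threshold is 4, so on average fewer than one draw in a
billion is rejected; the cost over a plain modulo is negligible.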


Regards.


--
Yann Droneaud
OPTEYA


2022-07-26 16:52:12

by Mark Harris

[permalink] [raw]
Subject: Re: [PATCH v2] arc4random: simplify design for better safety

Jason A. Donenfeld wrote:
> On Mon, Jul 25, 2022 at 06:10:06PM -0700, Mark Harris wrote:
> > Jason A. Donenfeld wrote:
> > > + l = __getrandom_nocancel (p, n, 0);
> > > + if (l > 0)
> > > + {
> > > + if ((size_t) l == n)
> > > + return; /* Done reading, success. */
> > > + p = (uint8_t *) p + l;
> > > + n -= l;
> > > + continue; /* Interrupted by a signal; keep going. */
> > > + }
> > > + else if (l == 0)
> > > + arc4random_getrandom_failure (); /* Weird, should never happen. */
> > > + else if (errno == ENOSYS)
> > > + {
> > > + have_getrandom = false;
> > > + break; /* No syscall, so fallback to /dev/urandom. */
> > > + }
> > > + arc4random_getrandom_failure (); /* Unknown error, should never happen. */
> >
> > Isn't EINTR also possible? Aborting in that case does not seem reasonable.
>
> Not in current kernels, where it always returns at least PAGE_SIZE bytes
> before checking for pending signals. In older kernels, if there was a
> signal pending at the top, it would do no work and return -ERESTARTSYS,
> which I believe should then get restarted by glibc's syscaller? I might
> be wrong about how restarts work though, so if you know better, please
> let me know. TEMP_FAILURE_RETRY relies on errno, so that's not what we
> want. I guess I can just add a case for it.
>
> > Also the __getrandom_nocancel function does not set errno on Linux; it
> > just returns INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags).
> > So unless that is changed, it doesn't look like this ENOSYS check will
> > detect old Linux kernels.
>
> Thanks. It looks like INTERNAL_SYSCALL_CALL just returns the errno as-is
> as a return value, right? I'll adjust the code to account for that.

Yes INTERNAL_SYSCALL_CALL just returns the negated errno value that it
gets from the Linux kernel, but only on Linux does
__getrandom_nocancel use that. The Hurd and generic implementations
set errno on error. Previously the only call to this function did not
care about the specific error value so it didn't matter. Since you
are now using the error value in generic code, __getrandom_nocancel
should be changed on Linux to set errno like most other _nocancel
calls, and then it should go back to checking errno here.
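
As a rough illustration of the errno-based shape being suggested, here
is a sketch of such a loop written against the public getrandom()
wrapper from <sys/random.h> rather than the internal
__getrandom_nocancel. The function name fill_random and the choice of
EIO for a zero-length return are assumptions made for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Fill buf with n random bytes.  Returns 0 on success, or -1 with errno
   set (for example ENOSYS on kernels without the syscall) so that the
   caller can choose a fallback.  */
static int
fill_random (void *buf, size_t n)
{
  uint8_t *p = buf;
  while (n > 0)
    {
      ssize_t l = getrandom (p, n, 0);
      if (l > 0)
        {
          p += l;
          n -= (size_t) l;
          continue;             /* Short read; keep going.  */
        }
      if (l < 0 && errno == EINTR)
        continue;               /* Interrupted before any bytes; retry.  */
      if (l == 0)
        errno = EIO;            /* Should not happen; report an error.  */
      return -1;
    }
  return 0;
}
```

Because the wrapper sets errno on every platform, the same loop works in
generic code, which is the point of the suggestion above.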

And as Adhemerval mentioned, you only added a Linux implementation of
__ppoll_infinity_nocancel, but are calling it from generic code.

Also, by the way your patches cc'd directly to me get quarantined
because DKIM signature verification failed. The non-patch messages
pass DKIM and are fine.



- Mark

2022-07-26 18:53:02

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH v2] arc4random: simplify design for better safety

Hi Mark,

On Tue, Jul 26, 2022 at 09:51:03AM -0700, Mark Harris wrote:
> > Thanks. It looks like INTERNAL_SYSCALL_CALL just returns the errno as-is
> > as a return value, right? I'll adjust the code to account for that.
>
> Yes INTERNAL_SYSCALL_CALL just returns the negated errno value that it
> gets from the Linux kernel, but only on Linux does
> __getrandom_nocancel use that. The Hurd and generic implementations
> set errno on error. Previously the only call to this function did not
> care about the specific error value so it didn't matter. Since you
> are now using the error value in generic code, __getrandom_nocancel
> should be changed on Linux to set errno like most other _nocancel
> calls, and then it should go back to checking errno here.
>
> And as Adhemerval mentioned, you only added a Linux implementation of
> __ppoll_infinity_nocancel, but are calling it from generic code.

Okay, I'll switch this to use INLINE_SYSCALL_CALL, so that it sets
errno, and then will use the normal TEMP_FAILURE_RETRY macro for EINTR.
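
The difference between the two conventions can be sketched in a few
lines. The helper below is hypothetical and is not glibc's macro; it
only shows the translation from a raw Linux syscall result, where
errors come back as values in the range -4095 to -1, to the usual
"return -1 and set errno" convention.

```c
#include <errno.h>

/* Hypothetical helper: translate a raw kernel return value into the
   libc convention.  Values in [-4095, -1] are negated errno codes.  */
static long
raw_to_errno_convention (long raw)
{
  if (raw < 0 && raw >= -4095)
    {
      errno = (int) -raw;
      return -1;
    }
  return raw;
}
```

Code that tests `errno == ENOSYS` only works after this translation;
code that receives the raw value has to compare against -ENOSYS instead.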

> Also, by the way your patches cc'd directly to me get quarantined
> because DKIM signature verification failed. The non-patch messages
> pass DKIM and are fine.

That sure is odd. The emails are all going through the MTA. rspamd bug?
OpenSMTPD bug? Hmm...

Jason

2022-07-26 19:33:39

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH v2] arc4random: simplify design for better safety

On Tue, Jul 26, 2022 at 08:42:51PM +0200, Jason A. Donenfeld wrote:
> Hi Mark,
>
> On Tue, Jul 26, 2022 at 09:51:03AM -0700, Mark Harris wrote:
> > > Thanks. It looks like INTERNAL_SYSCALL_CALL just returns the errno as-is
> > > as a return value, right? I'll adjust the code to account for that.
> >
> > Yes INTERNAL_SYSCALL_CALL just returns the negated errno value that it
> > gets from the Linux kernel, but only on Linux does
> > __getrandom_nocancel use that. The Hurd and generic implementations
> > set errno on error. Previously the only call to this function did not
> > care about the specific error value so it didn't matter. Since you
> > are now using the error value in generic code, __getrandom_nocancel
> > should be changed on Linux to set errno like most other _nocancel
> > calls, and then it should go back to checking errno here.
> >
> > And as Adhemerval mentioned, you only added a Linux implementation of
> > __ppoll_infinity_nocancel, but are calling it from generic code.
>
> Okay, I'll switch this to use INLINE_SYSCALL_CALL, so that it sets
> errno, and then will use the normal TEMP_FAILURE_RETRY macro for EINTR.
>
> > Also, by the way your patches cc'd directly to me get quarantined
> > because DKIM signature verification failed. The non-patch messages
> > pass DKIM and are fine.
>
> That sure is odd. The emails are all going through the MTA. rspamd bug?
> OpenSMTPD bug? Hmm...

It's because LICENSE has a ^L in it, which I guess doesn't go over well
with OpenSMTPD or rspamd or kernel.org's smtp server or some combination
thereof...

I just posted v5, by the way, in case it's in your spam folder.

Jason

2022-07-27 12:56:24

by Florian Weimer

[permalink] [raw]
Subject: Re: arc4random - are you sure we want these?

* Theodore Ts'o:

> But even if you didn't take the latest kernels, I think you will find
> that if you actually benchmark how many queries per second a real-life
> secure web server or VPN gateway, even the original 5.15.0 /dev/random
> driver was plenty fast enough for real world cryptographic use cases.

The idea is that arc4random() is suitable in pretty much all places
that have historically used random() (outside of deterministic
simulations). Straight calls to getrandom are much, much slower than
random(), and it's not even the system call overhead.

Thanks,
Florian

2022-07-27 20:32:46

by Theodore Ts'o

[permalink] [raw]
Subject: Re: arc4random - are you sure we want these?

On Wed, Jul 27, 2022 at 02:49:57PM +0200, Florian Weimer wrote:
> * Theodore Ts'o:
>
> > But even if you didn't take the latest kernels, I think you will find
> > that if you actually benchmark how many queries per second a real-life
> > secure web server or VPN gateway, even the original 5.15.0 /dev/random
> > driver was plenty fast enough for real world cryptographic use cases.
>
> The idea is to that arc4random() is suitable in pretty much all places
> that have historically used random() (outside of deterministic
> simulations). Straight calls to getrandom are much, much slower than
> random(), and it's not even the system call overhead.

What are those places? And what are their performance and security
requirements? I've heard some people claim that arc4random() is
supposed to provide strong security guarantees. I've heard others
claim that it doesn't, or at least glibc was planning on disclaiming
security guarantees. So there seems to be a lack of clarity about
the security requirements.

What about the performance requirements? Designing an interface where
the requirement is "as fast as possible" is often not a great pathway
to success, because the reality is that engineering is always about
tradeoffs.

If there are no security requirements (given the claim that some
people want to put in the documentation disclaiming that arc4random
might not be secure), why not just have people continue to use
random(3)?

- Ted

2022-07-27 22:03:35

by Rich Felker

[permalink] [raw]
Subject: Re: arc4random - are you sure we want these?

On Wed, Jul 27, 2022 at 04:15:24PM -0400, Theodore Ts'o via Libc-alpha wrote:
> On Wed, Jul 27, 2022 at 02:49:57PM +0200, Florian Weimer wrote:
> > * Theodore Ts'o:
> >
> > > But even if you didn't take the latest kernels, I think you will find
> > > that if you actually benchmark how many queries per second a real-life
> > > secure web server or VPN gateway, even the original 5.15.0 /dev/random
> > > driver was plenty fast enough for real world cryptographic use cases.
> >
> > The idea is to that arc4random() is suitable in pretty much all places
> > that have historically used random() (outside of deterministic
> > simulations). Straight calls to getrandom are much, much slower than
> > random(), and it's not even the system call overhead.
>
> What are those places? And what are their performance and security
> requirements? I've heard some people claim that arc4random() is
> supposed to provide strong security guarantees. I've heard others
> claim that it doesn't, or at least glibc was planning on disclaiming
> security guaranteees. So there seems to be a lack of clarity about
> the security requirements.

The only place I've heard of a viable "soft requirement" for real
entropy is for salting the hash function used in hash table maps to
harden them against DoS via intentional collisions. This is a small
but arguably legitimate usage domain. Most use of random() is not
this, and should not be this -- the value of deterministic execution
for ability to reproduce crashes, debug, etc. is real, and the value
of actual entropy vs a deterministic-seeded prng is imaginary.
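
A minimal sketch of that hash-salting use case: a per-process salt,
drawn once at startup from something like getrandom(), is mixed into the
hash so an attacker cannot precompute colliding keys. The function name
salted_fnv1a is hypothetical, and FNV-1a with an XORed salt is used here
only because it is short; it is an assumption of this example, not a
recommendation, and real implementations generally use a keyed hash
such as SipHash.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy salted hash: 64-bit FNV-1a whose initial state is perturbed by a
   per-process salt.  Illustrative only.  */
static uint64_t
salted_fnv1a (uint64_t salt, const void *key, size_t len)
{
  const uint8_t *p = key;
  uint64_t h = UINT64_C (0xcbf29ce484222325) ^ salt;  /* offset basis */
  for (size_t i = 0; i < len; i++)
    {
      h ^= p[i];
      h *= UINT64_C (0x100000001b3);                  /* FNV prime */
    }
  return h;
}
```

Note that one salt per process (or per table) is enough, so the cost of
obtaining it is paid once, not on every lookup.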

The purpose of arc4random has always been *cryptographically secure*
entropy, not "gratuitously replace random() and break reproducible
behavior because the programmer does not understand the difference".
Nobody should be advocating for using these functions for anything
except secure secrets.

Rich

2022-07-28 00:53:23

by Theodore Ts'o

[permalink] [raw]
Subject: Re: arc4random - are you sure we want these?

On Wed, Jul 27, 2022 at 05:59:49PM -0400, Rich Felker wrote:
> The only place I've heard of a viable "soft requirement" for real
> entropy is for salting the hash function used in hash table maps to
> harden them against DoS via intentional collisions. This is a small
> but arguably legitimate usage domain.

OK, so this is an issue that both Perl and Python have had to deal
with, as described here: https://lwn.net/Articles/474912/

Is that a fair description of the use case which you are describing?
Because if it is, in the worst case, we only need a single random
value for every http request made to the server. Would you agree with
that?

I think you'll find that even the original getrandom(2) system call or
fetching a random value from /dev/urandom was plenty fast enough for
this particular use case. If you're on some slow, ancient CPU, the
webserver isn't going to be able to handle that many queries per
second. And if you're on a fast CPU, the original /dev/urandom and/or
getrandom(2) system call would be plenty fast enough.

This is why both Jason and I have been trying to push people to
clearly articulate a specific use case and the attendant performance
requirement, so we can test the hypothesis regarding how critical it
is to have a userspace cryptographically secure RNG, with all of the
attendant opportunities for security vulnerabilities in the face of VM
snapshots, or VMs getting duplicated with a pre-spun execution image,
etc., etc.

Cheers,

- Ted

2022-07-28 00:56:14

by Cristian Rodríguez

[permalink] [raw]
Subject: Re: arc4random - are you sure we want these?

On Wed, Jul 27, 2022 at 4:15 PM Theodore Ts'o via Libc-alpha wrote:
>
> On Wed, Jul 27, 2022 at 02:49:57PM +0200, Florian Weimer wrote:
> > * Theodore Ts'o:
> >
> > > But even if you didn't take the latest kernels, I think you will find
> > > that if you actually benchmark how many queries per second a real-life
> > > secure web server or VPN gateway, even the original 5.15.0 /dev/random
> > > driver was plenty fast enough for real world cryptographic use cases.
> >
> > The idea is to that arc4random() is suitable in pretty much all places
> > that have historically used random() (outside of deterministic
> > simulations). Straight calls to getrandom are much, much slower than
> > random(), and it's not even the system call overhead.
>
> What are those places?

Well, pretty much everywhere a shared library is involved from the
start. On one very basic VM here there are 18 shared libraries using
srandom, thus perturbing each other's state if loaded by the same
process, possibly in a catastrophic/predictable way. And nobody uses
the random_r interfaces.


> And what are their performance and security
> requirements?

Common programmers know nothing about this; even seasoned ones don't.
If it runs slowly or is not a CSPRNG, then the average app will use
some userspace PRNG or CSPRNG, or buffer output from the kernel
somewhere. I do not have to justify this assertion: just download
libgcrypt, gnutls, or openssl. None of those libraries use the kernel
entropy as the first option; all of them feed it into either proven or
dubious RNG schemes and then pass that to users.
Think about why that is and why we are discussing yet another interface
in the first place.