LinuxLists.cc - Stop breaking the CSRNG

2019-10-02 17:04:53

Subject: Stop breaking the CSRNG

Hi,

As OpenSSL, we want cryptographically secure random numbers. Before
getrandom(), Linux never provided a good API for that; both
/dev/random and /dev/urandom have problems. getrandom() fixed
that, so we switched to it where available.

It was possible to combine /dev/random and /dev/urandom, and get
something that worked properly. You could call select() on
/dev/random and know that both were initialized when it returned.
But then select() started returning before /dev/random was
initialized, so that if you switch to /dev/urandom, it's still
uninitialized.

A solution for that was that you could instead read 1 byte from
/dev/random, and then switch to /dev/urandom. But that also stopped
working: /dev/urandom can still be uninitialized when you can read from
/dev/random. So there no longer is a way to wait for /dev/urandom
to be initialized.
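
The legacy two-device pattern described above can be sketched roughly as
follows (Python is used for brevity). The function name and the timeout
parameter are illustrative assumptions, and as noted above, readiness of
/dev/random no longer implies that /dev/urandom has been initialized.

```python
import select


def legacy_wait_then_urandom(nbytes, timeout=None):
    """Historical workaround: gate on /dev/random, then read /dev/urandom.

    Illustrative only: on recent kernels this does not prove that
    /dev/urandom has been initialized.
    """
    # Step 1: wait until /dev/random reports readable (or the timeout expires).
    with open("/dev/random", "rb", buffering=0) as gate:
        ready, _, _ = select.select([gate], [], [], timeout)
        if not ready:
            raise TimeoutError("/dev/random did not become readable")

    # Step 2: switch to /dev/urandom, which never blocks.
    out = b""
    with open("/dev/urandom", "rb", buffering=0) as src:
        while len(out) < nbytes:
            chunk = src.read(nbytes - len(out))
            if not chunk:
                raise OSError("unexpected EOF from /dev/urandom")
            out += chunk
    return out
```

Calling legacy_wait_then_urandom(32, timeout=10) returns 32 bytes or raises
TimeoutError; it says nothing about how well seeded those bytes are.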

As a result of that, we now refuse to use /dev/urandom on recent
kernels, and require the use of getrandom(). (To make this work with
older userspace, this means we need to import all the different
__NR_getrandom defines, and do the system call ourselves.)
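
A minimal sketch of what "doing the system call ourselves" involves,
written in Python with ctypes rather than in OpenSSL's C. The table of
syscall numbers is deliberately incomplete, and the helper names
(NR_GETRANDOM, raw_getrandom, getrandom_compat) are assumptions made for
this illustration, not anything OpenSSL ships.

```python
import ctypes
import errno
import os
import platform

# getrandom syscall numbers for a few ABIs, as found in the kernel's
# unistd headers. Incomplete on purpose; other ABIs need their own entries.
# Caveat: platform.machine() reports the kernel's architecture, so a 32-bit
# process on a 64-bit kernel would pick the wrong number here.
NR_GETRANDOM = {
    "x86_64": 318,
    "i386": 355,
    "i686": 355,
    "armv7l": 384,
    "aarch64": 278,
    "riscv64": 278,
}


def raw_getrandom(nbytes, flags=0):
    """Call getrandom(2) via syscall(2), bypassing any libc wrapper."""
    nr = NR_GETRANDOM.get(platform.machine())
    if nr is None:
        raise NotImplementedError("no syscall number known for this machine")
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    buf = ctypes.create_string_buffer(nbytes)
    got = 0
    while got < nbytes:
        ret = libc.syscall(
            ctypes.c_long(nr),
            ctypes.byref(buf, got),
            ctypes.c_size_t(nbytes - got),
            ctypes.c_uint(flags),
        )
        if ret < 0:
            err = ctypes.get_errno()
            if err == errno.EINTR:
                continue  # interrupted by a signal: retry
            raise OSError(err, os.strerror(err))
        got += ret
    return buf.raw[:nbytes]


def getrandom_compat(nbytes, flags=0):
    """Prefer the runtime's own wrapper; fall back to the raw system call."""
    if hasattr(os, "getrandom"):
        return os.getrandom(nbytes, flags)
    return raw_getrandom(nbytes, flags)
```

With flags left at 0, both paths block until the kernel considers the CRNG
initialized, which is the property this message asks to keep.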

But it seems people are now thinking about breaking getrandom() too,
to let it return data when it's not initialized by default. Please
don't.

If you think such a mode is useful for some applications, let them set
a flag, instead of the reverse.

Kurt

2019-10-03 04:11:50

by Theodore Ts'o

Subject: Re: Stop breaking the CSRNG

On Wed, Oct 02, 2019 at 06:55:33PM +0200, Kurt Roeckx wrote:
>
> But it seems people are now thinking about breaking getrandom() too,
> to let it return data when it's not initialized by default. Please
> don't.

"It's complicated"

The problem is that whether a CRNG can be considered secure is a
property of the entire system, including the hardware, and given the
large number of hardware configurations on which the kernel and OpenSSL
can be used, in practice, we can't assure that getrandom(2) is
"secure" without making certain assumptions. For example, if we
assume that the CPU is an x86 processor new enough to support RDRAND,
and that RDRAND is competently implemented (e.g., it won't disappear
after a suspend/resume) and doesn't have any backdoors implanted in
it, then it's easy to say that getrandom() will always be secure.

But if you assume that there is no hardware random number generator,
and everything is driven from a single master oscillator, with no
external input, and the CPU is utterly simple, with no speculation or
anything else that might be non-deterministic, AND if we assume that
the idiots who make an IOT device use the same random seed across
millions of devices all cloned off of the same master image, there
is ***absolutely*** nothing the kernel can do to guarantee, with 100%
certainty, that the CRNG will be initialized. (This is especially
true if the idiots who design the IOT device call OpenSSL to generate
their long-term private key the moment the device is first plugged in,
before any networking device is brought on-line.)

The point of all of this is that whether the kernel and OpenSSL can
be combined to create a secure overall solution depends on the
hardware choices, and on the choices of the distribution and the
application programmers in terms of what other software components
are used, and when and where those components try to request random
numbers, especially super-early in the boot process.

Historically, I've tried to work around this problem by being super
paranoid about the choices of thresholds before declaring the CRNG to
be initialized, while *also* making sure that at least on most common
x86 systems, the CRNG could be considered initialized before the root
file system was mounted read/write.

But over time, assumptions about what is common hardware change. SSDs
replace HDDs; NAPI and other polling techniques are more common to
reduce the number of interrupts; a single master oscillator is used
to drive all of the various clocks on the system, etc. And
software changes --- systemd running boot scripts in parallel means
that boot times are reduced, which is good, but it also means the time
to when the root is mounted read/write is much shortened.

So in the absence of a hardware RNG, or a hardware random number
generator which is considered trusted (i.e., should RDRAND
be considered trusted?), there *will* be times when we will simply fail
to be able to generate secure random numbers (at least by our
heuristics, which can potentially be overly optimistic on some
hardware platforms, and overly conservative on others).

The question is then, what do we do? Do we hang the boot --- at which
point users will complain to Linus? Or do we just hope that things
are "good enough", and that even if the user has elected to say that
they don't trust RDRAND, we'll hope it's competently implemented
and not backdoored? Or do we assume that using a jitter entropy
scheme is actually secure, as opposed to security through obscurity
(and maybe is completely pointless on a simple and completely open
architecture with no speculation such as RISC-V)?

There really are no good choices here. The one thing which Linus has
made very clear is that hanging at boot is Not Acceptable. Long term,
the best we can do is to throw the kitchen sink at the problem. So
we should try to use UEFI's RNG if available; use the TPM's RNG if
available; use RDRAND if available; try to use a seed file if
available (and hope it's not cloned to be identical on a million IOT
devices); and so on. Hopefully, they won't *all* be incompetently
implemented and/or implanted with a backdoor from the NSA or MSS or
the KGB.
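
As a userspace illustration of the "kitchen sink" idea (not how the
kernel's input pool actually works), several inputs can be hashed together
so that, under the usual assumptions about the hash function, the result
is no easier to guess than the best single input. The /dev/hwrng device
may not exist on a given system, and the seed file path and all function
names below are hypothetical.

```python
import hashlib
import os


def mix_sources(samples):
    """Hash several entropy inputs into one 32-byte seed.

    Each input is length-prefixed so that different splits of the same
    bytes produce different seeds.
    """
    h = hashlib.sha256()
    for sample in samples:
        h.update(len(sample).to_bytes(8, "big"))
        h.update(sample)
    return h.digest()


def read_optional(path, nbytes):
    """Return up to nbytes from path, or b"" if it cannot be read."""
    try:
        with open(path, "rb", buffering=0) as f:
            return f.read(nbytes) or b""
    except OSError:
        return b""


def gather_seed(seed_file="/var/lib/example/random-seed"):
    """Combine whatever sources happen to be available on this machine."""
    samples = [
        os.urandom(32),                   # kernel CRNG
        read_optional("/dev/hwrng", 32),  # hardware RNG device, if any
        read_optional(seed_file, 32),     # saved seed (hypothetical path)
    ]
    return mix_sources(samples)
```

A missing source simply contributes an empty input. A seed file that is
identical across a million cloned devices contributes nothing an attacker
cannot also compute, which is the caveat raised above.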

The only words of hope that I can give you are that it's likely that
there are so many zero day bugs in the kernel, in userspace
applications, and crypto libraries (including maybe OpenSSL), that we
don't have to make the CRNG impossible to attack in order to make a
difference. We just have to make it harder than finding and
exploiting zero day security bugs in *other* parts of the system.

"When a mountain bear is chasing after you, you don’t have to
outrun the bear. You only have to outrun the person running next to
you." :-)

Bottom line, we can do the best we can with each of our various
components, but without control over the hardware that will be in use,
or for OpenSSL, what applications are trying to call OpenSSL for, and
when they might try to generate long-term public keys during the first
boot, perfection is always going to be impossible to achieve. The
only thing we can choose is how do we handle failure.

And Linus has laid down the law that a performance improving commit
should never cause boot-ups to hang due to the lack of randomness.
Given that I can't control when some application might try to call
OpenSSL to generate a long-term public key, and OpenSSL certainly
can't control if it gets called during early boot, if getrandom(2)
ever blocks, we can't meet Linus's demand.

And given that many users are just installing some kind of userspace
jitter entropy to square this particular circle, even though I don't
trust a jitter entropy scheme, even if it is insecure, we're also
using RDRAND, and ultimately I'll trust RDRAND more than I trust a
jitter entropy scheme. And that's where we are right now. Linus has
introduced a simple in-kernel jitter entropy system so getrandom(2)
will never block. Is it secure? Who can say? I have my doubts on
RISC-V, but I don't use a RISC-V, and hopefully this will be a spur to
encourage all RISC-V implementations to include the cryptographic
extensions which include a RDRAND-like hardware random number
generator in the ISA. And since all of *my* x86 systems have RDRAND,
I'm at least personally comfortable enough with where we've landed.
Your mileage may vary.

Regards,

- Ted

2019-10-03 10:16:59

by David Laight

Subject: RE: Stop breaking the CSRNG

From: Kurt Roeckx
> Sent: 02 October 2019 17:56
> As OpenSSL, we want cryptographically secure random numbers. Before
> getrandom(), Linux never provided a good API for that; both
> /dev/random and /dev/urandom have problems. getrandom() fixed
> that, so we switched to it where available.

The fundamental problem is that you can't always get 'cryptographically
secure random numbers'. No API changes are ever going to change that.

The system can either return an error or sleep (possibly indefinitely)
until some 'reasonably random' numbers are available.
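
Those two behaviours map onto the existing getrandom() flags. A minimal
sketch using Python's os.getrandom wrapper; the function names are
illustrative assumptions:

```python
import os


def random_or_error(nbytes):
    """Fail at once with BlockingIOError if the CRNG is not yet seeded."""
    return os.getrandom(nbytes, os.GRND_NONBLOCK)


def random_or_sleep(nbytes):
    """Sleep in the kernel until the CRNG is seeded, then return data."""
    return os.getrandom(nbytes)


def crng_ready():
    """Probe whether getrandom() would return without blocking."""
    try:
        random_or_error(1)
        return True
    except BlockingIOError:
        return False
```

Neither variant creates randomness; they only differ in how the lack of it
is reported to the caller.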

A RISC-V system running on an FPGA (I've only used Altera NIOS cpu)
may have absolutely no sources of randomness at boot time.
Saying the architecture must include a random number instruction
doesn't help!
Generating random bits inside the FPGA is somewhere between 'difficult'
and impossible (forcing metastability between clock domains might work).

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2019-10-03 11:53:11

by Adam Borowski

Subject: Re: Stop breaking the CSRNG

On Thu, Oct 03, 2019 at 10:13:39AM +0000, David Laight wrote:
> From: Kurt Roeckx
> > Sent: 02 October 2019 17:56
> > As OpenSSL, we want cryptographically secure random numbers. Before
> > getrandom(), Linux never provided a good API for that; both
> > /dev/random and /dev/urandom have problems. getrandom() fixed
> > that, so we switched to it where available.
>
> The fundamental problem is that you can't always get 'cryptographically
> secure random numbers'. No API changes are ever going to change that.
>
> The system can either return an error or sleep (possibly indefinitely)
> until some 'reasonably random' numbers are available.
>
> A RISC-V system running on an FPGA (I've only used Altera NIOS cpu)
> may have absolutely no sources of randomness at boot time.

I'd say this is a hardware security vulnerability; no different from e.g.
having no or faulty MMU, speculation that allows exfiltrating data, etc.
We did not understand the seriousness of lacking hardware sources of
randomness, but that's common to many other security
vulnerabilities.

Machines that lack any sources of entropy have their uses, but they're akin
to processors with no MMU. You should never run a world-accessible ssh
daemon on either of them.

> Saying the architecture must include a random number instruction
> doesn't help!

It won't fix existing systems, and is irrelevant to deeply embedded, but
communicating this requirement to SoC designers sounds like a good idea to
me. IoTrash appliance makers won't care but their security is already so
atrocious that lack of entropy is nowhere near the easiest way to get in,
while anyone else will at least notice the warning.

Any real-silicon hardware can include an entropy source, and if it doesn't,
shaming the maker is the way to go. Calling the problem a security
vulnerability (which I say it is) sends a stronger message.

Meow!
--
⢀⣴⠾⠻⢶⣦⠀ A MAP07 (Dead Simple) raspberry tincture recipe: 0.5l 95% alcohol,
⣾⠁⢠⠒⠀⣿⡁ 1kg raspberries, 0.4kg sugar; put into a big jar for 1 month.
⢿⡄⠘⠷⠚⠋⠀ Filter out and throw away the fruits (can dump them into a cake,
⠈⠳⣄⠀⠀⠀⠀ etc), let the drink age at least 3-6 months.

2019-10-03 21:19:33

by Kurt Roeckx

Subject: Re: Stop breaking the CSRNG

On Wed, Oct 02, 2019 at 11:36:55PM -0400, Theodore Y. Ts'o wrote:
> On Wed, Oct 02, 2019 at 06:55:33PM +0200, Kurt Roeckx wrote:
> >
> > But it seems people are now thinking about breaking getrandom() too,
> > to let it return data when it's not initialized by default. Please
> > don't.
>
> "It's complicated"
>
> The problem is that whether a CRNG can be considered secure is a
> property of the entire system, including the hardware, and given the
> large number of hardware configurations on which the kernel and OpenSSL
> can be used, in practice, we can't assure that getrandom(2) is
> "secure" without making certain assumptions.

I'm not saying it's easy. But getrandom() is documented as only
returning data after it has been initialized, which is an
important property of that interface and the main reason to switch
to it. And it seems that someone's laptop hanging during boot
because it can't find enough entropy is reason enough to break the
security of everyone else. It seems that the only important thing is
that applications don't stop working, because then it's clearly
visible that something isn't working. Returning data before the
CRNG has been initialized isn't visibly broken, but it's just as
broken, which is in my opinion worse.

> But if you assume that there is no hardware random number generator,
> and everything is driven from a single master oscillator, with no
> external input, and the CPU is utterly simple, with no speculation or
> anything else that might be non-deterministic, AND if we assume that
> the idiots who make an IOT device use the same random seed across
> millions of devices all cloned off of the same master image, there
> is ***absolutely*** nothing the kernel can do to guarantee, with 100%
> certainty, that the CRNG will be initialized. (This is especially
> true if the idiots who design the IOT device call OpenSSL to generate
> their long-term private key the moment the device is first plugged in,
> before any networking device is brought on-line.)

And returning data before it's been initialized will only make
that situation worse. We can only hope that by refusing to return
data the idiot will properly fix it.

If the hardware can't provide it, the kernel shouldn't just pretend
the hardware did provide it.

> There really are no good choices here. The one thing which Linus has
> made very clear is that hanging at boot is Not Acceptable.

And I think it's not a kernel problem but a combination of
hardware, configuration and user space problem. The kernel can of
course be improved, and I'm sure it will.

I wonder if it's useful to extend getrandom() to provide an option
where the application can indicate it doesn't care about security
and just wants some number, like what /dev/urandom provides but
then as a system call. Another option could be that you're happy
to get data once an estimated 64 bits of entropy have been collected.
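
A rough userspace sketch of that opt-in shape, assuming a hypothetical
wrapper random_bytes() with an allow_unseeded parameter (neither is an
existing kernel or OpenSSL interface). The default path blocks until the
CRNG is initialized; only callers that explicitly ask for it can receive
possibly unseeded output, here obtained from /dev/urandom.

```python
import os


def random_bytes(nbytes, allow_unseeded=False):
    """Secure by default; callers must opt in to possibly unseeded output."""
    if not allow_unseeded:
        # Default: block until the kernel reports the CRNG as initialized.
        return os.getrandom(nbytes)
    try:
        # Opt-in path: take seeded output if it is already available...
        return os.getrandom(nbytes, os.GRND_NONBLOCK)
    except BlockingIOError:
        # ...otherwise fall back to /dev/urandom, which returns data anyway.
        with open("/dev/urandom", "rb", buffering=0) as f:
            return f.read(nbytes)
```

The sketch is meant for small requests; large reads can return fewer bytes
than asked for and would need a retry loop.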

> And given that many users are just installing some kind of userspace
> jitter entropy to square this particular circle, even though I don't
> trust a jitter entropy scheme, even if it is insecure, we're also
> using RDRAND, and ultimately I'll trust RDRAND more than I trust a
> jitter entropy scheme. And that's where we are right now. Linus has
> introduced a simple in-kernel jitter entropy system

I don't trust it much either. And I think we should at least try
to estimate how much entropy it actually provides on various
systems, knowing that there will probably be systems where it
provides much less than what we think it does.

I'm willing to help analyze data if people can provide a list
of TSCs that are being added. The more samples the better. I think
you want to do this on an idle system.
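
One simple way to look at such samples is a most-common-value min-entropy
estimate over the deltas between successive readings. The sketch below
uses time.perf_counter_ns() as a stand-in for the TSC, which is an
assumption; the estimate also treats samples as independent, so it should
be read as an optimistic upper bound rather than a measurement.

```python
import math
import time
from collections import Counter


def collect_deltas(count):
    """Collect differences between successive high-resolution clock reads."""
    deltas = []
    prev = time.perf_counter_ns()
    for _ in range(count):
        now = time.perf_counter_ns()
        deltas.append(now - prev)
        prev = now
    return deltas


def min_entropy_per_sample(samples):
    """Most-common-value estimate: log2(N / count of most frequent value)."""
    if not samples:
        raise ValueError("need at least one sample")
    top = Counter(samples).most_common(1)[0][1]
    return math.log2(len(samples) / top)
```

For example, min_entropy_per_sample(collect_deltas(100000)) gives a
bits-per-sample figure for the machine it runs on; identical deltas give
0.0, and four distinct values in four samples give 2.0.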

Kurt

2019-10-06 12:20:20

by Pavel Machek

Subject: Re: Stop breaking the CSRNG

On Wed 2019-10-02 23:36:55, Theodore Y. Ts'o wrote:
> On Wed, Oct 02, 2019 at 06:55:33PM +0200, Kurt Roeckx wrote:
> >
> > But it seems people are now thinking about breaking getrandom() too,
> > to let it return data when it's not initialized by default. Please
> > don't.
>
> "It's complicated"
>
> The problem is that whether a CRNG can be considered secure is a
> property of the entire system, including the hardware, and given the
> large number of hardware configurations on which the kernel and OpenSSL
> can be used, in practice, we can't assure that getrandom(2) is
> "secure" without making certain assumptions. For example, if we
> assume that the CPU is an x86 processor new enough to support RDRAND,
> and that RDRAND is competently implemented (e.g., it won't disappear
> after a suspend/resume) and doesn't have any backdoors implanted in
> it, then it's easy to say that getrandom() will always be secure.

Actually... if we have buggy AMD CPU with broken RDRAND, we should
still be able to get enough entropy during boot so that getrandom() is
cryptographically secure.

I don't think we get that right at the moment.

> Bottom line, we can do the best we can with each of our various
> components, but without control over the hardware that will be in use,
> or for OpenSSL, what applications are trying to call OpenSSL for, and
> when they might try to generate long-term public keys during the first
> boot, perfection is always going to be impossible to achieve. The
> only thing we can choose is how do we handle failure.
>
> And Linus has laid down the law that a performance improving commit
> should never cause boot-ups to hang due to the lack of randomness.
> Given that I can't control when some application might try to call
> OpenSSL to generate a long-term public key, and OpenSSL certainly
> can't control if it gets called during early boot, if getrandom(2)
> ever blocks, we can't meet Linus's demand.

You can. You can just access disk while the userspace is blocked on
getrandom. ("find /").
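
A userspace approximation of that idea, as a sketch only: walk the
filesystem until a non-blocking getrandom() succeeds. Whether the
resulting disk activity is credited as entropy depends on the kernel and
the storage hardware, and the function names and the max_entries budget
are assumptions made for this example.

```python
import os


def crng_ready():
    """Probe whether getrandom() would return without blocking."""
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False


def stir_with_disk_activity(root="/", max_entries=100_000):
    """Walk the filesystem until the CRNG is seeded or the budget runs out.

    Returns the number of directory entries visited.
    """
    visited = 0
    if crng_ready():
        return visited
    for _dirpath, dirnames, filenames in os.walk(root):
        visited += len(dirnames) + len(filenames)
        if crng_ready() or visited >= max_entries:
            break
    return visited
```

On a system whose CRNG is already initialized the function returns 0
without touching the disk.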

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
