Hi,
On Sun, Sep 08, 2019 at 01:59:27PM -0700, Linus Torvalds wrote:
> So we probably didn't strictly need an rc8 this release, but with LPC
> and the KS conference travel this upcoming week it just makes
> everything easier.
>
The commit b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), [1]
which was merged in v5.3-rc1, *always* leads to a blocked boot on my
system due to low entropy.
The hardware is not a VM: it's a Thinkpad E480 (i5-8250U CPU), with
a standard Arch user-space.
It was discovered through bisecting the problem v5.2 => v5.3-rc1,
since v5.2 never had any similar issues. The issue still persists in
v5.3-rc8: reverting that commit always fixes the problem.
It seems that batching the directory lookup I/O requests (of which
there can be a lot during boot) minimizes the sources of disk-activity-
induced entropy? [2] [3]
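For context, the plugging interface that [3] refers to batches request
submission per task. A simplified sketch of the pattern (not the actual
ext4 hunk from [1]; nr_blocks and submit_one_readahead() are made-up
placeholders for the per-block readahead submission):

        struct blk_plug plug;
        int i;

        blk_start_plug(&plug);           /* start collecting this task's requests */
        for (i = 0; i < nr_blocks; i++)
                submit_one_readahead(i); /* small reads get queued, not dispatched */
        blk_finish_plug(&plug);          /* dispatch the whole batch to the device */

So fewer, larger dispatches reach the device, and correspondingly fewer
disk-completion interrupts occur during boot.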
Can this even be considered a user-space breakage? I'm honestly not
sure. On my modern RDRAND-capable x86, just running rng-tools rngd(8)
early-on fixes the problem. I'm not sure about the status of older
CPUs though.
Thanks,
[1]
commit b03755ad6f33b7b8cd7312a3596a2dbf496de6e7
Author: zhangjs <[email protected]>
Date: Wed Jun 19 23:41:29 2019 -0400
ext4: make __ext4_get_inode_loc plug
Add a blk_plug to prevent the inode table readahead from being
submitted as small I/O requests.
Signed-off-by: zhangjs <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
[2] https://lkml.kernel.org/r/[email protected]
[3] block/blk-core.c :: blk_start_plug()
--
darwi
http://darwish.chasingpointers.com
On Tue, Sep 10, 2019 at 5:21 AM Ahmed S. Darwish <[email protected]> wrote:
>
> The commit b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), [1]
> which was merged in v5.3-rc1, *always* leads to a blocked boot on my
> system due to low entropy.
Exactly what is it that blocks on entropy? Nobody should do that
during boot, because on some systems entropy is really really low
(think flash memory with polling IO etc).
That said, I would have expected that any PC gets plenty of entropy.
Are you sure it's entropy that is blocking, and not perhaps some odd
"forgot to unplug" situation?
> Can this even be considered a user-space breakage? I'm honestly not
> sure. On my modern RDRAND-capable x86, just running rng-tools rngd(8)
> early-on fixes the problem. I'm not sure about the status of older
> CPUs though.
It's definitely breakage, although rather odd. I would have expected
us to have other sources of entropy than just the disk. Did we stop
doing low bits of TSC from timer interrupts etc?
Ted, either way - ext4 IO patterns or random number entropy - this is
your code. Comments?
Linus
On Tue, Sep 10, 2019 at 06:21:07AM +0200, Ahmed S. Darwish wrote:
>
> The commit b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), [1]
> which was merged in v5.3-rc1, *always* leads to a blocked boot on my
> system due to low entropy.
>
> The hardware is not a VM: it's a Thinkpad E480 (i5-8250U CPU), with
> a standard Arch user-space.
Hmm, I'm not seeing this on a Dell XPS 13 (model 9380) using a Debian
Bullseye (Testing) running a rc4+ kernel.
This could be because Debian is simply doing more I/O; or it could be
because I don't have some package installed which is trying to read
from /dev/random or calling getrandom(2). Previously, Fedora ran into
blocking issues because of some FIPS compliance patches to some
userspace daemons. So it's going to be very user space dependent and
package dependent.
> It seems that batching the directory lookup I/O requests (which are
> possibly a lot during boot) is minimizing sources of disk-activity-
> induced entropy? [2] [3]
>
> Can this even be considered a user-space breakage? I'm honestly not
> sure. On my modern RDRAND-capable x86, just running rng-tools rngd(8)
> early-on fixes the problem. I'm not sure about the status of older
> CPUs though.
You can probably also fix this problem by adding random.trust_cpu=true
to the boot command line, or by enabling CONFIG_RANDOM_TRUST_CPU.
This obviously assumes that you trust Intel's implementation of
RDRAND, but that's true regardless of whether you use rngd
or the kernel config option.
As far as whether it's considered user-space breakage, that's tough.
File system performance improvements can cause a reduced amount of
I/O, and that can cause less entropy to be collected, and depending on
a complex combination of kernel config options, distribution-specific
patches, and what packages are loaded, that could potentially cause
boot hangs waiting for entropy. Does that mean we can't make any
file system performance improvements? Surely that doesn't seem like
the right answer.
It would be useful to figure out what process is blocking waiting on
entropy, since in general, trying to rely on cryptographic entropy in
early boot, especially if it is to generate cryptographic keys, is
going to be more dangerous compared to a "just in time" approach to
generating crypto keys. So this could also be considered a userspace
bug, depending on your point of view...
- Ted
On Tue, Sep 10, 2019 at 12:33 PM Linus Torvalds
<[email protected]> wrote:
>
> Are you sure it's entropy that is blocking, and not perhaps some odd
> "forgot to unplug" situation?
Looking at that code, it's all trivial, and it definitely unplugs properly.
Lack of entropy still sounds _very_ strange. Are you doing
something odd at boot?
Does the boot continue if you press keys on the keyboard, or how did
you decide it was about entropy?
I guess sysrq-'t' followed by enough keyboard input to unblock the
boot process should give you something in dmesg that shows what is
blocked?
Linus
On Tue, Sep 10, 2019 at 12:33:12PM +0100, Linus Torvalds wrote:
> On Tue, Sep 10, 2019 at 5:21 AM Ahmed S. Darwish <[email protected]> wrote:
> >
> > The commit b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), [1]
> > which was merged in v5.3-rc1, *always* leads to a blocked boot on my
> > system due to low entropy.
>
> Exactly what is it that blocks on entropy? Nobody should do that
> during boot, because on some systems entropy is really really low
> (think flash memory with polling IO etc).
>
Ok, I've tracked it down further. It's unfortunately GDM
intentionally blocking on a getrandom(buf, 16, 0).
Booting the system with a straced GDM service
("ExecStart=strace -f /usr/bin/gdm") reveals:
...
[ 3.779375] strace[262]: [pid 323] execve("/usr/lib/gnome-session-binary",
... /* 28 vars */) = 0
...
[ 4.019227] strace[262]: [pid 323] getrandom( <unfinished ...>
[ 79.601433] kernel: random: crng init done
[ 79.601443] kernel: random: 3 urandom warning(s) missed due to ratelimiting
[ 79.601262] strace[262]: [pid 323] <... getrandom resumed>..., 16, 0) = 16
[ 79.601262] strace[262]: [pid 323] getrandom(..., 16, 0) = 16
[ 79.603041] strace[262]: [pid 323] getrandom(..., 16, 0) = 16
[ 79.603041] strace[262]: [pid 323] getrandom(..., 16, 0) = 16
[ 79.603041] strace[262]: [pid 323] getrandom(..., 16, 0) = 16
As can be seen in the timestamps, the GDM boot was only continued
by typing randomly on the keyboard..
> That said, I would have expected that any PC gets plenty of entropy.
> Are you sure it's entropy that is blocking, and not perhaps some odd
> "forgot to unplug" situation?
>
Yes, doing any of the below steps makes the problem reliably disappear:
- boot param "random.trust_cpu=on"
- rngd(8) enabled at boot (entropy source: x86 RDRAND + jitter)
- pressing random 3 or 4 keyboard keys while GDM boot is stuck
> > Can this even be considered a user-space breakage? I'm honestly not
> > sure. On my modern RDRAND-capable x86, just running rng-tools rngd(8)
> > early-on fixes the problem. I'm not sure about the status of older
> > CPUs though.
>
> It's definitely breakage, although rather odd. I would have expected
> us to have other sources of entropy than just the disk. Did we stop
> doing low bits of TSC from timer interrupts etc?
>
Exactly.
While gnome-session is obviously at fault here by requiring
*blocking* randomness at the boot path, it's still not requesting
much, just (5 * 16) bytes to be exact.
I guess an x86 laptop should be able to provide that, even without
RDRAND / random.trust_cpu=on (TSC jitter, etc.) ?
thanks,
--darwi
> Ted, either way - ext4 IO patterns or random number entropy - this is
> your code. Comments?
>
> Linus
On 10.09.19 at 19:33, Ahmed S. Darwish wrote:
> Yes, doing any of the below steps makes the problem reliably disappear:
>
> - boot param "random.trust_cpu=on"
> - rngd(8) enabled at boot (entropy source: x86 RDRAND + jitter)
> - pressing random 3 or 4 keyboard keys while GDM boot is stuck
and on machines without RDRAND, or with a broken one (AMD), and nobody
near the keyboard to play some song on it?
On Tue, Sep 10, 2019 at 6:33 PM Ahmed S. Darwish <[email protected]> wrote:
>
> While gnome-session is obviously at fault here by requiring
> *blocking* randomness at the boot path, it's still not requesting
> much, just (5 * 16) bytes to be exact.
>
> I guess an x86 laptop should be able to provide that, even without
> RDRAND / random.trust_cpu=on (TSC jitter, etc.) ?
Yeah, the problem is partly because we can't trust "get_cycles()"
because not all architectures have it. So we use "jiffies" for the
entropy estimation, and my guess is that it just ends up estimating
you have little to no entropy from your disk IO.
So the timestamp counter value is added to the randomness pool, but
the jitter in the TSC values isn't then used to estimate the entropy
at all.
Just out of curiosity, what happens if you apply a patch like this
(intentionally whitespace-damaged, I don't want anybody to pick it up
without thinking about it) thing:
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 5d5ea4ce1442..60709a7b4af1 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1223,6 +1223,7 @@ static void add_timer_randomness(struct timer_rand_state *state, unsigned
* We take into account the first, second and third-order deltas
* in order to make our estimate.
*/
+ sample.jiffies += sample.cycles;
delta = sample.jiffies - state->last_time;
state->last_time = sample.jiffies;
which just makes the entropy estimation use the _sum_ of jiffies and
cycles as the base. On architectures that don't have a cycle counter,
it ends up being the same it used to be (just jiffies), and on
architectures that do have a timestamp counter the TSC differences
will overwhelm the jiffies differences, so you end up effectively
using the third-order TSC difference as the entropy estimation.
Which I think is what the code really wants - it's only using jiffies
because that is the only thing _guaranteed_ to change at all. But with
the sum, you get the best of both worlds, and should basically make
the entropy estimation use the "better of two counters".
Ted, comments? I'd hate to revert the ext4 thing just because it
happens to expose a bad thing in user space.
Linus
On Tue, Sep 10, 2019 at 07:21:54PM +0100, Linus Torvalds wrote:
> On Tue, Sep 10, 2019 at 6:33 PM Ahmed S. Darwish <[email protected]> wrote:
> >
> > While gnome-session is obviously at fault here by requiring
> > *blocking* randomness at the boot path, it's still not requesting
> > much, just (5 * 16) bytes to be exact.
It doesn't matter how much randomness it's requesting. With the new
cryptographic random number generator, the CRNG is either
initialized.... or it's not.
> Just out of curiosity, what happens if you apply a patch like this
> (intentionally whitespace-damaged, I don't want anybody to pick it up
> without thinking about it) thing...
> Which I think is what the code really wants - it's only using jiffies
> because that is the only thing _guaranteed_ to change at all. But with
> the sum, you get the best of both worlds, and should basically make
> the entropy estimation use the "better of two counters".
>
> Ted, comments? I'd hate to revert the ext4 thing just because it
> happens to expose a bad thing in user space.
Unfortunately, I very much doubt this is going to work. That's
because the add_disk_randomness() path is only used for legacy
/dev/random (which actually only still exists because of some insane
PCI compliance issues which a number of end users really care about
--- or they care about because it makes the insane PCI compliance labs
go away).
Also, the vast majority of disks have
/sys/block/XXX/queue/add_random set to zero by default.
So the way we get entropy these days for initializing the CRNG is
via the add_interrupt_randomness() path, where we do something really
fast, and we assume that we get enough uncertainty from 8 interrupts
to give us one bit of entropy (64 interrupts to give us a byte of
entropy), and that we need 512 bits of entropy to consider the CRNG
fully initialized. (Yeah, there's a lot of conservatism in those
estimates, and so what we could do is decide to, say, cut down the
number of bits needed to initialize the CRNG to be 256 bits, since
that's the size of the CHACHA20 cipher.)
Ultimately, though, we need to find *some* way to fix userspace's
assumptions that they can always get high quality entropy in early
boot, or we need to get over people's distrust of Intel and RDRAND.
Otherwise, future performance improvements in any part of the system
which reduces the number of interrupts is always going to potentially
result in somebody's misconfigured system or badly written
applications to fail to boot. :-(
- Ted
On Wed, Sep 11, 2019 at 5:07 PM Theodore Y. Ts'o <[email protected]> wrote:
> >
> > Ted, comments? I'd hate to revert the ext4 thing just because it
> > happens to expose a bad thing in user space.
>
> Unfortunately, I very much doubt this is going to work. That's
> because the add_disk_randomness() path is only used for legacy
> /dev/random [...]
>
> Also, the vast majority of disks have
> /sys/block/XXX/queue/add_random set to zero by default.
Gaah. I was looking at the input randomness, since I thought that was
where the added randomness that Ahmed got things to work with came
from.
And that then made me just look at the legacy disk randomness (for the
obvious disk IO reasons) and I didn't look further.
> So the way we get entropy these days for initializing the CRNG is
> via the add_interrupt_randomness() path, where we do something really
> fast, and we assume that we get enough uncertainty from 8 interrupts
> to give us one bit of entropy (64 interrupts to give us a byte of
> entropy), and that we need 512 bits of entropy to consider the CRNG
> fully initialized. (Yeah, there's a lot of conservatism in those
> estimates, and so what we could do is decide to, say, cut down the
> number of bits needed to initialize the CRNG to be 256 bits, since
> that's the size of the CHACHA20 cipher.)
So that's 4k interrupts if I counted right, and yeah, maybe Ahmed was
just close enough before, and the merging of the inode table IO then
took him below that limit.
> Ultimately, though, we need to find *some* way to fix userspace's
> assumptions that they can always get high quality entropy in early
> boot, or we need to get over people's distrust of Intel and RDRAND.
Well, even on a PC, sometimes rdrand just isn't there. AMD has screwed
it up a few times, and older Intel chips just don't have it.
So I'd be inclined to either lower the limit regardless - and perhaps
make the "user space asked for randomness much too early" be a big
*warning* instead of being a basically fatal hung machine?
Linus
On Wed, Sep 11, 2019 at 5:45 PM Linus Torvalds
<[email protected]> wrote:
>
> So I'd be inclined to either lower the limit regardless - and perhaps
> make the "user space asked for randomness much too early" be a big
> *warning* instead of being a basically fatal hung machine?
Hmm. Just testing - normally I run my laptop with TRUST_CPU enabled,
so I never see this any more, but warning (rather than waiting) is
what we still do for the kernel.
And I see
[ 0.231255] random: get_random_bytes called from
start_kernel+0x323/0x4f5 with crng_init=0
and that's this code:
add_latent_entropy();
add_device_randomness(command_line, strlen(command_line));
boot_init_stack_canary();
in particular, it's the boot_init_stack_canary() thing that asks for a
random number for the canary.
I don't actually see the 'crng init done' until much much later:
[ 21.741125] random: crng init done
but part of that may be that my early boot is slow due to having an
encrypted disk and so the bootup ends up waiting for me to type the
passphrase.
But this does show that
(a) we have the same issue in the kernel, and we don't block there
(b) initializing the crng really can be a timing problem
The interrupt thing is only going to get worse as disks turn into
ssd's and some of them end up using polling rather than interrupts..
So we're likely to see _fewer_ interrupts in the future, not more.
Linus
On Wed, Sep 11, 2019 at 06:00:19PM +0100, Linus Torvalds wrote:
> [ 0.231255] random: get_random_bytes called from
> start_kernel+0x323/0x4f5 with crng_init=0
>
> and that's this code:
>
> add_latent_entropy();
> add_device_randomness(command_line, strlen(command_line));
> boot_init_stack_canary();
>
> in particular, it's the boot_init_stack_canary() thing that asks for a
> random number for the canary.
>
> I don't actually see the 'crng init done' until much much later:
>
> [ 21.741125] random: crng init done
Yes, that's super early in the boot sequence. IIRC the stack canary
gets reinitialized later (or maybe it was only for the other CPUs in
SMP mode; I don't recall the details off the top of my head).
I think this one always fails, and perhaps we should have a way of
suppressing it --- but that's correct: the in-kernel interface doesn't
block.
The /dev/urandom device doesn't block either, despite security
eggheads continually asking me to change it to block a la getrandom(2),
but I have always pushed back because I *know* changing
/dev/urandom to block would be asking for userspace regressions.
The compromise we came up with was that since getrandom(2) is a new
interface, we could make this have the behavior that the security
heads wanted, which is to make blocking unconditional, since the
theory was that *this* interface would be sane, and that userspace
applications which used it too early were buggy, and we could make it
*their* problem.
People have suggested adding a new getrandom flag, GRND_I_KNOW_THIS_IS_INSECURE,
or some such, which wouldn't block and would return "best efforts"
randomness. I haven't been super enthusiastic about such a flag
because I *know* it would be abused. However, the next time a massive
security bug shows up on the front pages of the Wall Street Journal,
or on some web site such as https://factorable.net, it won't be the kernel's fault
since the flag will be GRND_INSECURE_BROKEN_APPLICATION, or some such.
It doesn't really solve the problem, though.
> But this does show that
>
> (a) we have the same issue in the kernel, and we don't block there
Ultimately, I think the only right answer is to make it the
bootloader's responsibility to get us some decent entropy at boot
time. There are patches to allow ARM systems to pass in entropy via
the device tree. And in theory (assuming you trust the UEFI BIOS ---
stop laughing in the back!) we can use that to get entropy, which will
solve the problem for UEFI boot systems. I've been talking to Ron
Minnich about trying to get this support into the NERF bootloader, at
which point new servers from the Open Compute Project will have a
solution as well. (We can probably also get solutions for Chrome OS
devices, since those have TPM-like devices which are trusted to have a
competently engineered hardware RNG --- I'm not sure I would trust all
TPM devices in commodity hardware, but again, at least we can shift
blame off of the kernel. :-P)
Still, these are all point solutions, and don't really solve the
problem on older systems, or non-x86 systems.
> (b) initializing the crng really can be a timing problem
>
> The interrupt thing is only going to get worse as disks turn into
> ssd's and some of them end up using polling rather than interrupts..
> So we're likely to see _fewer_ interrupts in the future, not more.
Yeah, agreed. Maybe we should have an "insecure_randomness" boot
option which blindly forces the CRNG to be initialized at boot, so
that at least people can get to a command line, if insecurely? I
don't have any good ideas about how to solve this problem in general.
:-( :-( :-(
- Ted
On Wed, Sep 11, 2019 at 05:45:38PM +0100, Linus Torvalds wrote:
> On Wed, Sep 11, 2019 at 5:07 PM Theodore Y. Ts'o <[email protected]> wrote:
> > >
> > > Ted, comments? I'd hate to revert the ext4 thing just because it
> > > happens to expose a bad thing in user space.
> >
> > Unfortunately, I very much doubt this is going to work. That's
> > because the add_disk_randomness() path is only used for legacy
> > /dev/random [...]
> >
> > Also, the vast majority of disks have
> > /sys/block/XXX/queue/add_random set to zero by default.
>
> Gaah. I was looking at the input randomness, since I thought that was
> where the added randomness that Ahmed got things to work with came
> from.
>
> And that then made me just look at the legacy disk randomness (for the
> obvious disk IO reasons) and I didn't look further.
>
Yup, I confirm that the quick patch kept the situation as-is. I was
going to debug why, but now we know the answer..
> > So the way we get entropy these days for initializing the CRNG is
> > via the add_interrupt_randomness() path, where we do something really
> > fast, and we assume that we get enough uncertainty from 8 interrupts
> > to give us one bit of entropy (64 interrupts to give us a byte of
> > entropy), and that we need 512 bits of entropy to consider the CRNG
> > fully initialized. (Yeah, there's a lot of conservatism in those
> > estimates, and so what we could do is decide to, say, cut down the
> > number of bits needed to initialize the CRNG to be 256 bits, since
> > that's the size of the CHACHA20 cipher.)
>
> So that's 4k interrupts if I counted right, and yeah, maybe Ahmed was
> just close enough before, and the merging of the inode table IO then
> took him below that limit.
>
> > Ultimately, though, we need to find *some* way to fix userspace's
> > assumptions that they can always get high quality entropy in early
> > boot, or we need to get over people's distrust of Intel and RDRAND.
>
> Well, even on a PC, sometimes rdrand just isn't there. AMD has screwed
> it up a few times, and older Intel chips just don't have it.
>
> So I'd be inclined to either lower the limit regardless -
ACK :)
> and perhaps make the "user space asked for randomness much too
> early" be a big *warning* instead of being a basically fatal hung
> machine?
Hmmm, regarding "randomness request much too early", how much is time
really a factor here?
I tested leaving the machine even for 15+ minutes, and it still didn't
continue booting: the boot is practically blocked forever...
Or is the theory that hopefully once the machine is un-stuck, more
sources of entropy will be available? If that's the case, then
possibly (rate-limited):
"urandom: process XX asked for YY bytes. CRNG not yet initialized"
> Linus
thanks,
--
darwi
http://darwish.chasingpointers.com
On Wed, Sep 11, 2019 at 11:41:44PM +0200, Ahmed S. Darwish wrote:
> On Wed, Sep 11, 2019 at 05:45:38PM +0100, Linus Torvalds wrote:
[...]
> >
> > Well, even on a PC, sometimes rdrand just isn't there. AMD has screwed
> > it up a few times, and older Intel chips just don't have it.
> >
> > So I'd be inclined to either lower the limit regardless -
>
> ACK :)
>
> > and perhaps make the "user space asked for randomness much too
> > early" be a big *warning* instead of being a basically fatal hung
> > machine?
>
> Hmmm, regarding "randomness request much too early", how much is time
> really a factor here?
>
> I tested leaving the machine even for 15+ minutes, and it still didn't
> continue booting: the boot is practically blocked forever...
>
> Or is the theory that hopefully once the machine is un-stuck, more
> sources of entropy will be available? If that's the case, then
> possibly (rate-limited):
>
> "urandom: process XX asked for YY bytes. CRNG not yet initialized"
>
^
getrandom: ....
(since urandom always succeeds, even if CRNG is not inited, and
it already prints a very similar warning in that case anyway..)
thanks,
--darwi
Hi Ted,
On Wed, Sep 11, 2019 at 01:36:24PM -0400, Theodore Y. Ts'o wrote:
> On Wed, Sep 11, 2019 at 06:00:19PM +0100, Linus Torvalds wrote:
> > [ 0.231255] random: get_random_bytes called from
> > start_kernel+0x323/0x4f5 with crng_init=0
> >
> > and that's this code:
> >
> > add_latent_entropy();
> > add_device_randomness(command_line, strlen(command_line));
> > boot_init_stack_canary();
> >
> > in particular, it's the boot_init_stack_canary() thing that asks for a
> > random number for the canary.
> >
> > I don't actually see the 'crng init done' until much much later:
> >
> > [ 21.741125] random: crng init done
>
> Yes, that's super early in the boot sequence. IIRC the stack canary
> gets reinitialized later (or maybe it was only for the other CPUs in
> SMP mode; I don't recall the details off the top of my head).
>
> I think this one always fails, and perhaps we should have a way of
> suppressing it --- but that's correct: the in-kernel interface doesn't
> block.
>
> The /dev/urandom device doesn't block either, despite security
> eggheads continually asking me to change it to block a la getrandom(2),
> but I have always pushed back because I *know* changing
> /dev/urandom to block would be asking for userspace regressions.
>
> The compromise we came up with was that since getrandom(2) is a new
> interface, we could make this have the behavior that the security
> heads wanted, which is to make blocking unconditional, since the
> theory was that *this* interface would be sane, and that userspace
> applications which used it too early were buggy, and we could make it
> *their* problem.
>
Hmmmm, IMHO it's almost impossible to define "too early" here... Does
it mean applications in the critical boot path? Does gnome-session =>
libICE => libbsd => getentropy() => getrandom() => generated MIT magic
cookie count as being too early? It's very hazy...
getrandom(2) basically has no guaranteed upper bound for the waiting
time. And in the report I submitted in the parent thread, the upper
bound is really "infinitely locked"...
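To make the blocking path concrete, a getentropy()-style wrapper that
sits directly on top of getrandom() with flags == 0 inherits that
unbounded wait. This is only a minimal sketch, not libbsd's actual
code, and cookie_getentropy() is a made-up name:

        #include <errno.h>
        #include <sys/types.h>
        #include <sys/random.h>

        /* Illustrative wrapper only -- not libbsd's implementation. */
        static int cookie_getentropy(void *buf, size_t len)
        {
                if (len > 256) {
                        errno = EIO;    /* mirror getentropy(3)'s length limit */
                        return -1;
                }
                /*
                 * flags == 0: blocks, with no upper bound, until the
                 * kernel CRNG is initialized -- which is exactly where
                 * gnome-session ends up stuck during early boot.
                 */
                ssize_t n = getrandom(buf, len, 0);
                return n == (ssize_t)len ? 0 : -1;
        }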
Here is a trace_printk() log of all the getrandom() calls done from
system boot:
systemd-random--179 2.510228: getrandom(512 bytes, flags = 1)
systemd-random--179 2.510239: getrandom(512 bytes, flags = 0)
polkitd-294 3.903699: getrandom(8 bytes, flags = 1)
polkitd-294 3.904191: getrandom(8 bytes, flags = 1)
... + 45 similar instances
gnome-session-b-327 4.400620: getrandom(16 bytes, flags = 0)
... boot blocks here, until
pressing some keys
gnome-session-b-327 49.32140: getrandom(16 bytes, flags = 0)
... + 3 similar instances
gnome-shell-335 49.553594: getrandom(8 bytes, flags = 1)
gnome-shell-335 49.553600: getrandom(8 bytes, flags = 1)
... + 10 similar instances
Xwayland-345 50.129401: getrandom(8 bytes, flags = 1)
Xwayland-345 50.129491: getrandom(8 bytes, flags = 1)
... + 9 similar instances
gnome-shell-335 50.487543: getrandom(8 bytes, flags = 1)
gnome-shell-335 50.487550: getrandom(8 bytes, flags = 1)
... + 79 similar instances
gsd-xsettings-390 51.431638: getrandom(8 bytes, flags = 1)
gsd-clipboard-389 51.432693: getrandom(8 bytes, flags = 1)
gsd-xsettings-390 51.433899: getrandom(8 bytes, flags = 1)
gsd-smartcard-388 51.433924: getrandom(110 bytes, flags = 0)
gsd-smartcard-388 51.433936: getrandom(256 bytes, flags = 0)
... + 3 similar instances
And it goes on, including processes like gsd-power-, gsd-xsettings-,
gsd-clipboard-, gsd-print-notif, gsd-clipboard-, gsd-color,
gst-keyboard-, etc.
What's the boundary of "too early" here? It's kinda undefinable..
> People have suggested adding a new getrandom flag, GRND_I_KNOW_THIS_IS_INSECURE,
> or some such, which wouldn't block and would return "best efforts"
> randomness. I haven't been super enthusiastic about such a flag
> because I *know* it would be abused. However, the next time a massive
> security bug shows up on the front pages of the Wall Street Journal,
> or on some web site such as https://factorable.net, it won't be the kernel's fault
> since the flag will be GRND_INSECURE_BROKEN_APPLICATION, or some such.
> It doesn't really solve the problem, though.
>
At least for generating the MIT cookie, it would make some sort of
sense... Really caring about truly random numbers while using Xorg
is almost like perfecting a hard-metal door for a paper house ;)
(Jokes aside, I understand that this cannot be the solution)
> > But this does show that
> >
> > (a) we have the same issue in the kernel, and we don't block there
>
> Ultimately, I think the only right answer is to make it the
> bootloader's responsibility to get us some decent entropy at boot
> time.
Just 8 days ago, systemd v243 was released, with systemd-random-seed(8)
now supporting *crediting* the entropy while loading the random seed:
https://systemd.io/RANDOM_SEEDS
systemd-random-seed does something similar to what OpenBSD does, by
preserving the seed across reboots at /var/lib/systemd/random-seed.
This is not enabled by default though. Will distributions enable it by
default in the future? I have no idea ¯\_(ツ)_/¯
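For reference, "crediting" the seed means going through the
RNDADDENTROPY ioctl instead of just writing the bytes to /dev/urandom,
so the CRNG initialization counter actually advances. A minimal sketch
of that mechanism (not systemd's actual code; crediting the full
len * 8 bits is a policy assumption):

        #include <stdlib.h>
        #include <string.h>
        #include <fcntl.h>
        #include <unistd.h>
        #include <sys/ioctl.h>
        #include <linux/random.h>

        /* Mix in a saved seed *and* credit it as entropy (needs CAP_SYS_ADMIN). */
        static int credit_seed(const unsigned char *seed, size_t len)
        {
                int fd, ret;

                /* rand_pool_info is a variable-sized header plus payload. */
                struct rand_pool_info *info = malloc(sizeof(*info) + len);
                if (!info)
                        return -1;
                info->entropy_count = (int)(len * 8);  /* credited entropy, in bits */
                info->buf_size = (int)len;
                memcpy(info->buf, seed, len);

                fd = open("/dev/urandom", O_WRONLY);
                ret = (fd < 0) ? -1 : ioctl(fd, RNDADDENTROPY, info);
                if (fd >= 0)
                        close(fd);
                free(info);
                return ret;
        }

Without the ioctl (i.e. just write()ing the seed to /dev/urandom), the
bytes are mixed in but not credited, so crng init would still wait for
interrupt-derived entropy.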
> There are patches to allow ARM systems to pass in entropy via
> the device tree. And in theory (assuming you trust the UEFI BIOS ---
> stop laughing in the back!) we can use that to get entropy, which will
> solve the problem for UEFI boot systems.
Hmmmm ...
> I've been talking to Ron
> Minnich about trying to get this support into the NERF bootloader, at
> which point new servers from the Open Compute Project will have a
> solution as well. (We can probably also get solutions for Chrome OS
> devices, since those have TPM-like which are trusted to have a
> comptently engineered hardware RNG --- I'm not sure I would trust all
> TPM devices in commodity hardware, but again, at least we can shift
> blame off of the kernel. :-P)
>
> Still, these are all point solutions, and don't really solve the
> problem on older systems, or non-x86 systems.
>
For non-x86 _embedded_ systems at least, usually the BSP provider
enables the necessary hwrng driver and credits its entropy;
e.g. 62f95ae805fa (hwrng: omap - Set default quality).
> > (b) initializing the crng really can be a timing problem
> >
> > The interrupt thing is only going to get worse as disks turn into
> > ssd's and some of them end up using polling rather than interrupts..
> > So we're likely to see _fewer_ interrupts in the future, not more.
>
> Yeah, agreed. Maybe we should have an "insecure_randomness" boot
> option which blindly forces the CRNG to be initialized at boot, so
> that at least people can get to a command line, if insecurely? I
> don't have any good ideas about how to solve this problem in general.
> :-( :-( :-(
>
> - Ted
Yeah, this is a hard engineering problem. You've earlier summarized it
perfectly here:
https://lore.kernel.org/r/[email protected]
I guess, to summarize earlier e-mails, a nice path would be:
1. Cutting down the number of bits needed to initialize the CRNG
to 256 bits (CHACHA20 cipher)
2. Complaining loudly when getrandom() is used while the CRNG is
not yet initialized.
3. Hopefully #2 will force distributions to act: either trusting
RDRAND when it's sane, configuring systemd-random-seed(8) to
credit the entropy by default, etc.
Thanks!
--
darwi
http://darwish.chasingpointers.com
On Thu, Sep 12, 2019 at 05:44:21AM +0200, Ahmed S. Darwish wrote:
> > People have suggested adding a new getrandom flag, GRND_I_KNOW_THIS_IS_INSECURE,
> > or some such, which wouldn't block and would return "best efforts"
> > randomness. I haven't been super enthusiastic about such a flag
> > because I *know* it would be abused. However, the next time a massive
> > security bug shows up on the front pages of the Wall Street Journal,
> > or on some web site such as https://factorable.net, it won't be the kernel's fault
> > since the flag will be GRND_INSECURE_BROKEN_APPLICATION, or some such.
> > It doesn't really solve the problem, though.
Hmm, one thought might be GRND_FAILSAFE, which will wait up to two
minutes before returning "best efforts" randomness and issuing a huge
massive warning if it is triggered?
> At least for generating the MIT cookie, it would make some sort of
> sense... Really caring about truly random numbers while using Xorg
> is almost like perfecting a hard-metal door for a paper house ;)
For the MIT Magic Cookie, it might as well use GRND_NONBLOCK, and if
it fails due to randomness being not available, it should just fall
back to random_r(3). Or heck, just use random_r(3) all the time,
since it's not at all secure anyway....
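Spelled out, the suggested pattern looks roughly like this (a sketch
only; get_cookie_bytes() is a made-up helper, and it falls back to
/dev/urandom, the variant discussed elsewhere in this thread, rather
than to random_r(3)):

        #include <errno.h>
        #include <fcntl.h>
        #include <unistd.h>
        #include <sys/types.h>
        #include <sys/random.h>

        static int get_cookie_bytes(unsigned char *buf, size_t len)
        {
                ssize_t n = getrandom(buf, len, GRND_NONBLOCK);
                if (n == (ssize_t)len)
                        return 0;       /* CRNG ready: best case */
                if (n < 0 && errno != EAGAIN)
                        return -1;      /* real error, not "pool not ready" */

                /* CRNG not initialized yet: /dev/urandom never blocks. */
                int fd = open("/dev/urandom", O_RDONLY);
                if (fd < 0)
                        return -1;
                n = read(fd, buf, len);
                close(fd);
                return n == (ssize_t)len ? 0 : -1;
        }

GRND_NONBLOCK makes the "pool not ready" case an explicit EAGAIN
instead of an unbounded wait.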
> Just 8 days ago, systemd v243 was released, with systemd-random-seed(8)
> now supporting *crediting* the entropy while loading the random seed:
>
> https://systemd.io/RANDOM_SEEDS
>
> systemd-random-seed does something similar to what OpenBSD does, by
> preserving the seed across reboots at /var/lib/systemd/random-seed.
That makes it systemd's responsibility to properly manage the random
seed file, and if the random seed file gets imaged, or if it gets read
while the system is off, that's on systemd.... which is fine.
The real problem here is that we're trying to engineer a system which
makes it safe for real cryptographic systems, but there's no way to
distinguish between real cryptographic systems where proper entropy is
critical and pretend security systems like X.org's MIT Magic Cookie
--- or python trying to get random numbers seeding its dictionary hash
tables to avoid DOS attacks when python is used for CGI scripts ---
but guess what happens when python is used for systemd generator
scripts in early boot.... before the random seed file might even be
mounted? In that case, python reverted to using /dev/urandom, which
was probably the right choice --- it didn't *need* to use getrandom.
> 1. Cutting down the number of bits needed to initialize the CRNG
> to 256 bits (CHACHA20 cipher)
Does the attached patch (see below) help?
> 2. Complaining loudly when getrandom() is used while the CRNG is
> not yet initialized.
A kernel printk will make it easier for people to understand why their
system is hung, in any case --- and which process is to blame. So
that's definitely a good thing.
- Ted
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 5d5ea4ce1442..b9b3a5a82abf 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -500,7 +500,7 @@ static int crng_init = 0;
#define crng_ready() (likely(crng_init > 1))
static int crng_init_cnt = 0;
static unsigned long crng_global_init_time = 0;
-#define CRNG_INIT_CNT_THRESH (2*CHACHA_KEY_SIZE)
+#define CRNG_INIT_CNT_THRESH CHACHA_KEY_SIZE
static void _extract_crng(struct crng_state *crng, __u8 out[CHACHA_BLOCK_SIZE]);
static void _crng_backtrack_protect(struct crng_state *crng,
__u8 tmp[CHACHA_BLOCK_SIZE], int used);
On Thu, Sep 12, 2019 at 9:25 AM Theodore Y. Ts'o <[email protected]> wrote:
>
> Hmm, one thought might be GRND_FAILSAFE, which will wait up to two
> minutes before returning "best efforts" randomness and issuing a huge
> massive warning if it is triggered?
Yeah, based on (by now) _years_ of experience with people mis-using
"get me random numbers", I think the sense of a new flag needs to be
"yeah, I'm willing to wait for it".
Because most people just don't want to wait for it, and most people
don't think about it, and we need to make the default be for that
"don't think about it" crowd, with the people who ask for randomness
sources for a secure key having to very clearly and very explicitly
say "Yes, I understand that this can take minutes and can only be done
long after boot".
Even then people will screw that up because they copy code, or some
less than gifted rodent writes a library and decides "my library is so
important that I need that waiting sooper-sekrit-secure random
number", and then people use that broken library by mistake without
realizing that it's not going to be reliable at boot time.
An alternative might be to make getrandom() just return an error
instead of waiting. Sure, fill the buffer with "as random as we can"
stuff, but then return -EINVAL because you called us too early.
Linus
On Thu, Sep 12, 2019 at 04:25:30AM -0400, Theodore Y. Ts'o wrote:
> On Thu, Sep 12, 2019 at 05:44:21AM +0200, Ahmed S. Darwish wrote:
[...]
>
> > 1. Cutting down the number of bits needed to initialize the CRNG
> > to 256 bits (CHACHA20 cipher)
>
> Does the attached patch (see below) help?
>
[...]
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 5d5ea4ce1442..b9b3a5a82abf 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -500,7 +500,7 @@ static int crng_init = 0;
> #define crng_ready() (likely(crng_init > 1))
> static int crng_init_cnt = 0;
> static unsigned long crng_global_init_time = 0;
> -#define CRNG_INIT_CNT_THRESH (2*CHACHA_KEY_SIZE)
> +#define CRNG_INIT_CNT_THRESH CHACHA_KEY_SIZE
> static void _extract_crng(struct crng_state *crng, __u8 out[CHACHA_BLOCK_SIZE]);
> static void _crng_backtrack_protect(struct crng_state *crng,
> __u8 tmp[CHACHA_BLOCK_SIZE], int used);
Unfortunately, it only made the early fast init faster, but didn't fix
the normal crng init blockage :-(
Here's a trace log, got by applying the patch at [1]. The boot was
continued only after typing some random keys after ~30s:
#
# entries-in-buffer/entries-written: 22/22 #P:8
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<idle>-0 [001] dNh. 0.687088: crng_fast_load: crng threshold = 32
<idle>-0 [001] dNh. 0.687089: crng_fast_load: crng_init_cnt = 0
<idle>-0 [001] dNh. 0.687090: crng_fast_load: crng_init_cnt, now set to 16
<idle>-0 [001] dNh. 0.705208: crng_fast_load: crng threshold = 32
<idle>-0 [001] dNh. 0.705209: crng_fast_load: crng_init_cnt = 16
<idle>-0 [001] dNh. 0.705209: crng_fast_load: crng_init_cnt, now set to 32
<idle>-0 [001] dNh. 0.708048: crng_fast_load: random: fast init done
lvm-165 [001] d... 2.417971: urandom_read: random: crng_init_cnt, now set to 0
systemd-random--179 [003] .... 2.495669: wait_for_random_bytes.part.0: wait for randomness
dbus-daemon-274 [006] dN.. 3.294331: urandom_read: random: crng_init_cnt, now set to 0
dbus-daemon-274 [006] dN.. 3.316618: urandom_read: random: crng_init_cnt, now set to 0
polkitd-286 [007] dN.. 3.873918: urandom_read: random: crng_init_cnt, now set to 0
polkitd-286 [007] dN.. 3.874303: urandom_read: random: crng_init_cnt, now set to 0
polkitd-286 [007] dN.. 3.874375: urandom_read: random: crng_init_cnt, now set to 0
polkitd-286 [007] d... 3.886204: urandom_read: random: crng_init_cnt, now set to 0
polkitd-286 [007] d... 3.886217: urandom_read: random: crng_init_cnt, now set to 0
polkitd-286 [007] d... 3.888519: urandom_read: random: crng_init_cnt, now set to 0
polkitd-286 [007] d... 3.888529: urandom_read: random: crng_init_cnt, now set to 0
gnome-session-b-321 [006] .... 4.292034: wait_for_random_bytes.part.0: wait for randomness
<idle>-0 [002] dNh. 36.784001: crng_reseed: random: crng init done
gnome-session-b-321 [006] .... 36.784019: wait_for_random_bytes.part.0: wait done
systemd-random--179 [003] .... 36.784051: wait_for_random_bytes.part.0: wait done
[1] patch:
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 5d5ea4ce1442..4a50ee2c230d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -500,7 +500,7 @@ static int crng_init = 0;
#define crng_ready() (likely(crng_init > 1))
static int crng_init_cnt = 0;
static unsigned long crng_global_init_time = 0;
-#define CRNG_INIT_CNT_THRESH (2*CHACHA_KEY_SIZE)
+#define CRNG_INIT_CNT_THRESH (CHACHA_KEY_SIZE)
static void _extract_crng(struct crng_state *crng, __u8 out[CHACHA_BLOCK_SIZE]);
static void _crng_backtrack_protect(struct crng_state *crng,
__u8 tmp[CHACHA_BLOCK_SIZE], int used);
@@ -931,6 +931,9 @@ static int crng_fast_load(const char *cp, size_t len)
unsigned long flags;
char *p;
+ trace_printk("crng threshold = %d\n", CRNG_INIT_CNT_THRESH);
+ trace_printk("crng_init_cnt = %d\n", crng_init_cnt);
+
if (!spin_trylock_irqsave(&primary_crng.lock, flags))
return 0;
if (crng_init != 0) {
@@ -943,11 +946,15 @@ static int crng_fast_load(const char *cp, size_t len)
cp++; crng_init_cnt++; len--;
}
spin_unlock_irqrestore(&primary_crng.lock, flags);
+
+ trace_printk("crng_init_cnt, now set to %d\n", crng_init_cnt);
+
if (crng_init_cnt >= CRNG_INIT_CNT_THRESH) {
invalidate_batched_entropy();
crng_init = 1;
wake_up_interruptible(&crng_init_wait);
pr_notice("random: fast init done\n");
+ trace_printk("random: fast init done\n");
}
return 1;
}
@@ -1033,6 +1040,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
process_random_ready_list();
wake_up_interruptible(&crng_init_wait);
pr_notice("random: crng init done\n");
+ trace_printk("random: crng init done\n");
if (unseeded_warning.missed) {
pr_notice("random: %d get_random_xx warning(s) missed "
"due to ratelimiting\n",
@@ -1743,9 +1751,16 @@ EXPORT_SYMBOL(get_random_bytes);
*/
int wait_for_random_bytes(void)
{
+ int ret;
+
if (likely(crng_ready()))
return 0;
- return wait_event_interruptible(crng_init_wait, crng_ready());
+
+ trace_printk("wait for randomness\n");
+ ret = wait_event_interruptible(crng_init_wait, crng_ready());
+ trace_printk("wait done\n");
+
+ return ret;
}
EXPORT_SYMBOL(wait_for_random_bytes);
@@ -1974,6 +1989,8 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
current->comm, nbytes);
spin_lock_irqsave(&primary_crng.lock, flags);
crng_init_cnt = 0;
+ trace_printk("random: crng_init_cnt, now set to %d\n",
+ crng_init_cnt);
spin_unlock_irqrestore(&primary_crng.lock, flags);
}
nbytes = min_t(size_t, nbytes, INT_MAX >> (ENTROPY_SHIFT + 3));
thanks,
--
darwi
http://darwish.chasingpointers.com
getrandom() has been created as a new and more secure interface for
pseudorandom data requests. Unlike /dev/urandom, it unconditionally
blocks until the entropy pool has been properly initialized.
While getrandom() has no guaranteed upper bound for its waiting time,
user-space has been abusing it by issuing the syscall, from shared
libraries no less, during the main system boot sequence.
Thus, on certain setups where there is no hwrng (embedded), or the
hwrng is not trusted by some users (intel RDRAND), or sometimes it's
just broken (amd RDRAND), the system boot can be *reliably* blocked.
The issue is further exacerbated by recent file-system optimizations,
e.g. b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), which
merges the directory lookup code's inode table IO, and thus minimizes the
number of disk interrupts and entropy during boot. After that commit,
a blocked boot can be reliably reproduced on a Thinkpad E480 laptop
with standard ArchLinux user-space.
Thus, don't trust user-space on calling getrandom() from the right
context. Just never block, and return -EINVAL if entropy is not yet
available.
Link: https://lkml.kernel.org/r/CAHk-=wjyH910+JRBdZf_Y9G54c1M=LBF8NKXB6vJcm9XjLnRfg@mail.gmail.com
Link: https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Ahmed S. Darwish <[email protected]>
---
Notes:
This feels very risky at the very end of -rc8, so only sending
this as an RFC. The system of course reliably boots with this,
and the log, as expected, powerfully warns all callers:
$ dmesg | grep random
[0.236472] random: get_random_bytes called from start_kernel+0x30f/0x4d7 with crng_init=0
[0.680263] random: fast init done
[2.500346] random: lvm: uninitialized urandom read (4 bytes read)
[2.595125] random: systemd-random-: invalid getrandom request (512 bytes): crng not ready
[2.595126] random: systemd-random-: uninitialized urandom read (512 bytes read)
[3.427699] random: dbus-daemon: uninitialized urandom read (12 bytes read)
[3.979425] urandom_read: 1 callbacks suppressed
[3.979426] random: polkitd: uninitialized urandom read (8 bytes read)
[3.979726] random: polkitd: uninitialized urandom read (8 bytes read)
[3.979752] random: polkitd: uninitialized urandom read (8 bytes read)
[4.473398] random: gnome-session-b: invalid getrandom request (16 bytes): crng not ready
[4.473404] random: gnome-session-b: invalid getrandom request (16 bytes): crng not ready
[4.473409] random: gnome-session-b: invalid getrandom request (16 bytes): crng not ready
[5.265636] random: crng init done
[5.265649] random: 3 urandom warning(s) missed due to ratelimiting
[5.265652] random: 1 getrandom warning(s) missed due to ratelimiting
drivers/char/random.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 4a50ee2c230d..309dc5ddf370 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -511,6 +511,8 @@ static struct ratelimit_state unseeded_warning =
RATELIMIT_STATE_INIT("warn_unseeded_randomness", HZ, 3);
static struct ratelimit_state urandom_warning =
RATELIMIT_STATE_INIT("warn_urandom_randomness", HZ, 3);
+static struct ratelimit_state getrandom_warning =
+ RATELIMIT_STATE_INIT("warn_getrandom_notavail", HZ, 3);
static int ratelimit_disable __read_mostly;
@@ -1053,6 +1055,12 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
urandom_warning.missed);
urandom_warning.missed = 0;
}
+ if (getrandom_warning.missed) {
+ pr_notice("random: %d getrandom warning(s) missed "
+ "due to ratelimiting\n",
+ getrandom_warning.missed);
+ getrandom_warning.missed = 0;
+ }
}
}
@@ -1915,6 +1923,7 @@ int __init rand_initialize(void)
crng_global_init_time = jiffies;
if (ratelimit_disable) {
urandom_warning.interval = 0;
+ getrandom_warning.interval = 0;
unseeded_warning.interval = 0;
}
return 0;
@@ -2138,8 +2147,6 @@ const struct file_operations urandom_fops = {
SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
unsigned int, flags)
{
- int ret;
-
if (flags & ~(GRND_NONBLOCK|GRND_RANDOM))
return -EINVAL;
@@ -2152,9 +2159,13 @@ SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
if (!crng_ready()) {
if (flags & GRND_NONBLOCK)
return -EAGAIN;
- ret = wait_for_random_bytes();
- if (unlikely(ret))
- return ret;
+
+ if (__ratelimit(&getrandom_warning))
+ pr_notice("random: %s: invalid getrandom request "
+ "(%zd bytes): crng not ready",
+ current->comm, count);
+
+ return -EINVAL;
}
return urandom_read(NULL, buf, count, NULL);
}
--
2.23.0
(resending without HTML this time, sorry for the duplicate)
On 14.09.2019 at 17:25, Ahmed S. Darwish wrote:
> getrandom() has been created as a new and more secure interface for
> pseudorandom data requests. Unlike /dev/urandom, it unconditionally
> blocks until the entropy pool has been properly initialized.
>
> While getrandom() has no guaranteed upper bound for its waiting time,
> user-space has been abusing it by issuing the syscall, from shared
> libraries no less, during the main system boot sequence.
>
> Thus, on certain setups where there is no hwrng (embedded), or the
> hwrng is not trusted by some users (intel RDRAND), or sometimes it's
> just broken (amd RDRAND), the system boot can be *reliably* blocked.
>
> The issue is further exacerbated by recent file-system optimizations,
> e.g. b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), which
> merges the directory lookup code's inode table IO, and thus minimizes the
> number of disk interrupts and entropy during boot. After that commit,
> a blocked boot can be reliably reproduced on a Thinkpad E480 laptop
> with standard ArchLinux user-space.
>
> Thus, don't trust user-space on calling getrandom() from the right
> context. Just never block, and return -EINVAL if entropy is not yet
> available.
>
> Link: https://lkml.kernel.org/r/CAHk-=wjyH910+JRBdZf_Y9G54c1M=LBF8NKXB6vJcm9XjLnRfg@mail.gmail.com
> Link: https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc
> Link: https://lkml.kernel.org/r/[email protected]
> Link: https://lkml.kernel.org/r/[email protected]
Let me reword the commit message for a hopefully better historical
perspective.
===
getrandom() has been created as a new and more secure interface for
pseudorandom data requests. It attempted to solve two problems, as
compared to /dev/{u,}random: the need to open a file descriptor (which
can fail) and the possibility of getting not-so-random data from the
incompletely initialized entropy pool. It has succeeded in the first
improvement, but failed horribly in the second one: it blocks until the
entropy pool has been properly initialized, if called without
GRND_NONBLOCK, while none of these behaviors are suitable for the early
boot stage.
The issue is further exacerbated by recent file-system optimizations,
e.g. b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), which merges
the directory lookup code's inode table IO, and thus minimizes the number of
disk interrupts and entropy during boot. After that commit, a blocked
boot can be reliably reproduced on a Thinkpad E480 laptop with standard
ArchLinux user-space.
Thus, on certain setups where there is no hwrng (embedded systems or
non-KVM virtual machines), or the hwrng is not trusted by some users
(intel RDRAND), or sometimes it's just broken (amd RDRAND), the system
boot can be *reliably* blocked. It can therefore be argued that there is
no way to use getrandom() on Linux correctly, especially from shared
libraries: GRND_NONBLOCK has to be used, and a fallback to some other
interface like /dev/urandom is required, thus making the net result no
better than just using /dev/urandom unconditionally.
While getrandom() has no guaranteed upper bound for its waiting time,
user-space has been using it incorrectly by issuing the syscall, from
shared libraries no less, during the main system boot sequence, without
GRND_NONBLOCK.
We can't trust user-space on calling getrandom() from the right context.
Therefore, just never block, and return -EINVAL (with some entropy still
in the buffer) if the requested amount of entropy is not yet available.
Link:
https://github.com/openbsd/src/commit/edb2eeb7da8494998d0073f8aaeb8478cee5e00b
Link:
https://lkml.kernel.org/r/CAHk-=wjyH910+JRBdZf_Y9G54c1M=LBF8NKXB6vJcm9XjLnRfg@mail.gmail.com
Link: https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
===
That said, I have an issue with the -EINVAL return code here: it is also
returned in cases where the parameters passed are genuinely not
understood by the kernel, and no entropy has been written to the buffer.
Therefore, the caller has to assume that the call has failed, waste all
the bytes in the buffer, and try some fallback strategy. Can we think of
some other error code?
The other part of me thinks that triggering a fallback, by returning an
error code, is never the right thing to do. If the "uninitialized" state
exists at all, applications and libraries have to care (and I would
expect their authors who don't pass GRND_RANDOM to just fall back to
/dev/urandom). Therefore, we are back to square one, except that the
fallback code in the application is something that is only rarely
exercised, and thus has higher chances to accumulate bugs. Because the
only expected/reasonable fallback is to read from /dev/urandom, the
whole result looks like shifting the responsibility/blame without
achieving anything useful. As the issue is not really solvable, just
give the application not-so-random data, as /dev/urandom does, without
any indication - this would at least keep the benefit of not needing a
file descriptor. It is simply not possible to do anything better without
eliminating the userspace-visible "uninitialized" crng state, e.g. with
the help of entropy input from the boot loader, or a config or
command-line option to trust the in-kernel jitter entropy.
>
> Suggested-by: Linus Torvalds <[email protected]>
> Signed-off-by: Ahmed S. Darwish <[email protected]>
> ---
>
> Notes:
> This feels very risky at the very end of -rc8, so only sending
> this as an RFC. The system of course reliably boots with this,
> and the log, as expected, powerfully warns all callers:
>
> $ dmesg | grep random
> [0.236472] random: get_random_bytes called from start_kernel+0x30f/0x4d7 with crng_init=0
> [0.680263] random: fast init done
> [2.500346] random: lvm: uninitialized urandom read (4 bytes read)
> [2.595125] random: systemd-random-: invalid getrandom request (512 bytes): crng not ready
> [2.595126] random: systemd-random-: uninitialized urandom read (512 bytes read)
> [3.427699] random: dbus-daemon: uninitialized urandom read (12 bytes read)
> [3.979425] urandom_read: 1 callbacks suppressed
> [3.979426] random: polkitd: uninitialized urandom read (8 bytes read)
> [3.979726] random: polkitd: uninitialized urandom read (8 bytes read)
> [3.979752] random: polkitd: uninitialized urandom read (8 bytes read)
> [4.473398] random: gnome-session-b: invalid getrandom request (16 bytes): crng not ready
> [4.473404] random: gnome-session-b: invalid getrandom request (16 bytes): crng not ready
> [4.473409] random: gnome-session-b: invalid getrandom request (16 bytes): crng not ready
> [5.265636] random: crng init done
> [5.265649] random: 3 urandom warning(s) missed due to ratelimiting
> [5.265652] random: 1 getrandom warning(s) missed due to ratelimiting
>
> drivers/char/random.c | 21 ++++++++++++++++-----
> 1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 4a50ee2c230d..309dc5ddf370 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -511,6 +511,8 @@ static struct ratelimit_state unseeded_warning =
> RATELIMIT_STATE_INIT("warn_unseeded_randomness", HZ, 3);
> static struct ratelimit_state urandom_warning =
> RATELIMIT_STATE_INIT("warn_urandom_randomness", HZ, 3);
> +static struct ratelimit_state getrandom_warning =
> + RATELIMIT_STATE_INIT("warn_getrandom_notavail", HZ, 3);
>
> static int ratelimit_disable __read_mostly;
>
> @@ -1053,6 +1055,12 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
> urandom_warning.missed);
> urandom_warning.missed = 0;
> }
> + if (getrandom_warning.missed) {
> + pr_notice("random: %d getrandom warning(s) missed "
> + "due to ratelimiting\n",
> + getrandom_warning.missed);
> + getrandom_warning.missed = 0;
> + }
> }
> }
>
> @@ -1915,6 +1923,7 @@ int __init rand_initialize(void)
> crng_global_init_time = jiffies;
> if (ratelimit_disable) {
> urandom_warning.interval = 0;
> + getrandom_warning.interval = 0;
> unseeded_warning.interval = 0;
> }
> return 0;
> @@ -2138,8 +2147,6 @@ const struct file_operations urandom_fops = {
> SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
> unsigned int, flags)
> {
> - int ret;
> -
> if (flags & ~(GRND_NONBLOCK|GRND_RANDOM))
> return -EINVAL;
>
> @@ -2152,9 +2159,13 @@ SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
> if (!crng_ready()) {
> if (flags & GRND_NONBLOCK)
> return -EAGAIN;
> - ret = wait_for_random_bytes();
> - if (unlikely(ret))
> - return ret;
> +
> + if (__ratelimit(&getrandom_warning))
> + pr_notice("random: %s: invalid getrandom request "
> + "(%zd bytes): crng not ready",
> + current->comm, count);
> +
> + return -EINVAL;
> }
> return urandom_read(NULL, buf, count, NULL);
> }
> --
> 2.23.0
>
--
Alexander E. Patrakov
On Thu, Sep 12, 2019 at 12:34:45PM +0100, Linus Torvalds wrote:
> On Thu, Sep 12, 2019 at 9:25 AM Theodore Y. Ts'o <[email protected]> wrote:
> >
> > Hmm, one thought might be GRND_FAILSAFE, which will wait up to two
> > minutes before returning "best efforts" randomness and issuing a huge
> > massive warning if it is triggered?
>
> Yeah, based on (by now) _years_ of experience with people mis-using
> "get me random numbers", I think the sense of a new flag needs to be
> "yeah, I'm willing to wait for it".
>
> Because most people just don't want to wait for it, and most people
> don't think about it, and we need to make the default be for that
> "don't think about it" crowd, with the people who ask for randomness
> sources for a secure key having to very clearly and very explicitly
> say "Yes, I understand that this can take minutes and can only be done
> long after boot".
>
> Even then people will screw that up because they copy code, or some
> less than gifted rodent writes a library and decides "my library is so
> important that I need that waiting sooper-sekrit-secure random
> number", and then people use that broken library by mistake without
> realizing that it's not going to be reliable at boot time.
>
> An alternative might be to make getrandom() just return an error
> instead of waiting. Sure, fill the buffer with "as random as we can"
> stuff, but then return -EINVAL because you called us too early.
>
ACK, that's probably _the_ most sensible approach. Only caveat is
the slight change in user-space API semantics though...
For example, this breaks the just released systemd-random-seed(8)
as it _explicitly_ requests blocking behavior from getrandom() here:
=> src/random-seed/random-seed.c:
/*
* Let's make this whole job asynchronous, i.e. let's make
* ourselves a barrier for proper initialization of the
* random pool.
*/
k = getrandom(buf, buf_size, GRND_NONBLOCK);
if (k < 0 && errno == EAGAIN && synchronous) {
log_notice("Kernel entropy pool is not initialized yet, "
"waiting until it is.");
k = getrandom(buf, buf_size, 0); /* retry synchronously */
}
if (k < 0) {
log_debug_errno(errno, "Failed to read random data with "
"getrandom(), falling back to "
"/dev/urandom: %m");
} else if ((size_t) k < buf_size) {
log_debug("Short read from getrandom(), falling back to "
"/dev/urandom: %m");
} else {
getrandom_worked = true;
}
Nonetheless, a slightly broken systemd-random-seed, released only
11 days ago (v243), is honestly much better than a *non-booting
system*...
I've sent an RFC patch at [1].
To handle the systemd case, I'll add the discussed "yeah, I'm
willing to wait for it" flag (GRND_BLOCK) in v2.
If this whole approach is going to be merged, and the slight ABI
breakage is to be tolerated (hmmmmm?), I wonder how systemd-random-seed
will handle the semantics change without doing ugly kernel version
checks...
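One possible answer, purely as a sketch: probe for the new flag and fall
back, instead of checking kernel versions. Note that GRND_BLOCK below is
only the flag proposed in this thread; it doesn't exist in any released
kernel, and the value used here is made up for illustration:

#include <sys/random.h>
#include <errno.h>

#ifndef GRND_BLOCK
#define GRND_BLOCK 0x0004	/* hypothetical value, illustration only */
#endif

/* Get seed material, waiting for the CRNG whichever way the kernel
 * supports. */
static ssize_t get_seed(void *buf, size_t len)
{
	/* New semantics: explicitly opt in to waiting for the CRNG. */
	ssize_t k = getrandom(buf, len, GRND_BLOCK);

	if (k >= 0)
		return k;

	/* Older kernels reject unknown flags with EINVAL; there,
	 * flags == 0 already means "wait until initialized". */
	if (errno == EINVAL)
		k = getrandom(buf, len, 0);

	return k;
}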
thanks,
[1] https://lkml.kernel.org/r/20190914122500.GA1425@darwi-home-pc
--
darwi
http://darwish.chasingpointers.com
On Sat, Sep 14, 2019 at 11:25:09AM +0200, Ahmed S. Darwish wrote:
> Unfortunately, it only made the early fast init faster, but didn't fix
> the normal crng init blockage :-(
Yeah, I see why; the original goal was to do the fast init so that
using /dev/urandom, even before we were fully initialized, wouldn't be
deadly. But then we still wanted 128 bits of estimated entropy the
old fashioned way before we declare the CRNG initialized.
There are a bunch of things that I think I want to do long-term, such
as making CONFIG_RANDOM_TRUST_CPU the default, trying to get random
entropy from the bootloader, etc. But none of this is something we
should do in a hurry, especially this close before 5.4 drops. So I
think I want to fix things this way, which is a bit of a hack, but I
think it's better than simply reverting commit b03755ad6f33.
Ahmed, Linus, what do you think?
- Ted
From f1a111bff3b996258410e51a3760fc39bbd7058f Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <[email protected]>
Date: Sat, 14 Sep 2019 12:21:39 -0400
Subject: [PATCH] ext4: don't plug in __ext4_get_inode_loc if the CRNG is not
initialized
Unfortunately, commit b03755ad6f33 ("ext4: make __ext4_get_inode_loc
plug") is so effective that, on some systems where RDRAND is not
trusted and the GNOME display manager uses getrandom(2) in early boot
to get randomness for an MIT Magic Cookie (which isn't really secure,
so using getrandom(2) is a bit of a waste), the boot hangs; this was
observed on an Arch system.
Since this is causing problems, although arguably this is userspace's
fault, let's not do it if the CRNG is not yet initialized. This is
better than trying to tweak the random number generator right before
5.4 is released (I'm afraid we'll accidentally make it _too_ weak),
and it's also better than simply completely reverting b03755ad6f33.
We're effectively reverting it while the RNG is not yet initialized,
to slow down the boot and make it less efficient, just to work around
broken init setups.
Fixes: b03755ad6f33 ("ext4: make __ext4_get_inode_loc plug")
Signed-off-by: Theodore Ts'o <[email protected]>
---
fs/ext4/inode.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 4e271b509af1..41ad93f11b6d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4534,6 +4534,7 @@ static int __ext4_get_inode_loc(struct inode *inode,
struct buffer_head *bh;
struct super_block *sb = inode->i_sb;
ext4_fsblk_t block;
+ int be_inefficient = !rng_is_initialized();
struct blk_plug plug;
int inodes_per_block, inode_offset;
@@ -4541,7 +4542,6 @@ static int __ext4_get_inode_loc(struct inode *inode,
if (inode->i_ino < EXT4_ROOT_INO ||
inode->i_ino > le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count))
return -EFSCORRUPTED;
-
iloc->block_group = (inode->i_ino - 1) / EXT4_INODES_PER_GROUP(sb);
gdp = ext4_get_group_desc(sb, iloc->block_group, NULL);
if (!gdp)
@@ -4623,7 +4623,8 @@ static int __ext4_get_inode_loc(struct inode *inode,
* If we need to do any I/O, try to pre-readahead extra
* blocks from the inode table.
*/
- blk_start_plug(&plug);
+ if (likely(!be_inefficient))
+ blk_start_plug(&plug);
if (EXT4_SB(sb)->s_inode_readahead_blks) {
ext4_fsblk_t b, end, table;
unsigned num;
@@ -4654,7 +4655,8 @@ static int __ext4_get_inode_loc(struct inode *inode,
get_bh(bh);
bh->b_end_io = end_buffer_read_sync;
submit_bh(REQ_OP_READ, REQ_META | REQ_PRIO, bh);
- blk_finish_plug(&plug);
+ if (likely(!be_inefficient))
+ blk_finish_plug(&plug);
wait_on_buffer(bh);
if (!buffer_uptodate(bh)) {
EXT4_ERROR_INODE_BLOCK(inode, block,
--
2.23.0
On Sat, Sep 14, 2019 at 8:02 AM Ahmed S. Darwish <[email protected]> wrote:
>
> On Thu, Sep 12, 2019 at 12:34:45PM +0100, Linus Torvalds wrote:
> >
> > An alternative might be to make getrandom() just return an error
> > instead of waiting. Sure, fill the buffer with "as random as we can"
> > stuff, but then return -EINVAL because you called us too early.
>
> ACK, that's probably _the_ most sensible approach. Only caveat is
> the slight change in user-space API semantics though...
>
> For example, this breaks the just released systemd-random-seed(8)
> as it _explicitly_ requests blocking behavior from getrandom() here:
>
Actually, I would argue that the "don't ever block, instead fill
buffer and return error instead" fixes this broken case.
> => src/random-seed/random-seed.c:
> /*
> * Let's make this whole job asynchronous, i.e. let's make
> * ourselves a barrier for proper initialization of the
> * random pool.
> */
> k = getrandom(buf, buf_size, GRND_NONBLOCK);
> if (k < 0 && errno == EAGAIN && synchronous) {
> log_notice("Kernel entropy pool is not initialized yet, "
> "waiting until it is.");
>
> k = getrandom(buf, buf_size, 0); /* retry synchronously */
> }
Yeah, the above is yet another example of completely broken garbage.
You can't just wait and block at boot. That is simply 100%
unacceptable, and always has been, exactly because that may
potentially mean waiting forever since you didn't do anything that
actually is likely to add any entropy.
> if (k < 0) {
> log_debug_errno(errno, "Failed to read random data with "
> "getrandom(), falling back to "
> "/dev/urandom: %m");
At least it gets a log message.
So I think the right thing to do is to just make getrandom() return
-EINVAL, and refuse to block.
As mentioned, this has already historically been a huge issue on
embedded devices, and with disks turning not just to NVMe but to
actual polling nvdimm/xpoint/flash, the amount of true "entropy"
randomness we can give at boot is very questionable.
We can (and will) continue to do a best-effort thing (including very
much using rdrand and friends), but the whole "wait for entropy"
simply *must* stop.
> I've sent an RFC patch at [1].
>
> [1] https://lkml.kernel.org/r/20190914122500.GA1425@darwi-home-pc
Looks reasonable to me. Except I'd just make it simpler and make it a
big WARN_ON_ONCE(), which is a lot harder to miss than pr_notice().
Make it clear that it is a *bug* if user space thinks it should wait
at boot time.
Also, we might even want to just fill the buffer and return 0 at that
point, to make sure that even more broken user space doesn't then try
to sleep manually and turn it into a "I'll wait myself" loop.
Linus
14.09.2019 21:30, Linus Torvalds wrote:
> On Sat, Sep 14, 2019 at 8:02 AM Ahmed S. Darwish <[email protected]> wrote:
>>
>> On Thu, Sep 12, 2019 at 12:34:45PM +0100, Linus Torvalds wrote:
>>>
>>> An alternative might be to make getrandom() just return an error
>>> instead of waiting. Sure, fill the buffer with "as random as we can"
>>> stuff, but then return -EINVAL because you called us too early.
>>
>> ACK, that's probably _the_ most sensible approach. Only caveat is
>> the slight change in user-space API semantics though...
>>
>> For example, this breaks the just released systemd-random-seed(8)
>> as it _explicitly_ requests blocking behavior from getrandom() here:
>>
>
> Actually, I would argue that the "don't ever block, instead fill
> buffer and return error instead" fixes this broken case.
>
>> => src/random-seed/random-seed.c:
>> /*
>> * Let's make this whole job asynchronous, i.e. let's make
>> * ourselves a barrier for proper initialization of the
>> * random pool.
>> */
>> k = getrandom(buf, buf_size, GRND_NONBLOCK);
>> if (k < 0 && errno == EAGAIN && synchronous) {
>> log_notice("Kernel entropy pool is not initialized yet, "
>> "waiting until it is.");
>>
>> k = getrandom(buf, buf_size, 0); /* retry synchronously */
>> }
>
> Yeah, the above is yet another example of completely broken garbage.
>
> You can't just wait and block at boot. That is simply 100%
> unacceptable, and always has been, exactly because that may
> potentially mean waiting forever since you didn't do anything that
> actually is likely to add any entropy.
>
>> if (k < 0) {
>> log_debug_errno(errno, "Failed to read random data with "
>> "getrandom(), falling back to "
>> "/dev/urandom: %m");
>
> At least it gets a log message.
>
> So I think the right thing to do is to just make getrandom() return
> -EINVAL, and refuse to block.
Let me repeat: not -EINVAL, please. Please find some other error code,
so that the application could sensibly distinguish between this case
(low quality entropy is in the buffer) and the "kernel is too dumb" case
(and no entropy is in the buffer).
--
Alexander E. Patrakov
On Sat, Sep 14, 2019 at 9:35 AM Alexander E. Patrakov
<[email protected]> wrote:
>
> Let me repeat: not -EINVAL, please. Please find some other error code,
> so that the application could sensibly distinguish between this case
> (low quality entropy is in the buffer) and the "kernel is too dumb" case
> (and no entropy is in the buffer).
I'm not convinced we want applications to see that difference.
The fact is, every time an application thinks it cares, it has caused
problems. I can just see systemd saying "ok, the kernel didn't block,
so I'll just do
while (getrandom(x) == -ENOENTROPY)
sleep(1);
instead. Which is still completely buggy garbage.
The fact is, we can't guarantee entropy in general. It's probably
there in practice, particularly with user space saving randomness from
last boot etc, but that kind of data may be real entropy, but the
kernel cannot *guarantee* that it is.
And people don't like us guaranteeing that rdrand/rdseed is "real
entropy" either, since they don't trust the CPU hw either.
Which means that we're all kinds of screwed. The whole "we guarantee
entropy" model is broken.
Linus
14.09.2019 21:52, Linus Torvalds wrote:
> On Sat, Sep 14, 2019 at 9:35 AM Alexander E. Patrakov
> <[email protected]> wrote:
>>
>> Let me repeat: not -EINVAL, please. Please find some other error code,
>> so that the application could sensibly distinguish between this case
>> (low quality entropy is in the buffer) and the "kernel is too dumb" case
>> (and no entropy is in the buffer).
>
> I'm not convinced we want applications to see that difference.
>
> The fact is, every time an application thinks it cares, it has caused
> problems. I can just see systemd saying "ok, the kernel didn't block,
> so I'll just do
>
> while (getrandom(x) == -ENOENTROPY)
> sleep(1);
>
> instead. Which is still completely buggy garbage.
OK, I understand this viewpoint. But then still, -EINVAL is not the
answer, because a hypothetical evil version of systemd will use -EINVAL
as -ENOENTROPY (with flags == 0 and a reasonable buffer size, there is
simply no other reason for the kernel to return -EINVAL). Yes I
understand that this is a complete reversal of my previous argument.
> The fact is, we can't guarantee entropy in general. It's probably
> there in practice, particularly with user space saving randomness from
> last boot etc, but that kind of data may be real entropy, but the
> kernel cannot *guarantee* that it is.
>
> And people don't like us guaranteeing that rdrand/rdseed is "real
> entropy" either, since they don't trust the CPU hw either.
>
> Which means that we're all kinds of screwed. The whole "we guarantee
> entropy" model is broken.
I agree here. Given that you suggested "to just fill the buffer and
return 0" in the previous mail (well, I think you really meant "return
buflen", otherwise ENOENTROPY == 0 and your previous objection applies),
let's do just that. As a bonus, it saves applications from the complex
dance with retrying via /dev/urandom and finally brings a reliable API
(modulo old and broken kernels) to get random numbers (well, as random
as possible right now) without needing a file descriptor.
--
Alexander E. Patrakov
On Sat, Sep 14, 2019 at 10:09 AM Alexander E. Patrakov
<[email protected]> wrote:
>
> > Which means that we're all kinds of screwed. The whole "we guarantee
> > entropy" model is broken.
>
> I agree here. Given that you suggested "to just fill the buffer and
> return 0" in the previous mail (well, I think you really meant "return
> buflen", otherwise ENOENTROPY == 0 and your previous objection applies),
Right.
The question remains when we should WARN_ON(), though.
For example, if somebody did save entropy between boots, we probably
should accept that - at least in the sense of not warning when they
then ask for randomness data back.
And if the hardware does have a functioning rdrand, we probably should
accept that too - simply because not accepting it and warning sounds a
bit too annoying.
But we definitely *should* have a warning for people who build
embedded devices that we can't see any reasonable amount of possible
entropy. Those have definitely happened, and it's a serious and real
security issue.
> let's do just that. As a bonus, it saves applications from the complex
> dance with retrying via /dev/urandom and finally brings a reliable API
> (modulo old and broken kernels) to get random numbers (well, as random
> as possible right now) without needing a file descriptor.
Yeah, well, the question in the end always is "what is reliable".
Waiting has definitely not been reliable, and has only ever caused problems.
Returning an error (or some status while still doing a best effort)
would be reasonable, but I really do think that people will mis-use
that. We just have too much of a history of people having the mindset
that they can just fall back to something better - like waiting - and
they are always wrong.
Just returning random data is the right thing, but we do need to make
sure that system developers see a warning if they do something
obviously wrong (so that the embedded people without even a real-time
clock to initialize any bits of entropy AT ALL won't think that they
can generate a system key on their router).
Linus
Hi,
On Sat, Sep 14, 2019 at 09:30:19AM -0700, Linus Torvalds wrote:
> On Sat, Sep 14, 2019 at 8:02 AM Ahmed S. Darwish <[email protected]> wrote:
> >
> > On Thu, Sep 12, 2019 at 12:34:45PM +0100, Linus Torvalds wrote:
> > >
> > > An alternative might be to make getrandom() just return an error
> > > instead of waiting. Sure, fill the buffer with "as random as we can"
> > > stuff, but then return -EINVAL because you called us too early.
> >
> > ACK, that's probably _the_ most sensible approach. Only caveat is
> > the slight change in user-space API semantics though...
> >
> > For example, this breaks the just released systemd-random-seed(8)
> > as it _explicitly_ requests blocking behavior from getrandom() here:
> >
>
> Actually, I would argue that the "don't ever block, instead fill
> buffer and return error instead" fixes this broken case.
>
> > => src/random-seed/random-seed.c:
> > /*
> > * Let's make this whole job asynchronous, i.e. let's make
> > * ourselves a barrier for proper initialization of the
> > * random pool.
> > */
> > k = getrandom(buf, buf_size, GRND_NONBLOCK);
> > if (k < 0 && errno == EAGAIN && synchronous) {
> > log_notice("Kernel entropy pool is not initialized yet, "
> > "waiting until it is.");
> >
> > k = getrandom(buf, buf_size, 0); /* retry synchronously */
> > }
>
> Yeah, the above is yet another example of completely broken garbage.
>
> You can't just wait and block at boot. That is simply 100%
> unacceptable, and always has been, exactly because that may
> potentially mean waiting forever since you didn't do anything that
> actually is likely to add any entropy.
>
ACK, the systemd commit which introduced that code also does:
=> 26ded5570994 (random-seed: rework systemd-random-seed.service..)
[...]
--- a/units/systemd-random-seed.service.in
+++ b/units/systemd-random-seed.service.in
@@ -22,4 +22,9 @@ Type=oneshot
RemainAfterExit=yes
ExecStart=@rootlibexecdir@/systemd-random-seed load
ExecStop=@rootlibexecdir@/systemd-random-seed save
-TimeoutSec=30s
+
+# This service waits until the kernel's entropy pool is
+# initialized, and may be used as ordering barrier for service
+# that require an initialized entropy pool. Since initialization
+# can take a while on entropy-starved systems, let's increase the
+# time-out substantially here.
+TimeoutSec=10min
This 10min wait thing is really broken... it's basically "forever".
> > if (k < 0) {
> > log_debug_errno(errno, "Failed to read random data with "
> > "getrandom(), falling back to "
> > "/dev/urandom: %m");
>
> At least it gets a log message.
>
> So I think the right thing to do is to just make getrandom() return
> -EINVAL, and refuse to block.
>
> As mentioned, this has already historically been a huge issue on
> embedded devices, and with disks turning not just to NVMe but to
> actual polling nvdimm/xpoint/flash, the amount of true "entropy"
> randomness we can give at boot is very questionable.
>
ACK.
Moreover, and as a result of all that, distributions are now officially
*duct-taping* the problem:
https://www.debian.org/releases/buster/amd64/release-notes/ch-information.en.html#entropy-starvation
5.1.4. Daemons fail to start or system appears to hang during boot
Due to systemd needing entropy during boot and the kernel treating
such calls as blocking when available entropy is low, the system
may hang for minutes to hours until the randomness subsystem is
sufficiently initialized (random: crng init done).
"the system may hang for minuts to hours"...
> We can (and will) continue to do a best-effort thing (including very
> much using rdrand and friends), but the whole "wait for entropy"
> simply *must* stop.
>
> > I've sent an RFC patch at [1].
> >
> > [1] https://lkml.kernel.org/r/20190914122500.GA1425@darwi-home-pc
>
> Looks reasonable to me. Except I'd just make it simpler and make it a
> big WARN_ON_ONCE(), which is a lot harder to miss than pr_notice().
> Make it clear that it is a *bug* if user space thinks it should wait
> at boot time.
>
> Also, we might even want to just fill the buffer and return 0 at that
> point, to make sure that even more broken user space doesn't then try
> to sleep manually and turn it into a "I'll wait myself" loop.
>
ACK, I'll send an RFC v2, returning buflen, and so on..
/me will enjoy the popcorn from all the to-be-reported WARN_ON()s
on distribution mailing lists ;-)
> Linus
thanks,
--
darwi
http://darwish.chasingpointers.com
Ahmed S. Darwish - 14.09.19, 23:11:26 CEST:
> > Yeah, the above is yet another example of completely broken garbage.
> >
> > You can't just wait and block at boot. That is simply 100%
> > unacceptable, and always has been, exactly because that may
> > potentially mean waiting forever since you didn't do anything that
> > actually is likely to add any entropy.
>
> ACK, the systemd commit which introduced that code also does:
>
> => 26ded5570994 (random-seed: rework systemd-random-seed.service..)
> [...]
> --- a/units/systemd-random-seed.service.in
> +++ b/units/systemd-random-seed.service.in
> @@ -22,4 +22,9 @@ Type=oneshot
> RemainAfterExit=yes
> ExecStart=@rootlibexecdir@/systemd-random-seed load
> ExecStop=@rootlibexecdir@/systemd-random-seed save
> -TimeoutSec=30s
> +
> +# This service waits until the kernel's entropy pool is
> +# initialized, and may be used as ordering barrier for service
> +# that require an initialized entropy pool. Since initialization
> +# can take a while on entropy-starved systems, let's increase the
> +# time-out substantially here.
> +TimeoutSec=10min
>
> This 10min wait thing is really broken... it's basically "forever".
I am so happy to use Sysvinit on my systems again. Depending on entropy
for just booting a machine is broken [1].
Of course regenerating SSH keys on boot, probably due to cloud-init
replacing the old key after a VM has been cloned from template, may
still be a challenge to handle well [2]. I'd probably replace SSH keys
in the background and restart the service then, but this may lead to
spurious man-in-the-middle warnings.
[1] Debian Buster release notes: 5.1.4. Daemons fail to start or system
appears to hang during boot
https://www.debian.org/releases/stable/amd64/release-notes/ch-information.en.html#entropy-starvation
[2] Openssh taking minutes to become available, booting takes half an
hour ... because your server waits for a few bytes of randomness
https://daniel-lange.com/archives/152-hello-buster.html
Thanks,
--
Martin
On Sat, Sep 14, 2019 at 11:11:26PM +0200, Ahmed S. Darwish wrote:
> > > I've sent an RFC patch at [1].
> > >
> > > [1] https://lkml.kernel.org/r/20190914122500.GA1425@darwi-home-pc
> >
> > Looks reasonable to me. Except I'd just make it simpler and make it a
> > big WARN_ON_ONCE(), which is a lot harder to miss than pr_notice().
> > Make it clear that it is a *bug* if user space thinks it should wait
> > at boot time.
So I'd really rather not make a change as fundamental as this so close
to 5.3 being released. This sort of thing is subtle since essentially
what we're trying to do is to work around broken userspace, and worse,
in many cases, obstinate, determined userspace application programmers.
We've told them to avoid trying to generate cryptographically secure
random numbers for *years*. And they haven't listened.
This is also a fairly major functional change which is likely to be
very visible to userspace applications, and so it is likely to cause
*some* kind of breakage. So if/when this breaks applications, are we
going to then have to revert it?
> > Also, we might even want to just fill the buffer and return 0 at that
> > point, to make sure that even more broken user space doesn't then try
> > to sleep manually and turn it into a "I'll wait myself" loop.
Ugh. This makes getrandom(2) unreliable for application programmers,
in that it returns success, but with the buffer filled with something
which is definitely not random. Please, let's not.
Worse, it won't even accomplish anything against obstinate
programmers. Someone who is going to change their program to sleep
loop waiting for getrandom(2) to not return with an error can just as
easily check for a buffer which is zero-filled, or an unchanged
buffer, and then sleep loop on that. Again, remember we're trying to
work around malicious human beings --- except instead of trying to fight
malicious attackers, we're trying to fight malicious userspace
programmers. This is not a fight we can win. We can't make
getrandom(2) idiot-proof, because idiots are too d*mned ingenious :-)
For 5.3, can we please consider my proposal in [1]?
[1] https://lore.kernel.org/linux-ext4/[email protected]/
We can try to discuss different ways of working around broken/stupid
userspace, but let's wait until after the LTS release, and ultimately,
I still think we need to just try to get more randomness from hardware
whichever way we can. Pretty much all x86 laptops/desktops have TPMs.
So let's use that, in combination with RDRAND, and UEFI provided
randomness, etc., etc.,
And if we want to put in a big WARN_ON_ONCE, sure. But we've tried
not blocking before, and that way didn't end well[2], with almost 10%
of all publicly accessible SSH keys across the entire internet being
shown to be weak by an academic researcher. (This ruined my July 4th
holidays in 2012 when I was working on patches to fix this on very
short notice.) So let's *please* not be hasty with changes here.
We're dealing with a complex system that includes some very
obstinate/strong personalities, including one which rhymes with
Loettering....
[2] https://factorable.net
- Ted
On Sat, Sep 14, 2019 at 3:24 PM Theodore Y. Ts'o <[email protected]> wrote:
>
> > > Also, we might even want to just fill the buffer and return 0 at that
> > > point, to make sure that even more broken user space doesn't then try
> > > to sleep manually and turn it into a "I'll wait myself" loop.
>
> Ugh. This makes getrandom(2) unreliable for application programmers,
> in that it returns success, but with the buffer filled with something
> which is definitely not random. Please, let's not.
You misunderstand.
The buffer would always be filled with "as random as we can make it".
My "return zero" was for success, but Alexander pointed out that the
return value is the length, not "zero for success".
> Worse, it won't even accomplish anything against obstinate
> programmers. Someone who is going to change their program to sleep
> loop waiting for getrandom(2) to not return with an error can just as
> easily check for a buffer which is zero-filled, or an unchanged
> buffer, and then sleep loop on that.
Again, no they can't. They'll get random data in the buffer. And
there is no way they can tell how much entropy that random data has.
Exactly the same way there is absolutely no way _we_ can tell how much
entropy we have.
> For 5.3, can we please consider my proposal in [1]?
>
> [1] https://lore.kernel.org/linux-ext4/[email protected]/
Honestly, to me that looks *much* worse than just saying that we need
to stop allowing insane user mode boot programs to make insane choices
that have no basis in reality.
It may be the safest thing to do, but at that point we might as well
just revert the ext4 change entirely. I'd rather do that than have
random filesystems start making random decisions based on crazy user
space behavior.
Linus
On Sat, Sep 14, 2019 at 03:32:46PM -0700, Linus Torvalds wrote:
> > Worse, it won't even accomplish anything against obstinate
> > programmers. Someone who is going to change their program to sleep
> > loop waiting for getrandom(2) to not return with an error can just as
> > easily check for a buffer which is zero-filled, or an unchanged
> > buffer, and then sleep loop on that.
>
> Again, no they can't. They'll get random data in the buffer. And
> there is no way they can tell how much entropy that random data has.
That makes me even more worried. It's probably going to be OK for
modern x86 systems, since "best we can do" will include RDRAND
(whether or not it's trusted). But on systems without something like
RDRAND --- e.g., ARM --- the "best we can do" could potentially be
Really Bad. Again, look back at the Mining Your P's and Q's paper
from factorable.net.
If we don't block, and we just return "the best we can do", and some
insane userspace tries to generate a long-term private key (for SSH or
TLS) in super-early boot, I think we owe them something beyond a big
fat WARN_ON_ONCE. We could return 0 for success, and yet "the best we
can do" could be really terrible.
> > For 5.3, can we please consider my proposal in [1]?
> >
> > [1] https://lore.kernel.org/linux-ext4/[email protected]/
>
> Honestly, to me that looks *much* worse than just saying that we need
> to stop allowing insane user mode boot programs to make insane choices
> that have no basis in reality.
>
> It may be the safest thing to do, but at that point we might as well
> just revert the ext4 change entirely. I'd rather do that, than have
> random filesystems start making random decisions based on crazy user
> space behavior.
All we're doing is omitting the plug; I disagree that it's really all
that random. Honestly, I'd much rather just let distributions hang,
and force them to fix it that way. That's *much* better than silently
give them "the best we can do", which might be "not really random at
all".
The reality is that there will be some platforms where we will block
for a very long time, given certain kernel configs and certain really
stupid userspace decisions --- OR, we can open up a really massive
security hole given stupid userspace decisions. Ext4 just got unlucky
that a performance improvement patch happened to toggle one or two
configurations from "working" to "not working".
But just saying, "oh well" and returning something which might not
really be random with a success code is SUCH A TERRIBLE IDEA, that if
you really prefer that, I'll accept the ext4 revert, even though I
don't think we should be penalizing all ext4 performance just because
of a few distros being stupid.
If the choice is between that and making some unsuspecting
distributions being potentially completely insecure, it's no contest.
I won't have that on my conscience.
- Ted
On Sat, Sep 14, 2019 at 6:00 PM Theodore Y. Ts'o <[email protected]> wrote:
>
> That makes me even more worried. It's probably going to be OK for
> modern x86 systems, since "best we can do" will include RDRAND
> (whether or not it's trusted). But on systems without something like
> RDRAND --- e.g., ARM --- the "best we can do" could potentially be
> Really Bad. Again, look back at the Mining Your P's and Q's paper
> from factorable.net.
Yes. And they had that problem *because* the blocking interface was
useless, and they didn't use it, and *because* nobody warned them
about it.
In other words, the whole disaster was exactly because blocking is
wrong, and because blocking to get "secure" data is unacceptable.
And the random people DIDN'T LEARN A SINGLE LESSON from that thing.
Seriously. getrandom() introduced the same broken model as /dev/random
had - and that then caused people to use /dev/urandom instead.
And now it has shown itself to be broken _again_.
And you still argue against the only sane model. Scream loudly that
you're doing something wrong so that people can fix their broken
garbage, but don't let people block, which is _also_ broken garbage.
Seriously. Blocking is wrong. Blocking has _always_ been wrong. It was
why /dev/random was useless, and it is now why the new getrandom()
system call is showing itself useless.
> We could return 0 for success, and yet "the best we
> can do" could be really terrible.
Yes. Which is why we should warn.
But we can't *block*. Because that just breaks people. Like shown in
this whole discussion.
Why is warning different? Because hopefully it tells the only person
who can *do* something about it - the original maintainer or developer
of the user space tools - that they are doing something wrong and need
to fix their broken model.
Blocking doesn't do that. Blocking only makes the system unusable. And
yes, some security people think "unusable == secure", but honestly,
those security people shouldn't do system design. They are the worst
kind of "technically correct" incompetent.
> > > For 5.3, can we please consider my proposal in [1]?
> > It may be the safest thing to do, but at that point we might as well
> > just revert the ext4 change entirely. I'd rather do that, than have
> > random filesystems start making random decisions based on crazy user
> > space behavior.
>
> All we're doing is omitting the plug;
Yes. Which we'll do by reverting that change. I agree that it's the
safe thing to do for 5.3.
We are not adding crazy workarounds for "getrandom()" bugs in some
low-level filesystem.
Either we fix getrandom() or we revert the change. We don't do some
mis-designed "let's work around bugs in getrandom() in the ext4
filesystem with ad-hoc behavioral changes".
Linus
On Sat, Sep 14, 2019 at 06:10:47PM -0700, Linus Torvalds wrote:
> > We could return 0 for success, and yet "the best we
> > can do" could be really terrible.
>
> Yes. Which is why we should warn.
I'm all in favor of warning. But people might just ignore the
warning. We warn today about systemd trying to read from /dev/urandom
too early, and that just gets ignored.
> But we can't *block*. Because that just breaks people. Like shown in
> this whole discussion.
I'd be willing to let it take at least 2 minutes, since that's slow
enough to be annoying. I'd be willing to kill the process which
tried to call getrandom too early. But I believe blocking is better
than returning something potentially not random at all. I think
failing "safe" is extremely important. And returning something not
random which then gets used for a long-term private key is a disaster.
You basically want to turn getrandom into /dev/urandom. And that's
how we got into the mess where 10% of the publicly accessible ssh
keys could be guessed. I've tried that already, and we saw how that
ended.
> Why is warning different? Because hopefully it tells the only person
> who can *do* something about it - the original maintainer or developer
> of the user space tools - that they are doing something wrong and need
> to fix their broken model.
Except the developer could (and *has*) just ignored the warning, which
is what happened with /dev/urandom when it was accessed too early.
Even when I drew some developers' attention to the warning, at least
one just said, "meh", and blew me off. Would making it noisier
(e.g., a WARN_ON) make enough of a difference? I guess I'm just not
convinced.
> Blocking doesn't do that. Blocking only makes the system unusable. And
> yes, some security people think "unusable == secure", but honestly,
> those security people shouldn't do system design. They are the worst
> kind of "technically correct" incompetent.
Which is worse really depends on your point of view, and what the
system might be controlling. If access to the system could cause a
malicious attacker to trigger a nuclear bomb, failing safe is always
going to be better. In other cases, failing open is certainly
more convenient. It certainly leaves the system more "usable". But
how do we trade off "usable" with "insecure"? There are times when
"unusable" is WAY better than "could risk life or human safety".
Would you be willing to settle for a CONFIG option or a boot command-
line option which controls whether we fail "safe" or fail "open" if
someone calls getrandom(2) and there isn't enough entropy? Then each
distribution and/or system integrator can decide whether "proper
systems design" considers "usability" versus "must not fail
insecurely" to be more important.
- Ted
On Sat, Sep 14, 2019 at 7:05 PM Theodore Y. Ts'o <[email protected]> wrote:
>
> I'd be willing to let it take at least 2 minutes, since that's slow
> enough to be annoying.
Have you ever met a real human being?
A boot that blocks will result in people pressing the big red button
in less than 30 seconds, unless it talks very much about _why_ it
blocks and gives an estimate of how long.
Please go out and actually interact with real people some day.
Linus
getrandom() has been created as a new and more secure interface for
pseudorandom data requests. Unlike /dev/urandom, it unconditionally
blocks until the entropy pool has been properly initialized.
While getrandom() has no guaranteed upper bound for its waiting time,
user-space has been abusing it by issuing the syscall, from shared
libraries no less, during the main system boot sequence.
Thus, on certain setups where there is no hwrng (embedded), or the
hwrng is not trusted by some users (intel RDRAND), or sometimes it's
just broken (amd RDRAND), the system boot can be *reliably* blocked.
The issue is further exacerbated by recent file-system optimizations,
e.g. b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), which
batches the inode table IO issued by the directory lookup code, and
thus minimizes the number of disk interrupts, and hence the entropy
gathered, during boot. After that commit,
a blocked boot can be reliably reproduced on a Thinkpad E480 laptop
with standard ArchLinux user-space.
Thus, add a configuration option which stops getrandom(2) from
blocking and instead returns "best efforts" randomness, which might
not be random or secure at all. This can be controlled via the
random.getrandom_block boot command line option, and
CONFIG_RANDOM_BLOCK can be used to make blocking the default.
Since, according to the Great Penguin, only incompetent system
designers would value "security" ahead of "usability", the default is
to be non-blocking.
In addition, modify getrandom(2) to complain loudly with a kernel
warning when some userspace process is erroneously calling
getrandom(2) too early during the boot process.
Link: https://lkml.kernel.org/r/CAHk-=wjyH910+JRBdZf_Y9G54c1M=LBF8NKXB6vJcm9XjLnRfg@mail.gmail.com
Link: https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
[ Modified by [email protected] to make the change of getrandom(2) to be
non-blocking to be optional. ]
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
---
Here's my take on the patch. I really very strongly believe that
making getrandom(2) non-blocking and blindly assuming that we can
load up the buffer with "best efforts" randomness is a terrible,
terrible idea that is going to cause major security problems that we
will potentially regret very badly. Linus Torvalds believes I am an
incompetent systems designer.
So let's do it both ways, and push the decision to the distributor
and/or product manufacturer.
drivers/char/Kconfig | 33 +++++++++++++++++++++++++++++++--
drivers/char/random.c | 34 +++++++++++++++++++++++++++++-----
2 files changed, 60 insertions(+), 7 deletions(-)
diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 3e866885a405..337baeca5ebc 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -557,8 +557,6 @@ config ADI
and SSM (Silicon Secured Memory). Intended consumers of this
driver include crash and makedumpfile.
-endmenu
-
config RANDOM_TRUST_CPU
bool "Trust the CPU manufacturer to initialize Linux's CRNG"
depends on X86 || S390 || PPC
@@ -573,3 +571,34 @@ config RANDOM_TRUST_CPU
has not installed a hidden back door to compromise the CPU's
random number generation facilities. This can also be configured
at boot with "random.trust_cpu=on/off".
+
+config RANDOM_BLOCK
+ bool "Block if getrandom is called before CRNG is initialized"
+ help
+ Say Y here if you want userspace programs which call
+ getrandom(2) before the Cryptographic Random Number
+ Generator (CRNG) is initialized to block until
+ secure random numbers are available.
+
+ Say N if you believe usability is more important than
+ security, so if getrandom(2) is called before the CRNG is
+ initialized, it should not block, but instead return "best
+ effort" randomness which might not be very secure or random
+ at all; but at least the system boot will not be delayed by
+ minutes or hours.
+
+ This can also be controlled at boot with
+ "random.getrandom_block=on/off".
+
+ Ideally, systems would be configured with hardware random
+ number generators, and/or configured to trust CPU-provided
+ RNG's. In addition, userspace should generate cryptographic
+ keys only as late as possible, when they are needed, instead
+ of during early boot. (For non-cryptographic use cases,
+ such as dictionary seeds or MIT Magic Cookies, other
+ mechanisms such as /dev/urandom or random(3) may be more
+ appropriate.) This config option controls what the
+ kernel should do as a fallback when the non-ideal case
+ presents itself.
+
+endmenu
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 5d5ea4ce1442..243fb4a4535f 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -511,6 +511,8 @@ static struct ratelimit_state unseeded_warning =
RATELIMIT_STATE_INIT("warn_unseeded_randomness", HZ, 3);
static struct ratelimit_state urandom_warning =
RATELIMIT_STATE_INIT("warn_urandom_randomness", HZ, 3);
+static struct ratelimit_state getrandom_warning =
+ RATELIMIT_STATE_INIT("warn_getrandom_randomness", HZ, 3);
static int ratelimit_disable __read_mostly;
@@ -854,12 +856,19 @@ static void invalidate_batched_entropy(void);
static void numa_crng_init(void);
static bool trust_cpu __ro_after_init = IS_ENABLED(CONFIG_RANDOM_TRUST_CPU);
+static bool getrandom_block __ro_after_init = IS_ENABLED(CONFIG_RANDOM_BLOCK);
static int __init parse_trust_cpu(char *arg)
{
return kstrtobool(arg, &trust_cpu);
}
early_param("random.trust_cpu", parse_trust_cpu);
+static int __init parse_block(char *arg)
+{
+ return kstrtobool(arg, &getrandom_block);
+}
+early_param("random.getrandom_block", parse_block);
+
static void crng_initialize(struct crng_state *crng)
{
int i;
@@ -1045,6 +1054,12 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
urandom_warning.missed);
urandom_warning.missed = 0;
}
+ if (getrandom_warning.missed) {
+ pr_notice("random: %d getrandom warning(s) missed "
+ "due to ratelimiting\n",
+ getrandom_warning.missed);
+ getrandom_warning.missed = 0;
+ }
}
}
@@ -1900,6 +1915,7 @@ int __init rand_initialize(void)
crng_global_init_time = jiffies;
if (ratelimit_disable) {
urandom_warning.interval = 0;
+ getrandom_warning.interval = 0;
unseeded_warning.interval = 0;
}
return 0;
@@ -1969,8 +1985,8 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
if (!crng_ready() && maxwarn > 0) {
maxwarn--;
if (__ratelimit(&urandom_warning))
- printk(KERN_NOTICE "random: %s: uninitialized "
- "urandom read (%zd bytes read)\n",
+ pr_err("random: %s: CRNG uninitialized "
+ "(%zd bytes read)\n",
current->comm, nbytes);
spin_lock_irqsave(&primary_crng.lock, flags);
crng_init_cnt = 0;
@@ -2135,9 +2151,17 @@ SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
if (!crng_ready()) {
if (flags & GRND_NONBLOCK)
return -EAGAIN;
- ret = wait_for_random_bytes();
- if (unlikely(ret))
- return ret;
+ WARN_ON_ONCE(1);
+ if (getrandom_block) {
+ if (__ratelimit(&getrandom_warning))
+ pr_err("random: %s: getrandom blocking for CRNG initialization\n",
+ current->comm);
+ ret = wait_for_random_bytes();
+ if (unlikely(ret))
+ return ret;
+ } else if (__ratelimit(&getrandom_warning))
+ pr_err("random: %s: getrandom called too early\n",
+ current->comm);
}
return urandom_read(NULL, buf, count, NULL);
}
--
2.23.0
On Sat, Sep 14, 2019 at 10:05:21PM -0400, Theodore Y. Ts'o wrote:
> I'd be willing to let it take at least 2 minutes, since that's slow
> enough to be annoying.
It's an eternity, and prevents a backup system from being turned on in
time to replace a dead system. In fact the main problem with this is
that it destroys uptime on already configured systems for the sake of
making sure a private SSH key is produce correctly. It turns out that
if we instead give the info to this tool that the produced random is
not strong, this only tool that requires good entropy will be able to
ask the user to type something to add real entropy. But making the
system wait forever will not bring any extra entropy because the
services cannot start, it will not even receive network traffic and
will not be able to collect entropy. Sorry Ted, but I've been hit by
this already. It's a real problem to see a system not finish booting
after a crash when you know your systems have only 5 minutes of total
downtime allowed per year (5 nines). And when the SSH keys, like the
rest of the config, were supposed to be either synchronized from the
network or pre-populated in a system image, nobody finds this a valid
justification for an extended downtime.
> Except the developer could (and *has) just ignored the warning, which
> is what happened with /dev/urandom when it was accessed too early.
That's why it's nice to have getrandom() return the error: it will
for once allow the developer of the program to decide whether to care,
depending on the program. Those choosing the pieces to present in Tetris
will not care; those trying to generate an SSH key will care and will
have solid and well-known fallbacks. And the rare ones who need good
randomness and ignore the error will be the ones *responsible* for this;
it will no longer be the kernel giving out bad randomness.
BTW I was thinking that EAGAIN was semantically better than EINVAL to
indicate that the same call should be done with blocking.
Just my two cents,
Willy
On Sat, Sep 14, 2019 at 10:05:21PM -0400, Theodore Y. Ts'o wrote:
> You basically want to turn getrandom into /dev/urandom. And that's
> how we got into the mess where 10% of the publically accessible ssh
> keys could be guessed.
Not exactly. This was an *API* issue that created this situation. The
problem was that you had a single random() call in the libc, either mapped
to /dev/urandom or to /dev/random. By then many of us were used to relying
on one or the other, and finding systems where /dev/random was a symlink
to /dev/urandom to avoid blocking was extremely common. In fact it was
caused by the exact same situation: we try to enforce good randomness for
everyone, it cannot work all the time and breaks programs which do not
need such randomness, so the user breaks the trust in randomness by
configuring the system so that randomness works all the time for the most
common programs. And that's how you end up with SSH trusting a broken
random generator without knowing it was misconfigured.
Your getrandom() API does have the ability to fix this. In my opinion
the best way to proceed is to consider that all those who don't care
about randomness quality never block and that those who care can be
sure they will either get good randomness or will know about it. Ideally
calling getrandom() without any flag should be equivalent to what you
have with /dev/urandom and be good enough to put a UUID on a file
system. And calling it with "SECURE" or something like this would be
the indication that it will not betray you and will only return good
randomness (which is what GRND_RANDOM does in my opinion).
The huge difference between getrandom() and /dev/*random here is that
each application can decide what type of random to use without relying
on what system-wide breakage was applied just for the sake of fixing
another simple application. This could even help OpenSSL use two different
calls for RAND_bytes() and RAND_pseudo_bytes(), instead of using the
same call and blocking.
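As a rough sketch of what that per-application choice could look like
under the proposed semantics; GRND_SECURE below is not a real flag, it
only stands in for the "only return randomness the kernel vouches for"
behavior described above, and its value is made up:

#include <sys/random.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef GRND_SECURE
#define GRND_SECURE 0x0008	/* hypothetical, for this sketch only */
#endif

/* A filesystem UUID or a hash-table seed: best effort is fine,
 * never block. */
static void fill_uuid(unsigned char *uuid)
{
	if (getrandom(uuid, 16, 0) != 16)
		abort();
}

/* A long-term key: insist on vouched-for randomness or fail loudly,
 * and let the caller decide how to wait or prompt the user. */
static int fill_key(unsigned char *key, size_t len)
{
	if (getrandom(key, len, GRND_SECURE) == (ssize_t)len)
		return 0;
	fprintf(stderr, "no trustworthy entropy available yet\n");
	return -1;
}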
Last but not least, I think we need to educate developers regarding
random number consumption, asking "if you could produce only 16 bytes
of randomness in your whole system's lifetime, where would you use them?".
Entropy is extremely precious and yet the most poorly used resource. I
almost wouldn't mind seeing GRND_RANDOM requiring a special capability
since it does have a system-wide impact!
Regards,
Willy
On Sa, 14.09.19 09:52, Linus Torvalds ([email protected]) wrote:
> On Sat, Sep 14, 2019 at 9:35 AM Alexander E. Patrakov
> <[email protected]> wrote:
> >
> > Let me repeat: not -EINVAL, please. Please find some other error code,
> > so that the application could sensibly distinguish between this case
> > (low quality entropy is in the buffer) and the "kernel is too dumb" case
> > (and no entropy is in the buffer).
>
> I'm not convinced we want applications to see that difference.
>
> The fact is, every time an application thinks it cares, it has caused
> problems. I can just see systemd saying "ok, the kernel didn't block,
> so I'll just do
>
> while (getrandom(x) == -ENOENTROPY)
> sleep(1);
>
> instead. Which is still completely buggy garbage.
>
> The fact is, we can't guarantee entropy in general. It's probably
> there in practice, particularly with user space saving randomness from
> last boot etc, but that kind of data may be real entropy, but the
> kernel cannot *guarantee* that it is.
I am not expecting the kernel to guarantee entropy. I'm just expecting
the kernel not to give me garbage knowingly. It's OK if it gives me
garbage unknowingly, but I have a problem if it gives me trash all the
time.
There's benefit in being able to wait until the pool is initialized
before we update the random seed stored on disk with a new one, and
there's benefit in being able to wait until the pool is initialized
before we let cryptsetup read a fresh, one-time key for dm-crypt from
/dev/urandom. I fully understand that any such reporting for
initialization is "best-effort", i.e. to the point where we don't know
anything to the contrary, but at least give userspace that.
Lennart
--
Lennart Poettering, Berlin
On Sa, 14.09.19 09:30, Linus Torvalds ([email protected]) wrote:
> > => src/random-seed/random-seed.c:
> > /*
> > * Let's make this whole job asynchronous, i.e. let's make
> > * ourselves a barrier for proper initialization of the
> > * random pool.
> > */
> > k = getrandom(buf, buf_size, GRND_NONBLOCK);
> > if (k < 0 && errno == EAGAIN && synchronous) {
> > log_notice("Kernel entropy pool is not initialized yet, "
> > "waiting until it is.");
> >
> > k = getrandom(buf, buf_size, 0); /* retry synchronously */
> > }
>
> Yeah, the above is yet another example of completely broken garbage.
>
> You can't just wait and block at boot. That is simply 100%
> unacceptable, and always has been, exactly because that may
> potentially mean waiting forever since you didn't do anything that
> actually is likely to add any entropy.
Oh man. Just spend 5min understanding the situation before claiming
this was garbage or that was garbage. The code above does not block
boot. It blocks startup of services that explicitly order themselves
after the code above. There are only a few services that should do that,
and the main system boots up just fine without waiting for this.
Primary example for stuff that orders itself after the above,
correctly: cryptsetup entries that specify /dev/urandom as password
source (i.e. swap space and stuff, that wants a new key on every
boot). If we don't wait for the initialized pool in cases like that,
the password for that swap space is not actually going to be random,
and that defeats its purpose.
Another example: the storing of an updated random seed file on
disk. We should only do that if the seed on disk is actually properly
random, i.e. comes from an initialized pool. Hence we wait for the
pool to be initialized before reading the seed from the pool, and
writing it to disk.
I'd argue that doing things like this is not "garbage", like you say,
but *necessary* to make this stuff safe and secure.
And no, other stuff is not delayed for this (but there are bugs of
course, some random services in 3rd party packages that set too
aggressive deps, but that needs to be fixed there, and not in the
kernel).
Anyway, I really don't appreciate your tone, and being sucked into
messy LKML discussions. I generally stay away from LKML, and gah, you
remind me why. Just tone it down, not everything you never bothered to
understand is "garbage".
And please don't break /dev/urandom again. The above code is the only
way I see how we can make /dev/urandom-derived swap encryption safe,
and the only way I can see how we can sanely write a valid random seed
to disk after boot. You guys changed semantics on /dev/urandom all the
time in the past, don't break API again, thank you very much.
Lennart
On Sun, Sep 15, 2019 at 08:56:55AM +0200, Lennart Poettering wrote:
> There's benefit in being able to wait until the pool is initialized
> before we update the random seed stored on disk with a new one,
And what exactly makes you think that waiting with arms crossed not
doing anything else has any chance to make the situation change if
you already had no such entropy available when reaching that first
call, especially during early boot ?
Willy
On So, 15.09.19 09:01, Willy Tarreau ([email protected]) wrote:
> On Sun, Sep 15, 2019 at 08:56:55AM +0200, Lennart Poettering wrote:
> > There's benefit in being able to wait until the pool is initialized
> > before we update the random seed stored on disk with a new one,
>
> And what exactly makes you think that waiting with arms crossed not
> doing anything else has any chance to make the situation change if
> you already had no such entropy available when reaching that first
> call, especially during early boot ?
That code can finish 5h after boot, it's entirely fine with this
specific usecase.
Again: we don't delay "the boot" for this. We just delay "writing a
new seed to disk" for this. And if that is 5h later, then that's
totally fine, because in the meantime it's just one bg process more that
hangs around waiting to do what it needs to do.
Lennart
--
Lennart Poettering, Berlin
On Sun, Sep 15, 2019 at 09:05:41AM +0200, Lennart Poettering wrote:
> On So, 15.09.19 09:01, Willy Tarreau ([email protected]) wrote:
>
> > On Sun, Sep 15, 2019 at 08:56:55AM +0200, Lennart Poettering wrote:
> > > There's benefit in being able to wait until the pool is initialized
> > > before we update the random seed stored on disk with a new one,
> >
> > And what exactly makes you think that waiting with arms crossed not
> > doing anything else has any chance to make the situation change if
> > you already had no such entropy available when reaching that first
> > call, especially during early boot ?
>
> That code can finish 5h after boot, it's entirely fine with this
> specific usecase.
>
> Again: we don't delay "the boot" for this. We just delay "writing a
> new seed to disk" for this. And if that is 5h later, then that's
> totally fine, because in the meantime it's just one bg process more that
> hangs around waiting to do what it needs to do.
Didn't you say it could also happen when using encrypted swap? If so,
I suspect this could happen very early during boot, before any services
are started?
Willy
On Sun, Sep 15, 2019 at 08:51:42AM +0200, Lennart Poettering wrote:
> On Sa, 14.09.19 09:30, Linus Torvalds ([email protected]) wrote:
[...]
>
> And please don't break /dev/urandom again. The above code is the ony
> way I see how we can make /dev/urandom-derived swap encryption safe,
> and the only way I can see how we can sanely write a valid random seed
> to disk after boot.
>
Any hope of making systemd-random-seed(8) credit that "random seed
from previous boot" file, through RNDADDENTROPY, *by default*?
Because of course this makes the problem reliably go away on my system
too (as discussed in the original bug report, but you were not CCed).
I know that by v243, just released 12 days ago, this can be optionally
done through SYSTEMD_RANDOM_SEED_CREDIT=1. I wonder though if it can
ever be done by default, just like what the BSDs do... This would
solve a big part of the current problem.
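For reference, the crediting itself boils down to an RNDADDENTROPY
ioctl. A minimal sketch, not systemd's actual code (the seed-file I/O,
permissions handling and any sensible entropy accounting are left out;
the ioctl needs CAP_SYS_ADMIN):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/random.h>

/* Feed 'len' bytes of saved seed into the input pool and credit
 * them as len * 8 bits of entropy. */
static int credit_seed(const unsigned char *seed, int len)
{
	union {
		struct rand_pool_info info;
		unsigned char space[sizeof(struct rand_pool_info) + 512];
	} u;
	int fd, ret;

	if (len > 512)
		len = 512;

	u.info.entropy_count = len * 8;	/* claimed entropy, in bits */
	u.info.buf_size = len;
	memcpy(u.info.buf, seed, len);

	fd = open("/dev/urandom", O_RDWR);
	if (fd < 0)
		return -1;
	ret = ioctl(fd, RNDADDENTROPY, &u.info);
	close(fd);
	return ret;
}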
> Lennart
thanks,
--
darwi
http://darwish.chasingpointers.com
getrandom() was created in Linux v3.17 as a new and more secure
interface for pseudorandom data requests. It attempted to solve
three problems as compared to /dev/urandom:
1. the need to access filesystem paths, which can fail, e.g. under a
chroot
2. the need to open a file descriptor, which can fail under file
descriptor exhaustion attacks
3. the possibility to get not-so-random data from /dev/urandom, due to
an incompletely initialized kernel entropy pool
To solve the third problem, getrandom(2) was made to block until a
proper amount of entropy has been accumulated. This basically made the
system call have no guaranteed upper bound on its waiting time.
As was said in c6e9d6f38894 (random: introduce getrandom(2) system
call): "Any userspace program which uses this new functionality must
take care to assure that if it is used during the boot process, that it
will not cause the init scripts or other portions of the system startup
to hang indefinitely."
Meanwhile, user-facing Linux documentation, e.g. the urandom(4) and
getrandom(2) manpages, didn't add such explicit warnings. It also
didn't help that glibc, since v2.25, implemented an "OpenBSD-like"
getentropy(3) in terms of getrandom(2). OpenBSD's getentropy(2) never
blocked though, while the Linux/glibc version did, possibly indefinitely.
Since that glibc change, even more applications in the boot path began
to implicitly request randomness through getrandom(2); e.g., for an
Xorg/Xwayland MIT cookie.
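For illustration, a getentropy(3)-style wrapper over getrandom(2) looks
roughly like the sketch below (this is not glibc's actual code); it also
shows why a blocking getrandom(2) makes getentropy(3) block as well:

#include <sys/random.h>
#include <errno.h>

int my_getentropy(void *buf, size_t len)
{
	char *p = buf;

	if (len > 256) {
		errno = EIO;
		return -1;
	}
	while (len) {
		/* flags == 0: may block until the CRNG is ready */
		ssize_t k = getrandom(p, len, 0);

		if (k < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += k;
		len -= k;
	}
	return 0;
}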
OpenBSD's getentropy(2) never blocked because, as stated in its rnd(4)
manpage, the system saves entropy to disk on shutdown and restores it on
boot. Moreover, the NetBSD bootloader, as shown in its boot(8), even has
special commands to load a random seed file and pass it to the kernel.
Meanwhile, on a systemd Linux userland, systemd-random-seed(8) preserved
a random seed across reboots at /var/lib/systemd/random-seed, but it
never had the actual code, until very recently (v243), to ask the
kernel to credit such entropy through an RNDADDENTROPY ioctl.
Due to a mix of the above factors, it became common for embedded
Linux systems to "get stuck at boot" unless a daemon like haveged is
installed, or the BSP provider enables the necessary hwrng driver and
credits its entropy; e.g. 62f95ae805fa (hwrng: omap - Set default
quality). Over time, the issue even began to creep into consumer-level
x86 laptops: mainstream distributions, like Debian Buster, began to
recommend installing haveged as a workaround.
Thus, on certain setups where there is no hwrng (embedded systems or VMs
on a host lacking virtio-rng), or the hwrng is not trusted by some users
(Intel RDRAND), or sometimes it's just broken (AMD RDRAND), the system
boot can be *reliably* blocked.
It can therefore be argued that there is no way to use getrandom() on
Linux correctly, especially from shared libraries: GRND_NONBLOCK has
to be used, and a fallback to some other interface like /dev/urandom
is required, thus making the net result no better than just using
/dev/urandom unconditionally.
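For illustration, a minimal sketch of that GRND_NONBLOCK-plus-fallback
pattern (the helper name and error handling are made up for this
sketch, and short reads are glossed over):

    #include <errno.h>
    #include <fcntl.h>
    #include <stddef.h>
    #include <sys/random.h>
    #include <unistd.h>

    /* Hypothetical helper: try getrandom(2) without blocking, then fall
     * back to /dev/urandom, accepting possibly weak early-boot output. */
    static int best_effort_random(void *buf, size_t len)
    {
        ssize_t r = getrandom(buf, len, GRND_NONBLOCK);

        if (r == (ssize_t)len)
            return 0;      /* CRNG ready: proper randomness */
        if (r < 0 && errno != EAGAIN && errno != ENOSYS)
            return -1;     /* real failure, not "no entropy yet" */

        /* Fallback: same quality problem as "3." above, but never blocks */
        int fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
        if (fd < 0)
            return -1;
        r = read(fd, buf, len);
        close(fd);
        return (r == (ssize_t)len) ? 0 : -1;
    }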
The issue is further exacerbated by recent file-system optimizations,
e.g. b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), which batches
the inode table IO issued by directory lookups, and thus minimizes the
number of disk interrupts, and the entropy harvested from them, during
boot. After that commit, a blocked boot can be reliably reproduced on a
Thinkpad E480 laptop with a standard Arch Linux user-space.
Thus, don't trust user-space to call getrandom(2) only from the right
context. Never block, by default, and just return data from the
urandom source if entropy is not yet available. This is an explicit
decision not to let user-space work around this through busy loops on
error-codes.
Note: this lowers the quality of random data returned by getrandom(2)
to the level of randomness returned by /dev/urandom, with all the
original security implications coming out of that, as discussed in
problem "3." at the top of this commit log. If this is not desirable,
offer users a fallback to the old behavior, through CONFIG_RANDOM_BLOCK=y
or the random.getrandom_block=true boot parameter.
[[email protected]: make the change to a non-blocking getrandom(2) optional]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://factorable.net ("Widespread Weak Keys in Network Devices")
Suggested-by: Linus Torvalds <[email protected]>
Link: https://lkml.kernel.org/r/CAHk-=wjyH910+JRBdZf_Y9G54c1M=LBF8NKXB6vJcm9XjLnRfg@mail.gmail.com
Reported-by: Ahmed S. Darwish <[email protected]>
Link: https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc
Signed-off-by: Ahmed S. Darwish <[email protected]>
---
Notes:
changelog-v2:
- tytso: make blocking optional
changelog-v3:
- more detailed commit log + historical context (thanks patrakov)
- remove WARN_ON_ONCE. It's pretty excessive, and the first caller
is systemd-random-seed(8), which we know will not change.
Just print errors in the kernel log.
$dmesg | grep random:
[0.235843] random: get_random_bytes called from start_kernel+0x30f/0x4d7 with crng_init=0
[0.685682] random: fast init done
[2.405263] random: lvm: CRNG uninitialized (4 bytes read)
[2.480686] random: systemd-random-: getrandom (512 bytes): CRNG not yet initialized
[2.480687] random: systemd-random-: CRNG uninitialized (512 bytes read)
[3.265201] random: dbus-daemon: CRNG uninitialized (12 bytes read)
[3.835066] urandom_read: 1 callbacks suppressed
[3.835068] random: polkitd: CRNG uninitialized (8 bytes read)
[3.835509] random: polkitd: CRNG uninitialized (8 bytes read)
[3.835577] random: polkitd: CRNG uninitialized (8 bytes read)
[4.190653] random: gnome-session-b: getrandom (16 bytes): CRNG not yet initialized
[4.190658] random: gnome-session-b: getrandom (16 bytes): CRNG not yet initialized
[4.190662] random: gnome-session-b: getrandom (16 bytes): CRNG not yet initialized
[4.952299] random: crng init done
[4.952311] random: 3 urandom warning(s) missed due to ratelimiting
[4.952314] random: 1 getrandom warning(s) missed due to ratelimiting
drivers/char/Kconfig | 33 +++++++++++++++++++++++++++++++--
drivers/char/random.c | 33 ++++++++++++++++++++++++++++-----
2 files changed, 59 insertions(+), 7 deletions(-)
diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 3e866885a405..337baeca5ebc 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -557,8 +557,6 @@ config ADI
and SSM (Silicon Secured Memory). Intended consumers of this
driver include crash and makedumpfile.
-endmenu
-
config RANDOM_TRUST_CPU
bool "Trust the CPU manufacturer to initialize Linux's CRNG"
depends on X86 || S390 || PPC
@@ -573,3 +571,34 @@ config RANDOM_TRUST_CPU
has not installed a hidden back door to compromise the CPU's
random number generation facilities. This can also be configured
at boot with "random.trust_cpu=on/off".
+
+config RANDOM_BLOCK
+ bool "Block if getrandom is called before CRNG is initialized"
+ help
+ Say Y here if you want userspace programs which call
+ getrandom(2) before the Cryptographic Random Number
+ Generator (CRNG) is initialized to block until
+ secure random numbers are available.
+
+ Say N if you believe usability is more important than
+ security, so if getrandom(2) is called before the CRNG is
+ initialized, it should not block, but instead return "best
+ effort" randomness which might not be very secure or random
+ at all; but at least the system boot will not be delayed by
+ minutes or hours.
+
+ This can also be controlled at boot with
+ "random.getrandom_block=on/off".
+
+ Ideally, systems would be configured with hardware random
+ number generators, and/or configured to trust CPU-provided
+ RNG's. In addition, userspace should generate cryptographic
+ keys only as late as possible, when they are needed, instead
+ of during early boot. (For non-cryptographic use cases,
+ such as dictionary seeds or MIT Magic Cookies, other
+ mechanisms such as /dev/urandom or random(3) may be more
+ appropriate.) This config option controls what the
+ kernel should do as a fallback when the non-ideal case
+ presents itself.
+
+endmenu
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 4a50ee2c230d..689fdb486785 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -511,6 +511,8 @@ static struct ratelimit_state unseeded_warning =
RATELIMIT_STATE_INIT("warn_unseeded_randomness", HZ, 3);
static struct ratelimit_state urandom_warning =
RATELIMIT_STATE_INIT("warn_urandom_randomness", HZ, 3);
+static struct ratelimit_state getrandom_warning =
+ RATELIMIT_STATE_INIT("warn_getrandom_randomness", HZ, 3);
static int ratelimit_disable __read_mostly;
@@ -854,12 +856,19 @@ static void invalidate_batched_entropy(void);
static void numa_crng_init(void);
static bool trust_cpu __ro_after_init = IS_ENABLED(CONFIG_RANDOM_TRUST_CPU);
+static bool getrandom_block __ro_after_init = IS_ENABLED(CONFIG_RANDOM_BLOCK);
static int __init parse_trust_cpu(char *arg)
{
return kstrtobool(arg, &trust_cpu);
}
early_param("random.trust_cpu", parse_trust_cpu);
+static int __init parse_block(char *arg)
+{
+ return kstrtobool(arg, &getrandom_block);
+}
+early_param("random.getrandom_block", parse_block);
+
static void crng_initialize(struct crng_state *crng)
{
int i;
@@ -1053,6 +1062,12 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
urandom_warning.missed);
urandom_warning.missed = 0;
}
+ if (getrandom_warning.missed) {
+ pr_notice("random: %d getrandom warning(s) missed "
+ "due to ratelimiting\n",
+ getrandom_warning.missed);
+ getrandom_warning.missed = 0;
+ }
}
}
@@ -1915,6 +1930,7 @@ int __init rand_initialize(void)
crng_global_init_time = jiffies;
if (ratelimit_disable) {
urandom_warning.interval = 0;
+ getrandom_warning.interval = 0;
unseeded_warning.interval = 0;
}
return 0;
@@ -1984,8 +2000,8 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
if (!crng_ready() && maxwarn > 0) {
maxwarn--;
if (__ratelimit(&urandom_warning))
- printk(KERN_NOTICE "random: %s: uninitialized "
- "urandom read (%zd bytes read)\n",
+ pr_err("random: %s: CRNG uninitialized "
+ "(%zd bytes read)\n",
current->comm, nbytes);
spin_lock_irqsave(&primary_crng.lock, flags);
crng_init_cnt = 0;
@@ -2152,9 +2168,16 @@ SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
if (!crng_ready()) {
if (flags & GRND_NONBLOCK)
return -EAGAIN;
- ret = wait_for_random_bytes();
- if (unlikely(ret))
- return ret;
+
+ if (__ratelimit(&getrandom_warning))
+ pr_err("random: %s: getrandom (%zd bytes): CRNG not "
+ "yet initialized", current->comm, count);
+
+ if (getrandom_block) {
+ ret = wait_for_random_bytes();
+ if (unlikely(ret))
+ return ret;
+ }
}
return urandom_read(NULL, buf, count, NULL);
}
--
darwi
http://darwish.chasingpointers.com
On So, 15.09.19 09:07, Willy Tarreau ([email protected]) wrote:
> > That code can finish 5h after boot, it's entirely fine with this
> > specific usecase.
> >
> > Again: we don't delay "the boot" for this. We just delay "writing a
> > new seed to disk" for this. And if that is 5h later, then that's
> > totally fine, because in the meantime it's just one bg process more that
> > hangs around waiting to do what it needs to do.
>
> Didn't you say it could also happen when using encrypted swap ? If so
> I suspect this could happen very early during boot, before any services
> may be started ?
Depends on the deps, and what options are used in /etc/crypttab. If
people hard rely on swap to be enabled for boot to proceed and also
use one-time passwords from /dev/urandom they better provide some form
of hw rng, too. Otherwise the boot will block, yes.
Basically, just add "nofail" to a line in /etc/crypttab, and the entry
will be activated at boot, but we won't delay boot for it. It's going
to be activated as soon as the deps are fulfilled (and thus the pool
initialized), but that may well be 5h after boot, and that's totally
OK as long as nothing else hard depends on it.
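For reference, such a crypttab entry could look roughly like this (the
device path and cipher options below are made-up examples; the "nofail"
at the end is the relevant part):

    # one-time encrypted swap keyed from /dev/urandom; "nofail" means the
    # entry is activated once its deps are ready, but boot doesn't wait for it
    swap   /dev/sda2   /dev/urandom   swap,cipher=aes-xts-plain64,size=256,nofail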
Lennart
--
Lennart Poettering, Berlin
On So, 15.09.19 09:27, Ahmed S. Darwish ([email protected]) wrote:
> On Sun, Sep 15, 2019 at 08:51:42AM +0200, Lennart Poettering wrote:
> > On Sa, 14.09.19 09:30, Linus Torvalds ([email protected]) wrote:
> [...]
> >
> > And please don't break /dev/urandom again. The above code is the ony
> > way I see how we can make /dev/urandom-derived swap encryption safe,
> > and the only way I can see how we can sanely write a valid random seed
> > to disk after boot.
> >
>
> Any hope in making systemd-random-seed(8) credit that "random seed
> from previous boot" file, through RNDADDENTROPY, *by default*?
No. For two reasons:
a) It's way too late. We shouldn't credit entropy from the disk seed
if we cannot update the disk seed with a new one at the same time,
otherwise we might end up crediting the same seed twice on
subsequent reboots (think: user hard powers off a system after we
credited but before we updated), in which case there would not be a
point in doing that at all. Hence, we have to wait until /var is
writable, but that's relatively late during boot. Long after the
initrd ran, long after iscsi and what not ran. Long after the
network stack is up and so on. In a time where people load root
images from the initrd via HTTPS that's generally too late to be
useful.
b) Golden images are a problem. There are probably more systems
running off golden images in the wild, than those not running off
them. This means: a random seed on disk is only safe to credit if
it gets purged when the image is distributed to the systems it's
supposed to be used on, because otherwise these systems will all
come up with the very same seed, which makes it useless. So, by
requesting people to explicitly acknowledge that they are aware of
this problem (and either don't use golden images, or safely wipe
the seed off the image before shipping it), by setting the env var,
we protect ourselves from this.
Last time I looked at it most popular distro's live images didn't wipe
the random seed properly before distributing it to users...
This is all documented btw:
https://systemd.io/RANDOM_SEEDS#systemds-support-for-filling-the-kernel-entropy-pool
See point #2.
> I know that by v243, just released 12 days ago, this can be optionally
> done through SYSTEMD_RANDOM_SEED_CREDIT=1. I wonder though if it can
> ever be done by default, just like what the BSDs does... This would
> solve a big part of the current problem.
I think the best approach would be to do this in the boot loader. In
fact systemd does this in its own boot loader (sd-boot): it reads a
seed off the ESP, updates it (via a SHA256 hash derived from the old one)
and passes that to the OS. PID 1 very early on then credits this to
the kernel's pool (ideally the kernel would just do this on its own
btw). The trick we employ to make this generally safe is that we
persistently store a "system token" as EFI var too, and include it in
the SHA sum. The "system token" is a per-system random blob. It is
created the first time it's needed and a good random source exists,
and then stays on the system, for all future live images to use. This
makes sure that even if sloppily put together live images are used
(which do not reset any random seed) every system will use a different
series of RNG seeds.
This then solves both problems: the golden image problem, and the
early-on problem. But of course only on ESP. Other systems should be
able to provide similar mechanisms though, it's not rocket science.
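As a rough sketch of that derivation, and not sd-boot's actual code
(sd-boot derives separate values for the ESP and for the OS; the sizes
and the use of OpenSSL's SHA256() here are illustrative only):

    #include <openssl/sha.h>
    #include <string.h>

    #define SEED_LEN  32
    #define TOKEN_LEN 32

    /* next_seed = SHA256(old_seed || per-system token), so that two
     * machines sharing a "golden image" still diverge immediately. */
    static void derive_next_seed(const unsigned char old_seed[SEED_LEN],
                                 const unsigned char token[TOKEN_LEN],
                                 unsigned char next_seed[SHA256_DIGEST_LENGTH])
    {
        unsigned char buf[SEED_LEN + TOKEN_LEN];

        memcpy(buf, old_seed, SEED_LEN);
        memcpy(buf + SEED_LEN, token, TOKEN_LEN);
        SHA256(buf, sizeof(buf), next_seed);

        /* The boot loader would write next_seed back to the ESP and
         * also hand a (separately derived) seed to the OS to credit. */
    }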
This is also documented here:
https://systemd.io/RANDOM_SEEDS#systemds-support-for-filling-the-kernel-entropy-pool
See point #3...
Ideally other boot loaders (grub, …) would support the same scheme,
but I am not sure the problem set is known to them.
Lennart
--
Lennart Poettering, Berlin
On So, 15.09.19 10:17, Ahmed S. Darwish ([email protected]) wrote:
> Thus, don't trust user-space on calling getrandom(2) from the right
> context. Never block, by default, and just return data from the
> urandom source if entropy is not yet available. This is an explicit
> decision not to let user-space work around this through busy loops on
> error-codes.
>
> Note: this lowers the quality of random data returned by getrandom(2)
> to the level of randomness returned by /dev/urandom, with all the
> original security implications coming out of that, as discussed in
> problem "3." at the top of this commit log. If this is not desirable,
> offer users a fallback to old behavior, by CONFIG_RANDOM_BLOCK=y, or
> random.getrandom_block=true bootparam.
This is an awful idea. It just means that all crypto that needs
entropy during early boot will now be using weak keys, and won't
even know it.
Yeah, it's a bad situation, but I am very sure that failing loudly in
this case is better than just sticking your head in the sand; ignoring
the issue without letting userspace know is an exceptionally bad
idea.
We live in a world where people run HTTPS, SSH, and all that stuff in
the initrd already. It's where SSH host keys are generated, and plenty
of session keys. If Linux lets all that stuff run with awful entropy then
you pretend things were secure while they actually aren't. It's much
better to fail loudly in that case, I am sure.
Quite frankly, I don't think this is something to fix in the
kernel. Let the people putting together systems deal with this. Let
them provide a creditable hw rng, and let them pay the price if they
don't.
Lennart
--
Lennart Poettering, Berlin
On Sun, Sep 15, 2019 at 10:59:07AM +0200, Lennart Poettering wrote:
> We live in a world where people run HTTPS, SSH, and all that stuff in
> the initrd already. It's where SSH host keys are generated, and plenty
> session keys.
It is exactly the type of crap that create this situation : making
people developing such scripts believe that any random source was OK
to generate these, and as such forcing urandom to produce crypto-solid
randoms! No, distro developers must know that it's not acceptable to
generate lifetime crypto keys from the early boot when no entropy is
available. At least with this change they will get an error returned
from getrandom() and will be able to ask the user to feed entropy, or
be able to say "it was impossible to generate the SSH key right now,
the daemon will only be started once it's possible", or "the SSH key
we produced will not be saved because it's not safe and is only usable
for this recovery session".
> If Linux lets all that stuff run with awful entropy then
> you pretend things where secure while they actually aren't. It's much
> better to fail loudly in that case, I am sure.
This is precisely what this change permits : fail instead of block
by default, and let applications decide based on the use case.
> Quite frankly, I don't think this is something to fix in the
> kernel.
As long as it offers a single API to return randoms, and that it is
not possible not to block for low-quality randoms, it needs to be
at least addressed there. Then userspace can adapt. For now userspace
does not have this option just due to the kernel's way of exposing
randoms.
Willy
On Sun, Sep 15, 2019 at 11:30:57AM +0200, Willy Tarreau wrote:
> On Sun, Sep 15, 2019 at 10:59:07AM +0200, Lennart Poettering wrote:
> > We live in a world where people run HTTPS, SSH, and all that stuff in
> > the initrd already. It's where SSH host keys are generated, and plenty
> > session keys.
>
> It is exactly the type of crap that create this situation : making
> people developing such scripts believe that any random source was OK
> to generate these, and as such forcing urandom to produce crypto-solid
> randoms!
Willy, let's tone it down please... the thread is already getting a
bit toxic.
> No, distro developers must know that it's not acceptable to
> generate lifetime crypto keys from the early boot when no entropy is
> available. At least with this change they will get an error returned
> from getrandom() and will be able to ask the user to feed entropy, or
> be able to say "it was impossible to generate the SSH key right now,
> the daemon will only be started once it's possible", or "the SSH key
> we produced will not be saved because it's not safe and is only usable
> for this recovery session".
>
> > If Linux lets all that stuff run with awful entropy then
> > you pretend things where secure while they actually aren't. It's much
> > better to fail loudly in that case, I am sure.
>
> This is precisely what this change permits : fail instead of block
> by default, and let applications decide based on the use case.
>
Unfortunately, not exactly.
Linus didn't want getrandom to return an error code / "to fail" in
that case, but to silently return CRNG-uninitialized /dev/urandom
data, to avoid user-space even working around the error code through
busy-loops.
I understand the rationale behind that, of course, and this is what
I've done so far in the V3 RFC.
Nonetheless, this _will_, for example, make systemd-random-seed(8)
save weak seeds under /var/lib/systemd/random-seed, since the kernel
didn't inform it about such weakness at all.
The situation is so bad now that it's more of "some user-space are
more equal than others". Let's just at least admit this while
discussing the RFC patch in question.
thanks,
> > Quite frankly, I don't think this is something to fix in the
> > kernel.
>
> As long as it offers a single API to return randoms, and that it is
> not possible not to block for low-quality randoms, it needs to be
> at least addressed there. Then userspace can adapt. For now userspace
> does not have this option just due to the kernel's way of exposing
> randoms.
>
> Willy
On Sun, Sep 15, 2019 at 12:02:01PM +0200, Ahmed S. Darwish wrote:
> On Sun, Sep 15, 2019 at 11:30:57AM +0200, Willy Tarreau wrote:
> > On Sun, Sep 15, 2019 at 10:59:07AM +0200, Lennart Poettering wrote:
> > > We live in a world where people run HTTPS, SSH, and all that stuff in
> > > the initrd already. It's where SSH host keys are generated, and plenty
> > > session keys.
> >
> > It is exactly the type of crap that create this situation : making
> > people developing such scripts believe that any random source was OK
> > to generate these, and as such forcing urandom to produce crypto-solid
> > randoms!
>
> Willy, let's tone it down please... the thread is already getting a
> bit toxic.
I don't see what's wrong in my tone above, I'm sorry if it can be
perceived as such. My point was that things such as creating lifetime
keys while there's no entropy is the wrong thing to do and what
progressively led to this situation.
> > > If Linux lets all that stuff run with awful entropy then
> > > you pretend things where secure while they actually aren't. It's much
> > > better to fail loudly in that case, I am sure.
> >
> > This is precisely what this change permits : fail instead of block
> > by default, and let applications decide based on the use case.
> >
>
> Unfortunately, not exactly.
>
> Linus didn't want getrandom to return an error code / "to fail" in
> that case, but to silently return CRNG-uninitialized /dev/urandom
> data, to avoid user-space even working around the error code through
> busy-loops.
But with this EINVAL you have the information that it only filled
the buffer with whatever it could, right ? At least that was the
last point I manage to catch in the discussion. Otherwise if it's
totally silent, I fear that it will reintroduce the problem in a
different form (i.e. libc will say "our randoms are not reliable
anymore, let us work around this and produce blocking, solid randoms
again to help all our users").
> I understand the rationale behind that, of course, and this is what
> I've done so far in the V3 RFC.
>
> Nonetheless, this _will_, for example, make systemd-random-seed(8)
> save week seeds under /var/lib/systemd/random-seed, since the kernel
> didn't inform it about such weakness at all..
Then I am confused because I understood that the goal was to return
EINVAL or anything equivalent in which case the userspace knows what
it has to deal with :-/
Regards,
Willy
On Sun, Sep 15, 2019 at 12:40:27PM +0200, Willy Tarreau wrote:
> On Sun, Sep 15, 2019 at 12:02:01PM +0200, Ahmed S. Darwish wrote:
> > On Sun, Sep 15, 2019 at 11:30:57AM +0200, Willy Tarreau wrote:
> > > On Sun, Sep 15, 2019 at 10:59:07AM +0200, Lennart Poettering wrote:
[...]
> > > > If Linux lets all that stuff run with awful entropy then
> > > > you pretend things where secure while they actually aren't. It's much
> > > > better to fail loudly in that case, I am sure.
> > >
> > > This is precisely what this change permits : fail instead of block
> > > by default, and let applications decide based on the use case.
> > >
> >
> > Unfortunately, not exactly.
> >
> > Linus didn't want getrandom to return an error code / "to fail" in
> > that case, but to silently return CRNG-uninitialized /dev/urandom
> > data, to avoid user-space even working around the error code through
> > busy-loops.
>
> But with this EINVAL you have the information that it only filled
> the buffer with whatever it could, right ? At least that was the
> last point I manage to catch in the discussion. Otherwise if it's
> totally silent, I fear that it will reintroduce the problem in a
> different form (i.e. libc will say "our randoms are not reliable
> anymore, let us work around this and produce blocking, solid randoms
> again to help all our users").
>
V1 of the patch I posted did indeed return -EINVAL. Linus then
suggested that this might still make some user-space act smart and
just busy-loop around that, basically blocking the boot again:
https://lkml.kernel.org/r/CAHk-=wiB0e_uGpidYHf+dV4eeT+XmG-+rQBx=JJ110R48QFFWw@mail.gmail.com
https://lkml.kernel.org/r/CAHk-=whSbo=dBiqozLoa6TFmMgbeB8d9krXXvXBKtpRWkG0rMQ@mail.gmail.com
So it was then requested to actually return what /dev/urandom would
return, so that user-space has no way whatsoever of knowing whether
getrandom has failed. Then, it's the job of system integrators / BSP
builders to inspect the big fat WARN in the kernel log and fix
that.
This is the core of Lennart's critique of V3 above.
> > I understand the rationale behind that, of course, and this is what
> > I've done so far in the V3 RFC.
> >
> > Nonetheless, this _will_, for example, make systemd-random-seed(8)
> > save week seeds under /var/lib/systemd/random-seed, since the kernel
> > didn't inform it about such weakness at all..
>
> Then I am confused because I understood that the goal was to return
> EINVAL or anything equivalent in which case the userspace knows what
> it has to deal with :-/
>
Yeah, the discussion moved a bit beyond that.
thanks,
--darwi
On Sun, Sep 15, 2019 at 12:55:39PM +0200, Ahmed S. Darwish wrote:
> On Sun, Sep 15, 2019 at 12:40:27PM +0200, Willy Tarreau wrote:
> > On Sun, Sep 15, 2019 at 12:02:01PM +0200, Ahmed S. Darwish wrote:
> > > On Sun, Sep 15, 2019 at 11:30:57AM +0200, Willy Tarreau wrote:
> > > > On Sun, Sep 15, 2019 at 10:59:07AM +0200, Lennart Poettering wrote:
> [...]
> > > > > If Linux lets all that stuff run with awful entropy then
> > > > > you pretend things where secure while they actually aren't. It's much
> > > > > better to fail loudly in that case, I am sure.
> > > >
> > > > This is precisely what this change permits : fail instead of block
> > > > by default, and let applications decide based on the use case.
> > > >
> > >
> > > Unfortunately, not exactly.
> > >
> > > Linus didn't want getrandom to return an error code / "to fail" in
> > > that case, but to silently return CRNG-uninitialized /dev/urandom
> > > data, to avoid user-space even working around the error code through
> > > busy-loops.
> >
> > But with this EINVAL you have the information that it only filled
> > the buffer with whatever it could, right ? At least that was the
> > last point I manage to catch in the discussion. Otherwise if it's
> > totally silent, I fear that it will reintroduce the problem in a
> > different form (i.e. libc will say "our randoms are not reliable
> > anymore, let us work around this and produce blocking, solid randoms
> > again to help all our users").
> >
>
> V1 of the patch I posted did indeed return -EINVAL. Linus then
> suggested that this might make still some user-space act smart and
> just busy-loop around that, basically blocking the boot again:
>
> https://lkml.kernel.org/r/CAHk-=wiB0e_uGpidYHf+dV4eeT+XmG-+rQBx=JJ110R48QFFWw@mail.gmail.com
> https://lkml.kernel.org/r/CAHk-=whSbo=dBiqozLoa6TFmMgbeB8d9krXXvXBKtpRWkG0rMQ@mail.gmail.com
>
> So it was then requested to actually return what /dev/urandom would
> return, so that user-space has no way whatsoever in knowing if
> getrandom has failed. Then, it's the job of system integratos / BSP
> builders to fix the inspect the big fat WARN on the kernel and fix
> that.
Then I was indeed a bit confused in the middle of the discussion as
I didn't understand exactly this, thanks for clarifying :-)
But does it still block when called with GRND_RANDOM ? If so I guess
I'm fine as it translates exactly the previous behavior of random vs
urandom, and that GRND_NONBLOCK allows the application to fall back
to reliable sources if needed (typically human interactions).
Thanks,
Willy
On Sat, Sep 14, 2019 at 11:51 PM Lennart Poettering
<[email protected]> wrote:
>
> Oh man. Just spend 5min to understand the situation, before claiming
> this was garbage or that was garbage. The code above does not block
> boot.
Yes it does. You clearly didn't read the thread.
> It blocks startup of services that explicit order themselves
> after the code above. There's only a few services that should do that,
> and the main system boots up just fine without waiting for this.
That's a nice theory, but it doesn't actually match reality.
There are clearly broken setups that use this for things that it
really shouldn't be used for. Asking for true randomness at boot
before there is any indication that randomness exists, and then just
blocking with no further action that could actually _generate_ said
randomness.
If your description was true that the system would come up and be
usable while the blocked thread is waiting for that to happen, things
would be fine.
But that simply isn't the case.
Linus
On Sat, Sep 14, 2019 at 11:56 PM Lennart Poettering
<[email protected]> wrote:
>
> I am not expecting the kernel to guarantee entropy. I just expecting
> the kernel to not give me garbage knowingly. It's OK if it gives me
> garbage unknowingly, but I have a problem if it gives me trash all the
> time.
So realistically, we never actually give you *garbage*.
It's just that we try very hard to actually give you some entropy
guarantees, and that we can't always do in a timely manner -
particularly if you don't help.
But on a PC, we can _almost_ guarantee entropy. Even with a golden
image, we do mix in:
- timestamp counter on every device interrupt (but "device interrupt"
doesn't include things like the local CPU timer, so it really needs
device activity)
- random boot and BIOS memory (dmi tables, the EFI RNG entry, etc)
- various device state (things like MAC addresses when registering
network devices, USB device numbers, etc)
- and obviously any CPU rdrand data
and note the "mix in" part - it's all designed so that you don't trust
any of this for randomness on its own, but very much hopefully it
means that almost *any* differences in boot environment will add a
fair amount of unpredictable behavior.
But also note the "on a PC" part.
Also note that as far as the kernel is concerned, none of the above
counts as "entropy" for us, except to a very small degree the device
interrupt timing thing. But you need hundreds of interrupts for that
to be considered really sufficient.
And that's why things broke. It turns out that making ext4 be more
efficient at boot caused fewer disk interrupts, and now we weren't
convinced we had sufficient entropy. And the systemd boot thing just
*stopped*, waiting for entropy to magically appear, which it never will
if the machine is idle and not doing anything.
So do we give you "garbage" in getrandom()? We try really really hard
not to, but it's exactly the "can we _guarantee_ that it has entropy"
that ends up being the problem.
So if some silly early boot process comes along, and asks for "true
randomness", and just blocks for it without doing anything else,
that's broken from a kernel perspective.
In practice, the only situation we have had really big problems with
not giving "garbage" isn't actually the "golden distro image" case you
talk about. It's the "embedded device golden _system_ image" case,
where the image isn't just the distribution, but the full bootloader
state.
Some cheap embedded MIPS CPU without even a timestamp counter, with
identical flash contents for millions of devices, and doing a "on
first boot, generate a long-term key" without even connecting to the
network first.
That's the thing Ted was pointing at:
https://factorable.net/weakkeys12.extended.pdf
so yes, it can be "garbage", but it can be garbage only if you really
really do things entirely wrong.
But basically, you should never *ever* try to generate some long-lived
key and then just wait for it without doing anything else. The
"without doing anything else" is key here.
But every time we've had a blocking interface, that's exactly what
somebody has done. Which is why I consider that long blocking thing to
be completely unacceptable. There is no reason to believe that the
wait will ever end, partly exactly because we don't consider timer
interrupts to add any timer randomness. So if you are just waiting,
nothing necessarily ever happens.
Linus
[ Added Lennart, who was active in the other thread ]
On Sat, Sep 14, 2019 at 10:22 PM Theodore Y. Ts'o <[email protected]> wrote:
>
> Thus, add an optional configuration option which stops getrandom(2)
> from blocking, but instead returns "best efforts" randomness, which
> might not be random or secure at all.
So I hate having a config option for something like this.
How about this attached patch instead? It only changes the waiting
logic, and I'll quote the comment in full, because I think that
explains not only the rationale, it explains every part of the patch
(and is most of the patch anyway):
* We refuse to wait very long for a blocking getrandom().
*
* The crng may not be ready during boot, but if you ask for
* blocking random numbers very early, there is no guarantee
* that you'll ever get any timely entropy.
*
* If you are sure you need entropy and that you can generate
* it, you need to ask for non-blocking random state, and then
* if that fails you must actively _do_something_ that causes
* enough system activity, perhaps asking the user to type
* something on the keyboard.
*
* Just asking for blocking random numbers is completely and
* fundamentally wrong, and the kernel will not play that game.
*
* We will block for at most 15 seconds at a time, and if called
* sequentially will decrease the blocking amount so that we'll
* block for at most 30s total - and if people continue to ask
* for blocking, at that point we'll just return whatever random
* state we have acquired.
*
* This will also complain loudly if the timeout happens, to let
* the distribution or system admin know about the problem.
*
* The process that gets the -EAGAIN will hopefully also log the
* error, to raise awareness that there may be use of random
* numbers without sufficient entropy.
Hmm? No strange behavior. No odd config variables. A bounded total
boot-time wait of 30s (which is a completely random number, but I
claimed it as the "big red button" time).
And if you only do it once and fall back to something else it will
only wait for 15s, and you'll have your error value so that you can
log it properly.
Yes, a single boot-time wait of 15s at boot is still "darn annoying",
but it likely
(a) isn't so long that people consider it a boot failure and give up
(but hopefully annoying enough that they'll report it)
(b) long enough that *if* the thing that is waiting is not actually
blocking the boot sequence, the non-blocked part of the boot sequence
should have time to do sufficient IO to get better randomness.
So (a) is the "the system is still usable" part. While (b) is the
"give it a chance, and even if it fails and you fall back on urandom
or whatever, you'll actually be getting good randomness even if we
can't perhaps _guarantee_ entropy".
Also, if you have some user that wants to do the old-timey ssh-keygen
thing with user input etc, we now have a documented way to do that:
just do the nonblocking thing, and then make really really sure that
you actually have something that generates more entropy if that
nonblocking thing returns EAGAIN. But it's also very clear that at
that point the program that wants this entropy guarantee has to _work_
for it.
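A minimal sketch of that documented way, purely illustrative (a real
key generator would do something smarter than reading a line from the
terminal, but the structure is the point):

    #include <errno.h>
    #include <stdio.h>
    #include <sys/random.h>
    #include <sys/types.h>

    /* Hypothetical key-generation helper: never block in the kernel;
     * on EAGAIN, actively make the user generate interrupts, then retry. */
    static int get_key_material(unsigned char *buf, size_t len)
    {
        for (;;) {
            ssize_t r = getrandom(buf, len, GRND_NONBLOCK);

            if (r == (ssize_t)len)
                return 0;
            if (r < 0 && errno != EAGAIN)
                return -1;

            fprintf(stderr, "Not enough entropy yet; mash some keys "
                            "and press Enter...\n");
            getchar();      /* keyboard interrupts feed the pool */
        }
    }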
Because just being lazy and say "block" without any entropy will
return EAGAIN for a (continually decreasing) while, but then at some
point stop and say "you're broken", and just give you the urandom
data.
Because if you really do nothing at all, and there is no activity
what-so-ever for 15s because you blocked the boot, then I claim that
it's better to return an error than to wait forever. And if you ignore
the error and just retry, eventually we'll do the fallback for you.
Of course, if you have something like rdrand, and told us you trust
it, none of this matters at all, since we'll have initialized the pool
long before.
So this is unconditional, but it's basically "unconditionally somewhat
flexibly reasonable". It should only ever trigger for the case where
the boot sequence was fundamentally broken. And it will complain
loudly (both at a kernel level, and hopefully at a systemd journal
level too) if it ever triggers.
And hey, if some distro wants to then revert this because they feel
uncomfortable with this, that's now _their_ problem, not the problem
of the upstream kernel. The upstream kernel tries to do something that
I think is arguably fairly reasonable in all situations.
Linus
On Sun, Sep 15, 2019 at 10:32:15AM -0700, Linus Torvalds wrote:
> * We will block for at most 15 seconds at a time, and if called
> * sequentially will decrease the blocking amount so that we'll
> * block for at most 30s total - and if people continue to ask
> * for blocking, at that point we'll just return whatever random
> * state we have acquired.
I think that the exponential decay will either not be used or
be totally used, so in practice you'll always end up with 0 or
30s depending on the entropy situation, because I really do not
see any valid reason for entropy to suddenly start to appear
after 15s if it didn't prior to this. As such I do think that
a single timeout should be enough.
In addition, since you're leaving the door open to bikeshed around
the timeout value, I'd say that while 30s is usually not huge in a
desktop system's life, it actually is a lot in network environments
when it delays a switchover. It can cause other timeouts to occur
and leave quite a long embarrassing black out. I'd guess that a max
total wait time of 2-3s should be OK though since application timeouts
rarely are lower due to TCP generally starting to retransmit at 3s.
And even in 3s we're supposed to see quite some interrupts or it's
unlikely that much more will happen between 3 and 30s.
If the setting had to be made user-changeable then it could make
sense to let it be overridden on the kernel's command line though
I don't think that it should be necessary with a low enough value.
Thanks,
Willy
On Sun, Sep 15, 2019 at 11:32 AM Willy Tarreau <[email protected]> wrote:
>
> I think that the exponential decay will either not be used or
> be totally used, so in practice you'll always end up with 0 or
> 30s depending on the entropy situation
According to the systemd random-seed source snippet that Ahmed posted,
it actually just tries once (well, first once non-blocking, then once
blocking) and then falls back to reading urandom if it fails.
So assuming there's just one of those "read much too early" cases, I
think it actually matters.
But while I tried to test this, on my F30 install, systemd seems to
always just use urandom().
I can trigger the urandom read warning easily enough (turn off CPU
rdrand trusting and increase the entropy requirement by a factor of
ten, and turn off the ioctl to add entropy from user space), just not
the getrandom() blocking case at all.
So presumably that's because I have a systemd that doesn't use
getrandom() at all, or perhaps uses the 'rdrand' instruction directly.
Or maybe because Arch has some other oddity that just triggers the
problem.
> In addition, since you're leaving the door open to bikeshed around
> the timeout valeue, I'd say that while 30s is usually not huge in a
> desktop system's life, it actually is a lot in network environments
> when it delays a switchover.
Oh, absolutely.
But in that situation you have a MIS person on call, and somebody who
can fix it.
It's not like switchovers happen in a vacuum. What we should care
about is that updating a kernel _works_. No regressions. But if you
have some five-nines setup with switchover, you'd better have some
competent MIS people there too. You don't just switch kernels without
testing ;)
Linus
On Sun, Sep 15, 2019 at 11:37 AM Willy Tarreau <[email protected]> wrote:
>
> I also wanted to ask, are we going to enforce the same strategy on
> /dev/urandom ?
Right now the strategy for /dev/urandom is "print a one-line warning,
then do the read".
I don't see why we should change that. The whole point of urandom has
been that it doesn't block, and doesn't use up entropy.
It's the _blocking_ behavior that has always been problematic. It's
why almost nobody uses /dev/random in practice.
getrandom() looks like /dev/urandom in not using up entropy, but had
that blocking behavior of /dev/random that was problematic.
And exactly the same way it was problematic for /dev/random users, it
has now shown itself to be problematic for getrandom().
My suggested patch left the /dev/random blocking behavior, because
hopefully people *know* about the problems there.
And hopefully people understand that getrandom(GRND_RANDOM) has all
the same issues.
If you want that behavior, you can still use GRND_RANDOM or
/dev/random, but they are simply not acceptable for boot-time
scenarios. Never have been,
... exactly the way the "block forever" wasn't acceptable for getrandom().
Linus
On Sun, Sep 15, 2019 at 11:59:41AM -0700, Linus Torvalds wrote:
> > In addition, since you're leaving the door open to bikeshed around
> > the timeout valeue, I'd say that while 30s is usually not huge in a
> > desktop system's life, it actually is a lot in network environments
> > when it delays a switchover.
>
> Oh, absolutely.
>
> But in that situation you have a MIS person on call, and somebody who
> can fix it.
>
> It's not like switchovers happen in a vacuum. What we should care
> about is that updating a kernel _works_. No regressions. But if you
> have some five-nines setup with switchover, you'd better have some
> competent MIS people there too. You don't just switch kernels without
> testing ;)
I mean maybe I didn't use the right term, but typically in networked
environments you'll have watchdogs on sensitive devices (e.g. the
default gateways and load balancers), which will trigger an instant
reboot of the system if something really bad happens. It can range
from a dirty oops, FS remounted R/O, pure freeze, OOM, missing
process, panic etc. And here the reset which used to take roughly
10s to get all the services back up for operations suddenly takes
40s. My point is that I won't have issues explaining to users that 10s
or 13s is the same when they rely on five nines, but trying to argue
that 40s is identical to 10s will be a hard position to stand by.
And actually there are other dirty cases. Such systems often work
in active-backup or active-active modes. One typical issue is that
the primary system reboots, the second takes over within one second,
and once the primary system is back *apparently* operating, some
processes which appear to be present and which possibly have already
bound their listening ports are waiting for 30s in getrandom() while
the monitoring systems around see them as ready, thus the primary
machine goes back to its role and cannot reliably run the service
for the first 30 seconds, which roughly multiplies the downtime by
30. That's why I'd like to make it possible to lower this value
(either by default or via the cmdline, as I think it can be fine for
all those who care about downtime).
Willy
On Sun, Sep 15, 2019 at 12:08:31PM -0700, Linus Torvalds wrote:
> My suggested patch left the /dev/random blocking behavior, because
> hopefully people *know* about the problems there.
>
> And hopefully people understand that getrandom(GRND_RANDOM) has all
> the same issues.
I think this one doesn't cause any issue to users. It's the only
one that should be used for long-lived crypto keys in my opinion.
> If you want that behavior, you can still use GRND_RANDOM or
> /dev/random, but they are simply not acceptable for boot-time
> schenarios.
Oh no I definitely don't want this behavior at all for urandom, what
I'm saying is that as long as getrandom() will have a lower quality
of service than /dev/urandom for non-important randoms, there will be
compelling reasons to avoid it. And I think that your bounded wait
could actually reconcile both ends of the user spectrum: those
who want excellent randoms to run tetris and those who don't mind
always playing the same game on every boot because they just want
to play. And by making /dev/urandom behave like getrandom() we could
actually tell users "both are now exactly the same, you have no valid
reason anymore not to use the new API". And it forces us to remain
very reasonable in getrandom() so that we don't break old applications
that relied on urandom to be fast.
Willy
On Sun, Sep 15, 2019 at 12:18 PM Willy Tarreau <[email protected]> wrote:
>
> Oh no I definitely don't want this behavior at all for urandom, what
> I'm saying is that as long as getrandom() will have a lower quality
> of service than /dev/urandom for non-important randoms
Ahh, here you're talking about the fact that it can block at all being
"lower quality".
I do agree that getrandom() is doing some odd things. It has the
"total blocking mode" of /dev/random (if you pass it GRND_RANDOM), but
it has no mode of replacing /dev/urandom.
So if you want the /dev/urandom behavior, then no, getrandom() simply
has never given you that.
Use /dev/urandom if you want that.
Sad, but there it is. We could have a new flag (GRND_URANDOM) that
actually gives the /dev/urandom behavior. But the ostensible reason
for getrandom() was the blocking for entropy. See commit c6e9d6f38894
("random: introduce getrandom(2) system call") from back in 2014.
The fact that it took five years to hit this problem is probably due
to two reasons:
(a) we're actually pretty good about initializing the entropy pool
fairly quickly most of the time
(b) people who started using 'getrandom()' and hit this issue
presumably then backed away from it slowly and just used /dev/urandom
instead.
So it needed an actual "oops, we don't get as much entropy from the
filesystem accesses" situation to actually turn into a problem. And
presumably the people who tried out things like nvdimm filesystems
never used Arch, and never used a sufficiently new systemd to see the
"oh, without disk interrupts you don't get enough randomness to boot".
One option is to just say that GRND_URANDOM is the default (ie never
block, do the one-liner log entry to warn) and add a _new_ flag that
says "block for entropy". But if we do that, then I seriously think
that the new behavior should have that timeout limiter.
For 5.3, I'll just revert the ext4 change, stupid as that is. That
avoids the regression, even if it doesn't avoid the fundamental
problem. And gives us time to discuss it.
Linus
On Sun, Sep 15, 2019 at 12:31:42PM -0700, Linus Torvalds wrote:
> On Sun, Sep 15, 2019 at 12:18 PM Willy Tarreau <[email protected]> wrote:
> >
> > Oh no I definitely don't want this behavior at all for urandom, what
> > I'm saying is that as long as getrandom() will have a lower quality
> > of service than /dev/urandom for non-important randoms
>
> Ahh, here you're talking about the fact that it can block at all being
> "lower quality".
>
> I do agree that getrandom() is doing some odd things. It has the
> "total blocking mode" of /dev/random (if you pass it GRND_RANDOM), but
> it has no mode of replacing /dev/urandom.
Yep but with your change it's getting better.
> So if you want the /dev/urandom bvehavior, then no, getrandom() simply
> has never given you that.
>
> Use /dev/urandom if you want that.
It's not available in chroot, which is the main driver for getrandom()
I guess.
> Sad, but there it is. We could have a new flag (GRND_URANDOM) that
> actually gives the /dev/urandom behavior. But the ostensible reason
> for getrandom() was the blocking for entropy. See commit c6e9d6f38894
> ("random: introduce getrandom(2) system call") from back in 2014.
Oh I definitely know it's been a long debate.
> The fact that it took five years to hit this problem is probably due
> to two reasons:
>
> (a) we're actually pretty good about initializing the entropy pool
> fairly quickly most of the time
>
> (b) people who started using 'getrandom()' and hit this issue
> presumably then backed away from it slowly and just used /dev/urandom
> instead.
We've hit it the hard way more than a year ago already, when openssl
adopted getrandom() instead of urandom for certain low-importance
things in order to work better in chroots and/or avoid fd leaks. And
even openssl had to work around these issues in multiple iterations
(I don't remember how however).
> So it needed an actual "oops, we don't get as much entropy from the
> filesystem accesses" situation to actually turn into a problem. And
> presumably the people who tried out things like nvdimm filesystems
> never used Arch, and never used a sufficiently new systemd to see the
> "oh, without disk interrupts you don't get enough randomness to boot".
In my case the whole system is in the initramfs and the only accesses
to the flash are to read the config. So that's pretty a limited source
of interrupts for a headless system ;-)
> One option is to just say that GRND_URANDOM is the default (ie never
> block, do the one-liner log entry to warn) and add a _new_ flag that
> says "block for entropy". But if we do that, then I seriously think
> that the new behavior should have that timeout limiter.
I think the timeout is a good thing to do, but it would be nice to
let the application know that what was provided was probably not as
good as expected (well if the application wants real random, it
should use GRND_RANDOM).
> For 5.3, I'll just revert the ext4 change, stupid as that is. That
> avoids the regression, even if it doesn't avoid the fundamental
> problem. And gives us time to discuss it.
It's sad to see that being excessive on randomness leads to forcing
totally unrelated subsystem to be less efficient :-(
Willy
On Sun, Sep 15, 2019 at 09:29:55AM -0700, Linus Torvalds wrote:
> On Sat, Sep 14, 2019 at 11:51 PM Lennart Poettering
> <[email protected]> wrote:
> >
> > Oh man. Just spend 5min to understand the situation, before claiming
> > this was garbage or that was garbage. The code above does not block
> > boot.
>
> Yes it does. You clearly didn't read the thread.
>
> > It blocks startup of services that explicit order themselves
> > after the code above. There's only a few services that should do that,
> > and the main system boots up just fine without waiting for this.
>
> That's a nice theory, but it doesn't actually match reality.
>
> There are clearly broken setups that use this for things that it
> really shouldn't be used for. Asking for true randomness at boot
> before there is any indication that randomness exists, and then just
> blocking with no further action that could actually _generate_ said
> randomness.
>
> If your description was true that the system would come up and be
> usable while the blocked thread is waiting for that to happen, things
> would be fine.
>
A small note here, especially after I've just read the commit log of
72dbcf721566 ('Revert ext4: "make __ext4_get_inode_loc plug"'), which
unfairly blames systemd there.
Yes, the systemd-random-seed(8) process blocks, but this is an
isolated process, and it's only there as a synchronization point and
to load/restore random seeds from disk across reboots.
The wisdom of having a synchronization service ("before/after urandom
CRNG is inited") can be debated. That service though, and systemd in
general, did _not_ block the overall system boot.
What blocked the system boot was GDM/gnome-session implicitly calling
getrandom() for the Xorg MIT cookie. This was shown in the strace log
below:
https://lkml.kernel.org/r/20190910173243.GA3992@darwi-home-pc
thanks,
--
darwi
http://darwish.chasingpointers.com
On Mon, Sep 16, 2019 at 03:40:50AM +0200, Ahmed S. Darwish wrote:
> On Sun, Sep 15, 2019 at 09:29:55AM -0700, Linus Torvalds wrote:
> > On Sat, Sep 14, 2019 at 11:51 PM Lennart Poettering
> > <[email protected]> wrote:
> > >
> > > Oh man. Just spend 5min to understand the situation, before claiming
> > > this was garbage or that was garbage. The code above does not block
> > > boot.
> >
> > Yes it does. You clearly didn't read the thread.
> >
> > > It blocks startup of services that explicit order themselves
> > > after the code above. There's only a few services that should do that,
> > > and the main system boots up just fine without waiting for this.
> >
> > That's a nice theory, but it doesn't actually match reality.
> >
> > There are clearly broken setups that use this for things that it
> > really shouldn't be used for. Asking for true randomness at boot
> > before there is any indication that randomness exists, and then just
> > blocking with no further action that could actually _generate_ said
> > randomness.
> >
> > If your description was true that the system would come up and be
> > usable while the blocked thread is waiting for that to happen, things
> > would be fine.
> >
>
> A small note here, especially after I've just read the commit log of
> 72dbcf721566 ('Revert ext4: "make __ext4_get_inode_loc plug"'), which
> unfairly blames systemd there.
>
> Yes, the systemd-random-seed(8) process blocks, but this is an
> isolated process, and it's only there as a synchronization point and
> to load/restore random seeds from disk across reboots.
>
> The wisdom of having a sysnchronization service ("before/after urandom
> CRNG is inited") can be debated. That service though, and systemd in
> general, did _not_ block the overall system boot.
>
> What blocked the system boot was GDM/gnome-session implicitly calling
> getrandom() for the Xorg MIT cookie. This was shown in the strace log
> below:
>
> https://lkml.kernel.org/r/20190910173243.GA3992@darwi-home-pc
>
So did systemd-random-seed instead drain what little entropy there was
before GDM started, increasing the likelihood a subsequent getrandom()
call would block?
Regards,
Vito Caputo
On Sun, Sep 15, 2019 at 11:59:41AM -0700, Linus Torvalds wrote:
> On Sun, Sep 15, 2019 at 11:32 AM Willy Tarreau <[email protected]> wrote:
> >
> > I think that the exponential decay will either not be used or
> > be totally used, so in practice you'll always end up with 0 or
> > 30s depending on the entropy situation
>
> According to the systemd random-seed source snippet that Ahmed posted,
> it actually just tries once (well, first once non-blocking, then once
> blocking) and then falls back to reading urandom if it fails.
>
> So assuming there's just one of those "read much too early" cases, I
> think it actually matters.
>
Just a quick note, the snippet I posted:
https://lkml.kernel.org/r/20190914150206.GA2270@darwi-home-pc
is not PID 1.
It's just a lowly process called "systemd-random-seed". Its main
reason for existence is to load/restore a random seed file from and to
disk across reboots (just like what sysv scripts did).
The reason I posted it was to show that if we change getrandom() to
silently return weak crypto instead of blocking or an error code,
systemd-random-seed will break: it will save the resulting data to
disk, then even _credit_ it (if asked to) in the next boot cycle
through RNDADDENTROPY.
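For readers unfamiliar with RNDADDENTROPY: crediting is more than
writing the bytes back. A rough sketch of what such a credit looks like
(the seed size, error handling, and the decision to claim full entropy
are all simplifications; a real implementation must also rewrite the
seed file before crediting, exactly for the double-credit reason
Lennart described earlier):

    #include <fcntl.h>
    #include <linux/random.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    #define SEED_LEN 512    /* matches the 512-byte reads in the dmesg above */

    static int credit_seed_file(const char *path)
    {
        unsigned char seed[SEED_LEN];
        struct rand_pool_info *info;
        int fd, ret = -1;

        fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;
        if (read(fd, seed, SEED_LEN) != SEED_LEN) {
            close(fd);
            return -1;
        }
        close(fd);

        info = malloc(sizeof(*info) + SEED_LEN);
        if (!info)
            return -1;
        info->entropy_count = SEED_LEN * 8;     /* claim: fully random bits */
        info->buf_size = SEED_LEN;
        memcpy(info->buf, seed, SEED_LEN);

        /* Mixes the buffer in *and* bumps the entropy estimate; a plain
         * write() to /dev/urandom only mixes. Needs CAP_SYS_ADMIN. */
        fd = open("/dev/urandom", O_WRONLY);
        if (fd >= 0) {
            if (ioctl(fd, RNDADDENTROPY, info) == 0)
                ret = 0;
            close(fd);
        }
        free(info);
        return ret;
    }

That bump in the entropy estimate is what lets the CRNG initialize
early; systemd v243 only performs it when SYSTEMD_RANDOM_SEED_CREDIT=1
is set, for the golden-image reasons Lennart gave above.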
> But while I tried to test this, on my F30 install, systemd seems to
> always just use urandom().
>
> I can trigger the urandom read warning easily enough (turn of CPU
> rdrand trusting and increase the entropy requirement by a factor of
> ten, and turn of the ioctl to add entropy from user space), just not
> the getrandom() blocking case at all.
>
Yeah, because the problem was/is not with systemd :)
It is GDM/gnome-session which was blocking the graphical boot process.
Regarding reproducing the issue: through a quick trace_printk, all of
the below processes are calling getrandom() on my Arch system at boot:
https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc
The fatal call was gnome-session's, because GNOME didn't continue
_its own_ boot due to this blockage.
> So presumably that's because I have a systemd that doesn't use
> getrandom() at all, or perhaps uses the 'rdrand' instruction directly.
> Or maybe because Arch has some other oddity that just triggers the
> problem.
>
It seems Arch is good at triggering this. For example, here is
another Arch user on a Thinkpad (different model than mine), also with
GDM getting blocked on entropy:
https://bbs.archlinux.org/viewtopic.php?id=248035
"As you can see, the system is literally waiting a half minute for
something - up until crng init is done"
(The NetworkManager logs are just noise. I also had them, but completely
disabling NetworkManager didn't do anything .. just made the logs
cleaner)
thanks,
--
Ahmed Darwish
http://darwish.chasingpointers.com
On Sun, Sep 15, 2019 at 06:48:34PM -0700, Vito Caputo wrote:
> > A small note here, especially after I've just read the commit log of
> > 72dbcf721566 ('Revert ext4: "make __ext4_get_inode_loc plug"'), which
> > unfairly blames systemd there.
...
> > What blocked the system boot was GDM/gnome-session implicitly calling
> > getrandom() for the Xorg MIT cookie. This was shown in the strace log
> > below:
> >
> > https://lkml.kernel.org/r/20190910173243.GA3992@darwi-home-pc
Yes, that's correct, this isn't really systemd's fault. It's a
combination of GDM/gnome-session stupidly using MIT Magic Cookie at
*all* (it was a bad idea 30 years ago, and it's still a bad idea in
2019), and of GDM/gnome-session using getrandom(2) for it; it should
have just stuck with /dev/urandom, or heck, just used random_r(3),
since when we're talking about MIT Magic Cookie there's no real
security *anyway*.
It's also a combination of the hardware used by this particular user
and the init scripts in use, which were probably not generating enough
read requests compared to other distributions (ironically, the
distributions and init systems that try the hardest to accelerate the
boot make this problem worse by reducing the entropy that can be
harvested from I/O). And then when we optimized ext4 so it would be
more efficient, that tipped this particular user over the edge.
Linus might not have liked my proposal to disable the optimization if
the CRNG isn't initialized, but ultimately this problem *has* gotten
worse because we've optimized things more. So to the extent that
systemd has made systems boot faster, you could call that systemd's
"fault" --- just as Linus reverting ext4's performance optimization is
saying that it's ext4's "fault" because we had the temerity to try to
make the file system be more efficient, and hence, reduce the entropy
that can be collected.
Ultimately, though, the person who touches this last is whose "fault"
it is. And the problem is that it really is a no-win situation
here. No matter *what* we do, it's going to either (a) make some
systems insecure, or (b) make some systems more likely to hang while
booting. Whether you consider the risk of (a) or (b) to be worse is
ultimately going to cause you to say that people of the contrary
opinion are either "being reckless with system security", or
"incompetent at system design".
And really, it's all going to depend on how the Linux kernel is being
used. The fact that Linux is being used in IoT devices, mobile
handsets, user desktops, servers running in VMs, etc., means that
there will be some situations where blocking is going to be terrible,
and some situations where a failure to provide system security could
result in risking someone's life, health, or mission failure in some
critical system.
That's why this discussion can easily get toxic. If you are only
focusing on one part of the Linux market, then obviously *you* are the
only sane one, and everyone *else* who disagrees with you must be
incompetent. When, perhaps, they may simply be focusing on a
different part of the ecosystem where Linux is used.
> So did systemd-random-seed instead drain what little entropy there was
> before GDM started, increasing the likelihood a subsequent getrandom()
> call would block?
No. Getrandom(2) uses the new CRNG, which is either initialized, or
it's not. Once it's initialized, it won't block again ever.
- Ted
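(A small userspace sketch of the property Ted describes, assuming
glibc >= 2.25 for the getrandom() wrapper: a single successful
GRND_NONBLOCK call proves the CRNG is initialized, after which
getrandom(buf, n, 0) can no longer block on that boot.)
#include <errno.h>
#include <stdio.h>
#include <sys/random.h>

int main(void)
{
	unsigned char byte;

	/* Non-blocking probe: EAGAIN means the CRNG is not initialized
	 * yet; one successful read means it will never block again. */
	if (getrandom(&byte, sizeof(byte), GRND_NONBLOCK) == 1)
		puts("crng ready: getrandom(buf, n, 0) will not block");
	else if (errno == EAGAIN)
		puts("crng not yet initialized");
	else
		perror("getrandom");

	return 0;
}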
On Sun, Sep 15, 2019 at 6:41 PM Ahmed S. Darwish <[email protected]> wrote:
>
> Yes, the systemd-random-seed(8) process blocks, but this is an
> isolated process, and it's only there as a synchronization point and
> to load/restore random seeds from disk across reboots.
>
> What blocked the system boot was GDM/gnome-session implicitly calling
> getrandom() for the Xorg MIT cookie.
Aahh. I saw that email, but then in the discussion the systemd case
always ended up coming up first, and I never made the connection.
What a complete crock that silly MIT random cookie is, and what a sad
sad reason for blocking.
Linus
Theodore Y. Ts'o <[email protected]> wrote:
>
> Ultimately, though, we need to find *some* way to fix userspace's
> assumptions that they can always get high quality entropy in early
> boot, or we need to get over people's distrust of Intel and RDRAND.
> Otherwise, future performance improvements in any part of the system
> which reduces the number of interrupts is always going to potentially
> result in somebody's misconfigured system or badly written
> applications to fail to boot. :-(
Can we perhaps artificially increase the interrupt rate while the
CRNG is not initialised?
Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Sun, Sep 15, 2019 at 8:52 PM Herbert Xu <[email protected]> wrote:
>
> Can we perhaps artificially increase the interrupt rate while the
> CRNG is not initialised?
Long term (or even medium term in some areas), the problem really is
that device interrupts during boot really are going away, rather than
becoming more common.
That just happened to be the case now because of improved plugging,
but it's fundamentally the direction any storage is moving with faster
flash interfaces.
The only interrupt we could easily increase the rate of in the kernel
is the timer interrupt, but that's also the interrupt that is the
least useful for randomness.
The timer interrupt could be somewhat interesting if you are also
CPU-bound on a non-trivial load, because then "what program counter
got interrupted" ends up being possibly unpredictable - even with a
very stable timer interrupt source - and effectively stand in for a
cycle counter even on hardware that doesn't have a native TSC. Lots of
possible low-level jitter there to use for entropy. But especially if
you're just idly _waiting_ for entropy, you won't be "CPU-bound on an
interesting load" - you'll just hit the CPU idle loop all the time so
even that wouldn't work.
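(A userspace toy, only to illustrate the kind of low-level timing
jitter being talked about here; it is in the same spirit as tools like
haveged, and is not a proposal for the kernel. On an idle machine the
deltas are far more predictable, which is exactly the limitation
described above.)
#include <stdint.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
	struct timespec ts;
	uint64_t pool = 0;
	int i;

	/* Fold the low bits of back-to-back clock reads together; on a
	 * busy CPU the deltas wobble unpredictably, on an idle one they
	 * mostly do not. */
	for (i = 0; i < 4096; i++) {
		clock_gettime(CLOCK_MONOTONIC, &ts);
		pool = (pool << 7) ^ (pool >> 57) ^ (uint64_t)ts.tv_nsec;
	}

	printf("jitter sample: %016llx\n", (unsigned long long)pool);
	return 0;
}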
But practically speaking, timers are not really much of an
option. And if we are idle, even having a high-frequency TSC isn't all
that useful with the timer interrupt, because the two tend to be very
intimately related.
Of course, if you're generating a host key for SSH or something like
that, you could try to at least cause some network traffic while
generating the key. That's not much of an option for the _kernel_, but
for a program like ssh-keygen it certainly could be.
Blocking is fine if you simply don't care about time at all (the "five
hours later is fine" situation), or if you have some a-priori
knowledge that the machine is doing real interesting work that will
generate entropy. But I don't see how the kernel can generate entropy
on its own, particularly during boot (which is when the problem
happens), when most devices aren't even necessarily meaningfully set
up yet.
Hopefully hw random number generators will make this issue effectively
moot before we really end up in the "nvdimms and their ilk are
common enough that you really have no early boot irq-driven disk IO at
all" situation.
Linus
Hi Ted,
On Sun, Sep 15, 2019 at 10:49:04PM -0400, Theodore Y. Ts'o wrote:
> No matter *what* we do, it's going to either (a) make some
> systems insecure, or (b) make some systems more likely hang while
> booting.
I continue to strongly disagree with opposing these two. (b) is
caused precisely because of this conflation. Long-lived keys are
produced around once per system's life (at least that order of
magnitude). Boot happens way more often. Users would not complain
that systems fail to start if the two types of random were properly
distinguished, so that we don't fail to boot just for the sake of
secure randoms that will never be consumed as such.
Before systems had HWRNGs it was pretty common for some tools to
ask the user to type hundreds of characters on the keyboard and
use that (content+timings) to feed entropy while generating a key.
This is acceptable once in a system's life. And on some systems
with no entropy like VMs, it's commonly generated from a central
place and never from the VM itself, so it's not a problem either.
In my opinion the problem recently happened because getrandom()
was perceived as a good replacement for /dev/urandom and is way
more convenient to use, so applications progressively started to
use it without realizing that, contrary to its ancestor, it can
block. And each time a system fails to boot, it confirms that entropy
still remains a problem even on PCs in 2019. This is one more
reason for clearly keeping two interfaces depending on what type
of random is needed.
I'd be in favor of adding in the man page something like "this
random source is only suitable for applications which will not be
harmed by getting a predictable value on output, and as such it is
not suitable for generation of system keys or passwords; please
use GRND_RANDOM for this". This distinction currently is not clear
enough for people who don't know this subtle difference, and that
can only increase the interface's misuse.
Regards,
Willy
On Sun, Sep 15, 2019 at 09:21:06PM -0700, Linus Torvalds wrote:
> The timer interrupt could be somewhat interesting if you are also
> CPU-bound on a non-trivial load, because then "what program counter
> got interrupted" ends up being possibly unpredictable - even with a
> very stable timer interrupt source - and effectively stand in for a
> cycle counter even on hardware that doesn't have a native TSC. Lots of
> possible low-level jitter there to use for entropy. But especially if
> you're just idly _waiting_ for entropy, you won't be "CPU-bound on an
> interesting load" - you'll just hit the CPU idle loop all the time so
> even that wouldn't work.
In the old DOS era, I used to produce randoms by measuring the time it
took for some devices to reset themselves (typically 8250 UARTs could
take on the order of milliseconds). And reading their status registers
during the reset phase used to show various sequences of flags at
approximate timings.
I suspect this method is still usable, even with SoCs full of peripherals,
in part because not all clocks are synchronous, so we can retrieve a
little bit of entropy by measuring edge transitions. I don't know how
we can assess the number of bits provided by such a method (probably
log2(card(discrete values))), but maybe this is something we should
progressively encourage driver authors to do in the various device
probing functions once we figure out the best way to do it.
The idea is roughly this. Instead of:
probe(dev)
{
	(...)
	while (timeout && !(status_reg & STATUS_RDY))
		timeout--;
	(...)
}
we could do something like this (assuming 1 bit of randomness here):
probe(dev)
{
	(...)
	prev_timeout = timeout;
	prev_reg = status_reg;
	while (timeout && !(status_reg & STATUS_RDY)) {
		if (status_reg != prev_reg) {
			add_device_randomness_bits(timeout - prev_timeout, 1);
			prev_timeout = timeout;
			prev_reg = status_reg;
		}
		timeout--;
	}
	(...)
}
It's also interesting to note that on many motherboards there are still
multiple crystal oscillators (typically one per ethernet port) and that
such types of independent, free-running clocks do present unpredictable
edges compared to the CPU's clock, so when they affect the device's
setup time, this does help quite a bit.
Willy
On Sun, Sep 15, 2019 at 9:30 PM Willy Tarreau <[email protected]> wrote:
>
> I'd be in favor of adding in the man page something like "this
> random source is only suitable for applications which will not be
> harmed by getting a predictable value on output, and as such it is
> not suitable for generation of system keys or passwords, please
> use GRND_RANDOM for this".
The problem with GRND_RANDOM is that it also ends up extracting
entropy, and has absolutely horrendous performance behavior. It's why
hardly anybody uses /dev/random.
Which nobody should really ever do. I don't understand why people want
that thing, considering that the second law of thermodynamics really
pretty much applies. If you can crack the cryptographic hashes well
enough to break them despite reseeding etc, people will have much more
serious issues than the entropy accounting.
So the problem with getrandom() is that it only offered two flags, and
to make things worse they were the wrong ones.
Nobody should basically _ever_ use the silly "entropy can go away"
model, yet that is exactly what GRND_RANDOM does.
End result: GRND_RANDOM is almost entirely useless, and is actively
dangerous, because it can actually block not just during boot: it can
block (and cause others to block) at random points while the system is
running, because it does that entropy accounting.
Nobody can use GRND_RANDOM if they have _any_ performance requirements
what-so-ever. It's possibly useful for one-time ssh host keys etc.
So GRND_RANDOM is just bad - with or without GRND_NONBLOCK, because
even in the nonblocking form it will account for entropy in the
blocking pool (until it's all gone, and it will return -EAGAIN).
And the non-GRND_RANDOM case avoids that problem, but requires the
initial entropy with no way to opt out of it. Yes, GRND_NONBLOCK makes
it work.
So we have four flag combinations:
- 0 - don't use if it could possibly run at boot
  Possibly useful for the systemd-random-seed case, and if you *know*
  you're way past boot, but clearly overused.
  This is the one that bit us this time.
- GRND_NONBLOCK - fine, but you now don't get even untrusted random
  numbers, and you have to come up with a way to fill the entropy pool
  This one is most useful as a quick "get me urandom", but needs a
  fallback to _actual_ /dev/urandom when it fails.
  This is the best choice by far, and has no inherent downsides apart
  from needing that fallback code.
- GRND_RANDOM - don't use
  This will block and it will decrease the blocking pool entropy so
  that others will block too, and has horrible performance.
  Just don't use it outside of very occasional non-serious work.
  Yes, it will give you secure numbers, but because of performance
  issues it's not viable for any serious code, and obviously not for
  bootup.
  It can be useful as a seed for future serious use that just does
  all random handling in user space. Just not during boot.
- GRND_RANDOM | GRND_NONBLOCK - don't use
  This won't block, but it will decrease the blocking pool entropy.
  It might be an acceptable "get me a truly secure rng with reliable
  performance", but when it fails, you're going to be unhappy, and there
  is no obvious fallback.
So three out of four flag combinations end up being mostly "don't
use", and the fourth one isn't what you'd normally want (which is just
plain /dev/urandom semantics).
Linus
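(A minimal sketch of the GRND_NONBLOCK-plus-fallback pattern described
above, assuming glibc's getrandom() wrapper; the helper name is made
up for illustration and is not from the thread.)
#include <errno.h>
#include <fcntl.h>
#include <sys/random.h>
#include <unistd.h>

/* Best-effort randomness: try the non-blocking getrandom() first; if
 * the CRNG is not yet initialized (EAGAIN), fall back to reading
 * /dev/urandom directly, which never blocks. */
static ssize_t get_best_effort_random(void *buf, size_t len)
{
	ssize_t ret = getrandom(buf, len, GRND_NONBLOCK);
	int fd;

	if (ret >= 0 || errno != EAGAIN)
		return ret;

	fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	ret = read(fd, buf, len);
	close(fd);
	return ret;
}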
On Sun, Sep 15, 2019 at 10:02:02PM -0700, Linus Torvalds wrote:
> On Sun, Sep 15, 2019 at 9:30 PM Willy Tarreau <[email protected]> wrote:
> >
> > I'd be in favor of adding in the man page something like "this
> > random source is only suitable for applications which will not be
> > harmed by getting a predictable value on output, and as such it is
> > not suitable for generation of system keys or passwords, please
> > use GRND_RANDOM for this".
>
> The problem with GRND_RANDOM is that it also ends up extracting
> entropy, and has absolutely horrendous performance behavior. It's why
> hardly anybody uses /dev/random.
>
> Which nobody should really ever do. I don't understand why people want
> that thing, considering that the second law of thermodynamics really
> pretty much applies. If you can crack the cryptographic hashes well
> enough to break them despite reseeding etc, people will have much more
> serious issues than the entropy accounting.
That's exactly what I was thinking about a few minutes ago and which
drove me back to mutt :-)
> So the problem with getrandom() is that it only offered two flags, and
> to make things worse they were the wrong ones.
(...)
> - GRND_RANDOM | GRND_NONBLOCK - don't use
>
> This won't block, but it will decrease the blocking pool entropy.
>
> It might be an acceptable "get me a truly secure ring with reliable
> performance", but when it fails, you're going to be unhappy, and there
> is no obvious fallback.
>
> So three out of four flag combinations end up being mostly "don't
> use", and the fourth one isn't what you'd normally want (which is just
> plain /dev/urandom semantics).
I'm seeing it from a different angle. I now understand better why
getrandom() absolutely wants to have an initialized pool: it's to
encourage private key producers to use a secure, infinite source of
randomness, something that neither /dev/random nor /dev/urandom
reliably provides. Unfortunately it does it by changing how urandom
works, while it ought to have done it as the replacement of /dev/random.
The 3 random generation behaviors we currently support are :
- /dev/random: only returns safe random (blocks), AND depletes entropy.
getrandom(GRND_RANDOM) does the same.
- /dev/urandom: returns whatever (never blocks), inexhaustible
- getrandom(0): returns safe random (blocks), inexhaustible
Historically we used to want to rely on /dev/random for SSH keys and
certificates. It's arguable that with the massive increase of crypto
usage, what used to be done only once in a system's life happens a
bit more often, and using /dev/random here can sometimes become a
problem because it harms the whole system (which is why I said I think
we could almost require CAP_something to access it). Applications
falling back to /dev/urandom obviously resulted in the massive mess
we saw years ago, even if it apparently solved the problem for
their users. Thus getrandom(0) does make sense, but not as an
alternative to urandom but to random, since it returns randoms safe
for use for long-lived keys.
Couldn't we simply change the way things work? Make GRND_RANDOM *not*
deplete entropy, document it as the only safe source, and make the
default call return the same as /dev/urandom? We could then use your
timeout mechanism for the first one (which is not supposed to be called
often and would be more accepted with a moderately long delay).
Applications need to evolve as well. It's fine to use libraries to do
whatever you need for you, but ultimately the lib exports a function for
a generic use case and doesn't know how to best adapt to your use case.
Typically I would expect an SSH/HTTP daemon running in a recovery
initramfs to produce unsafe randoms so that I can connect there without
having to dance around it. However the self-signed cert produced there
must not be saved, just like the SSH host key. But this means that the
application (here ssh-keygen or openssl) also needs to be taught to
purposely produce insecure keys when explicitly instructed to do so.
Otherwise we know what will happen in the long term, since history
repeats itself as long as the conditions are not changed :-/
Willy
On Tue, Sep 10, 2019 at 07:56:35AM -0400, Theodore Y. Ts'o wrote:
> Hmm, I'm not seeing this on a Dell XPS 13 (model 9380) using a Debian
> Bullseye (Testing) running a rc4+ kernel.
>
> This could be because Debian is simply doing more I/O; or it could be
> because I don't have some package installed which is trying to reading
> from /dev/random or calling getrandom(2). Previously, Fedora ran into
> blocking issues because of some FIPS compliance patches to some
> userspace daemons. So it's going to be very user space dependent and
> package dependent.
Btw, I've been seeing this issue on debian testing with an XFS root
file system ever since the blocking random changes went in. There
are a few reports (not from me) in the BTS since then. I ended up just
giving up on gdm and using lightdm instead, as it was clearly related
to that.
On Sun, Sep 15, 2019 at 11:13 PM Willy Tarreau <[email protected]> wrote:
>
> >
> > So three out of four flag combinations end up being mostly "don't
> > use", and the fourth one isn't what you'd normally want (which is just
> > plain /dev/urandom semantics).
>
> I'm seeing it from a different angle. I now understand better why
> getrandom() absolutely wants to have an initialized pool, it's to
> encourage private key producers to use a secure, infinite source of
> randomness.
Right. There is absolutely no question that that is a useful thing to have.
And that's what GRND_RANDOM _should_ have meant. But didn't.
So the semantics that getrandom() should have had are:
getrandom(0) - just give me reasonable random numbers for any of a
million non-strict-long-term-security uses (ie the old urandom)
  - the nonblocking flag makes no sense here and would be a no-op
getrandom(GRND_RANDOM) - get me actual _secure_ random numbers with
blocking until the entropy pool fills (but not the completely invalid
entropy decrease accounting)
  - the nonblocking flag is useful for bootup and for "I will
    actually try to generate entropy".
and both of those are very very sensible actions. That would actually
have _fixed_ the problems we had with /dev/[u]random, both from a
performance standpoint and from a filesystem access standpoint.
But that is sadly not what we have right now.
And I suspect we can't fix it, since people have grown to depend on
the old behavior, and already know to avoid GRND_RANDOM because it's
useless with old kernels even if we fixed it with new ones.
Does anybody really seriously debate the above? Ted? Are you seriously
trying to claim that the existing GRND_RANDOM has any sensible use?
Are you seriously trying to claim that the fact that we don't have a
sane urandom source is a "feature"?
Linus
On Sun, 15.09.19 10:32, Linus Torvalds ([email protected]) wrote:
> [ Added Lennart, who was active in the other thread ]
>
> On Sat, Sep 14, 2019 at 10:22 PM Theodore Y. Ts'o <[email protected]> wrote:
> >
> > Thus, add an optional configuration option which stops getrandom(2)
> > from blocking, but instead returns "best efforts" randomness, which
> > might not be random or secure at all.
>
> So I hate having a config option for something like this.
>
> How about this attached patch instead? It only changes the waiting
> logic, and I'll quote the comment in full, because I think that
> explains not only the rationale, it explains every part of the patch
> (and is most of the patch anyway):
>
> * We refuse to wait very long for a blocking getrandom().
> *
> * The crng may not be ready during boot, but if you ask for
> * blocking random numbers very early, there is no guarantee
> * that you'll ever get any timely entropy.
> *
> * If you are sure you need entropy and that you can generate
> * it, you need to ask for non-blocking random state, and then
> * if that fails you must actively _do_something_ that causes
> * enough system activity, perhaps asking the user to type
> * something on the keyboard.
You are requesting a UI change here. Maybe the kernel shouldn't be the
one figuring out UI.
I mean, as I understand it, you are unhappy with the behaviour you saw
on systemd systems; we can certainly improve the behaviour of systemd
in userspace alone, i.e. abort the getrandom() after a while in userspace
and log about it using typical userspace logging to the console. I am
not sure why you want to do all that in the kernel; the kernel isn't
great at user interaction, and really shouldn't be.
If all you want is abort the getrandom() after 30s and a friendly
message on screen, by all means, let's add that to systemd, I have
zero problem with that. systemd has infrastructure for pushing that to
the user, the kernel doesn't really have that so nicely.
It appears to me you subscribe too much to an idea that userspace
people are not smart enough and couldn't implement something like
this. Turns out we can though, and there's no need to add logic that
appears to follow the logic of "never trust userspace"...
i.e. why not just consider this all a feature request for
systemd-random-seed.service, i.e. the service you saw the issue with,
to handle this on its own?
> Hmm? No strange behavior. No odd config variables. A bounded total
> boot-time wait of 30s (which is a completely random number, but I
> claimed it as the "big red button" time).
As mentioned, in systemd's case, updating the random seed on disk
is entirely fine to take 5h or so. I don't think we really need
to bound this in kernel space.
Lennart
--
Lennart Poettering, Berlin
On Mon, Sep 16, 2019 at 09:17:10AM -0700, Linus Torvalds wrote:
> So the semantics that getrandom() should have had are:
>
> getrandom(0) - just give me reasonable random numbers for any of a
> million non-strict-long-term-security use (ie the old urandom)
>
> - the nonblocking flag makes no sense here and would be a no-op
That change is what I consider highly problematic. There are a *huge*
number of applications which use cryptography and which assume that
getrandom(0) means, "I'm guaranteed to get something safe for
cryptographic use". Changing this now would expose a very large number
of applications to being insecure. Part of the problem here is that
there are many different actors. There is the application or
cryptographic library developer, who may want to be sure they have
cryptographically secure random numbers. They are the ones who will
select getrandom(0).
Then you have the distribution or consumer-grade electronics
developers who may choose to run them too early in some init script or
systemd unit file. And some of these people may do something stupid,
like run things too early, or omit a hardware random number
generator in their design, even though it's for a security-critical
purpose (say, a digital wallet for bitcoin). Because some of these
people might do something stupid, one argument (not mine) is that we
must therefore not let getrandom() block. But doing this penalizes
the security of all the users of the application, not just the stupid
ones.
> getrandom(GRND_RANDOM) - get me actual _secure_ random numbers with
> blocking until entropy pool fills (but not the completely invalid
> entropy decrease accounting)
>
> - the nonblocking flag is useful for bootup and for "I will
> actually try to generate entropy".
>
> and both of those are very very sensible actions. That would actually
> have _fixed_ the problems we had with /dev/[u]random, both from a
> performance standpoint and for a filesystem access standpoint.
>
> But that is sadly not what we have right now.
>
> And I suspect we can't fix it, since people have grown to depend on
> the old behavior, and already know to avoid GRND_RANDOM because it's
> useless with old kernels even if we fixed it with new ones.
I don't think we can fix it, because it's the changing of
getrandom(0)'s behavior which is the problem, not GRND_RANDOM. People
*expect* getrandom(0) to always return secure results. I don't think
we can make it sometimes return not-necessarily-secure results
depending on when the systems integrator or distribution decides to
run the application, and depending on the hardware platform (yes,
traditional x86 systems are probably fine, and fortunately x86
embedded CPUs are too expensive and have lousy power management, so no
one really uses x86 for embedded yet, despite Intel's best efforts).
That would just be a purely irresponsible thing to do, IMO.
> Does anybody really seriously debate the above? Ted? Are you seriously
> trying to claim that the existing GRND_RANDOM has any sensible use?
> Are you seriously trying to claim that the fact that we don't have a
> sane urandom source is a "feature"?
There are people who can debate that GRND_RANDOM has sensible use
cases. GPG uses /dev/random, and that was a fully informed choice.
I'm not convinced, because I think that at least for now the CRNG is
perfectly fine for 99.999% of the use cases. Yes, in a post-quantum
cryptography world, the CRNG might be screwed --- but so will most of
the other cryptographic algorithms in the kernel. So if anyone ever
gets post-quantum cryptanalytic attacks working, the use of the CRNG
is going to be the least of our problems.
As I mentioned to you in Lisbon, I've been going back and forth about
whether or not to rip out the entire /dev/random infrastructure,
mainly for code maintainability reasons. The only reason why I've
been holding back is because there are (very few) non-insane people
who do want to use it. There is also a much larger number of rational
people who use it because they want some insane PCI compliance labs to
go away. What I suspect most of them are actually doing in practice
is they use /dev/random, but they also use a hardware random number
generator so /dev/random never actually blocks in practice. The use
of /dev/random is enough to make the PCI compliance lab go away, and
the hardware random number generator (or virtio-rng on a VM) makes
/dev/random useable.
But I don't think we can reuse GRND_RANDOM for that reason.
We could create a new flag, GRND_INSECURE, which never blocks. And
that allows us to solve the problem for silly applications that
are using getrandom(2) for non-cryptographic use cases. Use cases
might include Python dictionary seeds, gdm's MIT Magic Cookie, UUID
generation where best effort is probably good enough, etc. The
answer today is they should just use /dev/urandom, since that exists
today, and we have to support it for backwards compatibility anyway.
It sounds like gdm recently switched to getrandom(2), and I suspect
that it's going to get caught on some hardware configs anyway, even
without the ext4 optimization patch. So I suspect gdm will switch
back to /dev/urandom, and this particular pain point will probably go
away.
- Ted
On Mon, Sep 16, 2019 at 08:08:01PM +0200, Lennart Poettering wrote:
> I mean, as I understand you are unhappy with behaviour you saw on
> systemd systems; we can certainly improve behaviour of systemd in
> userspace alone, i.e. abort the getrandom() after a while in userspace
> and log about it using typical userspace logging to the console. I am
> not sure why you want to do all that in the kernel, the kernel isn't
> great at user interaction, and really shouldn't be.
Because the syscall will have the option to return what random data
was available in this case, while if you try to fix it only from
within systemd you currently don't even get that data.
> It appears to me you subscribe too much to an idea that userspace
> people are not smart enough and couldn't implement something like
> this. Turns out we can though, and there's no need to add logic that
> appears to follow the logic of "never trust userspace"...
I personally see this very differently. If random number generation
was placed in the kernel, when other operating systems do everything
in userspace, it's in part because it requires collecting data very
widely to gather some entropy, and no isolated userspace process alone
can collect as much as the kernel. Otherwise they each have to
reimplement their own method, each with their own bugs, instead of
fixing them all in a single place. All applications need randomness;
there's no reason to force them all to implement it in detail.
Willy
On Mon, Sep 16, 2019 at 01:21:17PM -0400, Theodore Y. Ts'o wrote:
> On Mon, Sep 16, 2019 at 09:17:10AM -0700, Linus Torvalds wrote:
> > So the semantics that getrandom() should have had are:
> >
> > getrandom(0) - just give me reasonable random numbers for any of a
> > million non-strict-long-term-security use (ie the old urandom)
> >
> > - the nonblocking flag makes no sense here and would be a no-op
>
> That change is what I consider highly problematic. There are a *huge*
> number of applications which use cryptography which assumes that
> getrandom(0) means, "I'm guaranteed to get something safe
> cryptographic use". Changing his now would expose a very large number
> of applications to be insecure. Part of the problem here is that
> there are many different actors. There is the application or
> cryptographic library developer, who may want to be sure they have
> cryptographically secure random numbers. They are the ones who will
> select getrandom(0).
>
> Then you have the distribution or consumer-grade electronics
> developers who may choose to run them too early in some init script or
> systemd unit files. And some of these people may do something stupid,
> like run things too early, or omit the a hardware random number
> generator in their design, even though it's for a security critical
> purpose (say, a digital wallet for bitcoin).
Ted, you're really the expert here. My apologies though, every time I
see the words "too early" I get a cramp... Please check my earlier
reply:
https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc
Specifically the trace_printk log of all the getrandom(2) calls
during a standard Archlinux boot...
Where is the "too early" boundary there? It's undefinable.
You either have entropy, or you don't. And if you don't, it will stay
like this forever, because if you had, you wouldn't have blocked in
the first place...
Thanks,
--
Ahmed Darwish
http://darwish.chasingpointers.com
On Mon, 16.09.19 13:21, Theodore Y. Ts'o ([email protected]) wrote:
> We could create a new flag, GRND_INSECURE, which never blocks. And
> that that allows us to solve the problem for silly applications that
> are using getrandom(2) for non-cryptographic use cases. Use cases
> might include Python dictionary seeds, gdm for MIT Magic Cookie, UUID
> generation where best efforts probably is good enough, etc. The
> answer today is they should just use /dev/urandom, since that exists
> today, and we have to support it for backwards compatibility anyway.
> It sounds like gdm recently switched to getrandom(2), and I suspect
> that it's going to get caught on some hardware configs anyway, even
> without the ext4 optimization patch. So I suspect gdm will switch
> back to /dev/urandom, and this particular pain point will probably go
> away.
The problem is that reading from /dev/urandom at a point where it's
not initialized yet results in noisy kernel logging on current
kernels. If you want people to use /dev/urandom then the logging needs
to go away, because it scares people, makes them file bug reports and
so on, even though there isn't actually any problem for these specific
purposes.
For that reason I'd prefer GRND_INSECURE I must say, because it
indicates people grokked "I know I might get questionable entropy".
Lennart
Hi,
This is an RFC, and it obviously needs much more testing beyond the
"it boots" smoke test I've just done.
Interestingly though, on my current system, the triggered WARN()
**reliably** makes the system get un-stuck... I know this is a very
crude heuristic, but I would personally prefer it to the other
proposals that were mentioned in this jumbo thread.
If I get an OK from Linus on this, I'll send a polished v5: further
real testing, kernel-parameters.txt docs, a new getrandom_wait(7)
manpage as referenced in the WARN() message, and extensions to the
getrandom(2) manpage for the new getrandom2().
The new getrandom2() system call is basically a summary of Linus',
Lennart's, and Willy's proposals. Please see the patch #1 commit log,
and the "Link:" section inside it, for a rationale.
@Lennart, since you obviously represent user-space here, any further
notes on the new system call?
thanks,
Ahmed S. Darwish (1):
random: WARN on large getrandom() waits and introduce getrandom2()
drivers/char/Kconfig | 60 ++++++++++++++++++++++++--
drivers/char/random.c | 85 ++++++++++++++++++++++++++++++++-----
include/uapi/linux/random.h | 20 +++++++--
3 files changed, 148 insertions(+), 17 deletions(-)
--
http://darwish.chasingpointers.com
getrandom(2) was introduced in Linux v3.17 as a new and more secure
interface for pseudorandom data requests. It attempted to solve three
problems, as compared to /dev/urandom:
1. the need to access filesystem paths, which can fail, e.g. under a
chroot
2. the need to open a file descriptor, which can fail under file
descriptor exhaustion attacks
3. the possibility of getting not-so-random data from /dev/urandom,
due to an incompletely initialized kernel entropy pool
To solve the third point, getrandom(2) was made to block until a
proper amount of entropy has been accumulated to initialize the
CHACHA20 cipher. This basically left the system call with no
guaranteed upper bound on its initial waiting time.
Thus when it was introduced at c6e9d6f38894 ("random: introduce
getrandom(2) system call"), it came with a clear warning: "Any
userspace program which uses this new functionality must take care to
assure that if it is used during the boot process, that it will not
cause the init scripts or other portions of the system startup to hang
indefinitely."
Unfortunately, due to multiple factors, including not having this
warning written in a scary-enough language in the manpages, and due to
glibc since v2.25 implementing a BSD-like getentropy(3) in terms of
getrandom(2), modern user-space is calling getrandom(2) in the boot
path everywhere.
Embedded Linux systems were first hit by this, and reports of embedded
systems "getting stuck at boot" began to be common. Over time, the
issue began to even creep into consumer-level x86 laptops: mainstream
distributions, like Debian Buster, began to recommend installing
haveged as a duct-tape workaround... just to let the system boot. (!)
Moreover, filesystem optimizations in EXT4 and XFS, e.g. b03755ad6f33
("ext4: make __ext4_get_inode_loc plug"), which merged the directory
lookup code's inode table IO, and very fast systemd boots, further
exacerbated the problem by limiting interrupt-based entropy sources.
This led to large delays until the kernel's cryptographic random
number generator (CRNG) got initialized.
Mitigate the problem, as a first step, in two ways:
1. Issue a big WARN_ON when any process gets stuck on getrandom(2)
for more than CONFIG_GETRANDOM_WAIT_THRESHOLD_SEC seconds.
2. Introduce the new getrandom2(2) system call, with clear semantics
that can guide user-space in doing the right thing.
On the author's Thinkpad E480 x86 laptop and an ArchLinux user-space,
the ext4 commit mentioned earlier reliably blocked the system during
the GDM/gnome-session boot. Complain loudly through a WARN_ON if
processes get stuck on getrandom(2). Beside its obvious informational
purpose, the WARN_ON also reliably gets the system unstuck.
Set CONFIG_GETRANDOM_WAIT_THRESHOLD_SEC to a heuristic 30-second
default value. We __deeply encourage__ system integrators and
distribution builders not to increase it much: during system boot, you
either have entropy, or you don't. And if you didn't have entropy, it
will stay like this forever, because if you had, you wouldn't have
blocked in the first place. It's an atomic "either/or" situation, with
no middle ground. Please think twice.
The new getrandom2(2) system call tries to avoid the problems
introduced by its earlier siblings. As Linus mentioned several times
in the bug report thread, Linux should have never provided the
"/dev/random" and "getrandom(GRND_RANDOM)" APIs. These interfaces are
broken by design due to their almost-permanent blockage, leading to
the current misuse of /dev/urandom and getrandom(flags=0) calls. Thus,
for getrandom2, introduce the flags:
1. GRND2_SECURE_UNBOUNDED_INITIAL_WAIT
2. GRND2_INSECURE
where both extract randomness __exclusively__ from the urandom source.
Due to the clear nature of its new GRND2_* flags, the getrandom2()
system call will never issue any warnings on the kernel log.
OpenBSD, to its credit, got this right from the start by making
/dev/random and /dev/urandom equivalent.
Reported-by: Ahmed S. Darwish <[email protected]>
Link: https://lkml.kernel.org/r/20190910042107.GA1517@darwi-home-pc
Link: https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/CAHk-=wjyH910+JRBdZf_Y9G54c1M=LBF8NKXB6vJcm9XjLnRfg@mail.gmail.com
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/20190917160844.GC31567@gardel-login
Link: https://lkml.kernel.org/r/CAHk-=wjABG3+daJFr4w3a+OWuraVcZpi=SMUg=pnZ+7+O0E2FA@mail.gmail.com
Link: https://lkml.kernel.org/r/CAHk-=wjQeiYu8Q_wcMgM-nAcW7KsBfG1+90DaTD5WF2cCeGCgA@mail.gmail.com
Link: https://factorable.net ("Widespread Weak Keys in Network Devices")
Link: https://man.openbsd.org/man4/random.4
Signed-off-by: Ahmed S. Darwish <[email protected]>
---
drivers/char/Kconfig | 60 ++++++++++++++++++++++++--
drivers/char/random.c | 85 ++++++++++++++++++++++++++++++++-----
include/uapi/linux/random.h | 20 +++++++--
3 files changed, 148 insertions(+), 17 deletions(-)
diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index df0fc997dc3e..772765c36fc3 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -535,8 +535,6 @@ config ADI
and SSM (Silicon Secured Memory). Intended consumers of this
driver include crash and makedumpfile.
-endmenu
-
config RANDOM_TRUST_CPU
bool "Trust the CPU manufacturer to initialize Linux's CRNG"
depends on X86 || S390 || PPC
@@ -559,4 +557,60 @@ config RANDOM_TRUST_BOOTLOADER
device randomness. Say Y here to assume the entropy provided by the
booloader is trustworthy so it will be added to the kernel's entropy
pool. Otherwise, say N here so it will be regarded as device input that
- only mixes the entropy pool.
\ No newline at end of file
+ only mixes the entropy pool.
+
+config GETRANDOM_WAIT_THRESHOLD_SEC
+ int
+ default 30
+ help
+ The getrandom(2) system call, when asking for entropy from the
+ urandom source, blocks until the kernel's Cryptographic Random
+ Number Generator (CRNG) gets initialized. This configuration
+ option sets the maximum wait time, in seconds, for a process
+ to get blocked on such a system call before the kernel issues
+ a loud warning. Rationale follows:
+
+ When the getrandom(2) system call was created, it came with
+ the clear warning: "Any userspace program which uses this new
+ functionality must take care to assure that if it is used
+ during the boot process, that it will not cause the init
+ scripts or other portions of the system startup to hang
+ indefinitely."
+
+ Unfortunately, due to multiple factors, including not having
+ this warning written in a scary-enough language in the
+ manpages, and due to glibc since v2.25 implementing a BSD-like
+ getentropy(3) in terms of getrandom(2), modern user-space is
+ calling getrandom(2) in the boot path everywhere.
+
+ Embedded Linux systems were first hit by this, and reports of
+ embedded systems "getting stuck at boot" began to be
+ common. Over time, the issue began to even creep into consumer
+ level x86 laptops: mainstream distributions, like Debian
+ Buster, began to recommend installing haveged as a workaround,
+ just to let the system boot.
+
+ Filesystem optimizations in EXT4 and XFS exacerbated the
+ problem, due to aggressive batching of IO requests, and thus
+ minimizing sources of entropy at boot. This led to large
+ delays until the kernel's CRNG got initialized.
+
+ System integrators and distribution builders are not
+ encouraged to considerably increase this value: during system
+ boot, you either have entropy, or you don't. And if you didn't
+ have entropy, it will stay like this forever, because if you
+ had, you wouldn't have blocked in the first place. It's an
+ atomic "either/or" situation, with no middle ground. Please
+ think twice.
+
+ Ideally, systems would be configured with hardware random
+ number generators, and/or configured to trust the CPU-provided
+ RNG's (CONFIG_RANDOM_TRUST_CPU) or boot-loader provided ones
+ (CONFIG_RANDOM_TRUST_BOOTLOADER). In addition, userspace
+ should generate cryptographic keys only as late as possible,
+ when they are needed, instead of during early boot. For
+ non-cryptographic use cases, such as dictionary seeds or MIT
+ Magic Cookies, the getrandom2(GRND2_INSECURE) system call,
+ or even random(3), may be more appropriate.
+
+endmenu
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 566922df4b7b..74057e496303 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -322,6 +322,7 @@
#include <linux/interrupt.h>
#include <linux/mm.h>
#include <linux/nodemask.h>
+#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/kthread.h>
#include <linux/percpu.h>
@@ -854,12 +855,21 @@ static void invalidate_batched_entropy(void);
static void numa_crng_init(void);
static bool trust_cpu __ro_after_init = IS_ENABLED(CONFIG_RANDOM_TRUST_CPU);
+static int getrandom_wait_threshold __ro_after_init =
+ CONFIG_GETRANDOM_WAIT_THRESHOLD_SEC;
+
static int __init parse_trust_cpu(char *arg)
{
return kstrtobool(arg, &trust_cpu);
}
early_param("random.trust_cpu", parse_trust_cpu);
+static int __init parse_getrandom_wait_threshold(char *arg)
+{
+ return kstrtoint(arg, 0, &getrandom_wait_threshold);
+}
+early_param("random.getrandom_wait_threshold", parse_getrandom_wait_threshold);
+
static void crng_initialize(struct crng_state *crng)
{
int i;
@@ -1960,7 +1970,7 @@ random_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
}
static ssize_t
-urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
+_urandom_read(char __user *buf, size_t nbytes, bool warn_on_noninited_crng)
{
unsigned long flags;
static int maxwarn = 10;
@@ -1968,7 +1978,7 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
if (!crng_ready() && maxwarn > 0) {
maxwarn--;
- if (__ratelimit(&urandom_warning))
+ if (warn_on_noninited_crng && __ratelimit(&urandom_warning))
printk(KERN_NOTICE "random: %s: uninitialized "
"urandom read (%zd bytes read)\n",
current->comm, nbytes);
@@ -1982,6 +1992,12 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
return ret;
}
+static ssize_t
+urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
+{
+ return _urandom_read(buf, nbytes, true);
+}
+
static __poll_t
random_poll(struct file *file, poll_table * wait)
{
@@ -2118,11 +2134,41 @@ const struct file_operations urandom_fops = {
.llseek = noop_llseek,
};
-SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
- unsigned int, flags)
+static int getrandom_wait(char __user *buf, size_t count,
+ bool warn_on_large_wait)
{
+ unsigned long timeout = MAX_SCHEDULE_TIMEOUT;
int ret;
+ if (warn_on_large_wait && (getrandom_wait_threshold > 0))
+ timeout = HZ * getrandom_wait_threshold;
+
+ do {
+ ret = wait_event_interruptible_timeout(crng_init_wait,
+ crng_ready(),
+ timeout);
+ if (ret < 0)
+ return ret;
+
+ if (ret == 0) {
+ WARN(1, "random: %s[%d]: getrandom(%zu bytes) "
+ "is blocked for more than %d seconds. Check "
+ "getrandom_wait(7)\n", current->comm,
+ task_pid_nr(current), count,
+ getrandom_wait_threshold);
+
+ /* warn once per caller */
+ timeout = MAX_SCHEDULE_TIMEOUT;
+ }
+
+ } while (ret == 0);
+
+ return _urandom_read(buf, count, true);
+}
+
+SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
+ unsigned int, flags)
+{
if (flags & ~(GRND_NONBLOCK|GRND_RANDOM))
return -EINVAL;
@@ -2132,14 +2178,31 @@ SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
if (flags & GRND_RANDOM)
return _random_read(flags & GRND_NONBLOCK, buf, count);
- if (!crng_ready()) {
- if (flags & GRND_NONBLOCK)
+ if ((flags & GRND_NONBLOCK) && !crng_ready())
return -EAGAIN;
- ret = wait_for_random_bytes();
- if (unlikely(ret))
- return ret;
- }
- return urandom_read(NULL, buf, count, NULL);
+
+ return getrandom_wait(buf, count, true);
+}
+
+SYSCALL_DEFINE3(getrandom2, char __user *, buf, size_t, count,
+ unsigned int, flags)
+{
+ if (flags & ~(GRND2_SECURE_UNBOUNDED_INITIAL_WAIT|GRND2_INSECURE))
+ return -EINVAL;
+
+ if (flags != GRND2_SECURE_UNBOUNDED_INITIAL_WAIT && flags != GRND2_INSECURE)
+ return -EINVAL;
+
+ if (count > INT_MAX)
+ count = INT_MAX;
+
+ if (flags & GRND2_SECURE_UNBOUNDED_INITIAL_WAIT)
+ return getrandom_wait(buf, count, false);
+
+ if (flags & GRND2_INSECURE)
+ return _urandom_read(buf, count, false);
+
+ unreachable();
}
/********************************************************************
diff --git a/include/uapi/linux/random.h b/include/uapi/linux/random.h
index 26ee91300e3e..3f09a8f6aff3 100644
--- a/include/uapi/linux/random.h
+++ b/include/uapi/linux/random.h
@@ -8,6 +8,7 @@
#ifndef _UAPI_LINUX_RANDOM_H
#define _UAPI_LINUX_RANDOM_H
+#include <linux/bits.h>
#include <linux/types.h>
#include <linux/ioctl.h>
#include <linux/irqnr.h>
@@ -23,7 +24,7 @@
/* Get the contents of the entropy pool. (Superuser only.) */
#define RNDGETPOOL _IOR( 'R', 0x02, int [2] )
-/*
+/*
* Write bytes into the entropy pool and add to the entropy count.
* (Superuser only.)
*/
@@ -50,7 +51,20 @@ struct rand_pool_info {
* GRND_NONBLOCK Don't block and return EAGAIN instead
* GRND_RANDOM Use the /dev/random pool instead of /dev/urandom
*/
-#define GRND_NONBLOCK 0x0001
-#define GRND_RANDOM 0x0002
+#define GRND_NONBLOCK BIT(0)
+#define GRND_RANDOM BIT(1)
+
+/*
+ * Flags for getrandom2(2)
+ *
+ * GRND2_SECURE_UNBOUNDED_INITIAL_WAIT Use urandom pool, block until CRNG is inited
+ * GRND2_INSECURE Use urandom pool, never block even if CRNG isn't inited
+ *
+ * NOTE: don't mix flag values with GRND, to protect against the
+ * security implications of users passing the invalid flag family
+ * to system calls (GRND_* vs. GRND2_*).
+ */
+#define GRND2_SECURE_UNBOUNDED_INITIAL_WAIT BIT(7)
+#define GRND2_INSECURE BIT(8)
#endif /* _UAPI_LINUX_RANDOM_H */
--
Ahmed Darwish
http://darwish.chasingpointers.com
(Adding linux-api since this patch proposes an API change; both by
changing the existing behavior, and adding new flags and possibly a
new system call.)
On Wed, Sep 18, 2019 at 04:57:58PM -0700, Linus Torvalds wrote:
> On Wed, Sep 18, 2019 at 2:17 PM Ahmed S. Darwish <[email protected]> wrote:
> >
> > Since Linux v3.17, getrandom(2) has been created as a new and more
> > secure interface for pseudorandom data requests. It attempted to
> > solve three problems, as compared to /dev/urandom:
>
> I don't think your patch is really _wrong_, but I think it's silly to
> introduce a new system call, when we have 30 bits left in the flags of
> the old one, and the old system call checked them.
The only reason to introduce a new system call is if we were going to
keep the existing behavior of getrandom. Given that the patch changes
what getrandom(0) does, I agree there's no point in adding a new
system call.
> There is *one* other small semantic change: The old code did
> urandom_read() which added warnings, but each warning also _reset_ the
> crng_init_cnt. Until it decided not to warn any more, at which point
> it also stops that resetting of crng_init_cnt.
>
> And that reset of crng_init_cnt, btw, is some cray cray.
>
> It's basically a "we used up entropy" thing, which is very
> questionable to begin with as the whole discussion has shown, but
> since it stops doing it after 10 cases, it's not even good security
> assuming the "use up entropy" case makes sense in the first place.
It was a bug that it stopped doing it after 10 tries, and there's a
really good reason for it. Yes, the "using up entropy" thing doesn't
make much sense in the general case. But we still need some threshold
for deciding whether enough entropy has been accumulated that we can
consider the CRNG initialized.
The reason for zeroing it after we expose state is to handle the case
where the pool starts in a known state (the attacker knows the starting
configuration, knows the DMI table that we're mixing into the pool
since that's a constant, etc.) and we have only injected a small
amount of uncertainty into the pool. Say we started with a single
known state of the pool, and after injecting some randomness there
are 64 possible states of the pool. If the attacker can read from
/dev/urandom, the attacker can know which of the 64 possible states of
the pool it's in. Now suppose we inject more uncertainty, so that
there's another 64 unknown states, and the attacker is able to
constantly read from /dev/urandom in a tight loop; it'll be able to
keep up with the injection of entropy, and so even though
we've injected 256 "bits" of uncertainty, the attacker will still know
the state of the pool. That's why when we read from the pool, we need
to clear the entropy bits.
This is sometimes called a "state extension attack", and there have
been attacks carried out against RNGs that don't protect against
it. What happened is that when I added the rate-limiting to the
uninitialized /dev/urandom warning, I accidentally wiped out
the protection. But it was there for a reason.
> And the new cases are defined to *not* warn. In particular,
> GRND_INSECURE very much does *not* warn about early urandom access
> when crng isn't ready. Because the whole point of that new mode is
> that the user knows it isn't secure.
>
> So that should make getrandom(GRND_INSECURE) palatable to the systemd
> kind of use that wanted to avoid the pointless kernel warning.
Yes, that's clearly the right thing to do. I do think we need to
restore the state extension attack protections, though.
> + /*
> + * People are really confused about whether
> + * this is secure or insecure. Traditional
> + * behavior is secure, but there are users
> + * who clearly didn't want that, and just
> + * never thought about it.
> + */
> + case 0:
> ret = wait_for_random_bytes();
> - if (unlikely(ret))
> + if (ret)
> return ret;
> + break;
I'm happy this proposal is not changing the behavior of getrandom(0).
Why not just remap 0 to GRND_EXPLICIT | GRND_WAIT_ENTROPY, though? It
will have the same effect, and it'll make it clear what we're doing.
Later on, when we rip out /dev/random pool code (and make reading from
/dev/random the equivalent of getrandom(GRND_SECURE)), we'll need to
similarly map the legacy combination of flags for GRND_RANDOM and
GRND_RANDOM | GRND_NONBLOCK.
- Ted
On Thu, Sep 19, 2019 at 7:34 AM Theodore Y. Ts'o <[email protected]> wrote:
>
> > It's basically a "we used up entropy" thing, which is very
> > questionable to begin with as the whole discussion has shown, but
> > since it stops doing it after 10 cases, it's not even good security
> > assuming the "use up entropy" case makes sense in the first place.
>
> It was a bug that it stopped doing it after 10 tries, and there's a
> really good reason for it.
I really doubt that.
> The reason for zeroing it after we expose state is because otherwise
> if the pool starts in a known state (the attacker knows the starting
> configuration, knows the DMI table that we're mixing into the pool
> since that's a constant, etc.),
That's at least partly because our pool hashing has what looks like a
fairly sad property.
Yes, it hashes it using a good hash, but it does so in a way that
makes it largely possible to follow the hashing and repeat it and
analyze it.
That breaks if we have hw randomness, because it does the
	if (arch_get_random_long(&v))
		crng->state[14] ^= v;
so it always mixes in hardware randomness as part of the extraction,
but we don't mix any other unpredictable - or even
process-specific - state in. So without hw randomness, you can try to
get a lot of data over a lot of boots - and for long times during
boots - and maybe find the pattern.
But honestly, this isn't realistic. I can point to emails where *you*
are arguing against other hashing algorithms because the whole state
extension attack simply isn't realistic.
And I think it's also pretty questionable how we don't try to mix in
anything timing/process-specific when extracting, which is what makes
that "do lots of boots" possible.
The silly "reset crng_init_cnt" does absolutely nothing to help that,
but in fact what it does is to basically give the attacker a way to
get an infinite stream of data without any reseeding (because that
only happens after crng_read()), and able to extend that "block at
boot" time indefinitely while doing so.
Also honestly, if the attacker already has access to the system at
boot, you have some fairly big problems to begin with.
So a much bigger issue than the state extension attack (pretty much
purely theoretical, given any entropy at all, which we _will_ have
even without the crng_init_cnt clearing) is the fact that right now we
really are predictable if there are no hardware interrupts, and people
have used /dev/urandom because other sources weren't useful.
And the fact is, we *know* people use /dev/urandom exactly because
other sources haven't been useful.
And unlike your theoretical state extension attack, I can point you to
black hat presentations that literally talk about using the fact that
we delay mixing in the input pool hash to know what's going on:
https://www.blackhat.com/docs/eu-14/materials/eu-14-Kedmi-Attacking-The-Linux-PRNG-On-Android-Weaknesses-In-Seeding-Of-Entropic-Pools-And-Low-Boot-Time-Entropy.pdf
That's a real attack. Based on the REAL fact that we currently have to
use the urandom logic because the entropy-waiting one is useless, and
in fact depends on the re-seeding happening too late.
Yes, yes, our urandom has changed since that attack, and we use chacha
instead of sha1 these days. We have other changes too. But I don't see
anything fundamentally different.
And all your arguments seem to make that _real_ security issue just
worse, exactly because we also avoid reseeding while crng_init is
zero.
> I'm happy this proposed is not changing the behavior of getrandom(0).
> Why not just remap 0 to GRND_EXPLICIT | GRND_WAIT_ENTROPY, though? It
> will have the same effect, and it's make it clear what we're doing.
Have you not followed the whole discussion? Didn't you read the comment?
People use "getrandom(0)" not because they want secure randomness, but
because that's the default.
And we *will* do something about it. This patch didn't, because I want
to be able to backport it to stable, so that everybody is happier with
saying "ok, I'll use the new getrandom(GRND_INSECURE)".
Because getrandom(0) will NOT be the same as GRND_EXPLICIT |
GRND_WAIT_ENTROPY.
getrandom(0) is the "I don't know what I am doing" thing. It could be
somebody that wants real secure random numbers. Or it could *not* be
one of those, and need the timeout.
> Later on, when we rip out /dev/random pool code (and make reading from
> /dev/random the equivalent of getrandom(GRND_SECURE)), we'll need to
> similarly map the legacy combination of flags for GRND_RANDOM and
> GRND_RANDOM | GRND_NONBLOCK.
And that is completely immaterial, because the "I'm confused" case
isn't about GRND_RANDOM. Nobody uses that anyway, and more importantly
it's not the case that has caused bugs. That one blocks even during
normal execution, so that one - despite being completely useless -
actually has the one good thing going for it that it's testable.
People will see the "oh, that took a long time" during testing. And
then they'll stop using it.
Ted - you really don't seem to be making any distinction between
"these are real problems that should be fixed" vs "this is theory that
isn't relevant".
The "getrandom(0)" is a real problem that needs to be fixed.
The warnings from /dev/urandom are real problems that people
apparently have worked around by (incorrectly) using getrandom(0).
The "hashing the random pool still leaves identities in place" is a
real problem that had a real attack.
The state extension attack? Complete theory (again, I can point to you
saying the same thing in other threads), and the "fix" of resetting
the counter and not reseeding seems to be anything but.
Linus
On Thu, Sep 19, 2019 at 8:20 AM Linus Torvalds
<[email protected]> wrote:
>
> The silly "reset crng_init_cnt" does absolutely nothing to help that,
> but in fact what it does is to basically give the attacker a way to
> get an infinite stream of data without any reseeding (because that
> only happens after crng_read()), and the ability to extend that "block at
> boot" time indefinitely while doing so.
.. btw, instead of bad workarounds for a theoretical attack, here's
something that should add actual *practical* real value: use the time
of day (whether from an RTC device, or from ntp) to add noise to the
random pool.
If you let attackers in before you've set the clock on the device,
you're doing something seriously wrong.
And while this doesn't add much "serious" entropy, it does mean that
the whole "let's look for identical state" which is a _real_ attack,
goes out the window.
In other words, this is about real security, not academic papers.
Of course, attackers can still see possible bad random values from
before the clock was set (possibly from things like TCP sequence
numbers etc, or from the AT_RANDOM of a very early process, which was
part of the Android attack). But doing things like delaying
reseeding sure isn't helping, which is what the crng_init_cnt reset does.
Linus
On Thu, Sep 19, 2019 at 8:20 AM Linus Torvalds
<[email protected]> wrote:
>
> Yes, it hashes it using a good hash, but it does so in a way that
> makes it largely possible to follow the hashing and repeat it and
> analyze it.
>
> That breaks if we have hw randomness, because it does the
>
> if (arch_get_random_long(&v))
> crng->state[14] ^= v;
>
> so it always mixes in hardware randomness as part of the extraction,
> but we don't mix in any other unpredictable - or even
> process-specific - state.
So this is the other actual _serious_ patch I'd suggest: replace the
if (arch_get_random_long(&v))
crng->state[14] ^= v;
with
if (!arch_get_random_long(&v))
v = random_get_entropy();
crng->state[14] += v;
instead. Yeah, it still doesn't help on machines that don't even have
a cycle counter, but it at least means that if you don't have a
CPU rdrand (or equivalent) but you do have a cycle counter, the
extraction of randomness from the pool doesn't just do the
(predictable) mutation for the backtracking, but actually means that
you have some very hard to predict timing effects.
Again, in this case a cycle counter really does add a small amount of
entropy (everybody agrees that modern CPU's are simply too complex to
be predictable at a cycle level), but that's not really the point. The
point is that now doing the extraction really fundamentally changes
the state in unpredictable ways, so that you don't have that "if I
recognize a value, I know what the next value will be" kind of attack.
Which, as mentioned, is actually not a purely theoretical concern.
Note small detail above: I changed the ^= to a +=. Addition tends to
be better (due to carry between bits) when there might be bit
commonalities. Particularly with something like a cycle count where
two xors can mostly cancel out previous bits rather than move bits
around in the word.
With an actual random input from rdrand, the xor-vs-add is immaterial
and doesn't matter, of course, so the old code made sense in that
context.
In the attached patch I also moved the arch_get_random_long() and
random_get_entropy() to outside the crng spinlock. We're not talking
blocking operations, but it can easily be hundreds of cycles with
rdrand retries, or the random_get_entropy() reading an external clock
on some architectures.
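For illustration, here is a minimal sketch of how that suggestion might
look in _extract_crng() (hypothetical: structure and names follow
drivers/char/random.c of roughly that era, the reseed check at the top
of the real function is omitted, and this is not the actual attached
patch):

static void _extract_crng(struct crng_state *crng,
                          __u8 out[CHACHA_BLOCK_SIZE])
{
        unsigned long v, flags;

        /* Gather the per-extraction noise outside the spinlock: rdrand
         * retries or an external-clock read can cost hundreds of
         * cycles. */
        if (!arch_get_random_long(&v))
                v = random_get_entropy();       /* cycle counter, if any */

        spin_lock_irqsave(&crng->lock, flags);
        /* '+=' rather than '^=': the carries spread correlated bits
         * around, which matters when v is a cycle count instead of
         * rdrand output. */
        crng->state[14] += v;
        chacha20_block(&crng->state[0], out);
        if (crng->state[12] == 0)
                crng->state[13]++;
        spin_unlock_irqrestore(&crng->lock, flags);
}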
Linus
20.09.2019 01:04, Linus Torvalds wrote:
> instead. Yeah, it still doesn't help on machines that don't even have
> a cycle counter, but it at least means that if you don't have a
> CPU rdrand (or equivalent) but you do have a cycle counter, the
> extraction of randomness from the pool doesn't just do the
> (predictable) mutation for the backtracking, but actually means that
> you have some very hard to predict timing effects.
>
> Again, in this case a cycle counter really does add a small amount of
> entropy (everybody agrees that modern CPU's are simply too complex to
> be predictable at a cycle level), but that's not really the point. The
> point is that now doing the extraction really fundamentally changes
> the state in unpredictable ways, so that you don't have that "if I
> recognize a value, I know what the next value will be" kind of attack.
This already resembles in-kernel haveged (except that it doesn't credit
entropy), and Willy Tarreau said "collect the small entropy where it is,
period" today. So, too many people touched upon the topic in one day,
and therefore I'll bite.
We already have user-space software (haveged and modern versions of
rngd) that extract supposed entropy from clock jitter and feed it back
to the kernel via /dev/random (crediting it). Indeed, at present, on
some hardware this is the only way for distributions and users to
collect enough entropy during boot and avoid stalls - all other
suggestions are simply non-constructive. Also, Google's Fuchsia OS does
use and credit jitter entropy.
For the record: I do not have a justifiable opinion on whether haveged/rngd
output (known as jitter entropy) actually contains any entropy. I
understand that there are two possible viewpoints here. The rest of the
email is written under the assumption that haveged does provide real
entropy, not fake entropy.
The problem that I have with the current situation is that distributions
and users, when they set up their systems to run haveged or rngd, often
do it incorrectly (even, as mentioned, under the assumption that haveged
is something valid and useful). The most common mistake is relying on
systemd-provided default dependencies, thus not starting such software
as early as possible. Even worse, no initramfs generator allows one to
easily include haveged/rngd in the initramfs and run it there. And for
me, the first urandom warning comes from the initramfs, so anything
started from the main system is, arguably, already too late.
Therefore, I think, an in-kernel hwrng that exposes jitter entropy is
something useful (for those who agree that jitter entropy is not fake),
because it avoids the pitfall-ridden userspace setup. Just as an
exercise, I have implemented a very simple driver (attached as a patch)
that does just that. I am only half-serious here, the driver is only
lightly tested in KVM without any devices except an unconnected virtio
network card, not on any real hardware. Someone else can also find it
useful as a test/fake hwrng driver.
I am aware that there was an earlier decision that jitter entropy should
not be credited, i.e. effectively a pre-existing NAK from Theodore Ts'o.
But, well, distributions are already overriding this decision in
userspace, and do it badly, so in my viewpoint, the driver would be a
net win if some mechanism is added that makes it a no-op by default even
if the driver is built-in. E.g. an explicit "enable" parameter, but I am
open to other suggestions, too.
--
Alexander E. Patrakov
On Thu, Sep 19, 2019 at 1:45 PM Alexander E. Patrakov
<[email protected]> wrote:
>
> This already resembles in-kernel haveged (except that it doesn't credit
> entropy), and Willy Tarreau said "collect the small entropy where it is,
> period" today. So, too many people touched upon the topic in one day,
> and therefore I'll bite.
I'm one of the people who aren't entirely convinced by the jitter
entropy - I definitely believe it exists, I just am not necessarily
convinced about the actual entropy calculations.
So while I do think we should take things like the cycle counter into
account just because I think it's a useful way to force some noise,
I am *not* a huge fan of the jitter entropy driver either, because of
the whole "I'm not convinced about the amount of entropy".
The whole "third order time difference" thing would make sense if the
time difference was some kind of smooth function - which it is at a
macro level.
But at a micro level, I could easily see the time difference having
some very simple pattern - say that your cycle counter isn't really
cycle-granular, and the load takes 5.33 "cycles" and you see a time
difference pattern of (5, 5, 6, 5, 5, 6, ...). No real entropy at all
there, it is 100% reliable.
At a macro level, that's a very smooth curve, and you'd say "ok, time
difference is 5.3333 (repeating)". But that's not what the jitter
entropy code does. It just does differences of differences.
And that completely non-random pattern has a first-order difference of
0, 1, 1, 0, 1, 1.. and a second order of 1, 0, 1, 1, 0, and so on
forever. So the "jitter entropy" logic will assign that completely
repeatable thing entropy, because the delta difference doesn't ever go
away.
Maybe I misread it.
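Purely as an illustration of that concern (this is not the jitterentropy
code, just a toy program), the deltas of that completely periodic
5, 5, 6, ... sequence never settle at zero, even though the sequence is
fully predictable:

#include <stdio.h>

int main(void)
{
        long t[16], d1[15], d2[14], d3[13];
        int i;

        /* A load that takes 5.33 "cycles" on a coarse counter. */
        for (i = 0; i < 16; i++)
                t[i] = (i % 3 == 2) ? 6 : 5;

        for (i = 0; i < 15; i++) d1[i] = t[i + 1] - t[i];
        for (i = 0; i < 14; i++) d2[i] = d1[i + 1] - d1[i];
        for (i = 0; i < 13; i++) d3[i] = d2[i + 1] - d2[i];

        /* The differences cycle through a fixed pattern, but none of
         * the orders ever collapses to all-zero. */
        for (i = 0; i < 13; i++)
                printf("d1=%2ld d2=%2ld d3=%2ld\n", d1[i], d2[i], d3[i]);
        return 0;
}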
We used to (we still do, but we used to too) do that same third-order
delta difference ourselves for the interrupt timing entropy estimation
in add_timer_randomness(). But I think it's more valid with something
that likely has more noise (interrupt timing really _should_ be
noisy). It's not clear that the jitterentropy load really has all that
much noise.
That said, I'm _also_ not a fan of the user mode models - they happen
too late anyway for some users, and as you say, it leaves us open to
random (heh) user mode distribution choices that may be more or less
broken.
I would perhaps be willing to just put my foot down, and say "ok,
we'll solve the 'getrandom(0)' issue by just saying that if that
blocks too much, we'll do the jitter entropy thing".
Making absolutely nobody happy, but working in practice. And maybe
encouraging the people who don't like jitter entropy to use
GRND_SECURE instead.
Linus
20.09.2019 02:47, Linus Torvalds wrote:
> On Thu, Sep 19, 2019 at 1:45 PM Alexander E. Patrakov
> <[email protected]> wrote:
>>
>> This already resembles in-kernel haveged (except that it doesn't credit
>> entropy), and Willy Tarreau said "collect the small entropy where it is,
>> period" today. So, too many people touched upon the topic in one day,
>> and therefore I'll bite.
>
> I'm one of the people who aren't entirely convinced by the jitter
> entropy - I definitely believe it exists, I just am not necessarily
> convinced about the actual entropy calculations.
>
> So while I do think we should take things like the cycle counter into
> account just because I think it's a useful way to force some noise,
> I am *not* a huge fan of the jitter entropy driver either, because of
> the whole "I'm not convinced about the amount of entropy".
>
> The whole "third order time difference" thing would make sense if the
> time difference was some kind of smooth function - which it is at a
> macro level.
>
> But at a micro level, I could easily see the time difference having
> some very simple pattern - say that your cycle counter isn't really
> cycle-granular, and the load takes 5.33 "cycles" and you see a time
> difference pattern of (5, 5, 6, 5, 5, 6, ...). No real entropy at all
> there, it is 100% reliable.
>
> At a macro level, that's a very smooth curve, and you'd say "ok, time
> difference is 5.3333 (repeating)". But that's not what the jitter
> entropy code does. It just does differences of differences.
>
> And that completely non-random pattern has a first-order difference of
> 0, 1, 1, 0, 1, 1.. and a second order of 1, 0, 1, 1, 0, and so on
> forever. So the "jitter entropy" logic will assign that completely
> repeatable thing entropy, because the delta difference doesn't ever go
> away.
>
> Maybe I misread it.
You didn't. Let me generalize and rephrase the part of the concern that
I agree with, in my own words:
The same code is used in cryptoapi rng, and also a userspace version
exists. These two have been tested by the author via the "dieharder"
tool (see the message for commit d9d67c87), so we know that on his
machine it actually produces good-quality random bits. However, the
in-kernel self-test is much, much weaker, and would not catch the
situation when someone's machine is deterministic in a way that you
describe, or something similar.
OTOH, I thought that at least part of the real entropy, if it exists,
comes from the interference of the CPU's memory accesses with the
refresh cycles that are clocked from an independent oscillator. That's
why (in order to catch more of them before declaring the crng
initialized) I have set the quality to the minimum possible that is
guaranteed to be distinct from zero according to the fixed-point math in
hwrng_fillfn() in drivers/char/hw_random/core.c.
>
> We used to (we still do, but we used to too) do that same third-order
> delta difference ourselves for the interrupt timing entropy estimation
> in add_timer_randomness(). But I think it's more valid with something
> that likely has more noise (interrupt timing really _should_ be
> noisy). It's not clear that the jitterentropy load really has all that
> much noise.
>
> That said, I'm _also_ not a fan of the user mode models - they happen
> too late anyway for some users, and as you say, it leaves us open to
> random (heh) user mode distribution choices that may be more or less
> broken.
>
> I would perhaps be willing to just put my foot down, and say "ok,
> we'll solve the 'getrandom(0)' issue by just saying that if that
> blocks too much, we'll do the jitter entropy thing".
>
> Making absolutely nobody happy, but working in practice. And maybe
> encouraging the people who don't like jitter entropy to use
> GRND_SECURE instead.
I think this approach makes sense. For those who don't believe in jitter
entropy, it really changes nothing (except a one-time delay) compared to
Ahmed's first patch that makes getrandom(0) equivalent to /dev/urandom,
and nobody so far has proposed anything better that doesn't break
existing systems. And for those who do believe in jitter entropy, this
makes the situation as good as in OpenBSD.
--
Alexander E. Patrakov
20.09.2019 03:23, Alexander E. Patrakov wrote:
> 20.09.2019 02:47, Linus Torvalds wrote:
>> On Thu, Sep 19, 2019 at 1:45 PM Alexander E. Patrakov
>> <[email protected]> wrote:
>>>
>>> This already resembles in-kernel haveged (except that it doesn't credit
>>> entropy), and Willy Tarreau said "collect the small entropy where it is,
>>> period" today. So, too many people touched upon the topic in one day,
>>> and therefore I'll bite.
>>
>> I'm one of the people who aren't entirely convinced by the jitter
>> entropy - I definitely believe it exists, I just am not necessarily
>> convinced about the actual entropy calculations.
>>
>> So while I do think we should take things like the cycle counter into
>> account just because I think it's a useful way to force some noise,
>> I am *not* a huge fan of the jitter entropy driver either, because of
>> the whole "I'm not convinced about the amount of entropy".
>>
>> The whole "third order time difference" thing would make sense if the
>> time difference was some kind of smooth function - which it is at a
>> macro level.
>>
>> But at a micro level, I could easily see the time difference having
>> some very simple pattern - say that your cycle counter isn't really
>> cycle-granular, and the load takes 5.33 "cycles" and you see a time
>> difference pattern of (5, 5, 6, 5, 5, 6, ...). No real entropy at all
>> there, it is 100% reliable.
>>
>> At a macro level, that's a very smooth curve, and you'd say "ok, time
>> difference is 5.3333 (repeating)". But that's not what the jitter
>> entropy code does. It just does differences of differences.
>>
>> And that completely non-random pattern has a first-order difference of
>> 0, 1, 1, 0, 1, 1.. and a second order of 1, 0, 1, 1, 0, and so on
>> forever. So the "jitter entropy" logic will assign that completely
>> repeatable thing entropy, because the delta difference doesn't ever go
>> away.
>>
>> Maybe I misread it.
>
> You didn't. Let me generalize and rephrase the part of the concern that
> I agree with, in my own words:
>
> The same code is used in cryptoapi rng, and also a userspace version
> exists. These two have been tested by the author via the "dieharder"
> tool (see the message for commit d9d67c87), so we know that on his
> machine it actually produces good-quality random bits. However, the
> in-kernel self-test is much, much weaker, and would not catch the
> situation when someone's machine is deterministic in a way that you
> describe, or something similar.
A constructive suggestion here would be to put the first few thousand
(ok, a completely made-up number) raw timing intervals through a "gzip
compression test" in addition to the third-derivative test, just based
on what we already have in the kernel.
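A rough userspace illustration of the shape of such a test (using zlib's
compress() rather than the in-kernel deflate; the sample count and the
"low byte of the delta" choice are arbitrary assumptions, compile with
-lz):

#include <stdio.h>
#include <time.h>
#include <zlib.h>

#define NSAMPLES 4096

int main(void)
{
        static unsigned char deltas[NSAMPLES];
        unsigned char out[2 * NSAMPLES];
        uLongf outlen = sizeof(out);
        struct timespec a, b;
        int i;

        for (i = 0; i < NSAMPLES; i++) {
                clock_gettime(CLOCK_MONOTONIC, &a);
                clock_gettime(CLOCK_MONOTONIC, &b);
                deltas[i] = (unsigned char)(b.tv_nsec - a.tv_nsec);
        }

        if (compress(out, &outlen, deltas, NSAMPLES) != Z_OK)
                return 1;

        /* If a few thousand raw deltas compress down to almost nothing,
         * they clearly carry very little entropy. */
        printf("%d raw bytes -> %lu compressed (ratio %.2f)\n",
               NSAMPLES, (unsigned long)outlen, (double)outlen / NSAMPLES);
        return 0;
}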
--
Alexander E. Patrakov
On Thu, Sep 19, 2019 at 08:20:57AM -0700, Linus Torvalds wrote:
> And unlike your theoretical state extension attack, I can point you to
> black hat presentations that literally talk about using the fact that
> we delay mixing in the input pool hash to know what's going on:
>
> https://www.blackhat.com/docs/eu-14/materials/eu-14-Kedmi-Attacking-The-Linux-PRNG-On-Android-Weaknesses-In-Seeding-Of-Entropic-Pools-And-Low-Boot-Time-Entropy.pdf
>
> That's a real attack. Based on the REAL fact that we currently have to
> use the urandom logic because the entropy-waiting one is useless, and
> in fact depends on the re-seeding happening too late.
Actually, that particular case proves my point.
That particular attack was against Android 4.3 (Android KitKat).
In the 3.4 kernel used by KitKat, before the urandom pool is
considered initialized, 100% of the entropy from
add_interrupt_randomness() goes to the urandom pool, NOT the input
pool. add_device_entropy() also fed the urandom pool. And on an
Android device, it doesn't have a keyboard, mouse, or spinning HDD, so
add_timer_randomness() and add_disk_randomness() weren't a factor.
The real problem was that the Android zygote process sampled the
urandom pool too early, and what the attack essentially did was try
to determine the state of the pool by looking at that sampled output
of /dev/urandom.
If we make getrandom(0) work like /dev/urandom, it doesn't solve the
problem, because if you read from the entropy pool before we can get
high quality randomness, you're screwed. The only real answers are
(a) try to get better entropy early, or (b) get userspace to wait
until it's safe to read from /dev/urandom.
Long-term, (a) is the only real way to solve the problem, and whether
you trust the bootloader, or trust the built-in hardware random number
generator (whether it's RDRAND, or some secure element in the device,
etc), we can't control userspace. We can try to enforce userspace to
be safe by blocking, but that makes people unhappy. We can certainly
try to influence userspace by annoying them with WARN() stack traces
in the logs, and hope they pay attention, but that's not guaranteed.
> But honestly, this isn't realistic. I can point to emails where *you*
> are arguing against other hashing algorithms because the whole state
> extension attack simply isn't realistic.
The blackhat presentation which you pointed at *was* actually a state
extension attack. When I argued against state extension attacks, that
was in cases where people worried about recovery after the pool is
exposed --- and my argument was if you can read from kernel memory
enough to grab the pool state, you have other problems. Your
observation that if you can install malware that runs at system
initscript/userspace bootup time, you probably have other problems, is
a similar argument, and it's a fair one. But it *has* happened, as
the blackhat paper demonstrates.
My thinking at the time was that if people were reading from the CRNG
before it was initialized (which could only happen via /dev/urandom),
that was kind of a disaster anyway, so resetting the initialization
count would at least get us to the point where, when the CRNG *was*
declared to be initialized, we could state with high confidence that
we were in a secure state.
> > I'm happy this proposal is not changing the behavior of getrandom(0).
> > Why not just remap 0 to GRND_EXPLICIT | GRND_WAIT_ENTROPY, though? It
> > will have the same effect, and it makes it clear what we're doing.
>
> Have you not followed the whole discussion? Didn't you read the comment?
>
> People use "getrandom(0)" not because they want secure randomness, but
> because that's the default.
>
> And we *will* do something about it. This patch didn't, because I want
> to be able to backport it to stable, so that everybody is happier with
> saying "ok, I'll use the new getrandom(GRND_INSECURE)".
>
> Because getrandom(0) will NOT be the same as GRND_EXPLICIT |
> GRND_WAIT_ENTROPY.
No, I did read the comment. And I agree that at the moment, yes,
it is ambiguous. What I really care about though, is the HUGE
DEPLOYED BASE which is using getrandom(0) *because* they are
generating cryptographic keys, and we will be changing things out from
under them.
We agree that we don't want to change things out from under the stable
users. I'm pleading that we not screw over existing userspace --- at
least not right away. Give them *time* to update their source
bases to use getrandom(GRND_SECURE). So what if we make getrandom(0)
print a ratelimited KERN_ERR deprecation notice saying that the program
should explicitly specify either GRND_INSECURE or GRND_SECURE, and not
change the current semantics of getrandom(0) for some period of time?
Say, a year. Or even six months.
If that's not good enough, what if we change getrandom(0) immediately,
but only for those platforms which have a functional
arch_get_random_long() or random_get_entropy()? That gets us the x86
platform, which is where pretty much all of the users who have
complained have been coming from. For the IOT/embedded user cases,
blocking is actually a feature, because the problem will be caught
while the product is in development, when the userspace code can be
fixed.
- Ted
On Thu, Sep 19, 2019 at 08:50:15AM -0700, Linus Torvalds wrote:
> .. btw, instead of bad workarounds for a theoretical attack, here's
> something that should add actual *practical* real value: use the time
> of day (whether from an RTC device, or from ntp) to add noise to the
> random pool.
Actually, we used to seed the pool from the RTC device --- that was the
case in the 3.4 kernel referenced by the Blackhat attack, and it
didn't stop the researchers. In later kernels, we moved up when
rand_initialize() got called to before time_init(), so
init_std_data() was no longer seeding the pool from the RTC clock.
That being said, adding calls to add_device_randomness() to
do_settimeofday64() and timekeeping_inject_offset() is an obviously
good thing to do. I'll prepare a separate patch for the random.git
tree to do that.
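As a hedged sketch of the idea (not the actual patch; the helper name
below is made up, but add_device_randomness() is the existing
uncredited-input interface):

#include <linux/random.h>
#include <linux/time64.h>

/* Called from do_settimeofday64() / timekeeping_inject_offset() once
 * the new time has been validated.  The input is not credited as
 * entropy; it only perturbs the pool so that otherwise identical boots
 * diverge as soon as the clock is set. */
static inline void mix_settime_into_pool(const struct timespec64 *ts)
{
        add_device_randomness(ts, sizeof(*ts));
}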
- Ted
On Fri, Sep 20, 2019 at 03:23:58AM +0500, Alexander E. Patrakov wrote:
> OTOH, I thought that at least part of the real entropy, if it exists, comes
> from the interference of the CPU's memory accesses with the refresh cycles
> that are clocked from an independent oscillator.
That's not a valid assumption; on *many* systems, there is only a
single master oscillator. It saves on power, parts cost, reduces the
amount of RF interference, etc.
- Ted
Hi,
On Wed, Sep 18, 2019 at 04:57:58PM -0700, Linus Torvalds wrote:
> On Wed, Sep 18, 2019 at 2:17 PM Ahmed S. Darwish <[email protected]> wrote:
> >
> > Since Linux v3.17, getrandom(2) has been created as a new and more
> > secure interface for pseudorandom data requests. It attempted to
> > solve three problems, as compared to /dev/urandom:
>
> I don't think your patch is really _wrong_, but I think it's silly to
> introduce a new system call, when we have 30 bits left in the flags of
> the old one, and the old system call checked them.
>
> So it's much simpler and more straightforward to just introduce a
> single new bit #2 that says "I actually know what I'm doing, and I'm
> explicitly asking for secure/insecure random data".
>
> And then say that the existing bit #1 just means "I want to wait for entropy".
>
> So then you end up with this:
>
> /*
> * Flags for getrandom(2)
> *
> * GRND_NONBLOCK Don't block and return EAGAIN instead
> * GRND_WAIT_ENTROPY Explicitly wait for entropy
> * GRND_EXPLICIT Make it clear you know what you are doing
> */
> #define GRND_NONBLOCK 0x0001
> #define GRND_WAIT_ENTROPY 0x0002
> #define GRND_EXPLICIT 0x0004
>
> #define GRND_SECURE (GRND_EXPLICIT | GRND_WAIT_ENTROPY)
> #define GRND_INSECURE (GRND_EXPLICIT | GRND_NONBLOCK)
>
> /* Nobody wants /dev/random behavior, nobody should use it */
> #define GRND_RANDOM 0x0002
>
> which is actually fairly easy to understand. So now we have three
> bits, and the values are:
>
> 000 - ambiguous "secure or just lazy/ignorant"
> 001 - -EAGAIN or secure
> 010 - blocking /dev/random DO NOT USE
> 011 - nonblocking /dev/random DO NOT USE
> 100 - nonsense, returns -EINVAL
> 101 - /dev/urandom without warnings
> 110 - blocking secure
> 111 - -EAGAIN or secure
>
Hmmm, the point of the new syscall was **exactly** to avoid the 2^3
combinations above, and to provide developers only two, sane and easy,
options:
- GRND2_INSECURE
- GRND2_SECURE_UNBOUNDED_INITIAL_WAIT
You *must* pick one of these, and that's it. (!)
Then the proposed getrandom_wait(7) manpage, also mentioned in the V4
patch WARN message, would provide a big rationale, and encourage
everyone to use the new getrandom2(2) syscall instead.
But yeah, maybe we should add the extra flags to the old getrandom()
instead, and let glibc implement a getrandom_safe(3) wrapper only
with the sane options available.
Problem is, glibc is still *really* slow in adopting linux syscall
wrappers, so I'm not optimistic about that...
I still see the new system call as the sanest path, even given
the cost of a new syscall number.
@Linus, @Ted: Final thoughts?
> and people would be encouraged to use one of these three:
>
> - GRND_INSECURE
> - GRND_SECURE
> - GRND_SECURE | GRND_NONBLOCK
>
> all of which actually make sense, and none of which have any
> ambiguity. And while "GRND_INSECURE | GRND_NONBLOCK" works, it's
> exactly the same as just plain GRND_INSECURE - the point is that it
doesn't block for entropy anyway, so non-blocking makes no difference.
>
[...]
>
> There is *one* other small semantic change: The old code did
> urandom_read() which added warnings, but each warning also _reset_ the
> crng_init_cnt. Until it decided not to warn any more, at which point
> it also stops that resetting of crng_init_cnt.
>
> And that reset of crng_init_cnt, btw, is some cray cray.
>
> It's basically a "we used up entropy" thing, which is very
> questionable to begin with as the whole discussion has shown, but
> since it stops doing it after 10 cases, it's not even good security
> assuming the "use up entropy" case makes sense in the first place.
>
> So I didn't copy that insanity either. And I'm wondering if removing
> it from /dev/urandom might also end up helping Ahmed's case of getting
> entropy earlier, when we don't reset the counter.
>
Yeah, noticed that, but I've learned not to change crypto or
speculative-execution code even if the changes "just look the same" at
first glance ;-)
(out of curiosity, I'll do a quick test with this CRNG entropy reset
part removed. Maybe it was indeed part of the problem..)
> But other than those two details, none of the existing semantics
> changed, we just added the three actually _sane_ cases without any
> ambiguity.
>
> In particular, this still leaves the semantics of that nasty
> "getrandom(0)" as the same "blocking urandom" that it currently is.
> But now it's a separate case, and we can make that perhaps do the
> timeout, or at least the warning.
>
Yeah, I would propose to keep the V4-submitted "timeout then WARN"
logic. This alone will give user-space / distributions time to adapt.
For example, it was interesting that even the 0day bot had limited
entropy on boot (virtio-rng / TRUST_CPU not enabled):
https://lkml.kernel.org/r/20190920005120.GP15734@shao2-debian
If user-space didn't get its act together, then the other extreme
measures can be implemented later (the getrandom() length test, using
jitter as a credited kernel entropy source, etc., etc.)
> And the new cases are defined to *not* warn. In particular,
> GRND_INSECURE very much does *not* warn about early urandom access
> when crng isn't ready. Because the whole point of that new mode is
> that the user knows it isn't secure.
>
> So that should make getrandom(GRND_INSECURE) palatable to the systemd
> kind of use that wanted to avoid the pointless kernel warning.
>
Yup, that's what was in the submitted V4 patch too. The caller
explicitly asked for "insecure", so they know what they're doing.
getrandom2(2) never prints any kernel message.
> And we could mark this for stable and try to get it backported so that
> it will have better coverage, and encourage people to use the new sane
> _explicit_ waiting (or not) for entropy.
>
ACK. I'll wait for an answer to the "Final thoughts?" question above,
send a V5 with CC:stable, then disappear from this thread ;-)
Thanks a lot everyone!
--
Ahmed Darwish
On Fri, Sep 20, 2019 at 6:46 AM Ahmed S. Darwish <[email protected]> wrote:
>
> Hi,
>
> On Wed, Sep 18, 2019 at 04:57:58PM -0700, Linus Torvalds wrote:
> > On Wed, Sep 18, 2019 at 2:17 PM Ahmed S. Darwish <[email protected]> wrote:
> > >
> > > Since Linux v3.17, getrandom(2) has been created as a new and more
> > > secure interface for pseudorandom data requests. It attempted to
> > > solve three problems, as compared to /dev/urandom:
> >
> > I don't think your patch is really _wrong_, but I think it's silly to
> > introduce a new system call, when we have 30 bits left in the flags of
> > the old one, and the old system call checked them.
> >
> > So it's much simpler and more straightforward to just introduce a
> > single new bit #2 that says "I actually know what I'm doing, and I'm
> > explicitly asking for secure/insecure random data".
> >
> > And then say that the existing bit #1 just means "I want to wait for entropy".
> >
> > So then you end up with this:
> >
> > /*
> > * Flags for getrandom(2)
> > *
> > * GRND_NONBLOCK Don't block and return EAGAIN instead
> > * GRND_WAIT_ENTROPY Explicitly wait for entropy
> > * GRND_EXPLICIT Make it clear you know what you are doing
> > */
> > #define GRND_NONBLOCK 0x0001
> > #define GRND_WAIT_ENTROPY 0x0002
> > #define GRND_EXPLICIT 0x0004
What is this GRND_EXPLICIT thing?
A few weeks ago, I sent a whole series to address this, and I
obviously didn't cc enough people. I'll resend a rebased version
today. Meanwhile, some comments on this whole mess:
As I think everyone mostly agrees in this whole thread, getrandom()
can't just magically start returning non-random results. That would
be a big problem.
Linus, I disagree that blocking while waiting for randomness is an
error. Sometimes you want to generate a key, you want to finish as
quickly as possible, and you don't want to be in the business of
fiddling with the setup of the kernel RNG. I would argue that *most*
crypto applications are in this category. I think that the kernel
should, instead, handle this mess itself. As a first pass, it could
be as simple as noticing that someone is blocking on randomness and
kicking off a thread that does some randomish reads from the rootfs.
This would roughly simulate the old behavior in which an ext4 rootfs
did more IO than necessary. A fancier version would, as discussed in
this thread, do more clever things.
(As an aside, I am not a fan of xoring or adding stuff to the CRNG
state. We should just use an actual crypto primitive for this.
Accumulate the state in a buffer and SHA-512 it. Or use something
like the Keccak duplex sponge. But this is a discussion for another
day.)
So I'm going to resend my series. You can all fight over whether the
patch that actually goes in should be based on my series or based on
this patch.
--Andy
On Fri, Sep 20, 2019 at 7:34 AM Andy Lutomirski <[email protected]> wrote:
>
> What is this GRND_EXPLICIT thing?
Your own email gives the explanation:
> Linus, I disagree that blocking while waiting for randomness is an
> error. Sometimes you want to generate a key
That's *exactly* why GRND_EXPLICIT needs to be done regardless.
The keyword there is "Sometimes".
But people currently use "getrandom(0)" when they DO NOT want a key,
they just want some miscellaneous random numbers for some totally
non-security-related reason.
And that will continue. Exactly because the people who do not want a
key by definition aren't thinking about it very hard.
So the interface was very much mis-designed from the get-go. It was
designed purely for key people, even though generating keys is by no
means the most common reason for wanting a block of "random" numbers.
So GRND_EXPLICIT is there very much to make sure people who want true
secure keys will say so, and five years from now we will not have the
"Oh, I wasn't thinking about bootup" confusion. Because at a
minimum, in the near future getrandom(0) will warn about the
ambiguity. Or it will use some questionable jitter entropy that some
real key users will look at sideways and go "I don't want that".
This is an ABI design issue. The old ABI was fundamentally misdesigned
and actively encouraged the current situation of mixing secure and
insecure callers for that getrandom(0).
And it's entirely orthogonal to _any_ actual technical change we will
do (like removing the old GRND_RANDOM behavior entirely, which is
insane for other reasons and nobody ever wanted or likely used).
Linus
Hi Ahmed,
On Fri, Sep 20, 2019 at 03:46:09PM +0200, Ahmed S. Darwish wrote:
> Problem is, glibc is still *really* slow in adopting linux syscall
> wrappers, so I'm not optimistic about that...
>
> I still see the new system call as the sanest path, even provided
> the cost of a new syscall number..
New syscalls are always a pain to deal with in userland, because when
they are introduced, everyone wants them long before they're available
in glibc. So userland has to define NR_xxx for each supported arch and
to perform the call itself.
With flags adoption is instantaneous. Just #ifndef/#define, check if
the flag is supported and that's done. The only valid reason for a new
syscall is when the API changes (e.g. one extra arg, a la accept4()),
which doesn't seem to be the case here. Otherwise please by all means
avoid this in general.
Thanks,
Willy
On Fri, Sep 20, 2019 at 9:30 AM Linus Torvalds
<[email protected]> wrote:
>
> On Fri, Sep 20, 2019 at 7:34 AM Andy Lutomirski <[email protected]> wrote:
> >
> > What is this GRND_EXPLICIT thing?
>
> Your own email gives the explanation:
>
> > Linus, I disagree that blocking while waiting for randomness is an
> > error. Sometimes you want to generate a key
>
> That's *exactly* why GRND_EXPLICIT needs to be done regardless.
>
> The keyword there is "Sometimes".
>
> But people currently use "getrandom(0)" when they DO NOT want a key,
> they just want some miscellaneous random numbers for some totally
> non-security-related reason.
>
> And that will continue. Exactly because the people who do not want a
> key by definition aren't thinking about it very hard.
I fully agree that this is a problem. It's a problem we brought on
ourselves because we screwed up the ABI from the beginning. The
question is what to do about it that doesn't cause its own set of
nasty problems.
> So GRND_EXPLICIT is there very much to make sure people who want true
> secure keys will say so, and five years from now we will not have the
> confusion between "Oh, I wasn't thinking about bootup". Because at a
> minimum, in the near future getrandom(0) will warn about the
> ambiguity. Or it will use some questionable jitter entropy that some
> real key users will look at sideways and go "I don't want that".
There are programs that call getrandom(0) *today* that expect secure
output. openssl does a horrible dance in which it calls getentropy()
if available and falls back to syscall(__NR_getrandom, buf, buflen, 0)
otherwise. We can't break this use case. Changing the semantics of
getrandom(0) out from under them seems like the worst kind of ABI
break -- existing applications will *appear* to continue working but
will, in fact, become insecure.
IMO, from the beginning, we should have done this:
GRND_INSECURE: insecure. always works.
GRND_SECURE_BLOCKING: does exactly what it says.
0: -EINVAL.
Using it correctly would be obvious. Something like GRND_EXPLICIT
would be a head-scratcher: people would have to look at the man page
and actually think about it, and it's still easy to get wrong:
getrandom(..., GRND_EXPLICIT): just fscking give me a number. it
seems to work and it shuts up the warning
And we're back to square one.
I think that, given existing software, we should make two or three
changes to fix the basic problems here:
1. Add GRND_INSECURE: at least let new applications do the right thing
going forward.
2. Fix what is arguably a straight up kernel bug, not even an ABI
issue: when a user program is blocking in getrandom(..., 0), the
kernel happily sits there doing absolutely nothing and deadlocks the
system as a result. This IMO isn't an ABI issue -- it's an
implementation problem. How about we make getrandom() (probably
actually wait_for_random_bytes()) do something useful to try to seed
the RNG if the system is otherwise not doing IO.
3. Optionally, entirely in user code: Get glibc to add new *library*
functions: getentropy_secure_blocking() and getentropy_insecure() or
whatever they want to call them. Deprecate getentropy().
I think #2 is critical. Right now, suppose someone has a system that
needs to do a secure network request (a la Red Hat's Clevis). I have
no idea what Clevis actually does, but it wouldn't be particularly
crazy to do a DH exchange or sign with an EC key to ask some network
server to help unlock a dm-crypt volume. If the system does this at
boot, it needs to use getrandom(..., 0), GRND_EXPLICIT, or whatever,
because it NEEDS a secure random number. No amount of ABI fiddling
will change this. The kernel should *work* in this case rather than
deadlocking.
--Andy
On Fri, Sep 20, 2019 at 07:26:09PM +0200, Willy Tarreau wrote:
> Hi Ahmed,
>
> On Fri, Sep 20, 2019 at 03:46:09PM +0200, Ahmed S. Darwish wrote:
> > Problem is, glibc is still *really* slow in adopting linux syscall
> > wrappers, so I'm not optimistic about that...
> >
> > I still see the new system call as the sanest path, even provided
> > the cost of a new syscall number..
>
> New syscalls are always a pain to deal with in userland, because when
> they are introduced, everyone wants them long before they're available
> in glibc. So userland has to define NR_xxx for each supported arch and
> to perform the call itself.
>
> With flags adoption is instantaneous. Just #ifndef/#define, check if
> the flag is supported and that's done. The only valid reason for a new
> syscall is when the API changes (e.g. one extra arg, a la accept4()),
> which doesn't seem to be the case here. Otherwise please by all means
> avoid this in general.
>
I see. Thanks a lot for the explanation above :)
--
Ahmed Darwish
On Fri, Sep 20, 2019 at 10:52 AM Andy Lutomirski <[email protected]> wrote:
>
> IMO, from the beginning, we should have done this:
>
> GRND_INSECURE: insecure. always works.
>
> GRND_SECURE_BLOCKING: does exactly what it says.
>
> 0: -EINVAL.
Violently agreed. And that's kind of what the GRND_EXPLICIT is really
aiming for.
However, it's worth noting that nobody should ever use GRND_EXPLICIT
directly. That's just the name for the bit. The actual users would use
GRND_INSECURE or GRND_SECURE.
And yes, maybe it's worth making the name be GRND_SECURE_BLOCKING just
to make people see what the big deal is.
In the meantime, we need that new bit just to be able to create the
new semantics eventually. With a warning to nudge people in the right
direction.
We may never be able to return -EINVAL, but we can add the pr_notice()
to discourage people from using it.
And yes, we'll have to block - at least for a time - to get some
entropy. But at some point we either start making entropy up, or we
say "0 means jitter-entropy for ten seconds".
That will _work_, but it will also make the security-people nervous,
which is just one more hint that they should move to
GRND_SECURE[_BLOCKING].
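A hedged sketch of what such a bounded wait for the legacy getrandom(0)
path could look like (not actual kernel code; crng_ready() and
crng_init_wait are existing names in drivers/char/random.c of that era,
the rest is made up for illustration):

static int wait_for_random_bytes_bounded(void)
{
        long left = 10 * HZ;    /* "ten seconds", per the discussion */

        if (likely(crng_ready()))
                return 0;

        left = wait_event_interruptible_timeout(crng_init_wait,
                                                crng_ready(), left);
        if (left > 0)
                return 0;       /* pool became ready in time */
        if (left < 0)
                return left;    /* interrupted by a signal */

        /* Timed out: nudge the caller toward GRND_SECURE/GRND_INSECURE
         * and fall through (or to jitter-style sampling) instead of
         * deadlocking the boot. */
        pr_notice_once("random: getrandom(0) timed out waiting for entropy\n");
        return 0;
}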
> getrandom(..., GRND_EXPLICIT): just fscking give me a number. it
> seems to work and it shuts up the warning
>
> And we're back to square one.
Actually, you didn't read the GRND_INSECURE patch, did you.
getrandom(GRND_EXPLICIT) on its own returns -EINVAL.
Because yes, I thought about it, and yes, I agree that it's the same
as the old 0.
So GRND_EXPLICIT is a bit that basically means "I am explicit about
what behavior I want". But part of that is that you need to _state_
the behavior too.
So:
- GRND_INSECURE is (GRND_EXPLICIT | GRND_NONBLOCK)
As in "I explicitly ask you not to just not ever block": urandom
- GRND_SECURE_BLOCKING is (GRND_EXPLICIT | GRND_RANDOM)
As in "I explicitly ask you for those secure random numbers"
- GRND_SECURE_NONBLOCKING is (GRND_EXPLICIT | GRND_RANDOM | GRND_NONBLOCK)
As in "I want explicitly secure random numbers, but return -EAGAIN
if that would block".
Which are the three sane behaviors (that last one is useful for the "I
can try to generate entropy if you don't have any" case. I'm not sure
anybody will do it, but it definitely conceptually makes sense).
And I agree that your naming is better.
I had it as just "GRND_SECURE" for the blocking version, and
"GRND_SECURE | GRND_NONBLOCK" for the "secure but return EAGAIN if you
would need to block for entropy" version.
But explicitly stating the blockingness in the name makes it clearer
to the people who just want GRND_INSECURE, and makes them realize that
they don't want the blocking version.
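To illustrate how a program might use those bits once they exist
(hypothetical: the GRND_EXPLICIT and GRND_SECURE_BLOCKING values below
are only the ones proposed in this thread, not anything in a released
uapi header):

#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#define GRND_NONBLOCK           0x0001
#define GRND_RANDOM             0x0002
#define GRND_EXPLICIT           0x0004  /* proposed in this thread */

#define GRND_INSECURE           (GRND_EXPLICIT | GRND_NONBLOCK)
#define GRND_SECURE_BLOCKING    (GRND_EXPLICIT | GRND_RANDOM)

int main(void)
{
        unsigned char key[32];
        unsigned long seed;
        long n;

        /* Key material: explicitly ask for secure bytes, accept blocking. */
        n = syscall(SYS_getrandom, key, sizeof(key), GRND_SECURE_BLOCKING);
        if (n < 0 && errno == EINVAL) {
                /* Old kernel that rejects the new bit: the fallback-to-0
                 * pattern discussed elsewhere in this thread. */
                n = syscall(SYS_getrandom, key, sizeof(key), 0);
        }
        printf("got %ld key bytes\n", n);

        /* Hash-table seed: explicitly insecure, never blocks, never warns. */
        syscall(SYS_getrandom, &seed, sizeof(seed), GRND_INSECURE);
        return 0;
}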
Linus
Hi Andy,
On Fri, Sep 20, 2019 at 10:52:30AM -0700, Andy Lutomirski wrote:
> 2. Fix what is arguably a straight up kernel bug, not even an ABI
> issue: when a user program is blocking in getrandom(..., 0), the
> kernel happily sits there doing absolutely nothing and deadlocks the
> system as a result. This IMO isn't an ABI issue -- it's an
> implementation problem. How about we make getrandom() (probably
> actually wait_for_random_bytes()) do something useful to try to seed
> the RNG if the system is otherwise not doing IO.
I thought about it as well with my old MSDOS reflexes, but here I
doubt we can do a lot. It seems fishy to me to start fiddling with
various drivers from within a getrandom() syscall; we could sometimes
end up waiting even longer because one device is already locked,
and once we do get access there's not much we can do without
risking some harm. On desktop systems you have a bit more
choice than on headless systems (blink keyboard leds and time the
interrupts, run some disk accesses when there's still a disk, get a
copy of the last buffer of the audio input and/or output, turn on
the microphone and/or webcam, and collect some data). Many of them
cannot always be used. We could do some more portable stuff like scan
and hash the totality of the RAM. But that's all quite bad and
unreliable and at this point it's better to tell userland "here's
what I could get for you, if you want better, do it yourself" and the
userland can then ask the user "dear user, I really need valid entropy
this time to generate your GPG key, please type frantically on this
keyboard". And it will be more reliable this way in my opinion.
My analysis of the problem is precisely that we've always considered
that the kernel had to provide random numbers for any use case and had
to cover the most difficult cases, imposing their constraints on the
simplest ones. Better to let the application decide.
Willy
On Fri, Sep 20, 2019 at 11:09:53AM -0700, Linus Torvalds wrote:
(...)
> So:
>
> - GRND_INSECURE is (GRND_EXPLICIT | GRND_NONBLOCK)
>
> As in "I explicitly ask you not to just not ever block": urandom
>
> - GRND_SECURE_BLOCKING is (GRND_EXPLICIT | GRND_RANDOM)
>
> As in "I explicitly ask you for those secure random numbers"
>
> - GRND_SECURE_NONBLOCKING is (GRND_EXPLICIT | GRND_RANDOM | GRND_NONBLOCK)
>
> As in "I want explicitly secure random numbers, but return -EAGAIN
> if that would block".
>
> Which are the three sane behaviors (that last one is useful for the "I
> can try to generate entropy if you don't have any" case. I'm not sure
> anybody will do it, but it definitely conceptually makes sense).
>
> And I agree that your naming is better.
>
> I had it as just "GRND_SECURE" for the blocking version, and
> "GRND_SECURE | GRND_NONBLOCK" for the "secure but return EAGAIN if you
> would need to block for entropy" version.
>
> But explicitly stating the blockingness in the name makes it clearer
> to the people who just want GRND_INSECURE, and makes them realize that
> they don't want the blocking version.
I really like it this way. Explicit and full control for the application
plus reasonable backwards compatibility, it sounds pretty good.
Willy
20.09.2019 22:52, Andy Lutomirski wrote:
> I think that, given existing software, we should make two or three
> changes to fix the basic problems here:
>
> 1. Add GRND_INSECURE: at least let new applications do the right thing
> going forward.
>
> 2. Fix what is arguably a straight up kernel bug, not even an ABI
> issue: when a user program is blocking in getrandom(..., 0), the
> kernel happily sits there doing absolutely nothing and deadlocks the
> system as a result. This IMO isn't an ABI issue -- it's an
> implementation problem. How about we make getrandom() (probably
> actually wait_for_random_bytes()) do something useful to try to seed
> the RNG if the system is otherwise not doing IO.
>
> 3. Optionally, entirely in user code: Get glibc to add new *library*
> functions: getentropy_secure_blocking() and getentropy_insecure() or
> whatever they want to call them. Deprecate getentropy().
>
> I think #2 is critical. Right now, suppose someone has a system that
> needs to do a secure network request (a la Red Hat's Clevis). I have
> no idea what Clevis actually does, but it wouldn't be particularly
> crazy to do a DH exchange or sign with an EC key to ask some network
> server to help unlock a dm-crypt volume. If the system does this at
> boot, it needs to use getrandom(..., 0), GRND_EXPLICIT, or whatever,
> because it NEEDS a secure random number. No amount of ABI fiddling
> will change this. The kernel should *work* in this case rather than
> deadlocking.
Let me express a little bit of disagreement with the logic here.
I do agree that #2 is critical, and the Clevis use case is a perfect
example why it is important. I doubt that it is solvable without
trusting jitter entropy, or without provoking a dummy read on a random
block device, just for timings, or maybe some other interaction with the
external world - but Willy already said "it seems fishy". However, _if_
it is solved, then we don't need GRND_INSECURE, because solving #2 is
equivalent to magically making secure random numbers always available.
--
Alexander E. Patrakov
> On Sep 20, 2019, at 11:15 AM, Alexander E. Patrakov <[email protected]> wrote:
>
> 20.09.2019 22:52, Andy Lutomirski wrote:
>> I think that, given existing software, we should make two or three
>> changes to fix the basic problems here:
>> 1. Add GRND_INSECURE: at least let new applications do the right thing
>> going forward.
>> 2. Fix what is arguably a straight up kernel bug, not even an ABI
>> issue: when a user program is blocking in getrandom(..., 0), the
>> kernel happily sits there doing absolutely nothing and deadlocks the
>> system as a result. This IMO isn't an ABI issue -- it's an
>> implementation problem. How about we make getrandom() (probably
>> actually wait_for_random_bytes()) do something useful to try to seed
>> the RNG if the system is otherwise not doing IO.
>> 3. Optionally, entirely in user code: Get glibc to add new *library*
>> functions: getentropy_secure_blocking() and getentropy_insecure() or
>> whatever they want to call them. Deprecate getentropy().
>> I think #2 is critical. Right now, suppose someone has a system that
>> needs to do a secure network request (a la Red Hat's Clevis). I have
>> no idea what Clevis actually does, but it wouldn't be particularly
>> crazy to do a DH exchange or sign with an EC key to ask some network
>> server to help unlock a dm-crypt volume. If the system does this at
>> boot, it needs to use getrandom(..., 0), GRND_EXPLICIT, or whatever,
>> because it NEEDS a secure random number. No amount of ABI fiddling
>> will change this. The kernel should *work* in this case rather than
>> deadlocking.
>
> Let me express a little bit of disagreement with the logic here.
>
> I do agree that #2 is critical, and the Clevis use case is a perfect example why it is important. I doubt that it is solvable without trusting jitter entropy, or without provoking a dummy read on a random block device, just for timings, or maybe some other interaction with the external world - but Willy already said "it seems fishy". However, _if_ it is solved, then we don't need GRND_INSECURE, because solving #2 is equivalent to magically making secure random numbers always available.
>
>
I beg to differ. There is a big difference between “do your best *right now*” and “give me a real secure result in a vaguely timely manner”.
For example, the former is useful for ASLR or hash table randomization. The latter is not.
> On Sep 20, 2019, at 11:10 AM, Linus Torvalds <[email protected]> wrote:
>
> On Fri, Sep 20, 2019 at 10:52 AM Andy Lutomirski <[email protected]> wrote:
>>
>> IMO, from the beginning, we should have done this:
>>
>> GRND_INSECURE: insecure. always works.
>>
>> GRND_SECURE_BLOCKING: does exactly what it says.
>>
>> 0: -EINVAL.
>
> Violently agreed. And that's kind of what the GRND_EXPLICIT is really
> aiming for.
>
> However, it's worth noting that nobody should ever use GRND_EXPLICIT
> directly. That's just the name for the bit. The actual users would use
> GRND_INSECURE or GRND_SECURE.
>
> And yes, maybe it's worth making the name be GRND_SECURE_BLOCKING just
> to make people see what the big deal is.
>
> In the meantime, we need that new bit just to be able to create the
> new semantics eventually. With a warning to nudge people in the right
> direction.
>
> We may never be able to return -EINVAL, but we can add the pr_notice()
> to discourage people from using it.
>
The problem is that new programs will have to try the new flag value
and, if it returns -EINVAL, fall back to 0. This isn't so great.
> And yes, we'll have to block - at least for a time - to get some
> entropy. But at some point we either start making entropy up, or we
> say "0 means jitter-entropy for ten seconds".
>
> That will _work_, but it will also make the security-people nervous,
> which is just one more hint that they should move to
> GRND_SECURE[_BLOCKING].
Wait, are you suggesting that 0 means invoke jitter-entropy or
whatever and GRND_SECURE_BLOCKING means not wait forever and deadlock?
That's no good -- people will want to continue using 0 because the
behavior is better. My point here is that asking for secure random
numbers isn’t some legacy oddity — it’s genuinely necessary. The
kernel should do whatever it needs to in order to make it work. We
really don’t want a situation where 0 means get me secure random
numbers reliably but spam the logs and GRND_SECURE_BLOCKING means
don’t spam the logs but risk deadlocking. This will encourage people
to pass 0 to get the improved behavior.
> So GRND_EXPLICIT is a bit that basically means "I am explicit about
> what behavior I want". But part of that is that you need to _state_
> the behavior too.
>
> So:
>
> - GRND_INSECURE is (GRND_EXPLICIT | GRND_NONBLOCK)
>
> As in "I explicitly ask you not to just not ever block": urandom
IMO this is confusing. The GRND_RANDOM flag was IMO a mistake and
should just be retired. Let's enumerate useful cases and then give
them sane values.
>
> - GRND_SECURE_BLOCKING is (GRND_EXPLICIT | GRND_RANDOM)
>
> As in "I explicitly ask you for those secure random numbers"
>
> - GRND_SECURE_NONBLOCKING is (GRND_EXPLICIT | GRND_RANDOM | GRND_NONBLOCK)
>
> As in "I want explicitly secure random numbers, but return -EAGAIN
> if that would block".
>
> Which are the three sane behaviors (that last one is useful for the "I
> can try to generate entropy if you don't have any" case. I'm not sure
> anybody will do it, but it definitely conceptually makes sense).
>
> And I agree that your naming is better.
I think this is the complete list of "good" behaviors for new programs:
"insecure": always works, never warns.
"secure, blocking": always returns *eventually* with secure output,
i.e., does something to avoid deadlocks
"secure, nonblocking" returns secure output immediately or returns -EAGAIN.
And the only real question is how to map existing users to these
semantics. I see two sensible choices:
1. 0 means "secure, blocking". I think this is not what we'd do if we
could go back in time and chage the ABI from day 1, but I think it's
actually good enough. As long as this mode won't deadlock, it's not
*that* bad if programs are using it when they wanted "insecure".
2. 0 means "secure, blocking, but warn". Some new value means
"secure, blocking, don't warn". The problem is that new applications
will have to fall back to 0 to continue supporting old kernels.
I briefly thought that maybe GRND_RANDOM would be a reasonable choice
for "secure, blocking, don't warn", but the effect on new programs on
old kernels will be unfortunate.
I'm willing to go along with #2 if you like it better than #1, and
I'll update my patches accordingly, but I prefer #1.
I do think we should make all the ABI changes that we want to make all
in one release. Let's not make programs think about their behavior on
more versions than necessary. So I'd like to get rid of the current
/dev/random semantics, add "insecure" mode, and do whatever deadlock
avoidance scheme we settle on in a single release.
--Andy
On Fri, Sep 20, 2019 at 12:22:17PM -0700, Andy Lutomirski wrote:
> Perhaps userland could register a helper that takes over and does
> something better?
If userland sees the failure it can do whatever the developer/distro
packager thought suitable for the system facing this condition.
> But I think the kernel really should do something
> vaguely reasonable all by itself.
Definitely, that's what Linus' proposal was doing. Sleeping for some time
is what I call "vaguely reasonable".
> If nothing else, we want the ext4
> patch that provoked this whole discussion to be applied,
Oh absolutely!
> which means
> that we need to unbreak userspace somehow, and returning garbage to it
> is not a good choice.
It depends how it's used. I'd claim that we certainly use random
numbers for other things (such as ASLR/hashtables) *before* using them
to generate long-lived keys, thus we have a bit more time to get some
more entropy before reaching the point of producing these keys.
> Here are some possible approaches that come to mind:
>
> int count;
> while (crng isn't inited) {
> msleep(1);
> }
>
> and modify add_timer_randomness() to at least credit a tiny bit to
> crng_init_cnt.
Without a timeout we're sure to still face some situations where
it blocks forever, which is the current problem.
> Or we do something like intentionally triggering readahead on some
> offset on the root block device.
You don't necessarily have such a device, especially when you're
in an initramfs. It's precisely where userland can be smarter. When
the caller is sfdisk for example, it does have more chances to try
to perform I/O than when it's a tiny http server starting to present
a configuration page.
> We should definitely not trigger *blocking* IO.
I think I agree.
> Also, I wonder if the real problem preventing the RNG from starting up
> is that the crng_init_cnt threshold is too high. We have a rather
> baroque accounting system, and it seems like we can accumulate and
> credit entropy for a very long time indeed without actually
> considering ourselves done.
I have no opinion on this, lacking the skills to evaluate the situation.
What I can say for sure is that I've faced the non-booting issue quite a
number of times on headless systems, and conversely in the 2.4 era, my
front reverse-proxy by then had the same SSH key as 89 other machines on
the net. So there's surely a sweet spot to find between those two extremes.
I tend to think that waiting *a little bit* for the *first* random number
is acceptable, even 10-15s; by the time the user starts to think about
pressing the reset button, the system might have finished booting. Hashing
some RAM locations and the RTC when present can also help a little bit. If
my machine back then had at least combined the RTC's date and time with
the hash, the chances of a key collision would have gone down to one in
many thousands.
Willy
On Fri, Sep 20, 2019 at 12:12 PM Andy Lutomirski <[email protected]> wrote:
>
> The problem is that new programs will have to try the new flag value
> and, if it returns -EINVAL, fall back to 0. This isn't so great.
Don't be silly.
Of course they will do that, but so what? With a new kernel, they'll
get the behavior they expect. And with an old kernel, they'll get the
behavior they expect.
They'd never fall back to "0 means something I didn't want",
exactly because we'd make this new flag be the first change.
> Wait, are you suggesting that 0 means invoke jitter-entropy or
> whatever and GRND_SECURE_BLOCKING means not wait forever and deadlock?
> That's no good -- people will want to continue using 0 because the
> behavior is better.
I assume that "not wait forever" was meant to be "wait forever".
So the one thing we have to do is break the "0 waits forever". I
guarantee that will happen. I will override Ted if he just NAKs it,
because we simply _cannot_ continue with it.
So we absolutely _will_ come up with some way 0 ends the wait. Whether
it's _just_ a timeout, or whether it's jitter-entropy or whatever, it
will happen.
But we'll also make getrandom(0) do the annoying warning, because it's
just ambiguous. And I suspect you'll find that a lot of security
people don't really like jitter-entropy, at least not in whatever
cut-down format we'll likely have to use in the kernel.
And we'll also have to make getrandom(0) be really _timely_. Security
people would likely rather wait for minutes before they are happy with
it. But because it's a boot constraint as things are now, it will not
just be jitter-entropy, it will be _accelerated_ jitter-entropy in 15
seconds or whatever, and since it can't use up all of CPU time, it's
realistically more like "15 second timeout, but less of actual CPU
time for jitter".
We can try to be clever with a background thread and a lot of
yielding(), so that if the CPU is actually idle we'll get most of that
15 seconds for whatever jitter, but end result is that it's still
accelerated.
Do I believe we can do a good job in that kind of timeframe?
Absolutely. The whole point should be that it's still "good enough",
and as has been pointed out, that same jitter entropy that people are
worried about is just done in user space right now instead.
But do I believe that security people would prefer a non-accelerated
GRND_SECURE_BLOCKING? Yes I do. That doesn't mean that
GRND_SECURE_BLOCKING shouldn't use jitter entropy too, but it doesn't
need the same kind of "let's hurry this up because it might be during
early boot and block things".
That said, if we can all convince everybody (hah!) that jitter entropy
in the kernel would be sufficient, then we can make the whole point
entirely moot, and just say "we'll just change crng_wait() to do
jitter entropy instead and be done with it". Then any getrandom() user
will just basically wait for a (very limited) time and the system will
be happy.
If that is the case we wouldn't need new flags at all. But I don't
think you can make everybody agree to that, which is why I suspect
we'll need the new flag, and I'll just take the heat for saying "0 is
now off limits, because it does this thing that a lot of people
dislike".
> IMO this is confusing. The GRND_RANDOM flag was IMO a mistake and
> should just be retired. Let's enumerate useful cases and then give
> them sane values.
That's basically what I'm doing. I enumerate the new values.
But the enumerations have hidden meaning, because the actual bits do
matter. The GRND_EXPLICIT bit isn't supposed to be used by any user,
but it has the value it has because it makes old kernels return
-EINVAL.
But if people hate the bit names, we can just do an enum and be done with it:
enum grnd_flags {
        GRND_NONBLOCK = 1,
        GRND_RANDOM,              // Don't use!
        GRND_RANDOM_NONBLOCK,     // Don't use
        GRND_UNUSED,
        GRND_INSECURE,
        GRND_SECURE_BLOCKING,
        GRND_SECURE_NONBLOCKING,
};
but the values now have a _hidden_ pattern (because we currently have
that "| GRND_NONBLOCK" pattern that I want to make sure still
continues to work, rather than give unexpected behavior in case
somebody continues to use it).
So the _only_ difference between the above and what I suggested is
that I made the bit pattern explicit rather than hidden in the value.
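Spelled out as C, the proposal above would look roughly like the sketch
below. The GRND_NONBLOCK and GRND_RANDOM values are today's real UAPI
bits; GRND_EXPLICIT and the combined names are only the proposal being
discussed in this thread, and the 0x0004 value is inferred from the enum
above (it is whatever unused bit makes old kernels return -EINVAL):

/* Existing UAPI bits. */
#define GRND_NONBLOCK            0x0001
#define GRND_RANDOM              0x0002

/* Proposed, not merged: an unused bit that old kernels reject. */
#define GRND_EXPLICIT            0x0004

/* The explicit combinations, matching enum values 5, 6 and 7 above. */
#define GRND_INSECURE            (GRND_EXPLICIT | GRND_NONBLOCK)
#define GRND_SECURE_BLOCKING     (GRND_EXPLICIT | GRND_RANDOM)
#define GRND_SECURE_NONBLOCKING  (GRND_EXPLICIT | GRND_RANDOM | GRND_NONBLOCK)

Note that "| GRND_NONBLOCK" composition still works as before, which is
the hidden pattern being preserved.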
> And the only real question is how to map existing users to these
> semantics. I see two sensible choices:
>
> 1. 0 means "secure, blocking". I think this is not what we'd do if we
> could go back in time and change the ABI from day 1, but I think it's
> actually good enough. As long as this mode won't deadlock, it's not
> *that* bad if programs are using it when they wanted "insecure".
It's exactly that "as long as it won't deadlock" that is our current problem.
It *does* deadlock.
So it can't mean "blocking" in any long-term meaning.
It can mean "blocks for up to 15 seconds" or something like that. I'd
honestly prefer a smaller number, but I think 15 seconds is an
acceptable "your user space is buggy, but we won't make you think the
machine hung".
> 2. 0 means "secure, blocking, but warn". Some new value means
> "secure, blocking, don't warn". The problem is that new applications
> will have to fall back to 0 to continue supporting old kernels.
The same comment about blocking.
Maybe you came in in the middle, and didn't see the whole "reduced IO
patterns means that boot blocks forever" part of the original problem.
THAT is why 0 will absolutely change behaviour.
Linus
> On Sep 20, 2019, at 12:37 PM, Willy Tarreau <[email protected]> wrote:
>
> On Fri, Sep 20, 2019 at 12:22:17PM -0700, Andy Lutomirski wrote:
>> Perhaps userland could register a helper that takes over and does
>> something better?
>
> If userland sees the failure it can do whatever the developer/distro
> packager thought suitable for the system facing this condition.
>
>> But I think the kernel really should do something
>> vaguely reasonable all by itself.
>
> Definitely, that's what Linus' proposal was doing. Sleeping for some time
> is what I call "vaguely reasonable".
I don’t buy it. We have existing programs that can deadlock on boot. Just throwing -EAGAIN at them in a syscall that didn’t previously block does not strike me as reasonable.
>
>> If nothing else, we want the ext4
>> patch that provoked this whole discussion to be applied,
>
> Oh absolutely!
>
>> which means
>> that we need to unbreak userspace somehow, and returning garbage to it
>> is not a good choice.
>
> It depends how it's used. I'd claim that we certainly use random numbers
> for other things (such as ASLR/hashtables) *before* using them to generate
> long lived keys, thus we can have a bit more time to get some more
> entropy before reaching the point of producing these keys.
The problem is that we don’t know what userspace is doing with the output from getrandom(..., 0), so I think we have to be conservative. New kernels need to work with old user code. It’s okay if they’re slower to boot than they could be.
>
>> Here are some possible approaches that come to mind:
>>
>> int count;
>> while (crng isn't inited) {
>> msleep(1);
>> }
>>
>> and modify add_timer_randomness() to at least credit a tiny bit to
>> crng_init_cnt.
>
> Without a timeout we're sure to still face some situations where
> it blocks forever, which is the current problem.
The point is that we keep the timer running by looping like this, which should cause add_timer_randomness() to get called continuously, which should prevent the deadlock. I assume the deadlock is because we go into nohz-idle and we sit there with nothing happening at all.
>
>> Or we do something like intentionally triggering readahead on some
>> offset on the root block device.
>
> You don't necessarily have such a device, especially when you're
> in an initramfs. It's precisely where userland can be smarter. When
> the caller is sfdisk for example, it does have more chances to try
> to perform I/O than when it's a tiny http server starting to present
> a configuration page.
What I mean is: allow user code to register a usermode helper that helps get entropy. Or just convince distros to bundle some useful daemon that starts at early boot and lives in the initramfs.
On Fri, Sep 20, 2019 at 11:12 AM Willy Tarreau <[email protected]> wrote:
>
> Hi Andy,
>
> On Fri, Sep 20, 2019 at 10:52:30AM -0700, Andy Lutomirski wrote:
> > 2. Fix what is arguably a straight up kernel bug, not even an ABI
> > issue: when a user program is blocking in getrandom(..., 0), the
> > kernel happily sits there doing absolutely nothing and deadlocks the
> > system as a result. This IMO isn't an ABI issue -- it's an
> > implementation problem. How about we make getrandom() (probably
> > actually wait_for_random_bytes()) do something useful to try to seed
> > the RNG if the system is otherwise not doing IO.
>
> I thought about it as well with my old MSDOS reflexes, but here I
> doubt we can do a lot. It seems fishy to me to start to fiddle with
> various drivers from within a getrandom() syscall, we could sometimes
> even end up waiting even longer because one device is already locked,
> and when we have access there there's not much we can do without
> risking to cause some harm. On desktop systems you have a bit more
> choice than on headless systems (blink keyboard leds and time the
> interrupts, run some disk accesses when there's still a disk, get a
> copy of the last buffer of the audio input and/or output, turn on
> the microphone and/or webcam, and collect some data). Many of them
> cannot always be used. We could do some more portable stuff like scan
> and hash the totality of the RAM. But that's all quite bad and
> unreliable and at this point it's better to tell userland "here's
> what I could get for you, if you want better, do it yourself" and the
> userland can then ask the user "dear user, I really need valid entropy
> this time to generate your GPG key, please type frantically on this
> keyboard". And it will be more reliable this way in my opinion.
Perhaps userland could register a helper that takes over and does
something better? But I think the kernel really should do something
vaguely reasonable all by itself. If nothing else, we want the ext4
patch that provoked this whole discussion to be applied, which means
that we need to unbreak userspace somehow, and returning garbage to it
is not a good choice.
Here are some possible approaches that come to mind:
int count;
while (crng isn't inited) {
        msleep(1);
}
and modify add_timer_randomness() to at least credit a tiny bit to
crng_init_cnt.
Or we do something like intentionally triggering readahead on some
offset on the root block device. We should definitely not trigger
*blocking* IO.
Also, I wonder if the real problem preventing the RNG from starting up
is that the crng_init_cnt threshold is too high. We have a rather
baroque accounting system, and it seems like we can accumulate and
credit entropy for a very long time indeed without actually
considering ourselves done.
--Andy
On Fri, Sep 20, 2019 at 12:22 PM Andy Lutomirski <[email protected]> wrote:
>
> Here are some possible approaches that come to mind:
>
> int count;
> while (crng isn't inited) {
> msleep(1);
> }
>
> and modify add_timer_randomness() to at least credit a tiny bit to
> crng_init_cnt.
I'd love that, but we don't actually call add_timer_randomness() for timers.
Yeah, the name is misleading.
What the "timer" in add_timer_randomness() means is that we look at
the timing between calls. And we may actually have (long ago) called
it for timer interrupts. But we don't any more.
The only actual users of add_timer_randomness() are
add_input_randomness() and add_disk_randomness(). And it turns out
that even disk IO doesn't really call add_disk_randomness(), so the
only _real_ user is that keyboard input thing.
Which means that unless you sit at the machine and type things in,
add_timer_randomness() _never_ gets called.
No, the real source of entropy right now is
add_interrupt_randomness(), which is called for all device interrupts.
But note the "device interrupts" part. Not the timer interrupt. That's
special, and has its own low-level architecture rules. So only the
normal IO interrupts (like disk/network/etc).
So timers right now do not add _anything_ to the randomness pool. Not
noise, not entropy.
But yes, what you can do is a jitter entropy thing, which basically
does what you suggest, except instead of "msleep(1)" it does something
like
while (crng isn't inited) {
        sched_yield();
        do_a_round_of_memory_accesses_etc();
        add_cycle_counter_entropy();
}
and with a lot of handwaving you'll convince a certain amount of
people that yes, the timing of the above is unpredictable enough that
the entropy you add is real.
Linus
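For illustration only, here is a userspace sketch of one round of the
loop Linus describes above. It assumes an x86 TSC read via __rdtsc();
the work loop and the mixing multiplier are arbitrary placeholders, not
the kernel's code and not a vetted entropy source:

#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc(); x86-only assumption */

/* One jitter round: do some deliberately noisy memory/ALU work and fold
 * the cycle-count delta into the pool. */
static uint64_t jitter_round(uint64_t pool)
{
        volatile uint64_t sink = pool;
        uint64_t t0 = __rdtsc();
        int i;

        for (i = 0; i < 1024; i++)
                sink += (sink << 7) ^ (sink >> 3) ^ i;

        /* The low bits of the delta carry the timing jitter. */
        return pool * 0x9E3779B97F4A7C15ULL + (__rdtsc() - t0);
}

A caller would keep doing pool = jitter_round(pool) (with sched_yield()
in between) until its time budget runs out, which is exactly the
"accelerated" trade-off discussed earlier in the thread.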
21.09.2019 00:51, Linus Torvalds wrote:
> And we'll also have to make getrandom(0) be really _timely_. Security
> people would likely rather wait for minutes before they are happy with
> it. But because it's a boot constraint as things are now, it will not
> just be jitter-entropy, it will be _accelerated_ jitter-entropy in 15
> seconds or whatever, and since it can't use up all of CPU time, it's
> realistically more like "15 second timeout, but less of actual CPU
> time for jitter".
I don't think that "accelerated jitter" makes sense. The jitterentropy
hwrng that I sent earlier fills the entropy buffer in less than 2
seconds, even with quality=4, so there is no need to accelerate it even
more.
> That said, if we can all convince everybody (hah!) that jitter entropy
> in the kernel would be sufficient, then we can make the whole point
> entirely moot, and just say "we'll just change crng_wait() to do
> jitter entropy instead and be done with it". Then any getrandom() user
> will just basically wait for a (very limited) time and the system will
> be happy.
>
> If that is the case we wouldn't need new flags at all. But I don't
> think you can make everybody agree to that, which is why I suspect
> we'll need the new flag, and I'll just take the heat for saying "0 is
> now off limits, because it does this thing that a lot of people
> dislike".
I 100% agree with that.
--
Alexander E. Patrakov
On Fri, Sep 20, 2019 at 12:51:12PM -0700, Linus Torvalds wrote:
> So we absolutely _will_ come up with some way 0 ends the wait. Whether
> it's _just_ a timeout, or whether it's jitter-entropy or whatever, it
> will happen.
FWIW, Zircon uses the jitter entropy generator to seed the CRNG and
documented their findings in
https://fuchsia.dev/fuchsia-src/zircon/jitterentropy/config-basic .
--
Matthew Garrett | [email protected]
On Fri, Sep 20, 2019 at 12:51 PM Linus Torvalds
<[email protected]> wrote:
>
> > And the only real question is how to map existing users to these
> > semantics. I see two sensible choices:
> >
> > 1. 0 means "secure, blocking". I think this is not what we'd do if we
> > could go back in time and change the ABI from day 1, but I think it's
> > actually good enough. As long as this mode won't deadlock, it's not
> > *that* bad if programs are using it when they wanted "insecure".
>
> It's exactly that "as long as it won't deadlock" that is our current problem.
>
> It *does* deadlock.
>
> So it can't mean "blocking" in any long-term meaning.
>
> It can mean "blocks for up to 15 seconds" or something like that. I'd
> honestly prefer a smaller number, but I think 15 seconds is an
> acceptable "your user space is buggy, but we won't make you think the
> machine hung".
To be clear, when I say "blocking", I mean "blocks until we're ready,
but we make sure we're ready in a moderately timely manner".
Rather than answering everything point by point, here's an updated
mini-proposal and some thoughts. There are two families of security
people that I think we care about. One is the FIPS or CC or PCI
crowd, and they might, quite reasonably, demand actual hardware RNGs.
We should make the hwrng API stop sucking and they should be happy.
(This means expose an hwrng device node per physical device, IMO.)
The other is the one who wants getrandom(), etc to be convincingly
secure and is willing to do some actual analysis. And I think we can
make them quite happy like this:
In the kernel, we have two types of requests for random numbers: a
request for "secure" bytes and a request for "insecure" bytes.
Requests for "secure" bytes can block or return -EAGAIN. Requests for
"insecure" bytes succeed without waiting. In addition, we have a
jitter entropy mechanism (maybe the one mjg59 referenced, maybe
Alexander's -- doesn't really matter) and we *guarantee* that jitter
entropy, by itself, is enough to get the "secure" generator working
after, say, 5s of effort. By this, I mean that, on an idle system, it
finishes in 5s and, on a fully loaded system, it's allowed to take a
little while longer but not too much longer.
In other words, I want GRND_SECURE_BLOCKING and /dev/random reads to
genuinely always work and to genuinely never take much longer than 5s.
I don't want a special case where they fail.
The exposed user APIs are, subject to bikeshedding that can happen
later over the actual values, etc:
GRND_SECURE_BLOCKING: returns "secure" output and blocks until it's
ready. This never fails, but it also never blocks forever.
GRND_SECURE_NONBLOCKING: same but returns -EAGAIN instead of blocking.
GRND_INSECURE: returns "insecure" output immediately. I think we do
need this -- the "secure" mode may take a little while at early boot,
and libraries that initialize themselves with some randomness really
do want a way to get some numbers without any delay whatsoever.
0: either the same as GRND_SECURE_BLOCKING plus a warning or the
"accelerated" version. The "accelerated" version means wait up to 2s
for secure numbers and, if there still aren't any, fall back to
"insecure".
GRND_RANDOM: either the same as 0 or the same as GRND_SECURE_BLOCKING
but with a warning. I don't particularly care either way.
I'm okay with a well-defined semantic like I proposed for an
accelerated mode. I don't really want to try to define what a
secure-but-not-as-secure mode means as a separate complication that
the underlying RNG needs to support forever. I don't think the
security folks would like that either.
How does this sound?
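To make the proposed split concrete, here is a purely hypothetical
sketch of how a library might use it. The flag names are the proposal
above, not existing kernel flags, and the #define values are placeholder
guesses:

#include <errno.h>
#include <sys/random.h>

#ifndef GRND_INSECURE
#define GRND_INSECURE            0x0005   /* hypothetical value */
#endif
#ifndef GRND_SECURE_NONBLOCKING
#define GRND_SECURE_NONBLOCKING  0x0007   /* hypothetical value */
#endif

/* Hash-table seeds, cookies, etc.: never worth blocking for. */
static int get_seed(void *buf, size_t len)
{
        return getrandom(buf, len, GRND_INSECURE) == (ssize_t)len ? 0 : -1;
}

/* Long-lived key material: insist on the "secure" pool, but let the
 * caller decide what to do if it is not ready yet. */
static int get_key_material(void *buf, size_t len)
{
        ssize_t r = getrandom(buf, len, GRND_SECURE_NONBLOCKING);

        if (r < 0 && errno == EAGAIN)
                return -EAGAIN;   /* not seeded yet; retry or fall back */
        return r == (ssize_t)len ? 0 : -1;
}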
On Fri, Sep 20, 2019 at 1:51 PM Andy Lutomirski <[email protected]> wrote:
>
> To be clear, when I say "blocking", I mean "blocks until we're ready,
> but we make sure we're ready in a moderately timely manner".
.. and I want a pony.
The problem is that you start from an assumption that we simply can't
seem to do.
> In other words, I want GRND_SECURE_BLOCKING and /dev/random reads to
> genuinely always work and to genuinely never take much longer than 5s.
> I don't want a special case where they fail.
Honestly, if that's the case and we _had_ such a method of
initializing the rng, then I suspect we could just ignore the flags
entirely, with the possible exception of GRND_NONBLOCK. And even that
is "possible exception", because once your worst-case is a one-time
delay of 5s at boot time thing, you might as well consider it
nonblocking in general.
Yes, there are some in-kernel users that really can't afford to do
even that 5s delay (not just may they be atomic, but more likely it's
just that we don't want to delay _everything_ by 5s), but they don't
use the getrandom() system call anyway.
> The exposed user APIs are, subject to bikeshedding that can happen
> later over the actual values, etc:
So the thing is, you start from the impossible assumption, and _if_
you hold that assumption then we might as well just keep the existing
"zero means blocking", because nobody mind.
I'd love to say "yes, we can guarantee good enough entropy for
everybody in 5s and we don't even need to warn about it, because
everybody will be comfortable with the state of our entropy at that
point".
It sounds like a _lovely_ model.
But honestly, it simply sounds unlikely.
Now, there are different kinds of unlikely.
In particular, if you actually have a CPU cycle counter that actually
runs at least on the same order of magnitude as the CPU frequency -
then I believe in the jitter entropy more than in many other cases.
Sadly, many platforms don't have that kind of cycle counter.
I've also not seen a hugely believable "yes, the jitter entropy is
real" paper. Alexander points to the existing jitterentropy crypto
code, and claims it can fill all our entropy needs in two seconds, but
there are big caveats:
(a) that code uses random_get_entropy(), which on a PC is that nice
fast TSC that we want. On other platforms (or on really old PC's - we
technically support CPU's still that don't have rdtsc)? It might be
zero. Every time.
(b) How was it tested? There are lots of randomness tests, but most
of them can be fooled with a simple counter through a cryptographic
hash - which you basically need to do anyway on whatever entropy
source you have in order to "whiten" it. It's simply _really_ hard to
decide on entropy.
So it's really easy to make the randomness of some input look really
good, without any real idea how good it truly is. And maybe it really
is very very good on one particular machine, and then on another one
(with either a simpler in-order core or a lower-frequency timestamp
counter) it might be horrendously bad, and you'll never know.
So I'd love to believe in your simple model. Really. I just don't see
how to get there reliably.
Matthew Garrett pointed to one analysis of jitterentropy, and that one
wasn't all that optimistic.
I do think jitterentropy would likely be good enough in practice - at
least on PC's with a TSC - for the fairly small window at boot and
getrandom(0). As I mentioned, I don't think it will make anybody
_happy_, but it might be one of those things where it's a compromise
that at least works for people, with the key generation people who are
really unhappy with it having a new option for their case.
And maybe Alexander can convince people that when you run the
jitterentropy code a hundred billion times, the end result (not the
random stream from it, but the jitter bits themselves - but I'm not
even sure how to boil it down) - really is random.
Linus
On Fri, Sep 20, 2019 at 3:44 PM Linus Torvalds
<[email protected]> wrote:
>
> On Fri, Sep 20, 2019 at 1:51 PM Andy Lutomirski <[email protected]> wrote:
> >
> > To be clear, when I say "blocking", I mean "blocks until we're ready,
> > but we make sure we're ready in a moderately timely manner".
>
> .. and I want a pony.
>
> The problem is that you start from an assumption that we simply can't
> seem to do.
Eh, fair enough, I wasn't thinking about platforms without fast clocks.
I'm very nervous about allowing getrandom(..., 0) to fail with
-EAGAIN, though. On a very, very brief search, I didn't find any
programs that would incorrectly assume it worked, but I can easily
imagine programs crashing, and that might be bad, too. At the end of
the day, most user programmers who call getrandom() really did notice
that we flubbed the ABI, and either they were too lazy to fall back to
/dev/urandom, or they didn't want to for some reason, or they
genuinely want the blocking behavior. And people who work with little
embedded systems without good clocks that basically can't generate
random numbers already know this, and they have little scripts to help
out.
So I think that just improving the
getrandom()-is-blocking-on-x86-and-arm behavior, adding GRND_INSECURE
and GRND_SECURE_BLOCKING, and adding the warning if 0 is passed is
good enough. I suppose we could also have separate
GRND_SECURE_BLOCKING and GRND_SECURE_BLOCK_FOREVER. We could also say
that, if you want to block forever, you should poll() on /dev/random
(with my patches applied, where this actually does what users would
want).
--Andy
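For reference, a minimal sketch of the poll()-then-read pattern Andy
mentions. It assumes the semantics from his patch set, where /dev/random
only reports POLLIN once the CRNG is initialized; short reads are not
handled here:

#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Block (forever if need be) until the kernel RNG reports readiness,
 * then read the requested bytes. */
static ssize_t wait_and_read_random(void *buf, size_t len)
{
        struct pollfd pfd;
        ssize_t r = -1;
        int fd = open("/dev/random", O_RDONLY);

        if (fd < 0)
                return -1;

        pfd.fd = fd;
        pfd.events = POLLIN;
        if (poll(&pfd, 1, -1) > 0)   /* -1 timeout: wait forever */
                r = read(fd, buf, len);

        close(fd);
        return r;
}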
On Fri, Sep 20, 2019 at 04:30:20PM -0700, Andy Lutomirski wrote:
> So I think that just improving the
> getrandom()-is-blocking-on-x86-and-arm behavior, adding GRND_INSECURE
> and GRND_SECURE_BLOCKING, and adding the warning if 0 is passed is
> good enough.
I think so as well. Anyway, keep in mind that *with a sane API*,
userland can improve very quickly (faster than kernel deployments in
field). But userland developers need reliable and testable support for
features. If it's enough to do #ifndef GRND_xxx/#define GRND_xxx and
call getrandom() with these flags to detect support, it's basically 5
reliable lines of code to add to userland to make a warning disappear
and/or to allow a system that previously failed to boot to now boot. So
this gives strong incentive to userland to adopt the new API, provided
there's a way for the developer to understand what's happening (which
the warning does).
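Presumably the handful of lines Willy has in mind look something like
this; the GRND_INSECURE value is a placeholder for whatever the new flag
ends up being, and the point is only that the fallback is trivial:

#include <errno.h>
#include <sys/random.h>

#ifndef GRND_INSECURE
#define GRND_INSECURE 0x0005   /* placeholder; not yet in the uapi headers */
#endif

static ssize_t getrandom_insecure(void *buf, size_t len)
{
        ssize_t r = getrandom(buf, len, GRND_INSECURE);

        if (r < 0 && errno == EINVAL)   /* old kernel: flag unknown */
                r = getrandom(buf, len, GRND_NONBLOCK);
        return r;
}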
If we do it right, all we'll hear are userland developers complaining
that those stupid kernel developers have changed their API again and
really don't know what they want. That will be a good sign that the
warning flows back to them and that adoption is taking hold.
And if the change is small enough, maybe it could make sense to backport
it to stable versions to fix boot issues. With a testable feature it
does make sense.
Willy
* Linus Torvalds:
> Violently agreed. And that's kind of what the GRND_EXPLICIT is really
> aiming for.
>
> However, it's worth noting that nobody should ever use GRND_EXPLICIT
> directly. That's just the name for the bit. The actual users would use
> GRND_INSECURE or GRND_SECURE.
Should we switch glibc's getentropy to GRND_EXPLICIT? Or something
else?
I don't think we want to print a kernel warning for this function.
Thanks,
Florian
From: Linus Torvalds
> Sent: 19 September 2019 21:04
...
> Note small detail above: I changed the ^= to a +=. Addition tends to
> be better (due to carry between bits) when there might be bit
> commonalities. Particularly with something like a cycle count where
> two xors can mostly cancel out previous bits rather than move bits
> around in the word.
There is code in one of the kernel RNGs that XORs together the output
of 3 LFSR (CRC) generators.
I think it is used for 'low quality' randomness and reseeded from the main RNG.
Using XOR makes the entire generator 'linear' and thus trivially reversible.
With a relatively small number of consecutive outputs you can determine the state
of all 3 LFSRs.
Merge the results with addition and the process is immensely harder.
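A toy illustration of the difference (not the kernel's generator; the
tap masks below are arbitrary): with XOR every output bit is an XOR of
state bits, so the combined generator stays linear over GF(2) and can be
solved for the three states, while addition introduces carries between
bit positions:

#include <stdint.h>

/* One step of a 32-bit Galois LFSR with the given tap mask. */
static uint32_t lfsr_step(uint32_t *s, uint32_t taps)
{
        *s = (*s >> 1) ^ (-(*s & 1u) & taps);
        return *s;
}

static uint32_t mix_xor(uint32_t s[3])   /* linear: easy to invert */
{
        return lfsr_step(&s[0], 0xA3000000u) ^
               lfsr_step(&s[1], 0xC5000000u) ^
               lfsr_step(&s[2], 0x96000000u);
}

static uint32_t mix_add(uint32_t s[3])   /* carries break the linearity */
{
        return lfsr_step(&s[0], 0xA3000000u) +
               lfsr_step(&s[1], 0xC5000000u) +
               lfsr_step(&s[2], 0x96000000u);
}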
I've also wondered whether the RC4 generator is a useful entropy store?
It has a lot of state and you can fairly easily feed in values that might (or
might not) contain any randomness without losing any stored entropy.
David
On Fri, Sep 20, 2019 at 11:07 PM Florian Weimer <[email protected]> wrote:
>
> * Linus Torvalds:
>
> > Violently agreed. And that's kind of what the GRND_EXPLICIT is really
> > aiming for.
> >
> > However, it's worth noting that nobody should ever use GRND_EXPLICIT
> > directly. That's just the name for the bit. The actual users would use
> > GRND_INSECURE or GRND_SECURE.
>
> Should we switch glibc's getentropy to GRND_EXPLICIT? Or something
> else?
>
> I don't think we want to print a kernel warning for this function.
>
Contemplating this question, I think the answer is that we should just
not introduce GRND_EXPLICIT or anything like it. glibc is going to
have to do *something*, and getentropy() is unlikely to just go away.
The explicitly documented semantics are that it blocks if the RNG
isn't seeded.
Similarly, FreeBSD has getrandom():
https://www.freebsd.org/cgi/man.cgi?query=getrandom&sektion=2&manpath=freebsd-release-ports
and if we make getrandom(..., 0) warn, then we have a situation where
the *correct* (if regrettable) way to use the function on FreeBSD
causes a warning on Linux.
Let's just add GRND_INSECURE, make the blocking mode work better, and,
if we're feeling a bit more adventurous, add GRND_SECURE_BLOCKING as a
better replacement for 0, convince FreeBSD to add it too, and then
worry about deprecating 0 once we at least get some agreement from the
FreeBSD camp.
Hi!
> > => src/random-seed/random-seed.c:
> > /*
> > * Let's make this whole job asynchronous, i.e. let's make
> > * ourselves a barrier for proper initialization of the
> > * random pool.
> > */
...
> > k = getrandom(buf, buf_size, GRND_NONBLOCK);
> > if (k < 0 && errno == EAGAIN && synchronous) {
> > log_notice("Kernel entropy pool is not initialized yet, "
> > "waiting until it is.");
> >
> > k = getrandom(buf, buf_size, 0); /* retry synchronously */
> > }
>
> Yeah, the above is yet another example of completely broken garbage.
>
> You can't just wait and block at boot. That is simply 100%
> unacceptable, and always has been, exactly because that may
> potentially mean waiting forever since you didn't do anything that
> actually is likely to add any entropy.
Hmm. This actually points to a solution, and I believe the solution is in the
kernel. Userspace is not the best place to decide what is the best way to
generate entropy.
> As mentioned, this has already historically been a huge issue on
> embedded devices, and with disks turning not just to NVMe but to
> actual polling nvdimm/xpoint/flash, the amount of true "entropy"
> randomness we can give at boot is very questionable.
>
> We can (and will) continue to do a best-effort thing (including very
> much using rdrand and friends), but the whole "wait for entropy"
> simply *must* stop.
And we can stop it... from the kernel, and without hacks, simply by generating some
entropy. We do not need to sit quietly while userspace waits for entropy to appear.
We can, for example, do some reads from the disk (find / should be good for generating
entropy on many systems). For systems with an RTC but no timestamp counter, we can
just keep incrementing a register and read it from the interrupt handler...
...to get precise timings. We know the system is blocked waiting for entropy, so we can
do expensive things we would not "normally" do.
Yes, it would probably mean a new kind of "driver" whose purpose is to generate some
kind of activity so that interrupts happen and entropy is generated... But that is
still a better solution than fixing all of userspace.
(With some proposals here, userspace _could_ do
while (getrandom() == -EINVAL) {
        system("find / &");
        sleep(1);
}
...but I believe we really want to do it once, in kernel, and less hacky than this)
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Mon, Sep 23, 2019 at 11:33:21AM -0700, Andy Lutomirski wrote:
> On Fri, Sep 20, 2019 at 11:07 PM Florian Weimer <[email protected]> wrote:
> >
> > * Linus Torvalds:
> >
> > > Violently agreed. And that's kind of what the GRND_EXPLICIT is really
> > > aiming for.
> > >
> > > However, it's worth noting that nobody should ever use GRND_EXPLICIT
> > > directly. That's just the name for the bit. The actual users would use
> > > GRND_INSECURE or GRND_SECURE.
> >
> > Should we switch glibc's getentropy to GRND_EXPLICIT? Or something
> > else?
> >
> > I don't think we want to print a kernel warning for this function.
> >
>
> Contemplating this question, I think the answer is that we should just
> not introduce GRND_EXPLICIT or anything like it. glibc is going to
> have to do *something*, and getentropy() is unlikely to just go away.
> The explicitly documented semantics are that it blocks if the RNG
> isn't seeded.
>
> Similarly, FreeBSD has getrandom():
>
> https://www.freebsd.org/cgi/man.cgi?query=getrandom&sektion=2&manpath=freebsd-release-ports
>
> and if we make getrandom(..., 0) warn, then we have a situation where
> the *correct* (if regrettable) way to use the function on FreeBSD
> causes a warning on Linux.
>
> Let's just add GRND_INSECURE, make the blocking mode work better, and,
> if we're feeling a bit more adventurous, add GRND_SECURE_BLOCKING as a
> better replacement for 0, ...
This is what's now done in the just-submitted V5, except the "make the
blocking mode work better" part:
https://lkml.kernel.org/r/20190926204217.GA1366@pc
It's a very conservative patch so far IMHO (minus the loud warning).
Thanks,
--
Ahmed Darwish
On 9/10/19 12:21 AM, Ahmed S. Darwish wrote:
> Can this even be considered a user-space breakage? I'm honestly not
> sure. On my modern RDRAND-capable x86, just running rng-tools rngd(8)
> early-on fixes the problem. I'm not sure about the status of older
> CPUs though.
Tangent: I asked aloud on Twitter last night if anyone had exploited
Rowhammer-like effects to generate entropy...and sure enough, the usual
suspects have: https://arxiv.org/pdf/1808.04286.pdf
While this requires low level access to a memory controller, it's
perhaps an example of something a platform designer could look at as a
source to introduce boot-time entropy for e.g. EFI_RNG_PROTOCOL even on
an existing platform without dedicated hardware for the purpose.
Just a thought.
Jon.