Dear Linux folks,
Since Linux 4.17-rcX, Linux spams a lot of `random: get_random_u32
called from` messages. I believe this setting should be reverted by
default, as otherwise a lot of other messages are not seen.
Please find my configuration attached.
Kind regards,
Paul
On Tue, Apr 24, 2018 at 01:48:16PM +0200, Paul Menzel wrote:
> Dear Linux folks,
>
> Since Linux 4.17-rcX, Linux spams a lot of `random: get_random_u32 called
> from` messages. I believe, this setting should be reverted by default as
> otherwise a lot of other messages are not seen.
Can you tell me a bit about your system? What distribution, what
hardware is present in your system (what architecture, what
peripherals are attached, etc.)?
There's a reason why we made this --- we were declaring the random
number pool to be fully initialized before it really was, and that was
a potential security concern. It's not as bad as the weakness
discovered by Nadia Heninger in 2012. (See https://factorable.net for
more details.) However, this is not one of those things where we like
to fool around.
So I want to understand if this is an issue with a particular hardware
configuration, or whether it's just a badly designed Linux init system
or embedded setup, or something else. After all, you wouldn't want
the NSA spying on all of your network traffic, would you? :-)
- Ted
Dear Theodore,
On 04/24/18 15:56, Theodore Y. Ts'o wrote:
> On Tue, Apr 24, 2018 at 01:48:16PM +0200, Paul Menzel wrote:
>> Since Linux 4.17-rcX, Linux spams a lot of `random: get_random_u32 called
>> from` messages. I believe, this setting should be reverted by default as
>> otherwise a lot of other messages are not seen.
>
> Can you tell me a bit about your system? What distribution, what
> hardware is present in your system (what architecture, what
> peripherals are attached, etc.)?
Good question, and sorry for not attaching the messages.
```
31.132: [ 0.581736] random: get_random_u32 called from cache_random_seq_create+0xa3/0x1f0 with crng_init=0
31.132: [ 0.590831] random: get_random_u32 called from cache_alloc_refill+0x5bb/0x13d0 with crng_init=0
31.132: [ 0.599654] random: get_random_u32 called from cache_random_seq_create+0xa3/0x1f0 with crng_init=0
31.132: [ 0.608722] random: get_random_u32 called from cache_alloc_refill+0x5bb/0x13d0 with crng_init=0
31.132: [ 0.617551] random: get_random_u32 called from cache_random_seq_create+0xa3/0x1f0 with crng_init=0
31.132: [ 0.626630] random: get_random_u32 called from cache_alloc_refill+0x5bb/0x13d0 with crng_init=0
31.132: [ 0.635438] random: get_random_u32 called from cache_random_seq_create+0xa3/0x1f0 with crng_init=0
31.133: [ 0.644556] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
```
Until now I only saw this on systems where the Linux kernel was built
for 32-bit (i386).
make ARCH=i386 bindeb-pkg -j50
[…]
Please find the full log attached. Maybe you can reproduce it under QEMU.
Kind regards,
Paul
On Tue, Apr 24, 2018 at 09:56:21AM -0400, Theodore Y. Ts'o wrote:
> On Tue, Apr 24, 2018 at 01:48:16PM +0200, Paul Menzel wrote:
> > Dear Linux folks,
> >
> >
> > Since Linux 4.17-rcX, Linux spams a lot of `random: get_random_u32 called
> > from` messages. I believe, this setting should be reverted by default as
> > otherwise a lot of other messages are not seen.
>
> Can you tell me a bit about your system? What distribution, what
> hardware is present in your system (what architecture, what
> peripherals are attached, etc.)?
Can you also send me your dmesg or kern.log so I can see where
get_random_u32 is getting called from during your system startup?
Thanks!
- Ted
Dear Theodore,
On 04/24/18 17:49, Theodore Y. Ts'o wrote:
> On Tue, Apr 24, 2018 at 09:56:21AM -0400, Theodore Y. Ts'o wrote:
>> On Tue, Apr 24, 2018 at 01:48:16PM +0200, Paul Menzel wrote:
>>> Since Linux 4.17-rcX, Linux spams a lot of `random: get_random_u32 called
>>> from` messages. I believe, this setting should be reverted by default as
>>> otherwise a lot of other messages are not seen.
>>
>> Can you tell me a bit about your system? What distribution, what
>> hardware is present in your system (what architecture, what
>> peripherals are attached, etc.)?
>
> Can you also send me your dmesg or kern.log so I can see where
> get_random_u32 is getting called from during your system startup?
Sorry for just attaching the unedited log file with the coreboot boot
messages. From time stamp 31 seconds (first column) onward, the Linux
messages are included as well. An excerpt follows; the full log is in my
last message.
> 01.515: Jumping to boot code at 00009000(7f733000)
> 31.117: [ 0.515017] 00:07: ttyS1 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> 31.118: [ 0.523260] Linux agpgart interface v0.103
> 31.119: [ 0.528188] i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
> 31.130: [ 0.547244] serio: i8042 KBD port at 0x60,0x64 irq 1
> 31.130: [ 0.552286] serio: i8042 AUX port at 0x60,0x64 irq 12
> 31.130: [ 0.557653] rtc_cmos 00:03: RTC can wake from S4
> 31.131: [ 0.562627] rtc_cmos 00:03: registered as rtc0
> 31.131: [ 0.567197] rtc_cmos 00:03: alarms up to one month, y3k, 242 bytes nvram, hpet irqs
> 31.131: [ 0.575045] ledtrig-cpu: registered to indicate activity on CPUs
> 31.132: [ 0.581736] random: get_random_u32 called from cache_random_seq_create+0xa3/0x1f0 with crng_init=0
> 31.132: [ 0.590831] random: get_random_u32 called from cache_alloc_refill+0x5bb/0x13d0 with crng_init=0
> 31.132: [ 0.599654] random: get_random_u32 called from cache_random_seq_create+0xa3/0x1f0 with crng_init=0
> 31.132: [ 0.608722] random: get_random_u32 called from cache_alloc_refill+0x5bb/0x13d0 with crng_init=0
> 31.132: [ 0.617551] random: get_random_u32 called from cache_random_seq_create+0xa3/0x1f0 with crng_init=0
> 31.132: [ 0.626630] random: get_random_u32 called from cache_alloc_refill+0x5bb/0x13d0 with crng_init=0
> 31.132: [ 0.635438] random: get_random_u32 called from cache_random_seq_create+0xa3/0x1f0 with crng_init=0
> 31.133: [ 0.644556] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
The problem on the Lenovo X60 is that the serial console seems to
switch from ttyS0 to ttyS1 during bootup with the dock attached; that’s
why you do not see the messages in the beginning.
Kind regards,
Paul
Does this help on your system?
- Ted
commit 4e00b339e264802851aff8e73cde7d24b57b18ce
Author: Theodore Ts'o <[email protected]>
Date: Wed Apr 25 01:12:32 2018 -0400
random: rate limit unseeded randomness warnings
On systems without sufficient boot randomness, no point spamming dmesg.
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 721dca8db9cf..cd888d4ee605 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -261,6 +261,7 @@
#include <linux/ptrace.h>
#include <linux/workqueue.h>
#include <linux/irq.h>
+#include <linux/ratelimit.h>
#include <linux/syscalls.h>
#include <linux/completion.h>
#include <linux/uuid.h>
@@ -438,6 +439,16 @@ static void _crng_backtrack_protect(struct crng_state *crng,
static void process_random_ready_list(void);
static void _get_random_bytes(void *buf, int nbytes);
+static struct ratelimit_state unseeded_warning =
+ RATELIMIT_STATE_INIT("warn_unseeded_randomness", HZ, 3);
+static struct ratelimit_state urandom_warning =
+ RATELIMIT_STATE_INIT("warn_urandom_randomness", HZ, 3);
+
+static int ratelimit_disable __read_mostly;
+
+module_param_named(ratelimit_disable, ratelimit_disable, int, 0644);
+MODULE_PARM_DESC(ratelimit_disable, "Disable random ratelimit suppression");
+
/**********************************************************************
*
* OS independent entropy store. Here are the functions which handle
@@ -932,6 +943,18 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
process_random_ready_list();
wake_up_interruptible(&crng_init_wait);
pr_notice("random: crng init done\n");
+ if (unseeded_warning.missed) {
+ pr_notice("random: %d get_random_xx warning(s) missed "
+ "due to ratelimiting\n",
+ unseeded_warning.missed);
+ unseeded_warning.missed = 0;
+ }
+ if (urandom_warning.missed) {
+ pr_notice("random: %d urandom warning(s) missed "
+ "due to ratelimiting\n",
+ urandom_warning.missed);
+ urandom_warning.missed = 0;
+ }
}
}
@@ -1572,8 +1595,9 @@ static void _warn_unseeded_randomness(const char *func_name, void *caller,
#ifndef CONFIG_WARN_ALL_UNSEEDED_RANDOM
print_once = true;
#endif
- pr_notice("random: %s called from %pS with crng_init=%d\n",
- func_name, caller, crng_init);
+ if (__ratelimit(&unseeded_warning))
+ pr_notice("random: %s called from %pS with crng_init=%d\n",
+ func_name, caller, crng_init);
}
/*
@@ -1767,6 +1791,10 @@ static int rand_initialize(void)
init_std_data(&blocking_pool);
crng_initialize(&primary_crng);
crng_global_init_time = jiffies;
+ if (ratelimit_disable) {
+ urandom_warning.interval = 0;
+ unseeded_warning.interval = 0;
+ }
return 0;
}
early_initcall(rand_initialize);
@@ -1834,9 +1862,10 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
if (!crng_ready() && maxwarn > 0) {
maxwarn--;
- printk(KERN_NOTICE "random: %s: uninitialized urandom read "
- "(%zd bytes read)\n",
- current->comm, nbytes);
+ if (__ratelimit(&urandom_warning))
+ printk(KERN_NOTICE "random: %s: uninitialized "
+ "urandom read (%zd bytes read)\n",
+ current->comm, nbytes);
spin_lock_irqsave(&primary_crng.lock, flags);
crng_init_cnt = 0;
spin_unlock_irqrestore(&primary_crng.lock, flags);
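The patch above relies on the kernel's `__ratelimit()` helper with an interval of `HZ` and a burst of 3. As a rough illustration of that idea only, here is a minimal userspace sketch of a window-based rate limiter. The names `rl_state`, `rl_allow` and `rl_flush_missed` are hypothetical, time is passed in as an abstract tick count, and the details of the real kernel implementation may well differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's struct ratelimit_state. */
struct rl_state {
    unsigned long interval; /* window length in ticks; 0 disables limiting */
    unsigned int burst;     /* messages allowed per window */
    unsigned long begin;    /* tick at which the current window started */
    unsigned int printed;   /* messages let through in the current window */
    unsigned int missed;    /* messages suppressed since the last flush */
    int started;            /* nonzero once the first window has begun */
};

/* Returns 1 if a message may be emitted at tick `now`, 0 if suppressed. */
int rl_allow(struct rl_state *rs, unsigned long now)
{
    assert(rs != NULL);
    if (rs->interval == 0)
        return 1; /* limiting disabled, as with ratelimit_disable */
    if (!rs->started || now - rs->begin >= rs->interval) {
        /* Open a fresh window and reset the per-window budget. */
        rs->begin = now;
        rs->printed = 0;
        rs->started = 1;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return 1;
    }
    rs->missed++;
    return 0;
}

/* Returns the number of suppressed messages and resets the counter,
 * similar in spirit to the "warning(s) missed" report in the patch. */
unsigned int rl_flush_missed(struct rl_state *rs)
{
    unsigned int n;

    assert(rs != NULL);
    n = rs->missed;
    rs->missed = 0;
    return n;
}
```

With `interval = 100` and `burst = 3`, ten calls inside one window let three messages through and count seven as missed; a later `rl_flush_missed()` reports the seven and clears the counter.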
Dear Theodore,
On 25.04.2018 at 09:41, Theodore Y. Ts'o wrote:
> Does this help on your system?
Thank you, after figuring out how to apply the paste, yes it helped on
my Lenovo X60.
> commit 4e00b339e264802851aff8e73cde7d24b57b18ce
> Author: Theodore Ts'o <[email protected]>
> Date: Wed Apr 25 01:12:32 2018 -0400
>
> random: rate limit unseeded randomness warnings
>
> On systems without sufficient boot randomness, no point spamming dmesg.
I guess this is a problem with old hardware?
[…]
Kind regards,
Paul
I noticed "systems without sufficient boot randomness" and would like to add to this.
With the changes to /dev/random going from 4.16.3 to 4.16.4, my low-spec Chromebook does not reach
the login screen upon boot (it stays stuck on a black screen) until I provide a source of entropy to
the system via interrupts (e.g., holding down a key on the keyboard for 5 sec or moving my finger
across the touchpad a lot). After providing a source of entropy for long enough,
"random: crng init done" prints out in dmesg and the login screen finally pops up.
Detailed information on my system can be found on this bug report I recently worked on:
https://bugzilla.kernel.org/show_bug.cgi?id=199463
On Wed, Apr 25, 2018 at 09:11:08PM -0700, Sultan Alsawaf wrote:
> I noticed "systems without sufficient boot randomness" and would like to add to this.
>
> With the changes to /dev/random going from 4.16.3 to 4.16.4, my low-spec Chromebook does not reach
> the login screen upon boot (it stays stuck on a black screen) until I provide a source of entropy to
> the system via interrupts (e.g., holding down a key on the keyboard for 5 sec or moving my finger
> across the touchpad a lot). After providing a source of entropy for long enough,
> "random: crng init done" prints out in dmesg and the login screen finally pops up.
Thanks for the report!
I assume since you're upgrading your own kernel, you must not be
running Chrome OS on your Acer CB3-431 Chromebook (Edgar). Are you
running Chromium --- or some Linux distribution on it?
Thanks,
- Ted
> Thanks for the report!
>
> I assume since you're upgrading your own kernel, you must not be
> running Chrome OS on your Acer CB3-431 Chromebook (Edgar). Are you
> running Chromium --- or some Linux distribution on it?
>
> Thanks,
>
> - Ted
Correct, I'm running Xubuntu 18.04 with my own kernel based off linux-stable.
Hi!
> Since Linux 4.17-rcX, Linux spams a lot of `random: get_random_u32 called
> from` messages. I believe, this setting should be reverted by default as
> otherwise a lot of other messages are not seen.
>
> Please find my configuration attached.
Same here, thinkpad X60:
[ 3.163839] systemd[1]: Failed to insert module 'ipv6'
[ 3.181266] systemd[1]: Set hostname to <amd>.
[ 3.267243] random: systemd-sysv-ge: uninitialized urandom read (16 bytes read)
[ 3.669590] random: systemd-sysv-ge: uninitialized urandom read (16 bytes read)
[ 3.696242] random: systemd: uninitialized urandom read (16 bytes read)
[ 3.700066] random: systemd: uninitialized urandom read (16 bytes read)
[ 3.703716] random: systemd: uninitialized urandom read (16 bytes read)
[ 3.756137] random: systemd: uninitialized urandom read (16 bytes read)
[ 3.760460] random: systemd: uninitialized urandom read (16 bytes read)
[ 3.764515] random: systemd: uninitialized urandom read (16 bytes read)
[ 3.835312] random: systemd: uninitialized urandom read (16 bytes read)
[ 4.173204] systemd[1]: Binding to IPv6 address not available since kernel does not support IPv6.
[ 4.176977] systemd[1]: [/lib/systemd/system/gpsd.socket:6] Failed to parse address value, ignoring: [::1]:2947
[ 4.186472] systemd[1]: Starting Forward Password Requests to Wall Directory Watch.
[ 4.188845] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Wed, Apr 25, 2018 at 10:05:55PM -0700, Sultan Alsawaf wrote:
>
> Correct, I'm running Xubuntu 18.04 with my own kernel based off linux-stable.
>
Hmm, can you let the boot hang for a while? It should continue after
a few minutes if you wait long enough, but wait a minute or two, then
give it entropy so the boot can continue. Then can you use
"systemd-analyze blame" or "systemd-analyze critical-chain" and we
can see what process was trying to get randomness during the boot
startup and blocking waiting for the CRNG to be fully initialized.
- Ted
> Hmm, can you let the boot hang for a while? It should continue after
> a few minutes if you wait long enough, but wait a minute or two, then
> give it entropy so the boot can continue. Then can you use
> "systemd-analyze blame" or "systemd-analyze critical-chain" and we
> can see what process was trying to get randomness during the boot
> startup and blocking waiting for the CRNG to be fully initialized.
>
> - Ted
systemd-analyze blame: https://hastebin.com/ikipavevew.css
systemd-analyze critical-chain: https://hastebin.com/odoyuqeges.pl
dmesg: https://hastebin.com/waracebeja.vbs
On Thu, Apr 26, 2018 at 08:17:34AM -0700, Sultan Alsawaf wrote:
> > Hmm, can you let the boot hang for a while? It should continue after
> > a few minutes if you wait long enough, but wait a minute or two, then
> > give it entropy so the boot can continue. Then can you use
> > "systemd-analyze blame" or "systemd-analyze critical-chain" and we
> > can see what process was trying to get randomness during the boot
> > startup and blocking waiting for the CRNG to be fully initialized.
> >
> > - Ted
>
> systemd-analyze blame: https://hastebin.com/ikipavevew.css
> systemd-analyze critical-chain: https://hastebin.com/odoyuqeges.pl
> dmesg: https://hastebin.com/waracebeja.vbs
>
Hmm, it looks like the multiuser startup is getting blocked on snapd:
29.060s snapd.service
graphical.target @1min 32.145s
└─multi-user.target @1min 32.145s
└─hddtemp.service @6.512s +28ms
└─network-online.target @6.508s
└─NetworkManager-wait-online.service @2.428s +4.079s
└─NetworkManager.service @2.016s +404ms
└─dbus.service @1.869s
└─basic.target @1.824s
└─sockets.target @1.824s
└─snapd.socket @1.821s +1ms
└─sysinit.target @1.812s
└─apparmor.service @587ms +1.224s
└─local-fs.target @585ms
└─local-fs-pre.target @585ms
└─keyboard-setup.service @235ms +346ms
└─systemd-journald.socket @226ms
└─system.slice @225ms
└─-.slice @220ms
This appears to be some kind of new package management system for
Ubuntu:
Description-en: Tool to interact with Ubuntu Core Snappy.
Install, configure, refresh and remove snap packages. Snaps are
'universal' packages that work across many different Linux systems,
enabling secure distribution of the latest apps and utilities for
cloud, servers, desktops and the internet of things.
Why the Ubuntu package believes it needs to be fully started before
the login screen can display is unclear to me. It might be worth
using systemctl to disable snapd.service and see if that makes things
work better for you.
- Ted
> Hmm, it looks like the multiuser startup is getting blocked on snapd:
>
> 29.060s snapd.service
>
> graphical.target @1min 32.145s
> └─multi-user.target @1min 32.145s
> └─hddtemp.service @6.512s +28ms
> └─network-online.target @6.508s
> └─NetworkManager-wait-online.service @2.428s +4.079s
> └─NetworkManager.service @2.016s +404ms
> └─dbus.service @1.869s
> └─basic.target @1.824s
> └─sockets.target @1.824s
> └─snapd.socket @1.821s +1ms
> └─sysinit.target @1.812s
> └─apparmor.service @587ms +1.224s
> └─local-fs.target @585ms
> └─local-fs-pre.target @585ms
> └─keyboard-setup.service @235ms +346ms
> └─systemd-journald.socket @226ms
> └─system.slice @225ms
> └─-.slice @220ms
>
> This appears to be some kind of new package management system for
> Ubuntu:
>
> Description-en: Tool to interact with Ubuntu Core Snappy.
> Install, configure, refresh and remove snap packages. Snaps are
> 'universal' packages that work across many different Linux systems,
> enabling secure distribution of the latest apps and utilities for
> cloud, servers, desktops and the internet of things.
>
> Why the Ubuntu package believes it needs to be fully started before
> the login screen can display is unclear to me. It might be worth
> using systemctl to disable snapd.service and see if that makes things
> work better for you.
>
> - Ted
I removed snapd completely which did nothing.
Here are new logs:
systemd-analyze blame: https://hastebin.com/edehikuyeb.css
systemd-analyze critical-chain: https://hastebin.com/vedufafema.pl
dmesg: https://hastebin.com/zuwuwoxadu.vbs
I should also note that leaving the system untouched does not result in it booting: I must
provide a source of entropy, otherwise it just stays stuck. In both of the dmesgs I've given, I
manually provided entropy to the system after about 5 minutes of waiting.
Also, regardless of what's hanging on CRNG init, CRNG should be able to init on its own in a timely
manner without the need for user-provided entropy. Userspace was working fine before the recent CRNG
kernel changes, so I don't think this is a userspace bug.
-Sultan
On Thu, Apr 26, 2018 at 01:22:02PM -0700, Sultan Alsawaf wrote:
> > Hmm, it looks like the multiuser startup is getting blocked on snapd:
> >
> > 29.060s snapd.service
> >
> > graphical.target @1min 32.145s
> > └─multi-user.target @1min 32.145s
> > └─hddtemp.service @6.512s +28ms
> > └─network-online.target @6.508s
> > └─NetworkManager-wait-online.service @2.428s +4.079s
> > └─NetworkManager.service @2.016s +404ms
> > └─dbus.service @1.869s
> > └─basic.target @1.824s
> > └─sockets.target @1.824s
> > └─snapd.socket @1.821s +1ms
> > └─sysinit.target @1.812s
> > └─apparmor.service @587ms +1.224s
> > └─local-fs.target @585ms
> > └─local-fs-pre.target @585ms
> > └─keyboard-setup.service @235ms +346ms
> > └─systemd-journald.socket @226ms
> > └─system.slice @225ms
> > └─-.slice @220ms
> >
> > This appears to be some kind of new package management system for
> > Ubuntu:
> >
> > Description-en: Tool to interact with Ubuntu Core Snappy.
> > Install, configure, refresh and remove snap packages. Snaps are
> > 'universal' packages that work across many different Linux systems,
> > enabling secure distribution of the latest apps and utilities for
> > cloud, servers, desktops and the internet of things.
> >
> > Why the Ubuntu package believes it needs to be fully started before
> > the login screen can display is unclear to me. It might be worth
> > using systemctl to disable snapd.service and see if that makes things
> > work better for you.
> >
> > - Ted
>
> I removed snapd completely which did nothing.
>
> Here are new logs:
> systemd-analyze blame: https://hastebin.com/edehikuyeb.css
> systemd-analyze critical-chain: https://hastebin.com/vedufafema.pl
> dmesg: https://hastebin.com/zuwuwoxadu.vbs
>
> I should also note that leaving the system untouched does not result in it booting: I must
> provide a source of entropy, otherwise it just stays stuck. In both of the dmesgs I've given, I
We have observed a similar problem with libvirt. As soon as entropy is
provided, the boot finishes; otherwise it hangs for a long time.
This is not happening with v4.17-rc1 afaict.
Christian
> manually provided entropy to the system after about 5 minutes of waiting.
>
> Also, regardless of what's hanging on CRNG init, CRNG should be able to init on its own in a timely
> manner without the need for user-provided entropy. Userspace was working fine before the recent CRNG
> kernel changes, so I don't think this is a userspace bug.
>
> -Sultan
>
On Thu, Apr 26, 2018 at 01:22:02PM -0700, Sultan Alsawaf wrote:
>
> Also, regardless of what's hanging on CRNG init, CRNG should be able to init on its own in a timely
> manner without the need for user-provided entropy. Userspace was working fine before the recent CRNG
> kernel changes, so I don't think this is a userspace bug.
The CRNG changes were needed because we were erroneously saying that the
entropy pool was securely initialized before it really was. Saying
that CRNG should be able to init on its own is much like saying, "Ted
should be able to fly wherever he wants in his own personal Gulfstream
V." It would certainly be _nice_ if I could afford my personal jet.
I certainly wish I were that rich. But the problem is that dollars
(or Euros) are like entropy, they don't just magically drop out of
the sky.
If there isn't user-provided entropy, and the hardware isn't providing
sufficient entropy, where do you think the kernel is supposed to get
the entropy from? Should it dial 1-800-TRUST-NSA?
From the dmesg log, you have a Chromebook Acer 14. I'm guessing the
problem is that Chromebooks have hardware that tries *very* hard not to
issue interrupts, since that helps with power savings. The following
from your dmesg is very interesting:
[ 0.526786] tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
I suspect this isn't a firmware bug; it's the hardware working as
intended / working as designed, for power savings reasons.
So there are two ways to fix this that I can see. One is to try to
adjust userspace so that it allows the boot to proceed. As there is
more activity, the disk completion interrupts, the user typing their
username/password into the login screen, etc., there will be timing
events which can be used to harvest entropy.
The other approach would be to compile the kernel with
CONFIG_HW_RANDOM_TPM and to modify drivers/char/tpm/tpm-chip.c to
initialize chip->hwrng.quality = 500. We've historically made this
something that the system administrator must set via sysfs. This is
because we wanted system administrators to explicitly say that they
trust the hardware manufacturer: that (a) they haven't been paid by
your choice of the Chinese MSS or the US NSA to introduce a backdoor,
and (b) they are competent to actually implement a _secure_ hardware
random number generator. Sadly, this has not always been the case.
Please see:
https://www.chromium.org/chromium-os/tpm_firmware_update
And note that your Edgar Chromebook is on the list of devices that
have a TPM with the buggy firmware. Fortunately this particular TPM
bug only affects RSA prime generation, so as far as I know there is no
_known_ vulnerability in your TPM's hardware random number generator.
But we want it to be _your_ responsibility to decide you are willing
to trust it. I certainly don't want to be legally liable --- or even
have the moral responsibility --- of guaranteeing that every single
TPM out there is bug-free(tm).
- Ted
On Thu, Apr 26, 2018 at 10:47:49PM +0200, Christian Brauner wrote:
>
> We have observed a similar problem with libvirt. As soon as entropy is
> provided, the boot finishes; otherwise it hangs for a long time.
> This is not happening with v4.17-rc1 afaict.
For libvirt there is at least an easy workaround. Make sure the
guest kernel has CONFIG_HW_RANDOM_VIRTIO enabled, and then make sure
qemu is started with the options:
-object rng-random,filename=/dev/urandom,id=rng0 \
-device virtio-rng-pci,rng=rng0
Cheers,
- Ted
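To confirm from inside the guest that the virtio RNG is actually the active hardware RNG, one option is to read the hw_random sysfs attribute. The following is a minimal sketch, assuming the guest exposes `/sys/class/misc/hw_random/rng_current` and that the virtio driver registers under a name starting with `virtio_rng`; the helper names `read_first_line` and `is_virtio_rng` are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Assumed sysfs location of the currently selected hardware RNG. */
#define RNG_CURRENT_PATH "/sys/class/misc/hw_random/rng_current"

/* Reads the first line of `path` into `buf` (without the trailing newline).
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f;

    if (len == 0)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Returns 1 if the RNG name looks like a virtio RNG device, else 0. */
int is_virtio_rng(const char *name)
{
    return strncmp(name, "virtio_rng", strlen("virtio_rng")) == 0;
}
```

A caller would pass `RNG_CURRENT_PATH` to `read_first_line()` and hand the result to `is_virtio_rng()`; a read failure most likely means no hardware RNG driver is loaded in the guest.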
> The CRNG changes were needed because we were erroneously saying that the
> entropy pool was securely initialized before it really was. Saying
> that CRNG should be able to init on its own is much like saying, "Ted
> should be able to fly wherever he wants in his own personal Gulfstream
> V." It would certainly be _nice_ if I could afford my personal jet.
> I certainly wish I were that rich. But the problem is that dollars
> (or Euros) are like entropy, they don't just magically drop out of
> the sky.
>
> If there isn't user-provided entropy, and the hardware isn't providing
> sufficient entropy, where did you think the kernel is supposed to get
> the entropy from? Should it dial 1-800-TRUST-NSA?
>
> From the dmesg log, you have a Chromebook Acer 14. I'm guessing the
> problem is that Chromebooks have hardware that tries *very* hard not to
> issue interrupts, since that helps with power savings. The following
> from your dmesg is very interesting:
>
> [ 0.526786] tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
>
> I suspect this isn't a firmware bug; it's the hardware working as
> intended / working as designed, for power savings reasons.
>
> So there are two ways to fix this that I can see. One is to try to
> adjust userspace so that it allows the boot to proceed. As there is
> more activity, the disk completion interrupts, the user typing their
> username/password into the login screen, etc., there will be timing
> events which can be used to harvest entropy.
>
> The other approach would be to compile the kernel with
> CONFIG_HW_RANDOM_TPM and to modify drivers/char/tpm/tpm-chip.c to
> initialize chip->hwrng.quality = 500. We've historically made this
> something that the system administrator must set via sysfs. This is
> because we wanted system administrators to explicitly say that they
> trust the hardware manufacturer: that (a) they haven't been paid by
> your choice of the Chinese MSS or the US NSA to introduce a backdoor,
> and (b) they are competent to actually implement a _secure_ hardware
> random number generator. Sadly, this has not always been the case.
> Please see:
>
> https://www.chromium.org/chromium-os/tpm_firmware_update
>
> And note that your Edgar Chromebook is on the list of devices that
> have a TPM with the buggy firmware. Fortunately this particular TPM
> bug only affects RSA prime generation, so as far as I know there is no
> _known_ vulnerability in your TPM's hardware random number generator.
> But we want it to be _your_ responsibility to decide you are willing
> to trust it. I certainly don't want to be legally liable --- or even
> have the moral responsibility --- of guaranteeing that every single
> TPM out there is bug-free(tm).
>
> - Ted
Why don't we tell users that they need to smash their keyboards to make their computers boot
then? And if they question it, we can tell them that it certainly would be _nice_ to not have
to smash their keyboards to make their computers boot, but alas, a part of me has a feeling that
users would not take kindly to that :)
I noted at least 20,000 mmc interrupts before I intervened in the boot process to provide entropy
myself. That's just for mmc, so I'm sure there were even more interrupts elsewhere. Is 20k+ interrupts
really not sufficient?
There are lots of other sources of entropy available as well, like the ever-changing CPU frequencies reported
by any recent Intel chip (i.e., they report precision down to 1 kHz). Why are we so limited to h/w interrupts?
Sultan
Hi Ted,
Please correct me if I'm wrong, but my present understanding of this
is that crng readiness used to be broken, meaning people would have a
seeded rng without it actually being seeded. You fixed this bug, and
now people are discovering that they don't have crng readiness during
a late stage of their init, which is breaking all sorts of entirely
reasonable and widely deployed userspaces.
You could argue that those userspaces were "only designed for machines
that have enough [by what measure?] boot time entropy", but obviously
they didn't have that in mind. And now here we have an example of an
ordinary x86 machine -- not some weird embedded device -- hitting
these issues. I'd suspect that the problem here isn't one that we can
exclusively punt onto userspace.
Sultan mentioned that his machine actually does trigger large
quantities of interrupts. Is it possible that the entropy gathering
algorithm has some issues, and Sultan's report points to a real bug
here? Considering the crng readiness state hasn't been working until
your recent fix, I suspect the actual entropy gathering code probably
hasn't prompted too many bug reports, until now that is.
Jason
On Fri, Apr 27, 2018 at 05:38:52PM +0200, Jason A. Donenfeld wrote:
>
> Please correct me if I'm wrong, but my present understanding of this
> is that crng readiness used to be broken, meaning people would have a
> seeded rng without it actually being seeded. You fixed this bug, and
> now people are discovering that they don't have crng readiness during
> a late stage of their init, which is breaking all sorts of entirely
> reasonable and widely deployed userspaces.
I'd say the problem is a combination of some classes of x86 hardware
devices (so far I've mainly seen repurposed chromebooks and VM's that
don't have virtio-rng enabled) combined with some distributions that
could make themselves more amenable to platforms with minimal amounts
of entropy available to them during system startup.
> Sultan mentioned that his machine actually does trigger large
> quantities of interrupts. Is it possible that the entropy gathering
> algorithm has some issues, and Sultan's report points to a real bug
> here? Considering the crng readiness state hasn't been working until
> your recent fix, I suspect the actual entropy gathering code probably
> hasn't prompted too many bug reports, until now that is.
It's not clear when his machine is triggering the "large quantity of
interrupts". Is it during the system startup, or after he's logged
into the machine? I suspect what is going on is the Chromebook has
been engineered so that when it's idle, it doesn't issue any
interrupts at all --- which is a good thing from a power management
perspective. So if nothing is actually _querying_ the SD Card reader,
it's not generating any interrupts.
This is a feature, and not a bug. That being said, a laptop which
sends some number of interrupts as it receives, say, WiFi packets, and
a system which automatically starts looking for suitable access points
as soon as the machine is started, give us timing events which are not
easily available to an analyst sitting in Fort Meade, Maryland. In
practice, that seems to be much more the rule than the
exception. However, as laptops try to become much more sparing with
interrupts to save power, we either have to (a) be willing to
trust hardware random number generators available to the laptop,
and/or (b) change userspace to *wait* until after the user has logged
in to try to obtain cryptographic-grade randomness.
If you think there is an alternative besides those two, I'm all ears...
- Ted
On Thu, Apr 26, 2018 at 10:20:44PM -0700, Sultan Alsawaf wrote:
>
> I noted at least 20,000 mmc interrupts before I intervened in the boot process to provide entropy
> myself. That's just for mmc, so I'm sure there were even more interrupts elsewhere. Is 20k+ interrupts
> really not sufficient?
How did you determine that there were 20,000 mmc interrupts before you
had logged in? Did you have access to the machine w/o having access
to the login prompt?
I can send a patch (see attached) that will spew large amounts of logs
as each interrupt comes in and the entropy pool is getting initialized.
That's how I test things on QEMU, and Jann did something similar on a
(physical) test machine, so I'm pretty confident that if you were
getting interrupts, they would be contributing to the
pool.
You will need a serial console, or build a kernel with a much larger
dmesg buffer, since if you really are getting that many interrupts it
will cause a lot of log spew.
> There are lots of other sources of entropy available as well, like
> the ever-changing CPU frequencies reported by any recent Intel chip
> (i.e., they report precision down to 1 kHz).
That's something we could look at, but the problem is if there is some
systemd unit during early boot that blocks waiting for the entropy
pool to be initialized, the system will come to a dead halt, and even
the CPU frequency shifts will probably not move much --- just as there
weren't any interrupts while something on the boot path
wedges the system startup waiting for entropy.
This is why ultimately, we do need to attack this problem from both
ends, which means teaching userspace programs to only request
cryptographic-grade randomness when it is really needed --- and most
of the time, if the user has not logged in yet, you probably don't
need cryptographic-grade randomness....
- Ted
diff --git a/drivers/char/random.c b/drivers/char/random.c
index cd888d4ee605..69bd29f039e7 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -916,6 +916,10 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
__u32 key[8];
} buf;
+ if (crng == &primary_crng)
+ pr_notice("random: crng_reseed primary from %px\n", r);
+ else
+ pr_notice("random: crng_reseed crng %px from %px\n", crng, r);
if (r) {
num = extract_entropy(r, &buf, 32, 16, 0);
if (num == 0)
@@ -1241,6 +1245,10 @@ void add_interrupt_randomness(int irq, int irq_flags)
fast_pool->pool[2] ^= ip;
fast_pool->pool[3] ^= (sizeof(ip) > 4) ? ip >> 32 :
get_reg(fast_pool, regs);
+ if (crng_init < 2)
+ pr_notice("random: add_interrupt(cycles=0x%08llx, now=%ld, "
+ "irq=%d, ip=0x%08lx)\n",
+ cycles, now, irq, _RET_IP_);
fast_mix(fast_pool);
add_interrupt_bench(cycles);
@@ -1282,6 +1290,9 @@ void add_interrupt_randomness(int irq, int irq_flags)
/* award one bit for the contents of the fast pool */
credit_entropy_bits(r, credit + 1);
+ if (crng_init < 2)
+ pr_notice("random: batched into pool in stage %d, bits now %d",
+ crng_init, ENTROPY_BITS(r));
}
EXPORT_SYMBOL_GPL(add_interrupt_randomness);
> On Thu, Apr 26, 2018 at 10:20:44PM -0700, Sultan Alsawaf wrote:
>> I noted at least 20,000 mmc interrupts before I intervened in the boot process to provide entropy
>> myself. That's just for mmc, so I'm sure there were even more interrupts elsewhere. Is 20k+ interrupts
>> really not sufficient?
> How did you determine that there were 20,000 mmc interrupts before you
> had logged in? Did you have access to the machine w/o having access
> to the login prompt?
>
> I can send a patch (see attached) that will spew large amounts of logs
> as each interrupt comes in and the entropy pool is getting initialized.
> That's how I test things on QEMU, and Jann did something similar on a
> (physical) test machine, so I'm pretty confident that if you were
> getting interrupts, it would result in them contributing into the
> pool.
>
> You will need a serial console, or build a kernel with a much larger
> dmesg buffer, since if you really are getting that many interrupts it
> will cause a lot of log spew.
>> There are lots of other sources of entropy available as well, like
>> the ever-changing CPU frequencies reported by any recent Intel chip
>> (i.e., they report precision down to 1 kHz).
> That's something we could look at, but the problem is if there is some
> systemd unit during early boot that blocks waiting for the entropy
> pool to be initialized, the system will come to a dead halt, and even
> the CPU frequency shifts will probably not move much --- just as there
> weren't any interrupts while some system startup on the boot path
> wedges the system startup waiting for entropy.
>
> This is why ultimately, we do need to attack this problem from both
> ends, which means teaching userspace programs to only request
> cryptographic-grade randomness when it is really needed --- and most
> of the time, if the user has not logged in yet, you probably don't
> need cryptographic-grade randomness....
>
> - Ted
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index cd888d4ee605..69bd29f039e7 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -916,6 +916,10 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
> __u32 key[8];
> } buf;
>
> + if (crng == &primary_crng)
> + pr_notice("random: crng_reseed primary from %px\n", r);
> + else
> + pr_notice("random: crng_reseed crng %px from %px\n", crng, r);
> if (r) {
> num = extract_entropy(r, &buf, 32, 16, 0);
> if (num == 0)
> @@ -1241,6 +1245,10 @@ void add_interrupt_randomness(int irq, int irq_flags)
> fast_pool->pool[2] ^= ip;
> fast_pool->pool[3] ^= (sizeof(ip) > 4) ? ip >> 32 :
> get_reg(fast_pool, regs);
> + if (crng_init < 2)
> + pr_notice("random: add_interrupt(cycles=0x%08llx, now=%ld, "
> + "irq=%d, ip=0x%08lx)\n",
> + cycles, now, irq, _RET_IP_);
>
> fast_mix(fast_pool);
> add_interrupt_bench(cycles);
> @@ -1282,6 +1290,9 @@ void add_interrupt_randomness(int irq, int irq_flags)
>
> /* award one bit for the contents of the fast pool */
> credit_entropy_bits(r, credit + 1);
> + if (crng_init < 2)
> + pr_notice("random: batched into pool in stage %d, bits now %d",
> + crng_init, ENTROPY_BITS(r));
> }
> EXPORT_SYMBOL_GPL(add_interrupt_randomness);
I dumped the contents of /proc/interrupts to dmesg using the attached patch I threw together,
and then waited a sufficient amount of time before introducing entropy myself in order to ensure
that the interrupt readings were not contaminated by user-provided interrupts.
Here is the interesting snippet from my dmesg:
[ 30.689076] /proc/interrupts dump:
|           CPU0       CPU1       CPU2       CPU3
   0:          6          0          0          0  IO-APIC   2-edge        timer
   8:          0          0          1          0  IO-APIC   8-edge        rtc0
   9:          0        533          0          0  IO-APIC   9-fasteoi     acpi
  10:          0          0          0          0  IO-APIC   10-edge       tpm0
  29:          0          0          0          0  IO-APIC   29-fasteoi    intel_sst_driver
  36:        203          0          0          0  IO-APIC   36-fasteoi    808622C1:04
  37:          0        264          0          0  IO-APIC   37-fasteoi    808622C1:05
  42:          0          0          0          0  IO-APIC   42-fasteoi    dw:dmac-1
  43:          0          0          0          0  IO-APIC   43-fasteoi    dw:dmac-1
  45:          0          0          0      11402  IO-APIC   45-fasteoi    mmc0
 168:          0          0          1          0  chv-gpio  95            rt5645
 182:          0          0          0          9  chv-gpio  17            i8042
 183:          0          0          0          0  chv-gpio  18            ELAN0000:00
 230:          0          0          0          0  chv-gpio  15            ACPI:Event
 310:          0          0          0          0  PCI-MSI   458752-edge   PCIe PME, pciehp
 311:          0          0          0          0  PCI-MSI   462848-edge   PCIe PME
 312:          0        520          0          0  PCI-MSI   327680-edge   xhci_hcd
 313:        940          0          0          0  PCI-MSI   32768-edge    i915
 314:          0        137          0          0  PCI-MSI   1048576-edge  iwlwifi
 315:          0          0          0         70  PCI-MSI   442368-edge   snd_hda_intel:card0
 NMI:          0          0          0          0  Non-maskable interrupts
 LOC:       4419       4014       4590       4564  Local timer interrupts
 SPU:          0          0          0          0  Spurious interrupts
 PMI:          0          0          0          0  Performance monitoring interrupts
 IWI:          1          0          0          0  IRQ work interrupts
 RTR:          0          0          0          0  APIC ICR read retries
 RES:       1562       1235       1647        796  Rescheduling interrupts
 CAL:       1220       1340       1466       1477  Function call interrupts
 TLB:         27         18         20         17  TLB shootdowns
 TRM:          0          0          0          0  Thermal event interrupts
 THR:          0          0          0          0  Threshold APIC interrupts
 MCE:          0          0          0          0  Machine check exceptions
 MCP:          1          1          1          1  Machine check polls
 ERR:          0
 MIS:          0
 PIN:          0          0          0          0  Posted-interrupt notification event
 NPI:          0          0          0          0  Nested posted-interrupt event
 PIW:          0          0          0          0  Posted-interrupt wakeup event
|
[ 81.698372] random: crng init done
Looks like there were about 11,400 mmc interrupts 30 seconds into boot. When I measured 20,000, it was a few
minutes into boot, which explains the disparity. Do also note that crng init completed about 51 seconds
after the /proc/interrupts dump, so 11k+ interrupts clearly didn't do the trick. If you want, I can dump out
/proc/interrupts when the "random: crng init done" message is printed.
And here is the full dmesg: https://hastebin.com/isujicenev.vbs
Sultan
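For anyone who wants to total up a row of a dump like the one above
without patching the kernel, here is a minimal userspace sketch. The
function name irq_line_total() is made up for this example, and the
caller is assumed to know the number of CPU columns (four on this
machine). It only parses one line of text; it does not read
/proc/interrupts itself.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: sum the per-CPU counters on one line of
 * /proc/interrupts, e.g. " 45: 0 0 0 11402 IO-APIC 45-fasteoi mmc0".
 * Returns 0 for lines that carry no counters (such as the CPU header).
 */
unsigned long long irq_line_total(const char *line, int ncpus)
{
    const char *p = strchr(line, ':');   /* counters follow the label */
    unsigned long long total = 0;
    int cpu;

    if (!p)
        return 0;
    p++;

    for (cpu = 0; cpu < ncpus; cpu++) {
        char *end;
        unsigned long long v = strtoull(p, &end, 10);

        if (end == p)   /* no more numbers, e.g. the "ERR: 0" row */
            break;
        total += v;
        p = end;
    }
    return total;
}
```

Bounding the loop by ncpus keeps trailing fields such as the chv-gpio
pin number from being counted as interrupts.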
From 79576697e3ca631c88ea784d837672ef34a24e42 Mon Sep 17 00:00:00 2001
From: Sultan Alsawaf <[email protected]>
Date: Fri, 27 Apr 2018 15:46:18 -0700
Subject: [PATCH] Print out /proc/interrupts to kmsg ~30s after boot
---
fs/proc/Makefile | 1 +
fs/proc/interrupts_print.c | 42 ++++++++++++++++++++++++++++++++++++++++++
kernel/printk/printk.c | 2 +-
3 files changed, 44 insertions(+), 1 deletion(-)
create mode 100644 fs/proc/interrupts_print.c
diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index ead487e80510..9bd462cec4ec 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -33,3 +33,4 @@ proc-$(CONFIG_PROC_KCORE) += kcore.o
proc-$(CONFIG_PROC_VMCORE) += vmcore.o
proc-$(CONFIG_PRINTK) += kmsg.o
proc-$(CONFIG_PROC_PAGE_MONITOR) += page.o
+obj-y += interrupts_print.o
diff --git a/fs/proc/interrupts_print.c b/fs/proc/interrupts_print.c
new file mode 100644
index 000000000000..4981dca3b828
--- /dev/null
+++ b/fs/proc/interrupts_print.c
@@ -0,0 +1,42 @@
+#include <linux/slab.h>
+#include <linux/syscalls.h>
+#include <linux/workqueue.h>
+
+#define BUF_MAX_LEN (10000)
+
+static struct delayed_work intr_print_dwork;
+
+static void print_out_interrupts(struct work_struct *work)
+{
+ char *buf;
+ int fd, i;
+
+ buf = kzalloc(BUF_MAX_LEN, GFP_KERNEL);
+ if (!buf)
+ return;
+
+ fd = sys_open("/proc/interrupts", O_RDONLY, 0444);
+ if (fd < 0)
+ goto free_buf;
+
+ for (i = 0; i < BUF_MAX_LEN - 1; i++) {
+ if (sys_read(fd, buf + i, 1) != 1)
+ break;
+ }
+ sys_close(fd);
+
+ printk("/proc/interrupts dump: \n|%s|\n", buf);
+
+free_buf:
+ kfree(buf);
+}
+
+static int __init intr_print_init(void)
+{
+ INIT_DELAYED_WORK(&intr_print_dwork, print_out_interrupts);
+ schedule_delayed_work(&intr_print_dwork,
+ msecs_to_jiffies(30 * MSEC_PER_SEC));
+
+ return 0;
+}
+device_initcall(intr_print_init);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index f274fbef821d..2d3151ce5f24 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -428,7 +428,7 @@ static u64 clear_seq;
static u32 clear_idx;
#define PREFIX_MAX 32
-#define LOG_LINE_MAX (1024 - PREFIX_MAX)
+#define LOG_LINE_MAX (10000)
#define LOG_LEVEL(v) ((v) & 0x07)
#define LOG_FACILITY(v) ((v) >> 3 & 0xff)
--
2.14.1
Hi!
> Am 25.04.2018 um 09:41 schrieb Theodore Y. Ts'o:
> >Does this help on your system?
>
> Thank you, after figuring out how to apply the paste, yes it helped on my
> Lenovo X60.
>
> >commit 4e00b339e264802851aff8e73cde7d24b57b18ce
> >Author: Theodore Ts'o <[email protected]>
> >Date: Wed Apr 25 01:12:32 2018 -0400
> >
> > random: rate limit unseeded randomness warnings
> > On systems without sufficient boot randomness, no point spamming dmesg.
>
> I guess this is a problem with old hardware?
Ok, I see it too, thinkpad x60.
But... this machine has a spinning hard drive and an independent RTC; there
really should be enough randomness...
Could we exploit either of them as a randomness source when we run out
of entropy?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Hi!
On Thu 2018-04-26 19:56:30, Theodore Y. Ts'o wrote:
> On Thu, Apr 26, 2018 at 01:22:02PM -0700, Sultan Alsawaf wrote:
> >
> > Also, regardless of what's hanging on CRNG init, CRNG should be able to init on its own in a timely
> > manner without the need for user-provided entropy. Userspace was working fine before the recent CRNG
> > kernel changes, so I don't think this is a userspace bug.
>
> The CRNG changes were needed because we were erroneously saying that the
> entropy pool was securely initialized before it really was. Saying
> that CRNG should be able to init on its own is much like saying, "Ted
> should be able to fly wherever he wants in his own personal Gulfstream
> V." It would certainly be _nice_ if I could afford my personal jet.
> I certainly wish I were that rich. But the problem is that dollars
> (or Euros) are like entropy, they don't just magically drop out of
> the sky.
>
> If there isn't user-provided entropy, and the hardware isn't providing
> sufficient entropy, where did you think the kernel is supposed to get
> the entropy from? Should it dial 1-800-TRUST-NSA?
Yes, we could dial 1-800-TRUST-NSA. Then nicely ask them to provide us
some unbackdoored randomness. Then we'd ignore whatever they say, but
would collect randomness from timing and noise on the telephone line.
> The other approach would be to compile the kernel with
> CONFIG_HW_RANDOM_TPM and to modify drivers/char/tpm/tpm-chip.c to
> initialize chip->hwrng.quality = 500. We've historically made this
> something that the system administrator must set via sysfs. This is
> because we wanted system administrators to explicitly say that they
> trust the hardware manufacturer, namely that (a) they haven't been paid by
> your choice of the Chinese MSS or the US NSA to introduce a backdoor,
> and (b) they are competent to actually implement a _secure_ hardware
> random number generator. Sadly, this has not always been the case.
Well, we could actively start accessing a suitable device (SD card? HDD?
CMOS RTC?) when we detect entropy is low. Yes, that would eat power,
but that would be better than a machine that hangs at boot.
We could also access the hwrng, then collect entropy from the
timing. The TPM is a slow chip...
Pavel
Hi!
> This is why ultimately, we do need to attack this problem from both
> ends, which means teaching userspace programs to only request
> cryptographic-grade randomness when it is really needed --- and most
> of the time, if the user has not logged in yet, you probably don't
> need cryptographic-grade randomness....
IOW moving them from /dev/random to /dev/urandom?
Pavel
On Sun, Apr 29, 2018 at 04:32:05PM +0200, Pavel Machek wrote:
> Hi!
>
> > This is why ultimately, we do need to attack this problem from both
> > ends, which means teaching userspace programs to only request
> > cryptographic-grade randomness when it is really needed --- and most
> > of the time, if the user has not logged in yet, you probably don't
> > need cryptographic-grade randomness....
>
> IOW moving them from /dev/random to /dev/urandom?
> Pavel
/dev/urandom isn't cryptographically secure, so that's not an option.
I'd also like to add that my high-spec x86 laptop exhibits the same issue as
my Edgar Chromebook.
Here's my dmesg: https://hastebin.com/dofejolobi.go
The most interesting line:
[ 90.811633] random: crng init done
I waited 90 seconds after boot to provide entropy myself, at which point crng
init completed. In other words, crng init only completed because I provided
the entropy by smashing the keyboard. I could've waited longer and crng init
wouldn't have completed without my input.
Mind you, this laptop has a 45W CPU, so power savings were definitely not
considered in its design. Do you have any machines that can provide enough
boot entropy to satisfy crng init without requiring user-provided entropy?
Sultan
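One rough way to watch this from userspace is to poll the kernel's
entropy estimate while the machine sits idle. The sketch below assumes
only that /proc/sys/kernel/random/entropy_avail is present; the function
name read_entropy_avail() is made up for this example. Note that this
file reports the entropy estimate, not the crng_init state itself, so it
is a hint rather than a definitive readiness check.

```c
#include <stdio.h>

/*
 * Hypothetical helper: return the kernel's current entropy estimate in
 * bits, or -1 if the proc file cannot be opened or parsed.
 */
long read_entropy_avail(void)
{
    FILE *f = fopen("/proc/sys/kernel/random/entropy_avail", "r");
    long bits = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &bits) != 1)
        bits = -1;
    fclose(f);
    return bits;
}
```

Calling this once a second and logging the result would show whether the
estimate is climbing at all before any keyboard input is provided.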
On Sun 2018-04-29 10:05:41, Sultan Alsawaf wrote:
> On Sun, Apr 29, 2018 at 04:32:05PM +0200, Pavel Machek wrote:
> > Hi!
> >
> > > This is why ultimately, we do need to attack this problem from both
> > > ends, which means teaching userspace programs to only request
> > > cryptographic-grade randomness when it is really needed --- and most
> > > of the time, if the user has not logged in yet, you probably don't
> > > need cryptographic-grade randomness....
> >
> > IOW moving them from /dev/random to /dev/urandom?
>
> /dev/urandom isn't cryptographically secure, so that's not an
> option.
Umm. No. https://www.youtube.com/watch?v=xneBjc8z0DE
Pavel
On Sun, Apr 29, 2018 at 11:30:57AM -0700, Sultan Alsawaf wrote:
>
> Mind you, this laptop has a 45W CPU, so power savings were definitely not
> considered in its design. Do you have any machines that can provide enough
> boot entropy to satisfy crng init without requiring user-provided entropy?
My 2018 Dell XPS 13 laptop, running "egrep '(random|EXT4)' /var/log/kern.log":
Apr 24 17:05:01 cwcc kernel: [ 0.000000] random: get_random_bytes called from start_kernel+0x83/0x500 with crng_init=0
Apr 24 17:05:01 cwcc kernel: [ 1.363383] random: fast init done
Apr 24 17:05:01 cwcc kernel: [ 3.567432] random: lvm: uninitialized urandom read (4 bytes read)
Apr 24 17:05:01 cwcc kernel: [ 3.593132] random: lvm: uninitialized urandom read (4 bytes read)
Apr 24 17:05:01 cwcc kernel: [ 7.584838] random: cryptsetup: uninitialized urandom read (2 bytes read)
Apr 24 17:05:01 cwcc kernel: [ 7.600685] random: cryptsetup: uninitialized urandom read (2 bytes read)
Apr 24 17:05:01 cwcc kernel: [ 7.803194] random: cryptsetup: uninitialized urandom read (2 bytes read)
Apr 24 17:05:01 cwcc kernel: [ 7.831050] random: lvm: uninitialized urandom read (4 bytes read)
Apr 24 17:05:01 cwcc kernel: [ 7.851884] random: lvm: uninitialized urandom read (4 bytes read)
Apr 24 17:05:01 cwcc kernel: [ 7.875382] random: lvm: uninitialized urandom read (2 bytes read)
Apr 24 17:05:01 cwcc kernel: [ 8.162552] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
Apr 24 17:05:01 cwcc kernel: [ 8.646497] random: crng init done
- Ted
On Sun, Apr 29, 2018 at 08:41:01PM +0200, Pavel Machek wrote:
> Umm. No. https://www.youtube.com/watch?v=xneBjc8z0DE
Okay, but /dev/urandom isn't a solution to this problem because it isn't usable
until crng init is complete, so it suffers from the same init lag as
/dev/random.
Sultan
On Sun 2018-04-29 13:20:33, Sultan Alsawaf wrote:
> On Sun, Apr 29, 2018 at 08:41:01PM +0200, Pavel Machek wrote:
> > Umm. No. https://www.youtube.com/watch?v=xneBjc8z0DE
>
> Okay, but /dev/urandom isn't a solution to this problem because it isn't usable
> until crng init is complete, so it suffers from the same init lag as
> /dev/random.
So -- I'm pretty sure systemd and friends should be using
/dev/urandom. Maybe gpg wants to use /dev/random. _Maybe_.
[ 2.948192] random: systemd: uninitialized urandom read (16 bytes
read)
[ 2.953526] systemd[1]: systemd 215 running in system mode. (+PAM
+AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ
-SECCOMP -APPARMOR)
[ 2.980278] systemd[1]: Detected architecture 'x86'.
[ 3.115072] usb 5-2: New USB device found, idVendor=0483,
idProduct=2016, bcdDevice= 0.01
[ 3.119633] usb 5-2: New USB device strings: Mfr=1, Product=2,
SerialNumber=0
[ 3.124147] usb 5-2: Product: Biometric Coprocessor
[ 3.128621] usb 5-2: Manufacturer: STMicroelectronics
[ 3.163839] systemd[1]: Failed to insert module 'ipv6'
[ 3.181266] systemd[1]: Set hostname to <amd>.
[ 3.267243] random: systemd-sysv-ge: uninitialized urandom read (16
bytes read)
[ 3.669590] random: systemd-sysv-ge: uninitialized urandom read (16
bytes read)
[ 3.696242] random: systemd: uninitialized urandom read (16 bytes
read)
[ 3.700066] random: systemd: uninitialized urandom read (16 bytes
read)
[ 3.703716] random: systemd: uninitialized urandom read (16 bytes
read)
Anyway, urandom should need to be seeded once, and then provide random
data forever... which is not the impression I get from the dmesg output
above. Boot clearly proceeds... somehow. So now I'm confused.
Best regards,
Pavel
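For reference, the kind of read that produces the "uninitialized urandom
read (16 bytes read)" lines is sketched below. This is an assumption
about what such a caller looks like, not code taken from systemd; the
function name read_urandom() is made up. The point is that a read from
/dev/urandom returns data right away whether or not the CRNG has been
seeded, which is why boot proceeds while the kernel logs the warning.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: read len bytes from /dev/urandom into buf.
 * Returns 0 on success, -1 on failure. It does not wait for the CRNG
 * to be initialized, so early in boot the bytes may be weakly seeded.
 */
int read_urandom(void *buf, size_t len)
{
    unsigned char *p = buf;
    int fd = open("/dev/urandom", O_RDONLY);

    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = read(fd, p, len);

        if (n < 0 && errno == EINTR)
            continue;               /* interrupted, try again */
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    close(fd);
    return 0;
}
```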
On Sun, Apr 29, 2018 at 11:18:55PM +0200, Pavel Machek wrote:
> So -- I'm pretty sure systemd and friends should be using
> /dev/urandom. Maybe gpg wants to use /dev/random. _Maybe_.
>
> [ 2.948192] random: systemd: uninitialized urandom read (16 bytes
> read)
> [ 2.953526] systemd[1]: systemd 215 running in system mode. (+PAM
> +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ
> -SECCOMP -APPARMOR)
> [ 2.980278] systemd[1]: Detected architecture 'x86'.
> [ 3.115072] usb 5-2: New USB device found, idVendor=0483,
> idProduct=2016, bcdDevice= 0.01
> [ 3.119633] usb 5-2: New USB device strings: Mfr=1, Product=2,
> SerialNumber=0
> [ 3.124147] usb 5-2: Product: Biometric Coprocessor
> [ 3.128621] usb 5-2: Manufacturer: STMicroelectronics
> [ 3.163839] systemd[1]: Failed to insert module 'ipv6'
> [ 3.181266] systemd[1]: Set hostname to <amd>.
> [ 3.267243] random: systemd-sysv-ge: uninitialized urandom read (16
> bytes read)
> [ 3.669590] random: systemd-sysv-ge: uninitialized urandom read (16
> bytes read)
> [ 3.696242] random: systemd: uninitialized urandom read (16 bytes
> read)
> [ 3.700066] random: systemd: uninitialized urandom read (16 bytes
> read)
> [ 3.703716] random: systemd: uninitialized urandom read (16 bytes
> read)
>
> Anyway, urandom should need to be seeded once, and then provide random
> data forever... which is not the impression I get from the dmesg output
> above. Boot clearly proceeds... somehow. So now I'm confused.
Hmm... Well, the attached patch (which redirects /dev/random to /dev/urandom)
didn't fix my boot issue, so I'm at a loss as well.
Sultan
From 15f54e2756866956d8713fdec92b54c6c69eb1bb Mon Sep 17 00:00:00 2001
From: Sultan Alsawaf <[email protected]>
Date: Sun, 29 Apr 2018 12:53:44 -0700
Subject: [PATCH] char: mem: Link /dev/random to /dev/urandom
---
drivers/char/mem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index ffeb60d3434c..0cd22e6100ad 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -870,7 +870,7 @@ static const struct memdev {
#endif
[5] = { "zero", 0666, &zero_fops, 0 },
[7] = { "full", 0666, &full_fops, 0 },
- [8] = { "random", 0666, &random_fops, 0 },
+ [8] = { "random", 0666, &urandom_fops, 0 },
[9] = { "urandom", 0666, &urandom_fops, 0 },
#ifdef CONFIG_PRINTK
[11] = { "kmsg", 0644, &kmsg_fops, 0 },
--
2.14.1
On Sun, Apr 29, 2018 at 01:20:33PM -0700, Sultan Alsawaf wrote:
> On Sun, Apr 29, 2018 at 08:41:01PM +0200, Pavel Machek wrote:
> > Umm. No. https://www.youtube.com/watch?v=xneBjc8z0DE
>
> Okay, but /dev/urandom isn't a solution to this problem because it isn't usable
> until crng init is complete, so it suffers from the same init lag as
> /dev/random.
It's more accurate to say that using /dev/urandom is no worse than
before (from a few years ago). There are, alas, plenty of
distributions and user space application programmers that basically
got lazy using /dev/urandom, and assumed that there would be plenty of
entropy during early system startup.
When they switched over to getrandom(2), the most egregious examples
of this caused pain (and they got fixed), but due to a bug in
drivers/char/random.c, if getrandom(2) was called after the entropy
pool was "half initialized", it would not block, but proceed.
Is that exploitable? Well, Jann and I didn't find an _obvious_ way to
exploit the shortcoming, which is why this wasn't treated like an
emergency situation a la the embarrassing situation we had five years
ago[1].
[1] https://factorable.net/paper.html
However, it was enough to make us uncomfortable, which is why I
pushed the changes that I did. At least on the devices we had at
hand, using the distributions that we typically use, the impact seemed
minimal. Unfortunately, there is no way to know for sure without
rolling out the change and seeing who screams. In the ideal world,
software would not require cryptographic randomness immediately after
boot, before the user logs in. And ***really***, as in [1], software
should not be generating long-term public keys that are essential to
the security of the box a few seconds immediately after the device is
first unboxed and plugged in.
What would be useful is if people gave reports that listed exactly
what laptop and distributions they are using. Just "a high spec x86
laptop" isn't terribly useful, because *my* brand-new Dell XPS 13
running Debian testing is working just fine. The year, model, make,
and CPU type plus what distribution (and distro version number) you
are running is useful, so I can assess how widespread the unhappiness
is going to be, and what mitigation steps make sense.
What mitigation steps can be taken?
If you believe in security-through-complexity (the cache architecture
of x86 is *sooooo* complicated no one can understand it, so
Jitterentropy / Haveged *must* be secure), or security-through-secrecy
(the cache architecture of x86 is only available to internal architects
inside Intel, so Jitterentropy / Haveged *must* be secure, never mind
that the Intel CPU architects who were asked about it were "nervous"),
then wiring up CONFIG_JITTERENTROPY or using haveged might be one
approach.
If you believe that Intel hasn't backdoored RDRAND, then installing
rng-tools and running rngd with --enable-drng will enable RDRAND.
That seems to be popular with various defense contractors, perhaps on
the assumption that if it _was_ backdoored (no one knows for sure), it
was probably with the connivance or request of the US government, who
doesn't need to worry about spying on itself.
Or you can use some kind of open hardware design RNG, such as
ChaosKey[2] from Altus Metrum. But that requires using specially
ordered hardware plugged into a USB slot, and it's probably not a mass
solution.
[2] https://altusmetrum.org/ChaosKey/
Personally, I prefer fixing the software to simply not require
cryptographic grade entropy before the user has logged in. Because
it's better than the alternatives.
- Ted
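Before going down the RDRAND route it is worth confirming that the CPU
advertises the instruction at all. A minimal sketch, assuming the usual
x86 "flags" line from /proc/cpuinfo is passed in as a string; the helper
name line_has_flag() is made up for this example, and it says nothing
about whether RDRAND should be trusted.

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 if `flag` appears as a whole
 * whitespace-separated token in `line`, else 0. Intended for the
 * "flags : ..." line of /proc/cpuinfo, e.g. flag = "rdrand".
 */
int line_has_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    if (n == 0)
        return 0;

    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' ||
                   p[n] == '\t' || p[n] == '\n';

        if (starts && ends)
            return 1;
        p++;    /* partial match such as "rdrandx", keep scanning */
    }
    return 0;
}
```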
On Sun, Apr 29, 2018 at 06:05:19PM -0400, Theodore Y. Ts'o wrote:
> It's more accurate to say that using /dev/urandom is no worse than
> before (from a few years ago). There are, alas, plenty of
> distributions and user space application programmers that basically
> got lazy using /dev/urandom, and assumed that there would be plenty of
> entropy during early system startup.
>
> When they switched over to getrandom(2), the most egregious examples
> of this caused pain (and they got fixed), but due to a bug in
> drivers/char/random.c, if getrandom(2) was called after the entropy
> pool was "half initialized", it would not block, but proceed.
>
> Is that exploitable? Well, Jann and I didn't find an _obvious_ way to
> exploit the shortcoming, which is why this wasn't treated like an
> emergency situation a la the embarrassing situation we had five years
> ago[1].
>
> [1] https://factorable.net/paper.html
>
> However, it was enough to make us uncomfortable, which is why I
> pushed the changes that I did. At least on the devices we had at
> hand, using the distributions that we typically use, the impact seemed
> minimal. Unfortunately, there is no way to know for sure without
> rolling out the change and seeing who screams. In the ideal world,
> software would not require cryptographic randomness immediately after
> boot, before the user logs in. And ***really***, as in [1], software
> should not be generating long-term public keys that are essential to
> the security of the box a few seconds immediately after the device is
> first unboxed and plugged in.
>
> What would be useful is if people gave reports that listed exactly
> what laptop and distributions they are using. Just "a high spec x86
> laptop" isn't terribly useful, because *my* brand-new Dell XPS 13
> running Debian testing is working just fine. The year, model, make,
> and CPU type plus what distribution (and distro version number) you
> are running is useful, so I can assess how widespread the unhappiness
> is going to be, and what mitigation steps make sense.
>
>
> What mitigation steps can be taken?
>
> If you believe in security-through-complexity (the cache architecture
> of x86 is *sooooo* complicated no one can understand it, so
> Jitterentropy / Haveged *must* be secure), or security-through-secrecy
> (the cache architecture of x86 is only available to internal architects
> inside Intel, so Jitterentropy / Haveged *must* be secure, never mind
> that the Intel CPU architects who were asked about it were "nervous"),
> then wiring up CONFIG_JITTERENTROPY or using haveged might be one
> approach.
>
> If you believe that Intel hasn't backdoored RDRAND, then installing
> rng-tools and running rngd with --enable-drng will enable RDRAND.
> That seems to be popular with various defense contractors, perhaps on
> the assumption that if it _was_ backdoored (no one knows for sure), it
> was probably with the connivance or request of the US government, who
> doesn't need to worry about spying on itself.
>
> Or you can use some kind of open hardware design RNG, such as
> ChaosKey[2] from Altus Metrum. But that requires using specially
> ordered hardware plugged into a USB slot, and it's probably not a mass
> solution.
>
> [2] https://altusmetrum.org/ChaosKey/
>
>
> Personally, I prefer fixing the software to simply not require
> cryptographic grade entropy before the user has logged in. Because
> it's better than the alternatives.
>
> - Ted
>
The attached patch fixes my crng init woes. With it, crng init completes 0.86
seconds into boot, but I can't help but feel like a solution this obvious would
just expose my Richard Stallman photo collection to prying eyes at the NSA.
Thoughts on the patch?
Sultan
From 597b0f2b3c986f853bf1d30a7fb9d76869e47fe8 Mon Sep 17 00:00:00 2001
From: Sultan Alsawaf <[email protected]>
Date: Sun, 29 Apr 2018 15:22:59 -0700
Subject: [PATCH] random: remove ratelimiting from add_interrupt_randomness()
---
drivers/char/random.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 38729baed6ee..5b38277b104a 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -574,7 +574,6 @@ static void mix_pool_bytes(struct entropy_store *r, const void *in,
struct fast_pool {
__u32 pool[4];
- unsigned long last;
unsigned short reg_idx;
unsigned char count;
};
@@ -1195,20 +1194,14 @@ void add_interrupt_randomness(int irq, int irq_flags)
crng_fast_load((char *) fast_pool->pool,
sizeof(fast_pool->pool))) {
fast_pool->count = 0;
- fast_pool->last = now;
}
return;
}
- if ((fast_pool->count < 64) &&
- !time_after(now, fast_pool->last + HZ))
- return;
-
r = &input_pool;
if (!spin_trylock(&r->lock))
return;
- fast_pool->last = now;
__mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
/*
--
2.14.1
Hi!
> What would be useful is if people gave reports that listed exactly
> what laptop and distributions they are using. Just "a high spec x86
> laptop" isn't terribly useful, because *my* brand-new Dell XPS 13
> running Debian testing is working just fine. The year, model, make,
> and CPU type plus what distribution (and distro version number) you
> are running is useful, so I can assess how wide spread the unhappiness
> is going to be, and what mitigation steps make sense.
Thinkpad X60,
model name : Genuine Intel(R) CPU T2400 @ 1.83GHz
pavel@amd:~$ cat /etc/debian_version
8.10
I already posted some dmesg snippets, but the system boots. On _this_
boot, it was ok, and I do not see anything:
pavel@amd:/data/l/linux-next-32$ dmesg | grep urandom
pavel@amd:/data/l/linux-next-32$
Pavel
> - if ((fast_pool->count < 64) &&
> - !time_after(now, fast_pool->last + HZ))
> - return;
> -
I suspect you still want the rate-limiting in place. But if you _do_
want to cheat like this, you could instead just modify the condition
to only relax the rate limiting when !crng_ready().
On Mon, Apr 30, 2018 at 12:43:48AM +0200, Jason A. Donenfeld wrote:
> > - if ((fast_pool->count < 64) &&
> > - !time_after(now, fast_pool->last + HZ))
> > - return;
> > -
>
> I suspect you still want the rate-limiting in place. But if you _do_
> want to cheat like this, you could instead just modify the condition
> to only relax the rate limiting when !crng_ready().
Good idea. Attached a new patch that's less intrusive. It still fixes my issue,
of course.
Sultan
From 6870b0383b88438d842599aa8608a260e6fb0ed2 Mon Sep 17 00:00:00 2001
From: Sultan Alsawaf <[email protected]>
Date: Sun, 29 Apr 2018 15:44:27 -0700
Subject: [PATCH] random: don't ratelimit add_interrupt_randomness() until crng
is ready
---
drivers/char/random.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 38729baed6ee..8c00c008e797 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1201,7 +1201,7 @@ void add_interrupt_randomness(int irq, int irq_flags)
}
if ((fast_pool->count < 64) &&
- !time_after(now, fast_pool->last + HZ))
+ !time_after(now, fast_pool->last + HZ) && crng_ready())
return;
r = &input_pool;
--
2.14.1
On Tue, Apr 24, 2018 at 09:56:21AM -0400, Theodore Y. Ts'o wrote:
> Can you tell me a bit about your system? What distribution, what
> hardware is present in your system (what architecture, what
> peripherals are attached, etc.)?
>
> There's a reason why we made this --- we were declaring the random
> number pool to be fully initialized before it really was, and that was
> a potential security concern. It's not as bad as the weakness
> discovered by Nadia Heninger in 2012. (See https://factorable.net for
> more details.) However, this is not one of those things where we like
> to fool around.
>
> So I want to understand if this is an issue with a particular hardware
> configuration, or whether it's just a badly designed Linux init system
> or embedded setup, or something else. After all, you wouldn't want
> the NSA spying on all of your network traffic, would you? :-)
Why do we continue to print this stuff out when crng_init=1 though ?
(This from debian stable, on a pretty basic atom box, but similar
dmesg's on everything else I've put 4.17-rc on so far)
[ 0.000000] random: get_random_bytes called from start_kernel+0x96/0x519 with crng_init=0
[ 0.000000] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0
[ 0.000000] random: get_random_u64 called from cache_random_seq_create+0x76/0x120 with crng_init=0
[ 0.151401] calling initialize_ptr_random+0x0/0x36 @ 1
[ 0.151527] initcall initialize_ptr_random+0x0/0x36 returned 0 after 0 usecs
[ 0.294661] calling prandom_init+0x0/0xbd @ 1
[ 0.294763] initcall prandom_init+0x0/0xbd returned 0 after 0 usecs
[ 1.430529] _warn_unseeded_randomness: 165 callbacks suppressed
[ 1.430540] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0
[ 1.430860] random: get_random_u64 called from cache_random_seq_create+0x76/0x120 with crng_init=0
[ 1.452240] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=0
[ 2.954901] _warn_unseeded_randomness: 54 callbacks suppressed
[ 2.954910] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0
[ 2.955185] random: get_random_u64 called from cache_random_seq_create+0x76/0x120 with crng_init=0
[ 2.957701] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0
[ 6.017364] _warn_unseeded_randomness: 88 callbacks suppressed
[ 6.017373] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0
[ 6.042652] random: get_random_u64 called from cache_random_seq_create+0x76/0x120 with crng_init=0
[ 6.060333] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0
[ 6.951978] calling prandom_reseed+0x0/0x2a @ 1
[ 6.960627] initcall prandom_reseed+0x0/0x2a returned 0 after 105 usecs
[ 7.371745] _warn_unseeded_randomness: 37 callbacks suppressed
[ 7.371759] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=0
[ 7.395926] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=0
[ 7.411549] random: get_random_u32 called from arch_align_stack+0x37/0x50 with crng_init=0
[ 7.553379] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[ 7.563210] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[ 7.571498] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[ 8.449679] _warn_unseeded_randomness: 154 callbacks suppressed
[ 8.449691] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=0
[ 8.483097] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=0
[ 8.497999] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=0
[ 9.353904] random: fast init done
[ 9.770384] _warn_unseeded_randomness: 187 callbacks suppressed
[ 9.770398] random: get_random_u32 called from bucket_table_alloc+0x84/0x1b0 with crng_init=1
[ 9.791514] random: get_random_u32 called from new_slab+0x174/0x680 with crng_init=1
[ 9.834909] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=1
[ 10.802200] _warn_unseeded_randomness: 168 callbacks suppressed
[ 10.802214] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[ 10.802276] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1
[ 10.802289] random: get_random_u32 called from arch_align_stack+0x37/0x50 with crng_init=1
[ 11.821109] _warn_unseeded_randomness: 160 callbacks suppressed
[ 11.821122] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=1
[ 11.863770] random: get_random_u32 called from bucket_table_alloc+0x84/0x1b0 with crng_init=1
[ 11.869384] random: get_random_u32 called from new_slab+0x174/0x680 with crng_init=1
[ 12.843237] _warn_unseeded_randomness: 260 callbacks suppressed
[ 12.843251] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[ 12.875369] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=1
[ 12.898152] random: get_random_u32 called from bucket_table_alloc+0x84/0x1b0 with crng_init=1
[ 13.858256] _warn_unseeded_randomness: 245 callbacks suppressed
[ 13.858271] random: get_random_u32 called from new_slab+0x174/0x680 with crng_init=1
[ 13.866366] random: get_random_u32 called from arch_setup_additional_pages+0x79/0xb0 with crng_init=1
[ 13.895379] random: get_random_u32 called from new_slab+0x174/0x680 with crng_init=1
[ 14.896395] _warn_unseeded_randomness: 301 callbacks suppressed
[ 14.896409] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=1
[ 14.921096] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[ 14.935596] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1
[ 15.924405] _warn_unseeded_randomness: 152 callbacks suppressed
[ 15.924419] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[ 15.942457] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1
[ 15.953995] random: get_random_u32 called from arch_align_stack+0x37/0x50 with crng_init=1
[ 19.295109] _warn_unseeded_randomness: 25 callbacks suppressed
[ 19.295142] random: get_random_u32 called from new_slab+0x174/0x680 with crng_init=1
[ 20.319905] random: get_random_bytes called from flow_hash_from_keys+0x14c/0x2b0 with crng_init=1
[ 21.323229] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=1
[ 21.351464] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[ 21.366761] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1
[ 22.367243] _warn_unseeded_randomness: 420 callbacks suppressed
[ 22.367282] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[ 22.367306] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1
[ 22.367329] random: get_random_u32 called from arch_align_stack+0x37/0x50 with crng_init=1
[ 23.378128] _warn_unseeded_randomness: 283 callbacks suppressed
[ 23.378141] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[ 23.378164] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1
[ 23.378176] random: get_random_u32 called from arch_align_stack+0x37/0x50 with crng_init=1
[ 24.381404] _warn_unseeded_randomness: 246 callbacks suppressed
[ 24.381417] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[ 24.396831] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1
[ 24.418850] random: get_random_u32 called from new_slab+0x174/0x680 with crng_init=1
[ 25.391285] _warn_unseeded_randomness: 320 callbacks suppressed
[ 25.391298] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[ 25.417982] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1
[ 25.434112] random: get_random_u32 called from arch_align_stack+0x37/0x50 with crng_init=1
[ 26.463997] _warn_unseeded_randomness: 182 callbacks suppressed
[ 26.464009] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=1
[ 26.700479] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[ 26.728446] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1
[ 28.393318] _warn_unseeded_randomness: 86 callbacks suppressed
[ 28.393331] random: get_random_bytes called from inet6_ehashfn+0x14c/0x1c0 with crng_init=1
[ 28.414841] random: get_random_bytes called from inet6_ehashfn+0x191/0x1c0 with crng_init=1
[ 28.430781] random: get_random_bytes called from inet_ehashfn+0xe3/0x110 with crng_init=1
[ 33.345320] _warn_unseeded_randomness: 82 callbacks suppressed
[ 33.345334] random: get_random_bytes called from secure_tcp_ts_off+0x83/0xb0 with crng_init=1
[ 33.346074] random: get_random_bytes called from secure_tcp_seq+0x9c/0xc0 with crng_init=1
[ 33.349477] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=1
[ 34.352703] _warn_unseeded_randomness: 78 callbacks suppressed
[ 34.352716] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[ 34.353348] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1
[ 34.353716] random: get_random_u32 called from arch_align_stack+0x37/0x50 with crng_init=1
[ 36.444658] _warn_unseeded_randomness: 32 callbacks suppressed
[ 36.444670] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=1
[ 36.453636] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[ 36.454025] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1
[ 37.939280] _warn_unseeded_randomness: 53 callbacks suppressed
[ 37.939292] random: get_random_u32 called from new_slab+0x174/0x680 with crng_init=1
[ 42.179988] random: get_random_u32 called from neigh_hash_alloc+0x7b/0xc0 with crng_init=1
[ 44.202043] random: get_random_u32 called from new_slab+0x174/0x680 with crng_init=1
[ 46.035713] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=1
[ 46.067589] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[ 46.085148] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1
[ 47.198815] _warn_unseeded_randomness: 7 callbacks suppressed
[ 47.207534] random: get_random_bytes called from __prandom_timer+0x24/0x90 with crng_init=1
[ 53.127055] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[ 53.145929] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1
[ 53.165246] random: get_random_u32 called from arch_align_stack+0x37/0x50 with crng_init=1
[ 54.177186] _warn_unseeded_randomness: 75 callbacks suppressed
[ 54.177198] random: get_random_u32 called from new_slab+0x174/0x680 with crng_init=1
[ 54.245759] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=1
[ 54.276658] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[ 55.339125] _warn_unseeded_randomness: 113 callbacks suppressed
[ 55.339137] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[ 55.365379] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1
[ 55.383400] random: get_random_u32 called from arch_align_stack+0x37/0x50 with crng_init=1
[ 61.772814] _warn_unseeded_randomness: 6 callbacks suppressed
[ 61.772827] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[ 61.798504] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1
[ 61.816345] random: get_random_u32 called from arch_align_stack+0x37/0x50 with crng_init=1
[ 77.460681] _warn_unseeded_randomness: 6 callbacks suppressed
[ 77.460694] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[ 77.487010] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1
[ 77.504121] random: get_random_u32 called from arch_align_stack+0x37/0x50 with crng_init=1
[ 80.717699] _warn_unseeded_randomness: 5 callbacks suppressed
[ 80.717714] random: get_random_u32 called from new_slab+0x174/0x680 with crng_init=1
[ 99.514633] random: get_random_u32 called from neigh_hash_alloc+0x7b/0xc0 with crng_init=1
[ 125.914405] random: get_random_bytes called from __prandom_timer+0x24/0x90 with crng_init=1
[ 137.252356] random: get_random_u32 called from new_slab+0x174/0x680 with crng_init=1
[ 165.806247] random: crng init done
[ 165.815049] random: 7 urandom warning(s) missed due to ratelimiting
On Sun, Apr 29, 2018 at 07:02:02PM -0400, Dave Jones wrote:
> On Tue, Apr 24, 2018 at 09:56:21AM -0400, Theodore Y. Ts'o wrote:
>
> > Can you tell me a bit about your system? What distribution, what
> > hardware is present in your system (what architecture, what
> > peripherals are attached, etc.)?
> >
> > There's a reason why we made this --- we were declaring the random
> > number pool to be fully initialized before it really was, and that was
> > a potential security concern. It's not as bad as the weakness
> > discovered by Nadia Heninger in 2012. (See https://factorable.net for
> > more details.) However, this is not one of those things where we like
> > to fool around.
> >
> > So I want to understand if this is an issue with a particular hardware
> > configuration, or whether it's just a badly designed Linux init system
> > or embedded setup, or something else. After all, you wouldn't want
> > the NSA spying on all of your network traffic, would you? :-)
>
> Why do we continue to print this stuff out when crng_init=1 though ?
answering my own question, I think.. This is a tristate, and we need it
to be >1 to be quiet, which doesn't happen until..
> [ 165.806247] random: crng init done
this point.
Dave
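The tristate described above can be modeled in a few lines. This is a userspace sketch, not the kernel source: the enum and function names are made up for illustration, and the only behavior taken from the thread is that the warnings stop once crng_init goes above 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three crng_init values seen in the log. */
enum crng_init_model {
    CRNG_EMPTY = 0,     /* nothing credited yet */
    CRNG_FAST_INIT = 1, /* after "random: fast init done" */
    CRNG_READY = 2,     /* after "random: crng init done" */
};

/* Only a value above 1 counts as ready in this model. */
bool crng_ready_model(int crng_init)
{
    assert(crng_init >= CRNG_EMPTY && crng_init <= CRNG_READY);
    return crng_init > 1;
}

/* The unseeded-randomness warning fires whenever the crng is not ready. */
bool warns_unseeded_model(int crng_init)
{
    return !crng_ready_model(crng_init);
}
```

In this model a caller at crng_init=1 still triggers the warning, which matches the dmesg output between "fast init done" and "crng init done".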
On Sun, Apr 29, 2018 at 03:49:28PM -0700, Sultan Alsawaf wrote:
> On Mon, Apr 30, 2018 at 12:43:48AM +0200, Jason A. Donenfeld wrote:
> > > - if ((fast_pool->count < 64) &&
> > > - !time_after(now, fast_pool->last + HZ))
> > > - return;
> > > -
> >
> > I suspect you still want the rate-limiting in place. But if you _do_
> > want to cheat like this, you could instead just modify the condition
> > > to only relax the rate limiting when !crng_ready().
>
> Good idea. Attached a new patch that's less intrusive. It still fixes my issue,
> of course.
What your patch does is assume that there is a full bit of uncertainty
that can be obtained from the information gathered from each
interrupt. I *might* be willing to assume that to be valid on x86
systems that have a high resolution cycle counter. But on ARM
platforms, especially during system bootup when the user isn't typing
anything and SSD's and flash storage tend to have very predictable
timing patterns? Not a bet I'd be willing to take. Even with a cycle
counter, there's a reason why we assumed that we need to mix in timing
results from 64 interrupts or one second's worth before we would give
a single bit's worth of entropy credit.
- Ted
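To make the disagreement concrete, here is a small userspace model of the gate in add_interrupt_randomness() before and after the proposed patch. The struct, the HZ_MODEL value and the function names are assumptions made for illustration; only the shape of the condition is taken from the diff quoted earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HZ_MODEL 250UL /* hypothetical tick rate: ticks per second */

struct fast_pool_model {
    unsigned long last;  /* tick of the last mix into the input pool */
    unsigned char count; /* interrupts accumulated since that mix */
};

/* Wraparound-safe "a is after b", in the spirit of the kernel's time_after(). */
bool time_after_model(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Original gate: skip the mix unless 64 interrupts have accumulated
 * or more than one second has passed since the last mix. */
bool skip_mix_original(const struct fast_pool_model *fp, unsigned long now)
{
    assert(fp != NULL);
    return fp->count < 64 && !time_after_model(now, fp->last + HZ_MODEL);
}

/* Patched gate: the same rate limit, but applied only once the crng is ready. */
bool skip_mix_patched(const struct fast_pool_model *fp, unsigned long now,
                      bool crng_ready)
{
    return skip_mix_original(fp, now) && crng_ready;
}
```

With the patched gate, every interrupt before crng init reaches the mixing path, so whatever entropy credit is attached to a mix gets handed out per interrupt instead of per 64 interrupts or per second. That is the assumption the reply above objects to.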
On Sun, Apr 29, 2018 at 07:07:29PM -0400, Dave Jones wrote:
> > Why do we continue to print this stuff out when crng_init=1 though ?
>
> answering my own question, I think.. This is a tristate, and we need it
> to be >1 to be quiet, which doesn't happen until..
>
> > [ 165.806247] random: crng init done
>
> this point.
Right. What happens is that we divert the first 64 bits of entropy
credits directly into the crng state, without initializing the
input_pool. So when we hit crng_init=1, the crng has only 64 bits of
entropy (conservatively speaking); furthermore, since we aren't doing
catastrophic reseeding, if something is continuously reading from
/dev/urandom or get_random_bytes() during that time, then the attacker
could be able to determine which one of the 32 states the entropy pool
was in when the entropy count was 5, and then 5 bits later, poll the
output of the pool again, and guess which of the 32 states the pool
was in, etc., and effectively keep up with the entropy as it trickles
in.
This is the reasoning behind catastrophic reseeding; we wait until we
have 128 bits of entropy in the input pool, and then we reseed the
pool all at once.
Why do we have the crng_init=1 state? Because it provides some basic
protection for super-early users of the entropy pool. It's
essentially a bandaid, and we could improve the time to get fully
initialized by about 33% if we left the pool totally uninitialized and
only focused on filling the input pool. But given that on many
distributions, ssh still insists on initializing long-term public keys
at first boot from /dev/urandom, instead of *waiting* until the first
time someone attempts to ssh into the box, or waiting until getrandom(2)
doesn't block --- without hanging the boot --- we have the crng_init=1
hack essentially as a palliative.
I view this as working around broken user space. But userspace has
been broken for a long time, and users tend to blame the kernel, not
userspace....
- Ted
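The arithmetic behind the "keep up with the entropy as it trickles in" argument can be worked through with a short sketch. It assumes a simplified attacker who can check each guess against observed output and who sees output after every few bits of new entropy; the function names are illustrative and nothing here is kernel code.

```c
#include <assert.h>

/* 2^n as a double; exact for the small exponents used here. */
double pow2_model(int n)
{
    double r = 1.0;
    while (n-- > 0)
        r *= 2.0;
    return r;
}

/* Guesses needed when all the entropy arrives at once: 2^total_bits. */
double guesses_all_at_once(int total_bits)
{
    return pow2_model(total_bits);
}

/* Guesses needed when the attacker can resynchronize after every
 * bits_per_step bits: one small search per step, summed over the steps. */
double guesses_incremental(int total_bits, int bits_per_step)
{
    assert(bits_per_step > 0 && bits_per_step <= total_bits);
    int full_steps = total_bits / bits_per_step;
    int leftover = total_bits % bits_per_step;
    double work = full_steps * pow2_model(bits_per_step);

    if (leftover)
        work += pow2_model(leftover);
    return work;
}
```

For 128 bits delivered 5 bits at a time, guesses_incremental(128, 5) is 25 * 32 + 8 = 808 guesses, against 2^128 when the same 128 bits are mixed in all at once. That gap is the motivation for catastrophic reseeding.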
On 04/29/2018 03:05 PM, Theodore Y. Ts'o wrote:
> What would be useful is if people gave reports that listed exactly
> what laptop and distributions they are using. Just "a high spec x86
> laptop" isn't terribly useful, because *my* brand-new Dell XPS 13
> running Debian testing is working just fine. The year, model, make,
> and CPU type plus what distribution (and distro version number) you
> are running is useful, so I can assess how wide spread the unhappiness
> is going to be, and what mitigation steps make sense.
I'm pretty sure Fedora is hitting this in our VMs. I just spent some
time debugging an issue of a boot delay with someone from the
infrastructure team where it would take upwards of 2 minutes to boot.
If someone holds down a key, it boots in 4 seconds. There's a qemu
reproducer at https://bugzilla.redhat.com/show_bug.cgi?id=1572916#c3
I suggested a cat on the keyboard as a workaround.
Independently, we also got a report of a boot hang in GCE with 4.16.4
whereas 4.16.3 works, which corresponds to the previous report of a
stable regression. This was just via IRC so I didn't have time to
dig into this.
Thanks,
Laura
On Sun, Apr 29, 2018 at 08:11:07PM -0400, Theodore Y. Ts'o wrote:
>
> What your patch does is assume that there is a full bit of uncertainty
> that can be obtained from the information gathered from each
> interrupt. I *might* be willing to assume that to be valid on x86
> systems that have a high resolution cycle counter. But on ARM
> platforms, especially during system bootup when the user isn't typing
> anything and SSD's and flash storage tend to have very predictable
> timing patterns? Not a bet I'd be willing to take. Even with a cycle
> counter, there's a reason why we assumed that we need to mix in timing
> results from 64 interrupts or one second's worth before we would give
> a single bit's worth of entropy credit.
>
> - Ted
What about abusing high-resolution timers to get entropy? Since hrtimers can't
make guarantees down to the nanosecond, there's always a skew between the
requested expiry time and the actual expiry time.
Please see the attached patch and let me know just how horrible it is.
Sultan
From b0d21c38558c661531d4cb46816fbb36b874a169 Mon Sep 17 00:00:00 2001
From: Sultan Alsawaf <[email protected]>
Date: Sun, 29 Apr 2018 21:28:08 -0700
Subject: [PATCH] random: use high-res timers to generate entropy until crng
init is done
---
drivers/char/random.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index d9e38523b383..af2d60bbcec3 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -286,6 +286,7 @@
#define OUTPUT_POOL_WORDS (1 << (OUTPUT_POOL_SHIFT-5))
#define SEC_XFER_SIZE 512
#define EXTRACT_SIZE 10
+#define ENTROPY_GEN_INTVL_NS (1 * NSEC_PER_MSEC)
#define LONGS(x) (((x) + sizeof(unsigned long) - 1)/sizeof(unsigned long))
@@ -408,6 +409,8 @@ static struct fasync_struct *fasync;
static DEFINE_SPINLOCK(random_ready_list_lock);
static LIST_HEAD(random_ready_list);
+static struct hrtimer entropy_gen_hrtimer;
+
struct crng_state {
__u32 state[16];
unsigned long init_time;
@@ -2287,3 +2290,47 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
credit_entropy_bits(poolp, entropy);
}
EXPORT_SYMBOL_GPL(add_hwgenerator_randomness);
+
+/*
+ * Generate entropy on init using high-res timers. Although high-res timers
+ * provide nanosecond precision, they don't actually honor requests to the
+ * nanosecond. The skew between the expected time difference in nanoseconds and
+ * the actual time difference can be used as a way to generate entropy on boot
+ * for machines that lack sufficient boot-time entropy.
+ */
+static enum hrtimer_restart entropy_timer_cb(struct hrtimer *timer)
+{
+ static u64 prev_ns;
+ u64 curr_ns, delta;
+
+ if (crng_ready())
+ return HRTIMER_NORESTART;
+
+ curr_ns = ktime_get_mono_fast_ns();
+ delta = curr_ns - prev_ns;
+
+ add_interrupt_randomness(delta);
+
+ /* Use the hrtimer skew to make the next interval more unpredictable */
+ if (likely(prev_ns))
+ hrtimer_add_expires_ns(timer, delta);
+ else
+ hrtimer_add_expires_ns(timer, ENTROPY_GEN_INTVL_NS);
+
+ prev_ns = curr_ns;
+ return HRTIMER_RESTART;
+}
+
+static int entropy_gen_hrtimer_init(void)
+{
+ if (!IS_ENABLED(CONFIG_HIGH_RES_TIMERS))
+ return 0;
+
+ hrtimer_init(&entropy_gen_hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+
+ entropy_gen_hrtimer.function = entropy_timer_cb;
+ hrtimer_start(&entropy_gen_hrtimer, ns_to_ktime(ENTROPY_GEN_INTVL_NS),
+ HRTIMER_MODE_REL);
+ return 0;
+}
+core_initcall(entropy_gen_hrtimer_init);
--
2.14.1
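The skew between requested and actual expiry that the patch relies on can also be observed from userspace. The sketch below is only an analogue: it uses POSIX nanosleep() and clock_gettime() rather than the kernel hrtimer API, and the helper names are made up. It shows that a skew exists; it says nothing about how unpredictable that skew is.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC_MODEL 1000000000LL

/* Read CLOCK_MONOTONIC as a single nanosecond count. */
int64_t monotonic_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * NSEC_PER_SEC_MODEL + ts.tv_nsec;
}

/* Sleep for requested_ns and return how much later than requested we woke. */
int64_t sleep_skew_ns(int64_t requested_ns)
{
    assert(requested_ns >= 0);

    struct timespec req = {
        .tv_sec = requested_ns / NSEC_PER_SEC_MODEL,
        .tv_nsec = requested_ns % NSEC_PER_SEC_MODEL,
    };
    struct timespec rem;
    int64_t start = monotonic_ns();

    /* Restart the sleep with the remaining time if a signal interrupts it. */
    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;

    return (monotonic_ns() - start) - requested_ns;
}
```

Calling sleep_skew_ns(1000000) a few times gives the overshoot for a 1 ms request, the same interval the patch uses for ENTROPY_GEN_INTVL_NS. Whether those overshoots contain real uncertainty or just reflect a deterministic timer path is a separate question.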
On Sun, Apr 29, 2018 at 09:34:45PM -0700, Sultan Alsawaf wrote:
>
> What about abusing high-resolution timers to get entropy? Since hrtimers can't
> make guarantees down to the nanosecond, there's always a skew between the
> requested expiry time and the actual expiry time.
>
> Please see the attached patch and let me know just how horrible it is.
So think about exactly where the possible causes of the skew might be
coming from. Look very closely at the software implementation. The
important thing here is to not get hung up on the software
abstraction, but to look at the *implementation*. (And if it's an
implementation in architecture specific code, we need to look at all
architectures.)
This applies on the hardware level as well, but that gets harder
because there are many possible hardware implementations in use out there.
Remember that on many systems there may be only a single clock
crystal, and all other hardware timers may be derived from that clock
using frequency dividers. (At least for everything on the mainboard.)
- Ted
On 04/29/2018 06:05 PM, Theodore Y. Ts'o wrote:
> On Sun, Apr 29, 2018 at 01:20:33PM -0700, Sultan Alsawaf wrote:
>> On Sun, Apr 29, 2018 at 08:41:01PM +0200, Pavel Machek wrote:
>>> Umm. No. https://www.youtube.com/watch?v=xneBjc8z0DE
>>
>> Okay, but /dev/urandom isn't a solution to this problem because it isn't usable
>> until crng init is complete, so it suffers from the same init lag as
>> /dev/random.
>
> It's more accurate to say that using /dev/urandom is no worse than
> before (from a few years ago). There are, alas, plenty of
> distributions and user space application programmers that basically
> got lazy using /dev/urandom, and assumed that there would be plenty of
> entropy during early system startup.
>
> When they switched over to getrandom(2), the most egregious examples
> of this caused pain (and they got fixed), but due to a bug in
> drivers/char/random.c, if getrandom(2) was called after the entropy
> pool was "half initialized", it would not block, but proceed.
>
> Is that exploitable? Well, Jann and I didn't find an _obvious_ way to
> exploit the shortcoming, which is why this wasn't treated like an
> emergency situation a la the embarrassing situation we had five years
> ago[1].
>
> [1] https://factorable.net/paper.html
>
> However, it was enough to make us be uncomfortable, which is why I
> pushed the changes that I did. At least on the devices we had at
> hand, using the distributions that we typically use, the impact seemed
> minimal. Unfortunately, there is no way to know for sure without
> rolling out the change and seeing who screams. In the ideal world,
> software would not require cryptographic randomness immediately after
> boot, before the user logs in. And ***really***, as in [1], software
> should not be generating long-term public keys that are essential to
> the security of the box a few seconds immediately after the device is
> first unboxed and plugged in.
>
> What would be useful is if people gave reports that listed exactly
> what laptop and distributions they are using. Just "a high spec x86
> laptop" isn't terribly useful, because *my* brand-new Dell XPS 13
> running Debian testing is working just fine. The year, model, make,
> and CPU type plus what distribution (and distro version number) you
> are running is useful, so I can assess how wide spread the unhappiness
> is going to be, and what mitigation steps make sense.
Fedora has started seeing some bug reports on this for Fedora 27[0] and
I've asked reporters to include their hardware details.
[0] https://bugzilla.redhat.com/show_bug.cgi?id=1572944
Regards,
Jeremy
On Mon, Apr 30, 2018 at 4:12 PM, Jeremy Cline <[email protected]> wrote:
> On 04/29/2018 06:05 PM, Theodore Y. Ts'o wrote:
>> On Sun, Apr 29, 2018 at 01:20:33PM -0700, Sultan Alsawaf wrote:
>>> On Sun, Apr 29, 2018 at 08:41:01PM +0200, Pavel Machek wrote:
>>>> Umm. No. https://www.youtube.com/watch?v=xneBjc8z0DE
>>>
>>> Okay, but /dev/urandom isn't a solution to this problem because it isn't usable
>>> until crng init is complete, so it suffers from the same init lag as
>>> /dev/random.
>>
>> It's more accurate to say that using /dev/urandom is no worse than
>> before (from a few years ago). There are, alas, plenty of
>> distributions and user space application programmers that basically
>> got lazy using /dev/urandom, and assumed that there would be plenty of
>> entropy during early system startup.
>>
>> When they switched over to getrandom(2), the most egregious examples
>> of this caused pain (and they got fixed), but due to a bug in
>> drivers/char/random.c, if getrandom(2) was called after the entropy
>> pool was "half initialized", it would not block, but proceed.
>>
>> Is that exploitable? Well, Jann and I didn't find an _obvious_ way to
>> exploit the shortcoming, which is why this wasn't treated like an
>> emergency situation a la the embarrassing situation we had five years
>> ago[1].
>>
>> [1] https://factorable.net/paper.html
>>
>> However, it was enough to make us be uncomfortable, which is why I
>> pushed the changes that I did. At least on the devices we had at
>> hand, using the distributions that we typically use, the impact seemed
>> minimal. Unfortunately, there is no way to know for sure without
>> rolling out the change and seeing who screams. In the ideal world,
>> software would not require cryptographic randomness immediately after
>> boot, before the user logs in. And ***really***, as in [1], software
>> should not be generating long-term public keys that are essential to
>> the security of the box a few seconds immediately after the device is
>> first unboxed and plugged in.
>>
>> What would be useful is if people gave reports that listed exactly
>> what laptop and distributions they are using. Just "a high spec x86
>> laptop" isn't terribly useful, because *my* brand-new Dell XPS 13
>> running Debian testing is working just fine. The year, model, make,
>> and CPU type plus what distribution (and distro version number) you
>> are running is useful, so I can assess how wide spread the unhappiness
>> is going to be, and what mitigation steps make sense.
>
> Fedora has started seeing some bug reports on this for Fedora 27[0] and
> I've asked reporters to include their hardware details.
>
> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1572944
>
We have also had reports that Fedora users are seeing this on Google
Compute Engine.
Justin
On Tue, May 01, 2018 at 06:52:47AM -0500, Justin Forbes wrote:
>
> We have also had reports that Fedora users are seeing this on Google
> Compute Engine.
Can you reproduce this yourself? If so, could you confirm that
removing the dracut-fips package makes the problem go away for you?
Thanks,
- Ted
On Mon 2018-04-30 12:11:43, Theodore Y. Ts'o wrote:
> On Sun, Apr 29, 2018 at 09:34:45PM -0700, Sultan Alsawaf wrote:
> >
> > What about abusing high-resolution timers to get entropy? Since hrtimers can't
> > make guarantees down to the nanosecond, there's always a skew between the
> > requested expiry time and the actual expiry time.
> >
> > Please see the attached patch and let me know just how horrible it is.
>
> So think about exactly where the possible causes of the skew might be
> coming from. Look very closely at the software implementation. The
> important thing here is to not get hung up on the software
> abstraction, but to look at the *implementation*. (And if it's an
> implementation in architecture specific code, we need to look at all
> architectures.)
>
> This applies on the hardware level as well, but that gets harder
> because there are many possible hardware implementations in use out there.
> Remember that on many systems there may be only a single clock
> crystal, and all other hardware timers may be derived from that clock
> using frequency dividers. (At least for everything on the mainboard.)
On "many" systems? No, sorry, computers usually do not behave like
this (CMOS RTC has separate clock, for example). I'm pretty sure that
not a single machine on which problems were reported has this design.
Pavel
On Tue, May 1, 2018 at 7:55 AM, Theodore Y. Ts'o <[email protected]> wrote:
> On Tue, May 01, 2018 at 06:52:47AM -0500, Justin Forbes wrote:
>>
>> We have also had reports that Fedora users are seeing this on Google
>> Compute Engine.
>
> Can you reproduce this yourself? If so, could you confirm that
> removing the dracut-fips package makes the problem go away for you?
>
I have not reproduced in GCE myself. We did get some confirmation
that removing dracut-fips does make the problem less dire (I
wouldn't call a 4 minute boot a win, but booting in 4 minutes is
better than not booting at all). Specifically systemd calls libgcrypt
before it even opens the log with fips there, and this is before
virtio-rng modules could even load. Right now though, we are looking
at pretty much any possible options as the majority of people are
calling for me to backout the patches completely from rawhide.
On Tue, May 01, 2018 at 05:35:56PM -0500, Justin Forbes wrote:
>
> I have not reproduced in GCE myself. We did get some confirmation
> that removing dracut-fips does make the problem less dire (but I
> wouldn't call a 4 minute boot a win, but booting in 4 minutes is
> better than not booting at all). Specifically systemd calls libgcrypt
> before it even opens the log with fips there, and this is before
> virtio-rng modules could even load. Right now though, we are looking
> at pretty much any possible options as the majority of people are
> calling for me to backout the patches completely from rawhide.
FWIW, Debian Testing is using systemd 238, and from what I can tell
it's calling libgcrypt and it has the same (as near as I can tell)
totally pointless hmac nonsense, and it's not a problem that I can
see. Of course, Debian and Fedora may have a different set of
patches....
- Ted
On Tue, May 01, 2018 at 05:35:56PM -0500, Justin Forbes wrote:
>
> I have not reproduced in GCE myself. We did get some confirmation
> that removing dracut-fips does make the problem less dire (but I
> wouldn't call a 4 minute boot a win, but booting in 4 minutes is
> better than not booting at all). Specifically systemd calls libgcrypt
> before it even opens the log with fips there, and this is before
> virtio-rng modules could even load. Right now though, we are looking
> at pretty much any possible options as the majority of people are
> calling for me to backout the patches completely from rawhide.
I've attached what I think is a reasonable stopgap solution until this is
actually fixed. If you're willing to revert the CVE-2018-1108 patches
completely, then I don't think you'll mind using this patch in the meantime.
Sultan
From 5be2efdde744d3c55db3df81c0493fc67dc35620 Mon Sep 17 00:00:00 2001
From: Sultan Alsawaf <[email protected]>
Date: Tue, 1 May 2018 17:36:17 -0700
Subject: [PATCH] random: use urandom instead of random for now and speed up
crng init
With the fixes for CVE-2018-1108, /dev/random now requires user-provided
entropy on quite a few machines lacking high levels of boot entropy
in order to complete its initialization. This causes issues on environments
where userspace depends on /dev/random in order to finish booting
completely (i.e., userspace will remain stuck, unable to boot, waiting for
entropy more-or-less indefinitely until the user provides it via something
like keystrokes or mouse movements).
As a temporary workaround, redirect /dev/random to /dev/urandom instead,
and speed up the initialization process by slightly relaxing the
threshold for interrupts to go towards adding one bit of entropy credit
(only until initialization is complete).
Signed-off-by: Sultan Alsawaf <[email protected]>
---
drivers/char/mem.c | 3 ++-
drivers/char/random.c | 9 ++++++---
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index ffeb60d3434c..cc9507f01c79 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -870,7 +870,8 @@ static const struct memdev {
#endif
[5] = { "zero", 0666, &zero_fops, 0 },
[7] = { "full", 0666, &full_fops, 0 },
- [8] = { "random", 0666, &random_fops, 0 },
+ /* Redirect /dev/random to /dev/urandom until /dev/random is fixed */
+ [8] = { "random", 0666, &urandom_fops, 0 },
[9] = { "urandom", 0666, &urandom_fops, 0 },
#ifdef CONFIG_PRINTK
[11] = { "kmsg", 0644, &kmsg_fops, 0 },
diff --git a/drivers/char/random.c b/drivers/char/random.c
index d9e38523b383..bce3b43cdd3b 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1200,9 +1200,12 @@ void add_interrupt_randomness(int irq)
return;
}
- if ((fast_pool->count < 64) &&
- !time_after(now, fast_pool->last + HZ))
- return;
+ if (fast_pool->count < 64) {
+ unsigned long timeout = crng_ready() ? HZ : HZ / 4;
+
+ if (!time_after(now, fast_pool->last + timeout))
+ return;
+ }
r = &input_pool;
if (!spin_trylock(&r->lock))
--
2.14.1
On Tue, May 01, 2018 at 05:43:17PM -0700, Sultan Alsawaf wrote:
>
> I've attached what I think is a reasonable stopgap solution until this is
> actually fixed. If you're willing to revert the CVE-2018-1108 patches
> completely, then I don't think you'll mind using this patch in the meantime.
I would put it slightly differently; reverting the CVE-2018-1108
patches is less dangerous than what you are proposing in your attached
patch.
Again, I think the right answer is to fix userspace to not require
cryptographic grade entropy during early system startup, and for
people to *think* about what they are doing. I've looked at
systemd's use of hmac in journal-authenticate, and as near as I can
tell, there isn't any kind of explanation about why it was necessary,
or what threat it was trying to protect against.
- Ted
On Tue, May 01, 2018 at 08:56:04PM -0400, Theodore Y. Ts'o wrote:
> On Tue, May 01, 2018 at 05:43:17PM -0700, Sultan Alsawaf wrote:
> >
> > I've attached what I think is a reasonable stopgap solution until this is
> > actually fixed. If you're willing to revert the CVE-2018-1108 patches
> > completely, then I don't think you'll mind using this patch in the meantime.
>
> I would put it slightly differently; reverting the CVE-2018-1108
> patches is less dangerous than what you are proposing in your attached
> patch.
>
> Again, I think the right answer is to fix userspace to not require
> cryptographic grade entropy during early system startup, and for
> people to *think* about what they are doing. I've looked at the
> systemd's use of hmac in journal-authenticate, and as near as I can
> tell, there isn't any kind of explanation about why it was necessary,
> or what threat it was trying to protect against.
>
> - Ted
Why is /dev/urandom so much more dangerous than /dev/random? The
more I search, the more I see that many sources consider /dev/urandom
to be cryptographically secure... and since I hold down a single key on
the keyboard to make my computer boot without any kernel workarounds,
I'm sure the NSA would eventually notice my predictable behavior and get
their hands on my Richard Stallman photos.
Fixing all the "broken" userspace instances of entropy usage during early
system startup is a tall order. What about barebone machines used as
remote servers? I feel like just "fixing userspace" isn't going to cover
all of the usecases that the CVE-2018-1108 patches broke.
Sultan
On Tue, May 1, 2018 at 7:02 PM, Theodore Y. Ts'o <[email protected]> wrote:
> On Tue, May 01, 2018 at 05:35:56PM -0500, Justin Forbes wrote:
>>
>> I have not reproduced in GCE myself. We did get some confirmation
>> that removing dracut-fips does make the problem less dire (but I
>> wouldn't call a 4 minute boot a win, but booting in 4 minutes is
>> better than not booting at all). Specifically systemd calls libgcrypt
>> before it even opens the log with fips there, and this is before
>> virtio-rng modules could even load. Right now though, we are looking
>> at pretty much any possible options as the majority of people are
>> calling for me to backout the patches completely from rawhide.
>
> FWIW, Debian Testing is using systemd 238, and from what I can tell
> it's calling libgcrypt and it has the same (as near as I can tell)
> totally pointless hmac nonsense, and it's not a problem that I can
> see. Of course, Debian and Fedora may have a different set of
> patches....
>
Yes, Fedora libgcrypt is carrying a patch which makes it particularly
painful for us, we have reached out to the libgcrypt maintainer to
follow up on that end. But as I said before, even without that code
path (no dracut-fips) we are seeing some instances of 4 minute boots.
This is not really a workable user experience. And are you sure that
every cloud platform and VM platform offers virtio-rng, or at least
makes it possible to configure it?
Justin
On Wed, May 02, 2018 at 07:09:11AM -0500, Justin Forbes wrote:
> Yes, Fedora libgcrypt is carrying a patch which makes it particularly
> painful for us, we have reached out to the libgcrypt maintainer to
> follow up on that end. But as I said before, even without that code
> path (no dracut-fips) we are seeing some instances of 4 minute boots.
> This is not really a workable user experience. And are you sure that
> every cloud platform and VM platform offers, makes it possible to
> config virtio-rng?
Unfortunately, the answer is no. Google Compute Engine, alas, does
not currently support virtio-rng. With my Google hat on, I can't
comment on future product features. With my upstream developer hat
on, I'll give you three guesses what I have been advocating and
pushing for internally, and the first two don't count. :-)
That being said, I just booted a Debian 9 (Stable, aka Stretch)
standard kernel, and then installed 4.17-rc3 (which has the
CVE-2018-1108 patches). The crng_init=2 message doesn't appear
immediately, and it does appear quite a bit later compared to
the standard 4.9.0-6-amd64 Debian 9 kernel. However, the lack of a
fully initialized random pool doesn't prevent the standard Debian 9
image from booting:
May 2 15:33:42 localhost kernel: [ 0.000000] Linux version 4.17.0-rc3-xfstests (tytso@cwcc) (gcc version 7.3.0 (Debian 7.3.0-16)) #169 SMP Wed May 2 11:28:17 EDT 2018
May 2 15:33:42 localhost kernel: [ 1.456883] random: fast init done
May 2 15:33:46 rng-testing systemd[1]: Startup finished in 3.202s (kernel) + 5.963s (userspace) = 9.166s.
May 2 15:33:46 rng-testing google-accounts: INFO Starting Google Accounts daemon.
May 2 15:44:39 rng-testing kernel: [ 661.436664] random: crng init done
So it really does appear to be something going on with Fedora's
userspace; can you help try to track down what it is?
Thanks,
- Ted
On 05/02/2018 09:26 AM, Theodore Y. Ts'o wrote:
> On Wed, May 02, 2018 at 07:09:11AM -0500, Justin Forbes wrote:
>> Yes, Fedora libgcrypt is carrying a patch which makes it particularly
>> painful for us, we have reached out to the libgcrypt maintainer to
>> follow up on that end. But as I said before, even without that code
>> path (no dracut-fips) we are seeing some instances of 4 minute boots.
>> This is not really a workable user experience. And are you sure that
>> every cloud platform and VM platform offers, makes it possible to
>> config virtio-rng?
>
> Unfortunately, the answer is no. Google Compute Engine, alas, does
> not currently support virtio-rng. With my Google hat on, I can't
> comment on future product features. With my upstream developer hat
> on, I'll give you three guesses what I have been advocating and
> pushing for internally, and the first two don't count. :-)
>
> That being said, I just booted a Debian 9 (Stable, aka Stretch)
> standard kernel, and then installed 4.17-rc3 (which has the
> CVE-2018-1108 patches). The crng_init=2 message doesn't appear
> immediately, and it does appear quite a bit later compared to
> the standard 4.9.0-6-amd64 Debian 9 kernel. However, the lack of a
> fully initialized random pool doesn't prevent the standard Debian 9
> image from booting:
>
> May 2 15:33:42 localhost kernel: [ 0.000000] Linux version 4.17.0-rc3-xfstests (tytso@cwcc) (gcc version 7.3.0 (Debian 7.3.0-16)) #169 SMP Wed May 2 11:28:17 EDT 2018
> May 2 15:33:42 localhost kernel: [ 1.456883] random: fast init done
> May 2 15:33:46 rng-testing systemd[1]: Startup finished in 3.202s (kernel) + 5.963s (userspace) = 9.166s.
> May 2 15:33:46 rng-testing google-accounts: INFO Starting Google Accounts daemon.
> May 2 15:44:39 rng-testing kernel: [ 661.436664] random: crng init done
>
> So it really does appear to be something going on with Fedora's
> userspace; can you help try to track down what it is?
>
> Thanks,
>
> - Ted
>
It is a Fedora patch we're carrying
https://src.fedoraproject.org/rpms/libgcrypt/blob/master/f/libgcrypt-1.6.2-fips-ctor.patch#_23
so yes, it is a Fedora specific use case.
From talking to the libgcrypt team, this is a FIPS mode requirement
to run power on self test at the library constructor and the self
test of libgcrypt ends up requiring a fully seeded RNG. Citation
is in section 9.10 of
https://csrc.nist.gov/CSRC/media/Projects/Cryptographic-Module-Validation-Program/documents/fips140-2/FIPS1402IG.pdf
The response was this _could_ be fixed in libgcrypt but it needs
to be done carefully to ensure nothing actually gets broken. So in
the mean time we're stuck with userspace getting blocked whenever
some program decides to use libgcrypt too early.
Thanks,
Laura
On Wed, May 02, 2018 at 10:49:34AM -0700, Laura Abbott wrote:
>
> It is a Fedora patch we're carrying
> https://src.fedoraproject.org/rpms/libgcrypt/blob/master/f/libgcrypt-1.6.2-fips-ctor.patch#_23
> so yes, it is a Fedora specific use case.
> From talking to the libgcrypt team, this is a FIPS mode requirement
> to run power on self test at the library constructor and the self
> test of libgcrypt ends up requiring a fully seeded RNG. Citation
> is in section 9.10 of
> https://csrc.nist.gov/CSRC/media/Projects/Cryptographic-Module-Validation-Program/documents/fips140-2/FIPS1402IG.pdf
Forgive me if this is a stupid question, but does Fedora need FIPS
compliance? Or is this something which is only required for RHEL?
("Here's to FIPS: the cause of, and solution to, all of Life's
problems." :-)
- Ted
On Wed 2018-05-02 18:25:22, Theodore Y. Ts'o wrote:
> On Wed, May 02, 2018 at 10:49:34AM -0700, Laura Abbott wrote:
> >
> > It is a Fedora patch we're carrying
> > https://src.fedoraproject.org/rpms/libgcrypt/blob/master/f/libgcrypt-1.6.2-fips-ctor.patch#_23
> > so yes, it is a Fedora specific use case.
> > From talking to the libgcrypt team, this is a FIPS mode requirement
> > to run power on self test at the library constructor and the self
> > test of libgcrypt ends up requiring a fully seeded RNG. Citation
> > is in section 9.10 of
> > https://csrc.nist.gov/CSRC/media/Projects/Cryptographic-Module-Validation-Program/documents/fips140-2/FIPS1402IG.pdf
>
> Forgive me if this is a stupid question, but does Fedora need FIPS
> compliance? Or is this something which is only required for RHEL?
If RHEL needs it, Fedora needs it, too -- as Fedora is a beta test for
RHEL.
Pavel
On Wed, May 2, 2018 at 5:25 PM, Theodore Y. Ts'o <[email protected]> wrote:
> On Wed, May 02, 2018 at 10:49:34AM -0700, Laura Abbott wrote:
>>
>> It is a Fedora patch we're carrying
>> https://src.fedoraproject.org/rpms/libgcrypt/blob/master/f/libgcrypt-1.6.2-fips-ctor.patch#_23
>> so yes, it is a Fedora specific use case.
>> From talking to the libgcrypt team, this is a FIPS mode requirement
>> to run power on self test at the library constructor and the self
>> test of libgcrypt ends up requiring a fully seeded RNG. Citation
>> is in section 9.10 of
>> https://csrc.nist.gov/CSRC/media/Projects/Cryptographic-Module-Validation-Program/documents/fips140-2/FIPS1402IG.pdf
>
> Forgive me if this is a stupid question, but does Fedora need FIPS
> compliance? Or is this something which is only required for RHEL?
>
> ("Here's to FIPS: the cause of, and solution to, all of Life's
> problems." :-)
>
One of the advantages of carrying such things in Fedora is we find
these problems before RHEL does and hopefully there is a solution in
place before they ever even see it.
From the rawhide end, I just brought in virtio-rng as inline vs
module, this works around the issue for lots of users, but not all.
GCE is still impacted, and a user came to complain about it already
last night. And of course any other virt platform without virtio-rng,
or some hardware. Most hardware installs don't have dracut-fips so
they will boot, eventually.
Justin
Since I wasn't on this thread from the start, I can only find a way to
reply to message in mbox format on patchwork, and this seemed the best.
On Fri, 2018-04-27 at 16:10 -0400, Theodore Tso wrote:
>
>
> This is why ultimately, we do need to attack this problem from both
> ends, which means teaching userspace programs to only request
> cryptographic-grade randomness when it is really needed --- and most
> of the time, if the user has not logged in yet, you probably don't
> need cryptographic-grade randomness....
I've hit this on an embedded system. mke2fs hangs trying to format a
persistent writable filesystem, which is where the random seed to
initialize the kernel entropy pool would be stored, because it wants 16
bytes of non-cryptographic random data for a filesystem UUID, and util-
linux libuuid calls getrandom(16, 0) - no GRND_RANDOM flag - and this
hangs for over four minutes.
Some things I've seen here don't work in the embedded world.
The user will not log in. No one logs in. There are not even user
accounts with a valid password that could log in.
The storage comes pre-written with a static image from the manufacturer
or is programmed from a static image via JTAG or some other out of band
step. It cannot be different from device to device when it first
boots. No saved entropy.
The bootloader gets entropy from writable storage to give to the
kernel? Can't do that. The bootloader has no access to writable
storage.
I understand that if someone wants cryptographic-grade randomness early
in boot when that just isn't available and isn't going to be available,
then that isn't going to happen and lying to the consumer about the
randomness of the data isn't the answer.
But I just want UUIDs for a filesystem. And the systemd machineid for
the journal file. It seems the util-linux authors thought, apparently
incorrectly, that getrandom() without GRND_RANDOM was a good way to
get it.
What is the right way? The fact that so many userspace consumers get
it wrong might be a sign that this is lacking or at least very non-
obvious.
I want random data and I want it now. It's ok if it's low entropy.
This seems to be a very real, and unavoidable, thing in early boot.
And crng_init == 1 seems to be the intended way to do this. What's the
way to get random data of crng_init==1 quality without blocking?
On Fri, May 18, 2018 at 01:27:03AM +0000, Trent Piepho wrote:
>
> I've hit this on an embedded system. mke2fs hangs trying to format a
> persistent writable filesystem, which is where the random seed to
> initialize the kernel entropy pool would be stored, because it wants 16
> bytes of non-cryptographic random data for a filesystem UUID, and util-
> linux libuuid calls getrandom(16, 0) - no GRND_RANDOM flag - and this
> hangs for over four minutes.
This is fixed in util-linux 2.32. It ships with the following commits:
commit edc1c90cb972fdca1f66be5a8e2b0706bd2a4949
Author: Karel Zak <[email protected]>
Date: Tue Mar 20 14:17:24 2018 +0100
lib/randutils: don't break on EAGAIN, use usleep()
The current code uses lose_counter to make more attempts to read
random numbers. It seems better to wait a moment between attempts to
avoid busy loop (we do the same in all-io.h).
The worst case is 1 second delay for all random_get_bytes() on systems
with uninitialized entropy pool -- for example you call sfdisk (MBR Id
or GPT UUIDs) on very first boot, etc. In this case it will use libc
rand() as a fallback solution.
Note that we do not use random numbers for security sensitive things
like keys or so. It's used for random based UUIDs etc.
Addresses: https://github.com/karelzak/util-linux/pull/603
Signed-off-by: Karel Zak <[email protected]>
commit a9cf659e0508c1f56813a7d74c64f67bbc962538
Author: Carlo Caione <[email protected]>
Date: Mon Mar 19 10:31:07 2018 +0000
lib/randutils: Do not block on getrandom()
In Endless we have hit a problem when using 'sfdisk' on the really first
boot to automatically expand the rootfs partition. On this platform
'sfdisk' is blocking on getrandom() because not enough random bytes are
available. This is an ARM platform without a hwrng.
We fix this passing GRND_NONBLOCK to getrandom(). 'sfdisk' will use the
best entropy it has available and fallback only as necessary.
Signed-off-by: Carlo Caione <[email protected]>
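A minimal sketch of the approach the two commit messages describe, not the
actual util-linux code: ask getrandom() for bytes with GRND_NONBLOCK, pause
briefly on EAGAIN, and give up after a bounded number of attempts so the
caller can use its own fallback. The helper name, the 125 ms pause and the
retry count are assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from util-linux: try getrandom() without
 * blocking, sleeping briefly between attempts.  Returns 0 when buf was
 * filled by the kernel, -1 when the caller has to use its own fallback.
 */
static int get_bytes_nonblocking(void *buf, size_t len, int max_tries)
{
	unsigned char *p = buf;
	size_t done = 0;
	int tries = 0;

	while (done < len) {
		ssize_t n = getrandom(p + done, len - done, GRND_NONBLOCK);

		if (n > 0) {
			done += (size_t)n;
			tries = 0;
		} else if (n < 0 && errno == EINTR) {
			continue;        /* interrupted by a signal: just retry */
		} else if (n < 0 && errno == EAGAIN && ++tries < max_tries) {
			usleep(125000);  /* pool not ready: wait 125 ms, retry */
		} else {
			return -1;       /* other error, or still EAGAIN after retries */
		}
	}
	return 0;
}
```

With max_tries set to 8 the wait is bounded at a little under one second,
which is in line with the worst case mentioned in the first commit message.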
Interestingly, these commits in util-linux landed *before* the patches
to address CVE-2018-1108 appeared in the kernel in April 2018. This
was because libuuid was already blocking on a handful of embedded
systems even before we made this change in Linux's random driver. (The
change just made this problem more likely to be visible on a larger number of
systems; but it was always there.)
- Ted
On Thu, 2018-05-17 at 22:32 -0400, Theodore Y. Ts'o wrote:
> On Fri, May 18, 2018 at 01:27:03AM +0000, Trent Piepho wrote:
> > I've hit this on an embedded system. mke2fs hangs trying to format a
> > persistent writable filesystem, which is where the random seed to
> > initialize the kernel entropy pool would be stored, because it wants 16
> > bytes of non-cryptographic random data for a filesystem UUID, and util-
> > linux libuuid calls getrandom(16, 0) - no GRND_RANDOM flag - and this
> > hangs for over four minutes.
>
> This is fixed in util-linux 2.32. It ships with the following commits:
I feel like "fix" might overstate the result a bit.
This ends up taking a full second to make each UUID. Having gone to
great effort to make an iMX25 complete userspace startup in 250 ms, a
full second, per UUID, in early startup is pretty appalling.
Let's look at what we're doing after this fix:
Want non-cryptographic random data for UUID, ask kernel for it.
Kernel has non-cryptographic random data, won't give it to us.
Wait one second for cryptographic random data, which we didn't need.
Give up and create our own random data, which is non-cryptographic and
even worse than what the kernel could have given us from the start.
util-linux falls back to rand() seeded with the pid, uid, tv_sec, and
tv_usec from gettimeofday(). Pretty bad on an embedded system with no
RTC and worse than what the kernel in crng_init 1 state can give us.
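To make the concern concrete, here is a rough sketch of a fallback seeded
this way. It is an illustration based on the description above, not a copy
of the util-linux source; weak_seed() and weak_fill() are hypothetical names.

```c
#include <stdlib.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative only: derive a PRNG seed from pid, uid and wall-clock time. */
static unsigned int weak_seed(pid_t pid, uid_t uid, const struct timeval *tv)
{
	return ((unsigned int)pid << 16) ^ (unsigned int)uid ^
	       (unsigned int)tv->tv_sec ^ (unsigned int)tv->tv_usec;
}

/* Fill buf from libc rand(); not suitable for anything security-related. */
static void weak_fill(unsigned char *buf, size_t len)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	srand(weak_seed(getpid(), getuid(), &tv));
	for (size_t i = 0; i < len; i++)
		buf[i] = (unsigned char)(rand() >> 7);
}
```

Two boards booted from the same image with no RTC can easily agree on pid,
uid and tv_sec, which leaves only tv_usec to tell their seeds apart.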
What took microseconds now takes a second. We have lower quality
random data than we had before.
Seems like two steps backward. Can't we do better?
How about adding a flag to getrandom() that allows the kernel to return
low-quality data if high-quality data would require blocking?
It would seem to be a fact that there will be users of non-
cryptographic random data in early boot. What is the best practice for
that? To fall back to each user trying "to find randomly-looking
things on an 1990s Unix." That doesn't seem good to me. But what's
the better way?
On Fri, May 18, 2018 at 10:56:18PM +0000, Trent Piepho wrote:
>
> I feel like "fix" might overstate the result a bit.
>
> This ends up taking a full second to make each UUID. Having gone to
> great effort to make an iMX25 complete userspace startup in 250 ms, a
> full second, per UUID, in early startup is pretty appalling.
>
> Let's look at what we're doing after this fix:
> Want non-cryptographic random data for UUID, ask kernel for it.
> Kernel has non-cryptographic random data, won't give it to us.
> Wait one second for cryptographic random data, which we didn't need.
> Give up and create our own random data, which is non-cryptographic and
> even worse than what the kernel could have given us from the start.
>
> util-linux falls back to rand() seeded with the pid, uid, tv_sec, and
> tv_usec from gettimeofday(). Pretty bad on an embedded system with no
> RTC and worse than what the kernel in crng_init 1 state can give us.
So what util-linux's libuuid could do is fall back to using
/dev/urandom instead. Whether or not you retry for a second before
you fall back to /dev/urandom really depends on how important the
second U in UUID ("unique") is to you. If you use lower quality
randomness, you can potentially risk getting non-unique UUID's.
If you aren't worried about leaking your computer's identity and the time
when the UUID was generated, the application could also use time-based
UUID's. There are privacy implications for doing so, so it's not
something we can do automatically (or at least I can't recommend it).
Also, if you don't have the clock sequence file and/or you don't have
a writable root, you might need some randomness anyway to protect
against non-monotonically increasing system time.
> It would seem to be a fact that there will be users of non-
> cryptographic random data in early boot. What is the best practice for
> that? To fall back to each user trying "to find randomly-looking
> things on an 1990s Unix." That doesn't seem good to me. But what's
> the better way?
We could add a new flag to getrandom(2), but application authors can
just as easily fall back to using /dev/urandom. The real concern I
have is application authors that actually *really* need cryptographic
randomness, but they're too lazy to figure out a way to defer key
generation until the last possible moment.
There are other things we can do to add support in the bootloader to
read an entropy state file and inject it into the kernel alongside the
initrd and boot command line. But that doesn't completely solve the
problem; you still have to deal with the "fresh from the factory,
first time out of box" experience. And if you have trusted random
number generation hardware, and are reasonably certain you don't have
to worry about a state-sponsored agency intercepting hardware
shipments and gimmicking your hardware, that can be a solution as
well.
So there are things we can do to improve some of the scenarios.
Unfortunately, there is no silver bullet that will address all of
them.
- Ted
On Fri, 2018-05-18 at 19:22 -0400, Theodore Y. Ts'o wrote:
> On Fri, May 18, 2018 at 10:56:18PM +0000, Trent Piepho wrote:
> >
> > Let's look at what we're doing after this fix:
> > Want non-cryptographic random data for UUID, ask kernel for it.
> > Kernel has non-cryptographic random data, won't give it to us.
> > Wait one second for cryptographic random data, which we didn't need.
> > Give up and create our own random data, which is non-cryptographic and
> > even worse than what the kernel could have given us from the start.
> >
> > util-linux falls back to rand() seeded with the pid, uid, tv_sec, and
> > tv_usec from gettimeofday(). Pretty bad on an embedded system with no
> > RTC and worse than what the kernel in crng_init 1 state can give us.
>
> So what util-linux's libuuid could do is fall back to using
> /dev/urandom instead. Whether or not you retry for a second before
> you fall back to /dev/urandom really depends on how important the
> second U in UUID ("unique") is to you. If you use lower quality
> randomness, you can potentially risk getting non-unique UUID's.
Does it really matter how long one waits? The fact that there is a
fallback that can be used would seem to provide a guarantee of
randomness/uniqueness only as good as that fallback.
And here is the fallback:
https://github.com/karelzak/util-linux/blob/master/lib/randutils.c#L64
It doesn't seem all that great. Can we say that the kernel, e.g.
urandom, can always provide random data at least as good as the above
without blocking? If the kernel is always as good or better, then
what's the point of having the inferior fallback?
> If you don't worry leaking your computer's identity and the time when
> the UUID was generated, the application could also use the time-based
> UUID's. There are privacy implications for doing so, it's not
libuuid will still ask for random data to initialize its clock file:
https://github.com/karelzak/util-linux/blob/master/libuuid/src/gen_uuid.c#L281
> > It would seem to be a fact that there will be users of non-
> > cryptographic random data in early boot. What is the best practice for
> > that? To fall back to each user trying "to find randomly-looking
> > things on an 1990s Unix." That doesn't seem good to me. But what's
> > the better way?
>
> We could add a new flag to getrandom(2), but application authors can
> just as easily fall back to using /dev/urandom. The real concern I
I wouldn't say just as easily. It's a more complex code path,
documented across multiple man pages and requires certain file system
access that getrandom() doesn't. But it's certainly readily
achievable, so maybe that's good enough. I think a flag to getrandom
would result in fewer mistakes in userspace code.
> have is application authors that actually *really* need cryptographic
> randomness, but they're too lazy to figure out a way to defer key
> generation until the last possible moment.
Would it be safe to say that the randutils code in util-linux would be
better off falling back to /dev/urandom instead of what it does?
If authors that really need cryptographic data use random_get_bytes()
or uuid_generate(), they'll get code that automatically falls back to
gettimeofday(). And probably not even know it.
I get your concern about lazy authors using an API that isn't
appropriate for their use case.
But we have this api already, in util-linux and code copied/inspired by
it, and it seems there are use cases where it is appropriate. If we
make it better(*), then does the risk of it being used where it
shouldn't go up?
(*) Better: use the best available random data that can be provided
without blocking.
> There are other things we can do to add support in the bootloader to
> read an entropy state file and inject it into the kernel alongside the
> initrd and boot command line. But that doesn't completely solve the
> problem; you still have to deal with the "fresh from the factory,
This is problematic on a number of embedded platforms.
The bootloader might have no writable persistent storage to read/write
this entropy from. This requires drivers for the storage hardware,
ability to deal with the storage being in an inconsistent state, and
security of the storage. Assuming hardware for writable storage even
exists.
So if I want u-boot to read/write an encrypted and authenticated flash
file system, there is a lot of code to put in the bootloader! And now
we have to worry about that being exploited. Maybe this means the
bootloader needs an encryption key that it didn't previously need to have
access to.
Some systems have a limit on bootloader size and RAM. Cyclone 5 is
64kB, which pretty much requires a two stage bootloader. Arria 10 has
256kB and boots in a single stage, but bootloader features are quite
limited. On imx23, it's possible to boot directly into linux with no
bootloader at all. The cpu's rom can initialize the hardware enough to
run linux just from info in the mxs boot image format.