2016-07-28 22:14:20

by Alex Xu (Hello71)

Subject: getrandom waits for a long time when /dev/random is insufficiently read from

Linux 4.6, also tried 4.7, qemu 2.6, using this C program:

#include <fcntl.h>
#include <stdlib.h>
#include <syscall.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    char buf[16];
    int fd;

    if (argc != 2)
        return 1;

    for (int i = 0; i < atoi(argv[1]); i++) {
        sleep(1);

        if ((fd = open("/dev/random", O_RDONLY)) == -1)
            return 2;

        if (read(fd, buf, sizeof(buf)) < 1)
            return 3;

        if (close(fd) == -1)
            return 4;
    }

    sleep(2);

    if (syscall(SYS_getrandom, buf, sizeof(buf), 0) == -1)
        return 5;

    return 0;
}

$ qemu-system-x86_64 -nodefaults -machine q35,accel=kvm -nographic -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0 -kernel linux-4.7/arch/x86/boot/bzImage -fsdev local,path="$PWD/root",security_model=none,id=root -device virtio-9p-pci,fsdev=root,mount_tag=/dev/root -device virtio-serial -chardev stdio,id=stdio -device virtconsole,chardev=stdio -monitor none -append "root=/dev/root rw rootfstype=9p rootflags=trans=virtio console=hvc0 init=/strace /test 2"
execve("/test", ["/test", "2"], [/* 2 vars */]) = 0
arch_prctl(ARCH_SET_FS, 0x601098) = 0
set_tid_address(0x6010d0) = 29
nanosleep({1, 0}, 0x7ffcdb7ea6b0) = 0
open("/dev/random", O_RDONLY) = 3
read(3, "P'\333\362\352\247\212\272\357E?\343", 16) = 12
close(3) = 0
nanosleep({1, 0}, 0x7ffcdb7ea6b0) = 0
open("/dev/random", O_RDONLY) = 3
read(3, ">>9\252]\332T\322dL\203\231C\255\303\376", 16) = 16
close(3) = 0
nanosleep({2, 0}, 0x7ffcdb7ea6e0) = 0
getrandom(<some time later>[ 89.166661] random: nonblocking pool is initialized
"\217\0\206\220\36t\3\353\t\227\377\356\315\320\2452", 16, 0) = 16
exit_group(0) = ?
+++ exited with 0 +++

Identical command but replaced 2 iterations with 3:

$ qemu-system-x86_64 -nodefaults -machine q35,accel=kvm -nographic -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0 -kernel linux-4.7/arch/x86/boot/bzImage -fsdev local,path="$PWD/root",security_model=none,id=root -device virtio-9p-pci,fsdev=root,mount_tag=/dev/root -device virtio-serial -chardev stdio,id=stdio -device virtconsole,chardev=stdio -monitor none -append "root=/dev/root rw rootfstype=9p rootflags=trans=virtio console=hvc0 init=/strace /test 3"
execve("/test", ["/test", "3"], [/* 2 vars */]) = 0
arch_prctl(ARCH_SET_FS, 0x601098) = 0
set_tid_address(0x6010d0) = 29
nanosleep({1, 0}, 0x7ffc9e13fb70) = 0
open("/dev/random", O_RDONLY) = 3
read(3, ">\202\264\350\226\364\364\320'-\200\16", 16) = 12
close(3) = 0
nanosleep({1, 0}, 0x7ffc9e13fb70) = 0
open("/dev/random", O_RDONLY) = 3
read(3, "\377:\2076\213q0E\307\377\\\234\217\"g\254", 16) = 16
close(3) = 0
nanosleep({1, 0}, 0x7ffc9e13fb70) = 0
open("/dev/random", O_RDONLY) = 3
read(3, [ 3.312266] random: nonblocking pool is initialized
"O\2112g\375\25]\270\347\v\34XP", 16) = 13
close(3) = 0
nanosleep({2, 0}, 0x7ffc9e13fba0) = 0
getrandom("\215\317\207/\324\6\300\216\332zN\351a\323\231\36", 16, 0) = 16
exit_group(0) = ?
+++ exited with 0 +++

(irrelevant kernel messages have been removed for clarity)

Removing the calls to "sleep" produces similar results except without
sleeping or the corresponding strace output. Running both commands
repeatedly also produces similar results; the timing of the getrandom
return and "random: nonblocking pool is initialized" message
is different for each run, but it always takes 90-100 seconds.
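
To put a number on the delay without relying on the strace timestamps, the
call can be timed directly. The following is only a sketch: time_getrandom
is a hypothetical helper name, and it assumes a Linux system where
SYS_getrandom and CLOCK_MONOTONIC are available.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: run one getrandom system call and return how many
 * seconds it took, or -1.0 if the call or the clock failed. */
double time_getrandom(void *buf, size_t len, unsigned int flags)
{
    struct timespec start, end;
    long ret;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;

    ret = syscall(SYS_getrandom, buf, len, flags);

    if (ret < 0 || clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Called as time_getrandom(buf, sizeof(buf), 0) at the point where the test
program calls getrandom, this would report the length of the wait described
above.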

Sorry if these aren't the right lists or if this is a known issue.

Please CC me on replies.


2016-07-29 05:40:07

by Stephan Müller

Subject: Re: getrandom waits for a long time when /dev/random is insufficiently read from

On Thursday, 28 July 2016, 18:07:32 CEST, Alex Xu wrote:

Hi Alex,

> Linux 4.6, also tried 4.7, qemu 2.6, using this C program:

I am not sure what problem you are referring to, but this is expected
behavior.

You get partial reads when reading from /dev/random with a minimum of 64
bits. On the other hand getrandom(2) is woken up after the input_pool
received 128 bits of entropy.

In your strace you can see that after reading 16 bytes from /dev/random,
getrandom unblocks and starts delivering.

Note, in virtualized environments the current Linux /dev/random
implementation collects massively less entropy compared to a bare-metal
system. Hence the long wait time of your 90 to 100 secs until getrandom
unblocks.

Besides, even without reading from /dev/random, your getrandom will wait that
long.

And finally, you have a coding error that is very very common but fatal when
reading from /dev/random: you do not account for short reads which implies
that your loop continues even in the case of short reads.

Fix your code with something like the following:

int read_random(char *buf, size_t buflen)
{
    int fd = 0;
    ssize_t ret = 0;
    size_t len = 0;

    fd = open("/dev/random", O_RDONLY|O_CLOEXEC);
    if (0 > fd)
        return fd;
    do {
        ret = read(fd, (buf + len), (buflen - len));
        if (0 < ret)
            len += ret;
    } while ((0 < ret || EINTR == errno || ERESTART == errno)
             && buflen > len);

    ...
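
For reference, a self-contained variant of the same idea is sketched below.
It is not the elided remainder of the function above: read_full is a
hypothetical helper name, it works on any already-open file descriptor, and
it looks at errno only when read() has actually failed, so that a stale
errno value cannot keep the loop running.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly len bytes from fd into buf.
 * Returns 0 on success, -1 on error or on end of file before len bytes. */
int read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t ret = read(fd, p + done, len - done);

        if (ret > 0)
            done += (size_t)ret;    /* short read: keep going */
        else if (ret == 0)
            return -1;              /* end of file, buffer not full */
        else if (errno != EINTR)
            return -1;              /* real error; EINTR is retried */
    }
    return 0;
}
```

With a descriptor obtained from open("/dev/random", O_RDONLY|O_CLOEXEC),
read_full(fd, buf, 16) would return only after all 16 bytes have arrived.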

Ciao
Stephan

2016-07-29 10:25:07

by Nikos Mavrogiannopoulos

Subject: Re: getrandom waits for a long time when /dev/random is insufficiently read from

On Fri, Jul 29, 2016 at 7:40 AM, Stephan Mueller <[email protected]> wrote:
> And finally, you have a coding error that is very very common but fatal when
> reading from /dev/random: you do not account for short reads which implies
> that your loop continues even in the case of short reads.
>
> Fix your code with something like the following:
> int read_random(char *buf, size_t buflen)
> {
> int fd = 0;
> ssize_t ret = 0;
> size_t len = 0;
>
> fd = open("/dev/random", O_RDONLY|O_CLOEXEC);
> if(0 > fd)
> return fd;
> do {
> ret = read(fd, (buf + len), (buflen - len));
> if (0 < ret)
> len += ret;
> } while ((0 < ret || EINTR == errno || ERESTART == errno)
> && buflen > len);

Unless there is a documentation error, the same is required when using
getrandom(). It can also return fewer bytes than requested, and it can be
interrupted.
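
A retry loop for getrandom could look roughly like the sketch below. The
helper name getrandom_full is made up for illustration, and the raw system
call is used on the assumption that the C library provides no wrapper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: fill buf with len bytes from the getrandom system
 * call. Returns 0 on success, -1 on error (errno set by the kernel). */
int getrandom_full(void *buf, size_t len, unsigned int flags)
{
    unsigned char *p = buf;
    size_t done = 0;

    while (done < len) {
        long ret = syscall(SYS_getrandom, p + done, len - done, flags);

        if (ret > 0)
            done += (size_t)ret;    /* short return: ask for the rest */
        else if (ret == 0 || errno != EINTR)
            return -1;              /* give up on anything but EINTR */
    }
    return 0;
}
```

A caller would then use getrandom_full(buf, sizeof(buf), 0) in place of the
single syscall() line.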

regards,
Nikos

2016-07-29 13:09:45

by Alex Xu (Hello71)

Subject: Re: getrandom waits for a long time when /dev/random is insufficiently read from

On Fri, 29 Jul 2016 12:24:27 +0200
Nikos Mavrogiannopoulos <[email protected]> wrote:

> On Fri, Jul 29, 2016 at 7:40 AM, Stephan Mueller
> <[email protected]> wrote:
> > And finally, you have a coding error that is very very common but
> > fatal when reading from /dev/random: you do not account for short
> > reads which implies that your loop continues even in the case of
> > short reads.
> >
> > Fix your code with something like the following:
> > int read_random(char *buf, size_t buflen)
> > {
> > int fd = 0;
> > ssize_t ret = 0;
> > size_t len = 0;
> >
> > fd = open("/dev/random", O_RDONLY|O_CLOEXEC);
> > if(0 > fd)
> > return fd;
> > do {
> > ret = read(fd, (buf + len), (buflen - len));
> > if (0 < ret)
> > len += ret;
> > } while ((0 < ret || EINTR == errno || ERESTART == errno)
> > && buflen > len);
>
> Unless there is a documentation error, the same is required when using
> getrandom(). It can also return short as well as to be interrupted.
>
> regards,
> Nikos

I am aware that (according to the documentation) both random(4) and
getrandom(2) may not return the full size of the read. However, that is
(as far as I know) not relevant to the point that I am making.

What I am saying is that based on my understanding of random(4) and
getrandom(2), at boot, given the same buffer size, reading
from /dev/random should have the same behavior as calling getrandom
passing no flags.

The buffer size can also be set to 1 with similar results, but the
iteration number for success must be then increased to a large number.
IME 30 worked consistently while 29 hung; your results may vary.

The interesting thing is though, if GRND_RANDOM is passed to getrandom,
then it does not hang and returns 1 byte immediately (whether or not
GRND_NONBLOCK is set).

The following revised program demonstrates this:

#include <fcntl.h>
#include <linux/random.h>
#include <stdlib.h>
#include <string.h>
#include <syscall.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
char buf[1];
int gr_flags;
const char *iters;

if (!strcmp(argv[1], "-r")) {
gr_flags = GRND_RANDOM;
iters = argv[2];
} else {
gr_flags = 0;
iters = argv[1];
}

for (int i = 0; i < atoi(iters); i++) {
int fd;
if ((fd = open("/dev/random", O_RDONLY)) == -1)
return 2;

if (read(fd, buf, 1) != 1)
return 3;

if (close(fd) == -1)
return 4;
}

if (syscall(SYS_getrandom, buf, 1, gr_flags) != 1)
return 5;

return 0;
}

Again, making the buffer size 1 resolves the complaint regarding short
reads.

With the same command line as my original email, running this in QEMU
results in:

1, 2..29: reads all return 1 byte, getrandom pauses for 90-110 secs then
returns 1 byte
30+: reads all return 1 byte, getrandom immediately returns 1 byte
-r 0: getrandom immediately returns 1 byte
-r 1, -r 2, -r 128, -r 256: reads all return 1 byte, getrandom
immediately returns 1 byte

Moving the open and close calls outside of the loop produces the same
results. Writing 4096 bytes to /dev/urandom also has no effect.

In my opinion, assuming I am not doing something terribly wrong, this
constitutes a bug in the kernel's handling of getrandom calls at boot,
possibly only when the primary source of entropy is virtio.

2016-07-29 13:12:35

by Stephan Müller

Subject: Re: getrandom waits for a long time when /dev/random is insufficiently read from

On Friday, 29 July 2016, 09:03:45 CEST, Alex Xu wrote:

Hi Alex,

> On Fri, 29 Jul 2016 12:24:27 +0200
>
> Nikos Mavrogiannopoulos <[email protected]> wrote:
> > On Fri, Jul 29, 2016 at 7:40 AM, Stephan Mueller
> >
> > <[email protected]> wrote:
> > > And finally, you have a coding error that is very very common but
> > > fatal when reading from /dev/random: you do not account for short
> > > reads which implies that your loop continues even in the case of
> > > short reads.
> > >
> > > Fix your code with something like the following:
> > > int read_random(char *buf, size_t buflen)
> > > {
> > > int fd = 0;
> > > ssize_t ret = 0;
> > > size_t len = 0;
> > >
> > > fd = open("/dev/random", O_RDONLY|O_CLOEXEC);
> > > if(0 > fd)
> > > return fd;
> > > do {
> > > ret = read(fd, (buf + len), (buflen - len));
> > > if (0 < ret)
> > > len += ret;
> > > } while ((0 < ret || EINTR == errno || ERESTART == errno)
> > > && buflen > len);
> >
> > Unless there is a documentation error, the same is required when using
> > getrandom(). It can also return short as well as to be interrupted.
> >
> > regards,
> > Nikos
>
> I am aware that (according to the documentation) both random(4) and
> getrandom(2) may not return the full size of the read. However, that is
> (as far as I know) not relevant to the point that I am making.
>
> What I am saying is that based on my understanding of random(4) and
> getrandom(2), at boot, given the same buffer size, reading
> from /dev/random should have the same behavior as calling getrandom
> passing no flags.

/dev/random can return after at least 64 bits received in the input_pool
whereas getrandom waits for 128 bits.
>
> The buffer size can also be set to 1 with similar results, but the
> iteration number for success must be then increased to a large number.
> IME 30 worked consistently while 29 hung; your results may vary.
>
> The interesting thing is though, if GRND_RANDOM is passed to getrandom,
> then it does not hang and returns 1 byte immediately (whether or not
> GRND_NONBLOCK is set).

Sure, because there is one byte in the input_pool at the time user space
boots. Note again, /dev/random waits until having 64 bits.

>
> 1, 2..29: reads all return 1 byte, getrandom pauses for 90-110 secs then
> returns 1 byte
> 30+: reads all return 1 byte, getrandom immediately returns 1 byte
> -r 0: getrandom immediately returns 1 byte
> -r 1, -r 2, -r 128, -r 256: reads all return 1 byte, getrandom
> immediately returns 1 byte
>
I would say that this is expected.

> Moving the open and close calls outside of the loop produces the same
> results. Writing 4096 bytes to /dev/urandom also has no effect.

Sure, it does not update the input_pool. Only the IOCTL can update the
input_pool from user space.
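
The ioctl in question is presumably RNDADDENTROPY from <linux/random.h>,
which random(4) documents as requiring CAP_SYS_ADMIN. A rough sketch of how
a privileged process might use it follows; make_entropy_request and
credit_entropy are hypothetical helper names, and the number of entropy
bits to credit is entirely the caller's claim.

```c
#include <linux/random.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: build the request that RNDADDENTROPY expects.
 * entropy_bits is the amount of entropy the caller claims data holds.
 * Returns a malloc'd structure (release with free()), or NULL. */
struct rand_pool_info *make_entropy_request(const void *data, size_t len,
                                            int entropy_bits)
{
    struct rand_pool_info *req = malloc(sizeof(*req) + len);

    if (req == NULL)
        return NULL;
    req->entropy_count = entropy_bits;  /* credited entropy, in bits */
    req->buf_size = (int)len;           /* payload size, in bytes */
    memcpy(req->buf, data, len);
    return req;
}

/* Hypothetical helper: hand the request to the kernel. random_fd is an
 * open descriptor for /dev/random or /dev/urandom. Returns 0 on success,
 * -1 on error (EPERM without CAP_SYS_ADMIN). */
int credit_entropy(int random_fd, struct rand_pool_info *req)
{
    return ioctl(random_fd, RNDADDENTROPY, req);
}
```

This is essentially what a user-space rngd does; a plain write() to the
device mixes the data in but credits no entropy.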
>
> In my opinion, assuming I am not doing something terribly wrong, this
> constitutes a bug in the kernel's handling of getrandom calls at boot,
> possibly only when the primary source of entropy is virtio.

Nope, I do not think that this is true:

- /dev/random returns one byte for one byte of entropy received, but it has a
lower limit of 64 bits

- getrandom behaves like /dev/urandom (i.e. nonblocking) except during boot
where it waits until the RNG has collected 128 bits before operating like a
DRNG that is seeded once in a while when entropy comes in.


Ciao
Stephan

2016-07-29 14:14:12

by Alex Xu (Hello71)

Subject: Re: getrandom waits for a long time when /dev/random is insufficiently read from

On Fri, 29 Jul 2016 15:12:30 +0200
Stephan Mueller <[email protected]> wrote as excerpted:
> On Friday, 29 July 2016, 09:03:45 CEST, Alex Xu wrote:
> > In my opinion, assuming I am not doing something terribly wrong,
> > this constitutes a bug in the kernel's handling of getrandom calls
> > at boot, possibly only when the primary source of entropy is
> > virtio.
>
> Nope, I do not think that this is true:
>
> - /dev/random returns one byte for one byte of entropy received, but
> it has a lower limit of 64 bits
>
> - getrandom behaves like /dev/urandom (i.e. nonblocking) except
> during boot where it waits until the RNG has collected 128 bits
> before operating like a DRNG that is seeded once in a while when
> entropy comes in.
>
>
> Ciao
> Stephan

I don't follow. Assuming you are correct and this is the issue, then
reading 128 bits (16 bytes) from /dev/random should "exhaust the
supply" and then both reads from /dev/random and calling getrandom
should block.

That, however, is not the behavior I observed, which is that reading
any amount from /dev/random will never block (since it is fed
from /dev/urandom on the host side) whereas calling getrandom will
always block unless /dev/random is read from first.

Moreover, as long as virtio-rng is available (and fed
from /dev/urandom), /proc/sys/kernel/random/entropy_avail is always 961
immediately after booting, which is more than enough to satisfy a
one-byte read. After reading 1 byte, the estimate decreases to 896 or
897, but after reading 29 more bytes it increases to 1106.
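
The estimate can also be sampled from a test program rather than with cat.
The sketch below shows one way to do that; read_entropy_estimate is a
hypothetical helper name, and the path is a parameter only so that the
parser can be tried against any file.

```c
#include <stdio.h>

/* Hypothetical helper: parse the kernel's entropy estimate, in bits.
 * On a real system, pass "/proc/sys/kernel/random/entropy_avail".
 * Returns the parsed value, or -1 if the file cannot be read or parsed. */
long read_entropy_estimate(const char *path)
{
    FILE *f = fopen(path, "r");
    long bits = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &bits) != 1)
        bits = -1;
    fclose(f);
    return bits;
}
```

Calling it before and after each read from /dev/random would show the
estimate changing as described above.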

Again, these observations are consistent with the conjecture that the
issue arises since virtio-rng is a "pull" source of entropy whereas
most other methods (e.g. interrupt timing) are "push" sources. I
suspect that a similar issue occurs if RDRAND is the only source of
entropy.

I also tried running rngd in the guest which resolved the issue but
seems entirely stupid to me, even more so since
http://rhelblog.redhat.com/2015/03/09/red-hat-enterprise-linux-virtual-machines-access-to-random-numbers-made-easy/
says that "The use of rngd is now not required and the guest kernel
itself fetches entropy from the host when the available entropy falls
below a specific threshold.".

2016-07-29 17:03:55

by Stephan Müller

Subject: Re: getrandom waits for a long time when /dev/random is insufficiently read from

On Friday, 29 July 2016, 10:14:07 CEST, Alex Xu wrote:

Hi Alex,

> On Fri, 29 Jul 2016 15:12:30 +0200
>
> Stephan Mueller <[email protected]> wrote as excerpted:
> > On Friday, 29 July 2016, 09:03:45 CEST, Alex Xu wrote:
> > > In my opinion, assuming I am not doing something terribly wrong,
> > > this constitutes a bug in the kernel's handling of getrandom calls
> > > at boot, possibly only when the primary source of entropy is
> > > virtio.
> >
> > Nope, I do not think that this is true:
> >
> > - /dev/random returns one byte for one byte of entropy received, but
> > it has a lower limit of 64 bits
> >
> > - getrandom behaves like /dev/urandom (i.e. nonblocking) except
> > during boot where it waits until the RNG has collected 128 bits
> > before operating like a DRNG that is seeded once in a while when
> > entropy comes in.
> >
> >
> > Ciao
> > Stephan
>
> I don't follow. Assuming you are correct and this is the issue, then
> reading 128 bits (16 bytes) from /dev/random should "exhaust the
> supply" and then both reads from /dev/random and calling getrandom
> should block.

You assume that getrandom works like /dev/random. This is not the case. It is
a full deterministic RNG like /dev/urandom (which is seeded during its
operation as entropy is available).

getrandom *only* differs from /dev/*u*random in that it waits initially
until the system has collected 128 bits of entropy.

But you point to a real issue: when /dev/random is read before getrandom is
called (and insufficient entropy is present so far), the getrandom call will
be woken up once the input_pool has received 128 bits. But those 128 bits are
fed from the input_pool to the blocking_pool on behalf of the caller at the
/dev/random device. This implies that the reader for getrandom will NOT be
able to obtain data from the input_pool and the nonblocking_pool, because the
transfer operation will not succeed. This in turn implies that the
nonblocking_pool remains unseeded and yet getrandom returns data to the
caller.
>
> That, however, is not the behavior I observed, which is that reading
> any amount from /dev/random will never block (since it is fed
> from /dev/urandom on the host side) whereas calling getrandom will
> always block unless /dev/random is read from first.

That is a different issue that I did not read from your initial explanation.

I need to look into it a bit deeper.
>
> Moreover, as long as virtio-rng is available (and fed
> from /dev/urandom), /proc/sys/kernel/random/entropy_avail is always 961
> immediately after booting, which is more than enough to satisfy a
> one-byte read. After reading 1 byte, the estimate decreases to 896 or
> 897, but after reading 29 more bytes it increases to 1106.
>
> Again, these observations are consistent with the conjecture that the
> issue arises since virtio-rng is a "pull" source of entropy whereas
> most other methods (e.g. interrupt timing) are "push" sources. I
> suspect that a similar issue occurs if RDRAND is the only source of
> entropy.
>
> I also tried running rngd in the guest which resolved the issue but
> seems entirely stupid to me, even moreso since
> http://rhelblog.redhat.com/2015/03/09/red-hat-enterprise-linux-virtual-machines-access-to-random-numbers-made-easy/
> says that "The use of rngd is now not required and the guest kernel
> itself fetches entropy from the host when the available entropy falls
> below a specific threshold.".

Right -- the kernel now has an in-kernel link that makes rngd superfluous in
this case.



Ciao
Stephan

2016-07-29 17:31:18

by Alex Xu (Hello71)

Subject: Re: getrandom waits for a long time when /dev/random is insufficiently read from

On Fri, 29 Jul 2016 19:03:51 +0200
Stephan Mueller <[email protected]> wrote as excerpted:
> On Friday, 29 July 2016, 10:14:07 CEST, Alex Xu wrote:
> > I don't follow. Assuming you are correct and this is the issue, then
> > reading 128 bits (16 bytes) from /dev/random should "exhaust the
> > supply" and then both reads from /dev/random and calling getrandom
> > should block.
>
> You assume that getrandom works like /dev/random. This is not the
> case. It is a full deterministic RNG like /dev/urandom (which is
> seeded during its operation as entropy is available).

My understanding was that all three methods of obtaining entropy from
userspace all receive data from the CSPRNG in the kernel, and that the
only difference is that /dev/random and getrandom may block depending
on the kernel's estimate of the currently available entropy.

> getrandom *only* differs from /dev/*u*random in that it waits
> initially such that the system collected 128 bits of entropy.

I agree, this is the documented behavior of getrandom.

> But you point to a real issue: when /dev/random is pulled before
> getrandom (and yet insufficient entropy is present), then the
> getrandom call will be woken up when the input_pool received 128
> bits. But those 128 bits are fed from the input_pool to the
> blocking_pool based on the caller at the /dev/ random device. This
> implies that the reader for getrandom will NOT be able to obtain data
> from the input_pool and the nonblocking_pool because the transfer
> operation will not succeed. This implies that the nonblocking_pool
> remains unseeded and yet getrandom returns data to the caller.

I don't understand what this means. For my use case, hwrng is fed from
the host's urandom, so none of /dev/random, /dev/hwrng, /dev/urandom,
or getrandom with any flags in the guest should ever block except
possibly for very large amounts requested (megabytes at least).

> > That, however, is not the behavior I observed, which is that reading
> > any amount from /dev/random will never block (since it is fed
> > from /dev/urandom on the host side) whereas calling getrandom will
> > always block unless /dev/random is read from first.
>
> That is a different issue that I did not read from your initial
> explanation.
>
> I need to look into it a bit deeper.

I have been trying to explain the same problem the entire time. Let me
be clear what the problem is as I see it:

When qemu is started with -object rng-random,filename=/dev/urandom, and
immediately (i.e. with no initrd and as the first thing in init):

1. the guest runs dd if=/dev/random, there is no blocking and tons of
data goes to the screen. the data appears to be random.

2. the guest runs getrandom with any requested amount (tested 1 byte
and 16 bytes) and no flags, it blocks for 90-110 seconds while the
"non-blocking pool is initialized". the returned data appears to be
random.

3. the guest runs getrandom with GRND_RANDOM with any requested amount,
it returns the desired amount or possibly less, but in my experience at
least 10 bytes. the returned data appears to be random.

I believe that the difference between cases 1 and 2 is a bug, since
based on my previous statement, in this scenario, getrandom should
never block.

2016-07-30 22:09:22

by Theodore Ts'o

Subject: Re: getrandom waits for a long time when /dev/random is insufficiently read from

On Fri, Jul 29, 2016 at 01:31:14PM -0400, Alex Xu wrote:
>
> My understanding was that all three methods of obtaining entropy from
> userspace all receive data from the CSPRNG in the kernel, and that the
> only difference is that /dev/random and getrandom may block depending
> on the kernel's estimate of the currently available entropy.

This is incorrect.

/dev/random is a legacy interface which dates back to a time when
people didn't have as much trust in the cryptographic primitives ---
when there were concerns that the NSA might have put a back door into
SHA-1, for example. (As it turns out, we were wrong. The NSA put the
back door into Dual EC DRBG.) So it uses a strategy of an extremely
conservative entropy estimator, and will allow N bytes to be drawn from
the /dev/random pool only once the entropy estimator believes that it
has gathered at least N bytes of entropy from environmental noise.

/dev/urandom uses a different output pool from /dev/random (the random
and urandom pools both draw from a common input pool). Originally
the /dev/urandom pool drew from the input pool as needed, but it
wouldn't block if there was insufficient entropy. Over time, it now
has limits about how quickly it can draw from the input pool, and it
behaves more and more like a CSPRNG. In fact, in the most recent set
of patches which Linus has accepted for v4.8-rc1, the urandom pool has
been replaced by an actual CSPRNG using the ChaCha-20 stream cipher.

The getrandom(2) system call uses the same output pool (4.7 and
earlier) or CSPRNG (starting with v4.8-rc1) as /dev/urandom. The big
difference is that it blocks until we know for sure that the output
pool or CSPRNG has been seeded with 128 bits of entropy. We don't do
this with /dev/urandom for backwards compatibility reasons. (For
example, if we did make /dev/urandom block until it was seeded, it
would break systemd, because systemd and programs run by systemd draw
from /dev/urandom before it has been initialized, and if /dev/urandom
were to block, the boot would hang, and with the system quiescent, we
wouldn't get much environmental noise, and the system would hang for
hours.)
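
A program that must not stall at boot can ask whether that seeding has
happened yet, without waiting for it. The sketch below relies on the
behavior documented in getrandom(2), where GRND_NONBLOCK turns the initial
wait into an EAGAIN error; urandom_pool_seeded is a hypothetical helper
name, not an existing API.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/random.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: probe whether getrandom() with no flags would
 * return immediately.
 * Returns 1 if the pool is seeded, 0 if the call would block (EAGAIN),
 * and -1 on any other error (for example ENOSYS on an old kernel). */
int urandom_pool_seeded(void)
{
    unsigned char byte;
    long ret;

    do {
        ret = syscall(SYS_getrandom, &byte, sizeof(byte), GRND_NONBLOCK);
    } while (ret == -1 && errno == EINTR);

    if (ret == 1)
        return 1;
    if (ret == -1 && errno == EAGAIN)
        return 0;
    return -1;
}
```

Early boot code could poll this and carry on with other work instead of
sitting in a blocking getrandom call.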

> When qemu is started with -object rng-random,filename=/dev/urandom, and
> immediately (i.e. with no initrd and as the first thing in init):
>
> 1. the guest runs dd if=/dev/random, there is no blocking and tons of
> data goes to the screen. the data appears to be random.
>
> 2. the guest runs getrandom with any requested amount (tested 1 byte
> and 16 bytes) and no flags, it blocks for 90-110 seconds while the
> "non-blocking pool is initialized". the returned data appears to be
> random.
>
> 3. the guest runs getrandom with GRND_RANDOM with any requested amount,
> it returns the desired amount or possibly less, but in my experience at
> least 10 bytes. the returned data appears to be random.
>
> I believe that the difference between cases 1 and 2 is a bug, since
> based on my previous statement, in this scenario, getrandom should
> never block.

This is correct; and it has been fixed in the patches in v4.8-rc1.
The patch which fixes this has been marked for backporting to stable
kernels:

commit 3371f3da08cff4b75c1f2dce742d460539d6566d
Author: Theodore Ts'o <[email protected]>
Date: Sun Jun 12 18:11:51 2016 -0400

random: initialize the non-blocking pool via add_hwgenerator_randomness()

If we have a hardware RNG and are using the in-kernel rngd, we should
use this to initialize the non-blocking pool so that getrandom(2)
doesn't block unnecessarily.

Cc: [email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

Basically, the urandom pool (now CSPRNG) wasn't getting initialized
from the hardware random number generator. Most people didn't notice
because very few people actually *use* hardware random number
generators (although it's much more common in VMs, which is how
you're using it), and use of getrandom(2) is still relatively rare,
given that glibc hasn't seen fit to support it yet.

Cheers,

- Ted

Subject: Re: getrandom waits for a long time when /dev/random is insufficiently read from

On Sat, 30 Jul 2016 18:09:22 -0400
Theodore Ts'o <[email protected]> wrote as excerpted:
> On Fri, Jul 29, 2016 at 01:31:14PM -0400, Alex Xu wrote:
> > When qemu is started with -object rng-random,filename=/dev/urandom,
> > and immediately (i.e. with no initrd and as the first thing in
> > init):
> >
> > 1. the guest runs dd if=/dev/random, there is no blocking and tons
> > of data goes to the screen. the data appears to be random.
> >
> > 2. the guest runs getrandom with any requested amount (tested 1 byte
> > and 16 bytes) and no flags, it blocks for 90-110 seconds while the
> > "non-blocking pool is initialized". the returned data appears to be
> > random.
> >
> > 3. the guest runs getrandom with GRND_RANDOM with any requested
> > amount, it returns the desired amount or possibly less, but in my
> > experience at least 10 bytes. the returned data appears to be
> > random.
> >
> > I believe that the difference between cases 1 and 2 is a bug, since
> > based on my previous statement, in this scenario, getrandom should
> > never block.
>
> This is correct; and it has been fixed in the patches in v4.8-rc1.
> The patch which fixes this has been marked for backporting to stable
> kernels:
>
> commit 3371f3da08cff4b75c1f2dce742d460539d6566d
> Author: Theodore Ts'o <[email protected]>
> Date: Sun Jun 12 18:11:51 2016 -0400
>
> random: initialize the non-blocking pool via
> add_hwgenerator_randomness()
> If we have a hardware RNG and are using the in-kernel rngd, we
> should use this to initialize the non-blocking pool so that
> getrandom(2) doesn't block unnecessarily.
>
> Cc: [email protected]
> Signed-off-by: Theodore Ts'o <[email protected]>
>
> Basically, the urandom pool (now CSPRNG) wasn't getting initialized
> from the hardware random number generator. Most people didn't notice
> because very few people actually *use* hardware random number
> generators (although it's much more common in VMs, which is how
> you're using it), and use of getrandom(2) is still relatively rare,
> given that glibc hasn't seen fit to support it yet.
>
> Cheers,
>
> - Ted

Dammit, the one time I track down an actual kernel bug someone's already
fixed it. I'd even bothered to check 4.6 so I figured nobody'd gotten
around to it yet.

Thanks for the excellent explanations though. :)