2022-07-14 07:44:59

by GUO Zihua

[permalink] [raw]
Subject: Inquiry about the removal of flag O_NONBLOCK on /dev/random

Hi Community,

Recently we noticed the removal of flag O_NONBLOCK on /dev/random by
commit 30c08efec888 ("random: make /dev/random be almost like
/dev/urandom"), it seems that some of the open_source packages e.g.
random_get_fd() of util-linux and __getrandom() of glibc. The man page
for random() is not updated either.

Would anyone please kindly provide some background knowledge of this
flag and it's removal? Thanks!

--
Best
GUO Zihua


2022-07-18 08:58:06

by GUO Zihua

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

On 2022/7/14 15:33, Guozihua (Scott) wrote:
> Hi Community,
>
> Recently we noticed the removal of flag O_NONBLOCK on /dev/random by
> commit 30c08efec888 ("random: make /dev/random be almost like
> /dev/urandom"), it seems that some of the open_source packages e.g.
> random_get_fd() of util-linux and __getrandom() of glibc. The man page
> for random() is not updated either.

Correction: I mean various open source packages are still using
O_NONBLOCK flag while accessing /dev/random
>
> Would anyone please kindly provide some background knowledge of this
> flag and it's removal? Thanks!
>
--
Best
GUO Zihua

2022-07-19 03:49:05

by Eric Biggers

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

On Mon, Jul 18, 2022 at 04:52:59PM +0800, Guozihua (Scott) wrote:
> On 2022/7/14 15:33, Guozihua (Scott) wrote:
> > Hi Community,
> >
> > Recently we noticed the removal of flag O_NONBLOCK on /dev/random by
> > commit 30c08efec888 ("random: make /dev/random be almost like
> > /dev/urandom"), it seems that some of the open_source packages e.g.
> > random_get_fd() of util-linux and __getrandom() of glibc. The man page
> > for random() is not updated either.
>
> Correction: I mean various open source packages are still using O_NONBLOCK
> flag while accessing /dev/random
> >
> > Would anyone please kindly provide some background knowledge of this
> > flag and it's removal? Thanks!
> >

This was changed a while ago, in v5.6. /dev/random no longer recognizes
O_NONBLOCK because it no longer blocks, except before the entropy pool has been
initialized.

(I don't know why O_NONBLOCK stopped being recognized *before* the entropy pool
has been initialized; it's either an oversight, or it was decided it doesn't
matter. Probably the latter, since I can't think of a real use case for using
O_NONBLOCK on /dev/random.)

The random(4) man page is indeed in need of an update, not just for this reason
but for some other reasons too.

The util-linux code which you mentioned is opening /dev/random with O_NONBLOCK
if opening /dev/urandom fails, which is pretty much pointless. Perhaps the
author thought that /dev/random with O_NONBLOCK is equivalent to /dev/urandom
(it's not). The glibc code, if you mean sysdeps/mach/hurd/getrandom.c, is
actually code written for GNU Hurd, not for Linux, so it's not relevant.

- Eric

2022-07-19 08:12:40

by GUO Zihua

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

> (I don't know why O_NONBLOCK stopped being recognized *before* the entropy pool
> has been initialized; it's either an oversight, or it was decided it doesn't
> matter. Probably the latter, since I can't think of a real use case for using
> O_NONBLOCK on /dev/random.)

Does this mean that we expect all users of /dev/random to block until it
is initialized?

--
Best
GUO Zihua

2022-07-19 11:12:18

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

Hi,

On Thu, Jul 14, 2022 at 03:33:47PM +0800, Guozihua (Scott) wrote:
> Recently we noticed the removal of flag O_NONBLOCK on /dev/random by
> commit 30c08efec888 ("random: make /dev/random be almost like
> /dev/urandom"), it seems that some of the open_source packages e.g.
> random_get_fd() of util-linux and __getrandom() of glibc. The man page
> for random() is not updated either.
>
> Would anyone please kindly provide some background knowledge of this
> flag and it's removal? Thanks!

I didn't write that code, but I assume it was done this way because it
doesn't really matter that much now, as /dev/random never blocks after
the RNG is seeded. And now a days, the RNG gets seeded with jitter
fairly quickly as a last resort, so almost nobody waits a long time.

Looking at the two examples you mentioned, the one in util-linux does
that if /dev/urandom fails, which means it's mostly unused code, and the
one in glibc is for GNU Hurd, not Linux. I did a global code search and
found a bunch of other instances pretty similar to the util-linux case,
where /dev/random in O_NONBLOCK mode is used as a fallback to
/dev/urandom, which means it's basically never used. (Amusingly one such
user of this pattern is Ted's pwgen code from way back at the turn of
the millennium.)

All together, I couldn't really find anywhere that the removal of
O_NONBLOCK semantics would actually pose a problem for, especially since
/dev/random doesn't block at all after being initialized. So I'm
slightly leaning toward the "doesn't matter, do nothing" course of
action.

But on the other hand, you did somehow notice this, so that's important
perhaps. How did you run into it? *Does* it actually pose a problem? Or
was this a mostly theoretical finding from perusing source code?
Something like the below diff would probably work and isn't too
invasive, but I think I'd prefer to leave it be unless this really did
break some userspace of yours. So please let me know.

Regards,
Jason

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 70d8d1d7e2d7..6f232ac258bf 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1347,6 +1347,10 @@ static ssize_t random_read_iter(struct kiocb *kiocb, struct iov_iter *iter)
{
int ret;

+ if (!crng_ready() &&
+ ((kiocb->ki_flags & IOCB_NOWAIT) || (kiocb->ki_filp->f_flags & O_NONBLOCK)))
+ return -EAGAIN;
+
ret = wait_for_random_bytes();
if (ret != 0)
return ret;

2022-07-21 03:54:36

by GUO Zihua

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

On 2022/7/19 19:01, Jason A. Donenfeld wrote:
> Hi,
>
> On Thu, Jul 14, 2022 at 03:33:47PM +0800, Guozihua (Scott) wrote:
>> Recently we noticed the removal of flag O_NONBLOCK on /dev/random by
>> commit 30c08efec888 ("random: make /dev/random be almost like
>> /dev/urandom"), it seems that some of the open_source packages e.g.
>> random_get_fd() of util-linux and __getrandom() of glibc. The man page
>> for random() is not updated either.
>>
>> Would anyone please kindly provide some background knowledge of this
>> flag and it's removal? Thanks!
>
> I didn't write that code, but I assume it was done this way because it
> doesn't really matter that much now, as /dev/random never blocks after
> the RNG is seeded. And now a days, the RNG gets seeded with jitter
> fairly quickly as a last resort, so almost nobody waits a long time.
>
> Looking at the two examples you mentioned, the one in util-linux does
> that if /dev/urandom fails, which means it's mostly unused code, and the
> one in glibc is for GNU Hurd, not Linux. I did a global code search and
> found a bunch of other instances pretty similar to the util-linux case,
> where /dev/random in O_NONBLOCK mode is used as a fallback to
> /dev/urandom, which means it's basically never used. (Amusingly one such
> user of this pattern is Ted's pwgen code from way back at the turn of
> the millennium.)
>
> All together, I couldn't really find anywhere that the removal of
> O_NONBLOCK semantics would actually pose a problem for, especially since
> /dev/random doesn't block at all after being initialized. So I'm
> slightly leaning toward the "doesn't matter, do nothing" course of
> action.
>
> But on the other hand, you did somehow notice this, so that's important
> perhaps. How did you run into it? *Does* it actually pose a problem? Or
> was this a mostly theoretical finding from perusing source code?
> Something like the below diff would probably work and isn't too
> invasive, but I think I'd prefer to leave it be unless this really did
> break some userspace of yours. So please let me know.
>
> Regards,
> Jason
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 70d8d1d7e2d7..6f232ac258bf 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1347,6 +1347,10 @@ static ssize_t random_read_iter(struct kiocb *kiocb, struct iov_iter *iter)
> {
> int ret;
>
> + if (!crng_ready() &&
> + ((kiocb->ki_flags & IOCB_NOWAIT) || (kiocb->ki_filp->f_flags & O_NONBLOCK)))
> + return -EAGAIN;
> +
> ret = wait_for_random_bytes();
> if (ret != 0)
> return ret;
>
> .

Hi Jason, Thanks for the respond.

The reason this comes to me is that we have an environment that is super
clean with very limited random events and with very limited random
hardware access. It would take up to 80 minutes before /dev/random is
fully initialized. I think it would be great if we can restore the
O_NONBLOCK flag.

Would you mind merge this change into mainstream or may I have the honor?

--
Best
GUO Zihua

2022-07-21 04:10:59

by Eric Biggers

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

On Thu, Jul 21, 2022 at 11:50:27AM +0800, Guozihua (Scott) wrote:
> On 2022/7/19 19:01, Jason A. Donenfeld wrote:
> > Hi,
> >
> > On Thu, Jul 14, 2022 at 03:33:47PM +0800, Guozihua (Scott) wrote:
> > > Recently we noticed the removal of flag O_NONBLOCK on /dev/random by
> > > commit 30c08efec888 ("random: make /dev/random be almost like
> > > /dev/urandom"), it seems that some of the open_source packages e.g.
> > > random_get_fd() of util-linux and __getrandom() of glibc. The man page
> > > for random() is not updated either.
> > >
> > > Would anyone please kindly provide some background knowledge of this
> > > flag and it's removal? Thanks!
> >
> > I didn't write that code, but I assume it was done this way because it
> > doesn't really matter that much now, as /dev/random never blocks after
> > the RNG is seeded. And now a days, the RNG gets seeded with jitter
> > fairly quickly as a last resort, so almost nobody waits a long time.
> >
> > Looking at the two examples you mentioned, the one in util-linux does
> > that if /dev/urandom fails, which means it's mostly unused code, and the
> > one in glibc is for GNU Hurd, not Linux. I did a global code search and
> > found a bunch of other instances pretty similar to the util-linux case,
> > where /dev/random in O_NONBLOCK mode is used as a fallback to
> > /dev/urandom, which means it's basically never used. (Amusingly one such
> > user of this pattern is Ted's pwgen code from way back at the turn of
> > the millennium.)
> >
> > All together, I couldn't really find anywhere that the removal of
> > O_NONBLOCK semantics would actually pose a problem for, especially since
> > /dev/random doesn't block at all after being initialized. So I'm
> > slightly leaning toward the "doesn't matter, do nothing" course of
> > action.
> >
> > But on the other hand, you did somehow notice this, so that's important
> > perhaps. How did you run into it? *Does* it actually pose a problem? Or
> > was this a mostly theoretical finding from perusing source code?
> > Something like the below diff would probably work and isn't too
> > invasive, but I think I'd prefer to leave it be unless this really did
> > break some userspace of yours. So please let me know.
> >
> > Regards,
> > Jason
> >
> > diff --git a/drivers/char/random.c b/drivers/char/random.c
> > index 70d8d1d7e2d7..6f232ac258bf 100644
> > --- a/drivers/char/random.c
> > +++ b/drivers/char/random.c
> > @@ -1347,6 +1347,10 @@ static ssize_t random_read_iter(struct kiocb *kiocb, struct iov_iter *iter)
> > {
> > int ret;
> > + if (!crng_ready() &&
> > + ((kiocb->ki_flags & IOCB_NOWAIT) || (kiocb->ki_filp->f_flags & O_NONBLOCK)))
> > + return -EAGAIN;
> > +
> > ret = wait_for_random_bytes();
> > if (ret != 0)
> > return ret;
> >
> > .
>
> Hi Jason, Thanks for the respond.
>
> The reason this comes to me is that we have an environment that is super
> clean with very limited random events and with very limited random hardware
> access. It would take up to 80 minutes before /dev/random is fully
> initialized. I think it would be great if we can restore the O_NONBLOCK
> flag.
>
> Would you mind merge this change into mainstream or may I have the honor?
>

Can you elaborate on how this change would actually solve a problem for you? Do
you actually have a program that is using /dev/random with O_NONBLOCK, and that
handles the EAGAIN error correctly? Just because you're seeing a program wait
for the RNG to be initialized doesn't necessarily mean that this change would
make a difference, as the program could just be reading from /dev/random without
O_NONBLOCK or calling getrandom() without GRND_NONBLOCK. The behavior of those
(more common) cases would be unchanged by Jason's proposed change.

- Eric

2022-07-21 06:46:48

by GUO Zihua

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

On 2022/7/21 12:07, Eric Biggers wrote:
> On Thu, Jul 21, 2022 at 11:50:27AM +0800, Guozihua (Scott) wrote:
>> On 2022/7/19 19:01, Jason A. Donenfeld wrote:
>>> Hi,
>>>
>>> On Thu, Jul 14, 2022 at 03:33:47PM +0800, Guozihua (Scott) wrote:
>>>> Recently we noticed the removal of flag O_NONBLOCK on /dev/random by
>>>> commit 30c08efec888 ("random: make /dev/random be almost like
>>>> /dev/urandom"), it seems that some of the open_source packages e.g.
>>>> random_get_fd() of util-linux and __getrandom() of glibc. The man page
>>>> for random() is not updated either.
>>>>
>>>> Would anyone please kindly provide some background knowledge of this
>>>> flag and it's removal? Thanks!
>>>
>>> I didn't write that code, but I assume it was done this way because it
>>> doesn't really matter that much now, as /dev/random never blocks after
>>> the RNG is seeded. And now a days, the RNG gets seeded with jitter
>>> fairly quickly as a last resort, so almost nobody waits a long time.
>>>
>>> Looking at the two examples you mentioned, the one in util-linux does
>>> that if /dev/urandom fails, which means it's mostly unused code, and the
>>> one in glibc is for GNU Hurd, not Linux. I did a global code search and
>>> found a bunch of other instances pretty similar to the util-linux case,
>>> where /dev/random in O_NONBLOCK mode is used as a fallback to
>>> /dev/urandom, which means it's basically never used. (Amusingly one such
>>> user of this pattern is Ted's pwgen code from way back at the turn of
>>> the millennium.)
>>>
>>> All together, I couldn't really find anywhere that the removal of
>>> O_NONBLOCK semantics would actually pose a problem for, especially since
>>> /dev/random doesn't block at all after being initialized. So I'm
>>> slightly leaning toward the "doesn't matter, do nothing" course of
>>> action.
>>>
>>> But on the other hand, you did somehow notice this, so that's important
>>> perhaps. How did you run into it? *Does* it actually pose a problem? Or
>>> was this a mostly theoretical finding from perusing source code?
>>> Something like the below diff would probably work and isn't too
>>> invasive, but I think I'd prefer to leave it be unless this really did
>>> break some userspace of yours. So please let me know.
>>>
>>> Regards,
>>> Jason
>>>
>>> diff --git a/drivers/char/random.c b/drivers/char/random.c
>>> index 70d8d1d7e2d7..6f232ac258bf 100644
>>> --- a/drivers/char/random.c
>>> +++ b/drivers/char/random.c
>>> @@ -1347,6 +1347,10 @@ static ssize_t random_read_iter(struct kiocb *kiocb, struct iov_iter *iter)
>>> {
>>> int ret;
>>> + if (!crng_ready() &&
>>> + ((kiocb->ki_flags & IOCB_NOWAIT) || (kiocb->ki_filp->f_flags & O_NONBLOCK)))
>>> + return -EAGAIN;
>>> +
>>> ret = wait_for_random_bytes();
>>> if (ret != 0)
>>> return ret;
>>>
>>> .
>>
>> Hi Jason, Thanks for the respond.
>>
>> The reason this comes to me is that we have an environment that is super
>> clean with very limited random events and with very limited random hardware
>> access. It would take up to 80 minutes before /dev/random is fully
>> initialized. I think it would be great if we can restore the O_NONBLOCK
>> flag.
>>
>> Would you mind merge this change into mainstream or may I have the honor?
>>
>
> Can you elaborate on how this change would actually solve a problem for you? Do
> you actually have a program that is using /dev/random with O_NONBLOCK, and that
> handles the EAGAIN error correctly? Just because you're seeing a program wait
> for the RNG to be initialized doesn't necessarily mean that this change would
> make a difference, as the program could just be reading from /dev/random without
> O_NONBLOCK or calling getrandom() without GRND_NONBLOCK. The behavior of those
> (more common) cases would be unchanged by Jason's proposed change.
>
> - Eric
> .

Hi Eric

We have a userspace program that starts pretty early in the boot process
and it tries to fetch random bits from /dev/random with O_NONBLOCK, if
that returns -EAGAIN, it turns to /dev/urandom. Is this a correct
handling of -EAGAIN? Or this is not one of the intended use case of
O_NONBLOCK?

--
Best
GUO Zihua

2022-07-21 06:51:41

by Eric Biggers

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

On Thu, Jul 21, 2022 at 02:44:54PM +0800, Guozihua (Scott) wrote:
>
> Hi Eric
>
> We have a userspace program that starts pretty early in the boot process and
> it tries to fetch random bits from /dev/random with O_NONBLOCK, if that
> returns -EAGAIN, it turns to /dev/urandom. Is this a correct handling of
> -EAGAIN? Or this is not one of the intended use case of O_NONBLOCK?

That doesn't make any sense; you should just use /dev/urandom unconditionally.

- Eric

2022-07-21 10:41:53

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

Hi Guozihua,

On Wed, Jul 20, 2022 at 11:50:46PM -0700, Eric Biggers wrote:
> On Thu, Jul 21, 2022 at 02:44:54PM +0800, Guozihua (Scott) wrote:
> >
> > Hi Eric
> >
> > We have a userspace program that starts pretty early in the boot process and
> > it tries to fetch random bits from /dev/random with O_NONBLOCK, if that
> > returns -EAGAIN, it turns to /dev/urandom. Is this a correct handling of
> > -EAGAIN? Or this is not one of the intended use case of O_NONBLOCK?
>
> That doesn't make any sense; you should just use /dev/urandom unconditionally.

What Eric said: this flow doesn't really make sense. Why not use
/dev/urandom unconditionally or getrandom(GRND_INSECURE)?

But also I have to wonder: you wrote '-EAGAIN' but usually userspace
checks errno==EAGAIN, a positive value. That makes me wonder whether you
wrote your email with your code is open. So I just wanted to triple
check that what you've described is actually what the code is doing,
just in case there's some ambiguity.

I'm just trying to find out what this code is and where it is to assess
whether we change the userspace behavior again, given that this has been
sitting for several years now.

Jason

2022-07-21 11:31:03

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

On Thu, Jul 21, 2022 at 02:44:54PM +0800, Guozihua (Scott) wrote:
>
> We have a userspace program that starts pretty early in the boot process and
> it tries to fetch random bits from /dev/random with O_NONBLOCK, if that
> returns -EAGAIN, it turns to /dev/urandom. Is this a correct handling of
> -EAGAIN? Or this is not one of the intended use case of O_NONBLOCK?

In addition to the good points which Eric and Jason have raised, the
other thing I would ask you is ***why*** is your userspace program
trying to fetch random bits early in the boot process? Is it, say,
trying to generate a cryptographic key which is security critical. If
so, then DON'T DO THAT.

There have been plenty of really embarrassing security problems caused
by consumer grade products who generate a public/private key pair
within seconds of the customer taking the product out of the box, and
plugging it into the wall for the first time. At which point,
hilarity ensues, unless the box is life- or mission- critical, in
which case tragedy ensues....

Is it possible to move the userspace program so it's not being started
early in the boot process? What is it doing, and why does it need
random data in the first place?

- Ted

2022-07-21 11:40:23

by GUO Zihua

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

On 2022/7/21 19:09, Theodore Ts'o wrote:
> On Thu, Jul 21, 2022 at 02:44:54PM +0800, Guozihua (Scott) wrote:
> >
>> We have a userspace program that starts pretty early in the boot process and
>> it tries to fetch random bits from /dev/random with O_NONBLOCK, if that
>> returns -EAGAIN, it turns to /dev/urandom. Is this a correct handling of
>> -EAGAIN? Or this is not one of the intended use case of O_NONBLOCK?
>
> In addition to the good points which Eric and Jason have raised, the
> other thing I would ask you is ***why*** is your userspace program
> trying to fetch random bits early in the boot process? Is it, say,
> trying to generate a cryptographic key which is security critical. If
> so, then DON'T DO THAT.
>
> There have been plenty of really embarrassing security problems caused
> by consumer grade products who generate a public/private key pair
> within seconds of the customer taking the product out of the box, and
> plugging it into the wall for the first time. At which point,
> hilarity ensues, unless the box is life- or mission- critical, in
> which case tragedy ensues....
>
> Is it possible to move the userspace program so it's not being started
> early in the boot process? What is it doing, and why does it need
> random data in the first place?
>
> - Ted
> .

Hi Ted,

Thanks for the comment. The code is not written by me, but I think you
made a good point here and I'll definitely bring this up to the author.

--
Best
GUO Zihua

2022-07-21 11:44:37

by GUO Zihua

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

On 2022/7/21 18:37, Jason A. Donenfeld wrote:
> Hi Guozihua,
>
> On Wed, Jul 20, 2022 at 11:50:46PM -0700, Eric Biggers wrote:
>> On Thu, Jul 21, 2022 at 02:44:54PM +0800, Guozihua (Scott) wrote:
>>
>> That doesn't make any sense; you should just use /dev/urandom unconditionally.
>
> What Eric said: this flow doesn't really make sense. Why not use
> /dev/urandom unconditionally or getrandom(GRND_INSECURE)?
>
> But also I have to wonder: you wrote '-EAGAIN' but usually userspace
> checks errno==EAGAIN, a positive value. That makes me wonder whether you
> wrote your email with your code is open. So I just wanted to triple
> check that what you've described is actually what the code is doing,
> just in case there's some ambiguity.
>
> I'm just trying to find out what this code is and where it is to assess
> whether we change the userspace behavior again, given that this has been
> sitting for several years now.
>
> Jason
> .

Hi Jason and Eric.

To clarify, the code in question is not written by me and I did not see
the code myself, the code is from another team. We discovered this
change during the test when we try to run our userspace program on a
newer version kernel, and it blocks for a long time during the boot
process. It seems that the author use the -EAGAIN error code as an
indication that /dev/random is not ready and they implemented a "best
effort" mechanism in terms of getting random data.

Honestly speaking I don't know what they are using those random data
for, and I am trying to get some background knowledge for this flag and
the change, maybe figure out whether that team is using the flag as
intended, and bring this up with them.

--
Best
GUO Zihua

2022-07-26 07:47:19

by GUO Zihua

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

On 2022/7/21 18:37, Jason A. Donenfeld wrote:
> Hi Guozihua,
>
> On Wed, Jul 20, 2022 at 11:50:46PM -0700, Eric Biggers wrote:
>> On Thu, Jul 21, 2022 at 02:44:54PM +0800, Guozihua (Scott) wrote:
>>>
>>> Hi Eric
>>>
>>> We have a userspace program that starts pretty early in the boot process and
>>> it tries to fetch random bits from /dev/random with O_NONBLOCK, if that
>>> returns -EAGAIN, it turns to /dev/urandom. Is this a correct handling of
>>> -EAGAIN? Or this is not one of the intended use case of O_NONBLOCK?
>>
>> That doesn't make any sense; you should just use /dev/urandom unconditionally.
>
> What Eric said: this flow doesn't really make sense. Why not use
> /dev/urandom unconditionally or getrandom(GRND_INSECURE)?
>
> But also I have to wonder: you wrote '-EAGAIN' but usually userspace
> checks errno==EAGAIN, a positive value. That makes me wonder whether you
> wrote your email with your code is open. So I just wanted to triple
> check that what you've described is actually what the code is doing,
> just in case there's some ambiguity.
>
> I'm just trying to find out what this code is and where it is to assess
> whether we change the userspace behavior again, given that this has been
> sitting for several years now.
>
> Jason
> .

Hi Jason, Eric and Theodore,

Thanks for all the comments on this inquiry. Does the community has any
channel to publishes changes like these? And will the man pages get
updated? If so, are there any time frame?

--
Best
GUO Zihua

2022-07-26 11:12:15

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

Hi,

On Tue, Jul 26, 2022 at 03:43:31PM +0800, Guozihua (Scott) wrote:
> Thanks for all the comments on this inquiry. Does the community has any
> channel to publishes changes like these? And will the man pages get
> updated? If so, are there any time frame?

I was under the impression you were ultimately okay with the status quo.
Have I misunderstood you?

Thanks,
Jason

2022-07-26 11:36:38

by GUO Zihua

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

On 2022/7/26 19:08, Jason A. Donenfeld wrote:
> Hi,
>
> On Tue, Jul 26, 2022 at 03:43:31PM +0800, Guozihua (Scott) wrote:
>> Thanks for all the comments on this inquiry. Does the community has any
>> channel to publishes changes like these? And will the man pages get
>> updated? If so, are there any time frame?
>
> I was under the impression you were ultimately okay with the status quo.
> Have I misunderstood you?
>
> Thanks,
> Jason
> .

Hi Jason.

To clarify, I does not have any issue with this change. I asked here
only because I would like some background knowledge on this flag, to
ensure I am on the same page as the community regarding this flag and
the change. And it seems that I understands it correctly.

However I do think it's a good idea to update the document soon to avoid
any misunderstanding in the future.

--
Best
GUO Zihua

2022-07-28 08:35:50

by GUO Zihua

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

On 2022/7/26 19:33, Guozihua (Scott) wrote:
> On 2022/7/26 19:08, Jason A. Donenfeld wrote:
>> Hi,
>>
>> On Tue, Jul 26, 2022 at 03:43:31PM +0800, Guozihua (Scott) wrote:
>>> Thanks for all the comments on this inquiry. Does the community has any
>>> channel to publishes changes like these? And will the man pages get
>>> updated? If so, are there any time frame?
>>
>> I was under the impression you were ultimately okay with the status quo.
>> Have I misunderstood you?
>>
>> Thanks,
>> Jason
>> .
>
> Hi Jason.
>
> To clarify, I does not have any issue with this change. I asked here
> only because I would like some background knowledge on this flag, to
> ensure I am on the same page as the community regarding this flag and
> the change. And it seems that I understands it correctly.
>
> However I do think it's a good idea to update the document soon to avoid
> any misunderstanding in the future.
>

Our colleague suggests that we should inform users clearly about the
change on the flag by returning -EINVAL when /dev/random gets this flag
during boot process. Otherwise programs might silently block for a long
time, causing other issues. Do you think this is a good way to prevent
similar issues on this flag?

--
Best
GUO Zihua

2022-09-06 07:26:34

by GUO Zihua

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

On 2022/7/28 16:24, Guozihua (Scott) wrote:
> On 2022/7/26 19:33, Guozihua (Scott) wrote:
>> On 2022/7/26 19:08, Jason A. Donenfeld wrote:
>>> Hi,
>>>
>>> On Tue, Jul 26, 2022 at 03:43:31PM +0800, Guozihua (Scott) wrote:
>>>> Thanks for all the comments on this inquiry. Does the community has any
>>>> channel to publishes changes like these? And will the man pages get
>>>> updated? If so, are there any time frame?
>>>
>>> I was under the impression you were ultimately okay with the status quo.
>>> Have I misunderstood you?
>>>
>>> Thanks,
>>> Jason
>>> .
>>
>> Hi Jason.
>>
>> To clarify, I does not have any issue with this change. I asked here
>> only because I would like some background knowledge on this flag, to
>> ensure I am on the same page as the community regarding this flag and
>> the change. And it seems that I understands it correctly.
>>
>> However I do think it's a good idea to update the document soon to
>> avoid any misunderstanding in the future.
>>
>
> Our colleague suggests that we should inform users clearly about the
> change on the flag by returning -EINVAL when /dev/random gets this flag
> during boot process. Otherwise programs might silently block for a long
> time, causing other issues. Do you think this is a good way to prevent
> similar issues on this flag?
>
Hi,

Any comment on this?

--
Best
GUO Zihua

2022-09-06 10:21:25

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

On Thu, Jul 28, 2022 at 10:25 AM Guozihua (Scott) <[email protected]> wrote:
>
> On 2022/7/26 19:33, Guozihua (Scott) wrote:
> > On 2022/7/26 19:08, Jason A. Donenfeld wrote:
> >> Hi,
> >>
> >> On Tue, Jul 26, 2022 at 03:43:31PM +0800, Guozihua (Scott) wrote:
> >>> Thanks for all the comments on this inquiry. Does the community has any
> >>> channel to publishes changes like these? And will the man pages get
> >>> updated? If so, are there any time frame?
> >>
> >> I was under the impression you were ultimately okay with the status quo.
> >> Have I misunderstood you?
> >>
> >> Thanks,
> >> Jason
> >> .
> >
> > Hi Jason.
> >
> > To clarify, I does not have any issue with this change. I asked here
> > only because I would like some background knowledge on this flag, to
> > ensure I am on the same page as the community regarding this flag and
> > the change. And it seems that I understands it correctly.
> >
> > However I do think it's a good idea to update the document soon to avoid
> > any misunderstanding in the future.
> >
>
> Our colleague suggests that we should inform users clearly about the
> change on the flag by returning -EINVAL when /dev/random gets this flag
> during boot process. Otherwise programs might silently block for a long
> time, causing other issues. Do you think this is a good way to prevent
> similar issues on this flag?

I still don't really understand what you want. First you said this was
a problem and we should reintroduce the old behavior. Then you said no
big deal and the docs just needed to be updated. Now you're saying
this is a problem and we should reintroduce the old behavior?

I'm just a bit lost on where we were in the conversation.

Also, could you let me know whether this is affecting real things for
Huawei, or if this is just something you happened to notice but
doesn't have any practical impact?

Jason

2022-09-07 13:05:14

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

On Tue, Sep 06, 2022 at 12:16:56PM +0200, Jason A. Donenfeld wrote:
> On Thu, Jul 28, 2022 at 10:25 AM Guozihua (Scott) <[email protected]> wrote:
> >
> > On 2022/7/26 19:33, Guozihua (Scott) wrote:
> > > On 2022/7/26 19:08, Jason A. Donenfeld wrote:
> > >> Hi,
> > >>
> > >> On Tue, Jul 26, 2022 at 03:43:31PM +0800, Guozihua (Scott) wrote:
> > >>> Thanks for all the comments on this inquiry. Does the community has any
> > >>> channel to publishes changes like these? And will the man pages get
> > >>> updated? If so, are there any time frame?
> > >>
> > >> I was under the impression you were ultimately okay with the status quo.
> > >> Have I misunderstood you?
> > >>
> > >> Thanks,
> > >> Jason
> > >> .
> > >
> > > Hi Jason.
> > >
> > > To clarify, I does not have any issue with this change. I asked here
> > > only because I would like some background knowledge on this flag, to
> > > ensure I am on the same page as the community regarding this flag and
> > > the change. And it seems that I understands it correctly.
> > >
> > > However I do think it's a good idea to update the document soon to avoid
> > > any misunderstanding in the future.
> > >
> >
> > Our colleague suggests that we should inform users clearly about the
> > change on the flag by returning -EINVAL when /dev/random gets this flag
> > during boot process. Otherwise programs might silently block for a long
> > time, causing other issues. Do you think this is a good way to prevent
> > similar issues on this flag?
>
> I still don't really understand what you want. First you said this was
> a problem and we should reintroduce the old behavior. Then you said no
> big deal and the docs just needed to be updated. Now you're saying
> this is a problem and we should reintroduce the old behavior?
>
> I'm just a bit lost on where we were in the conversation.
>
> Also, could you let me know whether this is affecting real things for
> Huawei, or if this is just something you happened to notice but
> doesn't have any practical impact?

Just following up on this again...

2022-09-08 04:06:30

by GUO Zihua

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

On 2022/9/7 21:03, Jason A. Donenfeld wrote:
> On Tue, Sep 06, 2022 at 12:16:56PM +0200, Jason A. Donenfeld wrote:
>> On Thu, Jul 28, 2022 at 10:25 AM Guozihua (Scott) <[email protected]> wrote:
>>>
>>> On 2022/7/26 19:33, Guozihua (Scott) wrote:
>>>> On 2022/7/26 19:08, Jason A. Donenfeld wrote:
>>>>> Hi,
>>>>>
>>>>> On Tue, Jul 26, 2022 at 03:43:31PM +0800, Guozihua (Scott) wrote:
>>>>>> Thanks for all the comments on this inquiry. Does the community has any
>>>>>> channel to publishes changes like these? And will the man pages get
>>>>>> updated? If so, are there any time frame?
>>>>>
>>>>> I was under the impression you were ultimately okay with the status quo.
>>>>> Have I misunderstood you?
>>>>>
>>>>> Thanks,
>>>>> Jason
>>>>> .
>>>>
>>>> Hi Jason.
>>>>
>>>> To clarify, I does not have any issue with this change. I asked here
>>>> only because I would like some background knowledge on this flag, to
>>>> ensure I am on the same page as the community regarding this flag and
>>>> the change. And it seems that I understands it correctly.
>>>>
>>>> However I do think it's a good idea to update the document soon to avoid
>>>> any misunderstanding in the future.
>>>>
>>>
>>> Our colleague suggests that we should inform users clearly about the
>>> change on the flag by returning -EINVAL when /dev/random gets this flag
>>> during boot process. Otherwise programs might silently block for a long
>>> time, causing other issues. Do you think this is a good way to prevent
>>> similar issues on this flag?
>>
>> I still don't really understand what you want. First you said this was
>> a problem and we should reintroduce the old behavior. Then you said no
>> big deal and the docs just needed to be updated. Now you're saying
>> this is a problem and we should reintroduce the old behavior?
>>
>> I'm just a bit lost on where we were in the conversation.
>>
>> Also, could you let me know whether this is affecting real things for
>> Huawei, or if this is just something you happened to notice but
>> doesn't have any practical impact?
>
> Just following up on this again...
> .

Hi Jason,

Thank you for the timely respond and your patient. And sorry for the
confusion.

First of all, what we think is that this change (removing O_NONBLOCK) is
reasonable. However, this do cause issue during the test on one of our
product which uses O_NONBLOCK flag the way I presented earlier in the
Linux 4.4 era. Thus our colleague suggests that returning -EINVAL when
this flag is received would be a good way to indicate this change.

For example:


--
Best
GUO Zihua

--
Best
GUO Zihua

2022-09-08 10:01:48

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

Hi,

On Thu, Sep 08, 2022 at 11:31:31AM +0800, Guozihua (Scott) wrote:
> For example:
>
>
> --
> Best
> GUO Zihua
>
> --
> Best
> GUO Zihua

Looks like you forgot to paste the example...

> Thank you for the timely respond and your patient. And sorry for the
> confusion.
>
> First of all, what we think is that this change (removing O_NONBLOCK) is
> reasonable. However, this do cause issue during the test on one of our
> product which uses O_NONBLOCK flag the way I presented earlier in the
> Linux 4.4 era. Thus our colleague suggests that returning -EINVAL when
> this flag is received would be a good way to indicate this change.

No, I don't think it's wise to introduce yet *new* behavior (your
proposed -EINVAL). That would just exacerbate the (mostly) invisible
breakage from the 5.6-era change.

The question now before us is whether to bring back the behavior that
was there pre-5.6, or to keep the behavior that has existed since 5.6.
Accidental regressions like this (I assume it was accidental, at least)
that are unnoticed for so long tend to ossify and become the new
expected behavior. It's been around 2.5 years since 5.6, and this is the
first report of breakage. But the fact that it does break things for you
*is* still significant.

If this was just something you noticed during idle curiosity but doesn't
have a real impact on anything, then I'm inclined to think we shouldn't
go changing the behavior /again/ after 2.5 years. But it sounds like
actually you have a real user space in a product that stopped working
when you tried to upgrade the kernel from 4.4 to one >5.6. If this is
the case, then this sounds truly like a userspace-breaking regression,
which we should fix by restoring the old behavior. Can you confirm this
is the case? And in the meantime, I'll prepare a patch for restoring
that old behavior.

Jason

2022-09-08 10:44:17

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

Hey Al,

I'm CC'ing you into this thread, as you might have an opinion on the
matter. I also have a few odd questions and observations about
implementing this now that we use iters.

On Thu, Sep 08, 2022 at 11:51:52AM +0200, Jason A. Donenfeld wrote:
> The question now before us is whether to bring back the behavior that
> was there pre-5.6, or to keep the behavior that has existed since 5.6.
> Accidental regressions like this (I assume it was accidental, at least)
> that are unnoticed for so long tend to ossify and become the new
> expected behavior. It's been around 2.5 years since 5.6, and this is the
> first report of breakage. But the fact that it does break things for you
> *is* still significant.
>
> If this was just something you noticed during idle curiosity but doesn't
> have a real impact on anything, then I'm inclined to think we shouldn't
> go changing the behavior /again/ after 2.5 years. But it sounds like
> actually you have a real user space in a product that stopped working
> when you tried to upgrade the kernel from 4.4 to one >5.6. If this is
> the case, then this sounds truly like a userspace-breaking regression,
> which we should fix by restoring the old behavior. Can you confirm this
> is the case? And in the meantime, I'll prepare a patch for restoring
> that old behavior.

The tl;dr of the thread is that kernels before 5.6 would return -EAGAIN
when reading from /dev/random during early boot if the fd was opened
with O_NONBLOCK. Some refactoring done 2.5 years ago evidently removed
this, and so we're now getting a report that this might have broken
something. One question is whether to fix the regression, or if 2.5
years is time enough for ossification. But assuming it should be fixed,
I started looking into implementing that.

The most straight forward approach to directly handle the regression
would be just doing this:

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 79d7d4e4e582..09c944dce7ff 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1347,6 +1347,9 @@ static ssize_t random_read_iter(struct kiocb *kiocb, struct iov_iter *iter)
{
int ret;

+ if (!crng_ready() && (kiocb->ki_filp->f_flags & O_NONBLOCK))
+ return -EAGAIN;
+
ret = wait_for_random_bytes();
if (ret != 0)
return ret;

So that works. But then I started looking at the other knobs in kiocb
and in the fmode interface in general and landed in a world of
confusion. There are a few other things that might be done:

- Also check `kiocb->ki_flags & IOCB_NOWAIT`.
- Also check `kiocb->ki_flags & IOCB_NOIO`.
- Set the `FMODE_NOWAIT` when declaring the file.
- Other weird aio things?

Do any of these make sense to do?

I started playing with userspace programs to try and trigger some of
those ki_flags, and I saw that the preadv2(2) syscall has the RWF_NOWAIT
flag, so I coded that up and nothing happened. I went grepping and found
some things:

In linux/fs.h:
#define IOCB_NOWAIT (__force int) RWF_NOWAIT
so these are intended to be the same. That seems reflected too in
overlayfs/file.c:
if (ifl & IOCB_NOWAIT)
flags |= RWF_NOWAIT;
Then later in linux/fs.h, it seems like RWF_NOWAIT is *also* enabling
IOCB_NOIO:
if (flags & RWF_NOWAIT) {
if (!(ki->ki_filp->f_mode & FMODE_NOWAIT))
return -EOPNOTSUPP;
kiocb_flags |= IOCB_NOIO;
}
kiocb_flags |= (__force int) (flags & RWF_SUPPORTED);
But then most of the places involved with IOCB_NOIO are also checking
IOCB_NOWAIT at the same time. Except for one place in filemap?

So it's not really clear to me what the intended behavior is of these,
and I'm wondering if we're hitting another no_llseek-like redundancy
again here, or if something else is up. Or, more likely, I'm just
overwhelmed by the undocumented subtleties here.

Jason

2022-09-08 14:28:22

by Jason A. Donenfeld

[permalink] [raw]
Subject: [PATCH] random: restore O_NONBLOCK support

Prior to 5.6, when /dev/random was opened with O_NONBLOCK, it would
return -EAGAIN if there was no entropy. When the pools were unified in
5.6, this was lost. The post 5.6 behavior of blocking until the pool is
initialized, and ignoring O_NONBLOCK in the process, went unnoticed,
with no reports about the regression received for two and a half years.
However, eventually this indeed did break somebody's userspace.

So we restore the old behavior, by returning -EAGAIN if the pool is not
initialized. Unlike the old /dev/random, this can only occur during
early boot, after which it never blocks again.

In order to make this O_NONBLOCK behavior consistent with other
expectations, also respect users reading with preadv2(RWF_NOWAIT) and
similar.

Fixes: 30c08efec888 ("random: make /dev/random be almost like /dev/urandom")
Reported-by: Guozihua <[email protected]>
Reported-by: Zhongguohua <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Cc: Andrew Lutomirski <[email protected]>
Cc: [email protected]
Signed-off-by: Jason A. Donenfeld <[email protected]>
---
drivers/char/mem.c | 4 ++--
drivers/char/random.c | 5 +++++
2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 84ca98ed1dad..c2b37009b11e 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -706,8 +706,8 @@ static const struct memdev {
#endif
[5] = { "zero", 0666, &zero_fops, FMODE_NOWAIT },
[7] = { "full", 0666, &full_fops, 0 },
- [8] = { "random", 0666, &random_fops, 0 },
- [9] = { "urandom", 0666, &urandom_fops, 0 },
+ [8] = { "random", 0666, &random_fops, FMODE_NOWAIT },
+ [9] = { "urandom", 0666, &urandom_fops, FMODE_NOWAIT },
#ifdef CONFIG_PRINTK
[11] = { "kmsg", 0644, &kmsg_fops, 0 },
#endif
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 79d7d4e4e582..c8cc23515568 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1347,6 +1347,11 @@ static ssize_t random_read_iter(struct kiocb *kiocb, struct iov_iter *iter)
{
int ret;

+ if (!crng_ready() &&
+ ((kiocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO)) ||
+ (kiocb->ki_filp->f_flags & O_NONBLOCK)))
+ return -EAGAIN;
+
ret = wait_for_random_bytes();
if (ret != 0)
return ret;
--
2.37.3

2022-09-19 10:52:13

by GUO Zihua

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

On 2022/9/8 17:51, Jason A. Donenfeld wrote:
> Hi,
>
> On Thu, Sep 08, 2022 at 11:31:31AM +0800, Guozihua (Scott) wrote:
>> For example:
>>
>>
>> --
>> Best
>> GUO Zihua
>>
>> --
>> Best
>> GUO Zihua
>
> Looks like you forgot to paste the example...
>
>> Thank you for the timely respond and your patient. And sorry for the
>> confusion.
>>
>> First of all, what we think is that this change (removing O_NONBLOCK) is
>> reasonable. However, this do cause issue during the test on one of our
>> product which uses O_NONBLOCK flag the way I presented earlier in the
>> Linux 4.4 era. Thus our colleague suggests that returning -EINVAL when
>> this flag is received would be a good way to indicate this change.
>
> No, I don't think it's wise to introduce yet *new* behavior (your
> proposed -EINVAL). That would just exacerbate the (mostly) invisible
> breakage from the 5.6-era change.
>
> The question now before us is whether to bring back the behavior that
> was there pre-5.6, or to keep the behavior that has existed since 5.6.
> Accidental regressions like this (I assume it was accidental, at least)
> that are unnoticed for so long tend to ossify and become the new
> expected behavior. It's been around 2.5 years since 5.6, and this is the
> first report of breakage. But the fact that it does break things for you
> *is* still significant.
>
> If this was just something you noticed during idle curiosity but doesn't
> have a real impact on anything, then I'm inclined to think we shouldn't
> go changing the behavior /again/ after 2.5 years. But it sounds like
> actually you have a real user space in a product that stopped working
> when you tried to upgrade the kernel from 4.4 to one >5.6. If this is
> the case, then this sounds truly like a userspace-breaking regression,
> which we should fix by restoring the old behavior. Can you confirm this
> is the case? And in the meantime, I'll prepare a patch for restoring
> that old behavior.
>
> Jason
> .

Hi Jason

Thank for your patience.

To answer your question, yes, we do have a userspace program reading
/dev/random during early boot which relies on O_NONBLOCK. And this
change do breaks it. The userspace program comes from 4.4 era, and as
4.4 is going EOL, we are switching to 5.10 and the breakage is reported.

It would be great if the kernel is able to restore this flag for
backward compatibility.

--
Best
GUO Zihua

2022-09-19 10:52:15

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

Hi,

On Mon, Sep 19, 2022 at 11:27 AM Guozihua (Scott) <[email protected]> wrote:
>
> On 2022/9/8 17:51, Jason A. Donenfeld wrote:
> > Hi,
> >
> > On Thu, Sep 08, 2022 at 11:31:31AM +0800, Guozihua (Scott) wrote:
> >> For example:
> >>
> >>
> >> --
> >> Best
> >> GUO Zihua
> >>
> >> --
> >> Best
> >> GUO Zihua
> >
> > Looks like you forgot to paste the example...
> >
> >> Thank you for the timely respond and your patient. And sorry for the
> >> confusion.
> >>
> >> First of all, what we think is that this change (removing O_NONBLOCK) is
> >> reasonable. However, this do cause issue during the test on one of our
> >> product which uses O_NONBLOCK flag the way I presented earlier in the
> >> Linux 4.4 era. Thus our colleague suggests that returning -EINVAL when
> >> this flag is received would be a good way to indicate this change.
> >
> > No, I don't think it's wise to introduce yet *new* behavior (your
> > proposed -EINVAL). That would just exacerbate the (mostly) invisible
> > breakage from the 5.6-era change.
> >
> > The question now before us is whether to bring back the behavior that
> > was there pre-5.6, or to keep the behavior that has existed since 5.6.
> > Accidental regressions like this (I assume it was accidental, at least)
> > that are unnoticed for so long tend to ossify and become the new
> > expected behavior. It's been around 2.5 years since 5.6, and this is the
> > first report of breakage. But the fact that it does break things for you
> > *is* still significant.
> >
> > If this was just something you noticed during idle curiosity but doesn't
> > have a real impact on anything, then I'm inclined to think we shouldn't
> > go changing the behavior /again/ after 2.5 years. But it sounds like
> > actually you have a real user space in a product that stopped working
> > when you tried to upgrade the kernel from 4.4 to one >5.6. If this is
> > the case, then this sounds truly like a userspace-breaking regression,
> > which we should fix by restoring the old behavior. Can you confirm this
> > is the case? And in the meantime, I'll prepare a patch for restoring
> > that old behavior.
> >
> > Jason
> > .
>
> Hi Jason
>
> Thank for your patience.
>
> To answer your question, yes, we do have a userspace program reading
> /dev/random during early boot which relies on O_NONBLOCK. And this
> change do breaks it. The userspace program comes from 4.4 era, and as
> 4.4 is going EOL, we are switching to 5.10 and the breakage is reported.
>
> It would be great if the kernel is able to restore this flag for
> backward compatibility.

Alright then. Sounds like a clear case of userspace being broken. I'll
include https://git.zx2c4.com/linux-rng/commit/?id=b931eaf6ef5cef474a1171542a872a5e270e3491
or similar in my pull for 6.1, if that's okay with you. For 6.0, we're
already at rc6, so maybe better to let this one stew for a bit longer,
given the change, unless you feel strongly about having it earlier, I
guess.

Jason

2022-09-19 10:53:07

by GUO Zihua

[permalink] [raw]
Subject: Re: Inquiry about the removal of flag O_NONBLOCK on /dev/random

On 2022/9/19 18:40, Jason A. Donenfeld wrote:
> Hi,
>
> On Mon, Sep 19, 2022 at 11:27 AM Guozihua (Scott) <[email protected]> wrote:
>>
>> On 2022/9/8 17:51, Jason A. Donenfeld wrote:
>>> Hi,
>>>
>>> On Thu, Sep 08, 2022 at 11:31:31AM +0800, Guozihua (Scott) wrote:
>>>> For example:
>>>>
>>>>
>>>> --
>>>> Best
>>>> GUO Zihua
>>>>
>>>> --
>>>> Best
>>>> GUO Zihua
>>>
>>> Looks like you forgot to paste the example...
>>>
>>>> Thank you for the timely respond and your patient. And sorry for the
>>>> confusion.
>>>>
>>>> First of all, what we think is that this change (removing O_NONBLOCK) is
>>>> reasonable. However, this do cause issue during the test on one of our
>>>> product which uses O_NONBLOCK flag the way I presented earlier in the
>>>> Linux 4.4 era. Thus our colleague suggests that returning -EINVAL when
>>>> this flag is received would be a good way to indicate this change.
>>>
>>> No, I don't think it's wise to introduce yet *new* behavior (your
>>> proposed -EINVAL). That would just exacerbate the (mostly) invisible
>>> breakage from the 5.6-era change.
>>>
>>> The question now before us is whether to bring back the behavior that
>>> was there pre-5.6, or to keep the behavior that has existed since 5.6.
>>> Accidental regressions like this (I assume it was accidental, at least)
>>> that are unnoticed for so long tend to ossify and become the new
>>> expected behavior. It's been around 2.5 years since 5.6, and this is the
>>> first report of breakage. But the fact that it does break things for you
>>> *is* still significant.
>>>
>>> If this was just something you noticed during idle curiosity but doesn't
>>> have a real impact on anything, then I'm inclined to think we shouldn't
>>> go changing the behavior /again/ after 2.5 years. But it sounds like
>>> actually you have a real user space in a product that stopped working
>>> when you tried to upgrade the kernel from 4.4 to one >5.6. If this is
>>> the case, then this sounds truly like a userspace-breaking regression,
>>> which we should fix by restoring the old behavior. Can you confirm this
>>> is the case? And in the meantime, I'll prepare a patch for restoring
>>> that old behavior.
>>>
>>> Jason
>>> .
>>
>> Hi Jason
>>
>> Thank for your patience.
>>
>> To answer your question, yes, we do have a userspace program reading
>> /dev/random during early boot which relies on O_NONBLOCK. And this
>> change do breaks it. The userspace program comes from 4.4 era, and as
>> 4.4 is going EOL, we are switching to 5.10 and the breakage is reported.
>>
>> It would be great if the kernel is able to restore this flag for
>> backward compatibility.
>
> Alright then. Sounds like a clear case of userspace being broken. I'll
> include https://git.zx2c4.com/linux-rng/commit/?id=b931eaf6ef5cef474a1171542a872a5e270e3491
> or similar in my pull for 6.1, if that's okay with you. For 6.0, we're
> already at rc6, so maybe better to let this one stew for a bit longer,
> given the change, unless you feel strongly about having it earlier, I
> guess.
>
> Jason
> .

Hi Jason

That's OK with us. Thanks.

--
Best
GUO Zihua