This reverts commit 18b915ac6b0ac5ba7ded03156860f60a9f16df2b.
When CONFIG_RANDOM_TRUST_BOOTLOADER is enabled add_bootloader_randomness()
calls add_hwgenerator_randomness() which might sleep, but this is not
possible during early kernel initialization. This revert fixes following
NULL pointer deference:
[ 0.000000] efi: seeding entropy pool
[ 0.000000] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
...
[ 0.000000] pc : kthread_should_stop+0x2c/0x60
[ 0.000000] lr : add_hwgenerator_randomness+0x58/0x178
...
[ 0.000000] Call trace:
[ 0.000000] kthread_should_stop+0x2c/0x60
[ 0.000000] add_bootloader_randomness+0x2c/0x38
[ 0.000000] efi_config_parse_tables+0x120/0x250
[ 0.000000] efi_init+0x138/0x1e0
[ 0.000000] setup_arch+0x394/0x778
[ 0.000000] start_kernel+0x90/0x568
Signed-off-by: Ivan T. Ivanov <[email protected]>
---
drivers/firmware/efi/efi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 847f33ffc4ae..8aad3c524947 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -600,7 +600,7 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
sizeof(*seed) + size);
if (seed != NULL) {
pr_notice("seeding entropy pool\n");
- add_bootloader_randomness(seed->bits, size);
+ add_device_randomness(seed->bits, size);
early_memunmap(seed, sizeof(*seed) + size);
} else {
pr_err("Could not map UEFI random seed!\n");
--
2.33.0
Am Tue, Oct 12, 2021 at 11:27:08AM +0300 schrieb Ivan T. Ivanov:
> This reverts commit 18b915ac6b0ac5ba7ded03156860f60a9f16df2b.
>
> When CONFIG_RANDOM_TRUST_BOOTLOADER is enabled add_bootloader_randomness()
> calls add_hwgenerator_randomness() which might sleep,
Wouldn't it be better to fix add_bootloader_randomness(), considering that
calls to that function are likely to happen quite early during kernel
initialization? Especially as it seems to have worked beforehand?
Thanks,
Dominik
> but this is not
> possible during early kernel initialization. This revert fixes following
> NULL pointer deference:
>
> [ 0.000000] efi: seeding entropy pool
> [ 0.000000] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> ...
> [ 0.000000] pc : kthread_should_stop+0x2c/0x60
> [ 0.000000] lr : add_hwgenerator_randomness+0x58/0x178
> ...
> [ 0.000000] Call trace:
> [ 0.000000] kthread_should_stop+0x2c/0x60
> [ 0.000000] add_bootloader_randomness+0x2c/0x38
> [ 0.000000] efi_config_parse_tables+0x120/0x250
> [ 0.000000] efi_init+0x138/0x1e0
> [ 0.000000] setup_arch+0x394/0x778
> [ 0.000000] start_kernel+0x90/0x568
>
> Signed-off-by: Ivan T. Ivanov <[email protected]>
> ---
> drivers/firmware/efi/efi.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index 847f33ffc4ae..8aad3c524947 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -600,7 +600,7 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
> sizeof(*seed) + size);
> if (seed != NULL) {
> pr_notice("seeding entropy pool\n");
> - add_bootloader_randomness(seed->bits, size);
> + add_device_randomness(seed->bits, size);
> early_memunmap(seed, sizeof(*seed) + size);
> } else {
> pr_err("Could not map UEFI random seed!\n");
> --
> 2.33.0
>
Hi,
Quoting Dominik Brodowski (2021-10-12 11:40:34)
> Am Tue, Oct 12, 2021 at 11:27:08AM +0300 schrieb Ivan T. Ivanov:
> > This reverts commit 18b915ac6b0ac5ba7ded03156860f60a9f16df2b.
> >
> > When CONFIG_RANDOM_TRUST_BOOTLOADER is enabled add_bootloader_randomness()
> > calls add_hwgenerator_randomness() which might sleep,
>
> Wouldn't it be better to fix add_bootloader_randomness(), considering
> that
> calls to that function are likely to happen quite early during kernel
> initialization? Especially as it seems to have worked beforehand?
I have tried. I made wait_event_interruptible() optional, but then
crng_reseed() segfault badly. And I don't think crng_reseed() is
something that I could fix easily. Suggestions are welcomed ;-)
Regards,
Ivan
On Wed, 13 Oct 2021 at 09:30, Ivan T. Ivanov <[email protected]> wrote:
>
> Hi,
>
> Quoting Dominik Brodowski (2021-10-12 11:40:34)
> > Am Tue, Oct 12, 2021 at 11:27:08AM +0300 schrieb Ivan T. Ivanov:
> > > This reverts commit 18b915ac6b0ac5ba7ded03156860f60a9f16df2b.
> > >
> > > When CONFIG_RANDOM_TRUST_BOOTLOADER is enabled add_bootloader_randomness()
> > > calls add_hwgenerator_randomness() which might sleep,
> >
> > Wouldn't it be better to fix add_bootloader_randomness(), considering
> > that
> > calls to that function are likely to happen quite early during kernel
> > initialization? Especially as it seems to have worked beforehand?
>
> I have tried. I made wait_event_interruptible() optional, but then
> crng_reseed() segfault badly. And I don't think crng_reseed() is
> something that I could fix easily. Suggestions are welcomed ;-)
>
How about
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 605969ed0f96..1828dc691ebf 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2297,9 +2297,8 @@ EXPORT_SYMBOL_GPL(add_hwgenerator_randomness);
*/
void add_bootloader_randomness(const void *buf, unsigned int size)
{
+ add_device_randomness(buf, size);
if (IS_ENABLED(CONFIG_RANDOM_TRUST_BOOTLOADER))
- add_hwgenerator_randomness(buf, size, size * 8);
- else
- add_device_randomness(buf, size);
+ credit_entropy(&input_pool, size * 8);
}
EXPORT_SYMBOL_GPL(add_bootloader_randomness);
On 2021-10-13 10:50, Ard Biesheuvel wrote:
> On Wed, 13 Oct 2021 at 09:30, Ivan T. Ivanov <[email protected]> wrote:
>>
>> Hi,
>>
>> Quoting Dominik Brodowski (2021-10-12 11:40:34)
>> > Am Tue, Oct 12, 2021 at 11:27:08AM +0300 schrieb Ivan T. Ivanov:
>> > > This reverts commit 18b915ac6b0ac5ba7ded03156860f60a9f16df2b.
>> > >
>> > > When CONFIG_RANDOM_TRUST_BOOTLOADER is enabled add_bootloader_randomness()
>> > > calls add_hwgenerator_randomness() which might sleep,
>> >
>> > Wouldn't it be better to fix add_bootloader_randomness(), considering
>> > that
>> > calls to that function are likely to happen quite early during kernel
>> > initialization? Especially as it seems to have worked beforehand?
>>
>> I have tried. I made wait_event_interruptible() optional, but then
>> crng_reseed() segfault badly. And I don't think crng_reseed() is
>> something that I could fix easily. Suggestions are welcomed ;-)
>>
>
> How about
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 605969ed0f96..1828dc691ebf 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -2297,9 +2297,8 @@ EXPORT_SYMBOL_GPL(add_hwgenerator_randomness);
> */
> void add_bootloader_randomness(const void *buf, unsigned int size)
> {
> + add_device_randomness(buf, size);
> if (IS_ENABLED(CONFIG_RANDOM_TRUST_BOOTLOADER))
> - add_hwgenerator_randomness(buf, size, size * 8);
> - else
> - add_device_randomness(buf, size);
> + credit_entropy(&input_pool, size * 8);
> }
> EXPORT_SYMBOL_GPL(add_bootloader_randomness);
This is more or less what I have done. credit_entropy() calls
crng_reseed() which crash badly.
Will check again, but I am sure it will give me same result.
Regards,
Ivan
Hi,
> On 13 Oct 2021, at 10:50, Ard Biesheuvel <[email protected]> wrote:
>
> On Wed, 13 Oct 2021 at 09:30, Ivan T. Ivanov <[email protected]> wrote:
>>
>> Hi,
>>
>> Quoting Dominik Brodowski (2021-10-12 11:40:34)
>>> Am Tue, Oct 12, 2021 at 11:27:08AM +0300 schrieb Ivan T. Ivanov:
>>>> This reverts commit 18b915ac6b0ac5ba7ded03156860f60a9f16df2b.
>>>>
>>>> When CONFIG_RANDOM_TRUST_BOOTLOADER is enabled add_bootloader_randomness()
>>>> calls add_hwgenerator_randomness() which might sleep,
>>>
>>> Wouldn't it be better to fix add_bootloader_randomness(), considering
>>> that
>>> calls to that function are likely to happen quite early during kernel
>>> initialization? Especially as it seems to have worked beforehand?
>>
>> I have tried. I made wait_event_interruptible() optional, but then
>> crng_reseed() segfault badly. And I don't think crng_reseed() is
>> something that I could fix easily. Suggestions are welcomed ;-)
>>
>
> How about
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 605969ed0f96..1828dc691ebf 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -2297,9 +2297,8 @@ EXPORT_SYMBOL_GPL(add_hwgenerator_randomness);
> */
> void add_bootloader_randomness(const void *buf, unsigned int size)
> {
> + add_device_randomness(buf, size);
> if (IS_ENABLED(CONFIG_RANDOM_TRUST_BOOTLOADER))
> - add_hwgenerator_randomness(buf, size, size * 8);
> - else
> - add_device_randomness(buf, size);
> + credit_entropy(&input_pool, size * 8);
> }
> EXPORT_SYMBOL_GPL(add_bootloader_randomness);
This doesn’t boot. I just changed following and kernel panics.
- credit_entropy
+ credit_entropy_bits
Please see attached file.
Regards,
Ivan
Hi,
> On 13 Oct 2021, at 12:53, Ivan T. Ivanov <[email protected]> wrote:
>
>
>
>> On 13 Oct 2021, at 12:51, Ivan T. Ivanov <[email protected]> wrote:
>>
>> Hi,
>>
>>> On 13 Oct 2021, at 10:50, Ard Biesheuvel <[email protected]> wrote:
>>>
>>> On Wed, 13 Oct 2021 at 09:30, Ivan T. Ivanov <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Quoting Dominik Brodowski (2021-10-12 11:40:34)
>>>>> Am Tue, Oct 12, 2021 at 11:27:08AM +0300 schrieb Ivan T. Ivanov:
>>>>>> This reverts commit 18b915ac6b0ac5ba7ded03156860f60a9f16df2b.
>>>>>>
>>>>>> When CONFIG_RANDOM_TRUST_BOOTLOADER is enabled add_bootloader_randomness()
>>>>>> calls add_hwgenerator_randomness() which might sleep,
>>>>>
>>>>> Wouldn't it be better to fix add_bootloader_randomness(), considering
>>>>> that
>>>>> calls to that function are likely to happen quite early during kernel
>>>>> initialization? Especially as it seems to have worked beforehand?
>>>>
>>>> I have tried. I made wait_event_interruptible() optional, but then
>>>> crng_reseed() segfault badly. And I don't think crng_reseed() is
>>>> something that I could fix easily. Suggestions are welcomed ;-)
>>>>
>>>
>>> How about
>>>
>>> diff --git a/drivers/char/random.c b/drivers/char/random.c
>>> index 605969ed0f96..1828dc691ebf 100644
>>> --- a/drivers/char/random.c
>>> +++ b/drivers/char/random.c
>>> @@ -2297,9 +2297,8 @@ EXPORT_SYMBOL_GPL(add_hwgenerator_randomness);
>>> */
>>> void add_bootloader_randomness(const void *buf, unsigned int size)
>>> {
>>> + add_device_randomness(buf, size);
>>> if (IS_ENABLED(CONFIG_RANDOM_TRUST_BOOTLOADER))
>>> - add_hwgenerator_randomness(buf, size, size * 8);
>>> - else
>>> - add_device_randomness(buf, size);
>>> + credit_entropy(&input_pool, size * 8);
>>> }
>>> EXPORT_SYMBOL_GPL(add_bootloader_randomness);
>>
>> This doesn’t boot. I just changed following and kernel panics.
>>
And before anyone asked “.. Hey but this is 5.3.18 kernel version”
here is the kernel panic with 5.14.11 :-)
Regards,
Ivan
If add_bootloader_randomness() or add_hwgenerator_randomness() is
called for the first time during early boot, crng_init equals 0. Then,
crng_fast_load() gets called -- which is safe to do even if the input
pool is not yet properly set up.
If the added entropy suffices to increase crng_init to 1, future calls
to add_bootloader_randomness() or add_hwgenerator_randomness() used to
progress to credit_entropy_bits(). However, if the input pool is not yet
properly set up, the cmpxchg call within that function can lead to an
infinite recursion. This is not only a hypothetical problem, as qemu
on x86 may provide bootloader entropy via EFI and via devicetree.
As crng_global_init_time is set to != 0 once the input pool is properly
set up, check (also) for this condition to determine which branch to take.
Calls to crng_fast_load() do not modify the input pool; therefore, the
entropy_count for the input pool must not be modified at that early
stage.
Reported-and-tested-by: Ivan T. Ivanov <[email protected]>
Fixes: 18b915ac6b0a ("efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness")
Signed-off-by: Dominik Brodowski <[email protected]>
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 605969ed0f96..4211ff3092f9 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1763,8 +1763,8 @@ static void __init init_std_data(struct entropy_store *r)
}
/*
- * Note that setup_arch() may call add_device_randomness()
- * long before we get here. This allows seeding of the pools
+ * add_device_randomness() or add_bootloader_randomness() may be
+ * called long before we get here. This allows seeding of the pools
* with some platform dependent data very early in the boot
* process. But it limits our options here. We must use
* statically allocated structures that already have all
@@ -2274,7 +2274,12 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
{
struct entropy_store *poolp = &input_pool;
- if (unlikely(crng_init == 0)) {
+ /* We cannot do much with the input pool until it is set up in
+ * rand_initalize(); therefore just mix into the crng state.
+ * As this does not affect the input pool, we cannot credit
+ * entropy for this.
+ */
+ if (unlikely(crng_init == 0) || unlikely(crng_global_init_time == 0)) {
crng_fast_load(buffer, count);
return;
}
On Sun, 31 Oct 2021 at 07:31, Dominik Brodowski
<[email protected]> wrote:
>
> If add_bootloader_randomness() or add_hwgenerator_randomness() is
> called for the first time during early boot, crng_init equals 0. Then,
> crng_fast_load() gets called -- which is safe to do even if the input
> pool is not yet properly set up.
>
> If the added entropy suffices to increase crng_init to 1, future calls
> to add_bootloader_randomness() or add_hwgenerator_randomness() used to
> progress to credit_entropy_bits(). However, if the input pool is not yet
> properly set up, the cmpxchg call within that function can lead to an
> infinite recursion. This is not only a hypothetical problem, as qemu
> on x86 may provide bootloader entropy via EFI and via devicetree.
>
arm64 not x86
> As crng_global_init_time is set to != 0 once the input pool is properly
> set up, check (also) for this condition to determine which branch to take.
>
> Calls to crng_fast_load() do not modify the input pool; therefore, the
> entropy_count for the input pool must not be modified at that early
> stage.
>
> Reported-and-tested-by: Ivan T. Ivanov <[email protected]>
Nit: fancy tags like this are more difficult to grep for
Better to use separate Reported-by and Tested-by tags
> Fixes: 18b915ac6b0a ("efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness")
> Signed-off-by: Dominik Brodowski <[email protected]>
>
Please don't drop the diffstat. Are you using git format-patch?
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 605969ed0f96..4211ff3092f9 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1763,8 +1763,8 @@ static void __init init_std_data(struct entropy_store *r)
> }
>
> /*
> - * Note that setup_arch() may call add_device_randomness()
> - * long before we get here. This allows seeding of the pools
> + * add_device_randomness() or add_bootloader_randomness() may be
> + * called long before we get here. This allows seeding of the pools
> * with some platform dependent data very early in the boot
> * process. But it limits our options here. We must use
> * statically allocated structures that already have all
> @@ -2274,7 +2274,12 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
> {
> struct entropy_store *poolp = &input_pool;
>
> - if (unlikely(crng_init == 0)) {
> + /* We cannot do much with the input pool until it is set up in
> + * rand_initalize(); therefore just mix into the crng state.
> + * As this does not affect the input pool, we cannot credit
> + * entropy for this.
> + */
> + if (unlikely(crng_init == 0) || unlikely(crng_global_init_time == 0)) {
Can we just drop the unlikely()s here?
> crng_fast_load(buffer, count);
> return;
> }
Am Sun, Oct 31, 2021 at 01:33:34PM +0100 schrieb Ard Biesheuvel:
> On Sun, 31 Oct 2021 at 07:31, Dominik Brodowski
> <[email protected]> wrote:
> >
> > If add_bootloader_randomness() or add_hwgenerator_randomness() is
> > called for the first time during early boot, crng_init equals 0. Then,
> > crng_fast_load() gets called -- which is safe to do even if the input
> > pool is not yet properly set up.
> >
> > If the added entropy suffices to increase crng_init to 1, future calls
> > to add_bootloader_randomness() or add_hwgenerator_randomness() used to
> > progress to credit_entropy_bits(). However, if the input pool is not yet
> > properly set up, the cmpxchg call within that function can lead to an
> > infinite recursion. This is not only a hypothetical problem, as qemu
> > on x86 may provide bootloader entropy via EFI and via devicetree.
> >
>
> arm64 not x86
Thanks, fixed in v2
> > As crng_global_init_time is set to != 0 once the input pool is properly
> > set up, check (also) for this condition to determine which branch to take.
> >
> > Calls to crng_fast_load() do not modify the input pool; therefore, the
> > entropy_count for the input pool must not be modified at that early
> > stage.
> >
> > Reported-and-tested-by: Ivan T. Ivanov <[email protected]>
>
> Nit: fancy tags like this are more difficult to grep for
>
> Better to use separate Reported-by and Tested-by tags
Well, it's used not all that rarely, but I don't care that much, so updated for v2.
> Please don't drop the diffstat. Are you using git format-patch?
For singular patches no; but fixed for v2.
> > diff --git a/drivers/char/random.c b/drivers/char/random.c
> > index 605969ed0f96..4211ff3092f9 100644
> > --- a/drivers/char/random.c
> > +++ b/drivers/char/random.c
> > @@ -1763,8 +1763,8 @@ static void __init init_std_data(struct entropy_store *r)
> > }
> >
> > /*
> > - * Note that setup_arch() may call add_device_randomness()
> > - * long before we get here. This allows seeding of the pools
> > + * add_device_randomness() or add_bootloader_randomness() may be
> > + * called long before we get here. This allows seeding of the pools
> > * with some platform dependent data very early in the boot
> > * process. But it limits our options here. We must use
> > * statically allocated structures that already have all
> > @@ -2274,7 +2274,12 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
> > {
> > struct entropy_store *poolp = &input_pool;
> >
> > - if (unlikely(crng_init == 0)) {
> > + /* We cannot do much with the input pool until it is set up in
> > + * rand_initalize(); therefore just mix into the crng state.
> > + * As this does not affect the input pool, we cannot credit
> > + * entropy for this.
> > + */
> > + if (unlikely(crng_init == 0) || unlikely(crng_global_init_time == 0)) {
>
> Can we just drop the unlikely()s here?
As that would be a different change to the one necessary to resolve the bug,
I'd like to defer that decision to the maintainer of random.c.
Thanks,
Dominik
If add_bootloader_randomness() or add_hwgenerator_randomness() is
called for the first time during early boot, crng_init equals 0. Then,
crng_fast_load() gets called -- which is safe to do even if the input
pool is not yet properly set up.
If the added entropy suffices to increase crng_init to 1, future calls
to add_bootloader_randomness() or add_hwgenerator_randomness() used to
progress to credit_entropy_bits(). However, if the input pool is not yet
properly set up, the cmpxchg call within that function can lead to an
infinite recursion. This is not only a hypothetical problem, as qemu
on arm64 may provide bootloader entropy via EFI and via devicetree.
As crng_global_init_time is set to != 0 once the input pool is properly
set up, check (also) for this condition to determine which branch to take.
Calls to crng_fast_load() do not modify the input pool; therefore, the
entropy_count for the input pool must not be modified at that early
stage.
Reported-by: Ivan T. Ivanov <[email protected]>
Fixes: 18b915ac6b0a ("efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness")
Tested-by: Ivan T. Ivanov <[email protected]>
Signed-off-by: Dominik Brodowski <[email protected]>
---
v1->v2: fix commit message; unmerge Reported-and-tested-by-tag (Ard Biesheuvel)
drivers/char/random.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 605969ed0f96..4211ff3092f9 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1763,8 +1763,8 @@ static void __init init_std_data(struct entropy_store *r)
}
/*
- * Note that setup_arch() may call add_device_randomness()
- * long before we get here. This allows seeding of the pools
+ * add_device_randomness() or add_bootloader_randomness() may be
+ * called long before we get here. This allows seeding of the pools
* with some platform dependent data very early in the boot
* process. But it limits our options here. We must use
* statically allocated structures that already have all
@@ -2274,7 +2274,12 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
{
struct entropy_store *poolp = &input_pool;
- if (unlikely(crng_init == 0)) {
+ /* We cannot do much with the input pool until it is set up in
+ * rand_initalize(); therefore just mix into the crng state.
+ * As this does not affect the input pool, we cannot credit
+ * entropy for this.
+ */
+ if (unlikely(crng_init == 0) || unlikely(crng_global_init_time == 0)) {
crng_fast_load(buffer, count);
return;
}
On Wed, 3 Nov 2021 at 08:17, Dominik Brodowski
<[email protected]> wrote:
>
> Am Sun, Oct 31, 2021 at 01:33:34PM +0100 schrieb Ard Biesheuvel:
> > On Sun, 31 Oct 2021 at 07:31, Dominik Brodowski
> > <[email protected]> wrote:
> > >
> > > If add_bootloader_randomness() or add_hwgenerator_randomness() is
> > > called for the first time during early boot, crng_init equals 0. Then,
> > > crng_fast_load() gets called -- which is safe to do even if the input
> > > pool is not yet properly set up.
> > >
> > > If the added entropy suffices to increase crng_init to 1, future calls
> > > to add_bootloader_randomness() or add_hwgenerator_randomness() used to
> > > progress to credit_entropy_bits(). However, if the input pool is not yet
> > > properly set up, the cmpxchg call within that function can lead to an
> > > infinite recursion. This is not only a hypothetical problem, as qemu
> > > on x86 may provide bootloader entropy via EFI and via devicetree.
> > >
> >
> > arm64 not x86
>
> Thanks, fixed in v2
>
> > > As crng_global_init_time is set to != 0 once the input pool is properly
> > > set up, check (also) for this condition to determine which branch to take.
> > >
> > > Calls to crng_fast_load() do not modify the input pool; therefore, the
> > > entropy_count for the input pool must not be modified at that early
> > > stage.
> > >
> > > Reported-and-tested-by: Ivan T. Ivanov <[email protected]>
> >
> > Nit: fancy tags like this are more difficult to grep for
> >
> > Better to use separate Reported-by and Tested-by tags
>
> Well, it's used not all that rarely, but I don't care that much, so updated for v2.
>
> > Please don't drop the diffstat. Are you using git format-patch?
>
> For singular patches no; but fixed for v2.
>
> > > diff --git a/drivers/char/random.c b/drivers/char/random.c
> > > index 605969ed0f96..4211ff3092f9 100644
> > > --- a/drivers/char/random.c
> > > +++ b/drivers/char/random.c
> > > @@ -1763,8 +1763,8 @@ static void __init init_std_data(struct entropy_store *r)
> > > }
> > >
> > > /*
> > > - * Note that setup_arch() may call add_device_randomness()
> > > - * long before we get here. This allows seeding of the pools
> > > + * add_device_randomness() or add_bootloader_randomness() may be
> > > + * called long before we get here. This allows seeding of the pools
> > > * with some platform dependent data very early in the boot
> > > * process. But it limits our options here. We must use
> > > * statically allocated structures that already have all
> > > @@ -2274,7 +2274,12 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
> > > {
> > > struct entropy_store *poolp = &input_pool;
> > >
> > > - if (unlikely(crng_init == 0)) {
> > > + /* We cannot do much with the input pool until it is set up in
> > > + * rand_initalize(); therefore just mix into the crng state.
> > > + * As this does not affect the input pool, we cannot credit
> > > + * entropy for this.
> > > + */
> > > + if (unlikely(crng_init == 0) || unlikely(crng_global_init_time == 0)) {
> >
> > Can we just drop the unlikely()s here?
>
> As that would be a different change to the one necessary to resolve the bug,
> I'd like to defer that decision to the maintainer of random.c.
>
In that case, can we at least using a single unlikely() for the whole condition?
Am Wed, Nov 03, 2021 at 08:27:39AM +0100 schrieb Ard Biesheuvel:
> > > > - if (unlikely(crng_init == 0)) {
> > > > + /* We cannot do much with the input pool until it is set up in
> > > > + * rand_initalize(); therefore just mix into the crng state.
> > > > + * As this does not affect the input pool, we cannot credit
> > > > + * entropy for this.
> > > > + */
> > > > + if (unlikely(crng_init == 0) || unlikely(crng_global_init_time == 0)) {
> > >
> > > Can we just drop the unlikely()s here?
> >
> > As that would be a different change to the one necessary to resolve the bug,
> > I'd like to defer that decision to the maintainer of random.c.
> >
>
> In that case, can we at least using a single unlikely() for the whole condition?
Fixed for v3.
Thanks,
Dominik
If add_bootloader_randomness() or add_hwgenerator_randomness() is
called for the first time during early boot, crng_init equals 0. Then,
crng_fast_load() gets called -- which is safe to do even if the input
pool is not yet properly set up.
If the added entropy suffices to increase crng_init to 1, future calls
to add_bootloader_randomness() or add_hwgenerator_randomness() used to
progress to credit_entropy_bits(). However, if the input pool is not yet
properly set up, the cmpxchg call within that function can lead to an
infinite recursion. This is not only a hypothetical problem, as qemu
on arm64 may provide bootloader entropy via EFI and via devicetree.
As crng_global_init_time is set to != 0 once the input pool is properly
set up, check (also) for this condition to determine which branch to take.
Calls to crng_fast_load() do not modify the input pool; therefore, the
entropy_count for the input pool must not be modified at that early
stage.
Reported-by: Ivan T. Ivanov <[email protected]>
Fixes: 18b915ac6b0a ("efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness")
Tested-by: Ivan T. Ivanov <[email protected]>
Signed-off-by: Dominik Brodowski <[email protected]>
---
v2->v3: onle one unlikely (Ard Biesheuvel)
v1->v2: fix commit message; unmerge Reported-and-tested-by-tag (Ard Biesheuvel)
drivers/char/random.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 605969ed0f96..18fe804c1bf8 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1763,8 +1763,8 @@ static void __init init_std_data(struct entropy_store *r)
}
/*
- * Note that setup_arch() may call add_device_randomness()
- * long before we get here. This allows seeding of the pools
+ * add_device_randomness() or add_bootloader_randomness() may be
+ * called long before we get here. This allows seeding of the pools
* with some platform dependent data very early in the boot
* process. But it limits our options here. We must use
* statically allocated structures that already have all
@@ -2274,7 +2274,12 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
{
struct entropy_store *poolp = &input_pool;
- if (unlikely(crng_init == 0)) {
+ /* We cannot do much with the input pool until it is set up in
+ * rand_initalize(); therefore just mix into the crng state.
+ * As this does not affect the input pool, we cannot credit
+ * entropy for this.
+ */
+ if (unlikely(crng_init == 0 || crng_global_init_time == 0)) {
crng_fast_load(buffer, count);
return;
}
On 11-05 07:04, Dominik Brodowski wrote:
> Date: Fri, 5 Nov 2021 07:04:36 +0100
> From: Dominik Brodowski <[email protected]>
> To: [email protected]
> Cc: "Ivan T. Ivanov" <[email protected]>, Ard Biesheuvel <[email protected]>,
> [email protected], [email protected]
> Subject: [PATCH v3] random: fix crash on multiple early calls to
> add_bootloader_randomness()
> Message-ID: <[email protected]>
Tags: all dt linux me watch
>
Hi,
> If add_bootloader_randomness() or add_hwgenerator_randomness() is
> called for the first time during early boot, crng_init equals 0. Then,
> crng_fast_load() gets called -- which is safe to do even if the input
> pool is not yet properly set up.
>
> If the added entropy suffices to increase crng_init to 1, future calls
> to add_bootloader_randomness() or add_hwgenerator_randomness() used to
> progress to credit_entropy_bits(). However, if the input pool is not yet
> properly set up, the cmpxchg call within that function can lead to an
> infinite recursion. This is not only a hypothetical problem, as qemu
> on arm64 may provide bootloader entropy via EFI and via devicetree.
>
> As crng_global_init_time is set to != 0 once the input pool is properly
> set up, check (also) for this condition to determine which branch to take.
>
> Calls to crng_fast_load() do not modify the input pool; therefore, the
> entropy_count for the input pool must not be modified at that early
> stage.
>
> Reported-by: Ivan T. Ivanov <[email protected]>
> Fixes: 18b915ac6b0a ("efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness")
> Tested-by: Ivan T. Ivanov <[email protected]>
> Signed-off-by: Dominik Brodowski <[email protected]>
What is the plan for this fix?
Regards,
Ivan
> ---
> v2->v3: onle one unlikely (Ard Biesheuvel)
> v1->v2: fix commit message; unmerge Reported-and-tested-by-tag (Ard Biesheuvel)
>
> drivers/char/random.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 605969ed0f96..18fe804c1bf8 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1763,8 +1763,8 @@ static void __init init_std_data(struct entropy_store *r)
> }
>
> /*
> - * Note that setup_arch() may call add_device_randomness()
> - * long before we get here. This allows seeding of the pools
> + * add_device_randomness() or add_bootloader_randomness() may be
> + * called long before we get here. This allows seeding of the pools
> * with some platform dependent data very early in the boot
> * process. But it limits our options here. We must use
> * statically allocated structures that already have all
> @@ -2274,7 +2274,12 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
> {
> struct entropy_store *poolp = &input_pool;
>
> - if (unlikely(crng_init == 0)) {
> + /* We cannot do much with the input pool until it is set up in
> + * rand_initalize(); therefore just mix into the crng state.
> + * As this does not affect the input pool, we cannot credit
> + * entropy for this.
> + */
> + if (unlikely(crng_init == 0 || crng_global_init_time == 0)) {
> crng_fast_load(buffer, count);
> return;
> }
If add_bootloader_randomness() or add_hwgenerator_randomness() is
called for the first time during early boot, crng_init equals 0. Then,
crng_fast_load() gets called -- which is safe to do even if the input
pool is not yet properly set up.
If the added entropy suffices to increase crng_init to 1, future calls
to add_bootloader_randomness() or add_hwgenerator_randomness() used to
progress to credit_entropy_bits(). However, if the input pool is not yet
properly set up, the cmpxchg call within that function can lead to an
infinite recursion. This is not only a hypothetical problem, as qemu
on arm64 may provide bootloader entropy via EFI and via devicetree.
As crng_global_init_time is set to != 0 once the input pool is properly
set up, check (also) for this condition to determine which branch to take.
Calls to crng_fast_load() do not modify the input pool; therefore, the
entropy_count for the input pool must not be modified at that early
stage.
Reported-by: Ivan T. Ivanov <[email protected]>
Fixes: 18b915ac6b0a ("efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness")
Tested-by: Ivan T. Ivanov <[email protected]>
Signed-off-by: Dominik Brodowski <[email protected]>
---
re-send to new co-maintainer of random.c
v2->v3: onle one unlikely (Ard Biesheuvel)
v1->v2: fix commit message; unmerge Reported-and-tested-by-tag (Ard Biesheuvel)
drivers/char/random.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 605969ed0f96..18fe804c1bf8 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1763,8 +1763,8 @@ static void __init init_std_data(struct entropy_store *r)
}
/*
- * Note that setup_arch() may call add_device_randomness()
- * long before we get here. This allows seeding of the pools
+ * add_device_randomness() or add_bootloader_randomness() may be
+ * called long before we get here. This allows seeding of the pools
* with some platform dependent data very early in the boot
* process. But it limits our options here. We must use
* statically allocated structures that already have all
@@ -2274,7 +2274,12 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
{
struct entropy_store *poolp = &input_pool;
- if (unlikely(crng_init == 0)) {
+ /* We cannot do much with the input pool until it is set up in
+ * rand_initalize(); therefore just mix into the crng state.
+ * As this does not affect the input pool, we cannot credit
+ * entropy for this.
+ */
+ if (unlikely(crng_init == 0 || crng_global_init_time == 0)) {
crng_fast_load(buffer, count);
return;
}
Hi Dominik,
Thanks for the patch. One trivial nit and one question:
On Thu, Dec 2, 2021 at 6:35 AM Dominik Brodowski
<[email protected]> wrote:
> + /* We cannot do much with the input pool until it is set up in
> + * rand_initalize(); therefore just mix into the crng state.
I think you meant "rand_initialize()" here (missing 'i').
> If the added entropy suffices to increase crng_init to 1, future calls
> to add_bootloader_randomness() or add_hwgenerator_randomness() used to
> progress to credit_entropy_bits(). However, if the input pool is not yet
> properly set up, the cmpxchg call within that function can lead to an
> infinite recursion.
I see what this patch does with crng_global_init_time, and that seems
probably sensible, but I didn't understand this part of the reasoning
in the commit message; I might just be a bit slow here. Where's the
recursion exactly? Or even an infinite loop?
As far as I can tell, that portion of credit_entropy_bits() breaks down as:
retry:
entropy_count = orig = READ_ONCE(r->entropy_count);
[ ... do some arithmetic on entropy_count ... ]
if (cmpxchg(&r->entropy_count, orig, entropy_count) != orig)
goto retry;
Why would this be infinite? Why wouldn't the cmpxchg eventually
converge to a stable value? I don't see any call that modifies
r->entropy_count or orig from inside that block. Is there some other
super-spinny concurrent operation?
Thanks,
Jason
Hi Jason,
Am Thu, Dec 02, 2021 at 11:55:10AM -0500 schrieb Jason A. Donenfeld:
> Thanks for the patch. One trivial nit and one question:
Thanks for your review!
> On Thu, Dec 2, 2021 at 6:35 AM Dominik Brodowski
> <[email protected]> wrote:
> > + /* We cannot do much with the input pool until it is set up in
> > + * rand_initalize(); therefore just mix into the crng state.
>
> I think you meant "rand_initialize()" here (missing 'i').
Indeed, sorry about that.
> > If the added entropy suffices to increase crng_init to 1, future calls
> > to add_bootloader_randomness() or add_hwgenerator_randomness() used to
> > progress to credit_entropy_bits(). However, if the input pool is not yet
> > properly set up, the cmpxchg call within that function can lead to an
> > infinite recursion.
>
> I see what this patch does with crng_global_init_time, and that seems
> probably sensible, but I didn't understand this part of the reasoning
> in the commit message; I might just be a bit slow here. Where's the
> recursion exactly? Or even an infinite loop?
On arm64, it was actually a NULL pointer dereference reported by Ivan T.
Ivanov; see
https://lore.kernel.org/lkml/[email protected]/
Trying to reproduce this rather bluntly on x86/qemu by multiple manual calls
to add_bootloader_randomness(), I mis-interpreted the symptoms to point to an
infinite recursion. The real problem seems to be that crng_reseed() isn't
ready to be called too early in the boot process, in particular before
workqueues are ready (see the call to numa_crng_init()).
However, there seem be additional issues with add_bootloader_randomness()
not yet addressed (or worsened) by my patch:
- If CONFIG_RANDOM_TRUST_BOOTLOADER is enabled and crng_init==0,
add_hwgenerator_randomness() calls crng_fast_load() and returns
immediately. If it is disabled and crng_init==0,
add_device_randnomness() calls crng_slow_load() but still
continues to call _mix_pool_bytes(). That means the seed is
used more extensively if CONFIG_RANDOM_TRUST_BOOTLOADER is not
set!
- If CONFIG_RANDOM_TRUST_BOOTLOADER is enabled and crng_init==0,
the entropy is not credited -- same as if
CONFIG_RANDOM_TRUST_BOOTLOADER is not set. Only subsequent calls
to add_bootloader_randomness() would credit entropy, but that
causes the issue NULL pointer dereference or the hang...
- As crng_fast_load() returns early, that actually means that my
patch causes the additional entropy submitted to
add_hwgenerator_randomness() by subsequent calls to be completely
lost.
- For add_bootloader_randomness(), it makes no sense at all to call
wait_event_interruptible().
Therefore, it might make more sense to
- modify add_bootloader_randomness() to always call
add_device_randomness(), and if CONFIG_RANDOM_TRUST_BOOTLOADER is
enabled, to call credit_entropy_bits().
- update credit_entropy_bits() to not call credit_entropy_bits()
if crng_global_init_time==0, as workqueues (and possibly other
infrastructure) might not be available at that time.
What do you think? Draft patch below. @Ivan: Could you re-test on your
system, please?
Thanks,
Dominik
---
Currently, if CONFIG_RANDOM_TRUST_BOOTLOADER is enabled, mutliple calls
to add_bootloader_randomness() are broken and can cause a NULL pointer
dereference, as noted by Ivan T. Ivanov. This is not only a hypothetical
problem, as qemu on arm64 may provide bootloader entropy via EFI and via
devicetree.
On the first call to add_hwgenerator_randomness(), crng_fast_load() is
executed, and if the seed is long enough, crng_init will be set to 1.
However, no entropy is currently credited for that, even though the
name and description of CONFIG_RANDOM_TRUST_BOOTLOADER states otherwise.
On subsequent calls to add_bootloader_randomness() and then to
add_hwgenerator_randomness(), crng_fast_load() will be skipped. Instead,
wait_event_interruptible() (which makes no sense for the init process)
and then credit_entropy_bits() will be called. If the entropy count for
that second seed is large enough, that proceeds to crng_reseed().
However, crng_reseed() may depend on workqueues being available, which
is not the case early during boot.
To fix these issues, unconditionally call add_device_randomness() but not
add_hwgenerator_randomness() in add_bootloader_randomness(). This has the
additional advantage that the seed provided by the first call to
add_bootloader_randomness() is not only used by crng_{fast,slow}_load(),
but also mixed into the input pool. If CONFIG_RANDOM_TRUST_BOOTLOADER is
set, explicitly credit the entropy. However, avoid a call to crng_reseed()
too early during boot. It is safe to be called after rand_initialize(),
so use crng_global_init_time (which is set to != 0 in that function) to
determine which branch to take.
Reported-by: Ivan T. Ivanov <[email protected]>
Fixes: 18b915ac6b0a ("efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness")
Signed-off-by: Dominik Brodowski <[email protected]>
---
v3->v4: complete rewrite
v2->v3: only one unlikely (Ard Biesheuvel)
v1->v2: fix commit message; unmerge Reported-and-tested-by-tag (Ard Biesheuvel)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 605969ed0f96..d8614b426dfb 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -722,7 +722,8 @@ static void credit_entropy_bits(struct entropy_store *r, int nbits)
if (r == &input_pool) {
int entropy_bits = entropy_count >> ENTROPY_SHIFT;
- if (crng_init < 2 && entropy_bits >= 128)
+ if (crng_init < 2 && entropy_bits >= 128 &&
+ crng_global_init_time > 0)
crng_reseed(&primary_crng, r);
}
}
@@ -1763,8 +1764,8 @@ static void __init init_std_data(struct entropy_store *r)
}
/*
- * Note that setup_arch() may call add_device_randomness()
- * long before we get here. This allows seeding of the pools
+ * add_device_randomness() or add_bootloader_randomness() may be
+ * called long before we get here. This allows seeding of the pools
* with some platform dependent data very early in the boot
* process. But it limits our options here. We must use
* statically allocated structures that already have all
@@ -2291,15 +2292,13 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
EXPORT_SYMBOL_GPL(add_hwgenerator_randomness);
/* Handle random seed passed by bootloader.
- * If the seed is trustworthy, it would be regarded as hardware RNGs. Otherwise
- * it would be regarded as device data.
+ * If the seed is trustworthy, its entropy will be credited.
* The decision is controlled by CONFIG_RANDOM_TRUST_BOOTLOADER.
*/
void add_bootloader_randomness(const void *buf, unsigned int size)
{
+ add_device_randomness(buf, size);
if (IS_ENABLED(CONFIG_RANDOM_TRUST_BOOTLOADER))
- add_hwgenerator_randomness(buf, size, size * 8);
- else
- add_device_randomness(buf, size);
+ credit_entropy_bits(&input_pool, size * 8);
}
EXPORT_SYMBOL_GPL(add_bootloader_randomness);
Hi Dominik,
Thanks for your analysis. Some more questions:
On Fri, Dec 3, 2021 at 8:59 AM Dominik Brodowski
<[email protected]> wrote:
> On subsequent calls to add_bootloader_randomness() and then to
> add_hwgenerator_randomness(), crng_fast_load() will be skipped. Instead,
> wait_event_interruptible() (which makes no sense for the init process)
> and then credit_entropy_bits() will be called. If the entropy count for
> that second seed is large enough, that proceeds to crng_reseed().
> However, crng_reseed() may depend on workqueues being available, which
> is not the case early during boot.
It sounds like *the* issue you've identified is that crng_reseed()
calls into workqueue functions too early in init, right? The bug is
about paths into crng_reseed() that might cause that?
If so, then specifically, are you referring to crng_reseed()'s call to
numa_crng_init()? In other words, the cause of the bug would be
6c1e851c4edc ("random: fix possible sleeping allocation from irq
context")? If that's the case, then I wonder if the problem you're
seeing goes away if you revert both 6c1e851c4edc ("random: fix
possible sleeping allocation from irq context") and its primary
predecessor, 8ef35c866f88 ("random: set up the NUMA crng instances
after the CRNG is fully initialized"). These fix an actual bug, so I'm
not suggesting we actually revert these in the tree, but for the
purpose of testing, I'm wondering if this is actually the root cause
of the bug you're seeing.
Also, if you have a nice way of reproducing this, please do tell - I'd
like to give it a spin if possible.
Regards,
Jason
On 12/3/21 16:39, Jason A. Donenfeld wrote:
> Hi Dominik,
>
> Thanks for your analysis. Some more questions:
>
> On Fri, Dec 3, 2021 at 8:59 AM Dominik Brodowski
> <[email protected]> wrote:
>> On subsequent calls to add_bootloader_randomness() and then to
>> add_hwgenerator_randomness(), crng_fast_load() will be skipped. Instead,
>> wait_event_interruptible() (which makes no sense for the init process)
>> and then credit_entropy_bits() will be called. If the entropy count for
>> that second seed is large enough, that proceeds to crng_reseed().
>> However, crng_reseed() may depend on workqueues being available, which
>> is not the case early during boot.
>
> It sounds like *the* issue you've identified is that crng_reseed()
> calls into workqueue functions too early in init, right? The bug is
> about paths into crng_reseed() that might cause that?
>
> If so, then specifically, are you referring to crng_reseed()'s call to
> numa_crng_init()? In other words, the cause of the bug would be
> 6c1e851c4edc ("random: fix possible sleeping allocation from irq
> context")? If that's the case, then I wonder if the problem you're
> seeing goes away if you revert both 6c1e851c4edc ("random: fix
> possible sleeping allocation from irq context") and its primary
> predecessor, 8ef35c866f88 ("random: set up the NUMA crng instances
> after the CRNG is fully initialized"). These fix an actual bug, so I'm
> not suggesting we actually revert these in the tree, but for the
> purpose of testing, I'm wondering if this is actually the root cause
> of the bug you're seeing.
If the above holds, I wonder if something more basic like the below
would do the trick -- deferring the bringup of the secondary pools until
later on in rand_initialize.
diff --git a/drivers/char/random.c b/drivers/char/random.c
index c81485e2f126..e71b34bf9a2a 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -832,7 +832,6 @@ static void __init crng_initialize_primary(struct
crng_state *crng)
_extract_entropy(&input_pool, &crng->state[4], sizeof(__u32) * 12, 0);
if (crng_init_try_arch_early(crng) && trust_cpu) {
invalidate_batched_entropy();
- numa_crng_init();
crng_init = 2;
pr_notice("crng done (trusting CPU's manufacturer)\n");
}
@@ -840,13 +839,13 @@ static void __init crng_initialize_primary(struct
crng_state *crng)
}
#ifdef CONFIG_NUMA
-static void do_numa_crng_init(struct work_struct *work)
+static void numa_crng_init(void)
{
int i;
struct crng_state *crng;
struct crng_state **pool;
- pool = kcalloc(nr_node_ids, sizeof(*pool), GFP_KERNEL|__GFP_NOFAIL);
+ pool = kcalloc(nr_node_ids, sizeof(*pool), GFP_KERNEL | __GFP_NOFAIL);
for_each_online_node(i) {
crng = kmalloc_node(sizeof(struct crng_state),
GFP_KERNEL | __GFP_NOFAIL, i);
@@ -861,13 +860,6 @@ static void do_numa_crng_init(struct work_struct *work)
kfree(pool);
}
}
-
-static DECLARE_WORK(numa_crng_init_work, do_numa_crng_init);
-
-static void numa_crng_init(void)
-{
- schedule_work(&numa_crng_init_work);
-}
#else
static void numa_crng_init(void) {}
#endif
@@ -977,7 +969,6 @@ static void crng_reseed(struct crng_state *crng,
struct entropy_store *r)
spin_unlock_irqrestore(&crng->lock, flags);
if (crng == &primary_crng && crng_init < 2) {
invalidate_batched_entropy();
- numa_crng_init();
crng_init = 2;
process_random_ready_list();
wake_up_interruptible(&crng_init_wait);
@@ -1787,6 +1778,7 @@ int __init rand_initialize(void)
{
init_std_data(&input_pool);
crng_initialize_primary(&primary_crng);
+ numa_crng_init();
crng_global_init_time = jiffies;
if (ratelimit_disable) {
urandom_warning.interval = 0;
Hi Jason,
Am Fri, Dec 03, 2021 at 05:47:41PM +0100 schrieb Jason A. Donenfeld:
> On 12/3/21 16:39, Jason A. Donenfeld wrote:
> > Hi Dominik,
> >
> > Thanks for your analysis. Some more questions:
> >
> > On Fri, Dec 3, 2021 at 8:59 AM Dominik Brodowski
> > <[email protected]> wrote:
> > > On subsequent calls to add_bootloader_randomness() and then to
> > > add_hwgenerator_randomness(), crng_fast_load() will be skipped. Instead,
> > > wait_event_interruptible() (which makes no sense for the init process)
> > > and then credit_entropy_bits() will be called. If the entropy count for
> > > that second seed is large enough, that proceeds to crng_reseed().
> > > However, crng_reseed() may depend on workqueues being available, which
> > > is not the case early during boot.
> >
> > It sounds like *the* issue you've identified is that crng_reseed()
> > calls into workqueue functions too early in init, right? The bug is
> > about paths into crng_reseed() that might cause that?
> >
> > If so, then specifically, are you referring to crng_reseed()'s call to
> > numa_crng_init()? In other words, the cause of the bug would be
> > 6c1e851c4edc ("random: fix possible sleeping allocation from irq
> > context")? If that's the case, then I wonder if the problem you're
> > seeing goes away if you revert both 6c1e851c4edc ("random: fix
> > possible sleeping allocation from irq context") and its primary
> > predecessor, 8ef35c866f88 ("random: set up the NUMA crng instances
> > after the CRNG is fully initialized"). These fix an actual bug, so I'm
> > not suggesting we actually revert these in the tree, but for the
> > purpose of testing, I'm wondering if this is actually the root cause
> > of the bug you're seeing.
>
> If the above holds, I wonder if something more basic like the below would do
> the trick -- deferring the bringup of the secondary pools until later on in
> rand_initialize.
Firstly, wasn't this done before 8ef35c866f88 -- and initializing the NUMA
crng even if not enough entropy had been obtained yet?
Secondly, this approach seems works for small amounts of entropy submitted
to add_bootloader_randomness (e.g. for four calls with 8 bytes), but not for
larger quantities (e.g. for four calls with 64 bytes). Don't ask me why,
though.
Thirdly, this still does not fix the issue that only the second call to
add_bootloader_randomness() credits entropy.
Thanks,
Dominik
On Fri, Dec 3, 2021 at 3:59 PM Dominik Brodowski
<[email protected]> wrote:
>
> Hi Jason,
>
> Am Thu, Dec 02, 2021 at 11:55:10AM -0500 schrieb Jason A. Donenfeld:
> > Thanks for the patch. One trivial nit and one question:
>
> Thanks for your review!
>
> > On Thu, Dec 2, 2021 at 6:35 AM Dominik Brodowski
> > <[email protected]> wrote:
> > > + /* We cannot do much with the input pool until it is set up in
> > > + * rand_initalize(); therefore just mix into the crng state.
> >
> > I think you meant "rand_initialize()" here (missing 'i').
>
> Indeed, sorry about that.
>
> > > If the added entropy suffices to increase crng_init to 1, future calls
> > > to add_bootloader_randomness() or add_hwgenerator_randomness() used to
> > > progress to credit_entropy_bits(). However, if the input pool is not yet
> > > properly set up, the cmpxchg call within that function can lead to an
> > > infinite recursion.
> >
> > I see what this patch does with crng_global_init_time, and that seems
> > probably sensible, but I didn't understand this part of the reasoning
> > in the commit message; I might just be a bit slow here. Where's the
> > recursion exactly? Or even an infinite loop?
>
> On arm64, it was actually a NULL pointer dereference reported by Ivan T.
> Ivanov; see
>
> https://lore.kernel.org/lkml/[email protected]/
>
> Trying to reproduce this rather bluntly on x86/qemu by multiple manual calls
> to add_bootloader_randomness(), I mis-interpreted the symptoms to point to an
> infinite recursion. The real problem seems to be that crng_reseed() isn't
> ready to be called too early in the boot process, in particular before
> workqueues are ready (see the call to numa_crng_init()).
>
> However, there seem be additional issues with add_bootloader_randomness()
> not yet addressed (or worsened) by my patch:
>
> - If CONFIG_RANDOM_TRUST_BOOTLOADER is enabled and crng_init==0,
> add_hwgenerator_randomness() calls crng_fast_load() and returns
> immediately. If it is disabled and crng_init==0,
> add_device_randnomness() calls crng_slow_load() but still
> continues to call _mix_pool_bytes(). That means the seed is
> used more extensively if CONFIG_RANDOM_TRUST_BOOTLOADER is not
> set!
If called by the crng_slow_load(), it's mixed into the pool but we're
not trusting it. But in crng_fast_load() we're using it to init crng.
>
> - If CONFIG_RANDOM_TRUST_BOOTLOADER is enabled and crng_init==0,
> the entropy is not credited -- same as if
> CONFIG_RANDOM_TRUST_BOOTLOADER is not set. Only subsequent calls
In crng_fast_load(), the seed would be mixed to primary_crng.state[4],
and then crng_init will be 1 if the added seed is enough.
rng-seed in dt (called in early_init_dt_scan_chosen()) also needs to
use this function to init crng.
With the patch, we're seeing
[ 0.000000] random: get_random_u64 called from
__kmem_cache_create+0x34/0x270 with crng_init=0
While before it should be
[ 0.000000] random: get_random_u64 called from
__kmem_cache_create+0x34/0x280 with crng_init=1
> to add_bootloader_randomness() would credit entropy, but that
> causes the issue NULL pointer dereference or the hang...
>
> - As crng_fast_load() returns early, that actually means that my
> patch causes the additional entropy submitted to
> add_hwgenerator_randomness() by subsequent calls to be completely
> lost.
Only when crng_init==0, if crng is initialized, it would continue with
credit_entropy_bits().
>
> - For add_bootloader_randomness(), it makes no sense at all to call
> wait_event_interruptible().
>
> Therefore, it might make more sense to
>
> - modify add_bootloader_randomness() to always call
> add_device_randomness(), and if CONFIG_RANDOM_TRUST_BOOTLOADER is
> enabled, to call credit_entropy_bits().
>
> - update credit_entropy_bits() to not call credit_entropy_bits()
> if crng_global_init_time==0, as workqueues (and possibly other
> infrastructure) might not be available at that time.
>
> What do you think? Draft patch below. @Ivan: Could you re-test on your
> system, please?
>
> Thanks,
> Dominik
>
> ---
>
> Currently, if CONFIG_RANDOM_TRUST_BOOTLOADER is enabled, mutliple calls
> to add_bootloader_randomness() are broken and can cause a NULL pointer
> dereference, as noted by Ivan T. Ivanov. This is not only a hypothetical
> problem, as qemu on arm64 may provide bootloader entropy via EFI and via
> devicetree.
>
> On the first call to add_hwgenerator_randomness(), crng_fast_load() is
> executed, and if the seed is long enough, crng_init will be set to 1.
> However, no entropy is currently credited for that, even though the
> name and description of CONFIG_RANDOM_TRUST_BOOTLOADER states otherwise.
>
> On subsequent calls to add_bootloader_randomness() and then to
> add_hwgenerator_randomness(), crng_fast_load() will be skipped. Instead,
> wait_event_interruptible() (which makes no sense for the init process)
> and then credit_entropy_bits() will be called. If the entropy count for
> that second seed is large enough, that proceeds to crng_reseed().
> However, crng_reseed() may depend on workqueues being available, which
> is not the case early during boot.
>
> To fix these issues, unconditionally call add_device_randomness() but not
> add_hwgenerator_randomness() in add_bootloader_randomness(). This has the
> additional advantage that the seed provided by the first call to
> add_bootloader_randomness() is not only used by crng_{fast,slow}_load(),
> but also mixed into the input pool. If CONFIG_RANDOM_TRUST_BOOTLOADER is
> set, explicitly credit the entropy. However, avoid a call to crng_reseed()
> too early during boot. It is safe to be called after rand_initialize(),
> so use crng_global_init_time (which is set to != 0 in that function) to
> determine which branch to take.
>
> Reported-by: Ivan T. Ivanov <[email protected]>
> Fixes: 18b915ac6b0a ("efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness")
> Signed-off-by: Dominik Brodowski <[email protected]>
>
> ---
> v3->v4: complete rewrite
> v2->v3: only one unlikely (Ard Biesheuvel)
> v1->v2: fix commit message; unmerge Reported-and-tested-by-tag (Ard Biesheuvel)
>
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 605969ed0f96..d8614b426dfb 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -722,7 +722,8 @@ static void credit_entropy_bits(struct entropy_store *r, int nbits)
> if (r == &input_pool) {
> int entropy_bits = entropy_count >> ENTROPY_SHIFT;
>
> - if (crng_init < 2 && entropy_bits >= 128)
> + if (crng_init < 2 && entropy_bits >= 128 &&
> + crng_global_init_time > 0)
> crng_reseed(&primary_crng, r);
> }
> }
> @@ -1763,8 +1764,8 @@ static void __init init_std_data(struct entropy_store *r)
> }
>
> /*
> - * Note that setup_arch() may call add_device_randomness()
> - * long before we get here. This allows seeding of the pools
> + * add_device_randomness() or add_bootloader_randomness() may be
> + * called long before we get here. This allows seeding of the pools
> * with some platform dependent data very early in the boot
> * process. But it limits our options here. We must use
> * statically allocated structures that already have all
> @@ -2291,15 +2292,13 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
> EXPORT_SYMBOL_GPL(add_hwgenerator_randomness);
>
> /* Handle random seed passed by bootloader.
> - * If the seed is trustworthy, it would be regarded as hardware RNGs. Otherwise
> - * it would be regarded as device data.
> + * If the seed is trustworthy, its entropy will be credited.
> * The decision is controlled by CONFIG_RANDOM_TRUST_BOOTLOADER.
> */
> void add_bootloader_randomness(const void *buf, unsigned int size)
> {
> + add_device_randomness(buf, size);
> if (IS_ENABLED(CONFIG_RANDOM_TRUST_BOOTLOADER))
> - add_hwgenerator_randomness(buf, size, size * 8);
> - else
> - add_device_randomness(buf, size);
> + credit_entropy_bits(&input_pool, size * 8);
> }
> EXPORT_SYMBOL_GPL(add_bootloader_randomness);
On 12-03 16:39, Jason A. Donenfeld wrote:
> Date: Fri, 3 Dec 2021 16:39:55 +0100
> From: "Jason A. Donenfeld" <[email protected]>
> To: Dominik Brodowski <[email protected]>
> Cc: Theodore Ts'o <[email protected]>, "Ivan T. Ivanov" <[email protected]>, Ard
> Biesheuvel <[email protected]>, [email protected], LKML
> <[email protected]>, [email protected]
> Subject: Re: [PATCH v4] random: fix crash on multiple early calls to
> add_bootloader_randomness()
> Message-ID: <CAHmME9qGHo4n6QGxnE+O46pagOR0bA+9E8bi8ZLPAzMuMZpPwg@mail.gmail.com>
Tags: all linux me ring watch
Hi,
>
> Hi Dominik,
>
> Thanks for your analysis. Some more questions:
>
<snip>
> Also, if you have a nice way of reproducing this, please do tell - I'd
> like to give it a spin if possible.
>
Initial bug report could be found here [1]. Comments 14 and onward are
probably helpful. To reproduce the issue I have downloaded "assets"
from [2] and recreated test environment as found in autoinst-log.txt [3].
Search for qemu-img and qemu-system-aarch64 in the log above. Login
credentials for the images could be found by searching for "password"
in the same file.
Regards,
Ivan
[1] https://bugzilla.suse.com/show_bug.cgi?id=1184924
[2] https://openqa.opensuse.org/tests/latest?arch=aarch64&distri=opensuse&flavor=DVD&machine=aarch64&test=extra_tests_in_textmode&version=15.3
[3] https://openqa.opensuse.org/tests/2052459/logfile?filename=autoinst-log.txt
Am Mon, Dec 06, 2021 at 01:42:01PM +0800 schrieb Hsin-Yi Wang:
> On Fri, Dec 3, 2021 at 3:59 PM Dominik Brodowski
> <[email protected]> wrote:
> >
> > Hi Jason,
> >
> > Am Thu, Dec 02, 2021 at 11:55:10AM -0500 schrieb Jason A. Donenfeld:
> > > Thanks for the patch. One trivial nit and one question:
> >
> > Thanks for your review!
> >
> > > On Thu, Dec 2, 2021 at 6:35 AM Dominik Brodowski
> > > <[email protected]> wrote:
> > > > + /* We cannot do much with the input pool until it is set up in
> > > > + * rand_initalize(); therefore just mix into the crng state.
> > >
> > > I think you meant "rand_initialize()" here (missing 'i').
> >
> > Indeed, sorry about that.
> >
> > > > If the added entropy suffices to increase crng_init to 1, future calls
> > > > to add_bootloader_randomness() or add_hwgenerator_randomness() used to
> > > > progress to credit_entropy_bits(). However, if the input pool is not yet
> > > > properly set up, the cmpxchg call within that function can lead to an
> > > > infinite recursion.
> > >
> > > I see what this patch does with crng_global_init_time, and that seems
> > > probably sensible, but I didn't understand this part of the reasoning
> > > in the commit message; I might just be a bit slow here. Where's the
> > > recursion exactly? Or even an infinite loop?
> >
> > On arm64, it was actually a NULL pointer dereference reported by Ivan T.
> > Ivanov; see
> >
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > Trying to reproduce this rather bluntly on x86/qemu by multiple manual calls
> > to add_bootloader_randomness(), I mis-interpreted the symptoms to point to an
> > infinite recursion. The real problem seems to be that crng_reseed() isn't
> > ready to be called too early in the boot process, in particular before
> > workqueues are ready (see the call to numa_crng_init()).
> >
> > However, there seem be additional issues with add_bootloader_randomness()
> > not yet addressed (or worsened) by my patch:
> >
> > - If CONFIG_RANDOM_TRUST_BOOTLOADER is enabled and crng_init==0,
> > add_hwgenerator_randomness() calls crng_fast_load() and returns
> > immediately. If it is disabled and crng_init==0,
> > add_device_randnomness() calls crng_slow_load() but still
> > continues to call _mix_pool_bytes(). That means the seed is
> > used more extensively if CONFIG_RANDOM_TRUST_BOOTLOADER is not
> > set!
> If called by the crng_slow_load(), it's mixed into the pool but we're
> not trusting it. But in crng_fast_load() we're using it to init crng.
>
> >
> > - If CONFIG_RANDOM_TRUST_BOOTLOADER is enabled and crng_init==0,
> > the entropy is not credited -- same as if
> > CONFIG_RANDOM_TRUST_BOOTLOADER is not set. Only subsequent calls
>
> In crng_fast_load(), the seed would be mixed to primary_crng.state[4],
Actually, that is also the case for crng_slow_load() (see dest_buf there).
> and then crng_init will be 1 if the added seed is enough.
> rng-seed in dt (called in early_init_dt_scan_chosen()) also needs to
> use this function to init crng.
Indeed, crng_init should be set to 1 in that case.
> With the patch, we're seeing
> [ 0.000000] random: get_random_u64 called from
> __kmem_cache_create+0x34/0x270 with crng_init=0
>
> While before it should be
> [ 0.000000] random: get_random_u64 called from
> __kmem_cache_create+0x34/0x280 with crng_init=1
>
> > to add_bootloader_randomness() would credit entropy, but that
> > causes the issue NULL pointer dereference or the hang...
> >
> > - As crng_fast_load() returns early, that actually means that my
> > patch causes the additional entropy submitted to
> > add_hwgenerator_randomness() by subsequent calls to be completely
> > lost.
> Only when crng_init==0, if crng is initialized, it would continue with
> credit_entropy_bits().
However, if workqueues are not up and running (yet), it will fail.
New draft below!
Thanks,
Dominik
---
Currently, if CONFIG_RANDOM_TRUST_BOOTLOADER is enabled, mutliple calls
to add_bootloader_randomness() are broken and can cause a NULL pointer
dereference, as noted by Ivan T. Ivanov. This is not only a hypothetical
problem, as qemu on arm64 may provide bootloader entropy via EFI and via
devicetree.
On the first call to add_hwgenerator_randomness(), crng_fast_load() is
executed, and if the seed is long enough, crng_init will be set to 1.
However, no entropy is currently credited for that, even though the
name and description of CONFIG_RANDOM_TRUST_BOOTLOADER states otherwise.
On subsequent calls to add_bootloader_randomness() and then to
add_hwgenerator_randomness(), crng_fast_load() will be skipped. Instead,
wait_event_interruptible() (which makes no sense for the init process)
and then credit_entropy_bits() will be called. If the entropy count for
that second seed is large enough, that proceeds to crng_reseed().
However, crng_reseed() may depend on workqueues being available, which
is not the case early during boot.
To fix these issues, explicitly call crng_fast_load() or crng_slow_load()
depending on whether the bootloader is trusted -- only in the first
instance, crng_init may progress to 1. Also, mix the seed into the
input pool unconditionally, and credit the entropy for that iff
CONFIG_RANDOM_TRUST_BOOTLOADER is set. However, avoid a call to
crng_reseed() too early during boot. It is safe to be called after
rand_initialize(), so use crng_global_init_time (which is set to != 0
in that function) to determine which branch to take.
Reported-by: Ivan T. Ivanov <[email protected]>
Fixes: 18b915ac6b0a ("efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness")
Signed-off-by: Dominik Brodowski <[email protected]>
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 605969ed0f96..abe4571fd2c0 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -722,7 +722,8 @@ static void credit_entropy_bits(struct entropy_store *r, int nbits)
if (r == &input_pool) {
int entropy_bits = entropy_count >> ENTROPY_SHIFT;
- if (crng_init < 2 && entropy_bits >= 128)
+ if (crng_init < 2 && entropy_bits >= 128 &&
+ crng_global_init_time > 0)
crng_reseed(&primary_crng, r);
}
}
@@ -1763,8 +1764,8 @@ static void __init init_std_data(struct entropy_store *r)
}
/*
- * Note that setup_arch() may call add_device_randomness()
- * long before we get here. This allows seeding of the pools
+ * add_device_randomness() or add_bootloader_randomness() may be
+ * called long before we get here. This allows seeding of the pools
* with some platform dependent data very early in the boot
* process. But it limits our options here. We must use
* statically allocated structures that already have all
@@ -2291,15 +2292,29 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
EXPORT_SYMBOL_GPL(add_hwgenerator_randomness);
/* Handle random seed passed by bootloader.
- * If the seed is trustworthy, it would be regarded as hardware RNGs. Otherwise
- * it would be regarded as device data.
+ * If the seed is trustworthy, its entropy will be credited.
* The decision is controlled by CONFIG_RANDOM_TRUST_BOOTLOADER.
*/
void add_bootloader_randomness(const void *buf, unsigned int size)
{
- if (IS_ENABLED(CONFIG_RANDOM_TRUST_BOOTLOADER))
- add_hwgenerator_randomness(buf, size, size * 8);
- else
- add_device_randomness(buf, size);
+ unsigned long time = random_get_entropy() ^ jiffies;
+ unsigned long flags;
+
+ if (!crng_ready() && size) {
+#ifdef CONFIG_RANDOM_TRUST_BOOTLOADER
+ crng_fast_load(buf, size);
+#else
+ crng_slow_load(buf, size);
+#endif /* CONFIG_RANDOM_TRUST_BOOTLOADER */
+ }
+
+ spin_lock_irqsave(&input_pool.lock, flags);
+ _mix_pool_bytes(&input_pool, buf, size);
+ _mix_pool_bytes(&input_pool, &time, sizeof(time));
+ spin_unlock_irqrestore(&input_pool.lock, flags);
+
+#ifdef CONFIG_RANDOM_TRUST_BOOTLOADER
+ credit_entropy_bits(&input_pool, size * 8);
+#endif
}
EXPORT_SYMBOL_GPL(add_bootloader_randomness);
On Tue, Dec 7, 2021 at 4:58 AM Dominik Brodowski
<[email protected]> wrote:
>
> Am Mon, Dec 06, 2021 at 01:42:01PM +0800 schrieb Hsin-Yi Wang:
> > On Fri, Dec 3, 2021 at 3:59 PM Dominik Brodowski
> > <[email protected]> wrote:
> > >
> > > Hi Jason,
> > >
> > > Am Thu, Dec 02, 2021 at 11:55:10AM -0500 schrieb Jason A. Donenfeld:
> > > > Thanks for the patch. One trivial nit and one question:
> > >
> > > Thanks for your review!
> > >
> > > > On Thu, Dec 2, 2021 at 6:35 AM Dominik Brodowski
> > > > <[email protected]> wrote:
> > > > > + /* We cannot do much with the input pool until it is set up in
> > > > > + * rand_initalize(); therefore just mix into the crng state.
> > > >
> > > > I think you meant "rand_initialize()" here (missing 'i').
> > >
> > > Indeed, sorry about that.
> > >
> > > > > If the added entropy suffices to increase crng_init to 1, future calls
> > > > > to add_bootloader_randomness() or add_hwgenerator_randomness() used to
> > > > > progress to credit_entropy_bits(). However, if the input pool is not yet
> > > > > properly set up, the cmpxchg call within that function can lead to an
> > > > > infinite recursion.
> > > >
> > > > I see what this patch does with crng_global_init_time, and that seems
> > > > probably sensible, but I didn't understand this part of the reasoning
> > > > in the commit message; I might just be a bit slow here. Where's the
> > > > recursion exactly? Or even an infinite loop?
> > >
> > > On arm64, it was actually a NULL pointer dereference reported by Ivan T.
> > > Ivanov; see
> > >
> > > https://lore.kernel.org/lkml/[email protected]/
> > >
> > > Trying to reproduce this rather bluntly on x86/qemu by multiple manual calls
> > > to add_bootloader_randomness(), I mis-interpreted the symptoms to point to an
> > > infinite recursion. The real problem seems to be that crng_reseed() isn't
> > > ready to be called too early in the boot process, in particular before
> > > workqueues are ready (see the call to numa_crng_init()).
> > >
> > > However, there seem be additional issues with add_bootloader_randomness()
> > > not yet addressed (or worsened) by my patch:
> > >
> > > - If CONFIG_RANDOM_TRUST_BOOTLOADER is enabled and crng_init==0,
> > > add_hwgenerator_randomness() calls crng_fast_load() and returns
> > > immediately. If it is disabled and crng_init==0,
> > > add_device_randnomness() calls crng_slow_load() but still
> > > continues to call _mix_pool_bytes(). That means the seed is
> > > used more extensively if CONFIG_RANDOM_TRUST_BOOTLOADER is not
> > > set!
> > If called by the crng_slow_load(), it's mixed into the pool but we're
> > not trusting it. But in crng_fast_load() we're using it to init crng.
> >
> > >
> > > - If CONFIG_RANDOM_TRUST_BOOTLOADER is enabled and crng_init==0,
> > > the entropy is not credited -- same as if
> > > CONFIG_RANDOM_TRUST_BOOTLOADER is not set. Only subsequent calls
> >
> > In crng_fast_load(), the seed would be mixed to primary_crng.state[4],
>
> Actually, that is also the case for crng_slow_load() (see dest_buf there).
>
Right, but the difference is if we want to credit(trust) that for crng init.
> > and then crng_init will be 1 if the added seed is enough.
> > rng-seed in dt (called in early_init_dt_scan_chosen()) also needs to
> > use this function to init crng.
>
> Indeed, crng_init should be set to 1 in that case.
>
> > With the patch, we're seeing
> > [ 0.000000] random: get_random_u64 called from
> > __kmem_cache_create+0x34/0x270 with crng_init=0
> >
> > While before it should be
> > [ 0.000000] random: get_random_u64 called from
> > __kmem_cache_create+0x34/0x280 with crng_init=1
> >
> > > to add_bootloader_randomness() would credit entropy, but that
> > > causes the issue NULL pointer dereference or the hang...
> > >
> > > - As crng_fast_load() returns early, that actually means that my
> > > patch causes the additional entropy submitted to
> > > add_hwgenerator_randomness() by subsequent calls to be completely
> > > lost.
> > Only when crng_init==0, if crng is initialized, it would continue with
> > credit_entropy_bits().
>
> However, if workqueues are not up and running (yet), it will fail.
>
> New draft below!
Thanks, the new draft now takes care of the crng init.
[ 0.000000] random: get_random_u64 called from
__kmem_cache_create+0x34/0x270 with crng_init=1
>
> Thanks,
> Dominik
>
> ---
>
> Currently, if CONFIG_RANDOM_TRUST_BOOTLOADER is enabled, mutliple calls
> to add_bootloader_randomness() are broken and can cause a NULL pointer
> dereference, as noted by Ivan T. Ivanov. This is not only a hypothetical
> problem, as qemu on arm64 may provide bootloader entropy via EFI and via
> devicetree.
>
> On the first call to add_hwgenerator_randomness(), crng_fast_load() is
> executed, and if the seed is long enough, crng_init will be set to 1.
> However, no entropy is currently credited for that, even though the
> name and description of CONFIG_RANDOM_TRUST_BOOTLOADER states otherwise.
>
> On subsequent calls to add_bootloader_randomness() and then to
> add_hwgenerator_randomness(), crng_fast_load() will be skipped. Instead,
> wait_event_interruptible() (which makes no sense for the init process)
> and then credit_entropy_bits() will be called. If the entropy count for
> that second seed is large enough, that proceeds to crng_reseed().
> However, crng_reseed() may depend on workqueues being available, which
> is not the case early during boot.
>
> To fix these issues, explicitly call crng_fast_load() or crng_slow_load()
> depending on whether the bootloader is trusted -- only in the first
> instance, crng_init may progress to 1. Also, mix the seed into the
> input pool unconditionally, and credit the entropy for that iff
> CONFIG_RANDOM_TRUST_BOOTLOADER is set. However, avoid a call to
> crng_reseed() too early during boot. It is safe to be called after
> rand_initialize(), so use crng_global_init_time (which is set to != 0
> in that function) to determine which branch to take.
>
> Reported-by: Ivan T. Ivanov <[email protected]>
> Fixes: 18b915ac6b0a ("efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness")
> Signed-off-by: Dominik Brodowski <[email protected]>
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 605969ed0f96..abe4571fd2c0 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -722,7 +722,8 @@ static void credit_entropy_bits(struct entropy_store *r, int nbits)
> if (r == &input_pool) {
> int entropy_bits = entropy_count >> ENTROPY_SHIFT;
>
> - if (crng_init < 2 && entropy_bits >= 128)
> + if (crng_init < 2 && entropy_bits >= 128 &&
> + crng_global_init_time > 0)
> crng_reseed(&primary_crng, r);
> }
> }
> @@ -1763,8 +1764,8 @@ static void __init init_std_data(struct entropy_store *r)
> }
>
> /*
> - * Note that setup_arch() may call add_device_randomness()
> - * long before we get here. This allows seeding of the pools
> + * add_device_randomness() or add_bootloader_randomness() may be
> + * called long before we get here. This allows seeding of the pools
> * with some platform dependent data very early in the boot
> * process. But it limits our options here. We must use
> * statically allocated structures that already have all
> @@ -2291,15 +2292,29 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
> EXPORT_SYMBOL_GPL(add_hwgenerator_randomness);
>
> /* Handle random seed passed by bootloader.
> - * If the seed is trustworthy, it would be regarded as hardware RNGs. Otherwise
> - * it would be regarded as device data.
> + * If the seed is trustworthy, its entropy will be credited.
> * The decision is controlled by CONFIG_RANDOM_TRUST_BOOTLOADER.
> */
> void add_bootloader_randomness(const void *buf, unsigned int size)
> {
> - if (IS_ENABLED(CONFIG_RANDOM_TRUST_BOOTLOADER))
> - add_hwgenerator_randomness(buf, size, size * 8);
> - else
> - add_device_randomness(buf, size);
> + unsigned long time = random_get_entropy() ^ jiffies;
> + unsigned long flags;
> +
> + if (!crng_ready() && size) {
size is checked here but not below?
> +#ifdef CONFIG_RANDOM_TRUST_BOOTLOADER
> + crng_fast_load(buf, size);
> +#else
> + crng_slow_load(buf, size);
> +#endif /* CONFIG_RANDOM_TRUST_BOOTLOADER */
> + }
> +
> + spin_lock_irqsave(&input_pool.lock, flags);
> + _mix_pool_bytes(&input_pool, buf, size);
> + _mix_pool_bytes(&input_pool, &time, sizeof(time));
> + spin_unlock_irqrestore(&input_pool.lock, flags);
> +
> +#ifdef CONFIG_RANDOM_TRUST_BOOTLOADER
> + credit_entropy_bits(&input_pool, size * 8);
> +#endif
> }
> EXPORT_SYMBOL_GPL(add_bootloader_randomness);
Am Tue, Dec 07, 2021 at 03:09:21PM +0800 schrieb Hsin-Yi Wang:
> On Tue, Dec 7, 2021 at 4:58 AM Dominik Brodowski
> <[email protected]> wrote:
> >
> > Am Mon, Dec 06, 2021 at 01:42:01PM +0800 schrieb Hsin-Yi Wang:
> > > On Fri, Dec 3, 2021 at 3:59 PM Dominik Brodowski
> > > <[email protected]> wrote:
> > > >
> > > > Hi Jason,
> > > >
> > > > Am Thu, Dec 02, 2021 at 11:55:10AM -0500 schrieb Jason A. Donenfeld:
> > > > > Thanks for the patch. One trivial nit and one question:
> > > >
> > > > Thanks for your review!
> > > >
> > > > > On Thu, Dec 2, 2021 at 6:35 AM Dominik Brodowski
> > > > > <[email protected]> wrote:
> > > > > > + /* We cannot do much with the input pool until it is set up in
> > > > > > + * rand_initalize(); therefore just mix into the crng state.
> > > > >
> > > > > I think you meant "rand_initialize()" here (missing 'i').
> > > >
> > > > Indeed, sorry about that.
> > > >
> > > > > > If the added entropy suffices to increase crng_init to 1, future calls
> > > > > > to add_bootloader_randomness() or add_hwgenerator_randomness() used to
> > > > > > progress to credit_entropy_bits(). However, if the input pool is not yet
> > > > > > properly set up, the cmpxchg call within that function can lead to an
> > > > > > infinite recursion.
> > > > >
> > > > > I see what this patch does with crng_global_init_time, and that seems
> > > > > probably sensible, but I didn't understand this part of the reasoning
> > > > > in the commit message; I might just be a bit slow here. Where's the
> > > > > recursion exactly? Or even an infinite loop?
> > > >
> > > > On arm64, it was actually a NULL pointer dereference reported by Ivan T.
> > > > Ivanov; see
> > > >
> > > > https://lore.kernel.org/lkml/[email protected]/
> > > >
> > > > Trying to reproduce this rather bluntly on x86/qemu by multiple manual calls
> > > > to add_bootloader_randomness(), I mis-interpreted the symptoms to point to an
> > > > infinite recursion. The real problem seems to be that crng_reseed() isn't
> > > > ready to be called too early in the boot process, in particular before
> > > > workqueues are ready (see the call to numa_crng_init()).
> > > >
> > > > However, there seem be additional issues with add_bootloader_randomness()
> > > > not yet addressed (or worsened) by my patch:
> > > >
> > > > - If CONFIG_RANDOM_TRUST_BOOTLOADER is enabled and crng_init==0,
> > > > add_hwgenerator_randomness() calls crng_fast_load() and returns
> > > > immediately. If it is disabled and crng_init==0,
> > > > add_device_randnomness() calls crng_slow_load() but still
> > > > continues to call _mix_pool_bytes(). That means the seed is
> > > > used more extensively if CONFIG_RANDOM_TRUST_BOOTLOADER is not
> > > > set!
> > > If called by the crng_slow_load(), it's mixed into the pool but we're
> > > not trusting it. But in crng_fast_load() we're using it to init crng.
> > >
> > > >
> > > > - If CONFIG_RANDOM_TRUST_BOOTLOADER is enabled and crng_init==0,
> > > > the entropy is not credited -- same as if
> > > > CONFIG_RANDOM_TRUST_BOOTLOADER is not set. Only subsequent calls
> > >
> > > In crng_fast_load(), the seed would be mixed to primary_crng.state[4],
> >
> > Actually, that is also the case for crng_slow_load() (see dest_buf there).
> >
> Right, but the difference is if we want to credit(trust) that for crng init.
... which is, unfortunately, not the only difference between slow and
fast...
> > > and then crng_init will be 1 if the added seed is enough.
> > > rng-seed in dt (called in early_init_dt_scan_chosen()) also needs to
> > > use this function to init crng.
> >
> > Indeed, crng_init should be set to 1 in that case.
> >
> > > With the patch, we're seeing
> > > [ 0.000000] random: get_random_u64 called from
> > > __kmem_cache_create+0x34/0x270 with crng_init=0
> > >
> > > While before it should be
> > > [ 0.000000] random: get_random_u64 called from
> > > __kmem_cache_create+0x34/0x280 with crng_init=1
> > >
> > > > to add_bootloader_randomness() would credit entropy, but that
> > > > causes the issue NULL pointer dereference or the hang...
> > > >
> > > > - As crng_fast_load() returns early, that actually means that my
> > > > patch causes the additional entropy submitted to
> > > > add_hwgenerator_randomness() by subsequent calls to be completely
> > > > lost.
> > > Only when crng_init==0, if crng is initialized, it would continue with
> > > credit_entropy_bits().
> >
> > However, if workqueues are not up and running (yet), it will fail.
> >
> > New draft below!
>
> Thanks, the new draft now takes care of the crng init.
> [ 0.000000] random: get_random_u64 called from
> __kmem_cache_create+0x34/0x270 with crng_init=1
Thanks for testing!
> > ---
> >
> > Currently, if CONFIG_RANDOM_TRUST_BOOTLOADER is enabled, mutliple calls
> > to add_bootloader_randomness() are broken and can cause a NULL pointer
> > dereference, as noted by Ivan T. Ivanov. This is not only a hypothetical
> > problem, as qemu on arm64 may provide bootloader entropy via EFI and via
> > devicetree.
> >
> > On the first call to add_hwgenerator_randomness(), crng_fast_load() is
> > executed, and if the seed is long enough, crng_init will be set to 1.
> > However, no entropy is currently credited for that, even though the
> > name and description of CONFIG_RANDOM_TRUST_BOOTLOADER states otherwise.
> >
> > On subsequent calls to add_bootloader_randomness() and then to
> > add_hwgenerator_randomness(), crng_fast_load() will be skipped. Instead,
> > wait_event_interruptible() (which makes no sense for the init process)
> > and then credit_entropy_bits() will be called. If the entropy count for
> > that second seed is large enough, that proceeds to crng_reseed().
> > However, crng_reseed() may depend on workqueues being available, which
> > is not the case early during boot.
> >
> > To fix these issues, explicitly call crng_fast_load() or crng_slow_load()
> > depending on whether the bootloader is trusted -- only in the first
> > instance, crng_init may progress to 1. Also, mix the seed into the
> > input pool unconditionally, and credit the entropy for that iff
> > CONFIG_RANDOM_TRUST_BOOTLOADER is set. However, avoid a call to
> > crng_reseed() too early during boot. It is safe to be called after
> > rand_initialize(), so use crng_global_init_time (which is set to != 0
> > in that function) to determine which branch to take.
> >
> > Reported-by: Ivan T. Ivanov <[email protected]>
> > Fixes: 18b915ac6b0a ("efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness")
> > Signed-off-by: Dominik Brodowski <[email protected]>
> >
> > diff --git a/drivers/char/random.c b/drivers/char/random.c
> > index 605969ed0f96..abe4571fd2c0 100644
> > --- a/drivers/char/random.c
> > +++ b/drivers/char/random.c
> > @@ -722,7 +722,8 @@ static void credit_entropy_bits(struct entropy_store *r, int nbits)
> > if (r == &input_pool) {
> > int entropy_bits = entropy_count >> ENTROPY_SHIFT;
> >
> > - if (crng_init < 2 && entropy_bits >= 128)
> > + if (crng_init < 2 && entropy_bits >= 128 &&
> > + crng_global_init_time > 0)
> > crng_reseed(&primary_crng, r);
> > }
> > }
> > @@ -1763,8 +1764,8 @@ static void __init init_std_data(struct entropy_store *r)
> > }
> >
> > /*
> > - * Note that setup_arch() may call add_device_randomness()
> > - * long before we get here. This allows seeding of the pools
> > + * add_device_randomness() or add_bootloader_randomness() may be
> > + * called long before we get here. This allows seeding of the pools
> > * with some platform dependent data very early in the boot
> > * process. But it limits our options here. We must use
> > * statically allocated structures that already have all
> > @@ -2291,15 +2292,29 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
> > EXPORT_SYMBOL_GPL(add_hwgenerator_randomness);
> >
> > /* Handle random seed passed by bootloader.
> > - * If the seed is trustworthy, it would be regarded as hardware RNGs. Otherwise
> > - * it would be regarded as device data.
> > + * If the seed is trustworthy, its entropy will be credited.
> > * The decision is controlled by CONFIG_RANDOM_TRUST_BOOTLOADER.
> > */
> > void add_bootloader_randomness(const void *buf, unsigned int size)
> > {
> > - if (IS_ENABLED(CONFIG_RANDOM_TRUST_BOOTLOADER))
> > - add_hwgenerator_randomness(buf, size, size * 8);
> > - else
> > - add_device_randomness(buf, size);
> > + unsigned long time = random_get_entropy() ^ jiffies;
> > + unsigned long flags;
> > +
> > + if (!crng_ready() && size) {
> size is checked here but not below?
credit_entropy_bits() returns early if bits==0.
Thanks,
Dominik
Hi Dominik,
Thanks for the followup patch. Some comments below:
On Mon, Dec 6, 2021 at 9:58 PM Dominik Brodowski
<[email protected]> wrote:
> @@ -722,7 +722,8 @@ static void credit_entropy_bits(struct entropy_store *r, int nbits)
> if (r == &input_pool) {
> int entropy_bits = entropy_count >> ENTROPY_SHIFT;
>
> - if (crng_init < 2 && entropy_bits >= 128)
> + if (crng_init < 2 && entropy_bits >= 128 &&
> + crng_global_init_time > 0)
crng_global_init_time is being used because it's set in
rand_initialize(), and rand_initialize() is called after workqueues
have been set up. Fair enough. But:
a) It's still set to `jiffies - 1` afterwards at runtime, via
ioctl(RNDRESEEDCRNG). I'm not sure if we want to mess around with
having a 1:2**32 chance of getting this at the unlucky 0 wraparound.
It seems like that kind of strategy generally works if unlikely
failure is tolerable, but it seems like that's not a game to play
here. There are a few other options:
b) Use another proxy for the state, like
rand_initialize()->primary_crng.state (ugly) or something better.
c) Add another static global state variable or static branch to random.c.
d) Actually conditionalize this on whether or not workqueues have been
init'd, which is *actually* the thing you care about, not whether
rand_initialize() has fired.
I think that of these, (d) seems the most correct. The thing you're
*actually* trying to prevent is that
`schedule_work(&numa_crng_init_work)` call being triggered before
workqueues are up. It looks like system_wq is made non-NULL by
workqueue_init_early(); perhaps you could get away with
conditionalizing it on that instead?
This also seems like a distinct state machine derp from the rest of
the patch, so I wonder if it could be separated out into a more
trivial patch, in case it does prove problematic somehow.
> +#ifdef CONFIG_RANDOM_TRUST_BOOTLOADER
Can you use IF_ENABLED() like the code that this replaces? Easier to
read and ensures syntax doesn't regress on less-used configurations.
> void add_bootloader_randomness(const void *buf, unsigned int size)
> + unsigned long time = random_get_entropy() ^ jiffies;
> + unsigned long flags;
> +
> + if (!crng_ready() && size) {
> +#ifdef CONFIG_RANDOM_TRUST_BOOTLOADER
> + crng_fast_load(buf, size);
> +#else
> + crng_slow_load(buf, size);
> +#endif /* CONFIG_RANDOM_TRUST_BOOTLOADER */
> + }
> +
> + spin_lock_irqsave(&input_pool.lock, flags);
> + _mix_pool_bytes(&input_pool, buf, size);
> + _mix_pool_bytes(&input_pool, &time, sizeof(time));
> + spin_unlock_irqrestore(&input_pool.lock, flags);
> +
> +#ifdef CONFIG_RANDOM_TRUST_BOOTLOADER
> + credit_entropy_bits(&input_pool, size * 8);
> +#endif
> }
> EXPORT_SYMBOL_GPL(add_bootloader_randomness);
Trying to understand this and compare it to what was here before. Two
cases: trustbootloader and !trustbootloader.
trustbootloader looks like this:
unsigned long time = random_get_entropy() ^ jiffies;
unsigned long flags;
if (!crng_ready() && size)
crng_fast_load(buf, size);
spin_lock_irqsave(&input_pool.lock, flags);
_mix_pool_bytes(&input_pool, buf, size);
_mix_pool_bytes(&input_pool, &time, sizeof(time));
spin_unlock_irqrestore(&input_pool.lock, flags);
credit_entropy_bits(&input_pool, size * 8);
!trustbooloader looks like this:
unsigned long time = random_get_entropy() ^ jiffies;
unsigned long flags;
if (!crng_ready() && size)
crng_slow_load(buf, size);
spin_lock_irqsave(&input_pool.lock, flags);
_mix_pool_bytes(&input_pool, buf, size);
_mix_pool_bytes(&input_pool, &time, sizeof(time));
spin_unlock_irqrestore(&input_pool.lock, flags);
So this means !trustbootloader is the same as add_device_randomness
(with the call to trace_add_device_randomness removed). It'd probably
make sense to continue just branching to add_device_randomness on an
IS_ENABLED(), then, so it's more clear what's up, rather than open
coding the distinct paths and copying code around.
But trustbootloader is a different case. Here the differences appear to be:
1) crng_fast_load is now called for crng_init==1||crng_init==0 [via
crng_ready()] instead of for crng_init==0;
2) The function now continues onward after calling crng_fast_load
instead of returning;
3) The useless call to wait_event_interruptible is now skipped;
4) _mix_pool_bytes(random_get_entropy() ^ jiffies) is now called in
addition to _mix_pool_bytes(buf).
I'd now like to map (1), (2), (3), and (4) back to the rationale in
your commit message.
(2) seems to be related to:
> On the first call to add_hwgenerator_randomness(), crng_fast_load() is
> executed, and if the seed is long enough, crng_init will be set to 1.
> However, no entropy is currently credited for that, even though the
> name and description of CONFIG_RANDOM_TRUST_BOOTLOADER states otherwise.
But it seems like rather than going from:
if (!ready) {
crng_fast_load(buf, size);
return;
}
to:
if (!ready)
crng_fast_load(buf, size);
the actually corresponding fix would be to go instead to:
if (!ready)
crng_fast_load(buf, size);
if (!ready)
return;
(3) seems to be related to:
> wait_event_interruptible() (which makes no sense for the init process)
which is fine.
(1) and (4) I don't see justification for. Perhaps it's a mistake?
Regards,
Jason
Hey Dominik,
Just following up here, as I never heard back from you. These seem
like real bug I'd like to fix at some point. I was waiting to hear
back about your latest patch, and perhaps you wanted to spin a new
revision? Or not, in which case, please let me know?
Thanks,
Jason
Hey Ivan,
On Mon, Dec 6, 2021 at 9:14 AM Ivan T. Ivanov <[email protected]> wrote:
> Initial bug report could be found here [1]. Comments 14 and onward are
> probably helpful. To reproduce the issue I have downloaded "assets"
> from [2] and recreated test environment as found in autoinst-log.txt [3].
> Search for qemu-img and qemu-system-aarch64 in the log above. Login
> credentials for the images could be found by searching for "password"
> in the same file.
>
> Regards,
> Ivan
>
>
> [1] https://bugzilla.suse.com/show_bug.cgi?id=1184924
> [2] https://openqa.opensuse.org/tests/latest?arch=aarch64&distri=opensuse&flavor=DVD&machine=aarch64&test=extra_tests_in_textmode&version=15.3
> [3] https://openqa.opensuse.org/tests/2052459/logfile?filename=autoinst-log.txt
After a few rounds, Dominik and I converged on a set of patches that
are now in the crng/random.git tree. Do you think you could try this
tree out against your various test environments to confirm it fixes
the issue SUSE was seeing?
Tree is here: https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git
Thanks,
Jason
On 12-30 19:05, Jason A. Donenfeld wrote:
> Date: Thu, 30 Dec 2021 19:05:23 +0100
> From: "Jason A. Donenfeld" <[email protected]>
> To: "Ivan T. Ivanov" <[email protected]>
> Cc: Dominik Brodowski <[email protected]>, Theodore Ts'o
> <[email protected]>, Ard Biesheuvel <[email protected]>,
> [email protected], LKML <[email protected]>, Hsin-Yi
> Wang <[email protected]>
> Subject: Re: [PATCH v4] random: fix crash on multiple early calls to
> add_bootloader_randomness()
> Message-ID: <CAHmME9q6DnMk=p5kL0c1e4TxJOLpdxJpm3RbbgsNE8x1PWwi9g@mail.gmail.com>
>
Hi,
> Hey Ivan,
>
> On Mon, Dec 6, 2021 at 9:14 AM Ivan T. Ivanov <[email protected]> wrote:
> > Initial bug report could be found here [1]. Comments 14 and onward are
> > probably helpful. To reproduce the issue I have downloaded "assets"
> > from [2] and recreated test environment as found in autoinst-log.txt [3].
> > Search for qemu-img and qemu-system-aarch64 in the log above. Login
> > credentials for the images could be found by searching for "password"
> > in the same file.
> >
> > Regards,
> > Ivan
> >
> >
> > [1] https://bugzilla.suse.com/show_bug.cgi?id=1184924
> > [2] https://openqa.opensuse.org/tests/latest?arch=aarch64&distri=opensuse&flavor=DVD&machine=aarch64&test=extra_tests_in_textmode&version=15.3
> > [3] https://openqa.opensuse.org/tests/2052459/logfile?filename=autoinst-log.txt
>
> After a few rounds, Dominik and I converged on a set of patches that
> are now in the crng/random.git tree. Do you think you could try this
> tree out against your various test environments to confirm it fixes
> the issue SUSE was seeing?
>
> Tree is here: https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git
>
Sure, once I have result will post back.
Thanks!
Ivan