2020-11-05 15:31:51

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

When reseeding the CRNG periodically, arch_get_random_seed_long() is
called to obtain entropy from an architecture specific source if one
is implemented. In most cases, these are special instructions, but in
some cases, such as on ARM, we may want to back this using firmware
calls, which are considerably more expensive.

Another call to arch_get_random_seed_long() exists in the CRNG driver,
in add_interrupt_randomness(), which collects entropy by capturing
inter-interrupt timing and relying on interrupt jitter to provide
random bits. This is done by keeping a per-CPU state, and mixing in
the IRQ number, the cycle counter and the return address every time an
interrupt is taken, and mixing this per-CPU state into the entropy pool
every 64 invocations, or at least once per second. The entropy that is
gathered this way is credited as 1 bit of entropy. Every time this
happens, arch_get_random_seed_long() is invoked, and the result is
mixed in as well, and also credited with 1 bit of entropy.

This means that arch_get_random_seed_long() is called at least once
per second on every CPU, which seems excessive, and doesn't really
scale, especially in a virtualization scenario where CPUs may be
oversubscribed: in cases where arch_get_random_seed_long() is backed
by an instruction that actually goes back to a shared hardware entropy
source (such as RNDRRS on ARM), we will end up hitting it hundreds of
times per second.

So let's drop the call to arch_get_random_seed_long() from
add_interrupt_randomness(), and instead, rely on crng_reseed() to call
the arch hook to get random seed material from the platform.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
drivers/char/random.c | 15 +--------------
1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 2a41b21623ae..a9c393c1466d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1261,8 +1261,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
cycles_t cycles = random_get_entropy();
__u32 c_high, j_high;
__u64 ip;
- unsigned long seed;
- int credit = 0;

if (cycles == 0)
cycles = get_reg(fast_pool, regs);
@@ -1298,23 +1296,12 @@ void add_interrupt_randomness(int irq, int irq_flags)

fast_pool->last = now;
__mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
-
- /*
- * If we have architectural seed generator, produce a seed and
- * add it to the pool. For the sake of paranoia don't let the
- * architectural seed generator dominate the input from the
- * interrupt noise.
- */
- if (arch_get_random_seed_long(&seed)) {
- __mix_pool_bytes(r, &seed, sizeof(seed));
- credit = 1;
- }
spin_unlock(&r->lock);

fast_pool->count = 0;

/* award one bit for the contents of the fast pool */
- credit_entropy_bits(r, credit + 1);
+ credit_entropy_bits(r, 1);
}
EXPORT_SYMBOL_GPL(add_interrupt_randomness);

--
2.17.1


2020-11-11 08:22:03

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

(+ Eric)

On Thu, 5 Nov 2020 at 16:29, Ard Biesheuvel <[email protected]> wrote:
>
> When reseeding the CRNG periodically, arch_get_random_seed_long() is
> called to obtain entropy from an architecture specific source if one
> is implemented. In most cases, these are special instructions, but in
> some cases, such as on ARM, we may want to back this using firmware
> calls, which are considerably more expensive.
>
> Another call to arch_get_random_seed_long() exists in the CRNG driver,
> in add_interrupt_randomness(), which collects entropy by capturing
> inter-interrupt timing and relying on interrupt jitter to provide
> random bits. This is done by keeping a per-CPU state, and mixing in
> the IRQ number, the cycle counter and the return address every time an
> interrupt is taken, and mixing this per-CPU state into the entropy pool
> every 64 invocations, or at least once per second. The entropy that is
> gathered this way is credited as 1 bit of entropy. Every time this
> happens, arch_get_random_seed_long() is invoked, and the result is
> mixed in as well, and also credited with 1 bit of entropy.
>
> This means that arch_get_random_seed_long() is called at least once
> per second on every CPU, which seems excessive, and doesn't really
> scale, especially in a virtualization scenario where CPUs may be
> oversubscribed: in cases where arch_get_random_seed_long() is backed
> by an instruction that actually goes back to a shared hardware entropy
> source (such as RNDRRS on ARM), we will end up hitting it hundreds of
> times per second.
>
> So let's drop the call to arch_get_random_seed_long() from
> add_interrupt_randomness(), and instead, rely on crng_reseed() to call
> the arch hook to get random seed material from the platform.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>
> ---
> drivers/char/random.c | 15 +--------------
> 1 file changed, 1 insertion(+), 14 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 2a41b21623ae..a9c393c1466d 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1261,8 +1261,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
> cycles_t cycles = random_get_entropy();
> __u32 c_high, j_high;
> __u64 ip;
> - unsigned long seed;
> - int credit = 0;
>
> if (cycles == 0)
> cycles = get_reg(fast_pool, regs);
> @@ -1298,23 +1296,12 @@ void add_interrupt_randomness(int irq, int irq_flags)
>
> fast_pool->last = now;
> __mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
> -
> - /*
> - * If we have architectural seed generator, produce a seed and
> - * add it to the pool. For the sake of paranoia don't let the
> - * architectural seed generator dominate the input from the
> - * interrupt noise.
> - */
> - if (arch_get_random_seed_long(&seed)) {
> - __mix_pool_bytes(r, &seed, sizeof(seed));
> - credit = 1;
> - }
> spin_unlock(&r->lock);
>
> fast_pool->count = 0;
>
> /* award one bit for the contents of the fast pool */
> - credit_entropy_bits(r, credit + 1);
> + credit_entropy_bits(r, 1);
> }
> EXPORT_SYMBOL_GPL(add_interrupt_randomness);
>
> --
> 2.17.1
>

2020-11-11 09:47:29

by Andre Przywara

[permalink] [raw]
Subject: Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

On 11/11/2020 08:19, Ard Biesheuvel wrote:

Hi,

> (+ Eric)
>
> On Thu, 5 Nov 2020 at 16:29, Ard Biesheuvel <[email protected]> wrote:
>>
>> When reseeding the CRNG periodically, arch_get_random_seed_long() is
>> called to obtain entropy from an architecture specific source if one
>> is implemented. In most cases, these are special instructions, but in
>> some cases, such as on ARM, we may want to back this using firmware
>> calls, which are considerably more expensive.
>>
>> Another call to arch_get_random_seed_long() exists in the CRNG driver,
>> in add_interrupt_randomness(), which collects entropy by capturing
>> inter-interrupt timing and relying on interrupt jitter to provide
>> random bits. This is done by keeping a per-CPU state, and mixing in
>> the IRQ number, the cycle counter and the return address every time an
>> interrupt is taken, and mixing this per-CPU state into the entropy pool
>> every 64 invocations, or at least once per second. The entropy that is
>> gathered this way is credited as 1 bit of entropy. Every time this
>> happens, arch_get_random_seed_long() is invoked, and the result is
>> mixed in as well, and also credited with 1 bit of entropy.
>>
>> This means that arch_get_random_seed_long() is called at least once
>> per second on every CPU, which seems excessive, and doesn't really
>> scale, especially in a virtualization scenario where CPUs may be
>> oversubscribed: in cases where arch_get_random_seed_long() is backed
>> by an instruction that actually goes back to a shared hardware entropy
>> source (such as RNDRRS on ARM), we will end up hitting it hundreds of
>> times per second.

May I ask why this should be a particular problem? Form what I gathered
on the web, it seems like most h/w RNGs have a capacity of multiple
MBit/s. Wikipedia [1] suggests that the x86 CPU instructions generate at
least 20 Mbit/s (worst case: AMD's 2500 cycles @ 800 MHz), and I
measured around 78 Mbit/s with the raw entropy source on my Juno
(possibly even limited by slow MMIO).
So it seems unlikely that a few kbit/s drain the hardware entropy source.

If we consider this interface comparably cheap, should we then not try
to plug the Arm firmware interface into this?

I am not against this patch, actually am considering this a nice
cleanup, to separate interrupt generated entropy from other sources.
Especially since we call arch_get_random_seed_long() under a spinlock here.
But I am curious about the expectations from arch_get_random in general.

>> So let's drop the call to arch_get_random_seed_long() from
>> add_interrupt_randomness(), and instead, rely on crng_reseed() to call
>> the arch hook to get random seed material from the platform.

So I tested this and it works as expected: I see some calls on
initialisation, then a handful of calls every few seconds from the
periodic reseeding. The large number of calls every second are gone.

>>
>> Signed-off-by: Ard Biesheuvel <[email protected]>

Since the above questions are unrelated to this particular patch:

Reviewed-by: Andre Przywara <[email protected]>
Tested-by: Andre Przywara <[email protected]>

Cheers,
Andre

[1] https://en.wikipedia.org/wiki/RDRAND#Performance

>> ---
>> drivers/char/random.c | 15 +--------------
>> 1 file changed, 1 insertion(+), 14 deletions(-)
>>
>> diff --git a/drivers/char/random.c b/drivers/char/random.c
>> index 2a41b21623ae..a9c393c1466d 100644
>> --- a/drivers/char/random.c
>> +++ b/drivers/char/random.c
>> @@ -1261,8 +1261,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
>> cycles_t cycles = random_get_entropy();
>> __u32 c_high, j_high;
>> __u64 ip;
>> - unsigned long seed;
>> - int credit = 0;
>>
>> if (cycles == 0)
>> cycles = get_reg(fast_pool, regs);
>> @@ -1298,23 +1296,12 @@ void add_interrupt_randomness(int irq, int irq_flags)
>>
>> fast_pool->last = now;
>> __mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
>> -
>> - /*
>> - * If we have architectural seed generator, produce a seed and
>> - * add it to the pool. For the sake of paranoia don't let the
>> - * architectural seed generator dominate the input from the
>> - * interrupt noise.
>> - */
>> - if (arch_get_random_seed_long(&seed)) {
>> - __mix_pool_bytes(r, &seed, sizeof(seed));
>> - credit = 1;
>> - }
>> spin_unlock(&r->lock);
>>
>> fast_pool->count = 0;
>>
>> /* award one bit for the contents of the fast pool */
>> - credit_entropy_bits(r, credit + 1);
>> + credit_entropy_bits(r, 1);
>> }
>> EXPORT_SYMBOL_GPL(add_interrupt_randomness);
>>
>> --
>> 2.17.1
>>

2020-11-11 10:10:04

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

On Wed, 11 Nov 2020 at 10:45, André Przywara <[email protected]> wrote:
>
> On 11/11/2020 08:19, Ard Biesheuvel wrote:
>
> Hi,
>
> > (+ Eric)
> >
> > On Thu, 5 Nov 2020 at 16:29, Ard Biesheuvel <[email protected]> wrote:
> >>
> >> When reseeding the CRNG periodically, arch_get_random_seed_long() is
> >> called to obtain entropy from an architecture specific source if one
> >> is implemented. In most cases, these are special instructions, but in
> >> some cases, such as on ARM, we may want to back this using firmware
> >> calls, which are considerably more expensive.
> >>
> >> Another call to arch_get_random_seed_long() exists in the CRNG driver,
> >> in add_interrupt_randomness(), which collects entropy by capturing
> >> inter-interrupt timing and relying on interrupt jitter to provide
> >> random bits. This is done by keeping a per-CPU state, and mixing in
> >> the IRQ number, the cycle counter and the return address every time an
> >> interrupt is taken, and mixing this per-CPU state into the entropy pool
> >> every 64 invocations, or at least once per second. The entropy that is
> >> gathered this way is credited as 1 bit of entropy. Every time this
> >> happens, arch_get_random_seed_long() is invoked, and the result is
> >> mixed in as well, and also credited with 1 bit of entropy.
> >>
> >> This means that arch_get_random_seed_long() is called at least once
> >> per second on every CPU, which seems excessive, and doesn't really
> >> scale, especially in a virtualization scenario where CPUs may be
> >> oversubscribed: in cases where arch_get_random_seed_long() is backed
> >> by an instruction that actually goes back to a shared hardware entropy
> >> source (such as RNDRRS on ARM), we will end up hitting it hundreds of
> >> times per second.
>
> May I ask why this should be a particular problem? Form what I gathered
> on the web, it seems like most h/w RNGs have a capacity of multiple
> MBit/s. Wikipedia [1] suggests that the x86 CPU instructions generate at
> least 20 Mbit/s (worst case: AMD's 2500 cycles @ 800 MHz), and I
> measured around 78 Mbit/s with the raw entropy source on my Juno
> (possibly even limited by slow MMIO).
> So it seems unlikely that a few kbit/s drain the hardware entropy source.
>
> If we consider this interface comparably cheap, should we then not try
> to plug the Arm firmware interface into this?
>

I'm not sure I follow. Are you saying we should not wire up a
comparatively expensive firmware interface to
arch_get_random_seed_long() because we currently assume it is backed
by something cheap?

Because doing so would add significantly to the cost. Also note that a
firmware interface would permit other ways of gathering entropy that
are not necessarily backed by a dedicated high bandwidth noise source
(and we already have examples of this)


> I am not against this patch, actually am considering this a nice
> cleanup, to separate interrupt generated entropy from other sources.
> Especially since we call arch_get_random_seed_long() under a spinlock here.
> But I am curious about the expectations from arch_get_random in general.
>

I think it is reasonable to clean this up a little bit. A random
*seed* is not the same thing as a random number, and given that we
expose both interfaces, it makes sense to permit the seed variant to
be more costly, and only use it as intended (i.e., to seed a random
number generator)

> >> So let's drop the call to arch_get_random_seed_long() from
> >> add_interrupt_randomness(), and instead, rely on crng_reseed() to call
> >> the arch hook to get random seed material from the platform.
>
> So I tested this and it works as expected: I see some calls on
> initialisation, then a handful of calls every few seconds from the
> periodic reseeding. The large number of calls every second are gone.
>

Excellent, thanks for confirming.

> >>
> >> Signed-off-by: Ard Biesheuvel <[email protected]>
>
> Since the above questions are unrelated to this particular patch:
>
> Reviewed-by: Andre Przywara <[email protected]>
> Tested-by: Andre Przywara <[email protected]>
>
> Cheers,
> Andre
>
> [1] https://en.wikipedia.org/wiki/RDRAND#Performance
>
> >> ---
> >> drivers/char/random.c | 15 +--------------
> >> 1 file changed, 1 insertion(+), 14 deletions(-)
> >>
> >> diff --git a/drivers/char/random.c b/drivers/char/random.c
> >> index 2a41b21623ae..a9c393c1466d 100644
> >> --- a/drivers/char/random.c
> >> +++ b/drivers/char/random.c
> >> @@ -1261,8 +1261,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
> >> cycles_t cycles = random_get_entropy();
> >> __u32 c_high, j_high;
> >> __u64 ip;
> >> - unsigned long seed;
> >> - int credit = 0;
> >>
> >> if (cycles == 0)
> >> cycles = get_reg(fast_pool, regs);
> >> @@ -1298,23 +1296,12 @@ void add_interrupt_randomness(int irq, int irq_flags)
> >>
> >> fast_pool->last = now;
> >> __mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
> >> -
> >> - /*
> >> - * If we have architectural seed generator, produce a seed and
> >> - * add it to the pool. For the sake of paranoia don't let the
> >> - * architectural seed generator dominate the input from the
> >> - * interrupt noise.
> >> - */
> >> - if (arch_get_random_seed_long(&seed)) {
> >> - __mix_pool_bytes(r, &seed, sizeof(seed));
> >> - credit = 1;
> >> - }
> >> spin_unlock(&r->lock);
> >>
> >> fast_pool->count = 0;
> >>
> >> /* award one bit for the contents of the fast pool */
> >> - credit_entropy_bits(r, credit + 1);
> >> + credit_entropy_bits(r, 1);
> >> }
> >> EXPORT_SYMBOL_GPL(add_interrupt_randomness);
> >>
> >> --
> >> 2.17.1
> >>
>

2020-11-11 10:51:03

by Andre Przywara

[permalink] [raw]
Subject: Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

On 11/11/2020 10:05, Ard Biesheuvel wrote:

Hi,

> On Wed, 11 Nov 2020 at 10:45, André Przywara <[email protected]> wrote:
>>
>> On 11/11/2020 08:19, Ard Biesheuvel wrote:
>>
>> Hi,
>>
>>> (+ Eric)
>>>
>>> On Thu, 5 Nov 2020 at 16:29, Ard Biesheuvel <[email protected]> wrote:
>>>>
>>>> When reseeding the CRNG periodically, arch_get_random_seed_long() is
>>>> called to obtain entropy from an architecture specific source if one
>>>> is implemented. In most cases, these are special instructions, but in
>>>> some cases, such as on ARM, we may want to back this using firmware
>>>> calls, which are considerably more expensive.
>>>>
>>>> Another call to arch_get_random_seed_long() exists in the CRNG driver,
>>>> in add_interrupt_randomness(), which collects entropy by capturing
>>>> inter-interrupt timing and relying on interrupt jitter to provide
>>>> random bits. This is done by keeping a per-CPU state, and mixing in
>>>> the IRQ number, the cycle counter and the return address every time an
>>>> interrupt is taken, and mixing this per-CPU state into the entropy pool
>>>> every 64 invocations, or at least once per second. The entropy that is
>>>> gathered this way is credited as 1 bit of entropy. Every time this
>>>> happens, arch_get_random_seed_long() is invoked, and the result is
>>>> mixed in as well, and also credited with 1 bit of entropy.
>>>>
>>>> This means that arch_get_random_seed_long() is called at least once
>>>> per second on every CPU, which seems excessive, and doesn't really
>>>> scale, especially in a virtualization scenario where CPUs may be
>>>> oversubscribed: in cases where arch_get_random_seed_long() is backed
>>>> by an instruction that actually goes back to a shared hardware entropy
>>>> source (such as RNDRRS on ARM), we will end up hitting it hundreds of
>>>> times per second.
>>
>> May I ask why this should be a particular problem? Form what I gathered
>> on the web, it seems like most h/w RNGs have a capacity of multiple
>> MBit/s. Wikipedia [1] suggests that the x86 CPU instructions generate at
>> least 20 Mbit/s (worst case: AMD's 2500 cycles @ 800 MHz), and I
>> measured around 78 Mbit/s with the raw entropy source on my Juno
>> (possibly even limited by slow MMIO).
>> So it seems unlikely that a few kbit/s drain the hardware entropy source.
>>
>> If we consider this interface comparably cheap, should we then not try
>> to plug the Arm firmware interface into this?
>>
>
> I'm not sure I follow. Are you saying we should not wire up a
> comparatively expensive firmware interface to
> arch_get_random_seed_long() because we currently assume it is backed
> by something cheap?

Yes. I wanted to (ab)use this patch to clarify this. x86 and arm64 use
CPU instructions (so far), S390 copies from some buffer. PPC uses either
a CPU instruction or an MMIO access. All of these I would consider
comparably cheap, especially when compared to a firmware call with
unknown costs. In fact the current Trusted Firmware implementation[1] is
not really terse, also the generic SMC dispatcher calls a platform
defined routine, which could do anything.
So to also guide the implementation in TF-A, it would be good to
establish what arch_get_random expects to be. The current
implementations and the fact that it lives in a header file suggests
that it's meant as a slim wrapper around something cheap.

> Because doing so would add significantly to the cost. Also note that a
> firmware interface would permit other ways of gathering entropy that
> are not necessarily backed by a dedicated high bandwidth noise source
> (and we already have examples of this)

Yes, agreed.
So I have a hwrng driver for the Arm SMCCC TRNG interface ready. I would
post this, but would like to know if we should drop the proposed
arch_get_random implementation [2][3] of this interface.

>> I am not against this patch, actually am considering this a nice
>> cleanup, to separate interrupt generated entropy from other sources.
>> Especially since we call arch_get_random_seed_long() under a spinlock here.
>> But I am curious about the expectations from arch_get_random in general.
>>
>
> I think it is reasonable to clean this up a little bit. A random
> *seed* is not the same thing as a random number, and given that we
> expose both interfaces, it makes sense to permit the seed variant to
> be more costly, and only use it as intended (i.e., to seed a random
> number generator)

That's true, it seems we chickened out on the arm64 implementation
already, by not using the intended stronger instruction for seed
(RNDRRS), and not implementing arch_get_random_long() at all.
But I guess that's another story.

Cheers,
Andre.

[1] https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/5585/3
[2]
http://lists.infradead.org/pipermail/linux-arm-kernel/2020-November/615375.html
[3]
http://lists.infradead.org/pipermail/linux-arm-kernel/2020-November/615376.html

>>>> So let's drop the call to arch_get_random_seed_long() from
>>>> add_interrupt_randomness(), and instead, rely on crng_reseed() to call
>>>> the arch hook to get random seed material from the platform.
>>
>> So I tested this and it works as expected: I see some calls on
>> initialisation, then a handful of calls every few seconds from the
>> periodic reseeding. The large number of calls every second are gone.
>>
>
> Excellent, thanks for confirming.
>
>>>>
>>>> Signed-off-by: Ard Biesheuvel <[email protected]>
>>
>> Since the above questions are unrelated to this particular patch:
>>
>> Reviewed-by: Andre Przywara <[email protected]>
>> Tested-by: Andre Przywara <[email protected]>
>>
>> Cheers,
>> Andre
>>
>> [1] https://en.wikipedia.org/wiki/RDRAND#Performance
>>
>>>> ---
>>>> drivers/char/random.c | 15 +--------------
>>>> 1 file changed, 1 insertion(+), 14 deletions(-)
>>>>
>>>> diff --git a/drivers/char/random.c b/drivers/char/random.c
>>>> index 2a41b21623ae..a9c393c1466d 100644
>>>> --- a/drivers/char/random.c
>>>> +++ b/drivers/char/random.c
>>>> @@ -1261,8 +1261,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
>>>> cycles_t cycles = random_get_entropy();
>>>> __u32 c_high, j_high;
>>>> __u64 ip;
>>>> - unsigned long seed;
>>>> - int credit = 0;
>>>>
>>>> if (cycles == 0)
>>>> cycles = get_reg(fast_pool, regs);
>>>> @@ -1298,23 +1296,12 @@ void add_interrupt_randomness(int irq, int irq_flags)
>>>>
>>>> fast_pool->last = now;
>>>> __mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
>>>> -
>>>> - /*
>>>> - * If we have architectural seed generator, produce a seed and
>>>> - * add it to the pool. For the sake of paranoia don't let the
>>>> - * architectural seed generator dominate the input from the
>>>> - * interrupt noise.
>>>> - */
>>>> - if (arch_get_random_seed_long(&seed)) {
>>>> - __mix_pool_bytes(r, &seed, sizeof(seed));
>>>> - credit = 1;
>>>> - }
>>>> spin_unlock(&r->lock);
>>>>
>>>> fast_pool->count = 0;
>>>>
>>>> /* award one bit for the contents of the fast pool */
>>>> - credit_entropy_bits(r, credit + 1);
>>>> + credit_entropy_bits(r, 1);
>>>> }
>>>> EXPORT_SYMBOL_GPL(add_interrupt_randomness);
>>>>
>>>> --
>>>> 2.17.1
>>>>
>>

2020-11-11 11:50:56

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

On Wed, 11 Nov 2020 at 11:46, André Przywara <[email protected]> wrote:
>
> On 11/11/2020 10:05, Ard Biesheuvel wrote:
>
> Hi,
>
> > On Wed, 11 Nov 2020 at 10:45, André Przywara <[email protected]> wrote:
> >>
> >> On 11/11/2020 08:19, Ard Biesheuvel wrote:
> >>
> >> Hi,
> >>
> >>> (+ Eric)
> >>>
> >>> On Thu, 5 Nov 2020 at 16:29, Ard Biesheuvel <[email protected]> wrote:
> >>>>
> >>>> When reseeding the CRNG periodically, arch_get_random_seed_long() is
> >>>> called to obtain entropy from an architecture specific source if one
> >>>> is implemented. In most cases, these are special instructions, but in
> >>>> some cases, such as on ARM, we may want to back this using firmware
> >>>> calls, which are considerably more expensive.
> >>>>
> >>>> Another call to arch_get_random_seed_long() exists in the CRNG driver,
> >>>> in add_interrupt_randomness(), which collects entropy by capturing
> >>>> inter-interrupt timing and relying on interrupt jitter to provide
> >>>> random bits. This is done by keeping a per-CPU state, and mixing in
> >>>> the IRQ number, the cycle counter and the return address every time an
> >>>> interrupt is taken, and mixing this per-CPU state into the entropy pool
> >>>> every 64 invocations, or at least once per second. The entropy that is
> >>>> gathered this way is credited as 1 bit of entropy. Every time this
> >>>> happens, arch_get_random_seed_long() is invoked, and the result is
> >>>> mixed in as well, and also credited with 1 bit of entropy.
> >>>>
> >>>> This means that arch_get_random_seed_long() is called at least once
> >>>> per second on every CPU, which seems excessive, and doesn't really
> >>>> scale, especially in a virtualization scenario where CPUs may be
> >>>> oversubscribed: in cases where arch_get_random_seed_long() is backed
> >>>> by an instruction that actually goes back to a shared hardware entropy
> >>>> source (such as RNDRRS on ARM), we will end up hitting it hundreds of
> >>>> times per second.
> >>
> >> May I ask why this should be a particular problem? Form what I gathered
> >> on the web, it seems like most h/w RNGs have a capacity of multiple
> >> MBit/s. Wikipedia [1] suggests that the x86 CPU instructions generate at
> >> least 20 Mbit/s (worst case: AMD's 2500 cycles @ 800 MHz), and I
> >> measured around 78 Mbit/s with the raw entropy source on my Juno
> >> (possibly even limited by slow MMIO).
> >> So it seems unlikely that a few kbit/s drain the hardware entropy source.
> >>
> >> If we consider this interface comparably cheap, should we then not try
> >> to plug the Arm firmware interface into this?
> >>
> >
> > I'm not sure I follow. Are you saying we should not wire up a
> > comparatively expensive firmware interface to
> > arch_get_random_seed_long() because we currently assume it is backed
> > by something cheap?
>
> Yes. I wanted to (ab)use this patch to clarify this. x86 and arm64 use
> CPU instructions (so far), S390 copies from some buffer. PPC uses either
> a CPU instruction or an MMIO access. All of these I would consider
> comparably cheap, especially when compared to a firmware call with
> unknown costs. In fact the current Trusted Firmware implementation[1] is
> not really terse, also the generic SMC dispatcher calls a platform
> defined routine, which could do anything.
> So to also guide the implementation in TF-A, it would be good to
> establish what arch_get_random expects to be. The current
> implementations and the fact that it lives in a header file suggests
> that it's meant as a slim wrapper around something cheap.
>

Reseeding a random number generator is simply not something that you
should need to do hundreds of times per second, regardless of whether
its invocation is from a header file.



> > Because doing so would add significantly to the cost. Also note that a
> > firmware interface would permit other ways of gathering entropy that
> > are not necessarily backed by a dedicated high bandwidth noise source
> > (and we already have examples of this)
>
> Yes, agreed.
> So I have a hwrng driver for the Arm SMCCC TRNG interface ready. I would
> post this, but would like to know if we should drop the proposed
> arch_get_random implementation [2][3] of this interface.
>

No, that would defeat the whole point. There is no reason we should
wait for the entire driver stack to come up just to issue an
architected SMC call. This is basically the reason the spec got issued
in the first place.

> >> I am not against this patch, actually am considering this a nice
> >> cleanup, to separate interrupt generated entropy from other sources.
> >> Especially since we call arch_get_random_seed_long() under a spinlock here.
> >> But I am curious about the expectations from arch_get_random in general.
> >>
> >
> > I think it is reasonable to clean this up a little bit. A random
> > *seed* is not the same thing as a random number, and given that we
> > expose both interfaces, it makes sense to permit the seed variant to
> > be more costly, and only use it as intended (i.e., to seed a random
> > number generator)
>
> That's true, it seems we chickened out on the arm64 implementation
> already, by not using the intended stronger instruction for seed
> (RNDRRS), and not implementing arch_get_random_long() at all.
> But I guess that's another story.
>

Yes it is.

2020-11-17 13:37:07

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

On Wed, 11 Nov 2020 at 09:19, Ard Biesheuvel <[email protected]> wrote:
>
> (+ Eric)
>
> On Thu, 5 Nov 2020 at 16:29, Ard Biesheuvel <[email protected]> wrote:
> >
> > When reseeding the CRNG periodically, arch_get_random_seed_long() is
> > called to obtain entropy from an architecture specific source if one
> > is implemented. In most cases, these are special instructions, but in
> > some cases, such as on ARM, we may want to back this using firmware
> > calls, which are considerably more expensive.
> >
> > Another call to arch_get_random_seed_long() exists in the CRNG driver,
> > in add_interrupt_randomness(), which collects entropy by capturing
> > inter-interrupt timing and relying on interrupt jitter to provide
> > random bits. This is done by keeping a per-CPU state, and mixing in
> > the IRQ number, the cycle counter and the return address every time an
> > interrupt is taken, and mixing this per-CPU state into the entropy pool
> > every 64 invocations, or at least once per second. The entropy that is
> > gathered this way is credited as 1 bit of entropy. Every time this
> > happens, arch_get_random_seed_long() is invoked, and the result is
> > mixed in as well, and also credited with 1 bit of entropy.
> >
> > This means that arch_get_random_seed_long() is called at least once
> > per second on every CPU, which seems excessive, and doesn't really
> > scale, especially in a virtualization scenario where CPUs may be
> > oversubscribed: in cases where arch_get_random_seed_long() is backed
> > by an instruction that actually goes back to a shared hardware entropy
> > source (such as RNDRRS on ARM), we will end up hitting it hundreds of
> > times per second.
> >
> > So let's drop the call to arch_get_random_seed_long() from
> > add_interrupt_randomness(), and instead, rely on crng_reseed() to call
> > the arch hook to get random seed material from the platform.
> >
> > Signed-off-by: Ard Biesheuvel <[email protected]>
> > ---
> > drivers/char/random.c | 15 +--------------
> > 1 file changed, 1 insertion(+), 14 deletions(-)
> >
> > diff --git a/drivers/char/random.c b/drivers/char/random.c
> > index 2a41b21623ae..a9c393c1466d 100644
> > --- a/drivers/char/random.c
> > +++ b/drivers/char/random.c
> > @@ -1261,8 +1261,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
> > cycles_t cycles = random_get_entropy();
> > __u32 c_high, j_high;
> > __u64 ip;
> > - unsigned long seed;
> > - int credit = 0;
> >
> > if (cycles == 0)
> > cycles = get_reg(fast_pool, regs);
> > @@ -1298,23 +1296,12 @@ void add_interrupt_randomness(int irq, int irq_flags)
> >
> > fast_pool->last = now;
> > __mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
> > -
> > - /*
> > - * If we have architectural seed generator, produce a seed and
> > - * add it to the pool. For the sake of paranoia don't let the
> > - * architectural seed generator dominate the input from the
> > - * interrupt noise.
> > - */
> > - if (arch_get_random_seed_long(&seed)) {
> > - __mix_pool_bytes(r, &seed, sizeof(seed));
> > - credit = 1;
> > - }
> > spin_unlock(&r->lock);
> >
> > fast_pool->count = 0;
> >
> > /* award one bit for the contents of the fast pool */
> > - credit_entropy_bits(r, credit + 1);
> > + credit_entropy_bits(r, 1);
> > }
> > EXPORT_SYMBOL_GPL(add_interrupt_randomness);
> >
> > --
> > 2.17.1
> >

Ping?

2020-11-20 04:14:42

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

On Wed, Nov 11, 2020 at 09:19:37AM +0100, Ard Biesheuvel wrote:
> (+ Eric)
>
> On Thu, 5 Nov 2020 at 16:29, Ard Biesheuvel <[email protected]> wrote:
> >
> > When reseeding the CRNG periodically, arch_get_random_seed_long() is
> > called to obtain entropy from an architecture specific source if one
> > is implemented. In most cases, these are special instructions, but in
> > some cases, such as on ARM, we may want to back this using firmware
> > calls, which are considerably more expensive.
> >
> > Another call to arch_get_random_seed_long() exists in the CRNG driver,
> > in add_interrupt_randomness(), which collects entropy by capturing
> > inter-interrupt timing and relying on interrupt jitter to provide
> > random bits. This is done by keeping a per-CPU state, and mixing in
> > the IRQ number, the cycle counter and the return address every time an
> > interrupt is taken, and mixing this per-CPU state into the entropy pool
> > every 64 invocations, or at least once per second. The entropy that is
> > gathered this way is credited as 1 bit of entropy. Every time this
> > happens, arch_get_random_seed_long() is invoked, and the result is
> > mixed in as well, and also credited with 1 bit of entropy.
> >
> > This means that arch_get_random_seed_long() is called at least once
> > per second on every CPU, which seems excessive, and doesn't really
> > scale, especially in a virtualization scenario where CPUs may be
> > oversubscribed: in cases where arch_get_random_seed_long() is backed
> > by an instruction that actually goes back to a shared hardware entropy
> > source (such as RNDRRS on ARM), we will end up hitting it hundreds of
> > times per second.
> >
> > So let's drop the call to arch_get_random_seed_long() from
> > add_interrupt_randomness(), and instead, rely on crng_reseed() to call
> > the arch hook to get random seed material from the platform.
> >
> > Signed-off-by: Ard Biesheuvel <[email protected]>
> > ---
> > drivers/char/random.c | 15 +--------------
> > 1 file changed, 1 insertion(+), 14 deletions(-)
> >
> > diff --git a/drivers/char/random.c b/drivers/char/random.c
> > index 2a41b21623ae..a9c393c1466d 100644
> > --- a/drivers/char/random.c
> > +++ b/drivers/char/random.c
> > @@ -1261,8 +1261,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
> > cycles_t cycles = random_get_entropy();
> > __u32 c_high, j_high;
> > __u64 ip;
> > - unsigned long seed;
> > - int credit = 0;
> >
> > if (cycles == 0)
> > cycles = get_reg(fast_pool, regs);
> > @@ -1298,23 +1296,12 @@ void add_interrupt_randomness(int irq, int irq_flags)
> >
> > fast_pool->last = now;
> > __mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
> > -
> > - /*
> > - * If we have architectural seed generator, produce a seed and
> > - * add it to the pool. For the sake of paranoia don't let the
> > - * architectural seed generator dominate the input from the
> > - * interrupt noise.
> > - */
> > - if (arch_get_random_seed_long(&seed)) {
> > - __mix_pool_bytes(r, &seed, sizeof(seed));
> > - credit = 1;
> > - }
> > spin_unlock(&r->lock);
> >
> > fast_pool->count = 0;
> >
> > /* award one bit for the contents of the fast pool */
> > - credit_entropy_bits(r, credit + 1);
> > + credit_entropy_bits(r, 1);
> > }
> > EXPORT_SYMBOL_GPL(add_interrupt_randomness);

Looks reasonable to me. The CRNG state already gets XOR'ed with the output of
arch_get_random_seed_long() each time the CRNG is reseeded. Calling
arch_get_random_seed_long() here too isn't necessary, and it's not really
appropriate to repeatedly call it during interrupt handling, as you point out.

Reviewed-by: Eric Biggers <[email protected]>

- Eric

2020-11-20 15:31:54

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

On 2020-11-05 15:29, Ard Biesheuvel wrote:
> When reseeding the CRNG periodically, arch_get_random_seed_long() is
> called to obtain entropy from an architecture specific source if one
> is implemented. In most cases, these are special instructions, but in
> some cases, such as on ARM, we may want to back this using firmware
> calls, which are considerably more expensive.
>
> Another call to arch_get_random_seed_long() exists in the CRNG driver,
> in add_interrupt_randomness(), which collects entropy by capturing
> inter-interrupt timing and relying on interrupt jitter to provide
> random bits. This is done by keeping a per-CPU state, and mixing in
> the IRQ number, the cycle counter and the return address every time an
> interrupt is taken, and mixing this per-CPU state into the entropy pool
> every 64 invocations, or at least once per second. The entropy that is
> gathered this way is credited as 1 bit of entropy. Every time this
> happens, arch_get_random_seed_long() is invoked, and the result is
> mixed in as well, and also credited with 1 bit of entropy.
>
> This means that arch_get_random_seed_long() is called at least once
> per second on every CPU, which seems excessive, and doesn't really
> scale, especially in a virtualization scenario where CPUs may be
> oversubscribed: in cases where arch_get_random_seed_long() is backed
> by an instruction that actually goes back to a shared hardware entropy
> source (such as RNDRRS on ARM), we will end up hitting it hundreds of
> times per second.
>
> So let's drop the call to arch_get_random_seed_long() from
> add_interrupt_randomness(), and instead, rely on crng_reseed() to call
> the arch hook to get random seed material from the platform.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>

Looks sensible. Having this on the interrupt path looks quite
heavy handed, and my understanding of the above is that it has
an adverse effect on the entropy pool.

Acked-by: Marc Zyngier <[email protected]>

M.
--
Jazz is not dead. It just smells funny...

2020-11-27 12:13:13

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

On Fri, 20 Nov 2020 at 16:27, Marc Zyngier <[email protected]> wrote:
>
> On 2020-11-05 15:29, Ard Biesheuvel wrote:
> > When reseeding the CRNG periodically, arch_get_random_seed_long() is
> > called to obtain entropy from an architecture specific source if one
> > is implemented. In most cases, these are special instructions, but in
> > some cases, such as on ARM, we may want to back this using firmware
> > calls, which are considerably more expensive.
> >
> > Another call to arch_get_random_seed_long() exists in the CRNG driver,
> > in add_interrupt_randomness(), which collects entropy by capturing
> > inter-interrupt timing and relying on interrupt jitter to provide
> > random bits. This is done by keeping a per-CPU state, and mixing in
> > the IRQ number, the cycle counter and the return address every time an
> > interrupt is taken, and mixing this per-CPU state into the entropy pool
> > every 64 invocations, or at least once per second. The entropy that is
> > gathered this way is credited as 1 bit of entropy. Every time this
> > happens, arch_get_random_seed_long() is invoked, and the result is
> > mixed in as well, and also credited with 1 bit of entropy.
> >
> > This means that arch_get_random_seed_long() is called at least once
> > per second on every CPU, which seems excessive, and doesn't really
> > scale, especially in a virtualization scenario where CPUs may be
> > oversubscribed: in cases where arch_get_random_seed_long() is backed
> > by an instruction that actually goes back to a shared hardware entropy
> > source (such as RNDRRS on ARM), we will end up hitting it hundreds of
> > times per second.
> >
> > So let's drop the call to arch_get_random_seed_long() from
> > add_interrupt_randomness(), and instead, rely on crng_reseed() to call
> > the arch hook to get random seed material from the platform.
> >
> > Signed-off-by: Ard Biesheuvel <[email protected]>
>
> Looks sensible. Having this on the interrupt path looks quite
> heavy handed, and my understanding of the above is that it has
> an adverse effect on the entropy pool.
>
> Acked-by: Marc Zyngier <[email protected]>
>

Thanks Marc.

Ted, any thoughts?

2020-12-01 12:30:40

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

(+ Jason)

On Fri, 20 Nov 2020 at 05:11, Eric Biggers <[email protected]> wrote:
>
> On Wed, Nov 11, 2020 at 09:19:37AM +0100, Ard Biesheuvel wrote:
> > (+ Eric)
> >
> > On Thu, 5 Nov 2020 at 16:29, Ard Biesheuvel <[email protected]> wrote:
> > >
> > > When reseeding the CRNG periodically, arch_get_random_seed_long() is
> > > called to obtain entropy from an architecture specific source if one
> > > is implemented. In most cases, these are special instructions, but in
> > > some cases, such as on ARM, we may want to back this using firmware
> > > calls, which are considerably more expensive.
> > >
> > > Another call to arch_get_random_seed_long() exists in the CRNG driver,
> > > in add_interrupt_randomness(), which collects entropy by capturing
> > > inter-interrupt timing and relying on interrupt jitter to provide
> > > random bits. This is done by keeping a per-CPU state, and mixing in
> > > the IRQ number, the cycle counter and the return address every time an
> > > interrupt is taken, and mixing this per-CPU state into the entropy pool
> > > every 64 invocations, or at least once per second. The entropy that is
> > > gathered this way is credited as 1 bit of entropy. Every time this
> > > happens, arch_get_random_seed_long() is invoked, and the result is
> > > mixed in as well, and also credited with 1 bit of entropy.
> > >
> > > This means that arch_get_random_seed_long() is called at least once
> > > per second on every CPU, which seems excessive, and doesn't really
> > > scale, especially in a virtualization scenario where CPUs may be
> > > oversubscribed: in cases where arch_get_random_seed_long() is backed
> > > by an instruction that actually goes back to a shared hardware entropy
> > > source (such as RNDRRS on ARM), we will end up hitting it hundreds of
> > > times per second.
> > >
> > > So let's drop the call to arch_get_random_seed_long() from
> > > add_interrupt_randomness(), and instead, rely on crng_reseed() to call
> > > the arch hook to get random seed material from the platform.
> > >
> > > Signed-off-by: Ard Biesheuvel <[email protected]>
> > > ---
> > > drivers/char/random.c | 15 +--------------
> > > 1 file changed, 1 insertion(+), 14 deletions(-)
> > >
> > > diff --git a/drivers/char/random.c b/drivers/char/random.c
> > > index 2a41b21623ae..a9c393c1466d 100644
> > > --- a/drivers/char/random.c
> > > +++ b/drivers/char/random.c
> > > @@ -1261,8 +1261,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
> > > cycles_t cycles = random_get_entropy();
> > > __u32 c_high, j_high;
> > > __u64 ip;
> > > - unsigned long seed;
> > > - int credit = 0;
> > >
> > > if (cycles == 0)
> > > cycles = get_reg(fast_pool, regs);
> > > @@ -1298,23 +1296,12 @@ void add_interrupt_randomness(int irq, int irq_flags)
> > >
> > > fast_pool->last = now;
> > > __mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
> > > -
> > > - /*
> > > - * If we have architectural seed generator, produce a seed and
> > > - * add it to the pool. For the sake of paranoia don't let the
> > > - * architectural seed generator dominate the input from the
> > > - * interrupt noise.
> > > - */
> > > - if (arch_get_random_seed_long(&seed)) {
> > > - __mix_pool_bytes(r, &seed, sizeof(seed));
> > > - credit = 1;
> > > - }
> > > spin_unlock(&r->lock);
> > >
> > > fast_pool->count = 0;
> > >
> > > /* award one bit for the contents of the fast pool */
> > > - credit_entropy_bits(r, credit + 1);
> > > + credit_entropy_bits(r, 1);
> > > }
> > > EXPORT_SYMBOL_GPL(add_interrupt_randomness);
>
> Looks reasonable to me. The CRNG state already gets XOR'ed with the output of
> arch_get_random_seed_long() each time the CRNG is reseeded. Calling
> arch_get_random_seed_long() here too isn't necessary, and it's not really
> appropriate to repeatedly call it during interrupt handling, as you point out.
>
> Reviewed-by: Eric Biggers <[email protected]>
>
> - Eric

2020-12-07 08:15:59

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

On Tue, 1 Dec 2020 at 13:23, Ard Biesheuvel <[email protected]> wrote:
>
> (+ Jason)
>
> On Fri, 20 Nov 2020 at 05:11, Eric Biggers <[email protected]> wrote:
> >
> > On Wed, Nov 11, 2020 at 09:19:37AM +0100, Ard Biesheuvel wrote:
> > > (+ Eric)
> > >
> > > On Thu, 5 Nov 2020 at 16:29, Ard Biesheuvel <[email protected]> wrote:
> > > >
> > > > When reseeding the CRNG periodically, arch_get_random_seed_long() is
> > > > called to obtain entropy from an architecture specific source if one
> > > > is implemented. In most cases, these are special instructions, but in
> > > > some cases, such as on ARM, we may want to back this using firmware
> > > > calls, which are considerably more expensive.
> > > >
> > > > Another call to arch_get_random_seed_long() exists in the CRNG driver,
> > > > in add_interrupt_randomness(), which collects entropy by capturing
> > > > inter-interrupt timing and relying on interrupt jitter to provide
> > > > random bits. This is done by keeping a per-CPU state, and mixing in
> > > > the IRQ number, the cycle counter and the return address every time an
> > > > interrupt is taken, and mixing this per-CPU state into the entropy pool
> > > > every 64 invocations, or at least once per second. The entropy that is
> > > > gathered this way is credited as 1 bit of entropy. Every time this
> > > > happens, arch_get_random_seed_long() is invoked, and the result is
> > > > mixed in as well, and also credited with 1 bit of entropy.
> > > >
> > > > This means that arch_get_random_seed_long() is called at least once
> > > > per second on every CPU, which seems excessive, and doesn't really
> > > > scale, especially in a virtualization scenario where CPUs may be
> > > > oversubscribed: in cases where arch_get_random_seed_long() is backed
> > > > by an instruction that actually goes back to a shared hardware entropy
> > > > source (such as RNDRRS on ARM), we will end up hitting it hundreds of
> > > > times per second.
> > > >
> > > > So let's drop the call to arch_get_random_seed_long() from
> > > > add_interrupt_randomness(), and instead, rely on crng_reseed() to call
> > > > the arch hook to get random seed material from the platform.
> > > >
> > > > Signed-off-by: Ard Biesheuvel <[email protected]>
> > > > ---
> > > > drivers/char/random.c | 15 +--------------
> > > > 1 file changed, 1 insertion(+), 14 deletions(-)
> > > >
> > > > diff --git a/drivers/char/random.c b/drivers/char/random.c
> > > > index 2a41b21623ae..a9c393c1466d 100644
> > > > --- a/drivers/char/random.c
> > > > +++ b/drivers/char/random.c
> > > > @@ -1261,8 +1261,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
> > > > cycles_t cycles = random_get_entropy();
> > > > __u32 c_high, j_high;
> > > > __u64 ip;
> > > > - unsigned long seed;
> > > > - int credit = 0;
> > > >
> > > > if (cycles == 0)
> > > > cycles = get_reg(fast_pool, regs);
> > > > @@ -1298,23 +1296,12 @@ void add_interrupt_randomness(int irq, int irq_flags)
> > > >
> > > > fast_pool->last = now;
> > > > __mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
> > > > -
> > > > - /*
> > > > - * If we have architectural seed generator, produce a seed and
> > > > - * add it to the pool. For the sake of paranoia don't let the
> > > > - * architectural seed generator dominate the input from the
> > > > - * interrupt noise.
> > > > - */
> > > > - if (arch_get_random_seed_long(&seed)) {
> > > > - __mix_pool_bytes(r, &seed, sizeof(seed));
> > > > - credit = 1;
> > > > - }
> > > > spin_unlock(&r->lock);
> > > >
> > > > fast_pool->count = 0;
> > > >
> > > > /* award one bit for the contents of the fast pool */
> > > > - credit_entropy_bits(r, credit + 1);
> > > > + credit_entropy_bits(r, 1);
> > > > }
> > > > EXPORT_SYMBOL_GPL(add_interrupt_randomness);
> >
> > Looks reasonable to me. The CRNG state already gets XOR'ed with the output of
> > arch_get_random_seed_long() each time the CRNG is reseeded. Calling
> > arch_get_random_seed_long() here too isn't necessary, and it's not really
> > appropriate to repeatedly call it during interrupt handling, as you point out.
> >
> > Reviewed-by: Eric Biggers <[email protected]>
> >
> > - Eric

Ping?

2020-12-07 14:33:40

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

Hi Ard,

On Tue, Dec 1, 2020 at 1:24 PM Ard Biesheuvel <[email protected]> wrote:
> > > > is implemented. In most cases, these are special instructions, but in
> > > > some cases, such as on ARM, we may want to back this using firmware
> > > > calls, which are considerably more expensive.

This seems fine. But I suppose I'm curious to learn more about what
you have in mind for ARM. We've been assuming that arch_get_random is
not horribly expensive. Usually external RNGs that are horribly
expensive separate hardware take a different route and aren't hooked
into arch_get_random. When you say "we may want to back this using
firmware", does that mean it hasn't happened yet, and you're in a
position to direct the design otherwise? If so, would it be reasonable
to take a different route with that hardware, and keep arch_get_random
for when it's actually implemented by the hardware? Or are there
actually good reasons for keeping it one and the same?

On the other hand, rdrand on intel is getting slower and slower, to
the point where we've removed it from a few places that used to use
it. And I don't see anything terribly wrong with removing the extra
call in this path here. So need be, I'll offer my Reviewed-by. But I
wanted to get an understanding of the fuller story first.

Jason

2020-12-07 15:38:56

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

On Mon, 7 Dec 2020 at 15:28, Jason A. Donenfeld <[email protected]> wrote:
>
> Hi Ard,
>
> On Tue, Dec 1, 2020 at 1:24 PM Ard Biesheuvel <[email protected]> wrote:
> > > > > is implemented. In most cases, these are special instructions, but in
> > > > > some cases, such as on ARM, we may want to back this using firmware
> > > > > calls, which are considerably more expensive.
>
> This seems fine. But I suppose I'm curious to learn more about what
> you have in mind for ARM. We've been assuming that arch_get_random is
> not horribly expensive. Usually external RNGs that are horribly
> expensive separate hardware take a different route and aren't hooked
> into arch_get_random. When you say "we may want to back this using
> firmware", does that mean it hasn't happened yet, and you're in a
> position to direct the design otherwise? If so, would it be reasonable
> to take a different route with that hardware, and keep arch_get_random
> for when it's actually implemented by the hardware? Or are there
> actually good reasons for keeping it one and the same?
>

Many older generation ARM SoCs have IP blocks that expose an entropy
source of some kind, and map it in the normal world, which is
accessible by the OS directly. These are driven as hwrngs via the
driver stack, which models them as actual devices, with clock and
regulator handling, power management hooks, etc etc.

There are multiple examples where such a SoC is being revved up with
newer cores etc, and now, the IP block is in the secure world, which
means the OS cannot access it directly, and needs to issue an SMC
instruction to perform a firmware call to access it. The secure world
firmware is now entirely in charge of the hardware, and so this SMC
call is really the only thing that goes on in this driver (no clocks,
regulators, etc)

So to prevent fragmentation, as well as make the entropy source
available much earlier in the boot, ARM has issued a firmware spec
that unifies these SMC calls, and defines them as non-blocking, i.e.,
return the requested number of entropy bits, or fail immediately.

Therefore, this should not be super expensive, given that the only
overhead is the CPU cycles spent on calling into the firmware (and a
bit of overhead perhaps from poking some MMIO registers in the IP
block). But it is definitely not suitable for being called hundreds of
times per second, hence this patch. (Note that we are talking about
arch_get_random_seed_long() here, not arch_get_random_long())

> On the other hand, rdrand on intel is getting slower and slower, to
> the point where we've removed it from a few places that used to use
> it. And I don't see anything terribly wrong with removing the extra
> call in this path here. So need be, I'll offer my Reviewed-by. But I
> wanted to get an understanding of the fuller story first.
>

Given that we already have both arch_get_random_seed_long() and
arch_get_random_long(), I think it is reasonable for the former to be
allowed to be slightly more expensive, and we should only invoke it
for the purpose of reseeding a pseudo-RNG. If this occurs on a hot
path, there is something terribly wrong already, so I don't think
RDRAND/RDSEED performance should be a huge concern here.

Note that once this patch lands, we also intend to change the current
way the arm64 RNDR and RNDRRS instructions are mapped to
arch_get_random_seed_long() and arch_get_random_long(). (RNDR returns
64-bit from a DRBG that reseeds an an implementation defined rate, and
RNDRRS does the same but forces a reseed to occur first)

Thanks,
Ard.

2021-01-04 19:12:28

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

On Tue, 17 Nov 2020 at 14:33, Ard Biesheuvel <[email protected]> wrote:
>
> On Wed, 11 Nov 2020 at 09:19, Ard Biesheuvel <[email protected]> wrote:
> >
> > (+ Eric)
> >
> > On Thu, 5 Nov 2020 at 16:29, Ard Biesheuvel <[email protected]> wrote:
> > >
> > > When reseeding the CRNG periodically, arch_get_random_seed_long() is
> > > called to obtain entropy from an architecture specific source if one
> > > is implemented. In most cases, these are special instructions, but in
> > > some cases, such as on ARM, we may want to back this using firmware
> > > calls, which are considerably more expensive.
> > >
> > > Another call to arch_get_random_seed_long() exists in the CRNG driver,
> > > in add_interrupt_randomness(), which collects entropy by capturing
> > > inter-interrupt timing and relying on interrupt jitter to provide
> > > random bits. This is done by keeping a per-CPU state, and mixing in
> > > the IRQ number, the cycle counter and the return address every time an
> > > interrupt is taken, and mixing this per-CPU state into the entropy pool
> > > every 64 invocations, or at least once per second. The entropy that is
> > > gathered this way is credited as 1 bit of entropy. Every time this
> > > happens, arch_get_random_seed_long() is invoked, and the result is
> > > mixed in as well, and also credited with 1 bit of entropy.
> > >
> > > This means that arch_get_random_seed_long() is called at least once
> > > per second on every CPU, which seems excessive, and doesn't really
> > > scale, especially in a virtualization scenario where CPUs may be
> > > oversubscribed: in cases where arch_get_random_seed_long() is backed
> > > by an instruction that actually goes back to a shared hardware entropy
> > > source (such as RNDRRS on ARM), we will end up hitting it hundreds of
> > > times per second.
> > >
> > > So let's drop the call to arch_get_random_seed_long() from
> > > add_interrupt_randomness(), and instead, rely on crng_reseed() to call
> > > the arch hook to get random seed material from the platform.
> > >
> > > Signed-off-by: Ard Biesheuvel <[email protected]>
> > > ---
> > > drivers/char/random.c | 15 +--------------
> > > 1 file changed, 1 insertion(+), 14 deletions(-)
> > >
> > > diff --git a/drivers/char/random.c b/drivers/char/random.c
> > > index 2a41b21623ae..a9c393c1466d 100644
> > > --- a/drivers/char/random.c
> > > +++ b/drivers/char/random.c
> > > @@ -1261,8 +1261,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
> > > cycles_t cycles = random_get_entropy();
> > > __u32 c_high, j_high;
> > > __u64 ip;
> > > - unsigned long seed;
> > > - int credit = 0;
> > >
> > > if (cycles == 0)
> > > cycles = get_reg(fast_pool, regs);
> > > @@ -1298,23 +1296,12 @@ void add_interrupt_randomness(int irq, int irq_flags)
> > >
> > > fast_pool->last = now;
> > > __mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
> > > -
> > > - /*
> > > - * If we have architectural seed generator, produce a seed and
> > > - * add it to the pool. For the sake of paranoia don't let the
> > > - * architectural seed generator dominate the input from the
> > > - * interrupt noise.
> > > - */
> > > - if (arch_get_random_seed_long(&seed)) {
> > > - __mix_pool_bytes(r, &seed, sizeof(seed));
> > > - credit = 1;
> > > - }
> > > spin_unlock(&r->lock);
> > >
> > > fast_pool->count = 0;
> > >
> > > /* award one bit for the contents of the fast pool */
> > > - credit_entropy_bits(r, credit + 1);
> > > + credit_entropy_bits(r, 1);
> > > }
> > > EXPORT_SYMBOL_GPL(add_interrupt_randomness);
> > >
> > > --
> > > 2.17.1
> > >
>
> Ping?

Ping?

2021-01-10 09:51:57

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

On Mon, 4 Jan 2021 at 20:09, Ard Biesheuvel <[email protected]> wrote:
>
> On Tue, 17 Nov 2020 at 14:33, Ard Biesheuvel <[email protected]> wrote:
> >
> > On Wed, 11 Nov 2020 at 09:19, Ard Biesheuvel <[email protected]> wrote:
> > >
> > > (+ Eric)
> > >
> > > On Thu, 5 Nov 2020 at 16:29, Ard Biesheuvel <[email protected]> wrote:
> > > >
> > > > When reseeding the CRNG periodically, arch_get_random_seed_long() is
> > > > called to obtain entropy from an architecture specific source if one
> > > > is implemented. In most cases, these are special instructions, but in
> > > > some cases, such as on ARM, we may want to back this using firmware
> > > > calls, which are considerably more expensive.
> > > >
> > > > Another call to arch_get_random_seed_long() exists in the CRNG driver,
> > > > in add_interrupt_randomness(), which collects entropy by capturing
> > > > inter-interrupt timing and relying on interrupt jitter to provide
> > > > random bits. This is done by keeping a per-CPU state, and mixing in
> > > > the IRQ number, the cycle counter and the return address every time an
> > > > interrupt is taken, and mixing this per-CPU state into the entropy pool
> > > > every 64 invocations, or at least once per second. The entropy that is
> > > > gathered this way is credited as 1 bit of entropy. Every time this
> > > > happens, arch_get_random_seed_long() is invoked, and the result is
> > > > mixed in as well, and also credited with 1 bit of entropy.
> > > >
> > > > This means that arch_get_random_seed_long() is called at least once
> > > > per second on every CPU, which seems excessive, and doesn't really
> > > > scale, especially in a virtualization scenario where CPUs may be
> > > > oversubscribed: in cases where arch_get_random_seed_long() is backed
> > > > by an instruction that actually goes back to a shared hardware entropy
> > > > source (such as RNDRRS on ARM), we will end up hitting it hundreds of
> > > > times per second.
> > > >
> > > > So let's drop the call to arch_get_random_seed_long() from
> > > > add_interrupt_randomness(), and instead, rely on crng_reseed() to call
> > > > the arch hook to get random seed material from the platform.
> > > >
> > > > Signed-off-by: Ard Biesheuvel <[email protected]>
> > > > ---
> > > > drivers/char/random.c | 15 +--------------
> > > > 1 file changed, 1 insertion(+), 14 deletions(-)
> > > >
> > > > diff --git a/drivers/char/random.c b/drivers/char/random.c
> > > > index 2a41b21623ae..a9c393c1466d 100644
> > > > --- a/drivers/char/random.c
> > > > +++ b/drivers/char/random.c
> > > > @@ -1261,8 +1261,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
> > > > cycles_t cycles = random_get_entropy();
> > > > __u32 c_high, j_high;
> > > > __u64 ip;
> > > > - unsigned long seed;
> > > > - int credit = 0;
> > > >
> > > > if (cycles == 0)
> > > > cycles = get_reg(fast_pool, regs);
> > > > @@ -1298,23 +1296,12 @@ void add_interrupt_randomness(int irq, int irq_flags)
> > > >
> > > > fast_pool->last = now;
> > > > __mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
> > > > -
> > > > - /*
> > > > - * If we have architectural seed generator, produce a seed and
> > > > - * add it to the pool. For the sake of paranoia don't let the
> > > > - * architectural seed generator dominate the input from the
> > > > - * interrupt noise.
> > > > - */
> > > > - if (arch_get_random_seed_long(&seed)) {
> > > > - __mix_pool_bytes(r, &seed, sizeof(seed));
> > > > - credit = 1;
> > > > - }
> > > > spin_unlock(&r->lock);
> > > >
> > > > fast_pool->count = 0;
> > > >
> > > > /* award one bit for the contents of the fast pool */
> > > > - credit_entropy_bits(r, credit + 1);
> > > > + credit_entropy_bits(r, 1);
> > > > }
> > > > EXPORT_SYMBOL_GPL(add_interrupt_randomness);
> > > >
> > > > --
> > > > 2.17.1
> > > >
> >
> > Ping?
>
> Ping?

Ping again?

Ted,

Acceptance of this patch is now blocking RNG related work that is in
flight for ARM and arm64. [0]

So please shout if you have any objections to this patch, or if you
don't, please ack it so it can be taken through one of the ARM trees.

Thanks,
Ard.

[0] https://lore.kernel.org/kvmarm/[email protected]/

2021-01-15 13:27:31

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

In case it helps,

Reviewed-by: Jason A. Donenfeld <[email protected]>

[resending, as my mail server got blocked from vger for a week and my
message bounced]

2021-01-21 17:59:48

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

On Thu, 5 Nov 2020 16:29:44 +0100, Ard Biesheuvel wrote:
> When reseeding the CRNG periodically, arch_get_random_seed_long() is
> called to obtain entropy from an architecture specific source if one
> is implemented. In most cases, these are special instructions, but in
> some cases, such as on ARM, we may want to back this using firmware
> calls, which are considerably more expensive.
>
> Another call to arch_get_random_seed_long() exists in the CRNG driver,
> in add_interrupt_randomness(), which collects entropy by capturing
> inter-interrupt timing and relying on interrupt jitter to provide
> random bits. This is done by keeping a per-CPU state, and mixing in
> the IRQ number, the cycle counter and the return address every time an
> interrupt is taken, and mixing this per-CPU state into the entropy pool
> every 64 invocations, or at least once per second. The entropy that is
> gathered this way is credited as 1 bit of entropy. Every time this
> happens, arch_get_random_seed_long() is invoked, and the result is
> mixed in as well, and also credited with 1 bit of entropy.
>
> [...]

Applied to arm64 (for-next/random), thanks!

[1/1] random: avoid arch_get_random_seed_long() when collecting IRQ randomness
https://git.kernel.org/arm64/c/390596c9959c

Ted -- please shout if you would prefer me not to carry this in the arm64
tree. I've haven't seen a response from you on this thread, but this patch
is currently blocking support for the TRNG firmware call on arm64 [1], so
I've pulled it in as a dependency. The branch above is stable, so you can
pull it in as well if necessary.

[1] https://lore.kernel.org/r/[email protected]

Cheers,
--
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev