2014-02-02 20:56:13

by Jörn Engel

[permalink] [raw]
Subject: [PATCH,RFC] random: collect cpu randomness

Collects entropy from random behaviour all modern cpus exhibit. The
scheduler and slab allocator are instrumented for this purpose. How
much randomness can be gathered is clearly hardware-dependent and hard
to estimate. Therefore the entropy estimate is zero, but random bits
still get mixed into the pools.

Performance overhead seems low. I did 20 kernel compiles with
allnoconfig, everything precompiled and warm caches. Difference was in
the noise and on average performance was actually better when collecting
randomness.

To judge the benefits I ran this in kvm with instrumentation to print
out the various input values. Entropy was estimated by booting ten
times, collecting the output for each boot, doing a diff between any two
and counting the diff hunks. Numbers like "46 .. 215" means the two
most similar runs had 46 hunks, the two most dissimilar ones had 215.

Running kvm with -smp 4 gave the following results:
slab+sched slab sched
input[0] 46 .. 215 202 .. 342 0 .. 0
input[1] 76 .. 180 0 .. 4 0 .. 0
input[2] 147 .. 388 4 .. 44 286 .. 464
input[3] 56 .. 185 0 .. 2 9 .. 40
jiffies 8 .. 67 10 .. 28 19 .. 50
caller 114 .. 306 25 .. 178 219 .. 422
val 54 .. 254 15 .. 106 175 .. 321
&input 50 .. 246 0 .. 59 178 .. 387

First column collected entropy from both the slab allocator and the
scheduler, second and third only collected from one source. I only used
the first 512 values per cpu. Therefore the combined numbers can be
lower than the individual ones - there was more entropy collected, it
was just swamped by non-entropy.

Lots of entropy gets collected and about half of it stems from the
uninitialized valued of input[] on the stack.

Rerunning only the first column with -smp 1 was more sobering:
slab+sched
input[0] 0 .. 13
input[1] 0 .. 13
input[2] 0 .. 14
input[3] 0 .. 11
jiffies 2 .. 22
caller 0 .. 19
val 0 .. 16
&input 0 .. 4

Every single input value contains entropy in some runs, but the only one
that was an entropy guarantee was jiffies. In other words, a
low-precision clock. This clearly shows how important it is to have a
high-precision clock, which would gather significantly more entropy.

Measuring the randomness from random_get_entropy() with above approach
failed because there was so much randomness. All numbers in all runs
were different. Taking the delta between the numbers, again almost all
numbers were different with at most 1 identical delta per 1000.
Compared to a high-precision clock, no other input comes within two
orders of magnitude.

An interesting aside is that the cpu actually functions as a
high-precision clock. This partially explains why the -smp 4 number are
so much better - any drift between the four cpu-clocks gets turned into
randomness. The other explanation is that kvm is actually collecting
randomness from the host system. I tried to minimize that effect by not
touching my notebook while running these tests, but a hardware test box
would clearly yield more meaningful results.

I invite others to also test whether the patch collects useful
randomness or causes a measureable performance loss on any hardware or
workload. Preferrably not using my methods, in order to avoid
systematic errors.

Signed-off-by: Joern Engel <[email protected]>
---
drivers/char/random.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/random.h | 2 ++
kernel/sched/core.c | 1 +
mm/slab.c | 1 +
4 files changed, 65 insertions(+)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 429b75bb60e8..693dea730a3e 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -587,6 +587,30 @@ static void fast_mix(struct fast_pool *f, __u32 input[4])
}

/*
+ * An even faster variant of fast_mix, albeit with worse mixing.
+ */
+static void weak_mix(struct fast_pool *f, __u32 input[4])
+{
+ __u32 acc, carry = f->pool[3] >> 25;
+
+ acc = (f->pool[0] << 7) ^ input[0] ^ carry;
+ carry = f->pool[0] >> 25;
+ f->pool[0] = acc;
+
+ acc = (f->pool[1] << 7) ^ input[1] ^ carry;
+ carry = f->pool[1] >> 25;
+ f->pool[1] = acc;
+
+ acc = (f->pool[2] << 7) ^ input[2] ^ carry;
+ carry = f->pool[2] >> 25;
+ f->pool[2] = acc;
+
+ acc = (f->pool[3] << 7) ^ input[3] ^ carry;
+ //carry = f->pool[3] >> 25;
+ f->pool[3] = acc;
+}
+
+/*
* Credit (or debit) the entropy store with n bits of entropy.
* Use credit_entropy_bits_safe() if the value comes from userspace
* or otherwise should be checked for extreme values.
@@ -833,6 +857,43 @@ void add_input_randomness(unsigned int type, unsigned int code,
}
EXPORT_SYMBOL_GPL(add_input_randomness);

+static DEFINE_PER_CPU(struct fast_pool, cpu_randomness);
+
+void __add_cpu_randomness(void *caller, void *val)
+{
+ struct entropy_store *r;
+ struct fast_pool *fast_pool = &__get_cpu_var(cpu_randomness);
+ unsigned long now = jiffies;
+ __u32 input[4], cycles = random_get_entropy();
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wuninitialized"
+ input[0] ^= cycles ^ jiffies;
+ input[1] ^= (unsigned long)caller;
+ input[2] ^= (unsigned long)val;
+ input[3] ^= (unsigned long)&input;
+#pragma GCC diagnostic pop
+
+ weak_mix(fast_pool, input);
+
+ if ((fast_pool->count++ & 1023) &&
+ !time_after(now, fast_pool->last + HZ))
+ return;
+
+ fast_pool->last = now;
+ fast_pool->count = 1;
+
+ r = nonblocking_pool.initialized ? &input_pool : &nonblocking_pool;
+ __mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool), NULL);
+}
+
+void add_cpu_randomness(void *caller, void *val)
+{
+ preempt_disable();
+ __add_cpu_randomness(caller, val);
+ preempt_enable();
+}
+
static DEFINE_PER_CPU(struct fast_pool, irq_randomness);

void add_interrupt_randomness(int irq, int irq_flags)
diff --git a/include/linux/random.h b/include/linux/random.h
index 4002b3df4c85..ce0ccdcd1d63 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -12,6 +12,8 @@
extern void add_device_randomness(const void *, unsigned int);
extern void add_input_randomness(unsigned int type, unsigned int code,
unsigned int value);
+extern void __add_cpu_randomness(void *caller, void *val);
+extern void add_cpu_randomness(void *caller, void *val);
extern void add_interrupt_randomness(int irq, int irq_flags);

extern void get_random_bytes(void *buf, int nbytes);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a88f4a485c5e..7af6389f9b9e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2511,6 +2511,7 @@ need_resched:
rq = cpu_rq(cpu);
rcu_note_context_switch(cpu);
prev = rq->curr;
+ __add_cpu_randomness(__builtin_return_address(1), prev);

schedule_debug(prev);

diff --git a/mm/slab.c b/mm/slab.c
index eb043bf05f4c..ea5a30d44ad1 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3587,6 +3587,7 @@ static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
trace_kmalloc(caller, ret,
size, cachep->size, flags);

+ add_cpu_randomness(__builtin_return_address(2), ret);
return ret;
}

--
1.8.5.2


2014-02-02 21:34:27

by Stephan Müller

[permalink] [raw]
Subject: Re: [PATCH,RFC] random: collect cpu randomness

Am Sonntag, 2. Februar 2014, 15:36:17 schrieb J?rn Engel:

Hi J?rn,

> Collects entropy from random behaviour all modern cpus exhibit. The
> scheduler and slab allocator are instrumented for this purpose. How
> much randomness can be gathered is clearly hardware-dependent and hard
> to estimate. Therefore the entropy estimate is zero, but random bits
> still get mixed into the pools.

May I ask what the purpose of the patches is when no entropy is implied? I see
that the pool is stirred more. But is that really a problem that needs
addressing?

Please, do not get me wrong with the presented critisism here -- the approach
in general looks interesting.

However, the following patches makes me wonder big time.

> extern void get_random_bytes(void *buf, int nbytes);
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a88f4a485c5e..7af6389f9b9e 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2511,6 +2511,7 @@ need_resched:
> rq = cpu_rq(cpu);
> rcu_note_context_switch(cpu);
> prev = rq->curr;
> + __add_cpu_randomness(__builtin_return_address(1), prev);
>
> schedule_debug(prev);
>
> diff --git a/mm/slab.c b/mm/slab.c
> index eb043bf05f4c..ea5a30d44ad1 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3587,6 +3587,7 @@ static __always_inline void *__do_kmalloc(size_t size,
> gfp_t flags, trace_kmalloc(caller, ret,
> size, cachep->size, flags);
>
> + add_cpu_randomness(__builtin_return_address(2), ret);
> return ret;
> }

First, the noise source you add is constantly triggered throughout the
execution of the kernel. Entropy is very important, we (who are interested in
crypto) know that. But how often is entropy needed? Other folks wonder about
the speed of the kernel. And with these two patches, every kmalloc and every
scheduling invocation now dives into the random.c code to do something. I
would think this is a bit expensive, especially to stir the pool without
increasing the entropy estimator. I think entropy collection should be
performed when it is needed and not throughout the lifetime of the system.

Second, when I offered my initial patch which independently collects some
entropy on the CPU execution timing, I got shot down with one concern raised
by Ted, and that was about whether a user can influence the entropy collection
process. When I am trying to measure CPU execution timing in the RNG, the
concern was raised that the measured timing variations was due to CPU states
that were influenced by users. Your patch here clearly hooks into code paths
which are definitely affected by user actions. So, this patch therefore would
be subject to the same concerns. I personally think that this is not so much
an issue, yet it was raised previously.

It seems I have a bad timing, because just two days ago I released a new
attempt on the CPU jitter RNG [1] with a new noise source, and I was just
about to prepare a release email. With that attempt, both issues raised above
are addressed, including a theoretical foundation of the noise source.

[1] http://www.chronox.de/

Ciao
Stephan
--
| Cui bono? |

2014-02-03 01:23:23

by Jörn Engel

[permalink] [raw]
Subject: Re: [PATCH,RFC] random: collect cpu randomness

On Sun, 2 February 2014 22:25:31 +0100, Stephan Mueller wrote:
> Am Sonntag, 2. Februar 2014, 15:36:17 schrieb Jörn Engel:
>
> > Collects entropy from random behaviour all modern cpus exhibit. The
> > scheduler and slab allocator are instrumented for this purpose. How
> > much randomness can be gathered is clearly hardware-dependent and hard
> > to estimate. Therefore the entropy estimate is zero, but random bits
> > still get mixed into the pools.
>
> May I ask what the purpose of the patches is when no entropy is implied? I see
> that the pool is stirred more. But is that really a problem that needs
> addressing?

For my part, I think the whole business of estimating entropy is
bordering on the esoteric. If the hash on the output side is any
good, you have a completely unpredictable prng once the entropy pool
is unpredictable. Additional random bits are nice, but not all that
useful. Blocking /dev/random based on entropy estimates is likewise
not all that useful.

Key phrase is "once the entropy pool is unpredictable". So early in
bootup it may make sense to estimate the entropy. But here the
problem is that you cannot measure entropy, at least not within a
single system and a reasonable amount of time. That leaves you with a
heuristic that, like all heuristics, is wrong.

I personally care more about generating high-quality randomness as
soon as possible and with low cost to the system. Feel free to
disagree or set your priorities differently.

> Please, do not get me wrong with the presented critisism here -- the approach
> in general looks interesting.
>
> However, the following patches makes me wonder big time.
>
> > extern void get_random_bytes(void *buf, int nbytes);
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index a88f4a485c5e..7af6389f9b9e 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2511,6 +2511,7 @@ need_resched:
> > rq = cpu_rq(cpu);
> > rcu_note_context_switch(cpu);
> > prev = rq->curr;
> > + __add_cpu_randomness(__builtin_return_address(1), prev);
> >
> > schedule_debug(prev);
> >
> > diff --git a/mm/slab.c b/mm/slab.c
> > index eb043bf05f4c..ea5a30d44ad1 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -3587,6 +3587,7 @@ static __always_inline void *__do_kmalloc(size_t size,
> > gfp_t flags, trace_kmalloc(caller, ret,
> > size, cachep->size, flags);
> >
> > + add_cpu_randomness(__builtin_return_address(2), ret);
> > return ret;
> > }
>
> First, the noise source you add is constantly triggered throughout the
> execution of the kernel. Entropy is very important, we (who are interested in
> crypto) know that. But how often is entropy needed? Other folks wonder about
> the speed of the kernel. And with these two patches, every kmalloc and every
> scheduling invocation now dives into the random.c code to do something. I
> would think this is a bit expensive, especially to stir the pool without
> increasing the entropy estimator. I think entropy collection should be
> performed when it is needed and not throughout the lifetime of the system.

Please measure how expensive it really is. My measurement gave me a
"doesn't matter" result, surprising as it may seem.

If the cost actually matters, we can either disable or rate-limit the
randomness collection at some point after boot. But that would bring
us back into the estimation business.

> Second, when I offered my initial patch which independently collects some
> entropy on the CPU execution timing, I got shot down with one concern raised
> by Ted, and that was about whether a user can influence the entropy collection
> process. When I am trying to measure CPU execution timing in the RNG, the
> concern was raised that the measured timing variations was due to CPU states
> that were influenced by users. Your patch here clearly hooks into code paths
> which are definitely affected by user actions. So, this patch therefore would
> be subject to the same concerns. I personally think that this is not so much
> an issue, yet it was raised previously.

The nice thing about the random pool is that mixing any amount of
deterministic data into it does not diminish the randomness already in
it. Given that attribute, I don't understand the concern.

> It seems I have a bad timing, because just two days ago I released a new
> attempt on the CPU jitter RNG [1] with a new noise source, and I was just
> about to prepare a release email. With that attempt, both issues raised above
> are addressed, including a theoretical foundation of the noise source.
>
> [1] http://www.chronox.de/

I am not married to my patch. If the approach makes sense, let's
merge it. If the approach does not make sense or there is a better
alternative, drop it on the floor.

The problem I see with your approach is this:
"The only prerequisite is the availability of a high-resolution timer
that is available in modern CPUs."

Given a modern CPU with a high-resolution timer, you will almost
certainly collect enough randomness for good random numbers. Problem
solved and additional improvements are useless.

But on embedded systems with less modern CPUs, few interrupt sources,
no user interface, etc. you may have trouble collecting enough
randomness or doing it soon enough. That is the problem worth fixing.
It is also a hard problem to fix and I am not entirely convinced I
found a good approach.

Jörn

--
It's just what we asked for, but not what we want!
-- anonymous

2014-02-03 01:29:37

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH,RFC] random: collect cpu randomness

On 02/02/2014 05:24 PM, Jörn Engel wrote:
>
> For my part, I think the whole business of estimating entropy is
> bordering on the esoteric. If the hash on the output side is any
> good, you have a completely unpredictable prng once the entropy pool
> is unpredictable. Additional random bits are nice, but not all that
> useful. Blocking /dev/random based on entropy estimates is likewise
> not all that useful.
>
> Key phrase is "once the entropy pool is unpredictable". So early in
> bootup it may make sense to estimate the entropy. But here the
> problem is that you cannot measure entropy, at least not within a
> single system and a reasonable amount of time. That leaves you with a
> heuristic that, like all heuristics, is wrong.
>

The entropy bound needs to be a conservative lower bound. Its main use
is to provide backpressure (should we spend more CPU time producing
entropy) although the forward pressure on /dev/random is potentially
useful for high security applications.

This does NOT mean that zero-credit entropy generation is useless, far
from it. It just means that we are doing it on an "it can't hurt"
basis, rather than "I know for sure that this is valuable."

-hpa

2014-02-03 01:39:41

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH,RFC] random: collect cpu randomness

On Sun, Feb 02, 2014 at 10:25:31PM +0100, Stephan Mueller wrote:
> Second, when I offered my initial patch which independently collects some
> entropy on the CPU execution timing, I got shot down with one concern raised
> by Ted, and that was about whether a user can influence the entropy collection
> process.

Um, that wasn't my concern. After all, when we sample keyboard timing
while trying to generate a GPG key, of course the user can and does
influence the entropy collection process.

The question is whether an attacker who has deep knowledge of the how
the CPU works internally, perhaps made worse with quantization effects
(i.e., it doesn't matter if analog-generated settling time is measured
in microseconds if the output is being clocked out in milliseconds),
such that it is predictable.

I really like J?rn's tests doing repeated boot testing and observing
on a SMP system, the slab allocation pattern is quite deterministic.
So even though the numbers might *look* random, an attacker with deep
knowledge of how the kernel was compiled and what memory allocations
get done during the boot sequence would be able to quite successfuly
measure it.

I'm guessing that indeed, on a 4-CPU KVM system, what you're measuring
is the when the host OS happens to be scheduling the KVM threads, with
some variability caused by external networking interrupts, etc. It
would definitely be a good idea to retry that experiment on a real
4-CPU system to see what sort of results you might get. It might very
well be that the attacker who knows the relative ordering of the
slab/thread activations but for which it's not entirely clear whether
one cpu will be ahead of another, that there is *some* entropy, but
perhaps only a handful bits. It's the fact that we can't be sure how
much uncertainty there might be with an attacker with very deep
knowledge the CPU which is why J?rn's conservatism of not crediting
the entropy counter is quite understandable.

Of course, this doesn't help someone who is trying to speed up the
time it takes GPG to generate a new key pair. But in terms of
improving /dev/urandom as it is used by many crypto applications, it
certainly can't hurt.

The real question is how much overhead does it add, and is it worth
it. J?rn, I take it that was the reason for creating an even faster,
but weaker mixing function? Was the existing "fast mix" causing a
measurable overhead, or was this your just being really paranoid about
not adding anything to the various kernel fastpaths?

- Ted

2014-02-03 03:34:52

by Jörn Engel

[permalink] [raw]
Subject: Re: [PATCH,RFC] random: collect cpu randomness

On Sun, 2 February 2014 20:39:22 -0500, Theodore Ts'o wrote:
>
> The real question is how much overhead does it add, and is it worth
> it. Jörn, I take it that was the reason for creating an even faster,
> but weaker mixing function? Was the existing "fast mix" causing a
> measurable overhead, or was this your just being really paranoid about
> not adding anything to the various kernel fastpaths?

It was paranoia. And I am still somewhat paranoid and don't trust my
benchmark results yet. Maybe on an 1024-CPU Altix with a 100k-thread
workload the overhead is too much. Just because I couldn't measure a
difference on my wimpy notebook does not mean much.

Jörn

--
One of the painful things about our time is that those who feel certainty
are stupid, and those with any imagination and understanding are filled
with doubt and indecision.
-- Bertrand Russell

2014-02-03 13:06:22

by Stephan Müller

[permalink] [raw]
Subject: Re: [PATCH,RFC] random: collect cpu randomness

Am Sonntag, 2. Februar 2014, 20:39:22 schrieb Theodore Ts'o:

Hi Theodore,

>On Sun, Feb 02, 2014 at 10:25:31PM +0100, Stephan Mueller wrote:
>> Second, when I offered my initial patch which independently collects
>> some entropy on the CPU execution timing, I got shot down with one
>> concern raised by Ted, and that was about whether a user can
>> influence the entropy collection process.
>
>Um, that wasn't my concern. After all, when we sample keyboard timing
>while trying to generate a GPG key, of course the user can and does
>influence the entropy collection process.

Thank you for clarifying and sorry that I misunderstood you.

>I really like J?rn's tests doing repeated boot testing and observing
>on a SMP system, the slab allocation pattern is quite deterministic.
>So even though the numbers might *look* random, an attacker with deep
>knowledge of how the kernel was compiled and what memory allocations
>get done during the boot sequence would be able to quite successfuly
>measure it.
>
>I'm guessing that indeed, on a 4-CPU KVM system, what you're measuring

Please let me point out that I am not testing on KVM, but on Linux
running natively on the hardware. All major CPUs were tested. Even tests
are executed on bare metal, i.e. without any OS and interrupts disabled.
But I will explain that in a separate email and do not want to hijack
this thread.

Ciao
Stephan

2014-02-03 13:33:54

by Thorsten Glaser

[permalink] [raw]
Subject: Re: [PATCH,RFC] random: collect cpu randomness

Theodore Ts'o dixit:

>I really like Jvrn's tests doing repeated boot testing and observing
>on a SMP system, the slab allocation pattern is quite deterministic.
>So even though the numbers might *look* random, an attacker with deep
>knowledge of how the kernel was compiled and what memory allocations

For this reason, we pre-initialise (one of) the RNG during kernel
compile time, using host randomness, in MirBSD. We’re also considering
pushing some additional seed using the bootloader, that can be mixed
in very very early. As an added benefit, everything in the kernel can
call arc4random(), arc4random_buf() and arc4random_uniform() without
worrying about initialisation, at all times. (This is mostly for the
places where it replaced random(), or which explicitly waited for the
regular RNG initialisation to be done, or which reseeded after that.)

In GNU/Linux, this could work like, GRUB will offer the contents of
/boot/grub/randseed.bin to the Linux kernel, which will add it into
the pool using the normal mechanisms, and (very?) early userspace
will then recreate that file (to prevent seed reuse, just like the
normal /var/db/host.random or /var/lib/urandom/random-seed processing
is done). Downside: write access to the boot medium needed. (No GRUB
modification needed, this could be passed as “faux kernel module”.)
Upside: this is way earlier than anything user space can do, and
e.g. early enough to affect things like kernel memory management.

bye,
//mirabilos
--
<Natureshadow> Ach, mach doch was du willst, du hast doch eh immer Recht!
<mirabilos> jupp ~/.etc/sig………
<Natureshadow> unfaßbar…
<Natureshadow> Mit Eszett sogar, unfaßbar!

2014-02-03 13:37:22

by Stephan Müller

[permalink] [raw]
Subject: Re: [PATCH,RFC] random: collect cpu randomness

Am Sonntag, 2. Februar 2014, 20:24:21 schrieb J?rn Engel:

Hi J?rn,

>On Sun, 2 February 2014 22:25:31 +0100, Stephan Mueller wrote:
>> Am Sonntag, 2. Februar 2014, 15:36:17 schrieb J?rn Engel:
>> > Collects entropy from random behaviour all modern cpus exhibit.
>> > The
>> > scheduler and slab allocator are instrumented for this purpose.
>> > How
>> > much randomness can be gathered is clearly hardware-dependent and
>> > hard
>> > to estimate. Therefore the entropy estimate is zero, but random
>> > bits
>> > still get mixed into the pools.
>>
>> May I ask what the purpose of the patches is when no entropy is
>> implied? I see that the pool is stirred more. But is that really a
>> problem that needs addressing?
>
>For my part, I think the whole business of estimating entropy is
>bordering on the esoteric. If the hash on the output side is any
>good, you have a completely unpredictable prng once the entropy pool
>is unpredictable. Additional random bits are nice, but not all that
>useful. Blocking /dev/random based on entropy estimates is likewise
>not all that useful.

I really like that statement, because for the most part I concur :-)

However, there are a number of cryptographers out there, which insist on
such entropy assessments and even the blocking behavior. For example, I
work with cryptographers from the German BSI. We created a quantitative
assessment of /dev/random for them (see [1]). During the discussion, I
learned that the key reason they like /dev/random and dislike
/dev/urandom is the fact that /dev/random ensures that any output of
data is always backed by hardware entropy. In order to ensure that you
always have hardware entropy that backs your output, you somehow must
quantify that hardware entropy. Thus, a dropping of the entropy
estimation would be catastrophic for them.

When you look at NIST and the base discussions in SP800-90A, you see
that for deterministic RNGs, NIST is not that strict as BSI. Yet, they
require a DRNG to be reseeded with entropy after a (large) number of
generated bits. For that reseeding process, some entropy estimation is
needed. But when looking at SP800-90B, things get hairy again where some
strict entropy estimations are needed.

Even if you subscribe to the notion that an RNG only needs some X bits
of entropy for starters and then can spin indefinitely on this entropy,
there is still a need on estimating entropy, at least at the beginning.
>
>Key phrase is "once the entropy pool is unpredictable". So early in
>bootup it may make sense to estimate the entropy. But here the

I am fully in agreement here.

>problem is that you cannot measure entropy, at least not within a
>single system and a reasonable amount of time. That leaves you with a
>heuristic that, like all heuristics, is wrong.

No argument here :-)

(side note: The interesting thing is that the /dev/random heuristic on
entropy seems to underestimate the entropy present in events where the
heuristic assumes low entropy, but way overestimates entropy where the
heuristic entropy assumes high entropy)
>
>I personally care more about generating high-quality randomness as
>soon as possible and with low cost to the system. Feel free to
>disagree or set your priorities differently.

Fully in agreement

[..]
>> First, the noise source you add is constantly triggered throughout
>> the
>> execution of the kernel. Entropy is very important, we (who are
>> interested in crypto) know that. But how often is entropy needed?
>> Other folks wonder about the speed of the kernel. And with these two
>> patches, every kmalloc and every scheduling invocation now dives
>> into the random.c code to do something. I would think this is a bit
>> expensive, especially to stir the pool without increasing the
>> entropy estimator. I think entropy collection should be performed
>> when it is needed and not throughout the lifetime of the system.
>Please measure how expensive it really is. My measurement gave me a
>"doesn't matter" result, surprising as it may seem.

That sounds really good.
>
>If the cost actually matters, we can either disable or rate-limit the
>randomness collection at some point after boot. But that would bring
>us back into the estimation business.
>
>> It seems I have a bad timing, because just two days ago I released a
>> new attempt on the CPU jitter RNG [1] with a new noise source, and I
>> was just about to prepare a release email. With that attempt, both
>> issues raised above are addressed, including a theoretical
>> foundation of the noise source.
>>
>> [1] http://www.chronox.de/
>
>I am not married to my patch. If the approach makes sense, let's
>merge it. If the approach does not make sense or there is a better
>alternative, drop it on the floor.
>
>The problem I see with your approach is this:
>"The only prerequisite is the availability of a high-resolution timer
>that is available in modern CPUs."

Right, and with the absence of a high-resolution counter, my RNG breaks
down. Though, I have more goals than to just run my RNG inside the Linux
kernel and thus the reliance on only the timer.

However, during my testing on also embedded systems, a significant
number have them -- at least a counter that is integrated with the
clock_source framework.

Allow me to explain my RNG and the new developments in a separate email
to not hijack your thread here.

I think both have merits, considering that based on your statement, the
integration into schedule/kmalloc code paths is not that expensive.
>
>Given a modern CPU with a high-resolution timer, you will almost
>certainly collect enough randomness for good random numbers. Problem
>solved and additional improvements are useless.
>
>But on embedded systems with less modern CPUs, few interrupt sources,
>no user interface, etc. you may have trouble collecting enough
>randomness or doing it soon enough. That is the problem worth fixing.
>It is also a hard problem to fix and I am not entirely convinced I
>found a good approach.

I do not think that there can be a right approach given the variety of
systems Linux can run on. Though, the current random.c definitely is
challenged in embedded systems, but also on "normal" headless systems
with SSDs. Thus, any proposal of using new entropy sources is good.

[1]
https://www.bsi.bund.de/DE/Publikationen/Studien/LinuxRNG/index_htm.html

Ciao
Stephan

2014-02-03 15:49:45

by Jörn Engel

[permalink] [raw]
Subject: Re: [PATCH,RFC] random: collect cpu randomness

On Sun, 2 February 2014 15:36:17 -0500, Jörn Engel wrote:
>
> Measuring the randomness from random_get_entropy() with above approach
> failed because there was so much randomness. All numbers in all runs
> were different. Taking the delta between the numbers, again almost all
> numbers were different with at most 1 identical delta per 1000.
> Compared to a high-precision clock, no other input comes within two
> orders of magnitude.

I think this is a key result from my tests. The best source is a
timer (counter) that is both
a) high-resolution and
b) asynchronous to the measurement event.

If the measurement event is an interrupt and the CPU has a
cycle-counter, you are set. On interesting systems lacking a
cycle-counter, we still have a high-resolution counter or sorts that
is the CPU itself.

Instruction pointer and stack pointer for both kernel and userland are
one way to read out the "counter". Main problem here are tight loops
where your "counter" is not high-resolution at all. But something
within the CPU is constantly changing. And that something tends to be
contained in the registers.

How about taking the saved registers from the interrupted CPU, xor'ing
them all and calling the result random_get_entropy() on systems
lacking a good cycles-counter?

Jörn

--
I can say that I spend most of my time fixing bugs even if I have lots
of new features to implement in mind, but I give bugs more priority.
-- Andrea Arcangeli, 2000

2014-02-03 16:37:48

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH,RFC] random: collect cpu randomness

On Mon, Feb 03, 2014 at 10:50:42AM -0500, J?rn Engel wrote:
> If the measurement event is an interrupt and the CPU has a
> cycle-counter, you are set. On interesting systems lacking a
> cycle-counter, we still have a high-resolution counter or sorts that
> is the CPU itself.
>
> Instruction pointer and stack pointer for both kernel and userland are
> one way to read out the "counter". Main problem here are tight loops
> where your "counter" is not high-resolution at all. But something
> within the CPU is constantly changing. And that something tends to be
> contained in the registers.
>
> How about taking the saved registers from the interrupted CPU, xor'ing
> them all and calling the result random_get_entropy() on systems
> lacking a good cycles-counter?

So we could take the struct pt_regs which we get from get_irq_regs(),
XOR them together and use them to feed into input[2] amd input[3] in
add_interrupt_randomness(). Or some other way of distributing the
values of all of the irq registers into the __u32 input[4] array.

That would probably be a good and useful thing to do. Was that
basically what you were suggesting?

- Ted

2014-02-03 18:48:03

by Jörn Engel

[permalink] [raw]
Subject: Re: [PATCH,RFC] random: collect cpu randomness

On Mon, 3 February 2014 11:37:29 -0500, Theodore Ts'o wrote:
>
> So we could take the struct pt_regs which we get from get_irq_regs(),
> XOR them together and use them to feed into input[2] amd input[3] in
> add_interrupt_randomness(). Or some other way of distributing the
> values of all of the irq registers into the __u32 input[4] array.
>
> That would probably be a good and useful thing to do. Was that
> basically what you were suggesting?

Yes.

Jörn

--
When you copy some code, you are supposed to read it. If nothing else,
there's a chance to spot and fix an obvious bug instead of sharing it...
-- Al Viro

2014-02-03 22:01:23

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH,RFC] random: collect cpu randomness

On Mon, 3 Feb 2014, Theodore Ts'o wrote:

> > How about taking the saved registers from the interrupted CPU, xor'ing
> > them all and calling the result random_get_entropy() on systems
> > lacking a good cycles-counter?
>
> So we could take the struct pt_regs which we get from get_irq_regs(),
> XOR them together and use them to feed into input[2] amd input[3] in
> add_interrupt_randomness(). Or some other way of distributing the
> values of all of the irq registers into the __u32 input[4] array.

Can we be sure we don't leak information this way? Just being
paranoid...

Maciej

2014-02-03 22:44:57

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH,RFC] random: collect cpu randomness

On Mon, Feb 03, 2014 at 09:54:22PM +0000, Maciej W. Rozycki wrote:
>
> Can we be sure we don't leak information this way? Just being
> paranoid...

The register information will be mixed pretty thoroughly by the time
it gets to the entropy pool, and then we don't ever expose the entropy
pool to userspace. So no, I don't think we need to worry about
leaking information.

- Ted

2014-02-06 22:20:53

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH,RFC] random: collect cpu randomness

Hi J?rn,

On Sun, Feb 02, 2014 at 03:36:17PM -0500, J?rn Engel wrote:
> Collects entropy from random behaviour all modern cpus exhibit. The
> scheduler and slab allocator are instrumented for this purpose. How
> much randomness can be gathered is clearly hardware-dependent and hard
> to estimate. Therefore the entropy estimate is zero, but random bits
> still get mixed into the pools.

Have you seen this work from PaX Team?

http://grsecurity.net/pipermail/grsecurity/2012-July/001093.html

See http://grsecurity.net/test/grsecurity-3.0-3.13.1-201402052349.patch
and search for PAX_LATENT_ENTROPY.

-Kees

--
Kees Cook @outflux.net

2014-02-06 22:22:00

by Dave Taht

[permalink] [raw]
Subject: Re: [PATCH,RFC] random: collect cpu randomness

On Thu, Feb 6, 2014 at 5:20 PM, Kees Cook <[email protected]> wrote:
> Hi J?rn,
>
> On Sun, Feb 02, 2014 at 03:36:17PM -0500, J?rn Engel wrote:
>> Collects entropy from random behaviour all modern cpus exhibit. The
>> scheduler and slab allocator are instrumented for this purpose. How
>> much randomness can be gathered is clearly hardware-dependent and hard
>> to estimate. Therefore the entropy estimate is zero, but random bits
>> still get mixed into the pools.
>
> Have you seen this work from PaX Team?
>
> http://grsecurity.net/pipermail/grsecurity/2012-July/001093.html
>
> See http://grsecurity.net/test/grsecurity-3.0-3.13.1-201402052349.patch
> and search for PAX_LATENT_ENTROPY.

The hardware rng world just got easier with the "hashlet".

https://plus.google.com/u/0/107942175615993706558/posts/4iq6W524SxL

Kernel driver wanted...

> -Kees
>
> --
> Kees Cook @outflux.net



--
Dave T?ht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html

2014-02-07 07:43:53

by Jörn Engel

[permalink] [raw]
Subject: Re: [PATCH,RFC] random: collect cpu randomness

On Thu, 6 February 2014 14:20:02 -0800, Kees Cook wrote:
> On Sun, Feb 02, 2014 at 03:36:17PM -0500, Jörn Engel wrote:
> > Collects entropy from random behaviour all modern cpus exhibit. The
> > scheduler and slab allocator are instrumented for this purpose. How
> > much randomness can be gathered is clearly hardware-dependent and hard
> > to estimate. Therefore the entropy estimate is zero, but random bits
> > still get mixed into the pools.
>
> Have you seen this work from PaX Team?
>
> http://grsecurity.net/pipermail/grsecurity/2012-July/001093.html

Interesting.

> See http://grsecurity.net/test/grsecurity-3.0-3.13.1-201402052349.patch
> and search for PAX_LATENT_ENTROPY.

Server gives me an error. Archive.org doesn't have a copy either,
thanks to robots.txt.

Can you send me a copy via mail?

Jörn

--
Functionality is an asset, but code is a liability.
--Ted Dziuba

2014-02-20 09:50:29

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH,RFC] random: collect cpu randomness

Il 02/02/2014 21:36, Jörn Engel ha scritto:
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wuninitialized"
> + input[0] ^= cycles ^ jiffies;
> + input[1] ^= (unsigned long)caller;
> + input[2] ^= (unsigned long)val;
> + input[3] ^= (unsigned long)&input;
> +#pragma GCC diagnostic pop

Your tests demonstrate that this works, and presumably you have checked
the assembly too. Still, this is invoking undefined behavior and the
compiler could justifiably change those "^=" to "=".

An "asm" would be a safer way to convince the compiler that input[] is
now initialized:

asm volatile ("" :
"=m" (input[0]), "=m" (input[1]),
"=m" (input[2]), "=m" (input[3]));

and *really* XOR the values into the contents of the stack.

Of course the compiler could still have a "feature" where it
pre-initializes the whole stack frame with some kind of canary, but that
would be a problem even with your version of the code.

Paolo