2007-08-16 02:49:27

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH] Make rcutorture RNG use temporal entropy

Repost of http://lkml.org/lkml/2007/8/10/472 made available by request.

The locking used by get_random_bytes() can conflict with the
preempt_disable() and synchronize_sched() form of RCU. This patch changes
rcutorture's RNG to gather entropy from the new cpu_clock() interface
(relying on interrupts, preemption, daemons, and rcutorture's reader
thread's rock-bottom scheduling priority to provide useful entropy),
and also adds and EXPORT_SYMBOL_GPL() to make that interface available
to GPLed kernel modules such as rcutorture.

Passes several hours of rcutorture.

Signed-off-by: Paul E. McKenney <[email protected]>
---

rcutorture.c | 8 ++------
sched.c | 2 ++
2 files changed, 4 insertions(+), 6 deletions(-)

diff -urpNa -X dontdiff linux-2.6.23-rc2/kernel/rcutorture.c linux-2.6.23-rc2-rcutorturesched/kernel/rcutorture.c
--- linux-2.6.23-rc2/kernel/rcutorture.c 2007-08-03 19:49:55.000000000 -0700
+++ linux-2.6.23-rc2-rcutorturesched/kernel/rcutorture.c 2007-08-10 17:15:22.000000000 -0700
@@ -42,7 +42,6 @@
#include <linux/notifier.h>
#include <linux/freezer.h>
#include <linux/cpu.h>
-#include <linux/random.h>
#include <linux/delay.h>
#include <linux/byteorder/swabb.h>
#include <linux/stat.h>
@@ -166,16 +165,13 @@ struct rcu_random_state {

/*
* Crude but fast random-number generator. Uses a linear congruential
- * generator, with occasional help from get_random_bytes().
+ * generator, with occasional help from cpu_clock().
*/
static unsigned long
rcu_random(struct rcu_random_state *rrsp)
{
- long refresh;
-
if (--rrsp->rrs_count < 0) {
- get_random_bytes(&refresh, sizeof(refresh));
- rrsp->rrs_state += refresh;
+ rrsp->rrs_state += (unsigned long)cpu_clock(smp_processor_id());
rrsp->rrs_count = RCU_RANDOM_REFRESH;
}
rrsp->rrs_state = rrsp->rrs_state * RCU_RANDOM_MULT + RCU_RANDOM_ADD;
diff -urpNa -X dontdiff linux-2.6.23-rc2/kernel/sched.c linux-2.6.23-rc2-rcutorturesched/kernel/sched.c
--- linux-2.6.23-rc2/kernel/sched.c 2007-08-03 19:49:55.000000000 -0700
+++ linux-2.6.23-rc2-rcutorturesched/kernel/sched.c 2007-08-10 17:22:57.000000000 -0700
@@ -394,6 +394,8 @@ unsigned long long cpu_clock(int cpu)
return now;
}

+EXPORT_SYMBOL_GPL(cpu_clock);
+
#ifdef CONFIG_FAIR_GROUP_SCHED
/* Change a task's ->cfs_rq if it moves across CPUs */
static inline void set_task_cfs_rq(struct task_struct *p)


2007-08-17 18:54:31

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Make rcutorture RNG use temporal entropy

On Wed, 15 Aug 2007 19:49:04 -0700
"Paul E. McKenney" <[email protected]> wrote:

> Repost of http://lkml.org/lkml/2007/8/10/472 made available by request.
>
> The locking used by get_random_bytes() can conflict with the
> preempt_disable() and synchronize_sched() form of RCU. This patch changes
> rcutorture's RNG to gather entropy from the new cpu_clock() interface
> (relying on interrupts, preemption, daemons, and rcutorture's reader
> thread's rock-bottom scheduling priority to provide useful entropy),
> and also adds and EXPORT_SYMBOL_GPL() to make that interface available
> to GPLed kernel modules such as rcutorture.
>
> Passes several hours of rcutorture.

Please explain what "conflict with" means so that I can work out if
this is a needed-in-2.6.23 change, thanks.

2007-08-17 20:00:36

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] Make rcutorture RNG use temporal entropy

On Fri, Aug 17, 2007 at 11:53:56AM -0700, Andrew Morton wrote:
> On Wed, 15 Aug 2007 19:49:04 -0700
> "Paul E. McKenney" <[email protected]> wrote:
>
> > Repost of http://lkml.org/lkml/2007/8/10/472 made available by request.
> >
> > The locking used by get_random_bytes() can conflict with the
> > preempt_disable() and synchronize_sched() form of RCU. This patch changes
> > rcutorture's RNG to gather entropy from the new cpu_clock() interface
> > (relying on interrupts, preemption, daemons, and rcutorture's reader
> > thread's rock-bottom scheduling priority to provide useful entropy),
> > and also adds and EXPORT_SYMBOL_GPL() to make that interface available
> > to GPLed kernel modules such as rcutorture.
> >
> > Passes several hours of rcutorture.
>
> Please explain what "conflict with" means so that I can work out if
> this is a needed-in-2.6.23 change, thanks.

Not needed in 2.6.23. This change falls into the "preparation for -rt"
category. Also in the "don't unnecessarily eat entropy, leave some for
the people needing crypographically secure randomness" category.

Thanx, Paul

2007-08-23 18:08:37

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH] Make rcutorture RNG use temporal entropy

On Fri, Aug 17, 2007 at 01:00:22PM -0700, Paul E. McKenney wrote:
> On Fri, Aug 17, 2007 at 11:53:56AM -0700, Andrew Morton wrote:
> > On Wed, 15 Aug 2007 19:49:04 -0700
> > "Paul E. McKenney" <[email protected]> wrote:
> >
> > > Repost of http://lkml.org/lkml/2007/8/10/472 made available by request.
> > >
> > > The locking used by get_random_bytes() can conflict with the
> > > preempt_disable() and synchronize_sched() form of RCU. This patch changes
> > > rcutorture's RNG to gather entropy from the new cpu_clock() interface
> > > (relying on interrupts, preemption, daemons, and rcutorture's reader
> > > thread's rock-bottom scheduling priority to provide useful entropy),
> > > and also adds and EXPORT_SYMBOL_GPL() to make that interface available
> > > to GPLed kernel modules such as rcutorture.
> > >
> > > Passes several hours of rcutorture.
> >
> > Please explain what "conflict with" means so that I can work out if
> > this is a needed-in-2.6.23 change, thanks.
>
> Not needed in 2.6.23. This change falls into the "preparation for -rt"
> category. Also in the "don't unnecessarily eat entropy, leave some for
> the people needing crypographically secure randomness" category.

We've had several calls for a more fast and loose version of
get_random_bytes. Generalizing one of the cookie generation functions
is probably a good way to go.

--
Mathematics is the supreme nostalgia of our time.

2007-08-23 18:58:44

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] Make rcutorture RNG use temporal entropy

On Thu, Aug 23, 2007 at 01:06:58PM -0500, Matt Mackall wrote:
> On Fri, Aug 17, 2007 at 01:00:22PM -0700, Paul E. McKenney wrote:
> > On Fri, Aug 17, 2007 at 11:53:56AM -0700, Andrew Morton wrote:
> > > On Wed, 15 Aug 2007 19:49:04 -0700
> > > "Paul E. McKenney" <[email protected]> wrote:
> > >
> > > > Repost of http://lkml.org/lkml/2007/8/10/472 made available by request.
> > > >
> > > > The locking used by get_random_bytes() can conflict with the
> > > > preempt_disable() and synchronize_sched() form of RCU. This patch changes
> > > > rcutorture's RNG to gather entropy from the new cpu_clock() interface
> > > > (relying on interrupts, preemption, daemons, and rcutorture's reader
> > > > thread's rock-bottom scheduling priority to provide useful entropy),
> > > > and also adds and EXPORT_SYMBOL_GPL() to make that interface available
> > > > to GPLed kernel modules such as rcutorture.
> > > >
> > > > Passes several hours of rcutorture.
> > >
> > > Please explain what "conflict with" means so that I can work out if
> > > this is a needed-in-2.6.23 change, thanks.
> >
> > Not needed in 2.6.23. This change falls into the "preparation for -rt"
> > category. Also in the "don't unnecessarily eat entropy, leave some for
> > the people needing crypographically secure randomness" category.
>
> We've had several calls for a more fast and loose version of
> get_random_bytes. Generalizing one of the cookie generation functions
> is probably a good way to go.

Are you thinking in terms of secure_tcp_syn_cookie(), or did you have
something else in mind?

Thanx, Paul

2007-08-23 19:40:19

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH] Make rcutorture RNG use temporal entropy

On Thu, Aug 23, 2007 at 11:58:31AM -0700, Paul E. McKenney wrote:
> On Thu, Aug 23, 2007 at 01:06:58PM -0500, Matt Mackall wrote:
> > On Fri, Aug 17, 2007 at 01:00:22PM -0700, Paul E. McKenney wrote:
> > > On Fri, Aug 17, 2007 at 11:53:56AM -0700, Andrew Morton wrote:
> > > > On Wed, 15 Aug 2007 19:49:04 -0700
> > > > "Paul E. McKenney" <[email protected]> wrote:
> > > >
> > > > > Repost of http://lkml.org/lkml/2007/8/10/472 made available by request.
> > > > >
> > > > > The locking used by get_random_bytes() can conflict with the
> > > > > preempt_disable() and synchronize_sched() form of RCU. This patch changes
> > > > > rcutorture's RNG to gather entropy from the new cpu_clock() interface
> > > > > (relying on interrupts, preemption, daemons, and rcutorture's reader
> > > > > thread's rock-bottom scheduling priority to provide useful entropy),
> > > > > and also adds and EXPORT_SYMBOL_GPL() to make that interface available
> > > > > to GPLed kernel modules such as rcutorture.
> > > > >
> > > > > Passes several hours of rcutorture.
> > > >
> > > > Please explain what "conflict with" means so that I can work out if
> > > > this is a needed-in-2.6.23 change, thanks.
> > >
> > > Not needed in 2.6.23. This change falls into the "preparation for -rt"
> > > category. Also in the "don't unnecessarily eat entropy, leave some for
> > > the people needing crypographically secure randomness" category.
> >
> > We've had several calls for a more fast and loose version of
> > get_random_bytes. Generalizing one of the cookie generation functions
> > is probably a good way to go.
>
> Are you thinking in terms of secure_tcp_syn_cookie(), or did you have
> something else in mind?

Yes. Using a hash function rather than a trivial LFSR is preferable.
But pulling the guts out and giving it an n-bytes interface like
get_random_bytes().

--
Mathematics is the supreme nostalgia of our time.

2007-08-28 01:16:11

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] Make rcutorture RNG use temporal entropy

On Thu, Aug 23, 2007 at 02:40:37PM -0500, Matt Mackall wrote:
> On Thu, Aug 23, 2007 at 11:58:31AM -0700, Paul E. McKenney wrote:
> > On Thu, Aug 23, 2007 at 01:06:58PM -0500, Matt Mackall wrote:
> > > On Fri, Aug 17, 2007 at 01:00:22PM -0700, Paul E. McKenney wrote:
> > > > On Fri, Aug 17, 2007 at 11:53:56AM -0700, Andrew Morton wrote:
> > > > > On Wed, 15 Aug 2007 19:49:04 -0700
> > > > > "Paul E. McKenney" <[email protected]> wrote:
> > > > >
> > > > > > Repost of http://lkml.org/lkml/2007/8/10/472 made available by request.
> > > > > >
> > > > > > The locking used by get_random_bytes() can conflict with the
> > > > > > preempt_disable() and synchronize_sched() form of RCU. This patch changes
> > > > > > rcutorture's RNG to gather entropy from the new cpu_clock() interface
> > > > > > (relying on interrupts, preemption, daemons, and rcutorture's reader
> > > > > > thread's rock-bottom scheduling priority to provide useful entropy),
> > > > > > and also adds and EXPORT_SYMBOL_GPL() to make that interface available
> > > > > > to GPLed kernel modules such as rcutorture.
> > > > > >
> > > > > > Passes several hours of rcutorture.
> > > > >
> > > > > Please explain what "conflict with" means so that I can work out if
> > > > > this is a needed-in-2.6.23 change, thanks.
> > > >
> > > > Not needed in 2.6.23. This change falls into the "preparation for -rt"
> > > > category. Also in the "don't unnecessarily eat entropy, leave some for
> > > > the people needing crypographically secure randomness" category.
> > >
> > > We've had several calls for a more fast and loose version of
> > > get_random_bytes. Generalizing one of the cookie generation functions
> > > is probably a good way to go.
> >
> > Are you thinking in terms of secure_tcp_syn_cookie(), or did you have
> > something else in mind?
>
> Yes. Using a hash function rather than a trivial LFSR is preferable.
> But pulling the guts out and giving it an n-bytes interface like
> get_random_bytes().

OK. But this cannot be the first discussion of getting a fast and loose
version of get_random_bytes() into the kernel. Anyplace I should look
for cautionary tales? A quick search located a spirited discussion of
proposed kernel infrastructure for user-mode random number generation
back in 2003, but...

Also a 2006 proposal from Stephan Eranian: http://lkml.org/lkml/2006/8/23/41
This appears to have gotten zero replies. :-/ (Though not hash-based.)

Other semi-related threads:

http://lkml.org/lkml/2005/3/15/102
http://lkml.org/lkml/2004/9/23/337

Some years back, my reflexive design would have been per-CPU state,
accessed with interrupts disabled. Not so good for realtime usage,
though. One could go with per-task state in order to avoid the
interrupt disabling, which might be OK if the state is quite small.

Thanx, Paul

2007-09-03 13:27:22

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH] Make rcutorture RNG use temporal entropy

On Mon, Aug 27, 2007 at 06:15:54PM -0700, Paul E. McKenney wrote:
> On Thu, Aug 23, 2007 at 02:40:37PM -0500, Matt Mackall wrote:
> > On Thu, Aug 23, 2007 at 11:58:31AM -0700, Paul E. McKenney wrote:
> > > On Thu, Aug 23, 2007 at 01:06:58PM -0500, Matt Mackall wrote:
> > > > On Fri, Aug 17, 2007 at 01:00:22PM -0700, Paul E. McKenney wrote:
> > > > > On Fri, Aug 17, 2007 at 11:53:56AM -0700, Andrew Morton wrote:
> > > > > > On Wed, 15 Aug 2007 19:49:04 -0700
> > > > > > "Paul E. McKenney" <[email protected]> wrote:
> > > > > >
> > > > > > > Repost of http://lkml.org/lkml/2007/8/10/472 made available by request.
> > > > > > >
> > > > > > > The locking used by get_random_bytes() can conflict with the
> > > > > > > preempt_disable() and synchronize_sched() form of RCU. This patch changes
> > > > > > > rcutorture's RNG to gather entropy from the new cpu_clock() interface
> > > > > > > (relying on interrupts, preemption, daemons, and rcutorture's reader
> > > > > > > thread's rock-bottom scheduling priority to provide useful entropy),
> > > > > > > and also adds and EXPORT_SYMBOL_GPL() to make that interface available
> > > > > > > to GPLed kernel modules such as rcutorture.
> > > > > > >
> > > > > > > Passes several hours of rcutorture.
> > > > > >
> > > > > > Please explain what "conflict with" means so that I can work out if
> > > > > > this is a needed-in-2.6.23 change, thanks.
> > > > >
> > > > > Not needed in 2.6.23. This change falls into the "preparation for -rt"
> > > > > category. Also in the "don't unnecessarily eat entropy, leave some for
> > > > > the people needing crypographically secure randomness" category.
> > > >
> > > > We've had several calls for a more fast and loose version of
> > > > get_random_bytes. Generalizing one of the cookie generation functions
> > > > is probably a good way to go.
> > >
> > > Are you thinking in terms of secure_tcp_syn_cookie(), or did you have
> > > something else in mind?
> >
> > Yes. Using a hash function rather than a trivial LFSR is preferable.
> > But pulling the guts out and giving it an n-bytes interface like
> > get_random_bytes().
>
> OK. But this cannot be the first discussion of getting a fast and loose
> version of get_random_bytes() into the kernel. Anyplace I should look
> for cautionary tales? A quick search located a spirited discussion of
> proposed kernel infrastructure for user-mode random number generation
> back in 2003, but...
>
> Also a 2006 proposal from Stephan Eranian: http://lkml.org/lkml/2006/8/23/41
> This appears to have gotten zero replies. :-/ (Though not hash-based.)

You probably did the same searches I would do, so no, don't have any
pointers, just vague recollections.

> Other semi-related threads:
>
> http://lkml.org/lkml/2005/3/15/102
> http://lkml.org/lkml/2004/9/23/337
>
> Some years back, my reflexive design would have been per-CPU state,
> accessed with interrupts disabled. Not so good for realtime usage,
> though. One could go with per-task state in order to avoid the
> interrupt disabling, which might be OK if the state is quite small.

We only need be concerned here with locking insofar as we'd produce
duplicate output without it. So it's fairly easy to imagine a lockless
design using percpu data and perhaps folding in the preemption state.

--
Mathematics is the supreme nostalgia of our time.

2007-09-03 20:09:20

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] Make rcutorture RNG use temporal entropy

On Mon, Sep 03, 2007 at 08:29:04AM -0500, Matt Mackall wrote:
> On Mon, Aug 27, 2007 at 06:15:54PM -0700, Paul E. McKenney wrote:
> > On Thu, Aug 23, 2007 at 02:40:37PM -0500, Matt Mackall wrote:
> > > Yes. Using a hash function rather than a trivial LFSR is preferable.
> > > But pulling the guts out and giving it an n-bytes interface like
> > > get_random_bytes().
> >
> > OK. But this cannot be the first discussion of getting a fast and loose
> > version of get_random_bytes() into the kernel. Anyplace I should look
> > for cautionary tales? A quick search located a spirited discussion of
> > proposed kernel infrastructure for user-mode random number generation
> > back in 2003, but...
> >
> > Also a 2006 proposal from Stephan Eranian: http://lkml.org/lkml/2006/8/23/41
> > This appears to have gotten zero replies. :-/ (Though not hash-based.)
>
> You probably did the same searches I would do, so no, don't have any
> pointers, just vague recollections.

I was afraid of that. ;-)

> > Other semi-related threads:
> >
> > http://lkml.org/lkml/2005/3/15/102
> > http://lkml.org/lkml/2004/9/23/337
> >
> > Some years back, my reflexive design would have been per-CPU state,
> > accessed with interrupts disabled. Not so good for realtime usage,
> > though. One could go with per-task state in order to avoid the
> > interrupt disabling, which might be OK if the state is quite small.
>
> We only need be concerned here with locking insofar as we'd produce
> duplicate output without it. So it's fairly easy to imagine a lockless
> design using percpu data and perhaps folding in the preemption state.

And if we have a hash on the output, conflicting updates to the state
should be tolerable as well. Still want per-CPU state in order to
avoid cache thrashing, of course, but should be able to avoid preemption
disabling. So the trick will be getting a performance/size/entropy
tradeoff that 75% of the current roll-your-own random-number-generator
uses can live with.

Thanx, Paul

2007-09-04 05:33:58

by Satyam Sharma

[permalink] [raw]
Subject: Re: [PATCH] Make rcutorture RNG use temporal entropy

Hi Paul,


On Wed, 15 Aug 2007, Paul E. McKenney wrote:
>
> The locking used by get_random_bytes() can conflict with the
> preempt_disable() and synchronize_sched() form of RCU. This patch changes
> rcutorture's RNG to gather entropy from the new cpu_clock() interface
> (relying on interrupts, preemption, daemons, and rcutorture's reader
> thread's rock-bottom scheduling priority to provide useful entropy),
> and also adds and EXPORT_SYMBOL_GPL() to make that interface available
> to GPLed kernel modules such as rcutorture.

Honestly, rcutorture goes to some amazing lengths just to have this
randomizing-the-delays-that-read/write-test-threads-spend-inside-or-
outside-the-critical-sections thing :-) Especially, seeing that
synchro-test, the other "comparable" module, just doesn't bother with
all this at all. (especially check out its load == interval == do_sched
== 0 case! :-)

So IMHO, considering that rcutorture isn't a "serious" user of randomness
in the first place (of even a "fast-and-loose version" for that matter),
you could consider a solution where you gather all the randomness you need
at module_init time itself and save it somewhere, and then use it wherever
you're calling into rcu_random()->cpu_clock() [ or get_random_bytes() ]
in the current code. You could even make some trivial updates to those
random numbers after every RCU_RANDOM_REFRESH uses, like present.

Agreed, anybody running rcutorture isn't really looking for performance,
but why call get_random_bytes() or cpu_clock() (and the smp_processor_id()
+ irq_save/restore + export_symbol() that goes with it) when it isn't
_really_ "required" as such ...

Just a suggestion,

Satyam

2007-09-04 16:14:30

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] Make rcutorture RNG use temporal entropy

On Tue, Sep 04, 2007 at 11:16:50AM +0530, Satyam Sharma wrote:
> Hi Paul,
>
>
> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> >
> > The locking used by get_random_bytes() can conflict with the
> > preempt_disable() and synchronize_sched() form of RCU. This patch changes
> > rcutorture's RNG to gather entropy from the new cpu_clock() interface
> > (relying on interrupts, preemption, daemons, and rcutorture's reader
> > thread's rock-bottom scheduling priority to provide useful entropy),
> > and also adds and EXPORT_SYMBOL_GPL() to make that interface available
> > to GPLed kernel modules such as rcutorture.
>
> Honestly, rcutorture goes to some amazing lengths just to have this
> randomizing-the-delays-that-read/write-test-threads-spend-inside-or-
> outside-the-critical-sections thing :-) Especially, seeing that
> synchro-test, the other "comparable" module, just doesn't bother with
> all this at all. (especially check out its load == interval == do_sched
> == 0 case! :-)

Yep. The need for that level of randomization in rcutorture has been made
painfully clear to me over a period of more than a decade. Of course,
the overhead of the re-seeding does get diluted by a factor of 10,000 or
100,000, depending on what version you are using. So, from a throughput
standpoint, the overhead is essentially that of a linear congruential
random-number generator. This is critically important given the low
overhead of rcu_read_lock() and rcu_read_unlock().

Still, this is indeed not what you want on a fastpath of a realtime
system, where average performance means nothing -- only the worst case
counts. And this is why I am -not- putting the rcutorture RNG forward
for general-purpose use. So we are at least in agreement on that piece!

And, as you hint below, anyone running rcutorture while also running
a production realtime workload needs to seriously rethink their design. ;-)
(If you are instead running it to provide a test load for your realtime
testing, fine and good.)

> So IMHO, considering that rcutorture isn't a "serious" user of randomness
> in the first place (of even a "fast-and-loose version" for that matter),
> you could consider a solution where you gather all the randomness you need
> at module_init time itself and save it somewhere, and then use it wherever
> you're calling into rcu_random()->cpu_clock() [ or get_random_bytes() ]
> in the current code. You could even make some trivial updates to those
> random numbers after every RCU_RANDOM_REFRESH uses, like present.

Well, assuming that the Linux kernel really needs a central implementation
of a "pretty fast" and "pretty good" RNG, one could imagine all sorts of
designs:

1. Use an LCRNG feeding into an array, as the old Berkeley random()
does (or see Knuth for an earlier citation), but make it per-CPU.
When pulling out randomness, do an MDn hash on the array
along with a per-task counter and the per-CPU preempt counter.
Increment the per-task counter on each use. Do an LCRNG step
on each use. Since this is a fixed array, the collisions in
CONFIG_PREEMPT due to preemption can be permitted to happen
without penalty.

This approach avoids all locking, all interrupt disabling, and
all preemption disabling. But the MD hashes aren't the fastest
things in the kernel, from what I understand.

Question: will this be fast enough? If so, which of the MD
hashes should be used?

2. As in #1 above, but use some simpler hash, such as addition or
XOR. Maybe CRC. (Benchmark for speed.)

3. Just use a simple LCRNG with per-task state. Perturb from some
statistical counter (the per-CPU RCU grace-period counter might
be appropriate). Or don't even bother doing that.

This would be -much- faster than any of the above, and would
be deterministic, hence good for realtime use. LCRNG might not
satisfy more-demanding users, especially the paranoid ones.

(This is what you are proposing above, correct?)

4. Just use LCRNG into a array like Berkeley random(), but replicate
on a per-CPU basis. Maybe or maybe not perturb occasionally
from some statistical counter as in #3 above.

This would be reasonably fast, and should satisfy most users.
People needing cryptographically secure RNGs should of course
stick with get_random_bytes().

[If I had some blazing reason to implement this -right- -now-,
this would be the approach I would take.]

5. Stick with the current situation where people needing fast
and dirty RNGs roll their own.

> Agreed, anybody running rcutorture isn't really looking for performance,
> but why call get_random_bytes() or cpu_clock() (and the smp_processor_id()
> + irq_save/restore + export_symbol() that goes with it) when it isn't
> _really_ "required" as such ...

Well, that would in fact be why the high-overhead path is taken only
very rarely.

And again, I am -not- putting the rcutorture RNG forward for general use,
as it is a no-go for realtime fastpath use.

Votes for #5 above? Given the total lack of any sort of response to
Stephan Eranian's proposal last year, might be optimal. ;-)

Thanx, Paul

2007-09-04 17:47:39

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] Make rcutorture RNG use temporal entropy

On Tue, Sep 04, 2007 at 09:14:19AM -0700, Paul E. McKenney wrote:
> On Tue, Sep 04, 2007 at 11:16:50AM +0530, Satyam Sharma wrote:
> > Hi Paul,
> >
> > On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> > >
> > > The locking used by get_random_bytes() can conflict with the
> > > preempt_disable() and synchronize_sched() form of RCU. This patch changes
> > > rcutorture's RNG to gather entropy from the new cpu_clock() interface
> > > (relying on interrupts, preemption, daemons, and rcutorture's reader
> > > thread's rock-bottom scheduling priority to provide useful entropy),
> > > and also adds and EXPORT_SYMBOL_GPL() to make that interface available
> > > to GPLed kernel modules such as rcutorture.
> >
> > Honestly, rcutorture goes to some amazing lengths just to have this
> > randomizing-the-delays-that-read/write-test-threads-spend-inside-or-
> > outside-the-critical-sections thing :-) Especially, seeing that
> > synchro-test, the other "comparable" module, just doesn't bother with
> > all this at all. (especially check out its load == interval == do_sched
> > == 0 case! :-)
>
> Yep. The need for that level of randomization in rcutorture has been made
> painfully clear to me over a period of more than a decade. Of course,
> the overhead of the re-seeding does get diluted by a factor of 10,000 or
> 100,000, depending on what version you are using. So, from a throughput
> standpoint, the overhead is essentially that of a linear congruential
> random-number generator. This is critically important given the low
> overhead of rcu_read_lock() and rcu_read_unlock().
>
> Still, this is indeed not what you want on a fastpath of a realtime
> system, where average performance means nothing -- only the worst case
> counts. And this is why I am -not- putting the rcutorture RNG forward
> for general-purpose use. So we are at least in agreement on that piece!
>
> And, as you hint below, anyone running rcutorture while also running
> a production realtime workload needs to seriously rethink their design. ;-)
> (If you are instead running it to provide a test load for your realtime
> testing, fine and good.)
>
> > So IMHO, considering that rcutorture isn't a "serious" user of randomness
> > in the first place (of even a "fast-and-loose version" for that matter),
> > you could consider a solution where you gather all the randomness you need
> > at module_init time itself and save it somewhere, and then use it wherever
> > you're calling into rcu_random()->cpu_clock() [ or get_random_bytes() ]
> > in the current code. You could even make some trivial updates to those
> > random numbers after every RCU_RANDOM_REFRESH uses, like present.
>
> Well, assuming that the Linux kernel really needs a central implementation
> of a "pretty fast" and "pretty good" RNG, one could imagine all sorts of
> designs:
>
> 1. Use an LCRNG feeding into an array, as the old Berkeley random()
> does (or see Knuth for an earlier citation), but make it per-CPU.
> When pulling out randomness, do an MDn hash on the array
> along with a per-task counter and the per-CPU preempt counter.
> Increment the per-task counter on each use. Do an LCRNG step
> on each use. Since this is a fixed array, the collisions in
> CONFIG_PREEMPT due to preemption can be permitted to happen
> without penalty.
>
> This approach avoids all locking, all interrupt disabling, and
> all preemption disabling. But the MD hashes aren't the fastest
> things in the kernel, from what I understand.
>
> Question: will this be fast enough? If so, which of the MD
> hashes should be used?
>
> 2. As in #1 above, but use some simpler hash, such as addition or
> XOR. Maybe CRC. (Benchmark for speed.)
>
> 3. Just use a simple LCRNG with per-task state. Perturb from some
> statistical counter (the per-CPU RCU grace-period counter might
> be appropriate). Or don't even bother doing that.
>
> This would be -much- faster than any of the above, and would
> be deterministic, hence good for realtime use. LCRNG might not
> satisfy more-demanding users, especially the paranoid ones.
>
> (This is what you are proposing above, correct?)
>
> 4. Just use LCRNG into a array like Berkeley random(), but replicate
> on a per-CPU basis. Maybe or maybe not perturb occasionally
> from some statistical counter as in #3 above.
>
> This would be reasonably fast, and should satisfy most users.
> People needing cryptographically secure RNGs should of course
> stick with get_random_bytes().
>
> [If I had some blazing reason to implement this -right- -now-,
> this would be the approach I would take.]
>
> 5. Stick with the current situation where people needing fast
> and dirty RNGs roll their own.

Or, better yet, as suggested by Rusty:

6. Use random32() from lib/random32.c and be happy. This does
disable preemption across the calculation, which should not
be a problem in most situations. Although it does not protect
against interrupts, the effect would simply be to scramble
the state a bit more (or perhaps unscramble it a bit).

The overhead should not be too bad.

There. My work is done. ;-)

Thanx, Paul

> > Agreed, anybody running rcutorture isn't really looking for performance,
> > but why call get_random_bytes() or cpu_clock() (and the smp_processor_id()
> > + irq_save/restore + export_symbol() that goes with it) when it isn't
> > _really_ "required" as such ...
>
> Well, that would in fact be why the high-overhead path is taken only
> very rarely.
>
> And again, I am -not- putting the rcutorture RNG forward for general use,
> as it is a no-go for realtime fastpath use.
>
> Votes for #5 above? Given the total lack of any sort of response to
> Stephan Eranian's proposal last year, might be optimal. ;-)
>
> Thanx, Paul