LinuxLists.cc - [PATCH 0/7] random32: Various minor cleanups

2014-06-07 08:18:31

Subject: [PATCH 0/7] random32: Various minor cleanups

I was looking for self-test code to emulate for lib/glob.c and, while
canvassing other code with self-tests, found some code that made me itch,
so I scratched it.

7 patches;

1/7: Mark the self-test data as __initconst
2/7: Remove excess calls to prandom_u32_state in initialization
3/7: Replace an #ifdef with a stub prandom_state_selftest()
4/7: Use <asm/unaligned.h> instead of hand-rolling it
5/7: Make prandom_u32_max efficient for powers of 2
6/7: Randomize timeout to the millisecond, not the second
7/7: Remove redundant U suffixes on integers

I have a big follow-on patch series to replace all the (many) instances
of "prandom_u32() % x" in the kernel with prandom_u32_max(x), which is
more efficient.

As of patch 5/7, "prandom_u32() & (x-1)" can also be replaced by
"prandom_u32_max(x)", but that's optional.

Patch 6/7 is dubious, and I'd like comments. I just don't see
a reason why integer granularity would be desirable.

2014-06-07 08:19:37

by George Spelvin

[permalink] [raw]

Subject: [PATCH 1/7] lib/random32.c: Mark self-test data as __initconst

So it can be thrown away along with the code that uses it.

Signed-off-by: George Spelvin <[email protected]>
---
lib/random32.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/random32.c b/lib/random32.c
index fa5da61c..4da5d281 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -296,7 +296,7 @@ late_initcall(prandom_reseed);
static struct prandom_test1 {
u32 seed;
u32 result;
-} test1[] = {
+} const test1[] __initconst = {
{ 1U, 3484351685U },
{ 2U, 2623130059U },
{ 3U, 3125133893U },
@@ -307,7 +307,7 @@ static struct prandom_test2 {
u32 seed;
u32 iteration;
u32 result;
-} test2[] = {
+} const test2[] __initconst = {
/* Test cases against taus113 from GSL library. */
{ 931557656U, 959U, 2975593782U },
{ 1339693295U, 876U, 3887776532U },
--
2.0.0

2014-06-07 08:20:59

by George Spelvin

[permalink] [raw]

Subject: [PATCH 2/7] lib/random32.c: Remove excess calls to prandom_u32_state in initialization

Unrolling code in single-use code paths is just silly. There are two
instances:

1) prandom_warmup() calls 10 times.
2) prandom_state_selftest() can avoid one call and simplify the
loop logic by repeating an assignment to a local variable
(that probably adds zero code anyway)

Signed-off-by: George Spelvin <[email protected]>
---
lib/random32.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/lib/random32.c b/lib/random32.c
index 4da5d281..2e7c15c0 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -134,17 +134,11 @@ EXPORT_SYMBOL(prandom_bytes);

static void prandom_warmup(struct rnd_state *state)
{
+ int i;
+
/* Calling RNG ten times to satify recurrence condition */
- prandom_u32_state(state);
- prandom_u32_state(state);
- prandom_u32_state(state);
- prandom_u32_state(state);
- prandom_u32_state(state);
- prandom_u32_state(state);
- prandom_u32_state(state);
- prandom_u32_state(state);
- prandom_u32_state(state);
- prandom_u32_state(state);
+ for (i = 0; i < 10; i++)
+ prandom_u32_state(state);
}

static void prandom_seed_very_weak(struct rnd_state *state, u32 seed)
@@ -433,14 +427,15 @@ static void __init prandom_state_selftest(void)

for (i = 0; i < ARRAY_SIZE(test2); i++) {
struct rnd_state state;
+ u32 result;

prandom_seed_very_weak(&state, test2[i].seed);
prandom_warmup(&state);

- for (j = 0; j < test2[i].iteration - 1; j++)
- prandom_u32_state(&state);
+ for (j = 0; j < test2[i].iteration; j++)
+ result = prandom_u32_state(&state);

- if (test2[i].result != prandom_u32_state(&state))
+ if (test2[i].result != result)
errors++;

runs++;
--
2.0.0

2014-06-07 08:22:37

by George Spelvin

[permalink] [raw]

Subject: [PATCH 3/7] lib/random32.c: Replace an #ifdef with a stub prandom_state_selftest()

The preferred Linux style for optional features is to define
no-op stub functions rather than wrap each call site in #ifdef.

Signed-off-by: George Spelvin <[email protected]>
---
lib/random32.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/random32.c b/lib/random32.c
index 2e7c15c0..e8f3557b 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -40,6 +40,8 @@

#ifdef CONFIG_RANDOM32_SELFTEST
static void __init prandom_state_selftest(void);
+#else
+#define prandom_state_selftest() (void)0
#endif

static DEFINE_PER_CPU(struct rnd_state, net_rand_state);
@@ -188,9 +190,7 @@ static int __init prandom_init(void)
{
int i;

-#ifdef CONFIG_RANDOM32_SELFTEST
prandom_state_selftest();
-#endif

for_each_possible_cpu(i) {
struct rnd_state *state = &per_cpu(net_rand_state,i);
--
2.0.0

2014-06-07 08:25:21

by George Spelvin

[permalink] [raw]

Subject: [PATCH 4/7] lib/random32.c: Use <asm/unaligned.h> instead of hand-rolling it

The functions exist for a reason; the manual byte-at-a-time code
is unnecessarily slow (and bloated).

Signed-off-by: George Spelvin <[email protected]>
---
lib/random32.c | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/lib/random32.c b/lib/random32.c
index e8f3557b..eee60100 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -37,6 +37,7 @@
#include <linux/jiffies.h>
#include <linux/random.h>
#include <linux/sched.h>
+#include <asm/unaligned.h>

#ifdef CONFIG_RANDOM32_SELFTEST
static void __init prandom_state_selftest(void);
@@ -97,25 +98,23 @@ EXPORT_SYMBOL(prandom_u32);
*/
void prandom_bytes_state(struct rnd_state *state, void *buf, int bytes)
{
- unsigned char *p = buf;
+ u8 *p = buf;
int i;

- for (i = 0; i < round_down(bytes, sizeof(u32)); i += sizeof(u32)) {
- u32 random = prandom_u32_state(state);
- int j;
+ for (i = 0; i < round_down(bytes, sizeof(u32)); i += sizeof(u32))
+ put_unaligned_le32(prandom_u32_state(state), p+i);

- for (j = 0; j < sizeof(u32); j++) {
- p[i + j] = random;
- random >>= BITS_PER_BYTE;
- }
- }
if (i < bytes) {
u32 random = prandom_u32_state(state);

- for (; i < bytes; i++) {
- p[i] = random;
- random >>= BITS_PER_BYTE;
+ if (bytes & 2) {
+ put_unaligned_le16((u16)random, p+i);
+ if ((bytes & 1) == 0)
+ return;
+ i += 2;
+ random >>= 16;
}
+ p[i] = (u8)random;
}
}
EXPORT_SYMBOL(prandom_bytes_state);
--
2.0.0

2014-06-07 08:28:08

by George Spelvin

[permalink] [raw]

Subject: [PATCH 5/7] lib/random32.c: Make prandom_u32_max efficient for powers of 2

The multiply-and-shift is efficient in the general case, but slower than
a simple bitwise AND if the range is a power of 2. Make the function
handle this case so callers don't have to worry about micro-optimizing it.

Signed-off-by: George Spelvin <[email protected]>
---
include/linux/random.h | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/random.h b/include/linux/random.h
index 57fbbffd..e1f3ec9a 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -47,11 +47,23 @@ void prandom_bytes_state(struct rnd_state *state, void *buf, int nbytes);
* generator, that is, prandom_u32(). This is useful when requesting a
* random index of an array containing ep_ro elements, for example.
*
+ * If ep_ro is a power of 2 known at compile time, a modulo operation
+ * reduces to a simple mask to extract the low order bits. Otherwise,
+ * it uses a multiply and shift, which is faster than a general modulus.
+ *
* Returns: pseudo-random number in interval [0, ep_ro)
*/
static inline u32 prandom_u32_max(u32 ep_ro)
{
- return (u32)(((u64) prandom_u32() * ep_ro) >> 32);
+ /*
+ * Instead of just __builtin_constant_p(ep_ro), this test is
+ * "is it known at compile time that ep_ro is a power of 2?", and
+ * can in theory handle the case that it's an unknown power of 2.
+ */
+ if (__builtin_constant_p(ep_ro & (ep_ro-1)) && !(ep_ro & (ep_ro-1)))
+ return prandom_u32() & (ep_ro-1);
+ else
+ return (u32)((u64)prandom_u32() * ep_ro >> 32);
}

/*
--
2.0.0

2014-06-07 08:29:01

by George Spelvin

[permalink] [raw]

Subject: [PATCH 6/7] lib/random32.c: Randomize timeout to the millisecond, not the second

If you're going to bother randomizing it, do it right.
And use prandom_u32_max(), which is designed for the job, rather
than "% 40".

Signed-off-by: George Spelvin <[email protected]>
---
lib/random32.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/random32.c b/lib/random32.c
index eee60100..9cc410dd 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -208,14 +208,14 @@ static DEFINE_TIMER(seed_timer, __prandom_timer, 0, 0);
static void __prandom_timer(unsigned long dontcare)
{
u32 entropy;
- unsigned long expires;
+ unsigned int expires;

get_random_bytes(&entropy, sizeof(entropy));
prandom_seed(entropy);

/* reseed every ~60 seconds, in [40 .. 80) interval with slack */
- expires = 40 + (prandom_u32() % 40);
- seed_timer.expires = jiffies + msecs_to_jiffies(expires * MSEC_PER_SEC);
+ expires = 40000 + prandom_u32_max(40000);
+ seed_timer.expires = jiffies + msecs_to_jiffies(expires);

add_timer(&seed_timer);
}
--
2.0.0

2014-06-07 08:31:23

by George Spelvin

[permalink] [raw]

Subject: [PATCH 7/7] lib/random32.c: Remove redundant U suffixes on integers

Get rid of a few of the extraneous U suffixes on ordinary integers.

Signed-off-by: George Spelvin <[email protected]>
---
lib/random32.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/lib/random32.c b/lib/random32.c
index 9cc410dd..ad0c2ed1 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -58,10 +58,10 @@ u32 prandom_u32_state(struct rnd_state *state)
{
#define TAUSWORTHE(s,a,b,c,d) ((s&c)<<d) ^ (((s <<a) ^ s)>>b)

- state->s1 = TAUSWORTHE(state->s1, 6U, 13U, 4294967294U, 18U);
- state->s2 = TAUSWORTHE(state->s2, 2U, 27U, 4294967288U, 2U);
- state->s3 = TAUSWORTHE(state->s3, 13U, 21U, 4294967280U, 7U);
- state->s4 = TAUSWORTHE(state->s4, 3U, 12U, 4294967168U, 13U);
+ state->s1 = TAUSWORTHE(state->s1, 6, 13, 4294967294U, 18);
+ state->s2 = TAUSWORTHE(state->s2, 2, 27, 4294967288U, 2);
+ state->s3 = TAUSWORTHE(state->s3, 13, 21, 4294967280U, 7);
+ state->s4 = TAUSWORTHE(state->s4, 3, 12, 4294967168U, 13);

return (state->s1 ^ state->s2 ^ state->s3 ^ state->s4);
}
@@ -77,9 +77,8 @@ EXPORT_SYMBOL(prandom_u32_state);
u32 prandom_u32(void)
{
struct rnd_state *state = &get_cpu_var(net_rand_state);
- u32 res;
+ u32 res = prandom_u32_state(state);

- res = prandom_u32_state(state);
put_cpu_var(state);

return res;
--
2.0.0

2014-06-08 10:36:30

by Daniel Borkmann

[permalink] [raw]

Subject: Re: [PATCH 7/7] lib/random32.c: Remove redundant U suffixes on integers

[ Please also Cc Hannes in your series. ]

On 06/07/2014 10:31 AM, George Spelvin wrote:
> Get rid of a few of the extraneous U suffixes on ordinary integers.
>
> Signed-off-by: George Spelvin <[email protected]>
> ---
> lib/random32.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/lib/random32.c b/lib/random32.c
> index 9cc410dd..ad0c2ed1 100644
> --- a/lib/random32.c
> +++ b/lib/random32.c
> @@ -58,10 +58,10 @@ u32 prandom_u32_state(struct rnd_state *state)
> {
> #define TAUSWORTHE(s,a,b,c,d) ((s&c)<<d) ^ (((s <<a) ^ s)>>b)
>
> - state->s1 = TAUSWORTHE(state->s1, 6U, 13U, 4294967294U, 18U);
> - state->s2 = TAUSWORTHE(state->s2, 2U, 27U, 4294967288U, 2U);
> - state->s3 = TAUSWORTHE(state->s3, 13U, 21U, 4294967280U, 7U);
> - state->s4 = TAUSWORTHE(state->s4, 3U, 12U, 4294967168U, 13U);
> + state->s1 = TAUSWORTHE(state->s1, 6, 13, 4294967294U, 18);
> + state->s2 = TAUSWORTHE(state->s2, 2, 27, 4294967288U, 2);
> + state->s3 = TAUSWORTHE(state->s3, 13, 21, 4294967280U, 7);
> + state->s4 = TAUSWORTHE(state->s4, 3, 12, 4294967168U, 13);

I don't see the point in why we _need_ to change this here.

> return (state->s1 ^ state->s2 ^ state->s3 ^ state->s4);
> }
> @@ -77,9 +77,8 @@ EXPORT_SYMBOL(prandom_u32_state);
> u32 prandom_u32(void)
> {
> struct rnd_state *state = &get_cpu_var(net_rand_state);
> - u32 res;
> + u32 res = prandom_u32_state(state);
>
> - res = prandom_u32_state(state);
> put_cpu_var(state);

Ditto.

> return res;
>

2014-06-08 10:43:12

by Daniel Borkmann

[permalink] [raw]

Subject: Re: [PATCH 6/7] lib/random32.c: Randomize timeout to the millisecond, not the second

On 06/07/2014 10:28 AM, George Spelvin wrote:
> If you're going to bother randomizing it, do it right.
> And use prandom_u32_max(), which is designed for the job, rather
> than "% 40".
>
> Signed-off-by: George Spelvin <[email protected]>

Fine by me this cleanup, although not strictly needed.

> ---
> lib/random32.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/lib/random32.c b/lib/random32.c
> index eee60100..9cc410dd 100644
> --- a/lib/random32.c
> +++ b/lib/random32.c
> @@ -208,14 +208,14 @@ static DEFINE_TIMER(seed_timer, __prandom_timer, 0, 0);
> static void __prandom_timer(unsigned long dontcare)
> {
> u32 entropy;
> - unsigned long expires;
> + unsigned int expires;
>
> get_random_bytes(&entropy, sizeof(entropy));
> prandom_seed(entropy);
>
> /* reseed every ~60 seconds, in [40 .. 80) interval with slack */
> - expires = 40 + (prandom_u32() % 40);
> - seed_timer.expires = jiffies + msecs_to_jiffies(expires * MSEC_PER_SEC);
> + expires = 40000 + prandom_u32_max(40000);
> + seed_timer.expires = jiffies + msecs_to_jiffies(expires);
>
> add_timer(&seed_timer);
> }
>

2014-06-08 11:14:28

by George Spelvin

[permalink] [raw]

Subject: Re: [PATCH 7/7] lib/random32.c: Remove redundant U suffixes on integers

> [ Please also Cc Hannes in your series. ]

Thank you! But there are two frequent kernel contributors with that name;
the other is Hannes Reinecke <[email protected]>.

> I don't see the point in why we _need_ to change this here.

We don't. It was just a slight legibility improvement (at least, *I*
find it more legible) included as long as I was working in the area.
But split out to avoid making review of the substantive parts any
more difficult, and so it can be dropped easily if there are objections.

We tend to avoid cosmetic cleanups if there is not other work to
do in an area, so I assume that if there *is* other work to do in
an area, cosmetic cleanups are okay.

2014-06-08 11:30:04

by George Spelvin

[permalink] [raw]

Subject: Re: [PATCH 6/7] lib/random32.c: Randomize timeout to the millisecond, not the second

> Fine by me this cleanup, although not strictly needed.

Agreed. The timer slack is set to HZ (1 second) anyway.

It just dawned on me that a simpler and more efficient way to do this
(which I'll do in v2 of this) would be:

/* reseed every ~60 seconds, in [40 .. 80) interval with slack */
- expires = 40 + (prandom_u32() % 40);
- seed_timer.expires = jiffies + msecs_to_jiffies(expires * MSEC_PER_SEC);
+ expires = 40*HZ + prandom_u32_max(40*HZ);
+ seed_timer.expires = jiffies + expires;

That avoids calling msecs_to_jiffies entirely.

2014-06-08 12:03:47

by Daniel Borkmann

[permalink] [raw]

Subject: Re: [PATCH 5/7] lib/random32.c: Make prandom_u32_max efficient for powers of 2

On 06/07/2014 10:28 AM, George Spelvin wrote:
> The multiply-and-shift is efficient in the general case, but slower than
> a simple bitwise AND if the range is a power of 2. Make the function
> handle this case so callers don't have to worry about micro-optimizing it.
>
> Signed-off-by: George Spelvin <[email protected]>
> ---
> include/linux/random.h | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/random.h b/include/linux/random.h
> index 57fbbffd..e1f3ec9a 100644
> --- a/include/linux/random.h
> +++ b/include/linux/random.h
> @@ -47,11 +47,23 @@ void prandom_bytes_state(struct rnd_state *state, void *buf, int nbytes);
> * generator, that is, prandom_u32(). This is useful when requesting a
> * random index of an array containing ep_ro elements, for example.
> *
> + * If ep_ro is a power of 2 known at compile time, a modulo operation
> + * reduces to a simple mask to extract the low order bits. Otherwise,
> + * it uses a multiply and shift, which is faster than a general modulus.
> + *
> * Returns: pseudo-random number in interval [0, ep_ro)
> */
> static inline u32 prandom_u32_max(u32 ep_ro)
> {
> - return (u32)(((u64) prandom_u32() * ep_ro) >> 32);
> + /*
> + * Instead of just __builtin_constant_p(ep_ro), this test is
> + * "is it known at compile time that ep_ro is a power of 2?", and
> + * can in theory handle the case that it's an unknown power of 2.
> + */
> + if (__builtin_constant_p(ep_ro & (ep_ro-1)) && !(ep_ro & (ep_ro-1)))
> + return prandom_u32() & (ep_ro-1);
> + else
> + return (u32)((u64)prandom_u32() * ep_ro >> 32);

Okay, I guess it's fine since we expect a random number here anyway, so
it doesn't matter much how we get to the result; otherwise I'd have
complained as both function don't do the same thing. Please leave some
whitespace, i.e. I mean "ep_ro-1" -> "ep_ro - 1".

Btw, there are many other users which hard code prandom_u32_max() resp.
reciprocal_scale() function in the source tree. If you have some cycles,
feel free to migrate them and let them use these functions.

> }
>
> /*
>

2014-06-08 12:05:50

by Daniel Borkmann

[permalink] [raw]

Subject: Re: [PATCH 7/7] lib/random32.c: Remove redundant U suffixes on integers

On 06/08/2014 01:14 PM, George Spelvin wrote:
>> [ Please also Cc Hannes in your series. ]
>
> Thank you! But there are two frequent kernel contributors with that name;
> the other is Hannes Reinecke <[email protected]>.

I mean <[email protected]> which I already did for you. I
thought this would have been obvious based on:

git log lib/random32.c

;)

2014-06-08 12:06:49

by Daniel Borkmann

[permalink] [raw]

Subject: Re: [PATCH 1/7] lib/random32.c: Mark self-test data as __initconst

On 06/07/2014 10:19 AM, George Spelvin wrote:
> So it can be thrown away along with the code that uses it.
>
> Signed-off-by: George Spelvin <[email protected]>

Fine by me.

Acked-by: Daniel Borkmann <[email protected]>

2014-06-08 12:11:16

by Daniel Borkmann

[permalink] [raw]

Subject: Re: [PATCH 2/7] lib/random32.c: Remove excess calls to prandom_u32_state in initialization

On 06/07/2014 10:20 AM, George Spelvin wrote:
> Unrolling code in single-use code paths is just silly. There are two
> instances:
>
> 1) prandom_warmup() calls 10 times.
> 2) prandom_state_selftest() can avoid one call and simplify the
> loop logic by repeating an assignment to a local variable
> (that probably adds zero code anyway)
>
> Signed-off-by: George Spelvin <[email protected]>

Fine by me although we simply resembled initialization code from
GSL here. I think that your subject line is a bit misleading though.

> ---
> lib/random32.c | 21 ++++++++-------------
> 1 file changed, 8 insertions(+), 13 deletions(-)
>
> diff --git a/lib/random32.c b/lib/random32.c
> index 4da5d281..2e7c15c0 100644
> --- a/lib/random32.c
> +++ b/lib/random32.c
> @@ -134,17 +134,11 @@ EXPORT_SYMBOL(prandom_bytes);
>
> static void prandom_warmup(struct rnd_state *state)
> {
> + int i;
> +
> /* Calling RNG ten times to satify recurrence condition */
> - prandom_u32_state(state);
> - prandom_u32_state(state);
> - prandom_u32_state(state);
> - prandom_u32_state(state);
> - prandom_u32_state(state);
> - prandom_u32_state(state);
> - prandom_u32_state(state);
> - prandom_u32_state(state);
> - prandom_u32_state(state);
> - prandom_u32_state(state);
> + for (i = 0; i < 10; i++)
> + prandom_u32_state(state);
> }
>
> static void prandom_seed_very_weak(struct rnd_state *state, u32 seed)
> @@ -433,14 +427,15 @@ static void __init prandom_state_selftest(void)
>
> for (i = 0; i < ARRAY_SIZE(test2); i++) {
> struct rnd_state state;
> + u32 result;
>
> prandom_seed_very_weak(&state, test2[i].seed);
> prandom_warmup(&state);
>
> - for (j = 0; j < test2[i].iteration - 1; j++)
> - prandom_u32_state(&state);
> + for (j = 0; j < test2[i].iteration; j++)
> + result = prandom_u32_state(&state);
>
> - if (test2[i].result != prandom_u32_state(&state))
> + if (test2[i].result != result)
> errors++;
>
> runs++;
>

2014-06-08 12:16:28

by Daniel Borkmann

[permalink] [raw]

Subject: Re: [PATCH 3/7] lib/random32.c: Replace an #ifdef with a stub prandom_state_selftest()

On 06/07/2014 10:22 AM, George Spelvin wrote:
> The preferred Linux style for optional features is to define
> no-op stub functions rather than wrap each call site in #ifdef.
>
> Signed-off-by: George Spelvin <[email protected]>
> ---
> lib/random32.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/lib/random32.c b/lib/random32.c
> index 2e7c15c0..e8f3557b 100644
> --- a/lib/random32.c
> +++ b/lib/random32.c
> @@ -40,6 +40,8 @@
>
> #ifdef CONFIG_RANDOM32_SELFTEST
> static void __init prandom_state_selftest(void);
> +#else
> +#define prandom_state_selftest() (void)0

Fine by me. I think you can remove this '(void)0' here, though.

> #endif
>
> static DEFINE_PER_CPU(struct rnd_state, net_rand_state);
> @@ -188,9 +190,7 @@ static int __init prandom_init(void)
> {
> int i;
>
> -#ifdef CONFIG_RANDOM32_SELFTEST
> prandom_state_selftest();
> -#endif
>
> for_each_possible_cpu(i) {
> struct rnd_state *state = &per_cpu(net_rand_state,i);
>

2014-06-08 12:19:13

by George Spelvin

[permalink] [raw]

Subject: Re: [PATCH 2/7] lib/random32.c: Remove excess calls to prandom_u32_state in initialization

Thank you for your time and attention on all these comments.

> Fine by me although we simply resembled initialization code from
> GSL here. I think that your subject line is a bit misleading though.

Yes, I had a hard time coming up with a good summary. I'm removing
*static* calls but not dynamic ones.

How about "don't unroll one-time initialization code"?
Would that be a better way to put it?

2014-06-08 12:25:36

by Daniel Borkmann

[permalink] [raw]

Subject: Re: [PATCH 4/7] lib/random32.c: Use <asm/unaligned.h> instead of hand-rolling it

On 06/07/2014 10:25 AM, George Spelvin wrote:
> The functions exist for a reason; the manual byte-at-a-time code
> is unnecessarily slow (and bloated).
>
> Signed-off-by: George Spelvin <[email protected]>

Seems fine by me, since it's random anyway archs might not care
about the *_le32, though this might yield additional work in some
cases I presume.

> ---
> lib/random32.c | 23 +++++++++++------------
> 1 file changed, 11 insertions(+), 12 deletions(-)
>
> diff --git a/lib/random32.c b/lib/random32.c
> index e8f3557b..eee60100 100644
> --- a/lib/random32.c
> +++ b/lib/random32.c
> @@ -37,6 +37,7 @@
> #include <linux/jiffies.h>
> #include <linux/random.h>
> #include <linux/sched.h>
> +#include <asm/unaligned.h>
>
> #ifdef CONFIG_RANDOM32_SELFTEST
> static void __init prandom_state_selftest(void);
> @@ -97,25 +98,23 @@ EXPORT_SYMBOL(prandom_u32);
> */
> void prandom_bytes_state(struct rnd_state *state, void *buf, int bytes)
> {
> - unsigned char *p = buf;
> + u8 *p = buf;
> int i;
>
> - for (i = 0; i < round_down(bytes, sizeof(u32)); i += sizeof(u32)) {
> - u32 random = prandom_u32_state(state);
> - int j;
> + for (i = 0; i < round_down(bytes, sizeof(u32)); i += sizeof(u32))
> + put_unaligned_le32(prandom_u32_state(state), p+i);

Nit: 'p + i'

>
> - for (j = 0; j < sizeof(u32); j++) {
> - p[i + j] = random;
> - random >>= BITS_PER_BYTE;
> - }
> - }
> if (i < bytes) {
> u32 random = prandom_u32_state(state);
>
> - for (; i < bytes; i++) {
> - p[i] = random;
> - random >>= BITS_PER_BYTE;
> + if (bytes & 2) {
> + put_unaligned_le16((u16)random, p+i);

Ditto.

> + if ((bytes & 1) == 0)
> + return;
> + i += 2;
> + random >>= 16;
> }
> + p[i] = (u8)random;

Nit: '(u8) random'

You could probably use a switch statement with fall-through for
filling the remaining stuff, might simplify it further, perhaps.

> }
> }
> EXPORT_SYMBOL(prandom_bytes_state);
>

2014-06-08 12:27:17

by George Spelvin

[permalink] [raw]

Subject: Re: [PATCH 3/7] lib/random32.c: Replace an #ifdef with a stub prandom_state_selftest()

>> #ifdef CONFIG_RANDOM32_SELFTEST
>> static void __init prandom_state_selftest(void);
>> +#else
>> +#define prandom_state_selftest() (void)0

> Fine by me. I think you can remove this '(void)0' here, though.

That's the standard way to write a no-op statement in C.

I seem to recall there's a reason that the empty string can cause problems
in some syntactic contexts, but I can't figure out what the situation is.

At first, I thought of the obvious:

if (condition)
prandom_state_selftest();
unconditional_code();

... but the semicolon makes that work. I'll try to remember
the reason.

(I know that nobody uses it in any such context, but it's
good manners to make a function-like macro behave as exactly
like a function as possible.)

2014-06-08 12:28:24

by Daniel Borkmann

[permalink] [raw]

Subject: Re: [PATCH 6/7] lib/random32.c: Randomize timeout to the millisecond, not the second

On 06/08/2014 01:30 PM, George Spelvin wrote:
>> Fine by me this cleanup, although not strictly needed.
>
> Agreed. The timer slack is set to HZ (1 second) anyway.
>
> It just dawned on me that a simpler and more efficient way to do this
> (which I'll do in v2 of this) would be:

Note, when you talk about efficiency here, this is called once every
40+ secs ... ;)

> /* reseed every ~60 seconds, in [40 .. 80) interval with slack */
> - expires = 40 + (prandom_u32() % 40);
> - seed_timer.expires = jiffies + msecs_to_jiffies(expires * MSEC_PER_SEC);
> + expires = 40*HZ + prandom_u32_max(40*HZ);
> + seed_timer.expires = jiffies + expires;
>
> That avoids calling msecs_to_jiffies entirely.
>

2014-06-08 12:40:34

by George Spelvin

[permalink] [raw]

Subject: Re: [PATCH 4/7] lib/random32.c: Use <asm/unaligned.h> instead of hand-rolling it

> Seems fine by me, since it's random anyway archs might not care
> about the *_le32, though this might yield additional work in some
> cases I presume.

If you think that's okay, sure. I kept it consistent to get byte-for-byte
identical results.

If variations in that are allowed, that can also
simplify the trailing storage case:

+ if (bytes & 2)
+ put_unaligned_le16((u16)random, p+i);
+ if (bytes & 1)
+ p[i+bytes-1] = (u8)(random >> 16);

>> + for (i = 0; i < round_down(bytes, sizeof(u32)); i += sizeof(u32))
>> + put_unaligned_le32(prandom_u32_state(state), p+i);

> Nit: 'p + i'

I don't really care, but I'm happy without the spaces; I add them
to show what binds more weakly, and in this case that's the
argument-separating comma.

>> + p[i] = (u8)random;

> Nit: '(u8) random'

Actually, my style and most of the kerel is to not put
a space in a cast, since it binds so tighty.

E.g. compare

git grep ')[a-z]' kernel/
git grep ') [a-z]' kernel/

Notice that the second is alsmost all comments.
(There are some spaces in kernel/ptrace.v.)

I'd rather leave it without the space unless you feel
very strongly about it.

2014-06-08 12:42:14

by George Spelvin

[permalink] [raw]

Subject: Re: [PATCH 6/7] lib/random32.c: Randomize timeout to the millisecond, not the second

> Note, when you talk about efficiency here, this is called once every
> 40+ secs ... ;)

Agreed, in this case the code size saving (avoid a function call) is the
main benefit. And I prefer to avoid inefficiency on general priinciples,
if it's not serving some other goal.

2014-06-08 17:34:15

by Hannes Frederic Sowa

[permalink] [raw]

Subject: Re: [PATCH 5/7] lib/random32.c: Make prandom_u32_max efficient for powers of 2

On Sat, Jun 7, 2014, at 1:28, George Spelvin wrote:
> diff --git a/include/linux/random.h b/include/linux/random.h
> index 57fbbffd..e1f3ec9a 100644
> --- a/include/linux/random.h
> +++ b/include/linux/random.h
> @@ -47,11 +47,23 @@ void prandom_bytes_state(struct rnd_state *state, void *buf, int nbytes);
> * generator, that is, prandom_u32(). This is useful when requesting a
> * random index of an array containing ep_ro elements, for example.
> *
> + * If ep_ro is a power of 2 known at compile time, a modulo operation
> + * reduces to a simple mask to extract the low order bits. Otherwise,
> + * it uses a multiply and shift, which is faster than a general modulus.
> + *
> * Returns: pseudo-random number in interval [0, ep_ro)
> */
> static inline u32 prandom_u32_max(u32 ep_ro)
> {
> - return (u32)(((u64) prandom_u32() * ep_ro) >> 32);
> + /*
> + * Instead of just __builtin_constant_p(ep_ro), this test is
> + * "is it known at compile time that ep_ro is a power of 2?", and
> + * can in theory handle the case that it's an unknown power of 2.
> + */
> + if (__builtin_constant_p(ep_ro & (ep_ro-1)) && !(ep_ro & (ep_ro-1)))
> + return prandom_u32() & (ep_ro-1);
> + else
> + return (u32)((u64)prandom_u32() * ep_ro >> 32);
> }

Have you checked assembler output if this helps anything at all? Constant propagation in the compiler should be able to figure that out all by itself. The only places I use __builtin_constant_p today are where I also make use of inline assembler.

Please check this as it makes the code more complicated and I doubt it is worth it.

Btw, IIRC there is a function is_power_of_2 somewhere. ;)

Thanks,

Hannes

2014-06-08 20:02:25

by Daniel Borkmann

[permalink] [raw]

Subject: Re: [PATCH 6/7] lib/random32.c: Randomize timeout to the millisecond, not the second

On 06/08/2014 02:42 PM, George Spelvin wrote:
>> Note, when you talk about efficiency here, this is called once every
>> 40+ secs ... ;)
>
> Agreed, in this case the code size saving (avoid a function call) is the
> main benefit. And I prefer to avoid inefficiency on general priinciples,
> if it's not serving some other goal.

Hm, don't think it will make much of a difference.

2014-06-08 20:02:57

by Daniel Borkmann

[permalink] [raw]

Subject: Re: [PATCH 5/7] lib/random32.c: Make prandom_u32_max efficient for powers of 2

On 06/08/2014 07:34 PM, Hannes Frederic Sowa wrote:
...
> Please check this as it makes the code more complicated and I doubt it is worth it.

Agreed.

2014-06-08 20:26:45

by Hannes Frederic Sowa

[permalink] [raw]

Subject: Re: [PATCH 4/7] lib/random32.c: Use <asm/unaligned.h> instead of hand-rolling it

On Sun, Jun 8, 2014, at 5:40, George Spelvin wrote:
> > Seems fine by me, since it's random anyway archs might not care
> > about the *_le32, though this might yield additional work in some
> > cases I presume.
>
> If you think that's okay, sure. I kept it consistent to get byte-for-byte
> identical results.
>
> If variations in that are allowed, that can also
> simplify the trailing storage case:
>
> + if (bytes & 2)
> + put_unaligned_le16((u16)random, p+i);
> + if (bytes & 1)
> + p[i+bytes-1] = (u8)(random >> 16);

Yes, please! Otherwise the code looks much too complicated for what it does.

> > Nit: '(u8) random'
>
> Actually, my style and most of the kerel is to not put
> a space in a cast, since it binds so tighty.
>
> E.g. compare
>
> git grep ')[a-z]' kernel/
> git grep ') [a-z]' kernel/
>
> Notice that the second is alsmost all comments.
> (There are some spaces in kernel/ptrace.v.)
>
> I'd rather leave it without the space unless you feel
> very strongly about it.

IMHO I wouldn't put a whitespace here.

Bye,
Hannes

2014-06-08 20:48:15

by George Spelvin

[permalink] [raw]

Subject: Re: [PATCH 5/7] lib/random32.c: Make prandom_u32_max efficient for powers of 2

Thank you for your comments!

> Have you checked assembler output if this helps anything at all? Constant
> propagation in the compiler should be able to figure that out all by
> itself. The only places I use __builtin_constant_p today are where I
> also make use of inline assembler.

Yes, I did. (I'll expand the commit comment for v2; my bad.)

It seems that GCC isn't smart enough to reduce this to a single shift.
With the multiply and reduce, the code looks like:
call prandom_u32
xorl %edx, %edx
shldl $4, %eax, %edx
movl %edx, %eax

Instead of the hoped-for
call prandom_u32
shrl $28, %eax

Converting to a single mask is something the compiler can't do,
because it doesn't understand that using the lsbits instead of the
msbits is okay.

With the mask, it turns into the spectacularly simple:
call prandom_u32
andl $15, %eax

An interesting question is which is preferred in general.

The AND allows non-constant powers of 2 without requiring CLZ. But I
don't recall seeing that actually happen anywhere. And the shift allows
a smaller encoding (8-bit rather than 32-bit immediate constant) when
the power of 2 is known at compile time and is larger than 128 (for
example, PAGE_SIZE).

Me, I thought it was in the noise and not worth stressing about,
but I also understand the hackers's urge for maximum tweaking.

The other thing that I couldn't think of a clean wrapper for is the
"probability 1 in N" case that happens in several bits of code.
For example, in drivers/mtd/ubi/debug.h, I made the following change:

static inline int ubi_dbg_is_bitflip(const struct ubi_device *ubi)
{
- if (ubi->dbg.emulate_bitflips)
- return !(prandom_u32() % 200);
- return 0;
+ return ubi->dbg.emulate_bitflips && prandom_u32() < -1u/200;
}

GCC doesn't know how to optimize "prandom_u32_max(200) == 0", which
is basically the same. It spits out:

call prandom_u32
movl %eax, %edx
movl $200, %eax
mull %edx
testl %edx, %edx

I couldn't think of a good enough function name for this, so I just left
it as inline code. (Using "-1u/200" rather than the technically correct
"(u32)(((u64)1 << 32)/200)" is okay as long as 200 isn't a power of 2, and
even if it were, the error wouldn't matter since it's approximate anyway.)

> Btw, IIRC there is a function is_power_of_2 somewhere. ;)

In <linux/log2.h>; thanks for the pointer.

2014-06-09 00:28:31

by George Spelvin

[permalink] [raw]

Subject: Re: [PATCH 5/7] lib/random32.c: Make prandom_u32_max efficient for powers of 2

Daniel Borkmann wrote:
> On 06/08/2014 07:34 PM, Hannes Frederic Sowa wrote:
>> Please check this as it makes the code more complicated and I doubt it is worth it.
>
> Agreed.

Just to clarify, the goal is to be able to replace "prandom_u32() %
range" all over the kernel with "prandom_u32_max(range)" and promise
people that it's guaranteed to be just as good.

My goal is a one-time cleanup to eliminate "prandom_u32() % range" so
people will stop cutting and pasting it into new code.

(FIXME: Add a kdoc update on the subject.)

It uglifies the random32 code for sure, but by encapsulating the cleverness
it lets all the call sites be simpler and reduces arguments about the change.

E.g. have a look at all the prandom_u32 calls in net/core/pktgen.c.

2014-06-09 10:30:33

by Hannes Frederic Sowa

[permalink] [raw]

Subject: Re: [PATCH 5/7] lib/random32.c: Make prandom_u32_max efficient for powers of 2

Thanks for your detailed explanation!

On Sun, Jun 8, 2014, at 13:48, George Spelvin wrote:
> Thank you for your comments!
>
> > Have you checked assembler output if this helps anything at all? Constant
> > propagation in the compiler should be able to figure that out all by
> > itself. The only places I use __builtin_constant_p today are where I
> > also make use of inline assembler.
>
> Yes, I did. (I'll expand the commit comment for v2; my bad.)
>
> It seems that GCC isn't smart enough to reduce this to a single shift.
> With the multiply and reduce, the code looks like:
> call prandom_u32
> xorl %edx, %edx
> shldl $4, %eax, %edx
> movl %edx, %eax
>
> Instead of the hoped-for
> call prandom_u32
> shrl $28, %eax

On x86_64 I get the above result. Seems like gcc doesn't see the downcast to u32 far enough ahead and stays in DI mode on i386, thus the shldl. It shouldn't matter that much... ;)

> Converting to a single mask is something the compiler can't do,
> because it doesn't understand that using the lsbits instead of the
> msbits is okay.

Yep, sure.

> With the mask, it turns into the spectacularly simple:
> call prandom_u32
> andl $15, %eax
>
> An interesting question is which is preferred in general.
>
> The AND allows non-constant powers of 2 without requiring CLZ. But I
> don't recall seeing that actually happen anywhere. And the shift allows
> a smaller encoding (8-bit rather than 32-bit immediate constant) when
> the power of 2 is known at compile time and is larger than 128 (for
> example, PAGE_SIZE).

I actually don't know if folding logic in gcc is so enhanced to see that coming. ;)
Would be interesting tough, maybe I'll try that later.

> Me, I thought it was in the noise and not worth stressing about,
> but I also understand the hackers's urge for maximum tweaking.

Totally ok. ;)

I don't have any problems with the patch, although such a detailed changelog would be nice + some approx. numbers of how many times we run into the new optimization.

Bye,
Hannes

2014-06-10 15:13:22

by Daniel Borkmann

[permalink] [raw]

Subject: Re: [PATCH 4/7] lib/random32.c: Use <asm/unaligned.h> instead of hand-rolling it

On 06/08/2014 02:40 PM, George Spelvin wrote:
...
>>> + for (i = 0; i < round_down(bytes, sizeof(u32)); i += sizeof(u32))
>>> + put_unaligned_le32(prandom_u32_state(state), p+i);
>
>> Nit: 'p + i'
>
> I don't really care, but I'm happy without the spaces; I add them
> to show what binds more weakly, and in this case that's the
> argument-separating comma.

If you invent such things, then first send a patch to
Documentation/CodingStyle to change and discuss this.

Otherwise, I urge you to read:

vi Documentation/CodingStyle +206