2005-04-07 12:38:43

by Patrice Martinez

Subject: /dev/random problem on 2.6.12-rc1

When using a machine with a 2.6.12-rc1 kernel, I encounter problems
reading /dev/random:
it simply never returns anything, and the process is blocked in the
read...
The easiest way to see it is to type:
od < /dev/random

Any idea?

--

Best regards

Patrice Martinez

Linux Kernel Architect.

OFFICE : B1-405
PHONE : +33 (0)4 76 29 74 69
EMAIL : [email protected]
ADDR : BULL, 1 rue de Provence, BP 208, 38432 Echirolles Cedex, FRANCE


2005-04-07 13:11:13

by Yura Pakhuchiy

Subject: Re: /dev/random problem on 2.6.12-rc1

On Thu, 2005-04-07 at 14:40 +0200, Patrice Martinez wrote:
> When using a machine with a 2612-rc 1kernel, I encounter problems
> reading /dev/random:
> it simply nevers returns anything, and the process is blocked in the
> read...
> The easiest way to see it is to type:
> od < /dev/random
>
> Any idea?

Because /dev/random uses user input, mouse movements and other things to
generate the next random number. Use /dev/urandom if you want a version that
will never block the reading process.

Read "man 4 random" for details.
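
For illustration, a minimal C sketch of the /dev/urandom alternative: it
fills a buffer and loops over short reads. The helper name read_urandom is
hypothetical and not part of any library.

```c
#define _POSIX_C_SOURCE 200809L	/* POSIX declarations under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: fill buf with len bytes from /dev/urandom.
 * Returns 0 on success, -1 on any failure. */
static int read_urandom(unsigned char *buf, size_t len)
{
	size_t done = 0;
	int fd;

	assert(buf != NULL);

	fd = open("/dev/urandom", O_RDONLY);
	if (fd < 0)
		return -1;

	while (done < len) {
		ssize_t n = read(fd, buf + done, len - done);

		if (n < 0 && errno == EINTR)
			continue;	/* interrupted by a signal: retry */
		if (n <= 0) {
			close(fd);
			return -1;	/* real error or unexpected EOF */
		}
		done += (size_t)n;
	}
	close(fd);
	return 0;
}
```

A caller would pass a buffer and its size, for example a 32-byte array, and
check for a zero return before using the bytes.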

Best regards,
Yura

2005-04-07 15:37:39

by Simon Derr

Subject: Re: /dev/random problem on 2.6.12-rc1



On Thu, 7 Apr 2005, Yura Pakhuchiy wrote:

> On Thu, 2005-04-07 at 14:40 +0200, Patrice Martinez wrote:
> > When using a machine with a 2612-rc 1kernel, I encounter problems
> > reading /dev/random:
> > it simply nevers returns anything, and the process is blocked in the
> > read...
> > The easiest way to see it is to type:
> > od < /dev/random
> >
> > Any idea?
>
> Because, /dev/random use user input, mouse movements and other things to
> generate next random number. Use /dev/urandom if you want version that
> will never block your machine.
>
> Read "man 4 random" for details.
>
Something changed since previous versions of the kernel, I guess.
Running `find /usr | wc' in an ssh session generates both network and disk
activity, and you should not expect any other kind of input on a networked
server.

Anyway, still zero bytes coming from /dev/random, for the few minutes I
waited.

This is on Linux 2.6.12-rc1-bk1, on IA64.



2005-04-07 21:14:23

by Matt Mackall

Subject: Re: /dev/random problem on 2.6.12-rc1

On Thu, Apr 07, 2005 at 05:36:59PM +0200, Simon Derr wrote:
>
>
> On Thu, 7 Apr 2005, Yura Pakhuchiy wrote:
>
> > On Thu, 2005-04-07 at 14:40 +0200, Patrice Martinez wrote:
> > > When using a machine with a 2612-rc 1kernel, I encounter problems
> > > reading /dev/random:
> > > it simply nevers returns anything, and the process is blocked in the
> > > read...
> > > The easiest way to see it is to type:
> > > od < /dev/random
> > >
> > > Any idea?
> >
> > Because, /dev/random use user input, mouse movements and other things to
> > generate next random number. Use /dev/urandom if you want version that
> > will never block your machine.
> >
> > Read "man 4 random" for details.
> >
> Something changed since previous versions of the kernel, I guess.
> Running `find /usr | wc' on a ssh session generates both network and disk
> activity, and you should not expect any other kind of input on a networked
> server.

FYI, network activity only generates entropy on a very small subset of
NICs, and probably not the one you're using. This is good, as network
activity is assumed passively observable/timable.

> Anyway, still zero bytes coming from /dev/random, for the few minutes I
> waited.

Are you and Patrice both experiencing this on the same machine? What
was the last kernel that was known to work for you? Do you see the
contents of /proc/sys/kernel/random/entropy_avail change over time?
Are there any other entropy consumers on your machine?

--
Mathematics is the supreme nostalgia of our time.

2005-04-08 06:57:19

by Simon Derr

Subject: Re: /dev/random problem on 2.6.12-rc1

On Thu, 7 Apr 2005, Matt Mackall wrote:

> On Thu, Apr 07, 2005 at 05:36:59PM +0200, Simon Derr wrote:
> >
> >
> > On Thu, 7 Apr 2005, Yura Pakhuchiy wrote:
> >
> > > On Thu, 2005-04-07 at 14:40 +0200, Patrice Martinez wrote:
> > > > When using a machine with a 2612-rc 1kernel, I encounter problems
> > > > reading /dev/random:
> > > > it simply nevers returns anything, and the process is blocked in the
> > > > read...
> > > > The easiest way to see it is to type:
> > > > od < /dev/random
> > > >
> > > > Any idea?
> > >
> > > Because, /dev/random use user input, mouse movements and other things to
> > > generate next random number. Use /dev/urandom if you want version that
> > > will never block your machine.
> > >
> > > Read "man 4 random" for details.
> > >
> > Something changed since previous versions of the kernel, I guess.
> > Running `find /usr | wc' on a ssh session generates both network and disk
> > activity, and you should not expect any other kind of input on a networked
> > server.
>
Oops, the command is actually "find /usr | xargs wc", which causes lots of
disk activity.

> FYI, network activity only generates entropy on a very small subset of
> NICs, and probably not the one you're using. This is good, as network
> activity is assumed passively observable/timable.
Off-topic, but why isn't the policy the same for all NICs?

> > Anyway, still zero bytes coming from /dev/random, for the few minutes I
> > waited.
>
> Are you and Patrice both experiencing this on the same machine?
Both IA-64, but that's the only common point.

> What
> was the last kernel that was known to work for you? Do you see the
> contents of /proc/sys/kernel/random/entropy_avail change over time?
> Are there any other entropy consumers on your machine?
None that I am aware of.

I run:
# dd if=/dev/random bs=1 count=1 | od

Another shell:
# lsof /dev/random
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
dd 1496 root 0r CHR 1,8 99952 /dev/random

Now, find /usr | xargs wc running in background.

About /proc/sys/kernel/random/entropy_avail:
(5 second refresh interval)

0
0
0
0
[lots of 0]
0
0
0
6
0
8
2
0
0
[lots of 0]
0
0
0
3
1
0
0
[lots of 0]
0
0
0

...


After 10 minutes, dd is still running.

This is on Linux 2.6.12-rc1-bk1, same results for 2.6.12-rc2.

And /dev/random works fine on the same machine with Linux 2.6.11.
(i.e., when running "find /usr | xargs wc", /dev/random spits out lots of
bytes)


Simon.

2005-04-08 08:00:31

by Matt Mackall

Subject: Re: /dev/random problem on 2.6.12-rc1

On Fri, Apr 08, 2005 at 08:56:51AM +0200, Simon Derr wrote:
> On Thu, 7 Apr 2005, Matt Mackall wrote:
>
> > On Thu, Apr 07, 2005 at 05:36:59PM +0200, Simon Derr wrote:
> > >
> > >
> > > On Thu, 7 Apr 2005, Yura Pakhuchiy wrote:
> > >
> > > > On Thu, 2005-04-07 at 14:40 +0200, Patrice Martinez wrote:
> > > > > When using a machine with a 2612-rc 1kernel, I encounter problems
> > > > > reading /dev/random:
> > > > > it simply nevers returns anything, and the process is blocked in the
> > > > > read...
> > > > > The easiest way to see it is to type:
> > > > > od < /dev/random
> > > > >
> > > > > Any idea?
> > > >
> > > > Because, /dev/random use user input, mouse movements and other things to
> > > > generate next random number. Use /dev/urandom if you want version that
> > > > will never block your machine.
> > > >
> > > > Read "man 4 random" for details.
> > > >
> > > Something changed since previous versions of the kernel, I guess.
> > > Running `find /usr | wc' on a ssh session generates both network and disk
> > > activity, and you should not expect any other kind of input on a networked
> > > server.
> >
> Oops, the command is actually "find /usr | xargs wc", witch causes lots of
> disk activity.
>
> > FYI, network activity only generates entropy on a very small subset of
> > NICs, and probably not the one you're using. This is good, as network
> > activity is assumed passively observable/timable.
> Offtopic, but why isn't the policy the same for all NICs ?

The policy is the same, it just hasn't been implemented. SA_RANDOM
is scheduled for abolishment.

> > > Anyway, still zero bytes coming from /dev/random, for the few minutes I
> > > waited.
> >
> > Are you and Patrice both experiencing this on the same machine?
> Both IA-64, but that's the only common point.
>
> > What
> > was the last kernel that was known to work for you? Do you see the
> > contents of /proc/sys/kernel/random/entropy_avail change over time?
> > Are there any other entropy consumers on your machine?
> None that I am aware of.
>
> I run:
> # dd if=/dev/random bs=1 count=1 | od

strace the dd process, please. This works fine here.

> Another shell:
> # lsof /dev/random
> COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
> dd 1496 root 0r CHR 1,8 99952 /dev/random
>
> Now, find /usr | xargs wc running in background.
>
> About /proc/sys/kernel/random/entropy_avail:
> (5 second refresh interval)

That may not be sufficient resolution. The upper layers will pull from
it whenever it rises above 64 and bash it back down to within 7 bits
of 0. What does it do when no one is reading from it?
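
A rough C sketch of sampling the counter at a finer resolution, roughly
every 10 ms. The helper names read_int_file and sample_entropy are
hypothetical; the path is passed in by the caller and would be
/proc/sys/kernel/random/entropy_avail here.

```c
#define _POSIX_C_SOURCE 200809L	/* for nanosleep() under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: parse the single integer held in a procfs-style file.
 * Returns the value, or -1 if the file cannot be opened or parsed. */
static int read_int_file(const char *path)
{
	FILE *f;
	int value = -1;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}

/* Sample the file `count` times, about 10 ms apart, printing each value.
 * Returns the number of samples that were read successfully. */
static int sample_entropy(const char *path, int count)
{
	const struct timespec interval = { 0, 10 * 1000 * 1000 };	/* 10 ms */
	int ok = 0;

	for (int i = 0; i < count; i++) {
		int v = read_int_file(path);

		if (v >= 0) {
			printf("%d ", v);
			ok++;
		}
		nanosleep(&interval, NULL);
	}
	printf("\n");
	return ok;
}
```

Calling sample_entropy("/proc/sys/kernel/random/entropy_avail", 500) would
print about five seconds of samples on one line.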

--
Mathematics is the supreme nostalgia of our time.

2005-04-08 12:12:26

by Simon Derr

Subject: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)

On Fri, 8 Apr 2005, Matt Mackall wrote:

> On Fri, Apr 08, 2005 at 08:56:51AM +0200, Simon Derr wrote:
> > On Thu, 7 Apr 2005, Matt Mackall wrote:
> >
> > > On Thu, Apr 07, 2005 at 05:36:59PM +0200, Simon Derr wrote:
> > > >
> > > >
> > I run:
> > # dd if=/dev/random bs=1 count=1 | od
>
> strace the dd process, please. This works fine here.

As expected, dd is waiting in:

read(0, 0x6000000000014000, 1)

with fd 0 being /dev/random.
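
As an aside, a caller that must not hang in this read can open the device
with O_NONBLOCK, which random(4) documents as making read() fail with
EAGAIN when no bytes are available. A minimal sketch, with the hypothetical
helper name try_read_nonblock:

```c
#define _POSIX_C_SOURCE 200809L	/* POSIX declarations under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: try to read up to len bytes from path without
 * sleeping. Returns the byte count, 0 if the read would have blocked
 * (EAGAIN) or hit end of file, and -1 on any other error. */
static long try_read_nonblock(const char *path, unsigned char *buf, size_t len)
{
	ssize_t n;
	int fd;

	assert(path != NULL && buf != NULL);

	fd = open(path, O_RDONLY | O_NONBLOCK);
	if (fd < 0)
		return -1;

	n = read(fd, buf, len);
	if (n < 0 && errno == EAGAIN)
		n = 0;		/* nothing available right now */
	else if (n < 0)
		n = -1;

	close(fd);
	return (long)n;
}
```

With the behaviour reported in this thread, try_read_nonblock("/dev/random",
buf, 1) would be expected to return 0 instead of blocking.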

> > About /proc/sys/kernel/random/entropy_avail:
> > (5 second refresh interval)
>
> That may not be sufficient resolution. The upper layers will pull from
> it whenever it rises above 64 and bash it back down to within 7 bits
> of 0. What does it do when no one is reading from it?
Oh, you're right.

with 1/100 sec (or so) resolution, things look like:
(with nobody reading /dev/random, this time)

* usually zero, lots of zero
* sometimes things such as:
0 0 0 1 0 0 0 0 0 0 0 2 0 0 0 1 0 4 1 0 0 0 1 2 2 2 0 2 0 0 0 0 3 0 0 0

or even

0 0 0 1 0 0 3 2 0 2 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 0

Other values:

poolsize:4096
read_wakeup_threshold:64
write_wakeup_threshold:128

I enabled the debug messages in random.c and I think I found the problem
lying in the IA64 version of fls().

It turns out that the generic and IA64 versions of fls() disagree:

(output from a small test program)

x ia64_fls(x) generic_fls(x)

i=-1, t=0, ia64: -65535 et generic:0
i=0, t=1, ia64: 0 et generic:1
i=1, t=2, ia64: 1 et generic:2
i=2, t=4, ia64: 2 et generic:3
i=3, t=8, ia64: 3 et generic:4
i=4, t=16, ia64: 4 et generic:5
i=5, t=32, ia64: 5 et generic:6
i=6, t=64, ia64: 6 et generic:7
i=7, t=128, ia64: 7 et generic:8
i=8, t=256, ia64: 8 et generic:9
i=9, t=512, ia64: 9 et generic:10
i=10, t=1024, ia64: 10 et generic:11
i=11, t=2048, ia64: 11 et generic:12
i=12, t=4096, ia64: 12 et generic:13
i=13, t=8192, ia64: 13 et generic:14
i=14, t=16384, ia64: 14 et generic:15
i=15, t=32768, ia64: 15 et generic:16
i=16, t=65536, ia64: 16 et generic:17
i=17, t=131072, ia64: 17 et generic:18
i=18, t=262144, ia64: 18 et generic:19
i=19, t=524288, ia64: 19 et generic:20
i=20, t=1048576, ia64: 20 et generic:21
i=21, t=2097152, ia64: 21 et generic:22
i=22, t=4194304, ia64: 22 et generic:23
i=23, t=8388608, ia64: 23 et generic:24
i=24, t=16777216, ia64: 24 et generic:25
i=25, t=33554432, ia64: 25 et generic:26
i=26, t=67108864, ia64: 26 et generic:27
i=27, t=134217728, ia64: 27 et generic:28
i=28, t=268435456, ia64: 28 et generic:29
i=29, t=536870912, ia64: 29 et generic:30
i=30, t=1073741824, ia64: 30 et generic:31
i=31, t=-2147483648, ia64: 31 et generic:32
i=32, t=0, ia64: -65535 et generic:0
...
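
The test program itself is not included in this message. A portable C sketch
that reproduces the two columns could look like the following; both
functions are stand-ins written to mimic the results above (they are not the
kernel implementations), and all names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the ia64 column: zero-based index of the highest set bit,
 * and -0xffff for an input of 0. */
static int log2_style_fls(unsigned int x)
{
	int r = -1;

	if (!x)
		return -0xffff;
	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

/* Stand-in for the generic column: one-based index of the highest set
 * bit, and 0 for an input of 0. */
static int one_based_fls(unsigned int x)
{
	int r = 0;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

/* Print both results for 0 and for every single-bit 32-bit value.
 * Returns the number of inputs on which the two conventions differ. */
static int compare_fls(void)
{
	int mismatches = 0;

	for (int i = -1; i < 32; i++) {
		unsigned int t = (i < 0) ? 0u : 1u << i;
		int a = log2_style_fls(t);
		int b = one_based_fls(t);

		if (t)
			assert(a == b - 1);	/* off by exactly one for nonzero input */
		printf("i=%d, t=%u, log2-style: %d, one-based: %d\n", i, t, a, b);
		if (a != b)
			mismatches++;
	}
	return mismatches;
}
```

Unlike the output above, this sketch prints t as an unsigned value, so the
top bit shows up as 2147483648 and not as a negative number.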

I tried to fix it with an ia64 version that would give the same result as
the generic version, but the kernel did not boot, I guess some functions
rely on the "broken" ia64_fls() behaviour.

So I just changed fls() to use generic_fls() instead of ia64_fls().

(Patch below).

And /dev/random seems to work again.


Simon.


Signed-Off-By: Simon Derr <[email protected]>

Index: linux-2.6.11/include/asm-ia64/bitops.h
===================================================================
--- linux-2.6.11.orig/include/asm-ia64/bitops.h 2005-04-08 14:07:46.826191877 +0200
+++ linux-2.6.11/include/asm-ia64/bitops.h 2005-04-08 14:08:09.750996284 +0200
@@ -327,11 +327,7 @@ ia64_fls (unsigned long x)
 	return exp - 0xffff;
 }

-static inline int
-fls (int x)
-{
-	return ia64_fls((unsigned int) x);
-}
+#define fls(x) generic_fls(x)

 /*
  * ffs: find first bit set. This is defined the same way as the libc and compiler builtin

2005-04-08 16:29:04

by Matt Mackall

Subject: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)

On Fri, Apr 08, 2005 at 02:12:04PM +0200, Simon Derr wrote:
> I enabled the debug messages in random.c and I think I found the problem
> lying in the IA64 version of fls().

Good catch.

> It turns out that the generic and IA64 versions of fls() disagree:
>
> (output from a small test program)
>
> x ia64_fls(x) generic_fls(x)
>
> i=-1, t=0, ia64: -65535 et generic:0
> i=0, t=1, ia64: 0 et generic:1
> i=1, t=2, ia64: 1 et generic:2
> i=2, t=4, ia64: 2 et generic:3
> i=3, t=8, ia64: 3 et generic:4

Well PPC at least sez:

/*
 * fls: find last (most-significant) bit set.
 * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32.
 */

And that agrees with the generic code (used by x86). So I think IA64
is probably wrong here indeed. It's amazing that the other users of
fls don't blow up spectacularly.
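
For reference, the convention in that comment can be written compactly in
user space. This is only a sketch under assumptions: it relies on the
GCC/Clang builtin __builtin_clz() and a 32-bit unsigned int, the name
fls_sketch is hypothetical, and it is not the kernel code for any
architecture.

```c
#include <assert.h>
#include <limits.h>

/* Sketch only: one-based "find last set" built on a GCC/Clang builtin.
 * __builtin_clz() is undefined for 0, so that case is handled first. */
static int fls_sketch(unsigned int x)
{
	if (!x)
		return 0;
	return (int)(sizeof(x) * CHAR_BIT) - __builtin_clz(x);
}

/* Check the three values quoted in the comment above
 * (assumes a 32-bit unsigned int). */
static void fls_sketch_selfcheck(void)
{
	assert(fls_sketch(0) == 0);
	assert(fls_sketch(1) == 1);
	assert(fls_sketch(0x80000000u) == 32);
}
```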

> I tried to fix it with an ia64 version that would give the same result as
> the generic version, but the kernel did not boot, I guess some functions
> rely on the "broken" ia64_fls() behaviour.
>
> So I just changed fls() to use generic_fls() instead of ia64_fls().

If the "fixed" version didn't boot, how did the "alternate fixed"
version boot?

--
Mathematics is the supreme nostalgia of our time.