2006-10-10 18:05:07

by Paul Wouters

Subject: more random device badness in 2.6.18 :(


Running 2.6.18-1.2741.fc6xen in dom0, I end up with the "intel_rng"
loaded and thus a /dev/hw_random (/dev/hwrng) device.

Since hardware random is not transparently added to /dev/random's entropy,
applications such as Openswan need to test for the availability of the
separate device file (not a good design imho). So Openswan will use
/dev/hw_random if available. My guess is that we will need to change that
to /dev/hwrng, but we need to stay compatible for the earlier 2.6 kernels
that did not have /dev/hwrng. (let's hope the softlink stays there until
everything gets folded into a single /dev/random device again).
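
As a rough illustration of the fallback described above (not actual Openswan
code; the function and constant names are made up for this sketch), an
application could probe the newer device name first and fall back to the
older one:

```python
import os

# Newer name first, older name as a fallback for earlier 2.6 kernels.
# Both paths come from the message above; the names below are hypothetical.
CANDIDATES = ("/dev/hwrng", "/dev/hw_random")


def pick_rng_device(candidates=CANDIDATES, exists=os.path.exists):
    """Return the first candidate path that exists, or None if none do."""
    for path in candidates:
        if exists(path):
            return path
    return None
```

Note that this only establishes that a device node exists, not that reading
from it works.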

So I noticed Openswan was blocking indefinitely on reading from /dev/hw_random.
By design, stock openswan generates a new default hostkey in a subshell,
so nothing too bad happens (bug filed against fedora openswan package to
not generate a hostkey in %post, support for fedora style hostkey added in
openswan-2.4.7dr2)

It seems my board has either no intel_rng on board, or a bad driver for it.
The intel_rng module gets loaded and the /dev/hw_random and /dev/hwrng
device nodes are created. But using these results in a hanging read:

# hexdump -C /dev/hw_random
00000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*

Every call to /dev/hw_random gives that one (not very random!) line of output,
and then nothing more ever. A call to /dev/random still works:

# hexdump -C /dev/random
00000000 67 de a9 63 cf 2a 14 49 24 50 ec 1f 81 a7 4f b2 |g..c.*.I$P....O.|
00000010 b5 9d 8e 99 a3 d7 0d d5 45 ea 55 5a 70 4b 07 aa |........E.UZpK..|
00000020 4a e1 20 e3 2f 03 0a 89 43 b0 49 3c cb 01 3a 76 |J. ./...C.I<..:v|
00000030 10 4c c5 db d5 32 ff b1 8a 35 21 69 e0 1a 1a e2 |.L...2...5!i....|
[...]
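
A minimal sketch of how an application might guard against this kind of
hanging read. It assumes the device is opened with O_NONBLOCK and that the
driver honours either select() or non-blocking reads, which may not hold for
every driver. The helper names are hypothetical.

```python
import os
import select
import time


def read_with_timeout(fd, nbytes, timeout):
    """Read up to nbytes from fd, giving up once `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    chunks = []
    remaining = nbytes
    while remaining > 0:
        wait = deadline - time.monotonic()
        if wait <= 0:
            break  # out of time, return whatever we have
        ready, _, _ = select.select([fd], [], [], wait)
        if not ready:
            break  # nothing became readable before the deadline
        try:
            chunk = os.read(fd, remaining)
        except BlockingIOError:
            # Reported readable, but no data yet (non-blocking fd): retry.
            time.sleep(0.01)
            continue
        if not chunk:
            break  # end of file
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def looks_stuck(data, wanted):
    """True if the read came up short or every byte is identical."""
    return len(data) < wanted or len(set(data)) <= 1
```

With a descriptor from os.open("/dev/hw_random", os.O_RDONLY | os.O_NONBLOCK),
the all-0xff output shown above would be flagged by looks_stuck(). This is
only a crude sanity check, not a test of randomness.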

We really don't want to have to verify the validity and availability of
hardware random. Bugs in the past with the padlock caused us to not even be
able to use /dev/random if the random code from the padlock module was loaded,
so this is becoming quite ugly. We can't ignore the hardware random, nor can
we assume it works if present.

So, this is a bug report against 2.6.18-1.2741.fc6xen to report broken
random. It is also a request for a better random device design, possibly
integrated with the Open Cryptographic Framework (OCF) code that handles
various crypto related hardware offloads.

I hope that the Linux kernel will soon go back to a single /dev/random
device that will use hardware random if available, and fall back to
software random if the hardware is not providing random, so that we don't
need to add all this complexity to find a working random device within the
applications.

Related to this is that random in a xen guest has also never been very good.
Perhaps it needs to be able to pull directly from the dom0's random pool.

Paul


2006-10-10 20:50:53

by Gabor Gombas

Subject: Re: more random device badness in 2.6.18 :(

On Tue, Oct 10, 2006 at 08:08:32PM +0200, Paul Wouters wrote:

> Since hardware random is not transparently added to /dev/random's entropy,
> applications such as Openswan need to test for the availability of the
> separate device file (not a good design imho). So Openswan will use
> /dev/hw_random if available.

Why should Openswan touch /dev/hw_random directly?

> Every call to /dev/hw_random gives that one (not very random!) line of output,
> and then nothing more ever. A call to /dev/random still works:

$ apt-cache show rng-tools
[...]
The rngd daemon acts as a bridge between a Hardware TRNG (true random number
generator) such as the ones in some Intel/AMD/VIA chipsets, and the kernel's
PRNG (pseudo-random number generator).
.
It tests the data received from the TRNG using the FIPS 140-2 (2002-10-10)
tests to verify that it is indeed random, and feeds the random data to the
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
kernel entropy pool.
[...]

There is a good reason why /dev/hw_random is different from /dev/random...

Gabor

--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------

2006-10-10 21:00:30

by Paul Wouters

Subject: Re: more random device badness in 2.6.18 :(

On Tue, 10 Oct 2006, Gabor Gombas wrote:

> Why should Openswan touch /dev/hw_random directly?

Because using /dev/random while /dev/hw_random is available does not always
work (e.g. with padlock).

> $ apt-cache show rng-tools
> [...]
> The rngd daemon acts as a bridge between a Hardware TRNG (true random number
> generator) such as the ones in some Intel/AMD/VIA chipsets, and the kernel's
> PRNG (pseudo-random number generator).
> .
> It tests the data received from the TRNG using the FIPS 140-2 (2002-10-10)
> tests to verify that it is indeed random, and feeds the random data to the
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> kernel entropy pool.
> [...]
>
> There is a good reason why /dev/hw_random is different from /dev/random...

Why is this happening in userland? Will rng-tools run on every bare Linux
system now? Including embedded systems? How about xen guests who don't have
direct access to the host's hardware (or software) random?

Why is this entropy management not part of the kernel? So for Openswan to
work correctly, it would need to depend on another daemon that may or may
not be available and/or running?

I still believe /dev/random should just give the best random possible for
the machine. Whether that is software random, or a piece of hardware, should
not matter. That's the kernel's internal state and functioning.

But thanks for the software pointer.

Paul

2006-10-10 21:13:26

by Michael Büsch

Subject: Re: more random device badness in 2.6.18 :(

On Tuesday 10 October 2006 23:03, Paul Wouters wrote:
> On Tue, 10 Oct 2006, Gabor Gombas wrote:
>
> > Why should Openswan touch /dev/hw_random directly?
>
> Because using /dev/random while /dev/hw_random is available does not always
> work (e.g. with padlock).

Oh, wait wait. I don't really understand your sentence.
Why can't you use /dev/random?

> > There is a good reason why /dev/hw_random is different from /dev/random...
>
> Why is this happening in userland? Will rng-tools run on every bare Linux
> system now? Including embedded systems? How about xen guests who don't have
> direct access to the host's hardware (or software) random?
>
> Why is this entropy management not part of the kernel? So for Openswan to
> work correctly, it would need to depend on another daemon that may or may
> not be available and/or running?
>
> I still believe /dev/random should just give the best random possible for
> the machine. Whether that is software random, or a piece of hardware, should
> not matter. That's the kernel's internal state and functioning.

/dev/hw_random should never be touched by anything other than rngd.
rngd takes the data from /dev/hw_random, _verifies_ it and puts it into
the normal /dev/random pools.
The verification step is really important.
So I would like to ask the other way around. Why should we put this code
into the kernel, when it works just as well in userspace? (Some people may
argue it is even better in userspace, because it can more easily be exchanged,
debugged and configured. Whatever.)

--
Greetings Michael.

2006-10-10 21:48:45

by Paul Wouters

Subject: Re: more random device badness in 2.6.18 :(

On Tue, 10 Oct 2006, Michael Buesch wrote:

> > > Why should Openswan touch /dev/hw_random directly?
> >
> > Because using /dev/random while /dev/hw_random is available does not always
> > work (e.g. with padlock).
>
> Oh, wait wait. I don't really understand your sentence.
> Why can't you use /dev/random?

We have noticed in the past that on VIA's with the padlock, that
/dev/random stopped working when hw_random got loaded, while we could
get random from /dev/hw_random. So we assumed that was the design.

> /dev/hw_random should never be touched by anything other than rngd.

Seems like a good argument to keep this state hidden in the kernel then.

> rngd takes the data from /dev/hw_random, _verifies_ it and puts it into
> the normal /dev/random pools.
> The verification step is really important.

I understand the use of a FIPS compliant hardware random.

> So I would like to ask the other way around. Why should we put this code
> into the kernel, when it works just as well in userspace? (Some people may
> argue it is even better in userspace, because it can more easily be exchanged,
> debugged and configured. Whatever.)

If only a single process should ever touch a device, I wonder why it is
a device visible to all of userland. For one, it confuses stupid people
like me. Second, it seems that perhaps the reason the VIA hardware random
was "broken" was because I and others were unaware of the requirement of
rngd with hw_random. I am obviously not the only one, since Fedora Core
6 test 3 autoloads the hardware random module, but does not come with
the rng-tools package to fix /dev/random to actually use /dev/hw_random.

At least I do feel better now about all the device renames. rngd
uses "hwrandom" per default, which no longer exists. Then there is
hw_random, which seems to be something obsolete for hwrng judging by
the softlink. And I can stop worrying that /dev/hw_random cannot be
read without root permission on default modprobe. I'm happy to hear I
no longer need to worry about all those devices, and I can go back and
remove the code that deals with /dev/hw_random, after I verify that the
VIA systems still have a functional /dev/random if someone modprobe's
hw_random without running rngd. But if not running rngd breaks /dev/random,
then we'll be forced to keep an eye out for those /dev/hw* devices.

Paul

2006-10-10 22:06:21

by Michael Büsch

Subject: Re: more random device badness in 2.6.18 :(

On Tuesday 10 October 2006 23:50, Paul Wouters wrote:
> On Tue, 10 Oct 2006, Michael Buesch wrote:
>
> > > > Why should Openswan touch /dev/hw_random directly?
> > >
> > > Because using /dev/random while /dev/hw_random is available does not always
> > > work (e.g. with padlock).
> >
> > Oh, wait wait. I don't really understand your sentence.
> > Why can't you use /dev/random?
>
> We have noticed in the past that on VIA's with the padlock, that
> /dev/random stopped working when hw_random got loaded, while we could
> get random from /dev/hw_random. So we assumed that was the design.

This would be a bug. But I have no idea how this could happen.

> If only a single process should ever touch a device, I wonder why it is
> a device visible to all of userland.

Oh, well. Why do we have /dev/hda, if touching it creates a damn mess. ;)
The device node is there so userspace can access it. Yes. You can read
random data from /dev/hw_random. No problem, really, as long as you are aware
that there is _NO_ guarantee that the data returned is _really_ random.
It may just return 0xFFFFFFFF for some broken piece of overheated (or
otherwise faulty) hardware.
So the suggested way to use /dev/hw_random is to let rngd access it and
put the data back into the kernel entropy buffers after verifying it.
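
To give an idea of what such a verification step can look like, here is a
sketch of just the monobit test from FIPS 140-2, which counts the one bits in
a 20,000-bit sample and requires the count to lie strictly between 9725 and
10275. This is an illustration only, not rngd's actual code, and the function
name is made up.

```python
FIPS_BLOCK_BYTES = 2500  # 20,000 bits, the sample size the test is defined on


def monobit_ok(block):
    """Return True if a 2500-byte block passes the FIPS 140-2 monobit test."""
    if len(block) != FIPS_BLOCK_BYTES:
        raise ValueError("monobit test needs exactly 2500 bytes")
    ones = sum(bin(byte).count("1") for byte in block)
    return 9725 < ones < 10275
```

A stuck device that returns nothing but 0xff fails this immediately. A
trivially regular pattern such as repeated 0xaa still passes, which is why
the standard also defines the poker, runs and long-run tests.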

--
Greetings Michael.

2006-10-10 23:32:18

by Gabor Gombas

Subject: Re: more random device badness in 2.6.18 :(

On Tue, Oct 10, 2006 at 11:03:58PM +0200, Paul Wouters wrote:

> Why is this happening in userland?

Because whether the provided data is "random enough" is a policy
decision, and policy does not belong in the kernel.

> Will rng-tools run on every bare Linux
> system now? Including embedded systems?

Why not? Alternatively you can always create your own version. Open
source does not mean you get everything for free; it means you _can_ do
the work if you want to.

> How about xen guests who don't have
> direct access to the host's hardware (or software) random?

If they don't have access to the host's hardware, then they do not have a
/dev/hw_random device. What's your question? And how is that different
from machines not having a hw rng at all?

> Why is this entropy management not part of the kernel? So for Openswan to
> work correctly, it would need to depend on another daemon that may or may
> not be available and/or running?

No. It only has to depend on /dev/(u)random. How the entropy is obtained
(from /dev/hw_random, from the soundcard's white noise or from
elsewhere) is none of Openswan's business. That's up to the system
administrator or distribution maker to decide and set up.

> I still believe /dev/random should just give the best random possible for
> the machine. Whether that is software random, or a piece of hardware, should
> not matter. That's the kernel's internal state and functioning.

Gabor

--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------

2006-10-11 03:43:11

by Paul Wouters

Subject: Re: more random device badness in 2.6.18 :(

On Wed, 11 Oct 2006, Gabor Gombas wrote:

> > Why is this happening in userland?
>
> Because whether the provided data is "random enough" is a policy
> decision, and policy does not belong in the kernel.

So is POSIX compliance. I don't see that being ripped out :)

Is there anyone that disagrees that the quality of random should
be at minimum FIPS compliant? If everyone agrees, it seems to me
that it is more useful to have a stock kernel have proper hardware
random without additional software stirring kernel and hardware
internals.

> > How about xen guests who don't have
> > direct access to the host's hardware (or software) random?
>
> If they don't have access to the host's hardware, then they do not have a
> /dev/hw_random device. What's your question? And how is that different
> from machines not having a hw rng at all?

The xen issue is a separate, but related, issue. My xen images have far less
entropy gathering than the host system they run on. This is causing /dev/random
to be extremely slow (empty). On hosts with hw_random, it seems I cannot get this
extra entropy from the host to the guest. Though I will try to see if running
rngd on the host helps the xenu's as well. Perhaps that will solve this problem.
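
One way to compare host and guest is to watch the kernel's entropy estimate
in /proc/sys/kernel/random/entropy_avail on each. A small sketch, with
hypothetical helper names:

```python
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def parse_entropy_avail(text):
    """Parse the contents of entropy_avail (a decimal bit count) into an int."""
    return int(text.strip())


def entropy_avail(path=ENTROPY_AVAIL):
    """Return the kernel's current estimate of available entropy, in bits."""
    with open(path) as f:
        return parse_entropy_avail(f.read())
```

Running entropy_avail() periodically on dom0 and in a guest, with and without
rngd on the host, would show whether the guest's pool actually benefits.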

> No. It only has to depend on /dev/(u)random. How the entropy is obtained
> (from /dev/hw_random, from the soundcard's white noise or from
> elsewhere) is none of Openswan's business. That's up to the system
> administrator or distribution maker to decide and set up.

Yes, again, that has always been my opinion too. We just ran into practical
issues where we couldn't. I am now doing some tests on xen and regular kernels
using VIA and Intel rngs to see if those issues are resolved, so openswan can
indeed go back to only using /dev/random. I will also test to see if running
rngd on the dom0 will benefit the xenu's, and mail a summary to the lists.

Paul