2021-05-09 00:34:38

by Mike Brooks

[permalink] [raw]
Subject: Lockless /dev/random - Performance/Security/Stability improvement

Hello Everyone,

I am a cryptographer, a hacker, and an exploit developer. I am an
accomplished security engineer, and I've even hacked Majordomo2, the
mailing list software we are using now (CVE-2011-0049).

It is highly unusual that /dev/random is allowed to degrade the
performance of all other subsystems - and even bring the system to a
halt when it runs dry. No other kernel feature is given this
dispensation, and I wrote some code to prove that it isn't necessary.
Due to its lockless design, this proposed /dev/random has much higher
bandwidth for writes to the entropy pool. Furthermore, an additional
8 octets of entropy are collected per syscall, meaning that the random
source generated will be difficult to undermine.

This is an experimental subsystem that is easy to compile and to verify:
https://github.com/TheRook/KeypoolRandom

It should compile right after cloning, even on macOS or Windows, and
feedback is welcome. I'm making it easy for people to verify and to
discuss this design. There are other cool features here, and I have a
detailed writeup in the README.
A perhaps heretical view is that this design doesn't require
handle_irq_event_percpu() to produce a NIST-compliant CSPRNG, which
is a particularly useful property for low-power or high-performance
applications of the Linux kernel. Making this source of entropy
optional is extremely interesting because kernel performance appears
to have degraded substantially in recent releases, and this feature
will give performance back to the user, who should be free to choose
their own security/performance tradeoff. Disabling the IRQ event
handler as a source of entropy should not produce an exploitable
condition; a keypool has a much larger period than the current
entropy pool design, so the concerns around emptying the pool are
non-existent within the span of a human lifetime.

All the best,
Michael Brooks

https://stackoverflow.com/users/183528/rook
https://stackoverflow.com/tags/cryptography/topusers #9
https://stackoverflow.com/tags/security/topusers #4


2021-06-11 04:02:00

by Sandy Harris

[permalink] [raw]
Subject: Re: Lockless /dev/random - Performance/Security/Stability improvement

The basic ideas here look good to me; I will look at details later.
Meanwhile I wonder what others might think, so I've added some to cc
list.

One thing disturbs me, wanting to give more control to
"the user who should be free to choose their own security/performance tradeoff"

I doubt most users, or even sys admins, know enough to make such
choices. Yes, some options like the /dev/random vs /dev/urandom choice
can be given, but I'm not convinced even that is necessary. Our
objective should be to make the thing foolproof, incapable of being
messed up by user actions.

2021-06-11 06:00:38

by Stephan Müller

[permalink] [raw]
Subject: Re: Lockless /dev/random - Performance/Security/Stability improvement

Am Freitag, 11. Juni 2021, 05:59:52 CEST schrieb Sandy Harris:

Hi Sandy,

> The basic ideas here look good to me; I will look at details later.
> Meanwhile I wonder what others might think, so I've added some to cc
> list.
>
> One thing disturbs me, wanting to give more control to
> "the user who should be free to choose their own security/performance
> tradeoff"
>
> I doubt most users, or even sys admins, know enough to make such
> choices. Yes, some options like the /dev/random vs /dev/urandom choice
> can be given, but I'm not convinced even that is necessary. Our
> objective should be to make the thing foolproof, incapable of being
> messed up by user actions.

Thank you for your considerations.

I would think you are referring to the boottime/runtime configuration of the
entropy sources.

I think you are right that normal admins should not have the capability to
influence the entropy source configuration. Normal users would not be able to
do that anyway even today.

Yet, I am involved with many different system integrators who must
make quite an effort to adjust the operation to their needs these
days, including adding proprietary patches. System integrators
normally compile their own kernel, so I would see no problem in
changing the LRNG such that:

- the entropy source configuration is a compile-time-only setting with
the current default values

- the runtime configuration is only enabled with a compile-time option
that is clearly marked as a development/test option and not for
production use (like the other test interfaces). It would be disabled
by default. Note, I have developed a regression test suite to test the
LRNG operation and behavior; for this, such boottime/runtime settings
come in very handy.

Regular administrators would not recompile their kernel. Thus, Linux
distros would simply ship safe defaults by not enabling the test
interface. This implies that normal admins do not have the freedom to
make adjustments. Therefore, I think we would have what you propose:
foolproof operation. Yet, people who really need the freedom (as
otherwise they would make some other problematic changes) retain the
ability to alter the kernel compile-time configuration to suit their
needs.

Besides, the LRNG contains logic to verify the settings and guarantee
that wrong configurations cannot be applied even at compile time. The
term "wrong configuration" refers to configurations that would violate
mathematical constraints. Therefore, the offered flexibility still
ensures that such integrators cannot mess things up to the extent that
mathematically something goes wrong.


On the other hand, if you are referring to changing the cryptographic
algorithms used, I think all offered options are by definition safe:
all offer the same security strength. Configuring the cryptographic
algorithms is something I would suggest allowing administrators to do.
This is similar to changing the cryptographic algorithms for, say,
network communication, where the administrator is in charge of
configuring the allowed/used cipher suites.

Ciao
Stephan


2021-06-11 09:48:07

by Sandy Harris

[permalink] [raw]
Subject: Re: Lockless /dev/random - Performance/Security/Stability improvement

Sandy Harris <[email protected]> wrote:

> The basic ideas here look good to me; I will look at details later.

Looking now, finding some things questionable.

Your doc has:

" /dev/random needs to be fast, and in the past it relied on using a
cryptographic primitive for expansion of PNRG to fill a given request

" urandom on the other hand uses a cryptographic primitive to compact
rather than expand,

This does not seem coherent to me & as far as I can tell, it is wrong as well.
/dev/random neither uses a PRNG nor does expansion.
/dev/urandom does both, but you seem to be saying the opposite.

" We can assume AES preserves confidentiality...

That is a reasonable assumption & it does make the design easier, but
is it necessary? If I understood some of Ted's writing correctly, one
of his design goals was not to have to trust the crypto too much. It
seems to me that is a worthy goal. One of John Denker's papers has
some quite nice stuff about using a hash function to compress input
data while preserving entropy. It needs only quite weak assumptions
about the hash.
https://www.av8n.com/turbid/

You want to use AES in OFB mode. Why? The existing driver uses ChaCha,
I think mainly because it is faster.

The classic analysis of how to use a block cipher to build a hash is
Preneel et al.
https://link.springer.com/content/pdf/10.1007%2F3-540-48329-2_31.pdf
As I recall, it examines 64 possibilities & finds only 9 are secure. I
do not know if OFB, used as you propose, is one of those. Do you?

2021-06-11 09:54:58

by Stephan Müller

[permalink] [raw]
Subject: Re: Lockless /dev/random - Performance/Security/Stability improvement

Am Freitag, dem 11.06.2021 um 07:59 +0200 schrieb Stephan Müller:
> Am Freitag, 11. Juni 2021, 05:59:52 CEST schrieb Sandy Harris:
>
> Hi Sandy,

Hi Sandy,

Please accept my apologies and disregard my email. I erroneously
thought you were referring to the LRNG work.

I did not want to hijack any discussion.

(but it gave me a good input :-) )

Sorry
Stephan


2021-06-27 16:37:24

by Mike Brooks

[permalink] [raw]
Subject: Re: Lockless /dev/random - Performance/Security/Stability improvement

I apologize for my late reply - an erroneous email filter hid these
messages from me.

I'd like to take the time to thank you for all of your responses and
let me address any of the questions that were brought up on this
thread.

To reply to Sandy Harris: the code on GitHub is a suggested
/dev/random and /dev/urandom replacement, and it has some interesting
tradeoffs relative to the incumbent implementation. There is always
room for improvement, and I am exploring what it means to have a
lockless entropy store for Linux.

But you bring up a very important question: is AES required? Why not
use a hand-crafted routine to extract bits? There are two responses
to this. The first is evident from the use of dieharder: a naive
method of shifting bits to pull data out of the entropy pool will be
detected by the 'dieharder' test suite. The implementation on GitHub
will write a sample file, and any extractor that pulls data from the
entropy pool needs to exhibit the avalanche effect. We need a method
that is quantifiable, and one that is widely accepted as being ideal.
So could a fast but less-than-ideal hash function like SHA-1 be used?
Absolutely, but AES-NI is still faster on platforms that enable it...

And this is where Stephan Müller's post comes into play. Given the
diversity of hardware platforms and the varying levels of dependency
on /dev/random, there is a need for admins to tailor the performance
of this interface, as it impacts *every* syscall because each
interrupt is an entropy source. The base default should be good
enough for anyone; I agree strongly with your view on a
secure-by-default system.

Consider a high-frequency trading system: you don't want to penalize
any interrupt. It would be great to have a NIST-compliant random
number generator that makes this interrupt taxation optional.
Disabling it shouldn't cause anyone to be compromised, and if you are
paranoid, additional sources can be added as further hardening.

Ultimately, I'd like to make a patch file for this implementation and
run benchmarks to show the efficacy of the design. Thank you all for
your feedback, and I'm happy to consider alternate lockless designs.

Best regards,
Michael Brooks



2021-08-16 11:00:50

by Sandy Harris

[permalink] [raw]
Subject: Re: Lockless /dev/random - Performance/Security/Stability improvement

I wrote:

> > The basic ideas here look good to me; I will look at details later.
>
> Looking now, finding some things questionable.
> ...

I'm still looking & finding things that seem worth wondering about, &
asking about here.

I am by no means convinced that Mike's idea of a lockless driver is a
good one. Without some locks, if two or more parts of the code write
to the same data structure, then there's a danger one will overwrite
the other's contribution & we'll lose entropy.

However, I cannot see why any data structure should be locked when it
is only being read. There's no reason to care if others read it as
well. If someone writes to it, then the result of reading becomes
indeterminate. In most applications, that would be a very Bad Thing.
In this context, though, it is at worst harmless & possibly a Good
Thing because it would make some attacks harder.

For example, looking at the 5.8.9 kernel Ubuntu gives me, I find this
in extract_buf():

	/* Generate a hash across the pool, 16 words (512 bits) at a time */
	spin_lock_irqsave(&r->lock, flags);
	for (i = 0; i < r->poolinfo->poolwords; i += 16)
		sha1_transform(hash.w, (__u8 *)(r->pool + i), workspace);

	/*
	 * We mix the hash back into the pool ...
	 */
	__mix_pool_bytes(r, hash.w, sizeof(hash.w));
	spin_unlock_irqrestore(&r->lock, flags);

The lock is held throughout the fairly expensive hash operation & I
see no reason why it should be. I'd have:

	/* Generate a hash across the pool, 16 words (512 bits) at a time */
	for (i = 0; i < r->poolinfo->poolwords; i += 16)
		sha1_transform(hash.w, (__u8 *)(r->pool + i), workspace);

	/*
	 * We mix the hash back into the pool ...
	 */
	spin_lock_irqsave(&r->lock, flags);
	__mix_pool_bytes(r, hash.w, sizeof(hash.w));
	spin_unlock_irqrestore(&r->lock, flags);

2021-08-16 14:26:56

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Lockless /dev/random - Performance/Security/Stability improvement

On Mon, Aug 16, 2021 at 06:59:50PM +0800, Sandy Harris wrote:
> I am by no means convinced that Mike's idea of a lockless driver is a
> good one. Without some locks, if two or more parts of the code write
> to the same data structure, then there's a danger one will overwrite
> the other's contribution & we'll lose entropy.
>
> However, I cannot see why any data structure should be locked when it
> is only being read. There's no reason to care if others read it as
> well. If someone writes to it, then the result of reading becomes
> indeterminate. In most applications, that would be a very Bad Thing.
> In this context, though, it is at worst harmless & possibly a Good
> Thing because it would make some attacks harder.
>
> For example, looking at the 5.8.9 kernel Ubuntu gives me, I find this
> in extract_buf()
>
> /* Generate a hash across the pool, 16 words (512 bits) at a time */
> spin_lock_irqsave(&r->lock, flags);
> for (i = 0; i < r->poolinfo->poolwords; i += 16)
> sha1_transform(hash.w, (__u8 *)(r->pool + i), workspace);
>
> /*
> * We mix the hash back into the pool ...
> */
> __mix_pool_bytes(r, hash.w, sizeof(hash.w));
> spin_unlock_irqrestore(&r->lock, flags);
>
> The lock is held throughout the fairly expensive hash operation & I
> see no reason why it should be.

The reason why this is there is because otherwise, there can be two
processes both trying to extract entropy from the pool, and getting the
same result, and returning the identical "randomness" to two different
userspace processes. Which would be sad.... (unless you are a
nation-state attacker, I suppose :-)

Cheers,

- Ted

2022-01-27 07:28:41

by Sandy Harris

[permalink] [raw]
Subject: Re: Lockless /dev/random - Performance/Security/Stability improvement

On Mon, Aug 16, 2021 at 10:25 PM Theodore Ts'o <[email protected]> wrote:

> On Mon, Aug 16, 2021 at 06:59:50PM +0800, Sandy Harris wrote:

> > I am by no means convinced that Mike's idea of a lockless driver is a
> > good one. Without some locks, if two or more parts of the code write
> > to the same data structure, then there's a danger one will overwrite
> > the other's contribution & we'll lose entropy.
> >
> > However, I cannot see why any data structure should be locked when it
> > is only being read. There's no reason to care if others read it as
> > well. If someone writes to it, then the result of reading becomes
> > indeterminate. In most applications, that would be a very Bad Thing.
> > In this context, though, it is at worst harmless & possibly a Good
> > Thing because it would make some attacks harder.
> >
> > For example, looking at the 5.8.9 kernel Ubuntu gives me, I find this
> > in extract_buf()
> >
> > /* Generate a hash across the pool, 16 words (512 bits) at a time */
> > spin_lock_irqsave(&r->lock, flags);
> > for (i = 0; i < r->poolinfo->poolwords; i += 16)
> > sha1_transform(hash.w, (__u8 *)(r->pool + i), workspace);
> >
> > /*
> > * We mix the hash back into the pool ...
> > */
> > __mix_pool_bytes(r, hash.w, sizeof(hash.w));
> > spin_unlock_irqrestore(&r->lock, flags);
> >
> > The lock is held throughout the fairly expensive hash operation & I
> > see no reason why it should be.
>
> The reason why this is there is because otherwise, there can be two
> processes both trying to extract entropy from the pool, and getting the
> same result, and returning the identical "randomness" to two different
> userspace processes. Which would be sad.... (unless you are a
> nation-state attacker, I suppose :-)

So lock the chacha context, the data structure it writes, instead.
Since the write back to the pool is inside that lock, two threads
extracting entropy will get different results. Call mix_pool_bytes()
instead of __mix_pool_bytes() and it will do the necessary
locking when the input pool is written.

I think this deals with Mike's main objection:

" It is highly unusual that /dev/random is allowed to degrade the
" performance of all other subsystems - and even bring the
" system to a halt when it runs dry.

At any rate, it reduces the chance of other code that
writes to the input pool being blocked by random(4).