The SLAB allocators got a CONFIG_SLAB_FREELIST_RANDOM option, which randomizes
the allocation pattern inside a slab:
#ifdef CONFIG_SLAB_FREELIST_RANDOM
/* Pre-initialize the random sequence cache */
static int init_cache_random_seq(struct kmem_cache *s)
{
...
Then I printed actual random sequences for each kmem cache.
Turned out they were all the same for most of the caches and
they didn't vary across guest reboots.
int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count, gfp_t gfp)
{
...
/* Get best entropy at this stage of boot */
prandom_seed_state(&state, get_random_long());
Then I searched the internet, and it turned out KVM can pass randomness via
virtio-rng or something. So I hooked it up to /dev/urandom.
And it didn't help!
The only way to get randomness for SLAB is to enable RDRAND inside the guest.
Is it a KVM bug?
For the record I'm using qemu 2.11.1-r2 and whatever F27 ships now.
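
The observation above (identical sequences on every reboot) can be reproduced with a minimal userspace sketch. It assumes a toy xorshift generator as a stand-in for the kernel's prandom_* state; toy_seed(), toy_next() and toy_freelist_shuffle() are hypothetical names for illustration, not kernel functions, and the shuffle is not the kernel's exact algorithm. The only point is that a seeded PRNG shuffle is a pure function of its seed: if the seed is the same on every boot, so is the freelist order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a seeded PRNG state; NOT the kernel's prandom code. */
struct toy_rnd_state {
	uint64_t s;
};

static void toy_seed(struct toy_rnd_state *st, uint64_t seed)
{
	/* xorshift must not start from an all-zero state */
	st->s = seed ? seed : 0x9e3779b97f4a7c15ULL;
}

static uint32_t toy_next(struct toy_rnd_state *st)
{
	uint64_t x = st->s;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	st->s = x;
	return (uint32_t)(x >> 32);
}

/*
 * Fill list[0..count) with 0..count-1, then Fisher-Yates shuffle it
 * using only the seeded toy generator. The modulo introduces a small
 * bias, which is ignored in this sketch.
 */
static void toy_freelist_shuffle(unsigned int *list, unsigned int count,
				 uint64_t seed)
{
	struct toy_rnd_state st;
	unsigned int i;

	assert(list != NULL);
	if (count == 0)
		return;

	toy_seed(&st, seed);
	for (i = 0; i < count; i++)
		list[i] = i;

	for (i = count - 1; i > 0; i--) {
		unsigned int j = toy_next(&st) % (i + 1);
		unsigned int tmp = list[i];

		list[i] = list[j];
		list[j] = tmp;
	}
}
```

Calling toy_freelist_shuffle() twice with the same seed yields the same permutation, which matches what was seen when the early-boot seed does not change between guest boots.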
On Sat, Apr 14, 2018 at 12:59 PM, Alexey Dobriyan <[email protected]> wrote:
> SLAB allocators got CONFIG_SLAB_FREELIST_RANDOM option which randomizes
> allocation pattern inside a slab:
>
>
> #ifdef CONFIG_SLAB_FREELIST_RANDOM
> /* Pre-initialize the random sequence cache */
> static int init_cache_random_seq(struct kmem_cache *s)
> {
> ...
>
> Then I printed actual random sequences for each kmem cache.
> Turned out they were all the same for most of the caches and
> they didn't vary across guest reboots.
>
> int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count, gfp_t gfp)
> {
> ...
> /* Get best entropy at this stage of boot */
> prandom_seed_state(&state, get_random_long());
>
> Then I searched internet and turned out KVM can pass randomness via
> virtio-rng or something. So I linked /dev/urandom.
>
> And it didn't help!
>
> The only way to get randomness for SLAB is to enable RDRAND inside guest.
>
> Is it KVM bug?
>
> For the record I'm using qemu 2.11.1-r2 and whatever F27 ships now.
virtio-rng doesn't really do that. I have an ancient patch set to do
exactly what you want, and I should dust it off.
[email protected]
[email protected], [email protected] moved to bcc
On Sat, Apr 14, 2018 at 10:59:21PM +0300, Alexey Dobriyan wrote:
> SLAB allocators got CONFIG_SLAB_FREELIST_RANDOM option which randomizes
> allocation pattern inside a slab:
>
> int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count, gfp_t gfp)
> {
> ...
> /* Get best entropy at this stage of boot */
> prandom_seed_state(&state, get_random_long());
>
> Then I printed actual random sequences for each kmem cache.
> Turned out they were all the same for most of the caches and
> they didn't vary across guest reboots.
The problem is that at the super-early stage of the boot path, kernel code
can't allocate memory. This is something most device drivers kinda
assume they can do. :-)
So it means we haven't yet initialized the virtio-rng driver, and it's
before interrupts have been enabled, so we can't harvest any entropy
from interrupt timing. So that's why trying to use virtio-rng didn't
help.
> The only way to get randomness for SLAB is to enable RDRAND inside guest.
>
> Is it KVM bug?
No, it's not a KVM bug. The fundamental issue is in how the
CONFIG_SLAB_FREELIST_RANDOM is currently implemented.
What needs to happen is freelist should get randomized much later in
the boot sequence. Doing it later will require locking; I don't know
enough about the slab/slub code to know whether the slab_mutex would
be sufficient, or some other lock might need to be added.
The other thing I would note is that using prandom_u32_state() doesn't
really provide much security. In fact, if the goal is to protect
against a malicious attacker trying to guess what addresses will be
returned by the slab allocator, I suspect it's much like the security
patdowns done at airports. It might protect against a really stupid
attacker, but it's mostly security theater.
The freelist randomization is only being done once; so it's not like
performance is really an issue. It would be much better to just use
get_random_u32() and be done with it. I'd drop using prandom_*
functions in slab.c and slub.c and slab_common.c, and just use a
really random number generator, if the goal is real security as
opposed to security for show....
(Not that there's necessarily anything wrong with security theater;
the US spends over 3 billion dollars a year on security theater. As
politicians know, symbolism can be important. :-)
Cheers,
- Ted
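
As a userspace analogue of the suggestion above (draw every index from the real RNG instead of a once-seeded prandom state), here is a minimal sketch for Linux. It uses getrandom(2) as a stand-in for get_random_u32(); os_random_u32(), os_random_below() and os_freelist_shuffle() are hypothetical helper names, and none of this is kernel code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Fetch 32 bits from the OS RNG. Returns 0 on success, -1 on error. */
static int os_random_u32(uint32_t *out)
{
	unsigned char *p = (unsigned char *)out;
	size_t got = 0;

	while (got < sizeof(*out)) {
		ssize_t n = getrandom(p + got, sizeof(*out) - got, 0);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		got += (size_t)n;
	}
	return 0;
}

/* Uniform value in [0, bound), bound > 0, via rejection sampling. */
static int os_random_below(uint32_t bound, uint32_t *out)
{
	/* 2^32 mod bound: values below this would bias the result */
	uint32_t threshold = (0u - bound) % bound;
	uint32_t r;

	do {
		if (os_random_u32(&r))
			return -1;
	} while (r < threshold);

	*out = r % bound;
	return 0;
}

/* Fisher-Yates shuffle of 0..count-1, one RNG draw per swap. */
static int os_freelist_shuffle(unsigned int *list, unsigned int count)
{
	unsigned int i;

	if (count == 0)
		return 0;
	for (i = 0; i < count; i++)
		list[i] = i;

	for (i = count - 1; i > 0; i--) {
		uint32_t j;
		unsigned int tmp;

		if (os_random_below(i + 1, &j))
			return -1;
		tmp = list[i];
		list[i] = list[j];
		list[j] = tmp;
	}
	return 0;
}
```

Since the shuffle runs once per cache, the per-draw cost is unlikely to matter. A stronger generator only helps if the underlying pool actually has entropy at that point, though.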
On Sat, Apr 14, 2018 at 03:41:42PM -0700, Andy Lutomirski wrote:
> On Sat, Apr 14, 2018 at 12:59 PM, Alexey Dobriyan <[email protected]> wrote:
> > SLAB allocators got CONFIG_SLAB_FREELIST_RANDOM option which randomizes
> > allocation pattern inside a slab:
> >
> >
> > #ifdef CONFIG_SLAB_FREELIST_RANDOM
> > /* Pre-initialize the random sequence cache */
> > static int init_cache_random_seq(struct kmem_cache *s)
> > {
> > ...
> >
> > Then I printed actual random sequences for each kmem cache.
> > Turned out they were all the same for most of the caches and
> > they didn't vary across guest reboots.
> >
> > int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count, gfp_t gfp)
> > {
> > ...
> > /* Get best entropy at this stage of boot */
> > prandom_seed_state(&state, get_random_long());
> >
> > Then I searched internet and turned out KVM can pass randomness via
> > virtio-rng or something. So I linked /dev/urandom.
> >
> > And it didn't help!
> >
> > The only way to get randomness for SLAB is to enable RDRAND inside guest.
> >
> > Is it KVM bug?
> >
> > For the record I'm using qemu 2.11.1-r2 and whatever F27 ships now.
>
> virtio-rng doesn't really do that. I have an ancient patch set to do
> exactly what you want, and I should dust it off.
Please, do. Here is a list of caches which aren't exactly randomly
randomized with my setup. Many important ones are there :-(
XXX name 'dma-kmalloc-96', r b1e6718e2e7147d4
XXX name 'dma-kmalloc-192', r a7664a0d69968019
XXX name 'dma-kmalloc-8', r 662c2e986443235c
XXX name 'dma-kmalloc-16', r 770a9b620ae4cd62
XXX name 'dma-kmalloc-32', r 2e200073d5fa9f46
XXX name 'dma-kmalloc-64', r d8538fda83c74168
XXX name 'dma-kmalloc-128', r 9e4b956d09dd7d44
XXX name 'dma-kmalloc-256', r 8b14bcb58f9e18f5
XXX name 'dma-kmalloc-512', r 2bbace4b7120624a
XXX name 'dma-kmalloc-1024', r 7cdf44406db52f5b
XXX name 'dma-kmalloc-2048', r 18fe0ebf6bcfdf43
XXX name 'dma-kmalloc-4096', r 9f1a5eee118facf7
XXX name 'dma-kmalloc-8192', r f514d72a1cc441a2
XXX name 'kmalloc-8192', r 14843df817b556cc
XXX name 'kmalloc-4096', r 52ed85fa9c691bbe
XXX name 'kmalloc-2048', r fa81aa9222ff65a7
XXX name 'kmalloc-1024', r ae355c02d31f21d3
XXX name 'kmalloc-512', r 5fe0d22aaf2ef8d9
XXX name 'kmalloc-256', r 336d07a06917b95
XXX name 'kmalloc-192', r 6b6cd5399dd06d95
XXX name 'kmalloc-128', r 893b9e85369964ab
XXX name 'kmalloc-96', r 179e185395d2612
XXX name 'kmalloc-64', r 29cf688b37eccea7
XXX name 'kmalloc-32', r fb7b4e7dca6de00a
XXX name 'kmalloc-16', r a2a441fdc499d0c7
XXX name 'kmalloc-8', r e5454c7095ddd2be
XXX name 'kmem_cache_node', r 500dc6126a47b229
XXX name 'kmem_cache', r 816c8c7bcde08372
XXX name 'task_group', r c09c4d1c1436ce97
XXX name 'radix_tree_node', r 4dd9540b830a4ea8
XXX name 'pool_workqueue', r 88b1e9d9a1f0b570
XXX name 'Acpi-Namespace', r 3e34d55f8f1cb140
XXX name 'Acpi-State', r b94e04635e77b48a
XXX name 'Acpi-Parse', r d5374863b90f2a4c
XXX name 'Acpi-ParseExt', r eefb2fff892f64a9
XXX name 'Acpi-Operand', r ce51949bcc80af13
XXX name 'pid', r cd6d8ee9e5209156
XXX name 'anon_vma', r c3a9273a68127ac7
XXX name 'anon_vma_chain', r a7cec15033c31a9b
XXX name 'cred_jar', r fe4cc38c6d99cf63
XXX name 'task_struct', r eecb8895c6b7dbdb
XXX name 'sighand_cache', r e5243c5eb2ce3a63
XXX name 'signal_cache', r 88b2e108d8ef81c7
XXX name 'files_cache', r ee29814e58dc909c
XXX name 'fs_cache', r bc700a5f8fc28ff8
XXX name 'mm_struct', r f5230f99c7447359
XXX name 'vm_area_struct', r e30f3f8e648a9f88
XXX name 'nsproxy', r ae7c08b524a0f4d4
XXX name 'uts_namespace', r 6b1266178968ed99
XXX name 'buffer_head', r b24c10679dc55a11
XXX name 'names_cache', r 2e023b54e3ca5b8f
XXX name 'dentry', r 83cc18634fbd74e8
XXX name 'inode_cache', r ff9a0ff3b4665cf5
XXX name 'filp', r 4fdad214b7ca7fc1
XXX name 'mnt_cache', r 8e726d32470b23e0
XXX name 'kernfs_node_cache', r 929c5f56778d365d
XXX name 'bdev_cache', r 8a5520036bd0a464
XXX name 'sigqueue', r 2cf75c4d16191efb
XXX name 'seq_file', r ec3ba1fe514524d5
XXX name 'proc_inode_cache', r b0c76cbbda5bb41f
XXX name 'pde_opener', r 5f82f8e7100a517c
XXX name 'proc_dir_entry', r ebabc4e93b52d7b8
XXX name 'shmem_inode_cache', r 2b25a3eb9aa32973
XXX name 'net_namespace', r 95793a7eae08a33f
On Sat, Apr 14, 2018 at 06:44:19PM -0400, Theodore Y. Ts'o wrote:
> What needs to happen is freelist should get randomized much later in
> the boot sequence. Doing it later will require locking; I don't know
> enough about the slab/slub code to know whether the slab_mutex would
> be sufficient, or some other lock might need to be added.
Could we have the bootloader pass in some initial randomness?
On Sat, Apr 14, 2018 at 3:44 PM, Theodore Y. Ts'o <[email protected]> wrote:
> [email protected]
> [email protected], [email protected] moved to bcc
>
> On Sat, Apr 14, 2018 at 10:59:21PM +0300, Alexey Dobriyan wrote:
>> SLAB allocators got CONFIG_SLAB_FREELIST_RANDOM option which randomizes
>> allocation pattern inside a slab:
>>
>> int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count, gfp_t gfp)
>> {
>> ...
>> /* Get best entropy at this stage of boot */
>> prandom_seed_state(&state, get_random_long());
>>
>> Then I printed actual random sequences for each kmem cache.
>> Turned out they were all the same for most of the caches and
>> they didn't vary across guest reboots.
>
> The problem is at the super-early state of the boot path, kernel code
> can't allocate memory. This is something most device drivers kinda
> assume they can do. :-)
>
> So it means we haven't yet initialized the virtio-rng driver, and it's
> before interrupts have been enabled, so we can't harvest any entropy
> from interrupt timing. So that's why trying to use virtio-rng didn't
> help.
>
>> The only way to get randomness for SLAB is to enable RDRAND inside guest.
>>
>> Is it KVM bug?
>
> No, it's not a KVM bug. The fundamental issue is in how the
> CONFIG_SLAB_FREELIST_RANDOM is currently implemented.
>
> What needs to happen is freelist should get randomized much later in
> the boot sequence. Doing it later will require locking; I don't know
> enough about the slab/slub code to know whether the slab_mutex would
> be sufficient, or some other lock might need to be added.
>
> The other thing I would note is that using prandom_u32_state() doesn't
> really provide much security. In fact, if the goal is to protect
> against a malicious attacker trying to guess what addresses will be
> returned by the slab allocator, I suspect it's much like the security
> patdowns done at airports. It might protect against a really stupid
> attacker, but it's mostly security theater.
>
> The freelist randomization is only being done once; so it's not like
> performance is really an issue. It would be much better to just use
> get_random_u32() and be done with it. I'd drop using prandom_*
> functions in slab.c and slub.c and slab_common.c, and just use a
> really random number generator, if the goal is real security as
> opposed to security for show....
>
> (Not that there's necessarily anything wrong with security theater;
> the US spends over 3 billion dollars a year on security theater. As
> politicians know, symbolism can be important. :-)
I've added Thomas Garnier to CC (since he wrote this originally). He
can speak to its position in the boot ordering and the effective
entropy.
-Kees
--
Kees Cook
Pixel Security
On Mon, Apr 16, 2018 at 8:54 AM Kees Cook <[email protected]> wrote:
> On Sat, Apr 14, 2018 at 3:44 PM, Theodore Y. Ts'o <[email protected]> wrote:
> > [email protected]
> > [email protected], [email protected] moved to bcc
> >
> > On Sat, Apr 14, 2018 at 10:59:21PM +0300, Alexey Dobriyan wrote:
> >> SLAB allocators got CONFIG_SLAB_FREELIST_RANDOM option which randomizes
> >> allocation pattern inside a slab:
> >>
> > >> int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count, gfp_t gfp)
> >> {
> >> ...
> >> /* Get best entropy at this stage of boot */
> >> prandom_seed_state(&state, get_random_long());
> >>
> >> Then I printed actual random sequences for each kmem cache.
> >> Turned out they were all the same for most of the caches and
> >> they didn't vary across guest reboots.
> >
> > The problem is at the super-early state of the boot path, kernel code
> > can't allocate memory. This is something most device drivers kinda
> > assume they can do. :-)
> >
> > So it means we haven't yet initialized the virtio-rng driver, and it's
> > before interrupts have been enabled, so we can't harvest any entropy
> > from interrupt timing. So that's why trying to use virtio-rng didn't
> > help.
> >
> > >> The only way to get randomness for SLAB is to enable RDRAND inside guest.
> >>
> >> Is it KVM bug?
> >
> > No, it's not a KVM bug. The fundamental issue is in how the
> > CONFIG_SLAB_FREELIST_RANDOM is currently implemented.
Entropy at early boot in a VM has always been a problem for this feature and
others. Did you look at the impact on other boot security features fetching
random values? Does your VM have RDRAND support (we use get_random_long(),
which will fetch from RDRAND to provide as much entropy as possible at this
point)?
> >
> > What needs to happen is freelist should get randomized much later in
> > the boot sequence. Doing it later will require locking; I don't know
> > enough about the slab/slub code to know whether the slab_mutex would
> > be sufficient, or some other lock might need to be added.
You can't re-randomize pre-allocated pages, which is why the cache is
randomized that early. If you don't have RDRAND, we could re-randomize
later at boot with more entropy; that could be useful in this specific case.
> >
> > > The other thing I would note is that using prandom_u32_state() doesn't
> > > really provide much security. In fact, if the goal is to protect
> > against a malicious attacker trying to guess what addresses will be
> > returned by the slab allocator, I suspect it's much like the security
> > patdowns done at airports. It might protect against a really stupid
> > attacker, but it's mostly security theater.
> >
> > The freelist randomization is only being done once; so it's not like
> > performance is really an issue. It would be much better to just use
> > get_random_u32() and be done with it. I'd drop using prandom_*
> > > functions in slab.c and slub.c and slab_common.c, and just use a
> > really random number generator, if the goal is real security as
> > opposed to security for show....
The state is seeded with get_random_long(), which will use RDRAND and any
available entropy at this point. I am not sure of the value of calling
get_random_long() on each iteration, especially if you don't have RDRAND.
> >
> > > (Not that there's necessarily anything wrong with security theater;
> > the US spends over 3 billion dollars a year on security theater. As
> > politicians know, symbolism can be important. :-)
> I've added Thomas Garnier to CC (since he wrote this originally). He
> can speak to its position in the boot ordering and the effective
> entropy.
Thanks for including me.
> -Kees
> --
> Kees Cook
> Pixel Security
--
Thomas
On Mon, Apr 16, 2018 at 04:15:44PM +0000, Thomas Garnier wrote:
> On Mon, Apr 16, 2018 at 8:54 AM Kees Cook <[email protected]> wrote:
>
> > On Sat, Apr 14, 2018 at 3:44 PM, Theodore Y. Ts'o <[email protected]> wrote:
> > > [email protected]
> > > [email protected], [email protected] moved to bcc
> > >
> > > On Sat, Apr 14, 2018 at 10:59:21PM +0300, Alexey Dobriyan wrote:
> > >> SLAB allocators got CONFIG_SLAB_FREELIST_RANDOM option which randomizes
> > >> allocation pattern inside a slab:
> > >>
> > > >> int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count, gfp_t gfp)
> > >> {
> > >> ...
> > >> /* Get best entropy at this stage of boot */
> > >> prandom_seed_state(&state, get_random_long());
> > >>
> > >> Then I printed actual random sequences for each kmem cache.
> > >> Turned out they were all the same for most of the caches and
> > >> they didn't vary across guest reboots.
> > >
> > > The problem is at the super-early state of the boot path, kernel code
> > > can't allocate memory. This is something most device drivers kinda
> > > assume they can do. :-)
> > >
> > > So it means we haven't yet initialized the virtio-rng driver, and it's
> > > before interrupts have been enabled, so we can't harvest any entropy
> > > from interrupt timing. So that's why trying to use virtio-rng didn't
> > > help.
> > >
> > > >> The only way to get randomness for SLAB is to enable RDRAND inside guest.
> > >>
> > >> Is it KVM bug?
> > >
> > > No, it's not a KVM bug. The fundamental issue is in how the
> > > CONFIG_SLAB_FREELIST_RANDOM is currently implemented.
>
> Entropy at early boot in VM has always been a problem for this feature or
> others. Did you look at the impact on other boot security features fetching
> random values? Does your VM had RDRAND support (we use get_random_long()
> which will fetch from RDRAND to provide as much entropy as possible at this
> point)?
The problem is that "qemu-system-x86_64" by default doesn't use RDRAND nor
does it use entropy from the host to bootstrap. You need "-cpu host" or
equivalent.
Given that DMI strings act as a seed and the core kernel caches are created
in a fixed order, those SLAB randomization sequences may be globally
the same (I didn't check) or drawn from a small set.
And of course there will be users who don't use RDRAND because they
consider it an NSA backdoor.
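
Whether the guest CPU actually advertises RDRAND can be checked from inside the guest. The following is a minimal sketch, assuming GCC or Clang (for <cpuid.h>) on x86; cpu_has_rdrand() is a hypothetical helper name. It reads CPUID leaf 1 and tests ECX bit 30, which is the RDRAND feature flag.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/*
 * Returns 1 if CPUID reports RDRAND, 0 if it does not, and -1 if the
 * answer is unknown (non-x86 build, or CPUID leaf 1 not available).
 */
static int cpu_has_rdrand(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return -1;

	/* CPUID.01H:ECX bit 30 is the RDRAND feature flag */
	return (int)((ecx >> 30) & 1u);
#else
	return -1;
#endif
}
```

Comparing the result in a guest started with and without "-cpu host" shows whether the flag is being passed through.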
On Sat, 2018-04-14 at 17:41 -0700, Matthew Wilcox wrote:
> On Sat, Apr 14, 2018 at 06:44:19PM -0400, Theodore Y. Ts'o wrote:
> > What needs to happen is freelist should get randomized much later
> > in the boot sequence. Doing it later will require locking; I don't
> > know enough about the slab/slub code to know whether the slab_mutex
> > would be sufficient, or some other lock might need to be added.
>
> Could we have the bootloader pass in some initial randomness?
Where would the bootloader get it from (securely) that the kernel
can't? For example, if you compile in a TPM driver, the kernel will
pick up 32 random entropy bytes from the TPM to seed the pool, but I
think it happens too late to help with this problem currently. IMA
also needs the TPM very early in the boot sequence, so I was wondering
about using the initial EFI driver, which is present on boot, and then
transitioning to the proper kernel TPM driver later, which would mean
we could seed the pool earlier.
As long as you mix it properly and limit the amount, it shouldn't
necessarily be a source of actual compromise, but having an external
input to our cryptographically secure entropy pool is an additional
potential attack vector.
James
On Tue, Apr 17, 2018 at 10:13:34AM +0100, James Bottomley wrote:
> On Sat, 2018-04-14 at 17:41 -0700, Matthew Wilcox wrote:
> > On Sat, Apr 14, 2018 at 06:44:19PM -0400, Theodore Y. Ts'o wrote:
> > > What needs to happen is freelist should get randomized much later
> > > in the boot sequence. Doing it later will require locking; I don't
> > > know enough about the slab/slub code to know whether the slab_mutex
> > > would be sufficient, or some other lock might need to be added.
> >
> > Could we have the bootloader pass in some initial randomness?
>
> Where would the bootloader get it from (securely) that the kernel
> can't?
In this particular case, qemu is booting the kernel, so it can apply to
/dev/random for some entropy.
> For example, if you compile in a TPM driver, the kernel will
> pick up 32 random entropy bytes from the TPM to seed the pool, but I
> think it happens too late to help with this problem currently. IMA
> also needs the TPM very early in the boot sequence, so I was wondering
> about using the initial EFI driver, which is present on boot, and then
> transitioning to the proper kernel TPM driver later, which would mean
> we could seed the pool earlier.
>
> As long as you mix it properly and limit the amount, it shouldn't
> necessarily be a source of actual compromise, but having an external
> input to our cryptographically secure entropy pool is an additional
> potential attack vector.
I thought our model was that if somebody had compromised the bootloader,
all bets were off. And also that we were free to mix in as many
untrustworthy bytes of alleged entropy into the random pool as we liked.
On Tue, 2018-04-17 at 04:47 -0700, Matthew Wilcox wrote:
> On Tue, Apr 17, 2018 at 10:13:34AM +0100, James Bottomley wrote:
> > On Sat, 2018-04-14 at 17:41 -0700, Matthew Wilcox wrote:
> > > On Sat, Apr 14, 2018 at 06:44:19PM -0400, Theodore Y. Ts'o wrote:
> > > > What needs to happen is freelist should get randomized much
> > > > later in the boot sequence. Doing it later will require
> > > > locking; I don't know enough about the slab/slub code to know
> > > > whether the slab_mutex would be sufficient, or some other lock
> > > > might need to be added.
> > >
> > > Could we have the bootloader pass in some initial randomness?
> >
> > Where would the bootloader get it from (securely) that the kernel
> > can't?
>
> In this particular case, qemu is booting the kernel, so it can apply
> to /dev/random for some entropy.
Well, yes, but wouldn't qemu virtualize /dev/random anyway so the guest
kernel can get it from the HWRNG provided by qemu?
> > For example, if you compile in a TPM driver, the kernel will
> > pick up 32 random entropy bytes from the TPM to seed the pool, but
> > I think it happens too late to help with this problem
> > currently. IMA also needs the TPM very early in the boot sequence,
> > so I was wondering about using the initial EFI driver, which is
> > present on boot, and then transitioning to the proper kernel TPM
> > driver later, which would mean we could seed the pool earlier.
> >
> > As long as you mix it properly and limit the amount, it shouldn't
> > necessarily be a source of actual compromise, but having an
> > external input to our cryptographically secure entropy pool is an
> > additional potential attack vector.
>
> I thought our model was that if somebody had compromised the
> bootloader, all bets were off.
You don't have to compromise the bootloader to influence this, you
merely have to trick it into providing the random number you wanted.
The bigger you make the attack surface (the more inputs) the more
likelihood of finding a trick that works.
> And also that we were free to mix in as many untrustworthy bytes of
> alleged entropy into the random pool as we liked.
No, entropy mixing ensures that all you do with bad entropy is degrade
the quality, but if the quality degrades to zero (as it might at boot
when you've no other entropy sources so you feed in 100% bad entropy),
then the random sequences become predictable.
James
On Tue, Apr 17, 2018 at 12:57:12PM +0100, James Bottomley wrote:
> On Tue, 2018-04-17 at 04:47 -0700, Matthew Wilcox wrote:
> > On Tue, Apr 17, 2018 at 10:13:34AM +0100, James Bottomley wrote:
> > > On Sat, 2018-04-14 at 17:41 -0700, Matthew Wilcox wrote:
> > > > On Sat, Apr 14, 2018 at 06:44:19PM -0400, Theodore Y. Ts'o wrote:
> > > > > What needs to happen is freelist should get randomized much
> > > > > later in the boot sequence. Doing it later will require
> > > > > locking; I don't know enough about the slab/slub code to know
> > > > > whether the slab_mutex would be sufficient, or some other lock
> > > > > might need to be added.
> > > >
> > > > Could we have the bootloader pass in some initial randomness?
> > >
> > > Where would the bootloader get it from (securely) that the kernel
> > > can't?
> >
> > In this particular case, qemu is booting the kernel, so it can apply
> > to /dev/random for some entropy.
>
> Well, yes, but wouldn't qemu virtualize /dev/random anyway so the guest
> kernel can get it from the HWRNG provided by qemu?
The part of Ted's mail that I snipped explained that virtio-rng relies on
being able to kmalloc memory, so by definition it can't provide entropy
before kmalloc is initialised.
> > I thought our model was that if somebody had compromised the
> > bootloader, all bets were off.
>
> You don't have to compromise the bootloader to influence this, you
> merely have to trick it into providing the random number you wanted.
> The bigger you make the attack surface (the more inputs) the more
> likelihood of finding a trick that works.
>
> > And also that we were free to mix in as many untrustworthy bytes of
> > alleged entropy into the random pool as we liked.
>
> No, entropy mixing ensures that all you do with bad entropy is degrade
> the quality, but if the quality degrades to zero (as it might at boot
> when you've no other entropy sources so you feed in 100% bad entropy),
> then the random sequences become predictable.
I don't understand that. If I estimate that I have 'k' bytes of entropy
in my pool, and then I mix in 'n' entirely predictable bytes, I should
still have k bytes of entropy in the pool. If I withdraw k bytes from
the pool, then yes the future output from the pool may be entirely
predictable, but I have to know what those k bytes were.
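
The counting argument above can be checked on a toy pool. This is a sketch under a stated assumption: the mix step is a bijection on the pool state for any fixed input (XOR, rotate, multiply by an odd constant). toy_mix() and toy_count_states_after_mix() are hypothetical names, and this is not the kernel's mixing function. If the attacker knows every mixed-in byte but not the 16-bit starting state, the number of possible states after mixing is still 2^16.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One invertible mix step on a 16-bit toy pool. */
static uint16_t toy_mix(uint16_t pool, uint8_t in)
{
	pool ^= in;
	/* rotate left by 5 */
	pool = (uint16_t)((pool << 5) | (pool >> 11));
	/* an odd multiplier is invertible modulo 2^16 */
	pool = (uint16_t)(pool * 40503u);
	return pool;
}

/*
 * Mix the same n known bytes into every possible starting state and
 * count how many distinct final states result.
 */
static unsigned int toy_count_states_after_mix(const uint8_t *known, size_t n)
{
	static unsigned char seen[65536];
	unsigned int distinct = 0;
	uint32_t start;

	memset(seen, 0, sizeof(seen));
	for (start = 0; start < 65536; start++) {
		uint16_t pool = (uint16_t)start;
		size_t i;

		for (i = 0; i < n; i++)
			pool = toy_mix(pool, known[i]);
		if (!seen[pool]) {
			seen[pool] = 1;
			distinct++;
		}
	}
	return distinct;
}
```

The count stays at 65536 for any known input, so predictable bytes do not remove uncertainty that was already there. The flip side is that they do not add any either: a pool that starts from a single known state ends in a single known state.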
On Tue, 2018-04-17 at 07:07 -0700, Matthew Wilcox wrote:
> On Tue, Apr 17, 2018 at 12:57:12PM +0100, James Bottomley wrote:
> > On Tue, 2018-04-17 at 04:47 -0700, Matthew Wilcox wrote:
> > > On Tue, Apr 17, 2018 at 10:13:34AM +0100, James Bottomley wrote:
> > > > On Sat, 2018-04-14 at 17:41 -0700, Matthew Wilcox wrote:
> > > > > On Sat, Apr 14, 2018 at 06:44:19PM -0400, Theodore Y. Ts'o
> > > > > wrote:
> > > > > > What needs to happen is freelist should get randomized much
> > > > > > later in the boot sequence. Doing it later will require
> > > > > > locking; I don't know enough about the slab/slub code to
> > > > > > know whether the slab_mutex would be sufficient, or some
> > > > > > other lock might need to be added.
> > > > >
> > > > > Could we have the bootloader pass in some initial randomness?
> > > >
> > > > Where would the bootloader get it from (securely) that the
> > > > kernel can't?
> > >
> > > In this particular case, qemu is booting the kernel, so it can
> > > apply to /dev/random for some entropy.
> >
> > Well, yes, but wouldn't qemu virtualize /dev/random anyway so the
> > guest kernel can get it from the HWRNG provided by qemu?
>
> The part of Ted's mail that I snipped explained that virtio-rng
> relies on being able to kmalloc memory, so by definition it can't
> provide entropy before kmalloc is initialised.
That sounds fixable ...
> > > I thought our model was that if somebody had compromised the
> > > bootloader, all bets were off.
> >
> > You don't have to compromise the bootloader to influence this, you
> > merely have to trick it into providing the random number you
> > wanted. The bigger you make the attack surface (the more inputs)
> > the more likelihood of finding a trick that works.
> >
> > > And also that we were free to mix in as many untrustworthy
> > > bytes of alleged entropy into the random pool as we liked.
> >
> > No, entropy mixing ensures that all you do with bad entropy is
> > degrade the quality, but if the quality degrades to zero (as it
> > might at boot when you've no other entropy sources so you feed in
> > 100% bad entropy), then the random sequences become predictable.
>
> I don't understand that. If I estimate that I have 'k' bytes of
> entropy in my pool, and then I mix in 'n' entirely predictable bytes,
> I should still have k bytes of entropy in the pool. If I withdraw k
> bytes from the pool, then yes the future output from the pool may be
> entirely predictable, but I have to know what those k bytes were.
If that were true, why are we debating this? I thought the problem was
the alleged random sequences for slub placement were repeating on
subsequent VM boots meaning there's effectively no entropy in the pool
and we need to add some.
James
On Tue, 2018-04-17 at 11:16 -0400, Theodore Y. Ts'o wrote:
> On Tue, Apr 17, 2018 at 12:57:12PM +0100, James Bottomley wrote:
> >
> > You don't have to compromise the bootloader to influence this, you
> > merely have to trick it into providing the random number you
> > wanted. The bigger you make the attack surface (the more inputs)
> > the more likelihood of finding a trick that works.
>
> There is a large class of devices where the bootloader can be
> considered trusted. For example, all modern Chrome and Android
> devices have signed bootloaders by default. And if you are using an
> Amazon or Chrome VM, you are generally starting it with a known,
> trusted boot image.
Depends how the parameter is passed. If it can be influenced from the
command line then a large class of "trusted boot" systems actually
don't verify the command line, so you can boot a trusted system and
still inject bogus command line parameters. This is definitely true of
PC class secure boot. Not saying it will always be so, just
illustrating why you don't necessarily want to expand the attack
surface.
> The reason why it's useful to have the bootloader get the entropy is
> because it may have device-specific access and be able to leverage
> whatever infrastructure was used to load the kernel and/or
> initramfs to also load the equivalent of
> /var/lib/systemd/random-seed (or /var/lib/urandom, et al.) --- and do
> this early enough that we can have truly secure randomness for those
> kernel facilities that need access to real randomness to initialize
> the stack canary or the slab cache.
OK, in the UEFI ideal world where every component is a perfectly
written OS, perhaps you're right. In the more real world, do you trust
the people who wrote the bootloader to understand and correctly
implement the cryptographically secure process of obtaining a random
input?
> There are other ways that this could be done, of course. If the UEFI
> boot services are still available, you might be able to ask the UEFI
> services to give you randomness. And yes, the hardware might be
> backdoored to a fare-thee-well by the MSS (for devices manufactured
> in China) or by an NSA Tailored Access Operations intercepting a
> computer shipment in transit. But my vision was that this wouldn't
> necessarily bump the entropy accounting or mark the CRNG as fully
> initialized. (If you work for the NSA and you're sure you won't do an
> own-goal, you could enable a kernel boot option which marks the CRNG
> initialized from entropy coming from UEFI or RDRAND or a TPM. But I
> don't think it should be the default.)
>
> The only goal was to get enough uncertainty so we can secure early
> kernel users of entropy for security features such as kernel ASLR,
> the kernel stack canary, SLAB freelist randomization, etc.
>
> And by the way --- if you think it is easy / possible to get secure
> random numbers easily from either a TPMv1 or TPMv2 w/o any early boot
> services (e.g., no interrupts, no DMA, no page tables, no memory
> allocation) that would be really good to know.
Well, as I said, I was planning to use the EFI driver (actually for
IMA, but it works here too) which should be present to the kernel on
boot. We also don't have quite the severe restrictions you say. The
bootmem interface is usable for allocations (even ones that persist
beyond init discard) and, although most TPMs are actually polled
devices, it is possible to use interrupt drivers that do DMA via UEFI
in early boot provided you know what you're doing.
James
On Tue, Apr 17, 2018 at 12:57:12PM +0100, James Bottomley wrote:
>
> You don't have to compromise the bootloader to influence this, you
> merely have to trick it into providing the random number you wanted.
> The bigger you make the attack surface (the more inputs) the more
> likelihood of finding a trick that works.
There is a large class of devices where the bootloader can be
considered trusted. For example, all modern Chrome and Android
devices have signed bootloaders by default. And if you are using an
Amazon or Chrome VM, you have generally started it with a known,
trusted boot image.
The reason why it's useful to have the bootloader get the entropy is
because it may have device-specific access and be able to leverage
whatever infrastructure was used to load the kernel and/or initramfs
to also load the equivalent of /var/lib/systemd/random-seed (or
/var/lib/urandom, et al.) --- and do this early enough that we can
have truly secure randomness for those kernel facilities that need
access to real randomness, such as initializing the stack canary or
the slab cache.
There are other ways that this could be done, of course. If the UEFI
boot services are still available, you might be able to ask the UEFI
services to give you randomness. And yes, the hardware might be
backdoored to the fare-thee-well by the MSS (for devices manufactured
in China) or by an NSA Tailored Access Operations intercepting a
computer shipment in transit. But my vision was that this wouldn't
necessarily bump the entropy accounting or mark the CRNG as fully
initialized. (If you work for the NSA and you're sure you won't do an
own-goal, you could enable a kernel boot option which marks the CRNG
initialized from entropy coming from UEFI or RDRAND or a TPM. But I
don't think it should be the default.)
The only goal was to get enough uncertainty so we can secure early
kernel users of entropy for security features such as kernel ASLR, the
kernel stack canary, SLAB freelist randomization, etc.
And by the way --- if you think it is easy / possible to get secure
random numbers easily from either a TPMv1 or TPMv2 w/o any early boot
services (e.g., no interrupts, no DMA, no page tables, no memory
allocation) that would be really good to know.
Cheers,
> No, entropy mixing ensures that all you do with bad entropy is degrade
> the quality, but if the quality degrades to zero (as it might at boot
> when you've no other entropy sources so you feed in 100% bad entropy),
> then the random sequences become predictable.
Actually, if you have good entropy mixing, you can mix super-bad
entropy --- e.g., completely known by the attacker, and it won't make
the entropy pool any worse. It can only help.
It does require that the entropy mixing algorithm be
reversible, so that mixing in even a fully known sequence will not
cause uncertainty to be lost. The input_pool in the random driver is
designed in such a way, which is why /dev/[u]random is world-writable.
Anyone can contribute potential uncertainty into the pool. Regardless
of whether they have zero, partial, or full knowledge of the internal
random state, they won't have any more certainty of the pool after
they mix in their contribution. And an attacker which does not know
the contribution, and who might have partial knowledge of the pool,
will have less knowledge about the internal state afterwards.
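
To make the reversibility argument concrete, here is a minimal toy sketch
in C. It is NOT the kernel's input_pool mixing function; the names
toy_pool, toy_mix and toy_unmix are made up for illustration. The point
is only that when a mixing step can be undone given the same input, it is
a bijection on the pool state, so feeding in a sequence the attacker
knows completely cannot merge two different pool states into one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_WORDS 4

/* Hypothetical toy pool, for illustration only. */
struct toy_pool {
	uint32_t w[POOL_WORDS];
	size_t pos;
};

static uint32_t rol32(uint32_t v, unsigned int r)
{
	assert(r > 0 && r < 32);
	return (v << r) | (v >> (32 - r));
}

static uint32_t ror32(uint32_t v, unsigned int r)
{
	assert(r > 0 && r < 32);
	return (v >> r) | (v << (32 - r));
}

/* Mix one word: rotate the current slot, XOR the input in, advance. */
static void toy_mix(struct toy_pool *p, uint32_t in)
{
	uint32_t *slot = &p->w[p->pos];

	*slot = rol32(*slot, 7) ^ in;
	p->pos = (p->pos + 1) % POOL_WORDS;
}

/* Exact inverse of toy_mix() for the same input word. */
static void toy_unmix(struct toy_pool *p, uint32_t in)
{
	uint32_t *slot;

	p->pos = (p->pos + POOL_WORDS - 1) % POOL_WORDS;
	slot = &p->w[p->pos];
	*slot = ror32(*slot ^ in, 7);
}

/* Returns 1 if both pools hold the same words and position. */
static int toy_pool_equal(const struct toy_pool *a, const struct toy_pool *b)
{
	size_t i;

	if (a->pos != b->pos)
		return 0;
	for (i = 0; i < POOL_WORDS; i++)
		if (a->w[i] != b->w[i])
			return 0;
	return 1;
}
```

Because toy_unmix() exists, two pools that differ before a known input
is mixed in must still differ afterwards: whatever uncertainty an
attacker had about the state is carried through unchanged.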
Cheers,
- Ted
On Tue, Apr 17, 2018 at 04:42:39PM +0100, James Bottomley wrote:
> Depends how the parameter is passed. If it can be influenced from the
> command line then a large class of "trusted boot" systems actually
> don't verify the command line, so you can boot a trusted system and
> still inject bogus command line parameters. This is definitely true of
> PC class secure boot. Not saying it will always be so, just
> illustrating why you don't necessarily want to expand the attack
> surface.
Sure, this is why I don't really like the scheme of relying on the
command line. For one thing, the command-line is public, so if the
attacker can read /proc/cmdline, they'll have access to the entropy.
What I would prefer is an extension to the boot protocol so that some
number of bytes would be passed to the kernel as a separate bag of
bytes alongside the kernel command line and the initrd.
The kernel would mix that into the random driver (which is written so
the basic input pool and primary_crng can accept input in super-early
boot). This would be done *before* we relocate the kernel, so that
kernel ASLR code can relocate the kernel text to a properly
unpredictable number --- so this really is quite super-early boot.
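
As a rough sketch of what such a "bag of bytes" hand-off might look
like, assuming a made-up layout: struct boot_seed, struct early_pool,
BOOT_SEED_MAX and consume_boot_seed() below are all hypothetical and are
not part of any existing boot protocol or of the random driver. The
mixing step is a toy stand-in. Wiping the buffer after use is my own
assumption about how to avoid the /proc/cmdline problem of the seed
staying readable later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOOT_SEED_MAX 64

/* Hypothetical blob handed over next to the command line and initrd. */
struct boot_seed {
	uint32_t len;			/* bytes the bootloader claims to have filled */
	uint8_t data[BOOT_SEED_MAX];
};

/* Toy stand-in for an early input pool; not the kernel's. */
struct early_pool {
	uint8_t state[32];
	size_t pos;
};

/* Rotate-and-XOR each input byte into the pool (toy mixing only). */
static void early_mix(struct early_pool *p, const uint8_t *in, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint8_t *slot = &p->state[p->pos];

		*slot = (uint8_t)((*slot << 1) | (*slot >> 7)) ^ in[i];
		p->pos = (p->pos + 1) % sizeof(p->state);
	}
}

/*
 * Mix the seed into the pool, then clear it so it cannot be read back
 * later. The length is clamped because the bootloader-supplied value is
 * not trusted. Returns the number of bytes actually mixed.
 */
static size_t consume_boot_seed(struct early_pool *p, struct boot_seed *seed)
{
	size_t n;

	assert(p != NULL && seed != NULL);
	n = seed->len > BOOT_SEED_MAX ? BOOT_SEED_MAX : seed->len;
	early_mix(p, seed->data, n);
	/* Real code would need a wipe the compiler cannot optimize away. */
	memset(seed, 0, sizeof(*seed));
	return n;
}
```

Note that nothing here credits entropy or marks anything as
initialized; the seed is only mixed in.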
> OK, in the UEFI ideal world where every component is a perfectly
> written OS, perhaps you're right. In the more real world, do you trust
> the people who wrote the bootloader to understand and correctly
> implement the cryptographically secure process of obtaining a random
> input?
In the default setup, I would expect the bootloader (such as grub)
would read the random initialization data from disk. So it would work
much like systemd reading from /var/lib/systemd/random-seed. And I
would trust the bootloader implementors to be able to do this about as
well as I would trust the systemd implementors. :-) It's not that
hard, after all....
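
For illustration, the read side could be about this small. This is a
hosted-C sketch using stdio, not grub code; read_seed_file() is a
made-up name, and a real bootloader would use its own file access
routines instead of fopen(). It also does not deal with refreshing the
seed file so the same bytes are not reused on the next boot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Read up to cap bytes of seed material from path into buf.
 * Returns the number of bytes read, or 0 if the file cannot be opened,
 * in which case the caller simply boots without a seed.
 */
static size_t read_seed_file(const char *path, unsigned char *buf, size_t cap)
{
	FILE *f;
	size_t n;

	assert(path != NULL && buf != NULL);
	f = fopen(path, "rb");
	if (f == NULL)
		return 0;
	n = fread(buf, 1, cap, f);
	fclose(f);
	return n;
}
```

A caller would pass something like "/var/lib/systemd/random-seed" as
the path and hand whatever comes back to the kernel as the seed.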
- Ted