2017-07-07 08:34:16

by Alexander Potapenko

Subject: [PATCH] slub: make sure struct kmem_cache_node is initialized before publication

According to KMSAN (see the report below) it's possible that
unfreeze_partials() accesses &n->list_lock before it is initialized.
The initialization normally happens in init_kmem_cache_node() when it
is called from init_kmem_cache_nodes(), but only after the struct
kmem_cache_node has been published.
To avoid what appears to be a data race, publish the struct only after
it has been initialized.
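
The suspected interleaving is sketched below (illustrative only;
get_node() is the mm/slab.h accessor that returns s->node[node], and
the CPU assignment is arbitrary):

	CPU 0: init_kmem_cache_nodes()     CPU 1: unfreeze_partials()
	------------------------------     --------------------------
	n = kmem_cache_alloc_node(...);
	s->node[node] = n;
	                                   n = get_node(s, node);
	                                   spin_lock(&n->list_lock);
	                                   /* lock still uninitialized */
	init_kmem_cache_node(n);           /* comes too late */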

KMSAN (https://github.com/google/kmsan) report is as follows:

==================================================================
BUG: KMSAN: use of uninitialized memory in
queued_spin_lock_slowpath+0xa55/0xaf0 kernel/locking/qspinlock.c:478
CPU: 1 PID: 5021 Comm: modprobe Not tainted 4.11.0-rc5+ #2876
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x172/0x1c0 lib/dump_stack.c:52
kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
__msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
queued_spin_lock_slowpath+0xa55/0xaf0 kernel/locking/qspinlock.c:478
queued_spin_lock include/asm-generic/qspinlock.h:103 [inline]
do_raw_spin_lock include/linux/spinlock.h:148 [inline]
__raw_spin_lock include/linux/spinlock_api_smp.h:143 [inline]
_raw_spin_lock+0x73/0x80 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:299 [inline]
unfreeze_partials+0x6d/0x210 mm/slub.c:2181
put_cpu_partial mm/slub.c:2255 [inline]
__slab_free mm/slub.c:2894 [inline]
do_slab_free mm/slub.c:2990 [inline]
slab_free mm/slub.c:3005 [inline]
kmem_cache_free+0x39d/0x5b0 mm/slub.c:3020
free_task_struct kernel/fork.c:158 [inline]
free_task kernel/fork.c:370 [inline]
__put_task_struct+0x7eb/0x880 kernel/fork.c:407
put_task_struct include/linux/sched/task.h:94 [inline]
delayed_put_task_struct+0x271/0x2b0 kernel/exit.c:181
__rcu_reclaim kernel/rcu/rcu.h:118 [inline]
rcu_do_batch kernel/rcu/tree.c:2879 [inline]
invoke_rcu_callbacks kernel/rcu/tree.c:3142 [inline]
__rcu_process_callbacks kernel/rcu/tree.c:3109 [inline]
rcu_process_callbacks+0x1f10/0x2b50 kernel/rcu/tree.c:3126
__do_softirq+0x485/0x942 kernel/softirq.c:284
invoke_softirq kernel/softirq.c:364 [inline]
irq_exit+0x1fa/0x230 kernel/softirq.c:405
exiting_irq+0xe/0x10 arch/x86/include/asm/apic.h:657
smp_apic_timer_interrupt+0x5a/0x80 arch/x86/kernel/apic/apic.c:966
apic_timer_interrupt+0x86/0x90 arch/x86/entry/entry_64.S:489
RIP: 0010:native_restore_fl arch/x86/include/asm/irqflags.h:36 [inline]
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/irqflags.h:77 [inline]
RIP: 0010:__msan_poison_alloca+0xed/0x120 mm/kmsan/kmsan_instr.c:440
RSP: 0018:ffff880023d6f7b8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
RAX: 0000000000000246 RBX: ffff88001ebf8000 RCX: 0000000000000002
RDX: 0000000000000001 RSI: ffff880000000000 RDI: ffffea0000b332f0
RBP: ffff880023d6f838 R08: 0000000000000884 R09: 0000000000000004
R10: 0000160000000000 R11: 0000000000000000 R12: ffffffff85ab5df0
R13: ffff880023d6f885 R14: 0000000000000001 R15: ffffffff81ae424d
</IRQ>
page_remove_rmap+0x9d/0xd80 mm/rmap.c:1256
zap_pte_range mm/memory.c:1237 [inline]
zap_pmd_range mm/memory.c:1318 [inline]
zap_pud_range mm/memory.c:1347 [inline]
zap_p4d_range mm/memory.c:1368 [inline]
unmap_page_range+0x28fd/0x34a0 mm/memory.c:1389
unmap_single_vma+0x354/0x4b0 mm/memory.c:1434
unmap_vmas+0x192/0x2a0 mm/memory.c:1464
unmap_region+0x375/0x4c0 mm/mmap.c:2464
do_munmap+0x1a39/0x2360 mm/mmap.c:2669
vm_munmap mm/mmap.c:2688 [inline]
SYSC_munmap+0x13d/0x1a0 mm/mmap.c:2698
SyS_munmap+0x47/0x70 mm/mmap.c:2695
entry_SYSCALL_64_fastpath+0x13/0x94
RIP: 0033:0x7fb3c34f0d37
RSP: 002b:00007ffd44afc268 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
RAX: ffffffffffffffda RBX: 0000563670927260 RCX: 00007fb3c34f0d37
RDX: 0000000000000000 RSI: 0000000000001000 RDI: 00007fb3c3bd5000
RBP: 0000000000000000 R08: 00007fb3c3bd1700 R09: 00007ffd44afc3c8
R10: 0000000000000000 R11: 0000000000000206 R12: 00000000ffffffff
R13: 0000563670927110 R14: 0000563670927210 R15: 00007ffd44afc4f0
origin:
save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302 [inline]
kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
kmsan_kmalloc+0x7f/0xe0 mm/kmsan/kmsan.c:337
kmsan_post_alloc_hook+0x10/0x20 mm/kmsan/kmsan.c:350
slab_post_alloc_hook mm/slab.h:459 [inline]
slab_alloc_node mm/slub.c:2747 [inline]
kmem_cache_alloc_node+0x150/0x210 mm/slub.c:2788
init_kmem_cache_nodes mm/slub.c:3430 [inline]
kmem_cache_open mm/slub.c:3649 [inline]
__kmem_cache_create+0x13e/0x5f0 mm/slub.c:4290
create_cache mm/slab_common.c:382 [inline]
kmem_cache_create+0x186/0x220 mm/slab_common.c:467
fork_init+0x7e/0x430 kernel/fork.c:451
start_kernel+0x499/0x580 init/main.c:654
x86_64_start_reservations arch/x86/kernel/head64.c:196 [inline]
x86_64_start_kernel+0x6cc/0x700 arch/x86/kernel/head64.c:177
verify_cpu+0x0/0xfc
==================================================================

Signed-off-by: Alexander Potapenko <[email protected]>
---
mm/slub.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 1d3f9835f4ea..481e523bb30d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3389,8 +3389,8 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
 			return 0;
 		}
 
-		s->node[node] = n;
 		init_kmem_cache_node(n);
+		s->node[node] = n;
 	}
 	return 1;
 }
--
2.13.2.725.g09c95d1e9-goog


2017-07-07 08:43:31

by Alexander Potapenko

Subject: Re: [PATCH] slub: make sure struct kmem_cache_node is initialized before publication

Hi all,

On Fri, Jul 7, 2017 at 10:34 AM, Alexander Potapenko <[email protected]> wrote:
> According to KMSAN (see the report below) it's possible that
> unfreeze_partials() accesses &n->list_lock before it is initialized.
> The initialization normally happens in init_kmem_cache_node() when it
> is called from init_kmem_cache_nodes(), but only after the struct
> kmem_cache_node has been published.
> To avoid what appears to be a data race, publish the struct only after
> it has been initialized.
I was unsure whether we should use acquire/release primitives instead
of direct accesses to s->node[node].
If a data race here is really possible, then it looks like we should.
> [...]



--
Alexander Potapenko
Software Engineer


2017-07-07 20:23:54

by Andrew Morton

Subject: Re: [PATCH] slub: make sure struct kmem_cache_node is initialized before publication

On Fri, 7 Jul 2017 10:34:08 +0200 Alexander Potapenko <[email protected]> wrote:

> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3389,8 +3389,8 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
>  			return 0;
>  		}
> 
> -		s->node[node] = n;
>  		init_kmem_cache_node(n);
> +		s->node[node] = n;
>  	}
>  	return 1;
>  }

If this matters then I have bad feelings about free_kmem_cache_nodes():

static void free_kmem_cache_nodes(struct kmem_cache *s)
{
	int node;
	struct kmem_cache_node *n;

	for_each_kmem_cache_node(s, node, n) {
		kmem_cache_free(kmem_cache_node, n);
		s->node[node] = NULL;
	}
}

Inviting a use-after-free? I guess not, as there should be no way
to look up these items at this stage.

Could the slab maintainers please take a look at these and also have a
think about Alexander's READ_ONCE/WRITE_ONCE question?

by Christoph Lameter

Subject: Re: [PATCH] slub: make sure struct kmem_cache_node is initialized before publication

On Fri, 7 Jul 2017, Andrew Morton wrote:

> On Fri, 7 Jul 2017 10:34:08 +0200 Alexander Potapenko <[email protected]> wrote:
>
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3389,8 +3389,8 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
> >  			return 0;
> >  		}
> >
> > -		s->node[node] = n;
> >  		init_kmem_cache_node(n);
> > +		s->node[node] = n;
> >  	}
> >  	return 1;
> >  }
>
> If this matters then I have bad feelings about free_kmem_cache_nodes():

At creation time the kmem_cache structure is private and no one can run a
free operation.

> Inviting a use-after-free? I guess not, as there should be no way
> to look up these items at this stage.

Right.

> Could the slab maintainers please take a look at these and also have a
> think about Alexander's READ_ONCE/WRITE_ONCE question?

Was I cced on these?

2017-07-10 09:19:58

by Alexander Potapenko

Subject: Re: [PATCH] slub: make sure struct kmem_cache_node is initialized before publication

On Sat, Jul 8, 2017 at 1:18 AM, Christoph Lameter <[email protected]> wrote:
> On Fri, 7 Jul 2017, Andrew Morton wrote:
>
>> On Fri, 7 Jul 2017 10:34:08 +0200 Alexander Potapenko <[email protected]> wrote:
>>
>> > --- a/mm/slub.c
>> > +++ b/mm/slub.c
>> > @@ -3389,8 +3389,8 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
>> >  			return 0;
>> >  		}
>> >
>> > -		s->node[node] = n;
>> >  		init_kmem_cache_node(n);
>> > +		s->node[node] = n;
>> >  	}
>> >  	return 1;
>> >  }
>>
>> If this matters then I have bad feelings about free_kmem_cache_nodes():
>
> At creation time the kmem_cache structure is private and no one can run a
> free operation.
>
>> Inviting a use-after-free? I guess not, as there should be no way
>> to look up these items at this stage.
>
> Right.
>
>> Could the slab maintainers please take a look at these and also have a
>> think about Alexander's READ_ONCE/WRITE_ONCE question?
>
> Was I cced on these?
I've asked Andrew about READ_ONCE privately.
My concern is as follows.
Since unfreeze_partials() sees an uninitialized value of n->list_lock, I
suspected there is a data race between unfreeze_partials() and
init_kmem_cache_nodes().
If so, reads and writes to s->node[node] must be acquire/release
atomics (not actually READ_ONCE/WRITE_ONCE, but
smp_load_acquire/smp_store_release).
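
A minimal sketch of what that would look like (illustrative only, not a
tested patch):

	/* Writer side, init_kmem_cache_nodes(): order all stores made by
	 * init_kmem_cache_node() before the pointer becomes visible. */
	init_kmem_cache_node(n);
	smp_store_release(&s->node[node], n);

	/* Reader side, e.g. get_node(): pairs with the release above, so
	 * observing a non-NULL pointer implies an initialized node. */
	struct kmem_cache_node *n = smp_load_acquire(&s->node[node]);
	if (n)
		spin_lock(&n->list_lock);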




--
Alexander Potapenko
Software Engineer


by Christoph Lameter

Subject: Re: [PATCH] slub: make sure struct kmem_cache_node is initialized before publication

On Mon, 10 Jul 2017, Alexander Potapenko wrote:

> >> Could the slab maintainers please take a look at these and also have a
> >> think about Alexander's READ_ONCE/WRITE_ONCE question?
> >
> > Was I cced on these?
> I've asked Andrew about READ_ONCE privately.

Please post to a mailing list and cc the maintainers?

> Since unfreeze_partials() sees an uninitialized value of n->list_lock, I
> suspected there is a data race between unfreeze_partials() and
> init_kmem_cache_nodes().

I have not seen the details but I would suspect that this is related to
early boot issues? The list lock is initialized upon slab creation and at
that time no one can get to the kmem_cache structure.

There are a couple of boot-time slabs that will temporarily be available
and released upon boot completion.
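
(For reference, the bootstrap dance looks roughly like this; a
from-memory sketch of kmem_cache_init() in mm/slub.c around v4.11, with
details elided:)

	void __init kmem_cache_init(void)
	{
		static __initdata struct kmem_cache boot_kmem_cache,
			boot_kmem_cache_node;
		...
		/* The static __initdata boot caches carry the allocator
		 * until the real kmem_cache/kmem_cache_node caches exist;
		 * bootstrap() migrates their state, and the static copies
		 * are released when init memory is freed. */
		kmem_cache = bootstrap(&boot_kmem_cache);
		kmem_cache_node = bootstrap(&boot_kmem_cache_node);
		...
	}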

> If so, reads and writes to s->node[node] must be acquire/release
> atomics (not actually READ_ONCE/WRITE_ONCE, but
> smp_load_acquire/smp_store_release).

Can we figure out the reason for these before proposing fixes?

2017-07-10 20:32:40

by Andrew Morton

Subject: Re: [PATCH] slub: make sure struct kmem_cache_node is initialized before publication

On Fri, 7 Jul 2017 18:18:31 -0500 (CDT) Christoph Lameter <[email protected]> wrote:

> On Fri, 7 Jul 2017, Andrew Morton wrote:
>
> > On Fri, 7 Jul 2017 10:34:08 +0200 Alexander Potapenko <[email protected]> wrote:
> >
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -3389,8 +3389,8 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
> > >  			return 0;
> > >  		}
> > >
> > > -		s->node[node] = n;
> > >  		init_kmem_cache_node(n);
> > > +		s->node[node] = n;
> > >  	}
> > >  	return 1;
> > >  }
> >
> > If this matters then I have bad feelings about free_kmem_cache_nodes():
>
> At creation time the kmem_cache structure is private and no one can run a
> free operation.
>
> > Inviting a use-after-free? I guess not, as there should be no way
> > to look up these items at this stage.
>
> Right.

Still. It looks bad, and other sites do these things in the other order.

> > Could the slab maintainers please take a look at these and also have a
> > think about Alexander's READ_ONCE/WRITE_ONCE question?
>
> Was I cced on these?

It's all on linux-mm.

2017-07-12 14:11:31

by Alexander Potapenko

Subject: Re: [PATCH] slub: make sure struct kmem_cache_node is initialized before publication

Hi everyone,

On Mon, Jul 10, 2017 at 10:32 PM, Andrew Morton
<[email protected]> wrote:
> On Fri, 7 Jul 2017 18:18:31 -0500 (CDT) Christoph Lameter <[email protected]> wrote:
>
>> On Fri, 7 Jul 2017, Andrew Morton wrote:
>>
>> > On Fri, 7 Jul 2017 10:34:08 +0200 Alexander Potapenko <[email protected]> wrote:
>> >
>> > > --- a/mm/slub.c
>> > > +++ b/mm/slub.c
>> > > @@ -3389,8 +3389,8 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
>> > >  			return 0;
>> > >  		}
>> > >
>> > > -		s->node[node] = n;
>> > >  		init_kmem_cache_node(n);
>> > > +		s->node[node] = n;
>> > >  	}
>> > >  	return 1;
>> > >  }
>> >
>> > If this matters then I have bad feelings about free_kmem_cache_nodes():
>>
>> At creation time the kmem_cache structure is private and no one can run a
>> free operation.
I've double-checked the code path and this turned out to be a false
positive caused by KMSAN not instrumenting the contents of mm/slub.c
(i.e., the initialization of the spinlock went unnoticed).
Christoph is indeed right that the kmem_cache structure is private, so a
race is not possible here.
I am sorry for the false alarm.
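
To make the mechanism concrete, here is a toy model (not KMSAN's real
implementation; all names are invented for illustration). KMSAN keeps a
shadow byte per data byte, poisons the shadow on allocation, and relies
on instrumented stores to unpoison it; a store compiled without
instrumentation leaves the shadow poisoned, so a later instrumented
read flags the location as uninitialized even though the data is fine:

	#include <stdio.h>
	#include <string.h>

	/* Toy shadow state: 1 = poisoned ("uninitialized"), 0 = initialized. */
	static unsigned char data[8], shadow[8];

	/* kmsan_kmalloc() analogue: poison the whole allocation. */
	static void toy_alloc_hook(void) { memset(shadow, 1, sizeof(shadow)); }

	/* A store from instrumented code also clears the shadow... */
	static void instrumented_store(int i, unsigned char v)
	{
		data[i] = v;
		shadow[i] = 0;
	}

	/* ...but a store from uninstrumented code (all of mm/slub.c here)
	 * leaves the shadow poisoned. */
	static void uninstrumented_store(int i, unsigned char v)
	{
		data[i] = v;
	}

	static void instrumented_load(int i)
	{
		if (shadow[i])
			printf("BUG: use of uninitialized memory at data[%d]\n", i);
	}

	int main(void)
	{
		toy_alloc_hook();            /* allocation starts out poisoned */
		uninstrumented_store(0, 42); /* spin lock init goes unnoticed */
		instrumented_load(0);        /* reports: the false positive */
		instrumented_store(1, 42);   /* instrumented write unpoisons */
		instrumented_load(1);        /* no report */
		return 0;
	}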
>> > Inviting a use-after-free? I guess not, as there should be no way
>> > to look up these items at this stage.
>>
>> Right.
>
> Still. It looks bad, and other sites do these things in the other order.
If the maintainers agree the initialization order needs to be fixed,
we'll need to remove the (irrelevant) KMSAN report from the patch
description.
>> > Could the slab maintainers please take a look at these and also have a
>> > think about Alexander's READ_ONCE/WRITE_ONCE question?
>>
>> Was I cced on these?
>
> It's all on linux-mm.



--
Alexander Potapenko
Software Engineer


2017-07-12 19:21:57

by Andrew Morton

Subject: Re: [PATCH] slub: make sure struct kmem_cache_node is initialized before publication

On Wed, 12 Jul 2017 16:11:28 +0200 Alexander Potapenko <[email protected]> wrote:

> >> At creation time the kmem_cache structure is private and no one can run a
> >> free operation.
> I've double-checked the code path and this turned out to be a false
> positive caused by KMSAN not instrumenting the contents of mm/slub.c
> (i.e., the initialization of the spinlock went unnoticed).
> Christoph is indeed right that the kmem_cache structure is private, so a
> race is not possible here.
> I am sorry for the false alarm.
> >> > Inviting a use-after-free? I guess not, as there should be no way
> >> > to look up these items at this stage.
> >>
> >> Right.
> >
> > Still. It looks bad, and other sites do these things in the other order.
> If the maintainers agree the initialization order needs to be fixed,
> we'll need to remove the (irrelevant) KMSAN report from the patch
> description.

Yup. I did this:

From: Alexander Potapenko <[email protected]>
Subject: slub: tidy up initialization ordering

- free_kmem_cache_nodes() frees the cache node before nulling out a
reference to it

- init_kmem_cache_nodes() publishes the cache node before initializing it

Neither of these matters at runtime because the cache nodes cannot be
looked up by any other thread. But it's neater and more consistent to
reorder these.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Potapenko <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

mm/slub.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN mm/slub.c~slub-make-sure-struct-kmem_cache_node-is-initialized-before-publication mm/slub.c
--- a/mm/slub.c~slub-make-sure-struct-kmem_cache_node-is-initialized-before-publication
+++ a/mm/slub.c
@@ -3358,8 +3358,8 @@ static void free_kmem_cache_nodes(struct
 	struct kmem_cache_node *n;
 
 	for_each_kmem_cache_node(s, node, n) {
-		kmem_cache_free(kmem_cache_node, n);
 		s->node[node] = NULL;
+		kmem_cache_free(kmem_cache_node, n);
 	}
 }

@@ -3389,8 +3389,8 @@ static int init_kmem_cache_nodes(struct
 			return 0;
 		}
 
-		s->node[node] = n;
 		init_kmem_cache_node(n);
+		s->node[node] = n;
 	}
 	return 1;
 }
_

by Christoph Lameter

Subject: Re: [PATCH] slub: make sure struct kmem_cache_node is initialized before publication

On Wed, 12 Jul 2017, Andrew Morton wrote:

> - free_kmem_cache_nodes() frees the cache node before nulling out a
> reference to it
>
> - init_kmem_cache_nodes() publishes the cache node before initializing it
>
> Neither of these matters at runtime because the cache nodes cannot be
> looked up by any other thread. But it's neater and more consistent to
> reorder these.

The compiler or processor may reorder them at will anyway. But if it's
tidier...

Acked-by: Christoph Lameter <[email protected]>