2019-11-04 17:04:12

by Thibaut Sautereau

[permalink] [raw]
Subject: Double free of struct sk_buff reported by SLAB_CONSISTENCY_CHECKS with init_on_free

I ran into the following BUG on a 5.3.8 kernel:

=============================================================================
BUG skbuff_head_cache (Tainted: G T): Object already free
-----------------------------------------------------------------------------

INFO: Slab 0x000000000d2d2f8f objects=16 used=3 fp=0x0000000064309071 flags=0x3fff00000000201
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G B T 5.3.8 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:print_trailer+0x70/0x1d5
Code: 28 4d 8b 4d 00 4d 8b 45 20 81 e2 ff 7f 00 00 e8 86 ce ef ff 8b 4b 20 48 89 ea 48 89 ee 4c 29 e2 48 c7 c7 90 6f d4 89 48 01 e9 <48> 33 09 48 33 8b 70 01 00 00 e8 61 ce ef ff f6 43 09 04 74 35 8b
RSP: 0018:ffffbf7680003d58 EFLAGS: 00010046
RAX: 000000000000005d RBX: ffffa3d2bb08e540 RCX: 0000000000000000
RDX: 00005c2d8fdc2000 RSI: 0000000000000000 RDI: ffffffff89d46f90
RBP: 0000000000000000 R08: 0000000000000242 R09: 000000000000006c
R10: 0000000000000000 R11: 0000000000000030 R12: ffffa3d27023e000
R13: fffff11080c08f80 R14: ffffa3d2bb047a80 R15: 0000000000000002
FS: 0000000000000000(0000) GS:ffffa3d2be400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000007a6c4000 CR4: 00000000000006f0
Call Trace:
<IRQ>
free_debug_processing.cold.37+0xc9/0x149
? __kfree_skb_flush+0x30/0x40
? __kfree_skb_flush+0x30/0x40
__slab_free+0x22a/0x3d0
? tcp_wfree+0x2a/0x140
? __sock_wfree+0x1b/0x30
kmem_cache_free_bulk+0x415/0x420
? __kfree_skb_flush+0x30/0x40
__kfree_skb_flush+0x30/0x40
net_rx_action+0x2dd/0x480
__do_softirq+0xf0/0x246
irq_exit+0x93/0xb0
do_IRQ+0xa0/0x110
common_interrupt+0xf/0xf
</IRQ>
RIP: 0010:default_idle+0x13/0x20
Code: 24 28 eb 9e 4c 89 ff e8 eb 9f c9 ff eb 9f e8 84 03 8c ff 90 90 90 90 e8 2b 96 bc ff e9 07 00 00 00 0f 00 2d 01 71 41 00 fb f4 <e9> 18 96 bc ff 0f 1f 84 00 00 00 00 00 53 65 48 8b 1c 25 00 5d 01
RSP: 0018:ffffffff89e03eb8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffd6
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff89d8fe58
RBP: ffffffff89e85d78 R08: 00000000000d2730 R09: 0000000000000000
R10: 0000000000000222 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffffffff8a2ea920 R15: 000000007efbbd98
? default_idle+0x5/0x20
do_idle+0x1ad/0x220
cpu_startup_entry+0x14/0x20
start_kernel+0x62f/0x66e
secondary_startup_64+0xa4/0xb0
Modules linked in: [...]
CR2: 0000000000000000
---[ end trace 0d08341f1cb4ff43 ]---
RIP: 0010:print_trailer+0x70/0x1d5
Code: 28 4d 8b 4d 00 4d 8b 45 20 81 e2 ff 7f 00 00 e8 86 ce ef ff 8b 4b 20 48 89 ea 48 89 ee 4c 29 e2 48 c7 c7 90 6f d4 89 48 01 e9 <48> 33 09 48 33 8b 70 01 00 00 e8 61 ce ef ff f6 43 09 04 74 35 8b
RSP: 0018:ffffbf7680003d58 EFLAGS: 00010046
RAX: 000000000000005d RBX: ffffa3d2bb08e540 RCX: 0000000000000000
RDX: 00005c2d8fdc2000 RSI: 0000000000000000 RDI: ffffffff89d46f90
RBP: 0000000000000000 R08: 0000000000000242 R09: 000000000000006c
R10: 0000000000000000 R11: 0000000000000030 R12: ffffa3d27023e000
R13: fffff11080c08f80 R14: ffffa3d2bb047a80 R15: 0000000000000002
FS: 0000000000000000(0000) GS:ffffa3d2be400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000007a6c4000 CR4: 00000000000006f0
Kernel panic - not syncing: Fatal exception in interrupt
Shutting down cpus with NMI
Kernel Offset: 0x8000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

We first encountered this issue under huge network traffic (system image
download), and I was able to reproduce by simply sending a big packet
with `ping -s 65507 <ip>`, which crashes the kernel every single time.

The BUG only happens when using `slub_debug=F` on the command-line (to
enable SLAB_CONSISTENCY_CHECKS), otherwise the double free is not
reported and the system keeps running.

The code path is:
net_rx_action
__kfree_skb_flush
kmem_cache_free_bulk() # skbuff_head_cache
slab_free()
do_slab_free()
__slab_free()
free_debug_processing()
free_consistency_check()
object_err() # "Object already free"
print_trailer()
print_tracking() # !(s->flags & SLAB_STORE_USER) => return;
print_page_info() # "INFO: Slab ..."
pr_err("INFO: Object ...", ..., get_freepointer(s, p))
get_freepointer()
freelist_dereference() # NULL pointer dereference

Enabling KASAN shows less info because the NULL pointer dereference then
apparently happens before reaching free_debug_processing().

Bisection points to the following commit: 1b7e816fc80e ("mm: slub: Fix
slab walking for init_on_free"), and indeed the BUG is not triggered
when init_on_free is disabled.

--
Thibaut Sautereau
CLIP OS developer


2019-11-04 17:34:33

by Eric Dumazet

[permalink] [raw]
Subject: Re: Double free of struct sk_buff reported by SLAB_CONSISTENCY_CHECKS with init_on_free



On 11/4/19 9:03 AM, Thibaut Sautereau wrote:
>
> We first encountered this issue under huge network traffic (system image
> download), and I was able to reproduce by simply sending a big packet
> with `ping -s 65507 <ip>`, which crashes the kernel every single time.
>

Since you have a repro, could you start a bisection ?

Thanks !

2019-11-05 08:07:02

by Thibaut Sautereau

[permalink] [raw]
Subject: Re: Double free of struct sk_buff reported by SLAB_CONSISTENCY_CHECKS with init_on_free

On Mon, Nov 04, 2019 at 09:33:18AM -0800, Eric Dumazet wrote:
>
>
> On 11/4/19 9:03 AM, Thibaut Sautereau wrote:
> >
> > We first encountered this issue under huge network traffic (system image
> > download), and I was able to reproduce by simply sending a big packet
> > with `ping -s 65507 <ip>`, which crashes the kernel every single time.
> >
>
> Since you have a repro, could you start a bisection ?

From my previous email:

"Bisection points to the following commit: 1b7e816fc80e ("mm: slub:
Fix slab walking for init_on_free"), and indeed the BUG is not
triggered when init_on_free is disabled."

Or are you meaning something else?

--
Thibaut Sautereau
CLIP OS developer

2019-11-05 09:01:54

by Vlastimil Babka

[permalink] [raw]
Subject: Re: Double free of struct sk_buff reported by SLAB_CONSISTENCY_CHECKS with init_on_free

On 11/4/19 6:03 PM, Thibaut Sautereau wrote:
> The BUG only happens when using `slub_debug=F` on the command-line (to
> enable SLAB_CONSISTENCY_CHECKS), otherwise the double free is not
> reported and the system keeps running.

You could change slub_debug parameter to:
slub_debug=FU,skbuff_head_cache

That will also print out who previously allocated and freed the double
freed object. And limit all the tracking just to the affected cache.

> The code path is:
> net_rx_action
> __kfree_skb_flush
> kmem_cache_free_bulk() # skbuff_head_cache
> slab_free()
> do_slab_free()
> __slab_free()
> free_debug_processing()
> free_consistency_check()
> object_err() # "Object already free"
> print_trailer()
> print_tracking() # !(s->flags & SLAB_STORE_USER) => return;
> print_page_info() # "INFO: Slab ..."
> pr_err("INFO: Object ...", ..., get_freepointer(s, p))
> get_freepointer()
> freelist_dereference() # NULL pointer dereference
>
> Enabling KASAN shows less info because the NULL pointer dereference then
> apparently happens before reaching free_debug_processing().
>
> Bisection points to the following commit: 1b7e816fc80e ("mm: slub: Fix
> slab walking for init_on_free"), and indeed the BUG is not triggered
> when init_on_free is disabled.

That could be either buggy SLUB code, or the commit somehow exposed a
real bug in skbuff users.

2019-11-05 10:21:12

by Alexander Potapenko

[permalink] [raw]
Subject: Re: Double free of struct sk_buff reported by SLAB_CONSISTENCY_CHECKS with init_on_free

On Tue, Nov 5, 2019 at 9:06 AM Thibaut Sautereau
<[email protected]> wrote:
>
> On Mon, Nov 04, 2019 at 09:33:18AM -0800, Eric Dumazet wrote:
> >
> >
> > On 11/4/19 9:03 AM, Thibaut Sautereau wrote:
> > >
> > > We first encountered this issue under huge network traffic (system image
> > > download), and I was able to reproduce by simply sending a big packet
> > > with `ping -s 65507 <ip>`, which crashes the kernel every single time.
> > >
> >
> > Since you have a repro, could you start a bisection ?
>
> From my previous email:
>
> "Bisection points to the following commit: 1b7e816fc80e ("mm: slub:
> Fix slab walking for init_on_free"), and indeed the BUG is not
> triggered when init_on_free is disabled."
>
> Or are you meaning something else?
Could you please give more specific reproduction steps?
I've checked out v5.3.8 from
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git, ran
`make defconfig` and added CONFIG_SLUB_DEBUG_ON=y.
Then I've built the kernel, ran it on QEMU with slub_debug=F and
init_on_free=1, SSHed into the machine and executed `ping -s 65507
127.0.0.1`
This however didn't trigger any crashes.
Am I missing something?
> --
> Thibaut Sautereau
> CLIP OS developer



--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

2019-11-05 14:34:38

by Thibaut Sautereau

[permalink] [raw]
Subject: Re: Double free of struct sk_buff reported by SLAB_CONSISTENCY_CHECKS with init_on_free

On Tue, Nov 05, 2019 at 10:00:39AM +0100, Vlastimil Babka wrote:
> On 11/4/19 6:03 PM, Thibaut Sautereau wrote:
> > The BUG only happens when using `slub_debug=F` on the command-line (to
> > enable SLAB_CONSISTENCY_CHECKS), otherwise the double free is not
> > reported and the system keeps running.
>
> You could change slub_debug parameter to:
> slub_debug=FU,skbuff_head_cache
>
> That will also print out who previously allocated and freed the double
> freed object. And limit all the tracking just to the affected cache.

Thanks, I did not know about that.

However, as kind of expected, I get a BUG due to a NULL pointer
dereference in print_track():

=============================================================================
BUG skbuff_head_cache (Tainted: G T): Object already free
-----------------------------------------------------------------------------

BUG: kernel NULL pointer dereference, address: 00000000000000e0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G B T 5.3.8 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:print_track+0xd/0x67
Code: 24 20 89 ee 48 c7 c7 70 33 d5 8c 49 8b 4c 24 28 49 89 c0 e8 77 f2 ef ff 83 c5 01 eb b9 41 54 49 89 fc 55 48 89 d5 53 48 89 f3 <48> 8b 13 48 85 d2 74 4d 48 89 e9 44 8b 8b 8c 00 00 00 4c 89 e6 48
RSP: 0018:ffff99d1c0003d18 EFLAGS: 00010002
RAX: 0000000000000000 RBX: 00000000000000e0 RCX: 0000000000000000
RDX: 00000000fffc0d26 RSI: 00000000000000e0 RDI: ffffffff8cd538a0
RBP: 00000000fffc0d26 R08: 0000000000000240 R09: 0000000000000004
R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8cd538a0
R13: ffffd1cd00c88700 R14: ffff93613e077e00 R15: 0000000000000002
FS: 0000000000000000(0000) GS:ffff93613e600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000e0 CR3: 00000000010c0000 CR4: 00000000000006f0
Call Trace:
<IRQ>
print_tracking+0x38/0x6b
print_trailer+0x33/0x1d4
free_debug_processing.cold.38+0xc9/0x149
? __kfree_skb_flush+0x30/0x40
? __kfree_skb_flush+0x30/0x40
? __kfree_skb_flush+0x30/0x40
__slab_free+0x22a/0x3d0
? sock_wfree+0x5d/0x70
? __kfree_skb_flush+0x30/0x40
kmem_cache_free_bulk+0x354/0x420
__kfree_skb_flush+0x30/0x40
net_rx_action+0x2be/0x460
__do_softirq+0xf3/0x249
irq_exit+0x93/0xb0
do_IRQ+0xa0/0x110
common_interrupt+0xf/0xf
</IRQ>
RIP: 0010:default_idle+0x13/0x20
Code: ff eb 9f e8 8f eb 8c ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 e8 0b 4b bd ff e9 07 00 00 00 0f 00 2d 21 5a 42 00 fb f4 <e9> f8 4a bd ff 0f 1f 84 00 00 00 00 00 53 65 48 8b 1c 25 00 5d 01
RSP: 0018:ffffffff8d003eb8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffd6
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 7ffffff6501e23c1
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8cda0130
RBP: ffffffff8d088438 R08: 0000000000000000 R09: 00000009b00fa2fe
R10: 00000000000001fd R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: ffffffff8d502920 R15: 000000007efbbd98
? default_idle+0x5/0x20
do_idle+0x1a8/0x220
cpu_startup_entry+0x14/0x20
start_kernel+0x615/0x654
secondary_startup_64+0xa4/0xb0
Modules linked in: virtio_scsi virtio_net net_failover failover virtio_mmio virtio_input virtio_gpu virtio_crypto crypto_engine virtio_console virtio_balloon uhci_hcd uas usb_storage snd_hda_codec_generic snd_hda_codec snd_hda_core ledtrig_audio rtc_cmos qxl qemu_fw_cfg psmouse mousedev ehci_pci ehci_hcd button bochs_drm drm_vram_helper ttm virtio_rng
CR2: 00000000000000e0
---[ end trace 386b05c18ef99c32 ]---
RIP: 0010:print_track+0xd/0x67
Code: 24 20 89 ee 48 c7 c7 70 33 d5 8c 49 8b 4c 24 28 49 89 c0 e8 77 f2 ef ff 83 c5 01 eb b9 41 54 49 89 fc 55 48 89 d5 53 48 89 f3 <48> 8b 13 48 85 d2 74 4d 48 89 e9 44 8b 8b 8c 00 00 00 4c 89 e6 48
RSP: 0018:ffff99d1c0003d18 EFLAGS: 00010002
RAX: 0000000000000000 RBX: 00000000000000e0 RCX: 0000000000000000
RDX: 00000000fffc0d26 RSI: 00000000000000e0 RDI: ffffffff8cd538a0
RBP: 00000000fffc0d26 R08: 0000000000000240 R09: 0000000000000004
R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8cd538a0
R13: ffffd1cd00c88700 R14: ffff93613e077e00 R15: 0000000000000002
FS: 0000000000000000(0000) GS:ffff93613e600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000e0 CR3: 00000000010c0000 CR4: 00000000000006f0
Kernel panic - not syncing: Fatal exception in interrupt
Shutting down cpus with NMI
Kernel Offset: 0xb000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

I tried reverting 1b7e816fc80e ("mm: slub: Fix slab walking for
init_on_free") or disabling init_on_free, but then again I cannot
reproduce the bug and nothing is logged.

>
> > The code path is:
> > net_rx_action
> > __kfree_skb_flush
> > kmem_cache_free_bulk() # skbuff_head_cache
> > slab_free()
> > do_slab_free()
> > __slab_free()
> > free_debug_processing()
> > free_consistency_check()
> > object_err() # "Object already free"
> > print_trailer()
> > print_tracking() # !(s->flags & SLAB_STORE_USER) => return;
> > print_page_info() # "INFO: Slab ..."
> > pr_err("INFO: Object ...", ..., get_freepointer(s, p))
> > get_freepointer()
> > freelist_dereference() # NULL pointer dereference
> >
> > Enabling KASAN shows less info because the NULL pointer dereference then
> > apparently happens before reaching free_debug_processing().
> >
> > Bisection points to the following commit: 1b7e816fc80e ("mm: slub: Fix
> > slab walking for init_on_free"), and indeed the BUG is not triggered
> > when init_on_free is disabled.
>
> That could be either buggy SLUB code, or the commit somehow exposed a
> real bug in skbuff users.

Right. At first I thought about some incompatibility between
init_on_free and SLAB_CONSISTENCY_CHECKS, but in that case why would it
only happen with skbuff_head_cache? On the other hand, if it's a bug in
skbuff users, why is the on_freelist() check in free_consistency_check()
not detecting anything when init_on_free is disabled?

--
Thibaut Sautereau
CLIP OS developer

2019-11-05 15:06:41

by Vlastimil Babka

[permalink] [raw]
Subject: Re: Double free of struct sk_buff reported by SLAB_CONSISTENCY_CHECKS with init_on_free

On 11/5/19 3:32 PM, Thibaut Sautereau wrote:
> On Tue, Nov 05, 2019 at 10:00:39AM +0100, Vlastimil Babka wrote:
>> On 11/4/19 6:03 PM, Thibaut Sautereau wrote:
>>> The BUG only happens when using `slub_debug=F` on the command-line (to
>>> enable SLAB_CONSISTENCY_CHECKS), otherwise the double free is not
>>> reported and the system keeps running.
>>
>> You could change slub_debug parameter to:
>> slub_debug=FU,skbuff_head_cache
>>
>> That will also print out who previously allocated and freed the double
>> freed object. And limit all the tracking just to the affected cache.
>
> Thanks, I did not know about that.
>
> However, as kind of expected, I get a BUG due to a NULL pointer
> dereference in print_track():

Ah, I didn't read properly your initial mail, that there's a null
pointer deference during the consistency check.

...

>>>
>>> Bisection points to the following commit: 1b7e816fc80e ("mm: slub: Fix
>>> slab walking for init_on_free"), and indeed the BUG is not triggered
>>> when init_on_free is disabled.
>>
>> That could be either buggy SLUB code, or the commit somehow exposed a
>> real bug in skbuff users.
>
> Right. At first I thought about some incompatibility between
> init_on_free and SLAB_CONSISTENCY_CHECKS, but in that case why would it
> only happen with skbuff_head_cache?

That's curious, yeah.

> On the other hand, if it's a bug in
> skbuff users, why is the on_freelist() check in free_consistency_check()
> not detecting anything when init_on_free is disabled?

I vaguely suspect the code in the commit 1b7e816fc80e you bisected,
where in slab_free_freelist_hook() in the first iteration, we have void
*p = NULL; and set_freepointer(s, object, p); will thus write NULL into
the freelist. Is is the NULL we are crashing on? The code seems to
assume that the freelist is rewritten later in the function, but that
part is only active with some CONFIG_ option(s), none of which might be
enabled in your case?
But I don't really understand what exactly this function is supposed to
do. Laura, does my theory make sense?

Thanks,
Vlastimil

2019-11-05 15:07:55

by Thibaut Sautereau

[permalink] [raw]
Subject: Re: Double free of struct sk_buff reported by SLAB_CONSISTENCY_CHECKS with init_on_free

On Tue, Nov 05, 2019 at 11:19:08AM +0100, Alexander Potapenko wrote:
> On Tue, Nov 5, 2019 at 9:06 AM Thibaut Sautereau
> <[email protected]> wrote:
> >
> > On Mon, Nov 04, 2019 at 09:33:18AM -0800, Eric Dumazet wrote:
> > >
> > >
> > > On 11/4/19 9:03 AM, Thibaut Sautereau wrote:
> > > >
> > > > We first encountered this issue under huge network traffic (system image
> > > > download), and I was able to reproduce by simply sending a big packet
> > > > with `ping -s 65507 <ip>`, which crashes the kernel every single time.
> > > >
> > >
> > > Since you have a repro, could you start a bisection ?
> >
> > From my previous email:
> >
> > "Bisection points to the following commit: 1b7e816fc80e ("mm: slub:
> > Fix slab walking for init_on_free"), and indeed the BUG is not
> > triggered when init_on_free is disabled."
> >
> > Or are you meaning something else?
> Could you please give more specific reproduction steps?
> I've checked out v5.3.8 from
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git, ran
> `make defconfig` and added CONFIG_SLUB_DEBUG_ON=y.
> Then I've built the kernel, ran it on QEMU with slub_debug=F and
> init_on_free=1, SSHed into the machine and executed `ping -s 65507
> 127.0.0.1`
> This however didn't trigger any crashes.
> Am I missing something?

I cannot reproduce either when pinging the loopback interface. Please
try pinging another machine.

I also attached the kernel configuration I use in a QEMU VM to reproduce
the bug, as well as the VM's settings.

Loaded modules listed in the kernel panic trace are:
virtio_scsi virtio_net net_failover failover virtio_mmio
virtio_input virtio_gpu virtio_crypto crypto_engine virtio_console
virtio_balloon uhci_hcd uas usb_storage rtc_cmos qxl qemu_fw_cfg
psmouse mousedev ehci_pci ehci_hcd button bochs_drm drm_vram_helper
ttm virtio_rng

Hope this helps,

--
Thibaut Sautereau
CLIP OS developer


Attachments:
(No filename) (1.94 kB)
config (110.18 kB)
qemu.conf (2.93 kB)
Download all attachments

2019-11-05 17:02:49

by Laura Abbott

[permalink] [raw]
Subject: Re: Double free of struct sk_buff reported by SLAB_CONSISTENCY_CHECKS with init_on_free

On 11/5/19 10:02 AM, Vlastimil Babka wrote:
> On 11/5/19 3:32 PM, Thibaut Sautereau wrote:
>> On Tue, Nov 05, 2019 at 10:00:39AM +0100, Vlastimil Babka wrote:
>>> On 11/4/19 6:03 PM, Thibaut Sautereau wrote:
>>>> The BUG only happens when using `slub_debug=F` on the command-line (to
>>>> enable SLAB_CONSISTENCY_CHECKS), otherwise the double free is not
>>>> reported and the system keeps running.
>>>
>>> You could change slub_debug parameter to:
>>> slub_debug=FU,skbuff_head_cache
>>>
>>> That will also print out who previously allocated and freed the double
>>> freed object. And limit all the tracking just to the affected cache.
>>
>> Thanks, I did not know about that.
>>
>> However, as kind of expected, I get a BUG due to a NULL pointer
>> dereference in print_track():
>
> Ah, I didn't read properly your initial mail, that there's a null
> pointer deference during the consistency check.
>
> ...
>
>>>>
>>>> Bisection points to the following commit: 1b7e816fc80e ("mm: slub: Fix
>>>> slab walking for init_on_free"), and indeed the BUG is not triggered
>>>> when init_on_free is disabled.
>>>
>>> That could be either buggy SLUB code, or the commit somehow exposed a
>>> real bug in skbuff users.
>>
>> Right. At first I thought about some incompatibility between
>> init_on_free and SLAB_CONSISTENCY_CHECKS, but in that case why would it
>> only happen with skbuff_head_cache?
>
> That's curious, yeah.
>
>> On the other hand, if it's a bug in
>> skbuff users, why is the on_freelist() check in free_consistency_check()
>> not detecting anything when init_on_free is disabled?
>
> I vaguely suspect the code in the commit 1b7e816fc80e you bisected,
> where in slab_free_freelist_hook() in the first iteration, we have void
> *p = NULL; and set_freepointer(s, object, p); will thus write NULL into
> the freelist. Is is the NULL we are crashing on? The code seems to
> assume that the freelist is rewritten later in the function, but that
> part is only active with some CONFIG_ option(s), none of which might be
> enabled in your case?
> But I don't really understand what exactly this function is supposed to
> do. Laura, does my theory make sense?
>
> Thanks,
> Vlastimil
>

The note about getting re-written is referring to the fact that the trick
with the bulk free is that we build the detached freelist and then when
we do the cmpxchg it's getting (correctly) updated there.

But looking at this again, I realize this function still has a more
fundamental problem: walking the freelist like this means we actually
end up reversing the list so head and tail are no longer pointing
to the correct blocks. I was able to reproduce the issue by writing a
simple kmem_cache_bulk_alloc/kmem_cache_bulk_free function. I'm
guessing that the test of ping with an unusual size was enough to
regularly trigger a non-trivial bulk alloc/free.

The fix I gave before fixed part of the problem but not all of it.
At this point we're basically duplicating the work of the loop
below so I think we can just combine it. Was there a reason this
wasn't just combined in the first place?

diff --git a/mm/slub.c b/mm/slub.c
index dac41cf0b94a..1510b86b2e7e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1431,13 +1431,17 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s,
void *next = *head;
void *old_tail = *tail ? *tail : *head;
int rsize;
+ next = *head;

- if (slab_want_init_on_free(s)) {
- void *p = NULL;
+ /* Head and tail of the reconstructed freelist */
+ *head = NULL;
+ *tail = NULL;

- do {
- object = next;
- next = get_freepointer(s, object);
+ do {
+ object = next;
+ next = get_freepointer(s, object);
+
+ if (slab_want_init_on_free(s)) {
/*
* Clear the object and the metadata, but don't touch
* the redzone.
@@ -1447,29 +1451,8 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s,
: 0;
memset((char *)object + s->inuse, 0,
s->size - s->inuse - rsize);
- set_freepointer(s, object, p);
- p = object;
- } while (object != old_tail);
- }
-
-/*
- * Compiler cannot detect this function can be removed if slab_free_hook()
- * evaluates to nothing. Thus, catch all relevant config debug options here.
- */
-#if defined(CONFIG_LOCKDEP) || \
- defined(CONFIG_DEBUG_KMEMLEAK) || \
- defined(CONFIG_DEBUG_OBJECTS_FREE) || \
- defined(CONFIG_KASAN)

- next = *head;
-
- /* Head and tail of the reconstructed freelist */
- *head = NULL;
- *tail = NULL;
-
- do {
- object = next;
- next = get_freepointer(s, object);
+ }
/* If object's reuse doesn't have to be delayed */
if (!slab_free_hook(s, object)) {
/* Move object to the new freelist */
@@ -1484,9 +1467,6 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s,
*tail = NULL;

return *head != NULL;
-#else
- return true;
-#endif
}

static void *setup_object(struct kmem_cache *s, struct page *page,

2019-11-06 09:30:32

by Thibaut Sautereau

[permalink] [raw]
Subject: Re: Double free of struct sk_buff reported by SLAB_CONSISTENCY_CHECKS with init_on_free

On Tue, Nov 05, 2019 at 12:01:15PM -0500, Laura Abbott wrote:
> On 11/5/19 10:02 AM, Vlastimil Babka wrote:
> > On 11/5/19 3:32 PM, Thibaut Sautereau wrote:
> > > On Tue, Nov 05, 2019 at 10:00:39AM +0100, Vlastimil Babka wrote:
> > > > On 11/4/19 6:03 PM, Thibaut Sautereau wrote:
> > > > > The BUG only happens when using `slub_debug=F` on the command-line (to
> > > > > enable SLAB_CONSISTENCY_CHECKS), otherwise the double free is not
> > > > > reported and the system keeps running.
> > > >
> > > > You could change slub_debug parameter to:
> > > > slub_debug=FU,skbuff_head_cache
> > > >
> > > > That will also print out who previously allocated and freed the double
> > > > freed object. And limit all the tracking just to the affected cache.
> > >
> > > Thanks, I did not know about that.
> > >
> > > However, as kind of expected, I get a BUG due to a NULL pointer
> > > dereference in print_track():
> >
> > Ah, I didn't read properly your initial mail, that there's a null
> > pointer deference during the consistency check.
> >
> > ...
> >
> > > > >
> > > > > Bisection points to the following commit: 1b7e816fc80e ("mm: slub: Fix
> > > > > slab walking for init_on_free"), and indeed the BUG is not triggered
> > > > > when init_on_free is disabled.
> > > >
> > > > That could be either buggy SLUB code, or the commit somehow exposed a
> > > > real bug in skbuff users.
> > >
> > > Right. At first I thought about some incompatibility between
> > > init_on_free and SLAB_CONSISTENCY_CHECKS, but in that case why would it
> > > only happen with skbuff_head_cache?
> >
> > That's curious, yeah.
> >
> > > On the other hand, if it's a bug in
> > > skbuff users, why is the on_freelist() check in free_consistency_check()
> > > not detecting anything when init_on_free is disabled?
> >
> > I vaguely suspect the code in the commit 1b7e816fc80e you bisected,
> > where in slab_free_freelist_hook() in the first iteration, we have void
> > *p = NULL; and set_freepointer(s, object, p); will thus write NULL into
> > the freelist. Is is the NULL we are crashing on? The code seems to
> > assume that the freelist is rewritten later in the function, but that
> > part is only active with some CONFIG_ option(s), none of which might be
> > enabled in your case?
> > But I don't really understand what exactly this function is supposed to
> > do. Laura, does my theory make sense?
> >
> > Thanks,
> > Vlastimil
> >
>
> The note about getting re-written is referring to the fact that the trick
> with the bulk free is that we build the detached freelist and then when
> we do the cmpxchg it's getting (correctly) updated there.

Thank you Laura for this clarification.

> But looking at this again, I realize this function still has a more
> fundamental problem: walking the freelist like this means we actually
> end up reversing the list so head and tail are no longer pointing
> to the correct blocks. I was able to reproduce the issue by writing a
> simple kmem_cache_bulk_alloc/kmem_cache_bulk_free function. I'm
> guessing that the test of ping with an unusual size was enough to
> regularly trigger a non-trivial bulk alloc/free.
>
> The fix I gave before fixed part of the problem but not all of it.
> At this point we're basically duplicating the work of the loop
> below so I think we can just combine it. Was there a reason this
> wasn't just combined in the first place?

Good catch about the freelist inversion, I was actually starting to
wonder whether it was really intended. Anyway, I tested your patch (see
one tiny inline comment below) and it seems to run fine for now, without
me being able to reproduce the bug anymore.

> diff --git a/mm/slub.c b/mm/slub.c
> index dac41cf0b94a..1510b86b2e7e 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1431,13 +1431,17 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s,
> void *next = *head;
> void *old_tail = *tail ? *tail : *head;
> int rsize;
> + next = *head;

`next` is already set a few lines above.

> - if (slab_want_init_on_free(s)) {
> - void *p = NULL;
> + /* Head and tail of the reconstructed freelist */
> + *head = NULL;
> + *tail = NULL;
> - do {
> - object = next;
> - next = get_freepointer(s, object);
> + do {
> + object = next;
> + next = get_freepointer(s, object);
> +
> + if (slab_want_init_on_free(s)) {
> /*
> * Clear the object and the metadata, but don't touch
> * the redzone.
> @@ -1447,29 +1451,8 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s,
> : 0;
> memset((char *)object + s->inuse, 0,
> s->size - s->inuse - rsize);
> - set_freepointer(s, object, p);
> - p = object;
> - } while (object != old_tail);
> - }
> -
> -/*
> - * Compiler cannot detect this function can be removed if slab_free_hook()
> - * evaluates to nothing. Thus, catch all relevant config debug options here.
> - */
> -#if defined(CONFIG_LOCKDEP) || \
> - defined(CONFIG_DEBUG_KMEMLEAK) || \
> - defined(CONFIG_DEBUG_OBJECTS_FREE) || \
> - defined(CONFIG_KASAN)
> - next = *head;
> -
> - /* Head and tail of the reconstructed freelist */
> - *head = NULL;
> - *tail = NULL;
> -
> - do {
> - object = next;
> - next = get_freepointer(s, object);
> + }
> /* If object's reuse doesn't have to be delayed */
> if (!slab_free_hook(s, object)) {
> /* Move object to the new freelist */
> @@ -1484,9 +1467,6 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s,
> *tail = NULL;
> return *head != NULL;
> -#else
> - return true;
> -#endif
> }
> static void *setup_object(struct kmem_cache *s, struct page *page,
>

--
Thibaut Sautereau
CLIP OS developer