2017-12-16 00:37:48

by Andrei Vagin

[permalink] [raw]
Subject: BUG: unable to handle kernel NULL pointer dereference in fdb_find_rcu

Hi,

We run criu tests for linux-next and today we get this bug:

The kernel version is 4.15.0-rc3-next-20171215

[ 235.397328] BUG: unable to handle kernel NULL pointer dereference
at 000000000000000c
[ 235.398624] IP: fdb_find_rcu+0x3c/0x130
[ 235.399365] PGD 51970067 P4D 51970067 PUD 51971067 PMD 0
[ 235.400400] Oops: 0000 [#1] SMP
[ 235.400959] Modules linked in:
[ 235.401455] CPU: 1 PID: 32057 Comm: criu Not tainted
4.15.0-rc3-next-20171215-00002-g2e56147d7dc8 #10
[ 235.402935] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.10.2-1.fc26 04/01/2014
[ 235.404356] RIP: 0010:fdb_find_rcu+0x3c/0x130
[ 235.405069] RSP: 0018:ffffbc64c05f76e8 EFLAGS: 00010246
[ 235.405908] RAX: 000000000000e871 RBX: 0000000000000000 RCX: 0000000000000000
[ 235.407062] RDX: 0000000000000000 RSI: ffff9d9b69a04798 RDI: ffff9d9b64c14948
[ 235.408370] RBP: ffff9d9b64c14948 R08: ffffbc64c05f7c18 R09: ffff9d9b64c140a0
[ 235.409518] R10: 0000000064c140a0 R11: ffff9d9b75743968 R12: ffff9d9b757ce260
[ 235.410674] R13: 0000000000000000 R14: ffffbc64c05f7c18 R15: ffffffffa5384560
[ 235.411837] FS: 00007ffac2660100(0000) GS:ffff9d9b7fd00000(0000)
knlGS:0000000000000000
[ 235.412920] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 235.413764] CR2: 000000000000000c CR3: 000000006a7ff004 CR4: 00000000003606e0
[ 235.414829] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 235.416096] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 235.417365] Call Trace:
[ 235.417774] br_fdb_change_mac_address+0x31/0x80
[ 235.418538] br_stp_change_bridge_id+0x25/0x110
[ 235.419280] br_dev_newlink+0x43/0xa0
[ 235.419796] rtnl_newlink+0x70f/0x940
[ 235.420391] ? nla_parse+0x83/0xf0
[ 235.420948] ? nla_strlcpy+0x48/0x50
[ 235.421540] ? rtnl_link_ops_get+0x34/0x50
[ 235.422211] ? rtnl_newlink+0x193/0x940
[ 235.422843] ? deactivate_slab.isra.78+0x11b/0x3b0
[ 235.423599] rtnetlink_rcv_msg+0x25d/0x2d0
[ 235.424189] ? __alloc_skb+0x82/0x1e0
[ 235.424718] ? __slab_alloc+0x1c/0x30
[ 235.425420] ? rtnl_calcit.isra.27+0x110/0x110
[ 235.426354] netlink_rcv_skb+0x8d/0x130
[ 235.426995] netlink_unicast+0x19d/0x250
[ 235.427754] netlink_sendmsg+0x2a5/0x3a0
[ 235.428387] sock_sendmsg+0x30/0x40
[ 235.428807] ___sys_sendmsg+0x269/0x2c0
[ 235.429277] ? generic_perform_write+0x122/0x1b0
[ 235.429828] ? __generic_file_write_iter+0x192/0x1c0
[ 235.430422] ? ext4_file_write_iter+0x20c/0x3e0
[ 235.430975] ? SyS_setns+0xc2/0xd0
[ 235.431425] ? __vfs_write+0xf9/0x170
[ 235.431886] ? __sys_sendmsg+0x51/0x90
[ 235.432501] __sys_sendmsg+0x51/0x90
[ 235.433098] entry_SYSCALL_64_fastpath+0x1a/0x7d
[ 235.433822] RIP: 0033:0x7ffac1e170f7
[ 235.434409] RSP: 002b:00007ffc5610c688 EFLAGS: 00000246 ORIG_RAX:
000000000000002e
[ 235.435635] RAX: ffffffffffffffda RBX: 00000000022eb010 RCX: 00007ffac1e170f7
[ 235.436796] RDX: 0000000000000000 RSI: 00007ffc5610c6c0 RDI: 0000000000000002
[ 235.437956] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 235.439125] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc5610cbac
[ 235.440556] R13: 0005005800000011 R14: 00007ffac2666158 R15: 0000000000000001
[ 235.442175] Code: fd 48 83 ec 10 65 48 8b 04 25 28 00 00 00 48 89
44 24 08 31 c0 8b 06 48 8b 1f 66 89 54 24 06 89 04 24 0f b7 46 04 66
89 44 24 04 <8b> 43 0c 8b 14 24 8d b0 f7 be ad de 8b 44 24 04 01 f2 01
f0 89
[ 235.444717] RIP: fdb_find_rcu+0x3c/0x130 RSP: ffffbc64c05f76e8
[ 235.445636] CR2: 000000000000000c
[ 235.446278] ---[ end trace b77358b1e42a9dd7 ]---
[ 235.447170] Kernel panic - not syncing: Fatal exception in interrupt
[ 235.448379] Kernel Offset: 0x23000000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 235.449750] ---[ end Kernel panic - not syncing: Fatal exception in interrupt


2017-12-16 09:17:26

by Nikolay Aleksandrov

[permalink] [raw]
Subject: Re: BUG: unable to handle kernel NULL pointer dereference in fdb_find_rcu

On 16/12/17 02:37, Andrei Vagin wrote:
> Hi,
>
> We run criu tests for linux-next and today we get this bug:
>
> The kernel version is 4.15.0-rc3-next-20171215
>
> [ 235.397328] BUG: unable to handle kernel NULL pointer dereference
> at 000000000000000c
> [ 235.398624] IP: fdb_find_rcu+0x3c/0x130
[snip]

Hi,
Thanks for the report, I've missed the changelink before dev creation case when I did
the rhashtable conversion, some of the options do fdb lookups as part of their routine
but we don't have the table initialized yet at that point.
I'll send a fix after some testing.

Thanks,
Nik


2017-12-16 09:29:16

by Nikolay Aleksandrov

[permalink] [raw]
Subject: Re: BUG: unable to handle kernel NULL pointer dereference in fdb_find_rcu

On 16/12/17 11:17, Nikolay Aleksandrov wrote:
> On 16/12/17 02:37, Andrei Vagin wrote:
>> Hi,
>>
>> We run criu tests for linux-next and today we get this bug:
>>
>> The kernel version is 4.15.0-rc3-next-20171215
>>
>> [ 235.397328] BUG: unable to handle kernel NULL pointer dereference
>> at 000000000000000c
>> [ 235.398624] IP: fdb_find_rcu+0x3c/0x130
> [snip]
>
> Hi,
> Thanks for the report, I've missed the changelink before dev creation case when I did

err, s/changelink/br_stp_change_bridge_id/
the other options are set after register_netdevice, this is the only one changed before

> the rhashtable conversion, some of the options do fdb lookups as part of their routine
> but we don't have the table initialized yet at that point.
> I'll send a fix after some testing.
>
> Thanks,
> Nik
>
>

2017-12-16 10:41:01

by Nikolay Aleksandrov

[permalink] [raw]
Subject: Re: BUG: unable to handle kernel NULL pointer dereference in fdb_find_rcu

On 16/12/17 11:29, Nikolay Aleksandrov wrote:
> On 16/12/17 11:17, Nikolay Aleksandrov wrote:
>> On 16/12/17 02:37, Andrei Vagin wrote:
>>> Hi,
>>>
>>> We run criu tests for linux-next and today we get this bug:
>>>
>>> The kernel version is 4.15.0-rc3-next-20171215
>>>
>>> [ 235.397328] BUG: unable to handle kernel NULL pointer dereference
>>> at 000000000000000c
>>> [ 235.398624] IP: fdb_find_rcu+0x3c/0x130
>> [snip]
>>
>> Hi,
>> Thanks for the report, I've missed the changelink before dev creation case when I did
>
> err, s/changelink/br_stp_change_bridge_id/
> the other options are set after register_netdevice, this is the only one changed before
>
>> the rhashtable conversion, some of the options do fdb lookups as part of their routine
>> but we don't have the table initialized yet at that point.
>> I'll send a fix after some testing.
>>
>> Thanks,
>> Nik
>>
>>
>

We need to fix this in -net, it has a memory leak that has existed since the
introduction of br_stp_change_bridge_id() before register_netdevice because
it adds an fdb entry which never gets deleted if an error happens, also the
notifications for that fdb entry come with ifindex = 0 because the bridge netdev
doesn't exist yet. All of that looks wrong, I'll send a fix for -net to move
the bridge id change after the netdev register and cleanup any bridge fdbs
on error.

The commit with that change is:
30313a3d5794 ("bridge: Handle IFLA_ADDRESS correctly when creating bridge device")
Before the changelink in while doing newlink in bridge was possible, this would happen
only on netdev register fail, but now it is much easier to trigger (as below) since
changelink can fail if called with wrong arguments.

Here's the trace of rmmod bridge after a failed bridge newlink with mac address set
(this kernel is before my rhashtable change):
$ ip l add br0 address 00:11:22:33:44:55 type bridge group_fwd_mask 1
RTNETLINK answers: Invalid argument
$ rmmod bridge
[ 1822.142525] =============================================================================
[ 1822.143640] BUG bridge_fdb_cache (Tainted: G O ): Objects remaining in bridge_fdb_cache on __kmem_cache_shutdown()
[ 1822.144821] -----------------------------------------------------------------------------

[ 1822.145990] Disabling lock debugging due to kernel taint
[ 1822.146732] INFO: Slab 0x0000000092a844b2 objects=32 used=2 fp=0x00000000fef011b0 flags=0x1ffff8000000100
[ 1822.147700] CPU: 2 PID: 13584 Comm: rmmod Tainted: G B O 4.15.0-rc2+ #87
[ 1822.148578] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 1822.150008] Call Trace:
[ 1822.150510] dump_stack+0x78/0xa9
[ 1822.151156] slab_err+0xb1/0xd3
[ 1822.151834] ? __kmalloc+0x1bb/0x1ce
[ 1822.152546] __kmem_cache_shutdown+0x151/0x28b
[ 1822.153395] shutdown_cache+0x13/0x144
[ 1822.154126] kmem_cache_destroy+0x1c0/0x1fb
[ 1822.154669] SyS_delete_module+0x194/0x244
[ 1822.155199] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 1822.155773] entry_SYSCALL_64_fastpath+0x23/0x9a
[ 1822.156343] RIP: 0033:0x7f929bd38b17
[ 1822.156859] RSP: 002b:00007ffd160e9a98 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
[ 1822.157728] RAX: ffffffffffffffda RBX: 00005578316ba090 RCX: 00007f929bd38b17
[ 1822.158422] RDX: 00007f929bd9ec60 RSI: 0000000000000800 RDI: 00005578316ba0f0
[ 1822.159114] RBP: 0000000000000003 R08: 00007f929bff5f20 R09: 00007ffd160e8a11
[ 1822.159808] R10: 00007ffd160e9860 R11: 0000000000000202 R12: 00007ffd160e8a80
[ 1822.160513] R13: 0000000000000000 R14: 0000000000000000 R15: 00005578316ba090
[ 1822.161278] INFO: Object 0x000000007645de29 @offset=0
[ 1822.161666] INFO: Object 0x00000000d5df2ab5 @offset=128


2017-12-16 11:22:24

by Nikolay Aleksandrov

[permalink] [raw]
Subject: Re: BUG: unable to handle kernel NULL pointer dereference in fdb_find_rcu

On 16/12/17 12:40, Nikolay Aleksandrov wrote:
> On 16/12/17 11:29, Nikolay Aleksandrov wrote:
>> On 16/12/17 11:17, Nikolay Aleksandrov wrote:
>>> On 16/12/17 02:37, Andrei Vagin wrote:
>>>> Hi,
>>>>
>>>> We run criu tests for linux-next and today we get this bug:
>>>>
>>>> The kernel version is 4.15.0-rc3-next-20171215
>>>>
>>>> [ 235.397328] BUG: unable to handle kernel NULL pointer dereference
>>>> at 000000000000000c
>>>> [ 235.398624] IP: fdb_find_rcu+0x3c/0x130
>>> [snip]
>>>
>>> Hi,
>>> Thanks for the report, I've missed the changelink before dev creation case when I did
>>
>> err, s/changelink/br_stp_change_bridge_id/
>> the other options are set after register_netdevice, this is the only one changed before
>>
>>> the rhashtable conversion, some of the options do fdb lookups as part of their routine
>>> but we don't have the table initialized yet at that point.
>>> I'll send a fix after some testing.
>>>
>>> Thanks,
>>> Nik
>>>
>>>
>>
>
> We need to fix this in -net, it has a memory leak that has existed since the
> introduction of br_stp_change_bridge_id() before register_netdevice because
> it adds an fdb entry which never gets deleted if an error happens, also the
> notifications for that fdb entry come with ifindex = 0 because the bridge netdev
> doesn't exist yet. All of that looks wrong, I'll send a fix for -net to move
> the bridge id change after the netdev register and cleanup any bridge fdbs
> on error.
>
> The commit with that change is:
> 30313a3d5794 ("bridge: Handle IFLA_ADDRESS correctly when creating bridge device")
> Before the changelink in while doing newlink in bridge was possible, this would happen
> only on netdev register fail, but now it is much easier to trigger (as below) since
> changelink can fail if called with wrong arguments.
>

One more thing before sending the patch, the actual commit that introduced the fdb
insert in br_stp_change_bridge_id() is:
commit a4b816d8ba1c
Author: Toshiaki Makita <[email protected]>
Date: Fri Feb 7 16:48:21 2014 +0900

bridge: Change local fdb entries whenever mac address of bridge device changes

So that is what we need to fix.