Subject: Re: BUG: unable to handle kernel NULL pointer dereference in fdb_find_rcu
From: Nikolay Aleksandrov
To: Andrei Vagin, Linux Kernel Network Developers, LKML
Cc: "David S. Miller", Toshiaki Makita, Stephen Hemminger, roopa
Date: Sat, 16 Dec 2017 12:40:54 +0200
Message-ID: <70905249-4854-726f-a3fb-258d25d2c1de@cumulusnetworks.com>
In-Reply-To: <8791c6f9-e69d-2269-f840-d8e05b0b44da@cumulusnetworks.com>

On 16/12/17 11:29, Nikolay Aleksandrov wrote:
> On 16/12/17 11:17, Nikolay Aleksandrov wrote:
>> On 16/12/17 02:37, Andrei Vagin wrote:
>>> Hi,
>>>
>>> We run criu tests for linux-next and today we get this bug:
>>>
>>> The kernel version is 4.15.0-rc3-next-20171215
>>>
>>> [ 235.397328] BUG: unable to handle kernel NULL pointer dereference
>>> at 000000000000000c
>>> [ 235.398624] IP: fdb_find_rcu+0x3c/0x130
>> [snip]
>>
>> Hi,
>> Thanks for the report, I've missed the changelink before dev creation case when I did
>
> err, s/changelink/br_stp_change_bridge_id/
> the other options are set after register_netdevice, this is the only one changed before
>
>> the rhashtable conversion, some of the options do fdb lookups as part of their routine
>> but we don't have the table initialized yet at that point.
>> I'll send a fix after some testing.
>>
>> Thanks,
>> Nik
>>
>>
>

We need to fix this in -net: there is a memory leak that has existed since
br_stp_change_bridge_id() was introduced before register_netdevice. It adds an
fdb entry which never gets deleted if an error happens. Also, the notifications
for that fdb entry come with ifindex = 0 because the bridge netdev doesn't exist
yet. All of that looks wrong, so I'll send a fix for -net to move the bridge id
change after the netdev register and clean up any bridge fdbs on error.

The commit that introduced the early bridge id change is:
30313a3d5794 ("bridge: Handle IFLA_ADDRESS correctly when creating bridge device")

Before it was possible to do a changelink during newlink in the bridge, this
would happen only if the netdev register failed, but now it is much easier to
trigger (as below) since changelink can fail if called with wrong arguments.

Here's the trace of rmmod bridge after a failed bridge newlink with mac address
set (this kernel is before my rhashtable change):

$ ip l add br0 address 00:11:22:33:44:55 type bridge group_fwd_mask 1
RTNETLINK answers: Invalid argument
$ rmmod bridge
[ 1822.142525] =============================================================================
[ 1822.143640] BUG bridge_fdb_cache (Tainted: G O ): Objects remaining in bridge_fdb_cache on __kmem_cache_shutdown()
[ 1822.144821] -----------------------------------------------------------------------------
[ 1822.145990] Disabling lock debugging due to kernel taint
[ 1822.146732] INFO: Slab 0x0000000092a844b2 objects=32 used=2 fp=0x00000000fef011b0 flags=0x1ffff8000000100
[ 1822.147700] CPU: 2 PID: 13584 Comm: rmmod Tainted: G B O 4.15.0-rc2+ #87
[ 1822.148578] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 1822.150008] Call Trace:
[ 1822.150510] dump_stack+0x78/0xa9
[ 1822.151156] slab_err+0xb1/0xd3
[ 1822.151834] ? __kmalloc+0x1bb/0x1ce
[ 1822.152546] __kmem_cache_shutdown+0x151/0x28b
[ 1822.153395] shutdown_cache+0x13/0x144
[ 1822.154126] kmem_cache_destroy+0x1c0/0x1fb
[ 1822.154669] SyS_delete_module+0x194/0x244
[ 1822.155199] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 1822.155773] entry_SYSCALL_64_fastpath+0x23/0x9a
[ 1822.156343] RIP: 0033:0x7f929bd38b17
[ 1822.156859] RSP: 002b:00007ffd160e9a98 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
[ 1822.157728] RAX: ffffffffffffffda RBX: 00005578316ba090 RCX: 00007f929bd38b17
[ 1822.158422] RDX: 00007f929bd9ec60 RSI: 0000000000000800 RDI: 00005578316ba0f0
[ 1822.159114] RBP: 0000000000000003 R08: 00007f929bff5f20 R09: 00007ffd160e8a11
[ 1822.159808] R10: 00007ffd160e9860 R11: 0000000000000202 R12: 00007ffd160e8a80
[ 1822.160513] R13: 0000000000000000 R14: 0000000000000000 R15: 00005578316ba090
[ 1822.161278] INFO: Object 0x000000007645de29 @offset=0
[ 1822.161666] INFO: Object 0x00000000d5df2ab5 @offset=128
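
To make the ordering problem easier to see, here is a rough user-space toy
model of the leak and of the proposed reordering. It is only a sketch, not
kernel code: every name in it (toy_bridge, toy_change_bridge_id,
toy_fdb_flush, toy_later_step, toy_newlink_leaky, toy_newlink_fixed,
toy_leaked) is hypothetical, and it does not mirror the exact call sequence
in the bridge driver. It only assumes what is described above: the bridge id
change adds an fdb entry, a later step of device creation can fail, and the
error path has to remove that entry.

```c
/* User-space toy model. All names are hypothetical, none are kernel API. */
#include <stdbool.h>

struct toy_bridge {
	int fdb_entries;  /* stands in for objects in bridge_fdb_cache */
	bool registered;  /* stands in for a successfully registered netdev */
};

/* Stand-in for br_stp_change_bridge_id(): adds one local fdb entry. */
void toy_change_bridge_id(struct toy_bridge *br)
{
	br->fdb_entries++;
}

/* Stand-in for deleting every fdb entry that belongs to the bridge. */
void toy_fdb_flush(struct toy_bridge *br)
{
	br->fdb_entries = 0;
}

/* Stand-in for any later creation step that can fail (e.g. bad options). */
int toy_later_step(bool fail)
{
	return fail ? -1 : 0;
}

/* Problematic ordering: the entry is added first, the error path forgets it. */
int toy_newlink_leaky(struct toy_bridge *br, bool fail)
{
	toy_change_bridge_id(br);
	if (toy_later_step(fail))
		return -1;  /* the fdb entry stays allocated */
	br->registered = true;
	return 0;
}

/* Proposed ordering: register first, then set the id, flush fdbs on error. */
int toy_newlink_fixed(struct toy_bridge *br, bool fail)
{
	br->registered = true;
	toy_change_bridge_id(br);
	if (toy_later_step(fail)) {
		toy_fdb_flush(br);
		br->registered = false;
		return -1;
	}
	return 0;
}

/* Entries still allocated although the device does not exist. */
int toy_leaked(const struct toy_bridge *br)
{
	return br->registered ? 0 : br->fdb_entries;
}
```

In this model a failed toy_newlink_leaky() leaves one entry behind, which is
the analogue of the objects reported as remaining in bridge_fdb_cache at
rmmod time, while a failed toy_newlink_fixed() leaves none.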