Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756506AbdLPLWY (ORCPT ); Sat, 16 Dec 2017 06:22:24 -0500 Received: from mail-wm0-f53.google.com ([74.125.82.53]:40613 "EHLO mail-wm0-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752724AbdLPLWW (ORCPT ); Sat, 16 Dec 2017 06:22:22 -0500 X-Google-Smtp-Source: ACJfBotnuWkW7x3iW1PIXRZo8bsgwA0yMMWdmwH7xIkOWyNHqLAMVcem8yNS6xwI5TsOYycE96m21g== Subject: Re: BUG: unable to handle kernel NULL pointer dereference in fdb_find_rcu From: Nikolay Aleksandrov To: Andrei Vagin , Linux Kernel Network Developers , LKML Cc: "David S. Miller" , Toshiaki Makita , Stephen Hemminger , roopa References: <415b4093-4c58-c671-0df1-a6f650414416@cumulusnetworks.com> <8791c6f9-e69d-2269-f840-d8e05b0b44da@cumulusnetworks.com> <70905249-4854-726f-a3fb-258d25d2c1de@cumulusnetworks.com> Message-ID: <73a472e3-e521-1930-0643-a6f2714db6ce@cumulusnetworks.com> Date: Sat, 16 Dec 2017 13:22:17 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 In-Reply-To: <70905249-4854-726f-a3fb-258d25d2c1de@cumulusnetworks.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2161 Lines: 55 On 16/12/17 12:40, Nikolay Aleksandrov wrote: > On 16/12/17 11:29, Nikolay Aleksandrov wrote: >> On 16/12/17 11:17, Nikolay Aleksandrov wrote: >>> On 16/12/17 02:37, Andrei Vagin wrote: >>>> Hi, >>>> >>>> We run criu tests for linux-next and today we get this bug: >>>> >>>> The kernel version is 4.15.0-rc3-next-20171215 >>>> >>>> [ 235.397328] BUG: unable to handle kernel NULL pointer dereference >>>> at 000000000000000c >>>> [ 235.398624] IP: fdb_find_rcu+0x3c/0x130 >>> [snip] >>> >>> Hi, >>> Thanks for the report, I've missed the changelink before dev creation case when I did >> >> err, s/changelink/br_stp_change_bridge_id/ >> the other options are set after register_netdevice, this is the only one changed before >> >>> the rhashtable conversion, some of the options do fdb lookups as part of their routine >>> but we don't have the table initialized yet at that point. >>> I'll send a fix after some testing. >>> >>> Thanks, >>> Nik >>> >>> >> > > We need to fix this in -net, it has a memory leak that has existed since the > introduction of br_stp_change_bridge_id() before register_netdevice because > it adds an fdb entry which never gets deleted if an error happens, also the > notifications for that fdb entry come with ifindex = 0 because the bridge netdev > doesn't exist yet. All of that looks wrong, I'll send a fix for -net to move > the bridge id change after the netdev register and cleanup any bridge fdbs > on error. > > The commit with that change is: > 30313a3d5794 ("bridge: Handle IFLA_ADDRESS correctly when creating bridge device") > Before the changelink in while doing newlink in bridge was possible, this would happen > only on netdev register fail, but now it is much easier to trigger (as below) since > changelink can fail if called with wrong arguments. > One more thing before sending the patch, the actual commit that introduced the fdb insert in br_stp_change_bridge_id() is: commit a4b816d8ba1c Author: Toshiaki Makita Date: Fri Feb 7 16:48:21 2014 +0900 bridge: Change local fdb entries whenever mac address of bridge device changes So that is what we need to fix.