Date: Sun, 9 Apr 2023 20:14:13 +0300
From: Ido Schimmel
To: Mirsad Goran Todorovac
Cc: netdev@vger.kernel.org, "David S. Miller", Eric Dumazet, Jakub Kicinski,
 Paolo Abeni, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
 John Fastabend, Nikolay Aleksandrov, Florent Fourcot, Hangbin Liu,
 Petr Machata, Jiri Pirko, Xin Long, linux-kernel@vger.kernel.org,
 bpf@vger.kernel.org
Subject: Re: [BUG] kmemleak in rtnetlink_rcv() triggered by selftests/drivers/net/team in build cdc9718d5e59
References: <78a8a03b-6070-3e6b-5042-f848dab16fb8@alu.unizg.hr>
In-Reply-To: <78a8a03b-6070-3e6b-5042-f848dab16fb8@alu.unizg.hr>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Apr 09, 2023 at 01:49:30PM +0200, Mirsad Goran Todorovac wrote:
> Hi all,
>
> There appears to be a memleak triggered by the selftest drivers/net/team.

Thanks for the report. Not sure it's related to team, see below.

>
> # cat /sys/kernel/debug/kmemleak
> unreferenced object 0xffff8c18def8ee00 (size 256):
>   comm "ip", pid 5727, jiffies 4294961159 (age 954.244s)
>   hex dump (first 32 bytes):
>     00 20 09 de 18 8c ff ff 00 00 00 00 00 00 00 00  . ..............
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [] slab_post_alloc_hook+0x8c/0x3e0
>     [] __kmem_cache_alloc_node+0x1d9/0x2a0
>     [] kmalloc_trace+0x2e/0xc0
>     [] vlan_vid_add+0x11b/0x290
>     [] vlan_device_event+0x19c/0x880
>     [] raw_notifier_call_chain+0x47/0x70
>     [] call_netdevice_notifiers_info+0x50/0xa0
>     [] dev_open+0x94/0xa0
>     [] 0xffffffffc176515e

Don't know what this is. Might be another issue.

>     [] do_set_master+0x90/0xb0
>     [] do_setlink+0x514/0x11f0
>     [] __rtnl_newlink+0x4e7/0xa10
>     [] rtnl_newlink+0x4c/0x70
>     [] rtnetlink_rcv_msg+0x184/0x5d0
>     [] netlink_rcv_skb+0x5e/0x110
>     [] rtnetlink_rcv+0x19/0x20
> unreferenced object 0xffff8c18250d3700 (size 32):
>   comm "ip", pid 5727, jiffies 4294961159 (age 954.244s)
>   hex dump (first 32 bytes):
>     a0 ee f8 de 18 8c ff ff a0 ee f8 de 18 8c ff ff  ................
>     81 00 00 00 01 00 00 00 cc cc cc cc cc cc cc cc  ................
>   backtrace:
>     [] slab_post_alloc_hook+0x8c/0x3e0
>     [] __kmem_cache_alloc_node+0x1d9/0x2a0
>     [] kmalloc_trace+0x2e/0xc0
>     [] vlan_vid_add+0x174/0x290
>     [] vlan_device_event+0x19c/0x880
>     [] raw_notifier_call_chain+0x47/0x70
>     [] call_netdevice_notifiers_info+0x50/0xa0
>     [] dev_open+0x94/0xa0
>     [] 0xffffffffc176515e
>     [] do_set_master+0x90/0xb0
>     [] do_setlink+0x514/0x11f0
>     [] __rtnl_newlink+0x4e7/0xa10
>     [] rtnl_newlink+0x4c/0x70
>     [] rtnetlink_rcv_msg+0x184/0x5d0
>     [] netlink_rcv_skb+0x5e/0x110
>     [] rtnetlink_rcv+0x19/0x20
> unreferenced object 0xffff8c1846e16800 (size 256):
>   comm "ip", pid 7837, jiffies 4295135225 (age 258.160s)
>   hex dump (first 32 bytes):
>     00 20 f7 de 18 8c ff ff 00 00 00 00 00 00 00 00  . ..............
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [] slab_post_alloc_hook+0x8c/0x3e0
>     [] __kmem_cache_alloc_node+0x1d9/0x2a0
>     [] kmalloc_trace+0x2e/0xc0
>     [] vlan_vid_add+0x11b/0x290
>     [] vlan_device_event+0x19c/0x880
>     [] raw_notifier_call_chain+0x47/0x70
>     [] call_netdevice_notifiers_info+0x50/0xa0
>     [] dev_open+0x94/0xa0
>     [] bond_enslave+0x34e/0x1840 [bonding]

This shows that the issue is related to the bond driver, not team.

>     [] do_set_master+0x90/0xb0
>     [] do_setlink+0x514/0x11f0
>     [] __rtnl_newlink+0x4e7/0xa10
>     [] rtnl_newlink+0x4c/0x70
>     [] rtnetlink_rcv_msg+0x184/0x5d0
>     [] netlink_rcv_skb+0x5e/0x110
>     [] rtnetlink_rcv+0x19/0x20
> unreferenced object 0xffff8c184c5ff2a0 (size 32):

This is 'struct vlan_vid_info'

>   comm "ip", pid 7837, jiffies 4295135225 (age 258.160s)
>   hex dump (first 32 bytes):
>     a0 68 e1 46 18 8c ff ff a0 68 e1 46 18 8c ff ff  .h.F.....h.F....
>     81 00 00 00 01 00 00 00 cc cc cc cc cc cc cc cc  ................
            ^ VLAN ID 0

>   backtrace:
>     [] slab_post_alloc_hook+0x8c/0x3e0
>     [] __kmem_cache_alloc_node+0x1d9/0x2a0
>     [] kmalloc_trace+0x2e/0xc0
>     [] vlan_vid_add+0x174/0x290
>     [] vlan_device_event+0x19c/0x880
>     [] raw_notifier_call_chain+0x47/0x70
>     [] call_netdevice_notifiers_info+0x50/0xa0
>     [] dev_open+0x94/0xa0
>     [] bond_enslave+0x34e/0x1840 [bonding]
>     [] do_set_master+0x90/0xb0
>     [] do_setlink+0x514/0x11f0
>     [] __rtnl_newlink+0x4e7/0xa10
>     [] rtnl_newlink+0x4c/0x70
>     [] rtnetlink_rcv_msg+0x184/0x5d0
>     [] netlink_rcv_skb+0x5e/0x110
>     [] rtnetlink_rcv+0x19/0x20

VLAN ID 0 is automatically added by the 8021q driver when a net device
is opened. In this case it's a device being enslaved to a bond. I
believe the issue was exposed by the new bond test that was added in
commit 222c94ec0ad4 ("selftests: bonding: add tests for ether type
changes") as part of v6.3-rc3.

The VLAN is supposed to be removed by the 8021q driver when a net device
is closed and the bond driver indeed calls dev_close() when a slave is
removed. However, this function is a NOP when 'IFF_UP' is not set.
Unfortunately, when a bond changes its type to Ethernet this flag is
incorrectly cleared in bond_ether_setup(), causing this VLAN to linger.
As far as I can tell, it's not a new issue.

Temporary fix is [1]. Please test it although we might end up with a
different fix (needs more thinking and it's already late here).

Reproduced using [2].
You can see in the before/after output how the flag is
cleared/retained [3].

[1]
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 236e5219c811..50dc068dc259 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1777,14 +1777,15 @@ void bond_lower_state_changed(struct slave *slave)

 /* The bonding driver uses ether_setup() to convert a master bond device
  * to ARPHRD_ETHER, that resets the target netdevice's flags so we always
- * have to restore the IFF_MASTER flag, and only restore IFF_SLAVE if it was set
+ * have to restore the IFF_MASTER flag, and only restore IFF_SLAVE and IFF_UP
+ * if they were set
  */
 static void bond_ether_setup(struct net_device *bond_dev)
 {
-	unsigned int slave_flag = bond_dev->flags & IFF_SLAVE;
+	unsigned int flags = bond_dev->flags & (IFF_SLAVE | IFF_UP);

 	ether_setup(bond_dev);
-	bond_dev->flags |= IFF_MASTER | slave_flag;
+	bond_dev->flags |= IFF_MASTER | flags;
 	bond_dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 }

[2]
#!/bin/bash

ip link add name t-nlmon type nlmon
ip link add name t-dummy type dummy
ip link add name t-bond type bond mode active-backup

ip link set dev t-bond up
ip link set dev t-nlmon master t-bond
ip link set dev t-nlmon nomaster
ip link show dev t-bond
ip link set dev t-dummy master t-bond
ip link show dev t-bond

ip link del dev t-bond
ip link del dev t-dummy
ip link del dev t-nlmon

[3]
Before:

12: t-bond: mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/netlink
12: t-bond: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ce:b2:31:0a:53:83 brd ff:ff:ff:ff:ff:ff

After:

12: t-bond: mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/netlink
12: t-bond: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 5a:18:e7:85:11:73 brd ff:ff:ff:ff:ff:ff
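
For anyone following along, the flag handling described above (open adds
VLAN ID 0, the Ethernet setup resets the flags, close is a NOP without
'IFF_UP') can be modelled in user space. This is only a rough sketch
under stated assumptions, not kernel code: 'struct sk_dev', every
'sk_*' function and the 'SK_IFF_*' constants are hypothetical names made
up for illustration, and the 8021q notifier is reduced to a single
boolean.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; the kernel defines the real
 * IFF_* values in <linux/if.h>.
 */
#define SK_IFF_UP        0x1u
#define SK_IFF_BROADCAST 0x2u
#define SK_IFF_MASTER    0x400u
#define SK_IFF_SLAVE     0x800u
#define SK_IFF_MULTICAST 0x1000u

/* Toy stand-in for a net device: flags plus "is VLAN ID 0 installed". */
struct sk_dev {
	unsigned int flags;
	bool has_vid0;
};

/* Models opening a device: set IFF_UP, after which VLAN ID 0 is added. */
static void sk_dev_open(struct sk_dev *dev)
{
	if (dev->flags & SK_IFF_UP)
		return;
	dev->flags |= SK_IFF_UP;
	dev->has_vid0 = true;
}

/* Models closing a device: a no-op unless IFF_UP is set, so in that
 * case VLAN ID 0 is never removed.
 */
static void sk_dev_close(struct sk_dev *dev)
{
	if (!(dev->flags & SK_IFF_UP))
		return;
	dev->flags &= ~SK_IFF_UP;
	dev->has_vid0 = false;
}

/* Models the Ethernet setup step: flags are overwritten, not OR-ed. */
static void sk_ether_setup(struct sk_dev *dev)
{
	dev->flags = SK_IFF_BROADCAST | SK_IFF_MULTICAST;
}

/* Pre-patch behaviour: only IFF_SLAVE survives the reset. */
static void sk_bond_ether_setup_old(struct sk_dev *dev)
{
	unsigned int slave_flag = dev->flags & SK_IFF_SLAVE;

	sk_ether_setup(dev);
	dev->flags |= SK_IFF_MASTER | slave_flag;
}

/* Behaviour with the temporary fix: IFF_UP is carried across as well. */
static void sk_bond_ether_setup_new(struct sk_dev *dev)
{
	unsigned int flags = dev->flags & (SK_IFF_SLAVE | SK_IFF_UP);

	sk_ether_setup(dev);
	dev->flags |= SK_IFF_MASTER | flags;
}

/* Returns true if VLAN ID 0 is still installed after the sequence
 * open -> setup -> close, i.e. the entry would linger.
 */
static bool sk_vid0_lingers(void (*setup)(struct sk_dev *))
{
	struct sk_dev dev = { .flags = SK_IFF_MASTER, .has_vid0 = false };

	sk_dev_open(&dev);
	setup(&dev);
	sk_dev_close(&dev);
	return dev.has_vid0;
}
```

In this model, calling sk_vid0_lingers() with sk_bond_ether_setup_old
leaves the VLAN entry behind, while sk_bond_ether_setup_new lets the
close path remove it. It says nothing about whether [1] is the right
final fix.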