From: Mirsad Goran Todorovac
Date: Fri, 21 Apr 2023 13:35:47 +0200
Message-ID: <060ff7a3-126f-3da5-4d93-0139e8fc4a9b@alu.unizg.hr>
Subject: Re: [BUG] [FIXED: TESTED] kmemleak in rtnetlink_rcv() triggered by selftests/drivers/net/team in build cdc9718d5e59
To: Ido Schimmel
Cc: netdev@vger.kernel.org, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Nikolay Aleksandrov, Florent Fourcot, Hangbin Liu, Petr Machata, Jiri Pirko, Xin Long, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, Jay Vosburgh, Andy Gospodarek

On 13.4.2023. 20:19, Ido Schimmel wrote:
> On Mon, Apr 10, 2023 at 07:34:09PM +0200, Mirsad Goran Todorovac wrote:
>> I've ran "make kselftest" with vanilla torvalds tree 6.3-rc5 + your patch.
>>
>> It failed two lines after "enslaved device client - ns-A IP" which passed OK.
>>
>> Is this hang for 5 hours in selftests: net: fcnal-test.sh test, at the line
>> (please see to the end):
>
> It's not clear to me if the test failed for you or just got stuck. The
> output below is all "[ OK ]".
>
> I ran the test with my patch and got:
>
> Tests passed: 875
> Tests failed: 5
>
> I don't believe the failures are related to my patch given the test
> doesn't use bonding.
>
> See more below.
>
>>
>> # ###########################################################################
>> # IPv4 address binds
>> # ###########################################################################
>> #
>> #
>> # #################################################################
>> # No VRF
>> #
>> # SYSCTL: net.ipv4.ping_group_range=0 2147483647
>> #
>> # TEST: Raw socket bind to local address - ns-A IP [ OK ]
>> # TEST: Raw socket bind to local address after device bind - ns-A IP [ OK ]
>> # TEST: Raw socket bind to local address - ns-A loopback IP [ OK ]
>> # TEST: Raw socket bind to local address after device bind - ns-A loopback IP [ OK ]
>> # TEST: Raw socket bind to nonlocal address - nonlocal IP [ OK ]
>> # TEST: TCP socket bind to nonlocal address - nonlocal IP [ OK ]
>> # TEST: ICMP socket bind to nonlocal address - nonlocal IP [ OK ]
>> # TEST: ICMP socket bind to broadcast address - broadcast [ OK ]
>> # TEST: ICMP socket bind to multicast address - multicast [ OK ]
>> # TEST: TCP socket bind to local address - ns-A IP [ OK ]
>> # TEST: TCP socket bind to local address after device bind - ns-A IP [ OK ]
>> #
>> # #################################################################
>> # With VRF
>> #
>> # SYSCTL: net.ipv4.ping_group_range=0 2147483647
>> #
>> # TEST: Raw socket bind to local address - ns-A IP [ OK ]
>> # TEST: Raw socket bind to local address after device bind - ns-A IP [ OK ]
>> # TEST: Raw socket bind to local address after VRF bind - ns-A IP [ OK ]
>> # TEST: Raw socket bind to local address - VRF IP [ OK ]
>> # TEST: Raw socket bind to local address after device bind - VRF IP [ OK ]
>> # TEST: Raw socket bind to local address after VRF bind - VRF IP [ OK ]
>> # TEST: Raw socket bind to out of scope address after VRF bind - ns-A loopback IP [ OK ]
>> # TEST: Raw socket bind to nonlocal address after VRF bind - nonlocal IP [ OK ]
>> # TEST: TCP socket bind to nonlocal address after VRF bind - nonlocal IP [ OK ]
>> # TEST: ICMP socket bind to nonlocal address after VRF bind - nonlocal IP [ OK ]
>> # TEST: ICMP socket bind to broadcast address after VRF bind - broadcast [ OK ]
>> # TEST: ICMP socket bind to multicast address after VRF bind - multicast [ OK ]
>> # TEST: TCP socket bind to local address - ns-A IP [ OK ]
>> # TEST: TCP socket bind to local address after device bind - ns-A IP [ OK ]
>> # TEST: TCP socket bind to local address - VRF IP [ OK ]
>> # TEST: TCP socket bind to local address after device bind - VRF IP [ OK ]
>> # TEST: TCP socket bind to invalid local address for VRF - ns-A loopback IP [ OK ]
>> # TEST: TCP socket bind to invalid local address for device bind - ns-A loopback IP [ OK ]
>> #
>> # ###########################################################################
>> # Run time tests - ipv4
>> # ###########################################################################
>> #
>> # TEST: Device delete with active traffic - ping in - ns-A IP [ OK ]
>> # TEST: Device delete with active traffic - ping in - VRF IP [ OK ]
>> # TEST: Device delete with active traffic - ping out - ns-B IP [ OK ]
>> # TEST: TCP active socket, global server - ns-A IP [ OK ]
>> # TEST: TCP active socket, global server - VRF IP [ OK ]
>> # TEST: TCP active socket, VRF server - ns-A IP [ OK ]
>> # TEST: TCP active socket, VRF server - VRF IP [ OK ]
>> # TEST: TCP active socket, enslaved device server - ns-A IP [ OK ]
>> # TEST: TCP active socket, VRF client - ns-A IP [ OK ]
>> # TEST: TCP active socket, enslaved device client - ns-A IP [ OK ]
>> # TEST: TCP active socket, global server, VRF client, local - ns-A IP [ OK ]
>> # TEST: TCP active socket, global server, VRF client, local - VRF IP [ OK ]
>> # TEST: TCP active socket, VRF server and client, local - ns-A IP [ OK ]
>> # TEST: TCP active socket, VRF server and client, local - VRF IP [ OK ]
>> # TEST: TCP active socket, global server, enslaved device client, local - ns-A IP [ OK ]
>> # TEST: TCP active socket, VRF server, enslaved device client, local - ns-A IP [ OK ]
>> # TEST: TCP active socket, enslaved device server and client, local - ns-A IP [ OK ]
>> # TEST: TCP passive socket, global server - ns-A IP [ OK ]
>> # TEST: TCP passive socket, global server - VRF IP [ OK ]
>> # TEST: TCP passive socket, VRF server - ns-A IP [ OK ]
>> # TEST: TCP passive socket, VRF server - VRF IP [ OK ]
>> # TEST: TCP passive socket, enslaved device server - ns-A IP [ OK ]
>> # TEST: TCP passive socket, VRF client - ns-A IP [ OK ]
>> # TEST: TCP passive socket, enslaved device client - ns-A IP [ OK ]
>> # TEST: TCP passive socket, global server, VRF client, local - ns-A IP [ OK ]
>>
>> Hope this helps.
>>
>> I also have a iwlwifi DEADLOCK and I don't know if these should be reported independently.
>> (I don't think it is related to the patch.)
>
> If the test got stuck, then it might be related to the deadlock in
> iwlwifi. Try running the test without iwlwifi and see if it helps. If
> not, I suggest starting a different thread about this issue.
>
> Will submit the bonding patch over the weekend.

Tested it again, with only the net selftest subtree enabled in
tools/testing/selftests/Makefile:

TARGETS += drivers/net/bonding
TARGETS += drivers/net/team
TARGETS += net
TARGETS += net/af_unix
TARGETS += net/forwarding
TARGETS += net/hsr
# TARGETS += net/mptcp
TARGETS += net/openvswitch
TARGETS += netfilter

and it failed to reproduce the hang. (NOTE: In fact, the script only stalled
forever; it was not a "kill -9 " non-killable process.)

With or without the iwlwifi module, it now appears to work as a standalone test.

The problem might indeed be a spurious lockup in iwlwifi. I've noticed an
attempt to take an already held lock from within the interrupt in the
journalctl logs, but I am really not that familiar with the iwlwifi driver's
code ...

It is apparently not a deterministic error bound to repeat with every test run.

I reckon the tests prior to the net subtree have done something to my kernel,
but thus far I could not isolate the culprit test.
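
One way the stalling target might be narrowed down, as a rough sketch only: run the selftest targets one at a time, in their original order and within the same boot (so that whatever state earlier targets leave behind is preserved), each under a timeout. The helper names run_with_timeout, kselftest_cmd and classify_targets and the timeout value are hypothetical and not part of kselftest; the only thing taken from the kernel documentation is the "make -C tools/testing/selftests TARGETS=... run_tests" invocation. The sketch is POSIX-only and assumes it is started from the top of the kernel tree.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd in its own session; return 'ok', 'failed' or 'timeout'."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # child becomes a process-group leader
    )
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Kill the whole group so that scripts spawned by make die too.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return "timeout"
    return "ok" if rc == 0 else "failed"


def kselftest_cmd(target):
    """Command line that runs a single kselftest target (hypothetical helper)."""
    return ["make", "-C", "tools/testing/selftests",
            f"TARGETS={target}", "run_tests"]


def classify_targets(targets, timeout_s, build_cmd=kselftest_cmd):
    """Run targets sequentially, in the given order, and record each outcome."""
    return {t: run_with_timeout(build_cmd(t), timeout_s) for t in targets}
```

Calling, for example, classify_targets(["drivers/net/bonding", "drivers/net/team", "net"], 3600) would then show which target, if any, exceeds the one-hour limit. A "timeout" result only says where the stall surfaced; the test that left the kernel in a bad state may be an earlier one in the list.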
# tools/testing/selftests/net/fcnal-test.sh alone passes OK.

> Thanks for testing

Not at all. I apologise for the false alarm. Thanks for patching at such short
notice.

The patch closes the memory leak. The latest change was obviously the prime
suspect for the hang, but now it doesn't seem so.

It would require more work to isolate the particular test that caused the hang,
but I don't know if I have enough resources, mainly the time, and the guiding
idea that I am going in the right direction. :-/

Best regards,
Mirsad

--
Mirsad Todorovac
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb
Republic of Croatia, the European Union

Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu