Message-ID: <446b9cf0-8f98-b176-0f35-829004746c77@linux.alibaba.com>
Date: Thu, 30 Jun 2022 22:20:39 +0800
Subject: Re: [PATCH] net: hinic: avoid kernel hung in hinic_get_stats64()
To: Paolo Abeni, davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
 gustavoars@kernel.org, cai.huoqing@linux.dev, aviad.krawczyk@huawei.com,
 zhaochen6@huawei.com
Cc:
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org
References: <07736c2b7019b6883076a06129e06e8f7c5f7154.1656487154.git.mqaio@linux.alibaba.com>
 <64e59afe33fff04861c800853a549f7979270f79.camel@redhat.com>
From: maqiao
In-Reply-To: <64e59afe33fff04861c800853a549f7979270f79.camel@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2022/6/30 5:56 PM, Paolo Abeni wrote:
> On Wed, 2022-06-29 at 15:28 +0800, Qiao Ma wrote:
>> When using hinic device as a bond slave device, and reading device stats of
>> master bond device, the kernel may hung.
>>
>> The kernel panic calltrace as follows:
>> Kernel panic - not syncing: softlockup: hung tasks
>> Call trace:
>> native_queued_spin_lock_slowpath+0x1ec/0x31c
>> dev_get_stats+0x60/0xcc
>> dev_seq_printf_stats+0x40/0x120
>> dev_seq_show+0x1c/0x40
>> seq_read_iter+0x3c8/0x4dc
>> seq_read+0xe0/0x130
>> proc_reg_read+0xa8/0xe0
>> vfs_read+0xb0/0x1d4
>> ksys_read+0x70/0xfc
>> __arm64_sys_read+0x20/0x30
>> el0_svc_common+0x88/0x234
>> do_el0_svc+0x2c/0x90
>> el0_svc+0x1c/0x30
>> el0_sync_handler+0xa8/0xb0
>> el0_sync+0x148/0x180
>>
>> And the calltrace of task that actually caused kernel hungs as follows:
>> __switch_to+124
>> __schedule+548
>> schedule+72
>> schedule_timeout+348
>> __down_common+188
>> __down+24
>> down+104
>> hinic_get_stats64+44 [hinic]
>> dev_get_stats+92
>> bond_get_stats+172 [bonding]
>> dev_get_stats+92
>> dev_seq_printf_stats+60
>> dev_seq_show+24
>> seq_read_iter+964
>> seq_read+220
>> proc_reg_read+164
>> vfs_read+172
>> ksys_read+108
>> __arm64_sys_read+28
>> el0_svc_common+132
>> do_el0_svc+40
>> el0_svc+24
>> el0_sync_handler+164
>> el0_sync+324
>>
>> When getting device stats from bond, kernel will call bond_get_stats().
>> It first holds the spinlock bond->stats_lock, and then call
>> hinic_get_stats64() to collect hinic device's stats.
>> However, hinic_get_stats64() calls `down(&nic_dev->mgmt_lock)` to
>> protect its critical section, which may schedule current task out.
>> And if system is under high pressure, the task cannot be woken up
>> immediately, which eventually triggers kernel hung panic.
>>
>> Fixes: edd384f682cc ("net-next/hinic: Add ethtool and stats")
>> Signed-off-by: Qiao Ma
>
> Side note: it looks like that after this patch every section protected
> by the mgmt_lock is already under rtnl lock protection, so you could
> probably remove the hinic specific lock (in a separate, net-next,
> patch).
>
> Please double check the above as I skimmed upon that quickly.

Thank you. I need to carefully check that each section will only be
called through netlink dev_get_stats().

I also forgot to add the "net-next" prefix to the patch title, sorry
about that.

>
> Thanks,
>
> Paolo
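
For readers of the archive, the problem described in the quoted commit
message (a stats callback that may sleep while the caller holds a
spinlock) can be modelled in plain userspace C. This is only a sketch
under stated assumptions, not driver code and not the actual patch:
sim_spin_lock(), sim_down(), struct stats, and both callbacks are
hypothetical stand-ins for the spinlock bond->stats_lock, down() on
mgmt_lock, and the stats path. The model just counts how often a
"sleeping" acquire happens while a simulated spinlock is held.

```c
#include <assert.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */

static int atomic_depth;     /* > 0 while a simulated spinlock is held */
static int sleep_violations; /* sleeping acquires seen in atomic context */

/* Stand-in for taking a spinlock: the holder must not sleep. */
void sim_spin_lock(void) { atomic_depth++; }

void sim_spin_unlock(void)
{
    assert(atomic_depth > 0); /* unlock must pair with a lock */
    atomic_depth--;
}

/* Stand-in for a sleeping acquire such as down(): record misuse. */
void sim_down(void)
{
    if (atomic_depth > 0)
        sleep_violations++;
}

void sim_up(void) { }

struct stats {
    unsigned long rx_packets;
    unsigned long tx_packets;
};

typedef void (*get_stats_fn)(const struct stats *hw, struct stats *out);

/* Pattern from the report: the slave callback takes a sleeping lock. */
void slave_get_stats_sleeping(const struct stats *hw, struct stats *out)
{
    sim_down();
    *out = *hw;
    sim_up();
}

/* Assumed alternative: the callback copies counters without sleeping. */
void slave_get_stats_nonblocking(const struct stats *hw, struct stats *out)
{
    *out = *hw;
}

/* Master path: holds the simulated spinlock around the slave callback. */
void master_get_stats(get_stats_fn get, const struct stats *hw,
                      struct stats *out)
{
    sim_spin_lock();
    get(hw, out);
    sim_spin_unlock();
}
```

Calling master_get_stats() with slave_get_stats_sleeping increments
sleep_violations, while the non-blocking callback leaves it unchanged.
How the real driver should protect its counters once the sleeping lock
is gone is a separate question that this model does not answer.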