Received: by 2002:a6b:fb09:0:0:0:0:0 with SMTP id h9csp666094iog; Thu, 30 Jun 2022 08:00:40 -0700 (PDT) X-Google-Smtp-Source: AGRyM1uMqUz/Y7MUEQb2JJu/OdQQV2hxVGFhbOold0jX5DZG7Ln06K1c/PVZ06ASneDOdw0m+EUX X-Received: by 2002:a63:1b49:0:b0:411:c101:eda5 with SMTP id b9-20020a631b49000000b00411c101eda5mr819859pgm.600.1656601239794; Thu, 30 Jun 2022 08:00:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1656601239; cv=none; d=google.com; s=arc-20160816; b=GPWMu350LR4FU1/ydq8T2ZwOasLFii4aYSw48zlLXy8JswbgVKJ8XPdFzTQtsHJC+e JHvJ6s4vcxDFPh2YBhFe4wJ/9nLgsieMCfAJQ5MSFhwGlmq7UrPKAurPWnztqapHwZMf loJypeGVNdC1n0WMFmi41fhNmbYVoYWOIVR0rXixJOZxoXyIxvNOyRWUwO+EES/wGO1c hvFaY+dnI0VlkdLRX2zEU9rdq2mE8NHpFYl8ocm1EnoxzFFm9AW2zJKpKdC8oic9Ljy8 fLWXx4SdkD1efVmqTPehmIqA+sF686MJRh9tGkglLtS0cFK5IgjfRaV5AjBg7s5a6WTW b1PA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:subject:user-agent:mime-version:date:message-id; bh=6o741WP3Q3yZXIZArGKbOmfrjsY4rQN1lNfuOqRhEL8=; b=mEJ61p1P266sUpjjM0zLp+HRPwwf54hXV6oiH7iihf/AVIJjJrqaok+scM8eMViKpq EyvsciUsvj5gzMl2qkhBFDhLh1ydeUiSYN0vyNKkVIKuPE+KE53V/GF3j/F/fnEuPA6V hixURVEwwX7BVGwPWBccwU0dZGEA/zhPwrbkxUKsMZw7qdyDOX9ciNIyY1zCArHYbAii Gj9mF4sxU7tmOiiCg4OMqQ6EXpsoNKnDYHrBj0escfL/xGKj+ACRQPkT2Ym/4bwv7GyD k57nB6CpVeqeZS0Qyp60504Cn0nUSWgAQuVbWl9DXLOKxYKPEl8pniInxE6C/4hB6ekU Aiyg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q17-20020a63bc11000000b003ffafbc0463si25816549pge.369.2022.06.30.08.00.26; Thu, 30 Jun 2022 08:00:39 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236890AbiF3ONn (ORCPT + 99 others); Thu, 30 Jun 2022 10:13:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41940 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237442AbiF3ONP (ORCPT ); Thu, 30 Jun 2022 10:13:15 -0400 Received: from out199-7.us.a.mail.aliyun.com (out199-7.us.a.mail.aliyun.com [47.90.199.7]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1A4E33B55F; Thu, 30 Jun 2022 06:57:52 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R161e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018045170;MF=mqaio@linux.alibaba.com;NM=1;PH=DS;RN=10;SR=0;TI=SMTPD_---0VHtbqM8_1656597453; Received: from 30.39.134.59(mailfrom:mqaio@linux.alibaba.com fp:SMTPD_---0VHtbqM8_1656597453) by smtp.aliyun-inc.com; Thu, 30 Jun 2022 21:57:34 +0800 Message-ID: <8b012bbd-a175-5699-1f26-108dd52fc5b7@linux.alibaba.com> Date: Thu, 30 Jun 2022 21:57:33 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.9.1 Subject: Re: [PATCH] net: hinic: avoid kernel hung in hinic_get_stats64() To: Eric Dumazet Cc: David Miller , Jakub Kicinski , Paolo Abeni , gustavoars@kernel.org, cai.huoqing@linux.dev, Aviad Krawczyk , zhaochen6@huawei.com, netdev , LKML References: <07736c2b7019b6883076a06129e06e8f7c5f7154.1656487154.git.mqaio@linux.alibaba.com> From: maqiao In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-9.9 required=5.0 tests=BAYES_00, ENV_AND_HDR_SPF_MATCH,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,UNPARSEABLE_RELAY,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 在 2022/6/30 下午6:23, Eric Dumazet 写道: > On Wed, Jun 29, 2022 at 9:28 AM Qiao Ma wrote: >> >> When using hinic device as a bond slave device, and reading device stats of >> master bond device, the kernel may hung. >> >> The kernel panic calltrace as follows: >> Kernel panic - not syncing: softlockup: hung tasks >> Call trace: >> native_queued_spin_lock_slowpath+0x1ec/0x31c >> dev_get_stats+0x60/0xcc >> dev_seq_printf_stats+0x40/0x120 >> dev_seq_show+0x1c/0x40 >> seq_read_iter+0x3c8/0x4dc >> seq_read+0xe0/0x130 >> proc_reg_read+0xa8/0xe0 >> vfs_read+0xb0/0x1d4 >> ksys_read+0x70/0xfc >> __arm64_sys_read+0x20/0x30 >> el0_svc_common+0x88/0x234 >> do_el0_svc+0x2c/0x90 >> el0_svc+0x1c/0x30 >> el0_sync_handler+0xa8/0xb0 >> el0_sync+0x148/0x180 >> >> And the calltrace of task that actually caused kernel hungs as follows: >> __switch_to+124 >> __schedule+548 >> schedule+72 >> schedule_timeout+348 >> __down_common+188 >> __down+24 >> down+104 >> hinic_get_stats64+44 [hinic] >> dev_get_stats+92 >> bond_get_stats+172 [bonding] >> dev_get_stats+92 >> dev_seq_printf_stats+60 >> dev_seq_show+24 >> seq_read_iter+964 >> seq_read+220 >> proc_reg_read+164 >> vfs_read+172 >> ksys_read+108 >> __arm64_sys_read+28 >> el0_svc_common+132 >> do_el0_svc+40 >> el0_svc+24 >> el0_sync_handler+164 >> el0_sync+324 >> >> When getting device stats from bond, kernel will call bond_get_stats(). >> It first holds the spinlock bond->stats_lock, and then call >> hinic_get_stats64() to collect hinic device's stats. >> However, hinic_get_stats64() calls `down(&nic_dev->mgmt_lock)` to >> protect its critical section, which may schedule current task out. >> And if system is under high pressure, the task cannot be woken up >> immediately, which eventually triggers kernel hung panic. >> >> Fixes: edd384f682cc ("net-next/hinic: Add ethtool and stats") >> Signed-off-by: Qiao Ma >> --- >> drivers/net/ethernet/huawei/hinic/hinic_dev.h | 1 + >> drivers/net/ethernet/huawei/hinic/hinic_main.c | 7 +++---- >> 2 files changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_dev.h b/drivers/net/ethernet/huawei/hinic/hinic_dev.h >> index fb3e89141a0d..1fb343d03fd5 100644 >> --- a/drivers/net/ethernet/huawei/hinic/hinic_dev.h >> +++ b/drivers/net/ethernet/huawei/hinic/hinic_dev.h >> @@ -97,6 +97,7 @@ struct hinic_dev { >> >> struct hinic_txq_stats tx_stats; >> struct hinic_rxq_stats rx_stats; >> + spinlock_t stats_lock; >> >> u8 rss_tmpl_idx; >> u8 rss_hash_engine; >> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_main.c b/drivers/net/ethernet/huawei/hinic/hinic_main.c >> index 56a89793f47d..32a3d700ad26 100644 >> --- a/drivers/net/ethernet/huawei/hinic/hinic_main.c >> +++ b/drivers/net/ethernet/huawei/hinic/hinic_main.c >> @@ -125,11 +125,13 @@ static void update_nic_stats(struct hinic_dev *nic_dev) >> { >> int i, num_qps = hinic_hwdev_num_qps(nic_dev->hwdev); >> >> + spin_lock(&nic_dev->stats_lock); >> for (i = 0; i < num_qps; i++) >> update_rx_stats(nic_dev, &nic_dev->rxqs[i]); >> >> for (i = 0; i < num_qps; i++) >> update_tx_stats(nic_dev, &nic_dev->txqs[i]); >> + spin_unlock(&nic_dev->stats_lock); >> } >> >> /** >> @@ -859,13 +861,9 @@ static void hinic_get_stats64(struct net_device *netdev, >> nic_rx_stats = &nic_dev->rx_stats; >> nic_tx_stats = &nic_dev->tx_stats; >> >> - down(&nic_dev->mgmt_lock); >> - >> if (nic_dev->flags & HINIC_INTF_UP)) >> update_nic_stats(nic_dev); >> >> - up(&nic_dev->mgmt_lock); >> - > > Note: The following is racy, because multiple threads can call > hinic_get_stats64() at the same time. > It needs a loop, see include/linux/u64_stats_sync.h for detail.Thanks for reminding, and I noticed that nic_tx_stats/nic_rx_stats has been protected by u64_stats_sync in update_t/rx_stats(), it seems that it's unnecessary to use spinlock in update_nic_stats(). I will send v2 as soon as possible, thanks. > >> stats->rx_bytes = nic_rx_stats->bytes; >> stats->rx_packets = nic_rx_stats->pkts; >> stats->rx_errors = nic_rx_stats->errors; > > Same remark for nic_tx_stats->{bytes|pkts|errors} > >> @@ -1239,6 +1237,7 @@ static int nic_dev_init(struct pci_dev *pdev) >> >> u64_stats_init(&tx_stats->syncp); >> u64_stats_init(&rx_stats->syncp); >> + spin_lock_init(&nic_dev->stats_lock); >> >> nic_dev->vlan_bitmap = devm_bitmap_zalloc(&pdev->dev, VLAN_N_VID, >> GFP_KERNEL); >> -- >> 1.8.3.1 >>