From: Prabhakar Kushwaha
Date: Mon, 19 Jul 2021 20:20:54 +0530
Subject: Re: [PATCH] qed: fix possible unpaired spin_{un}lock_bh in _qed_mcp_cmd_and_union()
To: Justin He
Cc: Ariel Elior, "GR-everest-linux-l2@marvell.com", "David S. Miller", Jakub Kicinski, "netdev@vger.kernel.org", Linux Kernel Mailing List, nd, Shai Malin, Shai Malin, Prabhakar Kushwaha
References: <20210715080822.14575-1-justin.he@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Justin,

On Mon, Jul 19, 2021 at 6:47 PM Justin He wrote:
>
> Hi Prabhakar
>
> > -----Original Message-----
> > From: Prabhakar Kushwaha
> > Sent: Monday, July 19, 2021 6:36 PM
> > To: Justin He
> > Cc: Ariel Elior; GR-everest-linux-l2@marvell.com;
> > David S. Miller; Jakub Kicinski;
> > netdev@vger.kernel.org; Linux Kernel Mailing List; nd; Shai Malin;
> > Shai Malin; Prabhakar Kushwaha
> > Subject: Re: [PATCH] qed: fix possible unpaired spin_{un}lock_bh in
> > _qed_mcp_cmd_and_union()
> >
> > Hi Jia,
> >
> > On Thu, Jul 15, 2021 at 2:28 PM Jia He wrote:
> > >
> > > Liajian reported a bug_on hit on a ThunderX2 arm64 server with FastLinQ
> > > QL41000 ethernet controller:
> > >  BUG: scheduling while atomic: kworker/0:4/531/0x00000200
> > >   [qed_probe:488()]hw prepare failed
> > >   kernel BUG at mm/vmalloc.c:2355!
> > >   Internal error: Oops - BUG: 0 [#1] SMP
> > >   CPU: 0 PID: 531 Comm: kworker/0:4 Tainted: G W 5.4.0-77-generic #86-Ubuntu
> > >   pstate: 00400009 (nzcv daif +PAN -UAO)
> > >  Call trace:
> > >   vunmap+0x4c/0x50
> > >   iounmap+0x48/0x58
> > >   qed_free_pci+0x60/0x80 [qed]
> > >   qed_probe+0x35c/0x688 [qed]
> > >   __qede_probe+0x88/0x5c8 [qede]
> > >   qede_probe+0x60/0xe0 [qede]
> > >   local_pci_probe+0x48/0xa0
> > >   work_for_cpu_fn+0x24/0x38
> > >   process_one_work+0x1d0/0x468
> > >   worker_thread+0x238/0x4e0
> > >   kthread+0xf0/0x118
> > >   ret_from_fork+0x10/0x18
> > >
> > > In this case, qed_hw_prepare() returns error due to hw/fw error, but in
> > > theory work queue should be in process context instead of interrupt.
> > >
> > > The root cause might be the unpaired spin_{un}lock_bh() in
> > > _qed_mcp_cmd_and_union(), which causes botton half is disabled
> > > incorrectly.
> > >
> > > Reported-by: Lijian Zhang
> > > Signed-off-by: Jia He
> > > ---
> >
> > This patch is adding additional spin_{un}lock_bh().
> > Can you please enlighten about the exact flow causing this unpaired
> > spin_{un}lock_bh.
> >
> For instance:
> _qed_mcp_cmd_and_union()
>   In while loop
>     spin_lock_bh()
>     qed_mcp_has_pending_cmd() (assume false), will break the loop

I agree till here.

>   if (cnt >= max_retries) {
>     ...
>     return -EAGAIN; <-- here returns -EAGAIN without invoking bh unlock
>   }
>

Because of break, cnt has not been increased.
  - cnt is still less than max_retries.
  - if (cnt >= max_retries) will not be *true*, leading to spin_unlock_bh().
Hence pairing completed.

I am not seeing any issue here.

> > Also,
> > as per description, looks like you are not sure actual the root-cause.
> > does this patch really solved the problem?
>
> I don't have that ThunderX2 to verify the patch.
> But I searched all the spin_lock/unlock_bh and spin_lock_irqsave/irqrestore
> under driver/.../qlogic, this is the only problematic point I could figure
> out. And this might be possible code path of qed_probe().
>

Without testing and proper root-cause, it is tough to accept the suggested fix.

--pk
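
For illustration only, the pairing argument above can be modelled in user space. This is a rough sketch, not the driver code: bh_lock_depth, model_lock_bh(), model_unlock_bh(), model_mcp_cmd() and the pending_polls parameter are all hypothetical names, the plain while loop is an assumption about the loop shape, and the unlock on the retry path is assumed rather than taken from this thread.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for "bottom halves disabled" nesting; not a kernel API. */
static int bh_lock_depth;

static void model_lock_bh(void)
{
	bh_lock_depth++;
}

static void model_unlock_bh(void)
{
	/* An unlock without a matching lock would trip this assertion. */
	assert(bh_lock_depth > 0);
	bh_lock_depth--;
}

/*
 * Simplified model of the flow discussed in the thread.
 * pending_polls: number of polls that still see a pending command
 *                (assumed input, stands in for qed_mcp_has_pending_cmd()).
 * Returns 0 on success or -EAGAIN when the retries run out.
 */
static int model_mcp_cmd(int pending_polls, int max_retries)
{
	int cnt = 0;

	while (cnt < max_retries) {
		model_lock_bh();
		if (pending_polls <= 0)
			break;		/* mailbox free: leave with lock held, cnt unchanged */
		pending_polls--;
		model_unlock_bh();	/* assumed: lock dropped before the next retry */
		cnt++;
	}

	if (cnt >= max_retries)
		return -EAGAIN;		/* reached only when no break happened */

	/* ... the command would be issued here, still under the lock ... */
	model_unlock_bh();
	return 0;
}
```

In this model, a call such as model_mcp_cmd(0, 10) takes the break path with cnt still below max_retries, so the final unlock runs and bh_lock_depth ends at zero. Whether the real function behaves the same way for every input depends on details not quoted in this thread.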