Message-ID: <43418c23-5efd-4d14-706f-f536c504b75a@denx.de>
Date: Mon, 14 Mar 2022 17:49:09 +0100
Subject: Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
To: Jeremi Piotrowski, linux-kernel@vger.kernel.org
Cc: Thomas Gleixner, linux-pci@vger.kernel.org, Bjorn Helgaas, Michal Simek, Marek Vasut, stable@vger.kernel.org, x86@kernel.org, maz@kernel.org, Dusty Mabe
References: <20211210161025.3287927-1-sr@denx.de> <163948488617.23020.3934435568065766936.tip-bot2@tip-bot2>
From: Stefan Roese

Hi Jeremi,

(added Dusty to Cc)

On 3/14/22 17:36, Jeremi Piotrowski wrote:
> Hi Thomas, Hi Stefan,
>
> On Tue, Dec 14, 2021 at 12:28:06PM -0000, tip-bot2 for Stefan Roese wrote:
>> The following commit has been merged into the irq/urgent branch of tip:
>>
>> Commit-ID:     83dbf898a2d45289be875deb580e93050ba67529
>> Gitweb:        https://git.kernel.org/tip/83dbf898a2d45289be875deb580e93050ba67529
>> Author:        Stefan Roese
>> AuthorDate:    Tue, 14 Dec 2021 12:49:32 +01:00
>> Committer:     Thomas Gleixner
>> CommitterDate: Tue, 14 Dec 2021 13:23:32 +01:00
>>
>> PCI/MSI: Mask MSI-X vectors only on success
>>
>> Masking all unused MSI-X entries is done to ensure that a crash kernel
>> starts from a clean slate, which corresponds to the reset state of the
>> device as defined in the PCI-E specification 3.0 and later:
>>
>>   Vector Control for MSI-X Table Entries
>>   --------------------------------------
>>
>>   "00: Mask bit:  When this bit is set, the function is prohibited from
>>                   sending a message using this MSI-X Table entry.
>>   ...
>>   This bit’s state after reset is 1 (entry is masked)."
>>
>> A Marvell NVME device fails to deliver MSI interrupts after trying to
>> enable MSI-X interrupts due to that masking. It seems to take the MSI-X
>> mask bits into account even when MSI-X is disabled.
>>
>> While not specification compliant, this can be cured by moving the masking
>> into the success path, so that the MSI-X table entries stay in device reset
>> state when the MSI-X setup fails.
>>
>> [ tglx: Move it into the success path, add comment and amend changelog ]
>>
>> Fixes: aa8092c1d1f1 ("PCI/MSI: Mask all unused MSI-X entries")
>> Signed-off-by: Stefan Roese
>> Signed-off-by: Thomas Gleixner
>> Cc: linux-pci@vger.kernel.org
>> Cc: Bjorn Helgaas
>> Cc: Michal Simek
>> Cc: Marek Vasut
>> Cc: stable@vger.kernel.org
>> Link: https://lore.kernel.org/r/20211210161025.3287927-1-sr@denx.de
>> ---
>>  drivers/pci/msi.c | 13 ++++++++++---
>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>> index 48e3f4e..6748cf9 100644
>> --- a/drivers/pci/msi.c
>> +++ b/drivers/pci/msi.c
>> @@ -722,9 +722,6 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
>>  		goto out_disable;
>>  	}
>>  
>> -	/* Ensure that all table entries are masked. */
>> -	msix_mask_all(base, tsize);
>> -
>>  	ret = msix_setup_entries(dev, base, entries, nvec, affd);
>>  	if (ret)
>>  		goto out_disable;
>> @@ -751,6 +748,16 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
>>  	/* Set MSI-X enabled bits and unmask the function */
>>  	pci_intx_for_msi(dev, 0);
>>  	dev->msix_enabled = 1;
>> +
>> +	/*
>> +	 * Ensure that all table entries are masked to prevent
>> +	 * stale entries from firing in a crash kernel.
>> +	 *
>> +	 * Done late to deal with a broken Marvell NVME device
>> +	 * which takes the MSI-X mask bits into account even
>> +	 * when MSI-X is disabled, which prevents MSI delivery.
>> +	 */
>> +	msix_mask_all(base, tsize);
>>  	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
>>  
>>  	pcibios_free_irq(dev);
>
> We've had reports of issues with AWS m4 instances, which use Intel 82599 VFs
> for networking (ixgbevf) with MSI-X interrupts, which I've bisected down to
> this commit.
> Since this commit these VMs no longer have any network connectivity
> and so fail to boot. This occurs with both 5.15 and 5.10 kernels; reverting the
> backport of this commit restores networking.
>
> Do you have any suggestions of how this can be resolved other than a revert?
>
> Here's the full bisect log:
>
> $ git bisect log
> git bisect start
> # good: [4e8c680af6d51ba9315e31bd4f7599e080561a2d] Linux 5.15.7
> git bisect good 4e8c680af6d51ba9315e31bd4f7599e080561a2d
> # bad: [efe3167e52a5833ec20ee6214be9b99b378564a8] Linux 5.15.27
> git bisect bad efe3167e52a5833ec20ee6214be9b99b378564a8
> # bad: [63dcc388662c3562de94d69bfa771ae4cd29b79f] Linux 5.15.16
> git bisect bad 63dcc388662c3562de94d69bfa771ae4cd29b79f
> # good: [57dcae4a8b93271c4e370920ea0dbb94a0215d30] Linux 5.15.10
> git bisect good 57dcae4a8b93271c4e370920ea0dbb94a0215d30
> # bad: [25960cafa06e6fcd830e6c792e6a7de68c1e25ed] Linux 5.15.12
> git bisect bad 25960cafa06e6fcd830e6c792e6a7de68c1e25ed
> # bad: [fb6ad5cb3b6745e7bffc5fe19b130f3594375634] Linux 5.15.11
> git bisect bad fb6ad5cb3b6745e7bffc5fe19b130f3594375634
> # good: [257b3bb16634fd936129fe2f57a91594a75b8751] drm/amd/pm: fix a potential gpu_metrics_table memory leak
> git bisect good 257b3bb16634fd936129fe2f57a91594a75b8751
> # bad: [bbdaa7a48f465a2ee76d65839caeda08af1ef3b2] btrfs: fix double free of anon_dev after failure to create subvolume
> git bisect bad bbdaa7a48f465a2ee76d65839caeda08af1ef3b2
> # good: [c8e8e6f4108e4c133b09f31f6cc7557ee6df3bb6] bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
> git bisect good c8e8e6f4108e4c133b09f31f6cc7557ee6df3bb6
> # bad: [5cb5c3e1b184da9f49e46119a0e506519fc58185] usb: xhci: Extend support for runtime power management for AMD's Yellow carp.
> git bisect bad 5cb5c3e1b184da9f49e46119a0e506519fc58185
> # good: [e7a8a261bab07ec1ed5f5bb990aacc4de9c08eb4] tty: n_hdlc: make n_hdlc_tty_wakeup() asynchronous
> git bisect good e7a8a261bab07ec1ed5f5bb990aacc4de9c08eb4
> # good: [4df1af29930b03d61fb774bfaa5100dbdb964628] PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error
> git bisect good 4df1af29930b03d61fb774bfaa5100dbdb964628
> # bad: [d8888cdabedf353ab9b5a6af75f70bf341a3e7df] PCI/MSI: Mask MSI-X vectors only on success
> git bisect bad d8888cdabedf353ab9b5a6af75f70bf341a3e7df
> # first bad commit: [d8888cdabedf353ab9b5a6af75f70bf341a3e7df] PCI/MSI: Mask MSI-X vectors only on success

I've added Dusty to Cc, as he (and others) have already been dealing with
this issue AFAICT. Dusty, could you perhaps chime in with the latest
status? AFAIU, it's related to potential issues with the Xen version used
on these systems?

Thanks,
Stefan