The following commit has been merged into the irq/urgent branch of tip:
Commit-ID: 83dbf898a2d45289be875deb580e93050ba67529
Gitweb: https://git.kernel.org/tip/83dbf898a2d45289be875deb580e93050ba67529
Author: Stefan Roese <[email protected]>
AuthorDate: Tue, 14 Dec 2021 12:49:32 +01:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Tue, 14 Dec 2021 13:23:32 +01:00
PCI/MSI: Mask MSI-X vectors only on success
Masking all unused MSI-X entries is done to ensure that a crash kernel
starts from a clean slate, which corresponds to the reset state of the
device as defined in the PCIe specification 3.0 and later:
Vector Control for MSI-X Table Entries
--------------------------------------
"00: Mask bit: When this bit is set, the function is prohibited from
sending a message using this MSI-X Table entry.
...
This bit’s state after reset is 1 (entry is masked)."
A Marvell NVMe device fails to deliver MSI interrupts after trying to
enable MSI-X interrupts due to that masking. It seems to take the MSI-X
mask bits into account even when MSI-X is disabled.
While not specification compliant, this can be cured by moving the masking
into the success path, so that the MSI-X table entries stay in device reset
state when the MSI-X setup fails.
[ tglx: Move it into the success path, add comment and amend changelog ]
Fixes: aa8092c1d1f1 ("PCI/MSI: Mask all unused MSI-X entries")
Signed-off-by: Stefan Roese <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Bjorn Helgaas <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Marek Vasut <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
---
drivers/pci/msi.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 48e3f4e..6748cf9 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -722,9 +722,6 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
goto out_disable;
}
- /* Ensure that all table entries are masked. */
- msix_mask_all(base, tsize);
-
ret = msix_setup_entries(dev, base, entries, nvec, affd);
if (ret)
goto out_disable;
@@ -751,6 +748,16 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
/* Set MSI-X enabled bits and unmask the function */
pci_intx_for_msi(dev, 0);
dev->msix_enabled = 1;
+
+ /*
+ * Ensure that all table entries are masked to prevent
+ * stale entries from firing in a crash kernel.
+ *
+ * Done late to deal with a broken Marvell NVME device
+ * which takes the MSI-X mask bits into account even
+ * when MSI-X is disabled, which prevents MSI delivery.
+ */
+ msix_mask_all(base, tsize);
pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
pcibios_free_irq(dev);
Hi Jeremi,
(added Dusty to Cc)
On 3/14/22 17:36, Jeremi Piotrowski wrote:
> Hi Thomas, Hi Stefan,
>
> On Tue, Dec 14, 2021 at 12:28:06PM -0000, tip-bot2 for Stefan Roese wrote:
>> The following commit has been merged into the irq/urgent branch of tip:
>> [...]
>
> We've had reports of issues with AWS m4 instances, which use Intel 82599 VFs
> for networking (ixgbevf) with MSI-X interrupts; I've bisected the problem
> down to this commit. Since this commit, these VMs no longer have any network
> connectivity and therefore fail to boot. This occurs with both 5.15 and 5.10
> kernels; reverting the backport of this commit restores networking.
>
> Do you have any suggestions of how this can be resolved other than a revert?
>
> Here's the full bisect log:
>
> $ git bisect log
> git bisect start
> # good: [4e8c680af6d51ba9315e31bd4f7599e080561a2d] Linux 5.15.7
> git bisect good 4e8c680af6d51ba9315e31bd4f7599e080561a2d
> # bad: [efe3167e52a5833ec20ee6214be9b99b378564a8] Linux 5.15.27
> git bisect bad efe3167e52a5833ec20ee6214be9b99b378564a8
> # bad: [63dcc388662c3562de94d69bfa771ae4cd29b79f] Linux 5.15.16
> git bisect bad 63dcc388662c3562de94d69bfa771ae4cd29b79f
> # good: [57dcae4a8b93271c4e370920ea0dbb94a0215d30] Linux 5.15.10
> git bisect good 57dcae4a8b93271c4e370920ea0dbb94a0215d30
> # bad: [25960cafa06e6fcd830e6c792e6a7de68c1e25ed] Linux 5.15.12
> git bisect bad 25960cafa06e6fcd830e6c792e6a7de68c1e25ed
> # bad: [fb6ad5cb3b6745e7bffc5fe19b130f3594375634] Linux 5.15.11
> git bisect bad fb6ad5cb3b6745e7bffc5fe19b130f3594375634
> # good: [257b3bb16634fd936129fe2f57a91594a75b8751] drm/amd/pm: fix a potential gpu_metrics_table memory leak
> git bisect good 257b3bb16634fd936129fe2f57a91594a75b8751
> # bad: [bbdaa7a48f465a2ee76d65839caeda08af1ef3b2] btrfs: fix double free of anon_dev after failure to create subvolume
> git bisect bad bbdaa7a48f465a2ee76d65839caeda08af1ef3b2
> # good: [c8e8e6f4108e4c133b09f31f6cc7557ee6df3bb6] bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
> git bisect good c8e8e6f4108e4c133b09f31f6cc7557ee6df3bb6
> # bad: [5cb5c3e1b184da9f49e46119a0e506519fc58185] usb: xhci: Extend support for runtime power management for AMD's Yellow carp.
> git bisect bad 5cb5c3e1b184da9f49e46119a0e506519fc58185
> # good: [e7a8a261bab07ec1ed5f5bb990aacc4de9c08eb4] tty: n_hdlc: make n_hdlc_tty_wakeup() asynchronous
> git bisect good e7a8a261bab07ec1ed5f5bb990aacc4de9c08eb4
> # good: [4df1af29930b03d61fb774bfaa5100dbdb964628] PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error
> git bisect good 4df1af29930b03d61fb774bfaa5100dbdb964628
> # bad: [d8888cdabedf353ab9b5a6af75f70bf341a3e7df] PCI/MSI: Mask MSI-X vectors only on success
> git bisect bad d8888cdabedf353ab9b5a6af75f70bf341a3e7df
> # first bad commit: [d8888cdabedf353ab9b5a6af75f70bf341a3e7df] PCI/MSI: Mask MSI-X vectors only on success
I've added Dusty to Cc, as he (and others) already have been dealing
with this issue AFAICT.
Dusty, could you perhaps chime in with the latest status? AFAIU, it's
related to potential issues with the Xen version used on these systems?
Thanks,
Stefan
On 3/14/22 12:49, Stefan Roese wrote:
> I've added Dusty to Cc, as he (and others) already have been dealing
> with this issue AFAICT.
>
> Dusty, could you perhaps chime in with the latest status? AFAIU, it's
> related to potential issues with the Xen version used on these systems?
Thanks Stefan,
Yes. My understanding is that the issue is because AWS is using older versions
of Xen. They are in the process of updating their fleet to a newer version of
Xen so the change introduced with Stefan's commit isn't an issue any longer.
I think the changes are scheduled to be completed in the next 10-12 weeks. For
now we are carrying a revert in the Fedora Kernel.
You can follow this Fedora CoreOS issue if you'd like to know more about when
the change lands in their backend. We work closely with one of their partner
engineers and he keeps us updated. https://github.com/coreos/fedora-coreos-tracker/issues/1066
Dusty
On Mon, Mar 14, 2022 at 01:04:55PM -0400, Dusty Mabe wrote:
>
>
> On 3/14/22 12:49, Stefan Roese wrote:
> [...]
Thanks for the link and explanation. What a fun coincidence that we hit this in
Flatcar Container Linux as well. We've reverted the commit in our kernels for
the time being, and will track that issue.
Thanks,
Jeremi
Hi,
On Mon, Mar 14, 2022 at 09:29:53PM +0100, Jeremi Piotrowski wrote:
> [...]
Does anyone here know the current state of the AWS instances using the
older Xen version which causes these issues?
AFAIU, upstream is not planning to revert 83dbf898a2d4 ("PCI/MSI: Mask
MSI-X vectors only on success"), as it fixed a bug. Several downstream
distros now carry a revert of this commit, which I believe is an
unfortunate situation, and I wonder whether this can be addressed
upstream to deal with the AWS m4.large instance issues.
Regards,
Salvatore
On Wed, Apr 27 2022 at 09:59, Salvatore Bonaccorso wrote:
> On Mon, Mar 14, 2022 at 09:29:53PM +0100, Jeremi Piotrowski wrote:
> AFAIU upstream is not planning to revert 83dbf898a2d4 ("PCI/MSI: Mask
> MSI-X vectors only on success") as it fixed a bug. [...]
The problem is that, apart from a bisect result, we've not seen much
information which might help us debug and analyze the problem.
The guest uses MSI-X on that NIC:
Capabilities: [70] MSI-X: Enable+ Count=3 Masked-
So looking at the commit in question:
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 48e3f4e47b29..6748cf9d7d90 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -722,9 +722,6 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
goto out_disable;
}
- /* Ensure that all table entries are masked. */
- msix_mask_all(base, tsize);
-
ret = msix_setup_entries(dev, base, entries, nvec, affd);
if (ret)
goto out_disable;
@@ -751,6 +748,16 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
/* Set MSI-X enabled bits and unmask the function */
pci_intx_for_msi(dev, 0);
dev->msix_enabled = 1;
+
+ /*
+ * Ensure that all table entries are masked to prevent
+ * stale entries from firing in a crash kernel.
+ *
+ * Done late to deal with a broken Marvell NVME device
+ * which takes the MSI-X mask bits into account even
+ * when MSI-X is disabled, which prevents MSI delivery.
+ */
+ msix_mask_all(base, tsize);
pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
IOW, it moves the invocation of msix_mask_all() into the success
path.
As the device uses MSI-X this change does not make any difference from a
hardware perspective simply because _all_ MSI-X interrupts are masked
via the CTRL register already. And it does not matter whether the kernel
masks them individually _before_ or _after_ the allocation. At least not
on real hardware and on a sane emulation.
Now, this is XEN, which is neither real hardware nor sane emulation.
It must be a XEN_HVM guest, because XEN_PV guests disable PCI/MSI[-X]
masking completely, which makes the invocation of msix_mask_all() a NOP.
If it's not a XEN_HVM guest, then you can stop reading further, as I'm
unable to decode why moving a NOP makes a difference. That belongs to
the realm of voodoo, but then XEN is voodoo, at least for me. :)
XEN guests do not use the common PCI mask/unmask machinery which would
unmask the interrupt on request_irq().
So I assume that the following happens:
Guest Hypervisor
msix_capabilities_init()
....
alloc_irq()
xen_magic() -> alloc_msix_interrupt()
request_irq()
msix_mask_all() -> trap
do_magic()
request_irq()
unmask()
xen_magic()
unmask_evtchn() -> do_more_magic()
So I assume further that msix_mask_all() actually is able to mask the
interrupts in the hardware (ctrl word of the vector table) despite the
hypervisor having allocated and requested the interrupt already.
Nothing in XEN_HVM handles PCI/MSI[-X] mask/unmask in the guest, so I
really have to ask why XEN_HVM does not disable PCI/MSI[-X] masking like
XEN_PV does. I can only assume the answer is voodoo...
Maybe the XEN people have some more enlightened answers to that.
Thanks,
tglx
On Wed, Apr 27 2022 at 19:35, Thomas Gleixner wrote:
> On Wed, Apr 27 2022 at 09:59, Salvatore Bonaccorso wrote:
> XEN guests do not use the common PCI mask/unmask machinery which would
> unmask the interrupt on request_irq().
> [...]
> Nothing in XEN_HVM handles PCI/MSI[-X] mask/unmask in the guest, so I
> really have to ask why XEN_HVM does not disable PCI/MSI[-X] masking like
> XEN_PV does. I can only assume the answer is voodoo...
>
> Maybe the XEN people have some more enlightened answers to that.
So I was talking to Juergen about this, and he agrees that, for the case
where a XEN HVM guest uses the PIRQ/Eventchannel mechanism, PCI/MSI[-X]
masking should be disabled like it is done for XEN PV.
Why the hypervisor grants the mask write is still mysterious, but I
leave that to the folks who can master the XEN voodoo.
I'll send out a patch in a minute.
Thanks,
tglx
Hi Thomas,
On Thu, Apr 28, 2022 at 03:48:03PM +0200, Thomas Gleixner wrote:
> On Wed, Apr 27 2022 at 19:35, Thomas Gleixner wrote:
> [...]
> I'll send out a patch in minute.
Thank you. Noah Meyerhans (now Cc'ed here) is testing the patch and
will report back whether it works.
Regards,
Salvatore
Hi Thomas,
On Thu, Apr 28, 2022 at 08:43:10PM +0200, Salvatore Bonaccorso wrote:
> [...]
To confirm: Noah tested the patch on various Xen (and other) VM
configurations, and it works well.
Regards,
Salvatore