2019-10-08 03:49:28

by Jian-Hong Pan

[permalink] [raw]
Subject: [PATCH] PCI/MSI: Fix restoring of MSI-X vector control's mask bit

MSI-X vector control's bit 0 is the mask bit, which masks the
corresponding interrupt request, or not. Other reserved bits might be
used for other purpose by device vendors. For example, the values of
Kingston NVMe SSD's MSI-X vector control are neither 0, nor 1, but other
values [1].

The original restoring logic in system resuming uses the whole MSI-X
vector control value as the flag to set/clear the mask bit. However,
this logic conflicts the idea mentioned above. It may mislead system to
disable the MSI-X vector entries. That makes system get no interrupt
from Kingston NVMe SSD after resume and usually get NVMe I/O timeout
error.

[ 174.715534] nvme nvme0: I/O 978 QID 3 timeout, completion polled

This patch takes only the mask bit of original MSI-X vector control
value as the flag to fix this issue.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=204887#c8

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204887
Fixed: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code")
Signed-off-by: Jian-Hong Pan <[email protected]>
---
drivers/pci/msi.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 0884bedcfc7a..deae3d5acaf6 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -433,6 +433,7 @@ static void __pci_restore_msi_state(struct pci_dev *dev)
static void __pci_restore_msix_state(struct pci_dev *dev)
{
struct msi_desc *entry;
+ u32 flag;

if (!dev->msix_enabled)
return;
@@ -444,8 +445,10 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);

arch_restore_msi_irqs(dev);
- for_each_pci_msi_entry(entry, dev)
- msix_mask_irq(entry, entry->masked);
+ for_each_pci_msi_entry(entry, dev) {
+ flag = entry->masked & PCI_MSIX_ENTRY_CTRL_MASKBIT;
+ msix_mask_irq(entry, flag);
+ }

pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
}
--
2.23.0


2019-10-24 10:10:57

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] PCI/MSI: Fix restoring of MSI-X vector control's mask bit

On Tue, Oct 08, 2019 at 11:42:39AM +0800, Jian-Hong Pan wrote:
> MSI-X vector control's bit 0 is the mask bit, which masks the
> corresponding interrupt request, or not. Other reserved bits might be
> used for other purpose by device vendors. For example, the values of
> Kingston NVMe SSD's MSI-X vector control are neither 0, nor 1, but other
> values [1].
>
> The original restoring logic in system resuming uses the whole MSI-X
> vector control value as the flag to set/clear the mask bit. However,
> this logic conflicts the idea mentioned above. It may mislead system to
> disable the MSI-X vector entries. That makes system get no interrupt
> from Kingston NVMe SSD after resume and usually get NVMe I/O timeout
> error.
>
> [ 174.715534] nvme nvme0: I/O 978 QID 3 timeout, completion polled
>
> This patch takes only the mask bit of original MSI-X vector control
> value as the flag to fix this issue.
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=204887#c8
>
> Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204887
> Fixed: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code")
> Signed-off-by: Jian-Hong Pan <[email protected]>
> ---
> drivers/pci/msi.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 0884bedcfc7a..deae3d5acaf6 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -433,6 +433,7 @@ static void __pci_restore_msi_state(struct pci_dev *dev)
> static void __pci_restore_msix_state(struct pci_dev *dev)
> {
> struct msi_desc *entry;
> + u32 flag;
>
> if (!dev->msix_enabled)
> return;
> @@ -444,8 +445,10 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
> PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
>
> arch_restore_msi_irqs(dev);
> - for_each_pci_msi_entry(entry, dev)
> - msix_mask_irq(entry, entry->masked);
> + for_each_pci_msi_entry(entry, dev) {
> + flag = entry->masked & PCI_MSIX_ENTRY_CTRL_MASKBIT;
> + msix_mask_irq(entry, flag);

This makes good sense: before your patch, when we restore, we set the
mask bit if *any* bits in the Vector Control register are set.

There are other paths leading to __pci_msix_desc_mask_irq(), so I
think it would be better to do the masking there, e.g.,

if (flag & PCI_MSIX_ENTRY_CTRL_MASKBIT)
mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;

I think the other paths all pass either 0 or 1, so they're all safe
today. But doing the masking in __pci_msix_desc_mask_irq() removes
that assumption from the callers.

I applied the patch below to pci/msi, let me know if it doesn't work
for you.

> + }
>
> pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
> }

commit 1a828a554650
Author: Jian-Hong Pan <[email protected]>
Date: Tue Oct 8 11:42:39 2019 +0800

PCI/MSI: Fix incorrect MSI-X masking on resume

When a driver enables MSI-X, msix_program_entries() reads the MSI-X Vector
Control register for each vector and saves it in desc->masked. Each
register is 32 bits and bit 0 is the actual Mask bit.

When we restored these registers during resume, we previously set the Mask
bit if *any* bit in desc->masked was set instead of when the Mask bit
itself was set:

pci_restore_state
pci_restore_msi_state
__pci_restore_msix_state
for_each_pci_msi_entry
msix_mask_irq(entry, entry->masked) <-- entire u32 word
__pci_msix_desc_mask_irq(desc, flag)
mask_bits = desc->masked & ~PCI_MSIX_ENTRY_CTRL_MASKBIT
if (flag) <-- testing entire u32, not just bit 0
mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT
writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL)

This means that after resume, MSI-X vectors were masked when they shouldn't
be, which leads to timeouts like this:

nvme nvme0: I/O 978 QID 3 timeout, completion polled

On resume, set the Mask bit only when the saved Mask bit from suspend was
set.

This should remove the need for 19ea025e1d28 ("nvme: Add quirk for Kingston
NVME SSD running FW E8FK11.T").

[bhelgaas: commit log, move fix to __pci_msix_desc_mask_irq()]
Fixes: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204887
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jian-Hong Pan <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 0884bedcfc7a..771041784e64 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -213,12 +213,13 @@ u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag)

if (pci_msi_ignore_mask)
return 0;
+
desc_addr = pci_msix_desc_addr(desc);
if (!desc_addr)
return 0;

mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
- if (flag)
+ if (flag & PCI_MSIX_ENTRY_CTRL_MASKBIT)
mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;

writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);

2019-10-24 10:17:07

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] PCI/MSI: Fix restoring of MSI-X vector control's mask bit

On Wed, Oct 23, 2019 at 04:12:35PM -0500, Bjorn Helgaas wrote:
> On Tue, Oct 08, 2019 at 11:42:39AM +0800, Jian-Hong Pan wrote:
> > MSI-X vector control's bit 0 is the mask bit, which masks the
> > corresponding interrupt request, or not. Other reserved bits might be
> > used for other purpose by device vendors. For example, the values of
> > Kingston NVMe SSD's MSI-X vector control are neither 0, nor 1, but other
> > values [1].
> >
> > The original restoring logic in system resuming uses the whole MSI-X
> > vector control value as the flag to set/clear the mask bit. However,
> > this logic conflicts the idea mentioned above. It may mislead system to
> > disable the MSI-X vector entries. That makes system get no interrupt
> > from Kingston NVMe SSD after resume and usually get NVMe I/O timeout
> > error.
> >
> > [ 174.715534] nvme nvme0: I/O 978 QID 3 timeout, completion polled
> >
> > This patch takes only the mask bit of original MSI-X vector control
> > value as the flag to fix this issue.
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=204887#c8
> >
> > Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204887
> > Fixed: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code")
> > Signed-off-by: Jian-Hong Pan <[email protected]>
> > ---
> > drivers/pci/msi.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > index 0884bedcfc7a..deae3d5acaf6 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -433,6 +433,7 @@ static void __pci_restore_msi_state(struct pci_dev *dev)
> > static void __pci_restore_msix_state(struct pci_dev *dev)
> > {
> > struct msi_desc *entry;
> > + u32 flag;
> >
> > if (!dev->msix_enabled)
> > return;
> > @@ -444,8 +445,10 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
> > PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
> >
> > arch_restore_msi_irqs(dev);
> > - for_each_pci_msi_entry(entry, dev)
> > - msix_mask_irq(entry, entry->masked);
> > + for_each_pci_msi_entry(entry, dev) {
> > + flag = entry->masked & PCI_MSIX_ENTRY_CTRL_MASKBIT;
> > + msix_mask_irq(entry, flag);
>
> This makes good sense: before your patch, when we restore, we set the
> mask bit if *any* bits in the Vector Control register are set.
>
> There are other paths leading to __pci_msix_desc_mask_irq(), so I
> think it would be better to do the masking there, e.g.,
>
> if (flag & PCI_MSIX_ENTRY_CTRL_MASKBIT)
> mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
>
> I think the other paths all pass either 0 or 1, so they're all safe
> today. But doing the masking in __pci_msix_desc_mask_irq() removes
> that assumption from the callers.
>
> I applied the patch below to pci/msi, let me know if it doesn't work
> for you.
>
> > + }
> >
> > pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
> > }
>
> commit 1a828a554650
> Author: Jian-Hong Pan <[email protected]>
> Date: Tue Oct 8 11:42:39 2019 +0800
>
> PCI/MSI: Fix incorrect MSI-X masking on resume
>
> When a driver enables MSI-X, msix_program_entries() reads the MSI-X Vector
> Control register for each vector and saves it in desc->masked. Each
> register is 32 bits and bit 0 is the actual Mask bit.
>
> When we restored these registers during resume, we previously set the Mask
> bit if *any* bit in desc->masked was set instead of when the Mask bit
> itself was set:
>
> pci_restore_state
> pci_restore_msi_state
> __pci_restore_msix_state
> for_each_pci_msi_entry
> msix_mask_irq(entry, entry->masked) <-- entire u32 word
> __pci_msix_desc_mask_irq(desc, flag)
> mask_bits = desc->masked & ~PCI_MSIX_ENTRY_CTRL_MASKBIT
> if (flag) <-- testing entire u32, not just bit 0
> mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT
> writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL)
>
> This means that after resume, MSI-X vectors were masked when they shouldn't
> be, which leads to timeouts like this:
>
> nvme nvme0: I/O 978 QID 3 timeout, completion polled
>
> On resume, set the Mask bit only when the saved Mask bit from suspend was
> set.
>
> This should remove the need for 19ea025e1d28 ("nvme: Add quirk for Kingston
> NVME SSD running FW E8FK11.T").
>
> [bhelgaas: commit log, move fix to __pci_msix_desc_mask_irq()]
> Fixes: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=204887
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Jian-Hong Pan <[email protected]>
> Signed-off-by: Bjorn Helgaas <[email protected]>

I forgot; this probably should be marked for stable, too.

> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 0884bedcfc7a..771041784e64 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -213,12 +213,13 @@ u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag)
>
> if (pci_msi_ignore_mask)
> return 0;
> +
> desc_addr = pci_msix_desc_addr(desc);
> if (!desc_addr)
> return 0;
>
> mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
> - if (flag)
> + if (flag & PCI_MSIX_ENTRY_CTRL_MASKBIT)
> mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
>
> writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
>
> _______________________________________________
> Linux-nvme mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-nvme

2019-10-24 19:06:11

by Jian-Hong Pan

[permalink] [raw]
Subject: Re: [PATCH] PCI/MSI: Fix restoring of MSI-X vector control's mask bit

Bjorn Helgaas <[email protected]> 於 2019年10月24日 週四 上午5:43寫道:
>
> On Wed, Oct 23, 2019 at 04:12:35PM -0500, Bjorn Helgaas wrote:
> > On Tue, Oct 08, 2019 at 11:42:39AM +0800, Jian-Hong Pan wrote:
> > > MSI-X vector control's bit 0 is the mask bit, which masks the
> > > corresponding interrupt request, or not. Other reserved bits might be
> > > used for other purpose by device vendors. For example, the values of
> > > Kingston NVMe SSD's MSI-X vector control are neither 0, nor 1, but other
> > > values [1].
> > >
> > > The original restoring logic in system resuming uses the whole MSI-X
> > > vector control value as the flag to set/clear the mask bit. However,
> > > this logic conflicts the idea mentioned above. It may mislead system to
> > > disable the MSI-X vector entries. That makes system get no interrupt
> > > from Kingston NVMe SSD after resume and usually get NVMe I/O timeout
> > > error.
> > >
> > > [ 174.715534] nvme nvme0: I/O 978 QID 3 timeout, completion polled
> > >
> > > This patch takes only the mask bit of original MSI-X vector control
> > > value as the flag to fix this issue.
> > >
> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=204887#c8
> > >
> > > Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204887
> > > Fixed: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code")
> > > Signed-off-by: Jian-Hong Pan <[email protected]>
> > > ---
> > > drivers/pci/msi.c | 7 +++++--
> > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > > index 0884bedcfc7a..deae3d5acaf6 100644
> > > --- a/drivers/pci/msi.c
> > > +++ b/drivers/pci/msi.c
> > > @@ -433,6 +433,7 @@ static void __pci_restore_msi_state(struct pci_dev *dev)
> > > static void __pci_restore_msix_state(struct pci_dev *dev)
> > > {
> > > struct msi_desc *entry;
> > > + u32 flag;
> > >
> > > if (!dev->msix_enabled)
> > > return;
> > > @@ -444,8 +445,10 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
> > > PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
> > >
> > > arch_restore_msi_irqs(dev);
> > > - for_each_pci_msi_entry(entry, dev)
> > > - msix_mask_irq(entry, entry->masked);
> > > + for_each_pci_msi_entry(entry, dev) {
> > > + flag = entry->masked & PCI_MSIX_ENTRY_CTRL_MASKBIT;
> > > + msix_mask_irq(entry, flag);
> >
> > This makes good sense: before your patch, when we restore, we set the
> > mask bit if *any* bits in the Vector Control register are set.
> >
> > There are other paths leading to __pci_msix_desc_mask_irq(), so I
> > think it would be better to do the masking there, e.g.,
> >
> > if (flag & PCI_MSIX_ENTRY_CTRL_MASKBIT)
> > mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
> >
> > I think the other paths all pass either 0 or 1, so they're all safe
> > today. But doing the masking in __pci_msix_desc_mask_irq() removes
> > that assumption from the callers.
> >
> > I applied the patch below to pci/msi, let me know if it doesn't work
> > for you.
> >
> > > + }
> > >
> > > pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
> > > }
> >
> > commit 1a828a554650
> > Author: Jian-Hong Pan <[email protected]>
> > Date: Tue Oct 8 11:42:39 2019 +0800
> >
> > PCI/MSI: Fix incorrect MSI-X masking on resume
> >
> > When a driver enables MSI-X, msix_program_entries() reads the MSI-X Vector
> > Control register for each vector and saves it in desc->masked. Each
> > register is 32 bits and bit 0 is the actual Mask bit.
> >
> > When we restored these registers during resume, we previously set the Mask
> > bit if *any* bit in desc->masked was set instead of when the Mask bit
> > itself was set:
> >
> > pci_restore_state
> > pci_restore_msi_state
> > __pci_restore_msix_state
> > for_each_pci_msi_entry
> > msix_mask_irq(entry, entry->masked) <-- entire u32 word
> > __pci_msix_desc_mask_irq(desc, flag)
> > mask_bits = desc->masked & ~PCI_MSIX_ENTRY_CTRL_MASKBIT
> > if (flag) <-- testing entire u32, not just bit 0
> > mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT
> > writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL)
> >
> > This means that after resume, MSI-X vectors were masked when they shouldn't
> > be, which leads to timeouts like this:
> >
> > nvme nvme0: I/O 978 QID 3 timeout, completion polled
> >
> > On resume, set the Mask bit only when the saved Mask bit from suspend was
> > set.
> >
> > This should remove the need for 19ea025e1d28 ("nvme: Add quirk for Kingston
> > NVME SSD running FW E8FK11.T").
> >
> > [bhelgaas: commit log, move fix to __pci_msix_desc_mask_irq()]
> > Fixes: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code")
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=204887
> > Link: https://lore.kernel.org/r/[email protected]
> > Signed-off-by: Jian-Hong Pan <[email protected]>
> > Signed-off-by: Bjorn Helgaas <[email protected]>

The modified patch also fixes the issue.

> I forgot; this probably should be marked for stable, too.

Oh! Yes, please!

Thank you,
Jian-Hong Pan

> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > index 0884bedcfc7a..771041784e64 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -213,12 +213,13 @@ u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag)
> >
> > if (pci_msi_ignore_mask)
> > return 0;
> > +
> > desc_addr = pci_msix_desc_addr(desc);
> > if (!desc_addr)
> > return 0;
> >
> > mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
> > - if (flag)
> > + if (flag & PCI_MSIX_ENTRY_CTRL_MASKBIT)
> > mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
> >
> > writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
> >
> > _______________________________________________
> > Linux-nvme mailing list
> > [email protected]
> > http://lists.infradead.org/mailman/listinfo/linux-nvme