2022-10-26 06:32:51

by Michael S. Tsirkin

[permalink] [raw]
Subject: [PATCH v2] pci: fix device presence detection for VFs

virtio uses the same driver for VFs and PFs. Accordingly,
pci_device_is_present is used to detect device presence. This function
isn't currently working properly for VFs since it attempts reading
device and vendor ID. As VFs are present if and only if PF is present,
just return the value for that device.

Reported-by: Wei Gong <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
---

Wei Gong, thanks for your testing of the RFC!
As I made a small change, would appreciate re-testing.

Thanks!

changes from RFC:
use pci_physfn() wrapper to make the code build without PCI_IOV


drivers/pci/pci.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 2127aba3550b..899b3f52e84e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6445,8 +6445,13 @@ bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2)

bool pci_device_is_present(struct pci_dev *pdev)
{
+ struct pci_dev *physfn = pci_physfn(pdev);
u32 v;

+ /* Not a PF? Switch to the PF. */
+ if (physfn != pdev)
+ return pci_device_is_present(physfn);
+
if (pci_dev_is_disconnected(pdev))
return false;
return pci_bus_read_dev_vendor_id(pdev->bus, pdev->devfn, &v, 0);
--
MST



2022-10-26 13:50:39

by Wei Gong

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> virtio uses the same driver for VFs and PFs. Accordingly,
> pci_device_is_present is used to detect device presence. This function
> isn't currently working properly for VFs since it attempts reading
> device and vendor ID. As VFs are present if and only if PF is present,
> just return the value for that device.
>
> Reported-by: Wei Gong <[email protected]>
> Signed-off-by: Michael S. Tsirkin <[email protected]>

Tested-by: Wei Gong <[email protected]>

> ---
>
> Wei Gong, thanks for your testing of the RFC!
> As I made a small change, would appreciate re-testing.

I updated your change and retested in my environment.
It worked well and fixed my problem.

>
> Thanks!
>
> changes from RFC:
> use pci_physfn() wrapper to make the code build without PCI_IOV
>
>
> drivers/pci/pci.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 2127aba3550b..899b3f52e84e 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6445,8 +6445,13 @@ bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2)
>
> bool pci_device_is_present(struct pci_dev *pdev)
> {
> + struct pci_dev *physfn = pci_physfn(pdev);
> u32 v;
>
> + /* Not a PF? Switch to the PF. */
> + if (physfn != pdev)
> + return pci_device_is_present(physfn);
> +
> if (pci_dev_is_disconnected(pdev))
> return false;
> return pci_bus_read_dev_vendor_id(pdev->bus, pdev->devfn, &v, 0);
> --
> MST
>

2022-11-08 04:55:09

by Wei Gong

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

O Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> virtio uses the same driver for VFs and PFs. Accordingly,
> pci_device_is_present is used to detect device presence. This function
> isn't currently working properly for VFs since it attempts reading
> device and vendor ID. As VFs are present if and only if PF is present,
> just return the value for that device.
>
> Reported-by: Wei Gong <[email protected]>
> Signed-off-by: Michael S. Tsirkin <[email protected]>
> ---
>
> Wei Gong, thanks for your testing of the RFC!
> As I made a small change, would appreciate re-testing.
>
> Thanks!
>
> changes from RFC:
> use pci_physfn() wrapper to make the code build without PCI_IOV
>
>
> drivers/pci/pci.c | 5 +++++
> 1 file changed, 5 insertions(+)

Tested-by: Wei Gong <[email protected]>

retest done and well.

I would rephrase the bug.

according to sriov's protocol specification vendor_id and
device_id field in all VFs return FFFFh when read
so when vf devs is in the pci_device_is_present,it will be
misjudged as surprise removeal

when io is issued on the vf, normally disable virtio_blk vf
devs,at this time the disable opration will hang. and virtio
blk dev io hang.

task:bash state:D stack: 0 pid: 1773 ppid: 1241 flags:0x00004002
Call Trace:
<TASK>
__schedule+0x2ee/0x900
schedule+0x4f/0xc0
blk_mq_freeze_queue_wait+0x69/0xa0
? wait_woken+0x80/0x80
blk_mq_freeze_queue+0x1b/0x20
blk_cleanup_queue+0x3d/0xd0
virtblk_remove+0x3c/0xb0 [virtio_blk]
virtio_dev_remove+0x4b/0x80
device_release_driver_internal+0x103/0x1d0
device_release_driver+0x12/0x20
bus_remove_device+0xe1/0x150
device_del+0x192/0x3f0
device_unregister+0x1b/0x60
unregister_virtio_device+0x18/0x30
virtio_pci_remove+0x41/0x80
pci_device_remove+0x3e/0xb0
device_release_driver_internal+0x103/0x1d0
device_release_driver+0x12/0x20
pci_stop_bus_device+0x79/0xa0
pci_stop_and_remove_bus_device+0x13/0x20
pci_iov_remove_virtfn+0xc5/0x130
? pci_get_device+0x4a/0x60
sriov_disable+0x33/0xf0
pci_disable_sriov+0x26/0x30
virtio_pci_sriov_configure+0x6f/0xa0
sriov_numvfs_store+0x104/0x140
dev_attr_store+0x17/0x30
sysfs_kf_write+0x3e/0x50
kernfs_fop_write_iter+0x138/0x1c0
new_sync_write+0x117/0x1b0
vfs_write+0x185/0x250
ksys_write+0x67/0xe0
__x64_sys_write+0x1a/0x20
do_syscall_64+0x61/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f21bd1f3ba4
RSP: 002b:00007ffd34a24188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f21bd1f3ba4
RDX: 0000000000000002 RSI: 0000560305040800 RDI: 0000000000000001
RBP: 0000560305040800 R08: 000056030503fd50 R09: 0000000000000073
R10: 00000000ffffffff R11: 0000000000000202 R12: 0000000000000002
R13: 00007f21bd2de760 R14: 00007f21bd2da5e0 R15: 00007f21bd2d99e0

when virtio_blk is performing io, as long as there two stages of:
1. dispatch io. queue_usage_counter++;
2. io is completed after receiving the interrupt. queue_usage_counter--;

disable virtio_blk vfs:
if(!pci_device_is_present(pci_dev))
virtio_break_device(&vp_dev->vdev);
virtqueue for vf devs will be marked broken.
the interrupt notification io is end. Since it is judged that the
virtqueue has been marked as broken, the completed io will not be
performed.
So queue_usage_counter will not be cleared.
when the disk is removed at the same time, the queue will be frozen,
and you must wait for the queue_usage_counter to be cleared.
Therefore, it leads to the removeal of hang.



Thanks,
Wei Gong

2022-11-08 05:46:12

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Mon, Nov 07, 2022 at 11:58:21PM -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 08, 2022 at 04:52:16AM +0000, Wei Gong wrote:
> > O Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > virtio uses the same driver for VFs and PFs. Accordingly,
> > > pci_device_is_present is used to detect device presence. This function
> > > isn't currently working properly for VFs since it attempts reading
> > > device and vendor ID. As VFs are present if and only if PF is present,
> > > just return the value for that device.
> > >
> > > Reported-by: Wei Gong <[email protected]>
> > > Signed-off-by: Michael S. Tsirkin <[email protected]>
> > > ---
> > >
> > > Wei Gong, thanks for your testing of the RFC!
> > > As I made a small change, would appreciate re-testing.
> > >
> > > Thanks!
> > >
> > > changes from RFC:
> > > use pci_physfn() wrapper to make the code build without PCI_IOV
> > >
> > >
> > > drivers/pci/pci.c | 5 +++++
> > > 1 file changed, 5 insertions(+)
> >
> > Tested-by: Wei Gong <[email protected]>
>
>
>
> Thanks!
> Bjorn, could you queue this fix in your tree or should I queue this
> in the virtio tree? I also think this would be nice to have for stable too -
> do you agree?
> Thanks!

It's on my list to look at, sorry for the delay.

Bjorn

2022-11-08 05:49:53

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Tue, Nov 08, 2022 at 04:52:16AM +0000, Wei Gong wrote:
> O Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > virtio uses the same driver for VFs and PFs. Accordingly,
> > pci_device_is_present is used to detect device presence. This function
> > isn't currently working properly for VFs since it attempts reading
> > device and vendor ID. As VFs are present if and only if PF is present,
> > just return the value for that device.
> >
> > Reported-by: Wei Gong <[email protected]>
> > Signed-off-by: Michael S. Tsirkin <[email protected]>
> > ---
> >
> > Wei Gong, thanks for your testing of the RFC!
> > As I made a small change, would appreciate re-testing.
> >
> > Thanks!
> >
> > changes from RFC:
> > use pci_physfn() wrapper to make the code build without PCI_IOV
> >
> >
> > drivers/pci/pci.c | 5 +++++
> > 1 file changed, 5 insertions(+)
>
> Tested-by: Wei Gong <[email protected]>



Thanks!
Bjorn, could you queue this fix in your tree or should I queue this
in the virtio tree? I also think this would be nice to have for stable too -
do you agree?
Thanks!


> retest done and well.
>
> I would rephrase the bug.
>
> according to sriov's protocol specification vendor_id and
> device_id field in all VFs return FFFFh when read
> so when vf devs is in the pci_device_is_present,it will be
> misjudged as surprise removeal
>
> when io is issued on the vf, normally disable virtio_blk vf
> devs,at this time the disable opration will hang. and virtio
> blk dev io hang.
>
> task:bash state:D stack: 0 pid: 1773 ppid: 1241 flags:0x00004002
> Call Trace:
> <TASK>
> __schedule+0x2ee/0x900
> schedule+0x4f/0xc0
> blk_mq_freeze_queue_wait+0x69/0xa0
> ? wait_woken+0x80/0x80
> blk_mq_freeze_queue+0x1b/0x20
> blk_cleanup_queue+0x3d/0xd0
> virtblk_remove+0x3c/0xb0 [virtio_blk]
> virtio_dev_remove+0x4b/0x80
> device_release_driver_internal+0x103/0x1d0
> device_release_driver+0x12/0x20
> bus_remove_device+0xe1/0x150
> device_del+0x192/0x3f0
> device_unregister+0x1b/0x60
> unregister_virtio_device+0x18/0x30
> virtio_pci_remove+0x41/0x80
> pci_device_remove+0x3e/0xb0
> device_release_driver_internal+0x103/0x1d0
> device_release_driver+0x12/0x20
> pci_stop_bus_device+0x79/0xa0
> pci_stop_and_remove_bus_device+0x13/0x20
> pci_iov_remove_virtfn+0xc5/0x130
> ? pci_get_device+0x4a/0x60
> sriov_disable+0x33/0xf0
> pci_disable_sriov+0x26/0x30
> virtio_pci_sriov_configure+0x6f/0xa0
> sriov_numvfs_store+0x104/0x140
> dev_attr_store+0x17/0x30
> sysfs_kf_write+0x3e/0x50
> kernfs_fop_write_iter+0x138/0x1c0
> new_sync_write+0x117/0x1b0
> vfs_write+0x185/0x250
> ksys_write+0x67/0xe0
> __x64_sys_write+0x1a/0x20
> do_syscall_64+0x61/0xb0
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f21bd1f3ba4
> RSP: 002b:00007ffd34a24188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f21bd1f3ba4
> RDX: 0000000000000002 RSI: 0000560305040800 RDI: 0000000000000001
> RBP: 0000560305040800 R08: 000056030503fd50 R09: 0000000000000073
> R10: 00000000ffffffff R11: 0000000000000202 R12: 0000000000000002
> R13: 00007f21bd2de760 R14: 00007f21bd2da5e0 R15: 00007f21bd2d99e0
>
> when virtio_blk is performing io, as long as there two stages of:
> 1. dispatch io. queue_usage_counter++;
> 2. io is completed after receiving the interrupt. queue_usage_counter--;
>
> disable virtio_blk vfs:
> if(!pci_device_is_present(pci_dev))
> virtio_break_device(&vp_dev->vdev);
> virtqueue for vf devs will be marked broken.
> the interrupt notification io is end. Since it is judged that the
> virtqueue has been marked as broken, the completed io will not be
> performed.
> So queue_usage_counter will not be cleared.
> when the disk is removed at the same time, the queue will be frozen,
> and you must wait for the queue_usage_counter to be cleared.
> Therefore, it leads to the removeal of hang.
>
>
>
> Thanks,
> Wei Gong
>


2022-11-08 15:11:50

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> virtio uses the same driver for VFs and PFs. Accordingly,
> pci_device_is_present is used to detect device presence. This function
> isn't currently working properly for VFs since it attempts reading
> device and vendor ID.

> As VFs are present if and only if PF is present,
> just return the value for that device.

VFs are only present when the PF is present *and* the PF has VF Enable
set. Do you care about the possibility that VF Enable has been
cleared?

> Reported-by: Wei Gong <[email protected]>
> Signed-off-by: Michael S. Tsirkin <[email protected]>
> ---
>
> Wei Gong, thanks for your testing of the RFC!
> As I made a small change, would appreciate re-testing.
>
> Thanks!
>
> changes from RFC:
> use pci_physfn() wrapper to make the code build without PCI_IOV
>
>
> drivers/pci/pci.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 2127aba3550b..899b3f52e84e 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6445,8 +6445,13 @@ bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2)
>
> bool pci_device_is_present(struct pci_dev *pdev)
> {
> + struct pci_dev *physfn = pci_physfn(pdev);
> u32 v;
>
> + /* Not a PF? Switch to the PF. */
> + if (physfn != pdev)
> + return pci_device_is_present(physfn);
> +
> if (pci_dev_is_disconnected(pdev))
> return false;
> return pci_bus_read_dev_vendor_id(pdev->bus, pdev->devfn, &v, 0);
> --
> MST
>

2022-11-08 15:26:57

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > virtio uses the same driver for VFs and PFs. Accordingly,
> > pci_device_is_present is used to detect device presence. This function
> > isn't currently working properly for VFs since it attempts reading
> > device and vendor ID.
>
> > As VFs are present if and only if PF is present,
> > just return the value for that device.
>
> VFs are only present when the PF is present *and* the PF has VF Enable
> set. Do you care about the possibility that VF Enable has been
> cleared?

Can you also include a hint about how the problem manifests, and a URL
to the report if available?

It's beyond the scope of this patch, but I've never liked the
semantics of pci_device_is_present() because it's racy by design. All
it tells us is that some time in the *past*, the device was present.
It's telling that almost all calls test for !pci_device_is_present(),
which does make a little more sense.

> > Reported-by: Wei Gong <[email protected]>
> > Signed-off-by: Michael S. Tsirkin <[email protected]>
> > ---
> >
> > Wei Gong, thanks for your testing of the RFC!
> > As I made a small change, would appreciate re-testing.
> >
> > Thanks!
> >
> > changes from RFC:
> > use pci_physfn() wrapper to make the code build without PCI_IOV
> >
> >
> > drivers/pci/pci.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 2127aba3550b..899b3f52e84e 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -6445,8 +6445,13 @@ bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2)
> >
> > bool pci_device_is_present(struct pci_dev *pdev)
> > {
> > + struct pci_dev *physfn = pci_physfn(pdev);
> > u32 v;
> >
> > + /* Not a PF? Switch to the PF. */
> > + if (physfn != pdev)
> > + return pci_device_is_present(physfn);
> > +
> > if (pci_dev_is_disconnected(pdev))
> > return false;
> > return pci_bus_read_dev_vendor_id(pdev->bus, pdev->devfn, &v, 0);
> > --
> > MST
> >

2022-11-08 16:02:11

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > virtio uses the same driver for VFs and PFs. Accordingly,
> > > pci_device_is_present is used to detect device presence. This function
> > > isn't currently working properly for VFs since it attempts reading
> > > device and vendor ID.
> >
> > > As VFs are present if and only if PF is present,
> > > just return the value for that device.
> >
> > VFs are only present when the PF is present *and* the PF has VF Enable
> > set. Do you care about the possibility that VF Enable has been
> > cleared?
>
> Can you also include a hint about how the problem manifests, and a URL
> to the report if available?

Here you go:
lore.kernel.org/all/20221108044819.GA861843%40zander/t.mbox.gz

is it enough to include this link or do you want me
to repost copying the text from there?

> It's beyond the scope of this patch, but I've never liked the
> semantics of pci_device_is_present() because it's racy by design. All
> it tells us is that some time in the *past*, the device was present.
> It's telling that almost all calls test for !pci_device_is_present(),
> which does make a little more sense.

I agree. The problem is in the API really.
What people want is pci_device_was_removed()

With surprise removal at least at the pci express level
we know that there was a surprise removal event.
PCI subsystem seems to chose to discard that information.
There's nothing driver could do to reliably detect
that - if someone pulled the card out then stuck it back in
quickly driver will assume it's the old card and
attempt graceful removal, which is likely to fail.

However some of the problem is at the hardware level too.
If you are poking at the device's config and it's
pulled out and another is put back in quickly, your
config access might land at the new card.
Does not feel robust. I don't have a good solution for this
except "try to avoid config cycles as much as you can".




> > > Reported-by: Wei Gong <[email protected]>
> > > Signed-off-by: Michael S. Tsirkin <[email protected]>
> > > ---
> > >
> > > Wei Gong, thanks for your testing of the RFC!
> > > As I made a small change, would appreciate re-testing.
> > >
> > > Thanks!
> > >
> > > changes from RFC:
> > > use pci_physfn() wrapper to make the code build without PCI_IOV
> > >
> > >
> > > drivers/pci/pci.c | 5 +++++
> > > 1 file changed, 5 insertions(+)
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 2127aba3550b..899b3f52e84e 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -6445,8 +6445,13 @@ bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2)
> > >
> > > bool pci_device_is_present(struct pci_dev *pdev)
> > > {
> > > + struct pci_dev *physfn = pci_physfn(pdev);
> > > u32 v;
> > >
> > > + /* Not a PF? Switch to the PF. */
> > > + if (physfn != pdev)
> > > + return pci_device_is_present(physfn);
> > > +
> > > if (pci_dev_is_disconnected(pdev))
> > > return false;
> > > return pci_bus_read_dev_vendor_id(pdev->bus, pdev->devfn, &v, 0);
> > > --
> > > MST
> > >


2022-11-08 18:13:50

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Tue, Nov 08, 2022 at 11:58:53AM -0600, Bjorn Helgaas wrote:
> On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> > On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > > virtio uses the same driver for VFs and PFs. Accordingly,
> > > > > pci_device_is_present is used to detect device presence. This function
> > > > > isn't currently working properly for VFs since it attempts reading
> > > > > device and vendor ID.
> > > >
> > > > > As VFs are present if and only if PF is present,
> > > > > just return the value for that device.
> > > >
> > > > VFs are only present when the PF is present *and* the PF has VF Enable
> > > > set. Do you care about the possibility that VF Enable has been
> > > > cleared?
>
> I think you missed this question.

I was hoping Wei will answer that, I don't have the hardware.

> > > Can you also include a hint about how the problem manifests, and a URL
> > > to the report if available?
> >
> > Here you go:
> > lore.kernel.org/all/20221108044819.GA861843%40zander/t.mbox.gz
> >
> > is it enough to include this link or do you want me
> > to repost copying the text from there?
>
> Uh, well, OK, I guess I could dig through that and figure what what's
> relevant. I'd like the commit log to contain a hint of what the
> problem looks like and some justification for why it should be
> backported to stable.
>
> I still look at Documentation/process/stable-kernel-rules.rst
> occasionally to decide things like this, but I get the feeling that
> it's a little out-of-date and more restrictive than current practice.
>
> But I do think the "PF exists but VF disabled" situation needs to be
> clarified somehow, too.
>
> Bjorn


2022-11-08 18:32:48

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > virtio uses the same driver for VFs and PFs. Accordingly,
> > > > pci_device_is_present is used to detect device presence. This function
> > > > isn't currently working properly for VFs since it attempts reading
> > > > device and vendor ID.
> > >
> > > > As VFs are present if and only if PF is present,
> > > > just return the value for that device.
> > >
> > > VFs are only present when the PF is present *and* the PF has VF Enable
> > > set. Do you care about the possibility that VF Enable has been
> > > cleared?

I think you missed this question.

> > Can you also include a hint about how the problem manifests, and a URL
> > to the report if available?
>
> Here you go:
> lore.kernel.org/all/20221108044819.GA861843%40zander/t.mbox.gz
>
> is it enough to include this link or do you want me
> to repost copying the text from there?

Uh, well, OK, I guess I could dig through that and figure what what's
relevant. I'd like the commit log to contain a hint of what the
problem looks like and some justification for why it should be
backported to stable.

I still look at Documentation/process/stable-kernel-rules.rst
occasionally to decide things like this, but I get the feeling that
it's a little out-of-date and more restrictive than current practice.

But I do think the "PF exists but VF disabled" situation needs to be
clarified somehow, too.

Bjorn

2022-11-09 05:16:34

by Wei Gong

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

O Tue, Nov 08, 2022 at 01:02:35PM -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 08, 2022 at 11:58:53AM -0600, Bjorn Helgaas wrote:
> > On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> > > On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > > > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > > > virtio uses the same driver for VFs and PFs. Accordingly,
> > > > > > pci_device_is_present is used to detect device presence. This function
> > > > > > isn't currently working properly for VFs since it attempts reading
> > > > > > device and vendor ID.
> > > > >
> > > > > > As VFs are present if and only if PF is present,
> > > > > > just return the value for that device.
> > > > >
> > > > > VFs are only present when the PF is present *and* the PF has VF Enable
> > > > > set. Do you care about the possibility that VF Enable has been
> > > > > cleared?
> >
> > I think you missed this question.
>
> I was hoping Wei will answer that, I don't have the hardware.

In my case I don't care that VF Enable has been cleared.

>
> > > > Can you also include a hint about how the problem manifests, and a URL
> > > > to the report if available?
> > >
> > > Here you go:
> > > lore.kernel.org/all/20221108044819.GA861843%40zander/t.mbox.gz
> > >
> > > is it enough to include this link or do you want me
> > > to repost copying the text from there?
> >
> > Uh, well, OK, I guess I could dig through that and figure what what's
> > relevant. I'd like the commit log to contain a hint of what the
> > problem looks like and some justification for why it should be
> > backported to stable.
> >
> > I still look at Documentation/process/stable-kernel-rules.rst
> > occasionally to decide things like this, but I get the feeling that
> > it's a little out-of-date and more restrictive than current practice.
> >
> > But I do think the "PF exists but VF disabled" situation needs to be
> > clarified somehow, too.
> >
> > Bjorn
>

2022-11-09 05:45:33

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Wed, Nov 09, 2022 at 04:36:17AM +0000, Wei Gong wrote:
> O Tue, Nov 08, 2022 at 01:02:35PM -0500, Michael S. Tsirkin wrote:
> > On Tue, Nov 08, 2022 at 11:58:53AM -0600, Bjorn Helgaas wrote:
> > > On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> > > > On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > > > > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > > > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > > > > virtio uses the same driver for VFs and PFs.
> > > > > > > Accordingly, pci_device_is_present is used to detect
> > > > > > > device presence. This function isn't currently working
> > > > > > > properly for VFs since it attempts reading device and
> > > > > > > vendor ID.
> > > > > >
> > > > > > > As VFs are present if and only if PF is present,
> > > > > > > just return the value for that device.
> > > > > >
> > > > > > VFs are only present when the PF is present *and* the PF
> > > > > > has VF Enable set. Do you care about the possibility that
> > > > > > VF Enable has been cleared?
> > >
> > > I think you missed this question.
> >
> > I was hoping Wei will answer that, I don't have the hardware.
>
> In my case I don't care that VF Enable has been cleared.

OK, let me rephrase that :)

I think pci_device_is_present(VF) should return "false" if the PF is
present but VFs are disabled.

If you think it should return "true" when the PF is present and VFs
are disabled, we should explain why.

We would also need to fix the commit log, because "VFs are present if
and only if PF is present" is not actually true. "VFs are present
only if PF is present" is true, but "VFs are present if PF is present"
is not.

Bjorn

2022-11-09 07:31:12

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Tue, Nov 08, 2022 at 11:12:34PM -0600, Bjorn Helgaas wrote:
> On Wed, Nov 09, 2022 at 04:36:17AM +0000, Wei Gong wrote:
> > O Tue, Nov 08, 2022 at 01:02:35PM -0500, Michael S. Tsirkin wrote:
> > > On Tue, Nov 08, 2022 at 11:58:53AM -0600, Bjorn Helgaas wrote:
> > > > On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> > > > > On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > > > > > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > > > > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > > > > > virtio uses the same driver for VFs and PFs.
> > > > > > > > Accordingly, pci_device_is_present is used to detect
> > > > > > > > device presence. This function isn't currently working
> > > > > > > > properly for VFs since it attempts reading device and
> > > > > > > > vendor ID.
> > > > > > >
> > > > > > > > As VFs are present if and only if PF is present,
> > > > > > > > just return the value for that device.
> > > > > > >
> > > > > > > VFs are only present when the PF is present *and* the PF
> > > > > > > has VF Enable set. Do you care about the possibility that
> > > > > > > VF Enable has been cleared?
> > > >
> > > > I think you missed this question.
> > >
> > > I was hoping Wei will answer that, I don't have the hardware.
> >
> > In my case I don't care that VF Enable has been cleared.
>
> OK, let me rephrase that :)
>
> I think pci_device_is_present(VF) should return "false" if the PF is
> present but VFs are disabled.
>
> If you think it should return "true" when the PF is present and VFs
> are disabled, we should explain why.
>
> We would also need to fix the commit log, because "VFs are present if
> and only if PF is present" is not actually true. "VFs are present
> only if PF is present" is true, but "VFs are present if PF is present"
> is not.
>
> Bjorn

Bjorn, I don't really understand the question.

How does one get a vf pointer without enabling sriov?
They are only created by sriov_add_vfs after calling
pcibios_sriov_enable.



--
MST


2022-11-09 07:42:57

by Wei Gong

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Tue, Nov 08, 2022 at 11:12:34PM -0600, Bjorn Helgaas wrote:
> On Wed, Nov 09, 2022 at 04:36:17AM +0000, Wei Gong wrote:
> > O Tue, Nov 08, 2022 at 01:02:35PM -0500, Michael S. Tsirkin wrote:
> > > On Tue, Nov 08, 2022 at 11:58:53AM -0600, Bjorn Helgaas wrote:
> > > > On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> > > > > On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > > > > > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > > > > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > > > > > virtio uses the same driver for VFs and PFs.
> > > > > > > > Accordingly, pci_device_is_present is used to detect
> > > > > > > > device presence. This function isn't currently working
> > > > > > > > properly for VFs since it attempts reading device and
> > > > > > > > vendor ID.
> > > > > > >
> > > > > > > > As VFs are present if and only if PF is present,
> > > > > > > > just return the value for that device.
> > > > > > >
> > > > > > > VFs are only present when the PF is present *and* the PF
> > > > > > > has VF Enable set. Do you care about the possibility that
> > > > > > > VF Enable has been cleared?
> > > >
> > > > I think you missed this question.
> > >
> > > I was hoping Wei will answer that, I don't have the hardware.
> >
> > In my case I don't care that VF Enable has been cleared.
>
> OK, let me rephrase that :)
>
> I think pci_device_is_present(VF) should return "false" if the PF is
> present but VFs are disabled.

I agree.

>
> If you think it should return "true" when the PF is present and VFs
> are disabled, we should explain why.

I don't think it should return "true" when the PF is present and VFS
are disabled.

I think pci_device_is_present(VF) should return "true" if the PF is
present and VFs are enabled.
In the current implementation, it cannot correctly judge whether the
VF is present.
When the PF is present and VFs are enabled, I think it should return
"true", but in fact it returns "false"

Through your comments, I realize that this patch is inaccurate in
judging whether VF present in the case of "the PF is present and
VFs are disabled"

Thinks,
Wei

>
> We would also need to fix the commit log, because "VFs are present if
> and only if PF is present" is not actually true. "VFs are present
> only if PF is present" is true, but "VFs are present if PF is present"
> is not.
>
> Bjorn

2022-11-09 17:54:00

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Wed, Nov 09, 2022 at 11:30:29AM -0600, Bjorn Helgaas wrote:
> On Wed, Nov 09, 2022 at 02:10:30AM -0500, Michael S. Tsirkin wrote:
> > On Tue, Nov 08, 2022 at 11:12:34PM -0600, Bjorn Helgaas wrote:
> > > On Wed, Nov 09, 2022 at 04:36:17AM +0000, Wei Gong wrote:
> > > > O Tue, Nov 08, 2022 at 01:02:35PM -0500, Michael S. Tsirkin wrote:
> > > > > On Tue, Nov 08, 2022 at 11:58:53AM -0600, Bjorn Helgaas wrote:
> > > > > > On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> > > > > > > On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > > > > > > > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > > > > > > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > > virtio uses the same driver for VFs and PFs.
> > > > > > > > > > Accordingly, pci_device_is_present is used to detect
> > > > > > > > > > device presence. This function isn't currently working
> > > > > > > > > > properly for VFs since it attempts reading device and
> > > > > > > > > > vendor ID.
> > > > > > > > >
> > > > > > > > > > As VFs are present if and only if PF is present,
> > > > > > > > > > just return the value for that device.
> > > > > > > > >
> > > > > > > > > VFs are only present when the PF is present *and* the PF
> > > > > > > > > has VF Enable set. Do you care about the possibility that
> > > > > > > > > VF Enable has been cleared?
> > > > > >
> > > > > > I think you missed this question.
> > > > >
> > > > > I was hoping Wei will answer that, I don't have the hardware.
> > > >
> > > > In my case I don't care that VF Enable has been cleared.
> > >
> > > OK, let me rephrase that :)
> > >
> > > I think pci_device_is_present(VF) should return "false" if the PF is
> > > present but VFs are disabled.
> > >
> > > If you think it should return "true" when the PF is present and VFs
> > > are disabled, we should explain why.
> > >
> > > We would also need to fix the commit log, because "VFs are present if
> > > and only if PF is present" is not actually true. "VFs are present
> > > only if PF is present" is true, but "VFs are present if PF is present"
> > > is not.
> >
> > Bjorn, I don't really understand the question.
> >
> > How does one get a vf pointer without enabling sriov?
> > They are only created by sriov_add_vfs after calling
> > pcibios_sriov_enable.
>
> Oh, I think I see where you're coming from. The fact that we have a
> VF pointer means VFs were enabled in the past, and as long as the PF
> is still present, the VFs should still be enabled.
>
> Since the continued existence of the VF device depends on VF Enable, I
> guess my question is whether we need to worry about VF Enable being
> cleared, e.g., via sysfs reset or a buggy PF driver.
>
> Taking a step back, I don't understand the
> "if (!pci_device_is_present()) virtio_break_device()" strategy because
> checking for device presence is always unreliable.

The point is to break out of loops.


> I assume the
> consumer of vq->broken, e.g., virtnet_send_command(), would see a
> failed PCI read that probably returns ~0 data. Could it not check for
> that and then figure out whether that's valid data or an error
> indication?

No, it's not doing any reads - it is waiting for a DMA.

> It looks like today, virtnet_send_command() might sit in that "while"
> loop calling virtqueue_get_buf() repeatedly until virtio_pci_remove()
> notices the device is gone and marks it broken. Something must be
> failing in virtqueue_get_buf() in that interval between the device
> disappearing and virtio_pci_remove() noticing it.
>
> Bjorn

Nope - it is just doing posted writes, these disappear into thin ether
if there's no target.

--
MST


2022-11-09 18:05:18

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Wed, Nov 09, 2022 at 02:10:30AM -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 08, 2022 at 11:12:34PM -0600, Bjorn Helgaas wrote:
> > On Wed, Nov 09, 2022 at 04:36:17AM +0000, Wei Gong wrote:
> > > O Tue, Nov 08, 2022 at 01:02:35PM -0500, Michael S. Tsirkin wrote:
> > > > On Tue, Nov 08, 2022 at 11:58:53AM -0600, Bjorn Helgaas wrote:
> > > > > On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> > > > > > On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > > > > > > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > > > > > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > virtio uses the same driver for VFs and PFs.
> > > > > > > > > Accordingly, pci_device_is_present is used to detect
> > > > > > > > > device presence. This function isn't currently working
> > > > > > > > > properly for VFs since it attempts reading device and
> > > > > > > > > vendor ID.
> > > > > > > >
> > > > > > > > > As VFs are present if and only if PF is present,
> > > > > > > > > just return the value for that device.
> > > > > > > >
> > > > > > > > VFs are only present when the PF is present *and* the PF
> > > > > > > > has VF Enable set. Do you care about the possibility that
> > > > > > > > VF Enable has been cleared?
> > > > >
> > > > > I think you missed this question.
> > > >
> > > > I was hoping Wei will answer that, I don't have the hardware.
> > >
> > > In my case I don't care that VF Enable has been cleared.
> >
> > OK, let me rephrase that :)
> >
> > I think pci_device_is_present(VF) should return "false" if the PF is
> > present but VFs are disabled.
> >
> > If you think it should return "true" when the PF is present and VFs
> > are disabled, we should explain why.
> >
> > We would also need to fix the commit log, because "VFs are present if
> > and only if PF is present" is not actually true. "VFs are present
> > only if PF is present" is true, but "VFs are present if PF is present"
> > is not.
>
> Bjorn, I don't really understand the question.
>
> How does one get a vf pointer without enabling sriov?
> They are only created by sriov_add_vfs after calling
> pcibios_sriov_enable.

Oh, I think I see where you're coming from. The fact that we have a
VF pointer means VFs were enabled in the past, and as long as the PF
is still present, the VFs should still be enabled.

Since the continued existence of the VF device depends on VF Enable, I
guess my question is whether we need to worry about VF Enable being
cleared, e.g., via sysfs reset or a buggy PF driver.

Taking a step back, I don't understand the
"if (!pci_device_is_present()) virtio_break_device()" strategy because
checking for device presence is always unreliable. I assume the
consumer of vq->broken, e.g., virtnet_send_command(), would see a
failed PCI read that probably returns ~0 data. Could it not check for
that and then figure out whether that's valid data or an error
indication?

It looks like today, virtnet_send_command() might sit in that "while"
loop calling virtqueue_get_buf() repeatedly until virtio_pci_remove()
notices the device is gone and marks it broken. Something must be
failing in virtqueue_get_buf() in that interval between the device
disappearing and virtio_pci_remove() noticing it.

Bjorn

2022-11-10 19:47:01

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

Hi Wei,

I can't quite parse this. Is the problem that you had some virtio I/O
in progress, you wrote "0" to /sys/.../sriov_numvfs, and the virtio
I/O operation hangs?

Is there any indication to the user, e.g., softlockup oops?

More questions below.

On Tue, Nov 08, 2022 at 04:52:16AM +0000, Wei Gong wrote:

> according to sriov's protocol specification vendor_id and
> device_id field in all VFs return FFFFh when read
> so when vf devs is in the pci_device_is_present,it will be
> misjudged as surprise removeal
>
> when io is issued on the vf, normally disable virtio_blk vf
> devs,at this time the disable opration will hang. and virtio
> blk dev io hang.
>
> task:bash state:D stack: 0 pid: 1773 ppid: 1241 flags:0x00004002
> Call Trace:
> <TASK>
> __schedule+0x2ee/0x900
> schedule+0x4f/0xc0
> blk_mq_freeze_queue_wait+0x69/0xa0
> ? wait_woken+0x80/0x80
> blk_mq_freeze_queue+0x1b/0x20
> blk_cleanup_queue+0x3d/0xd0
> virtblk_remove+0x3c/0xb0 [virtio_blk]
> virtio_dev_remove+0x4b/0x80
> device_release_driver_internal+0x103/0x1d0
> device_release_driver+0x12/0x20
> bus_remove_device+0xe1/0x150
> device_del+0x192/0x3f0
> device_unregister+0x1b/0x60
> unregister_virtio_device+0x18/0x30
> virtio_pci_remove+0x41/0x80
> pci_device_remove+0x3e/0xb0
> device_release_driver_internal+0x103/0x1d0
> device_release_driver+0x12/0x20
> pci_stop_bus_device+0x79/0xa0
> pci_stop_and_remove_bus_device+0x13/0x20
> pci_iov_remove_virtfn+0xc5/0x130
> ? pci_get_device+0x4a/0x60
> sriov_disable+0x33/0xf0
> pci_disable_sriov+0x26/0x30
> virtio_pci_sriov_configure+0x6f/0xa0
> sriov_numvfs_store+0x104/0x140
> dev_attr_store+0x17/0x30
> sysfs_kf_write+0x3e/0x50
> kernfs_fop_write_iter+0x138/0x1c0
> new_sync_write+0x117/0x1b0
> vfs_write+0x185/0x250
> ksys_write+0x67/0xe0
> __x64_sys_write+0x1a/0x20
> do_syscall_64+0x61/0xb0
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f21bd1f3ba4
> RSP: 002b:00007ffd34a24188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f21bd1f3ba4
> RDX: 0000000000000002 RSI: 0000560305040800 RDI: 0000000000000001
> RBP: 0000560305040800 R08: 000056030503fd50 R09: 0000000000000073
> R10: 00000000ffffffff R11: 0000000000000202 R12: 0000000000000002
> R13: 00007f21bd2de760 R14: 00007f21bd2da5e0 R15: 00007f21bd2d99e0
>
> when virtio_blk is performing io, as long as there two stages of:
> 1. dispatch io. queue_usage_counter++;
> 2. io is completed after receiving the interrupt. queue_usage_counter--;
>
> disable virtio_blk vfs:
> if(!pci_device_is_present(pci_dev))
> virtio_break_device(&vp_dev->vdev);
> virtqueue for vf devs will be marked broken.
> the interrupt notification io is end. Since it is judged that the
> virtqueue has been marked as broken, the completed io will not be
> performed.
> So queue_usage_counter will not be cleared.
> when the disk is removed at the same time, the queue will be frozen,
> and you must wait for the queue_usage_counter to be cleared.
> Therefore, it leads to the removeal of hang.

I want to follow along in the code, but I need some hints.

"queue_usage_counter" looks like it's supposed to be a symbol, but I
can't find it.

Where (which function) is the I/O dispatched and queue_usage_counter
incremented? Where is queue_usage_counter decremented?

Prior to this change pci_device_is_present(VF) returned "false"
(because the VF Vendor ID is 0xffff); after the change it will return
"true" (because it will look at the PF Vendor ID instead).

Previously virtio_pci_remove() called virtio_break_device(). I guess
that meant the virtio I/O operation will never be completed?

But if we don't call virtio_break_device(), the virtio I/O operation
*will* be completed?

Bjorn

2022-11-10 20:37:05

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> Hi Wei,
>
> I can't quite parse this. Is the problem that you had some virtio I/O
> in progress, you wrote "0" to /sys/.../sriov_numvfs, and the virtio
> I/O operation hangs?

I think so. I also think that just attempting to
remove the module or to unbind the driver from it
will have the same effect.

> Is there any indication to the user, e.g., softlockup oops?
>
> More questions below.
>
> On Tue, Nov 08, 2022 at 04:52:16AM +0000, Wei Gong wrote:
>
> > according to sriov's protocol specification vendor_id and
> > device_id field in all VFs return FFFFh when read
> > so when vf devs is in the pci_device_is_present,it will be
> > misjudged as surprise removeal
> >
> > when io is issued on the vf, normally disable virtio_blk vf
> > devs,at this time the disable opration will hang. and virtio
> > blk dev io hang.
> >
> > task:bash state:D stack: 0 pid: 1773 ppid: 1241 flags:0x00004002
> > Call Trace:
> > <TASK>
> > __schedule+0x2ee/0x900
> > schedule+0x4f/0xc0
> > blk_mq_freeze_queue_wait+0x69/0xa0
> > ? wait_woken+0x80/0x80
> > blk_mq_freeze_queue+0x1b/0x20
> > blk_cleanup_queue+0x3d/0xd0
> > virtblk_remove+0x3c/0xb0 [virtio_blk]
> > virtio_dev_remove+0x4b/0x80
> > device_release_driver_internal+0x103/0x1d0
> > device_release_driver+0x12/0x20
> > bus_remove_device+0xe1/0x150
> > device_del+0x192/0x3f0
> > device_unregister+0x1b/0x60
> > unregister_virtio_device+0x18/0x30
> > virtio_pci_remove+0x41/0x80
> > pci_device_remove+0x3e/0xb0
> > device_release_driver_internal+0x103/0x1d0
> > device_release_driver+0x12/0x20
> > pci_stop_bus_device+0x79/0xa0
> > pci_stop_and_remove_bus_device+0x13/0x20
> > pci_iov_remove_virtfn+0xc5/0x130
> > ? pci_get_device+0x4a/0x60
> > sriov_disable+0x33/0xf0
> > pci_disable_sriov+0x26/0x30
> > virtio_pci_sriov_configure+0x6f/0xa0
> > sriov_numvfs_store+0x104/0x140
> > dev_attr_store+0x17/0x30
> > sysfs_kf_write+0x3e/0x50
> > kernfs_fop_write_iter+0x138/0x1c0
> > new_sync_write+0x117/0x1b0
> > vfs_write+0x185/0x250
> > ksys_write+0x67/0xe0
> > __x64_sys_write+0x1a/0x20
> > do_syscall_64+0x61/0xb0
> > entry_SYSCALL_64_after_hwframe+0x44/0xae
> > RIP: 0033:0x7f21bd1f3ba4
> > RSP: 002b:00007ffd34a24188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> > RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f21bd1f3ba4
> > RDX: 0000000000000002 RSI: 0000560305040800 RDI: 0000000000000001
> > RBP: 0000560305040800 R08: 000056030503fd50 R09: 0000000000000073
> > R10: 00000000ffffffff R11: 0000000000000202 R12: 0000000000000002
> > R13: 00007f21bd2de760 R14: 00007f21bd2da5e0 R15: 00007f21bd2d99e0
> >
> > when virtio_blk is performing io, as long as there two stages of:
> > 1. dispatch io. queue_usage_counter++;
> > 2. io is completed after receiving the interrupt. queue_usage_counter--;
> >
> > disable virtio_blk vfs:
> > if(!pci_device_is_present(pci_dev))
> > virtio_break_device(&vp_dev->vdev);
> > virtqueue for vf devs will be marked broken.
> > the interrupt notification io is end. Since it is judged that the
> > virtqueue has been marked as broken, the completed io will not be
> > performed.
> > So queue_usage_counter will not be cleared.
> > when the disk is removed at the same time, the queue will be frozen,
> > and you must wait for the queue_usage_counter to be cleared.
> > Therefore, it leads to the removeal of hang.
>
> I want to follow along in the code, but I need some hints.
>
> "queue_usage_counter" looks like it's supposed to be a symbol, but I
> can't find it.

I think it refers to q->q_usage_counter in blk core.

> Where (which function) is the I/O dispatched and queue_usage_counter
> incremented? Where is queue_usage_counter decremented?
>
> Prior to this change pci_device_is_present(VF) returned "false"
> (because the VF Vendor ID is 0xffff); after the change it will return
> "true" (because it will look at the PF Vendor ID instead).
>
> Previously virtio_pci_remove() called virtio_break_device(). I guess
> that meant the virtio I/O operation will never be completed?
>
> But if we don't call virtio_break_device(), the virtio I/O operation
> *will* be completed?
>
> Bjorn

It's completed anyway - nothing special happened at the device
level - but driver does not detect it.

Calling virtio_break_device will mark all queues as broken, as
a result attempts to check whether operation completed
will return false.

This probably means we need to work on handling surprise removal
better in virtio blk - since it looks like actual suprise
removal will hang too. But that I think is a separate issue.

--
MST


2022-11-11 04:22:18

by Wei Gong

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> Hi Wei,
>
> I can't quite parse this. Is the problem that you had some virtio I/O
> in progress, you wrote "0" to /sys/.../sriov_numvfs, and the virtio
> I/O operation hangs?

It is, and has the same effect when unbind driver.

>
> Is there any indication to the user, e.g., softlockup oops?

There is no more user output than the Trace stack listed below.

>
> More questions below.
>
> On Tue, Nov 08, 2022 at 04:52:16AM +0000, Wei Gong wrote:
>
> > according to sriov's protocol specification vendor_id and
> > device_id field in all VFs return FFFFh when read
> > so when vf devs is in the pci_device_is_present,it will be
> > misjudged as surprise removeal
> >
> > when io is issued on the vf, normally disable virtio_blk vf
> > devs,at this time the disable opration will hang. and virtio
> > blk dev io hang.
> >
> > task:bash state:D stack: 0 pid: 1773 ppid: 1241 flags:0x00004002
> > Call Trace:
> > <TASK>
> > __schedule+0x2ee/0x900
> > schedule+0x4f/0xc0
> > blk_mq_freeze_queue_wait+0x69/0xa0
> > ? wait_woken+0x80/0x80
> > blk_mq_freeze_queue+0x1b/0x20
> > blk_cleanup_queue+0x3d/0xd0
> > virtblk_remove+0x3c/0xb0 [virtio_blk]
> > virtio_dev_remove+0x4b/0x80
> > device_release_driver_internal+0x103/0x1d0
> > device_release_driver+0x12/0x20
> > bus_remove_device+0xe1/0x150
> > device_del+0x192/0x3f0
> > device_unregister+0x1b/0x60
> > unregister_virtio_device+0x18/0x30
> > virtio_pci_remove+0x41/0x80
> > pci_device_remove+0x3e/0xb0
> > device_release_driver_internal+0x103/0x1d0
> > device_release_driver+0x12/0x20
> > pci_stop_bus_device+0x79/0xa0
> > pci_stop_and_remove_bus_device+0x13/0x20
> > pci_iov_remove_virtfn+0xc5/0x130
> > ? pci_get_device+0x4a/0x60
> > sriov_disable+0x33/0xf0
> > pci_disable_sriov+0x26/0x30
> > virtio_pci_sriov_configure+0x6f/0xa0
> > sriov_numvfs_store+0x104/0x140
> > dev_attr_store+0x17/0x30
> > sysfs_kf_write+0x3e/0x50
> > kernfs_fop_write_iter+0x138/0x1c0
> > new_sync_write+0x117/0x1b0
> > vfs_write+0x185/0x250
> > ksys_write+0x67/0xe0
> > __x64_sys_write+0x1a/0x20
> > do_syscall_64+0x61/0xb0
> > entry_SYSCALL_64_after_hwframe+0x44/0xae
> > RIP: 0033:0x7f21bd1f3ba4
> > RSP: 002b:00007ffd34a24188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> > RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f21bd1f3ba4
> > RDX: 0000000000000002 RSI: 0000560305040800 RDI: 0000000000000001
> > RBP: 0000560305040800 R08: 000056030503fd50 R09: 0000000000000073
> > R10: 00000000ffffffff R11: 0000000000000202 R12: 0000000000000002
> > R13: 00007f21bd2de760 R14: 00007f21bd2da5e0 R15: 00007f21bd2d99e0
> >
> > when virtio_blk is performing io, as long as there two stages of:
> > 1. dispatch io. queue_usage_counter++;
> > 2. io is completed after receiving the interrupt. queue_usage_counter--;
> >
> > disable virtio_blk vfs:
> > if(!pci_device_is_present(pci_dev))
> > virtio_break_device(&vp_dev->vdev);
> > virtqueue for vf devs will be marked broken.
> > the interrupt notification io is end. Since it is judged that the
> > virtqueue has been marked as broken, the completed io will not be
> > performed.
> > So queue_usage_counter will not be cleared.
> > when the disk is removed at the same time, the queue will be frozen,
> > and you must wait for the queue_usage_counter to be cleared.
> > Therefore, it leads to the removeal of hang.
>
> I want to follow along in the code, but I need some hints.
>
> "queue_usage_counter" looks like it's supposed to be a symbol, but I
> can't find it.
>
> Where (which function) is the I/O dispatched and queue_usage_counter
> incremented? Where is queue_usage_counter decremented?

queue_usage_counter is blk core q->q_usage_counter.

In blk_mq:
submit_bio
-> submit_bio_noacct
-> __submit_bio_noacct_mq
-> bio_queue_enter
bio_queue_enter: q_usage_counter++


Complete io:
vp_vring_interrupt
-> vring_interrupt ( if (vq->broken) { return; } )
-> virtblk_done
-> blk_mq_complete_request
-> virtblk_request_done
-> blk_mq_end_request
-> blk_mq_free_request
-> blk_queue_exit
blk_queue_exit: q_usage_counter--

>
> Prior to this change pci_device_is_present(VF) returned "false"
> (because the VF Vendor ID is 0xffff); after the change it will return
> "true" (because it will look at the PF Vendor ID instead).
>
> Previously virtio_pci_remove() called virtio_break_device(). I guess
> that meant the virtio I/O operation will never be completed?
>
> But if we don't call virtio_break_device(), the virtio I/O operation
> *will* be completed?
>
> Bjorn

I think so.
In virtblk I think I don't want virtio_break_device() to be called
because it need to complete, but it would be well if it could be
called at the right time.

The problem new is that it's being called in a situation where it
shouldn't, and that's because pci_device_is_present() is giving
incorrect return.

Wei

2022-11-11 23:55:55

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> virtio uses the same driver for VFs and PFs. Accordingly,
> pci_device_is_present is used to detect device presence. This function
> isn't currently working properly for VFs since it attempts reading
> device and vendor ID. As VFs are present if and only if PF is present,
> just return the value for that device.
>
> Reported-by: Wei Gong <[email protected]>
> Signed-off-by: Michael S. Tsirkin <[email protected]>

Applied as below to pci/enumeration for v6.2, thanks!

commit 98b04dd0b457 ("PCI: Fix pci_device_is_present() for VFs by checking PF")
Author: Michael S. Tsirkin <[email protected]>
Date: Wed Oct 26 02:11:21 2022 -0400

PCI: Fix pci_device_is_present() for VFs by checking PF

pci_device_is_present() previously didn't work for VFs because it reads the
Vendor and Device ID, which are 0xffff for VFs, which looks like they
aren't present. Check the PF instead.

Wei Gong reported that if virtio I/O is in progress when the driver is
unbound or "0" is written to /sys/.../sriov_numvfs, the virtio I/O
operation hangs, which may result in output like this:

task:bash state:D stack: 0 pid: 1773 ppid: 1241 flags:0x00004002
Call Trace:
schedule+0x4f/0xc0
blk_mq_freeze_queue_wait+0x69/0xa0
blk_mq_freeze_queue+0x1b/0x20
blk_cleanup_queue+0x3d/0xd0
virtblk_remove+0x3c/0xb0 [virtio_blk]
virtio_dev_remove+0x4b/0x80
...
device_unregister+0x1b/0x60
unregister_virtio_device+0x18/0x30
virtio_pci_remove+0x41/0x80
pci_device_remove+0x3e/0xb0

This happened because pci_device_is_present(VF) returned "false" in
virtio_pci_remove(), so it called virtio_break_device(). The broken vq
meant that vring_interrupt() skipped the vq.callback() that would have
completed the virtio I/O operation via virtblk_done().

[bhelgaas: commit log, simplify to always use pci_physfn(), add stable tag]
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Wei Gong <[email protected]>
Tested-by: Wei Gong <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: [email protected]

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 9f3cc829dfee..fba95486caaf 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6447,6 +6447,8 @@ bool pci_device_is_present(struct pci_dev *pdev)
{
u32 v;

+ /* Check PF if pdev is a VF, since VF Vendor/Device IDs are 0xffff */
+ pdev = pci_physfn(pdev);
if (pci_dev_is_disconnected(pdev))
return false;
return pci_bus_read_dev_vendor_id(pdev->bus, pdev->devfn, &v, 0);

2022-11-12 00:47:46

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> ...

> > Prior to this change pci_device_is_present(VF) returned "false"
> > (because the VF Vendor ID is 0xffff); after the change it will return
> > "true" (because it will look at the PF Vendor ID instead).
> >
> > Previously virtio_pci_remove() called virtio_break_device(). I guess
> > that meant the virtio I/O operation will never be completed?
> >
> > But if we don't call virtio_break_device(), the virtio I/O operation
> > *will* be completed?
>
> It's completed anyway - nothing special happened at the device
> level - but driver does not detect it.
>
> Calling virtio_break_device will mark all queues as broken, as
> a result attempts to check whether operation completed
> will return false.
>
> This probably means we need to work on handling surprise removal
> better in virtio blk - since it looks like actual suprise
> removal will hang too. But that I think is a separate issue.

Yeah, this situation doesn't seem like it's inherently special for
virtio or VFs, so it's a little surprising to see
pci_device_is_present() used there.

Bjorn

2022-11-13 09:24:18

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Fri, Nov 11, 2022 at 05:42:19PM -0600, Bjorn Helgaas wrote:
> On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> > ...
>
> > > Prior to this change pci_device_is_present(VF) returned "false"
> > > (because the VF Vendor ID is 0xffff); after the change it will return
> > > "true" (because it will look at the PF Vendor ID instead).
> > >
> > > Previously virtio_pci_remove() called virtio_break_device(). I guess
> > > that meant the virtio I/O operation will never be completed?
> > >
> > > But if we don't call virtio_break_device(), the virtio I/O operation
> > > *will* be completed?
> >
> > It's completed anyway - nothing special happened at the device
> > level - but driver does not detect it.
> >
> > Calling virtio_break_device will mark all queues as broken, as
> > a result attempts to check whether operation completed
> > will return false.
> >
> > This probably means we need to work on handling surprise removal
> > better in virtio blk - since it looks like actual suprise
> > removal will hang too. But that I think is a separate issue.
>
> Yeah, this situation doesn't seem like it's inherently special for
> virtio or VFs, so it's a little surprising to see
> pci_device_is_present() used there.
>
> Bjorn


Just making sure - pci_device_is_present *is* the suggested way
to distinguish between graceful and surprise removal, isn't it?

--
MST


2022-11-15 17:19:40

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

[+cc Lukas; you can probably give a better answer here :)]

On Sun, Nov 13, 2022 at 03:46:06AM -0500, Michael S. Tsirkin wrote:
> On Fri, Nov 11, 2022 at 05:42:19PM -0600, Bjorn Helgaas wrote:
> > On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> > > ...
> >
> > > > Prior to this change pci_device_is_present(VF) returned "false"
> > > > (because the VF Vendor ID is 0xffff); after the change it will return
> > > > "true" (because it will look at the PF Vendor ID instead).
> > > >
> > > > Previously virtio_pci_remove() called virtio_break_device(). I guess
> > > > that meant the virtio I/O operation will never be completed?
> > > >
> > > > But if we don't call virtio_break_device(), the virtio I/O operation
> > > > *will* be completed?
> > >
> > > It's completed anyway - nothing special happened at the device
> > > level - but driver does not detect it.
> > >
> > > Calling virtio_break_device will mark all queues as broken, as
> > > a result attempts to check whether operation completed
> > > will return false.
> > >
> > > This probably means we need to work on handling surprise removal
> > > better in virtio blk - since it looks like actual suprise
> > > removal will hang too. But that I think is a separate issue.
> >
> > Yeah, this situation doesn't seem like it's inherently special for
> > virtio or VFs, so it's a little surprising to see
> > pci_device_is_present() used there.
>
> Just making sure - pci_device_is_present *is* the suggested way
> to distinguish between graceful and surprise removal, isn't it?

I'm not quite sure what you're asking here. A driver would learn
about a graceful removal when its .remove() method is called before
the device is physically removed. The device is still accessible and
everything should just work.

A driver would learn about a surprise removal either by a read result
that is PCI_POSSIBLE_ERROR() or possibly when its .error_detected()
callback is called. The .remove() method will eventually be called
when we destroy the pci_dev.

I guess .remove() might use pci_device_is_present() and assume that if
it returns "true", this is a graceful removal. But that's not
reliable since the device could be physically removed between the
pci_device_is_present() call and the driver's next access to it.

I think the best thing would be for .remove() to just do whatever it
needs to do and look for errors, e.g., using PCI_POSSIBLE_ERROR(),
without relying on pci_device_is_present().

If .remove() wants to avoid doing something that might cause an error,
maybe we should expose pci_dev_is_disconnected(). That's set by the
hotplug remove paths before .remove() is called and feels a little
less racy.

Bjorn

2022-11-16 11:47:54

by Lukas Wunner

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

[cc += Parav Pandit, author of 43bb40c5b926]

On Sun, Nov 13, 2022 at 03:46:06AM -0500, Michael S. Tsirkin wrote:
> On Fri, Nov 11, 2022 at 05:42:19PM -0600, Bjorn Helgaas wrote:
> > On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> > > > Prior to this change pci_device_is_present(VF) returned "false"
> > > > (because the VF Vendor ID is 0xffff); after the change it will return
> > > > "true" (because it will look at the PF Vendor ID instead).
> > > >
> > > > Previously virtio_pci_remove() called virtio_break_device(). I guess
> > > > that meant the virtio I/O operation will never be completed?
> > > >
> > > > But if we don't call virtio_break_device(), the virtio I/O operation
> > > > *will* be completed?
>
> Just making sure - pci_device_is_present *is* the suggested way
> to distinguish between graceful and surprise removal, isn't it?

No, it's not. Instead of !pci_device_is_present() you really want to
call pci_dev_is_disconnected() instead.

While the fix Bjorn applied for v6.2 may solve the issue and may make
sense on it's own, it's not the solution you're looking for. You want
to swap the call to !pci_device_is_present() with pci_dev_is_disconnected(),
move pci_dev_is_disconnected() from drivers/pci/pci.h to include/linux/pci.h
and add a Fixes tag referencing 43bb40c5b926.

If you don't want to move pci_dev_is_disconnected(), you can alternatively
check for "pdev->error_state == pci_channel_io_perm_failure" or call
pci_channel_offline(). The latter will also return true though on
transient inaccessibility of the device (e.g. if it's being reset).

The theory of operation is as follows: The PCI layer does indeed know
whether the device was surprise removed or gracefully removed and that
information is passed in the "presence" flag to pciehp_unconfigure_device()
(in drivers/pci/hotplug/pciehp_pci.c). That function does the following:

if (!presence)
pci_walk_bus(parent, pci_dev_set_disconnected, NULL);

In other words, pdev->error_state is set to pci_channel_io_perm_failure
on the entire hierarchy below the hotplug port. And pci_dev_is_disconnected()
simply checks whether that's the device's error_state.

pci_dev_is_disconnected() makes sense if you definitely know the device
is gone and want to skip certain steps or delays on device teardown.
However be aware that the device may be hot-removed after graceful
removal was initiated. In such a situation, pci_dev_is_disconnected()
may return false and you'll try to access the device as normal, even
though it was yanked from the slot after the pci_dev_is_disconnected()
call was performed. Ideally you should be able to cope with such
scenarios as well.

For some more background info, refer to this LWN article (scroll down
to the "Surprise removal" section):
https://lwn.net/Articles/767885/

Thanks,

Lukas

2022-11-17 06:01:11

by Parav Pandit

[permalink] [raw]
Subject: RE: [PATCH v2] pci: fix device presence detection for VFs


> From: Lukas Wunner <[email protected]>
> Sent: Wednesday, November 16, 2022 6:16 AM
>
> [cc += Parav Pandit, author of 43bb40c5b926]
>
> On Sun, Nov 13, 2022 at 03:46:06AM -0500, Michael S. Tsirkin wrote:
> > On Fri, Nov 11, 2022 at 05:42:19PM -0600, Bjorn Helgaas wrote:
> > > On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> > > > On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> > > > > Prior to this change pci_device_is_present(VF) returned "false"
> > > > > (because the VF Vendor ID is 0xffff); after the change it will
> > > > > return "true" (because it will look at the PF Vendor ID instead).
> > > > >
> > > > > Previously virtio_pci_remove() called virtio_break_device(). I
> > > > > guess that meant the virtio I/O operation will never be completed?
> > > > >
> > > > > But if we don't call virtio_break_device(), the virtio I/O
> > > > > operation
> > > > > *will* be completed?
> >
> > Just making sure - pci_device_is_present *is* the suggested way to
> > distinguish between graceful and surprise removal, isn't it?
>
> No, it's not. Instead of !pci_device_is_present() you really want to call
> pci_dev_is_disconnected() instead.
>
> While the fix Bjorn applied for v6.2 may solve the issue and may make sense
> on it's own, it's not the solution you're looking for. You want to swap the
> call to !pci_device_is_present() with pci_dev_is_disconnected(), move
> pci_dev_is_disconnected() from drivers/pci/pci.h to include/linux/pci.h and
> add a Fixes tag referencing 43bb40c5b926.
>
> If you don't want to move pci_dev_is_disconnected(), you can alternatively
> check for "pdev->error_state == pci_channel_io_perm_failure" or call
> pci_channel_offline(). The latter will also return true though on transient
> inaccessibility of the device (e.g. if it's being reset).
>
pci_device_is_present() is calling pci_dev_is_disconnected().
pci_dev_is_disconnected() avoids reading the vendor id.
So pci_dev_is_disconnected() looks less strong check.
I see that it can return a valid value on recoverable error case.

In that case, is pci_channel_offline() a more precise way to check that covers transient and permanent error?

And if that is the right check, we need to fix all the callers, mainly widely used nvme driver [1].

[1] https://elixir.bootlin.com/linux/v6.1-rc5/source/drivers/nvme/host/pci.c#L3228

Also, we need to add API documentation on when to use this API in context of hotplug, so that all related drivers can consistently use single API.

> The theory of operation is as follows: The PCI layer does indeed know
> whether the device was surprise removed or gracefully removed and that
> information is passed in the "presence" flag to pciehp_unconfigure_device()
> (in drivers/pci/hotplug/pciehp_pci.c). That function does the following:
>
> if (!presence)
> pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
>
> In other words, pdev->error_state is set to pci_channel_io_perm_failure on
> the entire hierarchy below the hotplug port. And pci_dev_is_disconnected()
> simply checks whether that's the device's error_state.
>
> pci_dev_is_disconnected() makes sense if you definitely know the device is
> gone and want to skip certain steps or delays on device teardown.
> However be aware that the device may be hot-removed after graceful
> removal was initiated. In such a situation, pci_dev_is_disconnected() may
> return false and you'll try to access the device as normal, even though it was
> yanked from the slot after the pci_dev_is_disconnected() call was
> performed. Ideally you should be able to cope with such scenarios as well.
>
> For some more background info, refer to this LWN article (scroll down to the
> "Surprise removal" section):
> https://lwn.net/Articles/767885/
>
> Thanks,
>
> Lukas

2022-12-19 06:16:47

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Thu, Nov 17, 2022 at 05:36:48AM +0000, Parav Pandit wrote:
>
> > From: Lukas Wunner <[email protected]>
> > Sent: Wednesday, November 16, 2022 6:16 AM
> >
> > [cc += Parav Pandit, author of 43bb40c5b926]
> >
> > On Sun, Nov 13, 2022 at 03:46:06AM -0500, Michael S. Tsirkin wrote:
> > > On Fri, Nov 11, 2022 at 05:42:19PM -0600, Bjorn Helgaas wrote:
> > > > On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> > > > > On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> > > > > > Prior to this change pci_device_is_present(VF) returned "false"
> > > > > > (because the VF Vendor ID is 0xffff); after the change it will
> > > > > > return "true" (because it will look at the PF Vendor ID instead).
> > > > > >
> > > > > > Previously virtio_pci_remove() called virtio_break_device(). I
> > > > > > guess that meant the virtio I/O operation will never be completed?
> > > > > >
> > > > > > But if we don't call virtio_break_device(), the virtio I/O
> > > > > > operation
> > > > > > *will* be completed?
> > >
> > > Just making sure - pci_device_is_present *is* the suggested way to
> > > distinguish between graceful and surprise removal, isn't it?
> >
> > No, it's not. Instead of !pci_device_is_present() you really want to call
> > pci_dev_is_disconnected() instead.
> >
> > While the fix Bjorn applied for v6.2 may solve the issue and may make sense
> > on it's own, it's not the solution you're looking for. You want to swap the
> > call to !pci_device_is_present() with pci_dev_is_disconnected(), move
> > pci_dev_is_disconnected() from drivers/pci/pci.h to include/linux/pci.h and
> > add a Fixes tag referencing 43bb40c5b926.
> >
> > If you don't want to move pci_dev_is_disconnected(), you can alternatively
> > check for "pdev->error_state == pci_channel_io_perm_failure" or call
> > pci_channel_offline(). The latter will also return true though on transient
> > inaccessibility of the device (e.g. if it's being reset).
> >
> pci_device_is_present() is calling pci_dev_is_disconnected().
> pci_dev_is_disconnected() avoids reading the vendor id.
> So pci_dev_is_disconnected() looks less strong check.
> I see that it can return a valid value on recoverable error case.
>
> In that case, is pci_channel_offline() a more precise way to check that covers transient and permanent error?
>
> And if that is the right check, we need to fix all the callers, mainly widely used nvme driver [1].
>
> [1] https://elixir.bootlin.com/linux/v6.1-rc5/source/drivers/nvme/host/pci.c#L3228
>
> Also, we need to add API documentation on when to use this API in context of hotplug, so that all related drivers can consistently use single API.

Bjorn, Lukas, what's your take on this idea?
Thanks!


> > The theory of operation is as follows: The PCI layer does indeed know
> > whether the device was surprise removed or gracefully removed and that
> > information is passed in the "presence" flag to pciehp_unconfigure_device()
> > (in drivers/pci/hotplug/pciehp_pci.c). That function does the following:
> >
> > if (!presence)
> > pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
> >
> > In other words, pdev->error_state is set to pci_channel_io_perm_failure on
> > the entire hierarchy below the hotplug port. And pci_dev_is_disconnected()
> > simply checks whether that's the device's error_state.
> >
> > pci_dev_is_disconnected() makes sense if you definitely know the device is
> > gone and want to skip certain steps or delays on device teardown.
> > However be aware that the device may be hot-removed after graceful
> > removal was initiated. In such a situation, pci_dev_is_disconnected() may
> > return false and you'll try to access the device as normal, even though it was
> > yanked from the slot after the pci_dev_is_disconnected() call was
> > performed. Ideally you should be able to cope with such scenarios as well.
> >
> > For some more background info, refer to this LWN article (scroll down to the
> > "Surprise removal" section):
> > https://lwn.net/Articles/767885/
> >
> > Thanks,
> >
> > Lukas

2022-12-19 08:52:00

by Lukas Wunner

[permalink] [raw]
Subject: Re: [PATCH v2] pci: fix device presence detection for VFs

On Mon, Dec 19, 2022 at 12:56:15AM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 17, 2022 at 05:36:48AM +0000, Parav Pandit wrote:
> > > From: Lukas Wunner <[email protected]>
> > > Sent: Wednesday, November 16, 2022 6:16 AM
> > >
> > > [cc += Parav Pandit, author of 43bb40c5b926]
> > >
> > > On Sun, Nov 13, 2022 at 03:46:06AM -0500, Michael S. Tsirkin wrote:
> > > > On Fri, Nov 11, 2022 at 05:42:19PM -0600, Bjorn Helgaas wrote:
> > > > > On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> > > > > > On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> > > > > > > Prior to this change pci_device_is_present(VF) returned "false"
> > > > > > > (because the VF Vendor ID is 0xffff); after the change it will
> > > > > > > return "true" (because it will look at the PF Vendor ID instead).
> > > > > > >
> > > > > > > Previously virtio_pci_remove() called virtio_break_device(). I
> > > > > > > guess that meant the virtio I/O operation will never be completed?
> > > > > > >
> > > > > > > But if we don't call virtio_break_device(), the virtio I/O
> > > > > > > operation
> > > > > > > *will* be completed?
> > > >
> > > > Just making sure - pci_device_is_present *is* the suggested way to
> > > > distinguish between graceful and surprise removal, isn't it?
> > >
> > > No, it's not. Instead of !pci_device_is_present() you really want to call
> > > pci_dev_is_disconnected() instead.
> > >
> > > While the fix Bjorn applied for v6.2 may solve the issue and may make sense
> > > on it's own, it's not the solution you're looking for. You want to swap the
> > > call to !pci_device_is_present() with pci_dev_is_disconnected(), move
> > > pci_dev_is_disconnected() from drivers/pci/pci.h to include/linux/pci.h and
> > > add a Fixes tag referencing 43bb40c5b926.
> > >
> > > If you don't want to move pci_dev_is_disconnected(), you can alternatively
> > > check for "pdev->error_state == pci_channel_io_perm_failure" or call
> > > pci_channel_offline(). The latter will also return true though on transient
> > > inaccessibility of the device (e.g. if it's being reset).
> > >
> > pci_device_is_present() is calling pci_dev_is_disconnected().
> > pci_dev_is_disconnected() avoids reading the vendor id.
> > So pci_dev_is_disconnected() looks less strong check.
> > I see that it can return a valid value on recoverable error case.
> >
> > In that case, is pci_channel_offline() a more precise way to check that covers transient and permanent error?
> >
> > And if that is the right check, we need to fix all the callers, mainly widely used nvme driver [1].
> >
> > [1] https://elixir.bootlin.com/linux/v6.1-rc5/source/drivers/nvme/host/pci.c#L3228
> >
> > Also, we need to add API documentation on when to use this API in context of hotplug, so that all related drivers can consistently use single API.
>
> Bjorn, Lukas, what's your take on this idea?

I don't really know what to add to my e-mail of Nov 16
(quoted here in full).

Yes, pci_channel_offline() returns true on transient and permanent
failure. Whether that's what you want, depends on your use case.
If you want to check for a surprise-removed device, then you only
want to check for permanent failure, so pci_channel_offline() is
not correct and you should rather check for
"pdev->error_state == pci_channel_io_perm_failure" or move
pci_dev_is_disconnected() to include/linux/pci.h. But again,
I've already explained this in my e-mail ov Nov 16, so I don't
know what's unclear.

Thanks,

Lukas

> > > The theory of operation is as follows: The PCI layer does indeed know
> > > whether the device was surprise removed or gracefully removed and that
> > > information is passed in the "presence" flag to pciehp_unconfigure_device()
> > > (in drivers/pci/hotplug/pciehp_pci.c). That function does the following:
> > >
> > > if (!presence)
> > > pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
> > >
> > > In other words, pdev->error_state is set to pci_channel_io_perm_failure on
> > > the entire hierarchy below the hotplug port. And pci_dev_is_disconnected()
> > > simply checks whether that's the device's error_state.
> > >
> > > pci_dev_is_disconnected() makes sense if you definitely know the device is
> > > gone and want to skip certain steps or delays on device teardown.
> > > However be aware that the device may be hot-removed after graceful
> > > removal was initiated. In such a situation, pci_dev_is_disconnected() may
> > > return false and you'll try to access the device as normal, even though it was
> > > yanked from the slot after the pci_dev_is_disconnected() call was
> > > performed. Ideally you should be able to cope with such scenarios as well.
> > >
> > > For some more background info, refer to this LWN article (scroll down to the
> > > "Surprise removal" section):
> > > https://lwn.net/Articles/767885/
> > >
> > > Thanks,
> > >
> > > Lukas