2020-04-30 12:05:47

by Niklas Schnelle

[permalink] [raw]
Subject: [PATCH 0/1] net/mlx5: Call pci_disable_sriov() on remove

Hello,

I'm currently working on improvements in PF-VF handling on s390. One thing that
may be a bit special for us is that the s390 hotplug code supports shutting
down a single PF of a multi-function device such as a ConnectX-5.
During testing I found that the mlx5_core driver does not call
pci_disable_sriov() in its remove handler which is called on the PF via
pci_stop_and_remove_bus_device() on an orderly hot unplug.

Following is a patch to fix this, I want to point out however that I'm not
entirely sure about the effect of clear_vfs that distinguishes
mlx5_sriov_detach() from mlx5_sriov_disable() only that the former is called
from the remove handler and it doesn't call pci_disable_sriov().
That however is required according to Documentation/PCI/pci-iov-howto.rst

I've only tested this on top of my currently still internal PF-VF rework so
I might also be totally missing something here in that case excuse my
ignorance.

Best regards,
Niklas Schnelle

Niklas Schnelle (1):
net/mlx5: Call pci_disable_sriov() on remove

drivers/net/ethernet/mellanox/mlx5/core/sriov.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--
2.17.1


2020-04-30 12:05:59

by Niklas Schnelle

[permalink] [raw]
Subject: [PATCH 1/1] net/mlx5: Call pci_disable_sriov() on remove

as described in Documentation/PCI/pci-iov-howto.rst a driver with SR-IOV
support should call pci_disable_sriov() in the remove handler.
Otherwise removing a PF (e.g. via pci_stop_and_remove_bus_device()) with
attached VFs does not properly shut the VFs down before shutting down
the PF. This leads to the VF drivers handling defunct devices and
accompanying error messages.

In the current code pci_disable_sriov() is already called in
mlx5_sriov_disable() but not in mlx5_sriov_detach() which is called from
the remove handler. Fix this by moving the pci_disable_sriov() call into
mlx5_device_disable_sriov() which is called by both.

Signed-off-by: Niklas Schnelle <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/sriov.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
index 3094d20297a9..2401961c9f5b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
@@ -114,6 +114,8 @@ mlx5_device_disable_sriov(struct mlx5_core_dev *dev, int num_vfs, bool clear_vf)
int err;
int vf;

+ pci_disable_sriov(dev->pdev);
+
for (vf = num_vfs - 1; vf >= 0; vf--) {
if (!sriov->vfs_ctx[vf].enabled)
continue;
@@ -156,7 +158,6 @@ static void mlx5_sriov_disable(struct pci_dev *pdev)
struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
int num_vfs = pci_num_vf(dev->pdev);

- pci_disable_sriov(pdev);
mlx5_device_disable_sriov(dev, num_vfs, true);
}

--
2.17.1

2020-04-30 16:00:06

by Saeed Mahameed

[permalink] [raw]
Subject: Re: [PATCH 1/1] net/mlx5: Call pci_disable_sriov() on remove

On Thu, 2020-04-30 at 14:03 +0200, Niklas Schnelle wrote:
> as described in Documentation/PCI/pci-iov-howto.rst a driver with SR-
> IOV
> support should call pci_disable_sriov() in the remove handler.

Hi Niklas,

looking at the documentation, it doesn't say "should" it just gives the
code as example.

> Otherwise removing a PF (e.g. via pci_stop_and_remove_bus_device())
> with
> attached VFs does not properly shut the VFs down before shutting down
> the PF. This leads to the VF drivers handling defunct devices and
> accompanying error messages.
>

Which should be the admin responsibility .. if the admin want to do
this, then let it be.. why block him ?

our mlx5 driver in the vf handles this gracefully and once pf
driver/device is back online the vf driver quickly recovers.

> In the current code pci_disable_sriov() is already called in
> mlx5_sriov_disable() but not in mlx5_sriov_detach() which is called
> from
> the remove handler. Fix this by moving the pci_disable_sriov() call
> into
> mlx5_device_disable_sriov() which is called by both.
>
> Signed-off-by: Niklas Schnelle <[email protected]>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/sriov.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
> b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
> index 3094d20297a9..2401961c9f5b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
> @@ -114,6 +114,8 @@ mlx5_device_disable_sriov(struct mlx5_core_dev
> *dev, int num_vfs, bool clear_vf)
> int err;
> int vf;
>
> + pci_disable_sriov(dev->pdev);
> +
> for (vf = num_vfs - 1; vf >= 0; vf--) {
> if (!sriov->vfs_ctx[vf].enabled)
> continue;
> @@ -156,7 +158,6 @@ static void mlx5_sriov_disable(struct pci_dev
> *pdev)
> struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
> int num_vfs = pci_num_vf(dev->pdev);
>
> - pci_disable_sriov(pdev);

this patch is no good as it breaks code symmetry.. and could lead to
many new issues.


> mlx5_device_disable_sriov(dev, num_vfs, true);
> }
>

2020-04-30 16:15:27

by Saeed Mahameed

[permalink] [raw]
Subject: Re: [PATCH 0/1] net/mlx5: Call pci_disable_sriov() on remove

On Thu, 2020-04-30 at 14:03 +0200, Niklas Schnelle wrote:
> Hello,
>
> I'm currently working on improvements in PF-VF handling on s390. One
> thing that
> may be a bit special for us is that the s390 hotplug code supports
> shutting
> down a single PF of a multi-function device such as a ConnectX-5.
> During testing I found that the mlx5_core driver does not call
> pci_disable_sriov() in its remove handler which is called on the PF
> via
> pci_stop_and_remove_bus_device() on an orderly hot unplug.

Actually what happens if you call pci_disable_sriov() when there are
VFs attached to VMs ?

>
> Following is a patch to fix this, I want to point out however that
> I'm not
> entirely sure about the effect of clear_vfs that distinguishes
> mlx5_sriov_detach() from mlx5_sriov_disable() only that the former is
> called
> from the remove handler and it doesn't call pci_disable_sriov().
> That however is required according to Documentation/PCI/pci-iov-
> howto.rst
>

The Doc doesn't say "required", so the question is, is it really
required ?

because the way i see it, it is the admin responsibility to clear out
vfs before shutting down the PF driver.

as explained in the patch review, mlx5 vf driver is resilient of such
behavior and once the device is back online the vf driver quickly
recovers, so it is actually a feature and not a bug :)

There are many reasons why the admin would want to do this:

1) Fast Firmware upgrade
2) Fast Hyper-visor upgrade/refresh
3) PF debugging

So you can quickly reset the PF driver without the need to wait for all
VFs or manually detach-attach them.


> I've only tested this on top of my currently still internal PF-VF
> rework so
> I might also be totally missing something here in that case excuse my
> ignorance.
>
> Best regards,
> Niklas Schnelle
>
> Niklas Schnelle (1):
> net/mlx5: Call pci_disable_sriov() on remove
>
> drivers/net/ethernet/mellanox/mlx5/core/sriov.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>

2020-04-30 19:50:28

by Niklas Schnelle

[permalink] [raw]
Subject: Re: [PATCH 1/1] net/mlx5: Call pci_disable_sriov() on remove



On 4/30/20 5:58 PM, Saeed Mahameed wrote:
> On Thu, 2020-04-30 at 14:03 +0200, Niklas Schnelle wrote:
>> as described in Documentation/PCI/pci-iov-howto.rst a driver with SR-
>> IOV
>> support should call pci_disable_sriov() in the remove handler.
>
> Hi Niklas,
>
> looking at the documentation, it doesn't say "should" it just gives the
> code as example.
>
>> Otherwise removing a PF (e.g. via pci_stop_and_remove_bus_device())
>> with
>> attached VFs does not properly shut the VFs down before shutting down
>> the PF. This leads to the VF drivers handling defunct devices and
>> accompanying error messages.
>>
>
> Which should be the admin responsibility .. if the admin want to do
> this, then let it be.. why block him ?
>
> our mlx5 driver in the vf handles this gracefully and once pf
> driver/device is back online the vf driver quickly recovers.
See my answer to your other answer ;-)
>
>> In the current code pci_disable_sriov() is already called in
>> mlx5_sriov_disable() but not in mlx5_sriov_detach() which is called
>> from
>> the remove handler. Fix this by moving the pci_disable_sriov() call
>> into
>> mlx5_device_disable_sriov() which is called by both.
>>
>> Signed-off-by: Niklas Schnelle <[email protected]>
>> ---
>> drivers/net/ethernet/mellanox/mlx5/core/sriov.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
>> b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
>> index 3094d20297a9..2401961c9f5b 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
>> @@ -114,6 +114,8 @@ mlx5_device_disable_sriov(struct mlx5_core_dev
>> *dev, int num_vfs, bool clear_vf)
>> int err;
>> int vf;
>>
>> + pci_disable_sriov(dev->pdev);
>> +
>> for (vf = num_vfs - 1; vf >= 0; vf--) {
>> if (!sriov->vfs_ctx[vf].enabled)
>> continue;
>> @@ -156,7 +158,6 @@ static void mlx5_sriov_disable(struct pci_dev
>> *pdev)
>> struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
>> int num_vfs = pci_num_vf(dev->pdev);
>>
>> - pci_disable_sriov(pdev);
>
> this patch is no good as it breaks code symmetry.. and could lead to
> many new issues.
Ah you're right I totally missed that there is a matching pci_enable_sriov() in
mlx5_enable_sriov() haven't used these myself before and since it wasn't in the
documentation example I somehow expected it to happen in non-driver code,
so for symmetry that would also have to move to mlx5_device_enable_sriov(),
sorry for the oversight.
>
>
>> mlx5_device_disable_sriov(dev, num_vfs, true);
>> }
>>
>

2020-04-30 20:27:35

by Niklas Schnelle

[permalink] [raw]
Subject: Re: [PATCH 0/1] net/mlx5: Call pci_disable_sriov() on remove



On 4/30/20 6:13 PM, Saeed Mahameed wrote:
> On Thu, 2020-04-30 at 14:03 +0200, Niklas Schnelle wrote:
>> Hello,
>>
>> I'm currently working on improvements in PF-VF handling on s390. One
>> thing that
>> may be a bit special for us is that the s390 hotplug code supports
>> shutting
>> down a single PF of a multi-function device such as a ConnectX-5.
>> During testing I found that the mlx5_core driver does not call
>> pci_disable_sriov() in its remove handler which is called on the PF
>> via
>> pci_stop_and_remove_bus_device() on an orderly hot unplug.
>
> Actually what happens if you call pci_disable_sriov() when there are
> VFs attached to VMs ?
So the scenario I looked at was that the admin does a

echo 0 > /sys/bus/pci/slots/<pfslot>/power

for the relevant PFs (Note that on our systems
PFs of a single card could be passed through to different LPARs).
And then physically removes the adapter or moves the PF(s) to a different LPAR.
In this scenario the orderly shutdown would hopefully help QEMU/libvirt to
trigger whatever orderly shutdown it can do but that testing is still on my todo list.
I assumed that because the pci_disable_sriov() call is basically the only
thing the example shows for the remove case it would trigger the intended actions
in the common code including for vfio-pci, udev events and so on.
>
>>
>> Following is a patch to fix this, I want to point out however that
>> I'm not
>> entirely sure about the effect of clear_vfs that distinguishes
>> mlx5_sriov_detach() from mlx5_sriov_disable() only that the former is
>> called
>> from the remove handler and it doesn't call pci_disable_sriov().
>> That however is required according to Documentation/PCI/pci-iov-
>> howto.rst
>>
>
> The Doc doesn't say "required", so the question is, is it really
> required ?
Yes but as I wrote above it is about the only thing the example shows
for the removal procedure so there is definitely a clash between
what the mlx5 driver does and what the documentation suggests.
So I think even though I agree my patch definitely wasn't ready, this
is surely worth thinking about for a second or two.
>
> because the way i see it, it is the admin responsibility to clear out
> vfs before shutting down the PF driver.
>
> as explained in the patch review, mlx5 vf driver is resilient of such
> behavior and once the device is back online the vf driver quickly
> recovers, so it is actually a feature and not a bug :)
>
> There are many reasons why the admin would want to do this:
>
> 1) Fast Firmware upgrade
> 2) Fast Hyper-visor upgrade/refresh
> 3) PF debugging
>
> So you can quickly reset the PF driver without the need to wait for all
> VFs or manually detach-attach them.
So the behavior I was seeing that triggered looking into this is that the VF drivers
weren't just spewing error messages. After hitting some relatively small timeout
the VFs would then start to disappear (maybe 20 seconds or so) and in my testing
I didn't manage to reactivate them.
That might have been due to some error on my part though and I didn't try very
hard because I did not assume that this should work. So thank you for
pointing out that something like that is actually a scenario you have in mind.
Now that I know that I'll definitely look into this a bit more.

So thank you for your input I'm still not sure what the right approach for the
shutdown really is but this is certainly interesting.
>
>
>> I've only tested this on top of my currently still internal PF-VF
>> rework so
>> I might also be totally missing something here in that case excuse my
>> ignorance.
>>
>> Best regards,
>> Niklas Schnelle
>>
>> Niklas Schnelle (1):
>> net/mlx5: Call pci_disable_sriov() on remove
>>
>> drivers/net/ethernet/mellanox/mlx5/core/sriov.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>

2020-04-30 22:34:01

by Niklas Schnelle

[permalink] [raw]
Subject: Re: [PATCH 1/1] net/mlx5: Call pci_disable_sriov() on remove



On 4/30/20 9:47 PM, Niklas Schnelle wrote:
>
>
> On 4/30/20 5:58 PM, Saeed Mahameed wrote:
>> On Thu, 2020-04-30 at 14:03 +0200, Niklas Schnelle wrote:
>>> as described in Documentation/PCI/pci-iov-howto.rst a driver with SR-
>>> IOV
>>> support should call pci_disable_sriov() in the remove handler.
>>
>> Hi Niklas,
>>
>> looking at the documentation, it doesn't say "should" it just gives the
>> code as example.
>>
>>> Otherwise removing a PF (e.g. via pci_stop_and_remove_bus_device())
>>> with
>>> attached VFs does not properly shut the VFs down before shutting down
>>> the PF. This leads to the VF drivers handling defunct devices and
>>> accompanying error messages.
>>>
>>
>> Which should be the admin responsibility .. if the admin want to do
>> this, then let it be.. why block him ?
>>
>> our mlx5 driver in the vf handles this gracefully and once pf
>> driver/device is back online the vf driver quickly recovers.
> See my answer to your other answer ;-)
>>
>>> In the current code pci_disable_sriov() is already called in
>>> mlx5_sriov_disable() but not in mlx5_sriov_detach() which is called
>>> from
>>> the remove handler. Fix this by moving the pci_disable_sriov() call
>>> into
>>> mlx5_device_disable_sriov() which is called by both.
>>>
>>> Signed-off-by: Niklas Schnelle <[email protected]>
>>> ---
>>> drivers/net/ethernet/mellanox/mlx5/core/sriov.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
>>> b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
>>> index 3094d20297a9..2401961c9f5b 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
>>> @@ -114,6 +114,8 @@ mlx5_device_disable_sriov(struct mlx5_core_dev
>>> *dev, int num_vfs, bool clear_vf)
>>> int err;
>>> int vf;
>>>
>>> + pci_disable_sriov(dev->pdev);
>>> +
>>> for (vf = num_vfs - 1; vf >= 0; vf--) {
>>> if (!sriov->vfs_ctx[vf].enabled)
>>> continue;
>>> @@ -156,7 +158,6 @@ static void mlx5_sriov_disable(struct pci_dev
>>> *pdev)
>>> struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
>>> int num_vfs = pci_num_vf(dev->pdev);
>>>
>>> - pci_disable_sriov(pdev);
>>
>> this patch is no good as it breaks code symmetry.. and could lead to
>> many new issues.
> Ah you're right I totally missed that there is a matching pci_enable_sriov() in
> mlx5_enable_sriov() haven't used these myself before and since it wasn't in the
> documentation example I somehow expected it to happen in non-driver code,
aaand it actually is in the documentation example and I definitely sent this
when it wasn't ready, sorry again…
> so for symmetry that would also have to move to mlx5_device_enable_sriov(),
> sorry for the oversight.
>>
>>
>>> mlx5_device_disable_sriov(dev, num_vfs, true);
>>> }
>>>
>>