2020-01-10 07:43:29

by Kai-Heng Feng

[permalink] [raw]
Subject: [PATCH 1/2] igb: Use device_lock() insead of rtnl_lock()

Commit 9474933caf21 ("igb: close/suspend race in netif_device_detach")
fixed race condition between close and power management ops by using
rtnl_lock().

However we can achieve the same by using device_lock() since all power
management ops are protected by device_lock().

This fix is a preparation for next patch, to prevent a dead lock under
rtnl_lock() when calling runtime resume routine.

Signed-off-by: Kai-Heng Feng <[email protected]>
---
drivers/net/ethernet/intel/igb/igb_main.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index b46bff8fe056..3750e2b926b1 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -4026,8 +4026,13 @@ static int __igb_close(struct net_device *netdev, bool suspending)

int igb_close(struct net_device *netdev)
{
+ struct igb_adapter *adapter = netdev_priv(netdev);
+ struct device *dev = &adapter->pdev->dev;
+
+ device_lock(dev);
if (netif_device_present(netdev) || netdev->dismantle)
return __igb_close(netdev, false);
+ device_unlock(dev);
return 0;
}

@@ -8760,7 +8765,6 @@ static int __igb_shutdown(struct pci_dev *pdev, bool *enable_wake,
u32 wufc = runtime ? E1000_WUFC_LNKC : adapter->wol;
bool wake;

- rtnl_lock();
netif_device_detach(netdev);

if (netif_running(netdev))
@@ -8769,7 +8773,6 @@ static int __igb_shutdown(struct pci_dev *pdev, bool *enable_wake,
igb_ptp_suspend(adapter);

igb_clear_interrupt_scheme(adapter);
- rtnl_unlock();

status = rd32(E1000_STATUS);
if (status & E1000_STATUS_LU)
@@ -8897,13 +8900,11 @@ static int __maybe_unused igb_resume(struct device *dev)

wr32(E1000_WUS, ~0);

- rtnl_lock();
if (!err && netif_running(netdev))
err = __igb_open(netdev, true);

if (!err)
netif_device_attach(netdev);
- rtnl_unlock();

return err;
}
--
2.17.1


2020-01-10 07:43:42

by Kai-Heng Feng

[permalink] [raw]
Subject: [PATCH 2/2] ethtool: Call begin() and complete() in __ethtool_get_link_ksettings()

Device like igb gets runtime suspended when there's no link partner. We
can't get correct speed under that state:
$ cat /sys/class/net/enp3s0/speed
1000

In addition to that, an error can also be spotted in dmesg:
[ 385.991957] igb 0000:03:00.0 enp3s0: PCIe link lost

It's because the igb device doesn't get runtime resumed before calling
get_link_ksettings().

So let's call begin() and complete() like what dev_ethtool() does, to
runtime resume/suspend or power up/down the device properly.

Once this fix is in place, igb can show the speed correctly without link
partner:
$ cat /sys/class/net/enp3s0/speed
-1

Signed-off-by: Kai-Heng Feng <[email protected]>
---
net/ethtool/ioctl.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 182bffbffa78..c768dbf45fc4 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -423,13 +423,26 @@ struct ethtool_link_usettings {
int __ethtool_get_link_ksettings(struct net_device *dev,
struct ethtool_link_ksettings *link_ksettings)
{
+ int rc;
+
ASSERT_RTNL();

if (!dev->ethtool_ops->get_link_ksettings)
return -EOPNOTSUPP;

+ if (dev->ethtool_ops->begin) {
+ rc = dev->ethtool_ops->begin(dev);
+ if (rc < 0)
+ return rc;
+ }
+
memset(link_ksettings, 0, sizeof(*link_ksettings));
- return dev->ethtool_ops->get_link_ksettings(dev, link_ksettings);
+ rc = dev->ethtool_ops->get_link_ksettings(dev, link_ksettings);
+
+ if (dev->ethtool_ops->complete)
+ dev->ethtool_ops->complete(dev);
+
+ return rc;
}
EXPORT_SYMBOL(__ethtool_get_link_ksettings);

--
2.17.1

2020-01-13 12:52:38

by Michal Kubecek

[permalink] [raw]
Subject: Re: [PATCH 2/2] ethtool: Call begin() and complete() in __ethtool_get_link_ksettings()

On Fri, Jan 10, 2020 at 03:41:59PM +0800, Kai-Heng Feng wrote:
> Device like igb gets runtime suspended when there's no link partner. We
> can't get correct speed under that state:
> $ cat /sys/class/net/enp3s0/speed
> 1000
>
> In addition to that, an error can also be spotted in dmesg:
> [ 385.991957] igb 0000:03:00.0 enp3s0: PCIe link lost
>
> It's because the igb device doesn't get runtime resumed before calling
> get_link_ksettings().
>
> So let's call begin() and complete() like what dev_ethtool() does, to
> runtime resume/suspend or power up/down the device properly.
>
> Once this fix is in place, igb can show the speed correctly without link
> partner:
> $ cat /sys/class/net/enp3s0/speed
> -1

I agree that we definitely should make sure ->begin() and ->complete()
are always called around ethtool_ops calls. But even if nesting should
be safe now (for in-tree drivers, that is), I'm not really happy about
calling them even in places where we positively know we are always
inside a begin / complete block as e.g. vxlan or net_failover do. (And
also linkinfo.c and linkmodes.c but it may be easier to call
->get_link_ksettings() directly there.)

How about having two helpers: one simple for "ethtool context" where we
know we already are between ->begin() and ->complete() and one with the
begin/complete calls for the rest? Another interesting question is if
any of the callers which do not have their own begin()/complete() wrap
does actually need anything more than speed and duplex (I didn't do
a full check).

Michal

> Signed-off-by: Kai-Heng Feng <[email protected]>
> ---
> net/ethtool/ioctl.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
> index 182bffbffa78..c768dbf45fc4 100644
> --- a/net/ethtool/ioctl.c
> +++ b/net/ethtool/ioctl.c
> @@ -423,13 +423,26 @@ struct ethtool_link_usettings {
> int __ethtool_get_link_ksettings(struct net_device *dev,
> struct ethtool_link_ksettings *link_ksettings)
> {
> + int rc;
> +
> ASSERT_RTNL();
>
> if (!dev->ethtool_ops->get_link_ksettings)
> return -EOPNOTSUPP;
>
> + if (dev->ethtool_ops->begin) {
> + rc = dev->ethtool_ops->begin(dev);
> + if (rc < 0)
> + return rc;
> + }
> +
> memset(link_ksettings, 0, sizeof(*link_ksettings));
> - return dev->ethtool_ops->get_link_ksettings(dev, link_ksettings);
> + rc = dev->ethtool_ops->get_link_ksettings(dev, link_ksettings);
> +
> + if (dev->ethtool_ops->complete)
> + dev->ethtool_ops->complete(dev);
> +
> + return rc;
> }
> EXPORT_SYMBOL(__ethtool_get_link_ksettings);
>
> --
> 2.17.1
>

2020-01-14 10:20:42

by Kai-Heng Feng

[permalink] [raw]
Subject: Re: [PATCH 2/2] ethtool: Call begin() and complete() in __ethtool_get_link_ksettings()

Hi,

> On Jan 13, 2020, at 20:51, Michal Kubecek <[email protected]> wrote:
>
> On Fri, Jan 10, 2020 at 03:41:59PM +0800, Kai-Heng Feng wrote:
>> Device like igb gets runtime suspended when there's no link partner. We
>> can't get correct speed under that state:
>> $ cat /sys/class/net/enp3s0/speed
>> 1000
>>
>> In addition to that, an error can also be spotted in dmesg:
>> [ 385.991957] igb 0000:03:00.0 enp3s0: PCIe link lost
>>
>> It's because the igb device doesn't get runtime resumed before calling
>> get_link_ksettings().
>>
>> So let's call begin() and complete() like what dev_ethtool() does, to
>> runtime resume/suspend or power up/down the device properly.
>>
>> Once this fix is in place, igb can show the speed correctly without link
>> partner:
>> $ cat /sys/class/net/enp3s0/speed
>> -1
>
> I agree that we definitely should make sure ->begin() and ->complete()
> are always called around ethtool_ops calls. But even if nesting should
> be safe now (for in-tree drivers, that is), I'm not really happy about
> calling them even in places where we positively know we are always
> inside a begin / complete block as e.g. vxlan or net_failover do. (And
> also linkinfo.c and linkmodes.c but it may be easier to call
> ->get_link_ksettings() directly there.)

Ok. Maybe use set_bit() to know it's inside of begin() and complete()?

>
> How about having two helpers: one simple for "ethtool context" where we
> know we already are between ->begin() and ->complete() and one with the
> begin/complete calls for the rest?

Or I can rebase this patch on top of the refactoring work.

Kai-Heng

> Another interesting question is if
> any of the callers which do not have their own begin()/complete() wrap
> does actually need anything more than speed and duplex (I didn't do
> a full check).

For sysfs yes, I am not sure about other cases though.

Kai-Heng

>
> Michal
>
>> Signed-off-by: Kai-Heng Feng <[email protected]>
>> ---
>> net/ethtool/ioctl.c | 15 ++++++++++++++-
>> 1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
>> index 182bffbffa78..c768dbf45fc4 100644
>> --- a/net/ethtool/ioctl.c
>> +++ b/net/ethtool/ioctl.c
>> @@ -423,13 +423,26 @@ struct ethtool_link_usettings {
>> int __ethtool_get_link_ksettings(struct net_device *dev,
>> struct ethtool_link_ksettings *link_ksettings)
>> {
>> + int rc;
>> +
>> ASSERT_RTNL();
>>
>> if (!dev->ethtool_ops->get_link_ksettings)
>> return -EOPNOTSUPP;
>>
>> + if (dev->ethtool_ops->begin) {
>> + rc = dev->ethtool_ops->begin(dev);
>> + if (rc < 0)
>> + return rc;
>> + }
>> +
>> memset(link_ksettings, 0, sizeof(*link_ksettings));
>> - return dev->ethtool_ops->get_link_ksettings(dev, link_ksettings);
>> + rc = dev->ethtool_ops->get_link_ksettings(dev, link_ksettings);
>> +
>> + if (dev->ethtool_ops->complete)
>> + dev->ethtool_ops->complete(dev);
>> +
>> + return rc;
>> }
>> EXPORT_SYMBOL(__ethtool_get_link_ksettings);
>>
>> --
>> 2.17.1
>>