When an interface is brought up which was previously suspended (via
runtime PM), it would hang. This happens because napi_disable is called
before napi_enable.
Solve this by avoiding napi_enable in the resume during open function
(netif_running is true when open is called, IFF_UP is set after a
successful open; netif_running is false when close is called, but IFF_UP
is then still set).
While at it, remove WORK_ENABLE check from rtl8152_open (introduced with
the original change) because it cannot happen:
- After this patch, runtime resume will not set it during rtl8152_open.
- When link is up, rtl8152_open is not called.
- When link is down during system/auto suspend/resume, it is not set.
Fixes: 41cec84cf285 ("r8152: don't enable napi before rx ready")
Link: https://lkml.kernel.org/r/20151205105912.GA1766@al
Signed-off-by: Peter Wu <[email protected]>
---
v2: moved rtl_runtime_suspend_enable from close to rtl8152_suspend
Tested with the scenario from
https://lkml.kernel.org/r/20151208111008.GA18728@al
(added debug prints to verify the suspend/resume moments)
---
drivers/net/usb/r8152.c | 21 +++------------------
1 file changed, 3 insertions(+), 18 deletions(-)
diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index d9427ca..2e32c41 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -3067,17 +3067,6 @@ static int rtl8152_open(struct net_device *netdev)
mutex_lock(&tp->control);
- /* The WORK_ENABLE may be set when autoresume occurs */
- if (test_bit(WORK_ENABLE, &tp->flags)) {
- clear_bit(WORK_ENABLE, &tp->flags);
- usb_kill_urb(tp->intr_urb);
- cancel_delayed_work_sync(&tp->schedule);
-
- /* disable the tx/rx, if the workqueue has enabled them. */
- if (netif_carrier_ok(netdev))
- tp->rtl_ops.disable(tp);
- }
-
tp->rtl_ops.up(tp);
rtl8152_set_speed(tp, AUTONEG_ENABLE,
@@ -3124,12 +3113,6 @@ static int rtl8152_close(struct net_device *netdev)
} else {
mutex_lock(&tp->control);
- /* The autosuspend may have been enabled and wouldn't
- * be disable when autoresume occurs, because the
- * netif_running() would be false.
- */
- rtl_runtime_suspend_enable(tp, false);
-
tp->rtl_ops.down(tp);
mutex_unlock(&tp->control);
@@ -3512,7 +3495,7 @@ static int rtl8152_resume(struct usb_interface *intf)
netif_device_attach(tp->netdev);
}
- if (netif_running(tp->netdev)) {
+ if (netif_running(tp->netdev) && tp->netdev->flags & IFF_UP) {
if (test_bit(SELECTIVE_SUSPEND, &tp->flags)) {
rtl_runtime_suspend_enable(tp, false);
clear_bit(SELECTIVE_SUSPEND, &tp->flags);
@@ -3532,6 +3515,8 @@ static int rtl8152_resume(struct usb_interface *intf)
}
usb_submit_urb(tp->intr_urb, GFP_KERNEL);
} else if (test_bit(SELECTIVE_SUSPEND, &tp->flags)) {
+ if (tp->netdev->flags & IFF_UP)
+ rtl_runtime_suspend_enable(tp, false);
clear_bit(SELECTIVE_SUSPEND, &tp->flags);
}
--
2.6.3
Peter Wu [mailto:[email protected]]
> Sent: Tuesday, December 08, 2015 7:18 PM
>
> When an interface is brought up which was previously suspended (via
> runtime PM), it would hang. This happens because napi_disable is called
> before napi_enable.
>
> Solve this by avoiding napi_enable in the resume during open function
> (netif_running is true when open is called, IFF_UP is set after a
> successful open; netif_running is false when close is called, but IFF_UP
> is then still set).
>
> While at it, remove WORK_ENABLE check from rtl8152_open (introduced with
> the original change) because it cannot happen:
>
> - After this patch, runtime resume will not set it during rtl8152_open.
> - When link is up, rtl8152_open is not called.
> - When link is down during system/auto suspend/resume, it is not set.
>
> Fixes: 41cec84cf285 ("r8152: don't enable napi before rx ready")
> Link: https://lkml.kernel.org/r/20151205105912.GA1766@al
> Signed-off-by: Peter Wu <[email protected]>
> ---
> v2: moved rtl_runtime_suspend_enable from close to rtl8152_suspend
Acked-by: Hayes Wang <[email protected]>
Best Regards,
Hayes
On Tue, Dec 08, 2015 at 12:39:12PM +0000, Hayes Wang wrote:
> Peter Wu [mailto:[email protected]]
> > Sent: Tuesday, December 08, 2015 7:18 PM
> >
> > When an interface is brought up which was previously suspended (via
> > runtime PM), it would hang. This happens because napi_disable is called
> > before napi_enable.
> >
> > Solve this by avoiding napi_enable in the resume during open function
> > (netif_running is true when open is called, IFF_UP is set after a
> > successful open; netif_running is false when close is called, but IFF_UP
> > is then still set).
> >
> > While at it, remove WORK_ENABLE check from rtl8152_open (introduced with
> > the original change) because it cannot happen:
> >
> > - After this patch, runtime resume will not set it during rtl8152_open.
> > - When link is up, rtl8152_open is not called.
> > - When link is down during system/auto suspend/resume, it is not set.
> >
> > Fixes: 41cec84cf285 ("r8152: don't enable napi before rx ready")
> > Link: https://lkml.kernel.org/r/20151205105912.GA1766@al
> > Signed-off-by: Peter Wu <[email protected]>
> > ---
> > v2: moved rtl_runtime_suspend_enable from close to rtl8152_suspend
>
> Acked-by: Hayes Wang <[email protected]>
>
> Best Regards,
> Hayes
I found another problem with runtime PM. When a device is suspended via
autosuspend and a system suspend takes place, there is no network I/O
after resume. Triggering a renegotiation (ethtool -r eth1) brings back
network activity.
Can you have a look? I guess that reset_resume needs different
treatment, but don't know how to do it properly as suspend is not called
before system reset (because the device is apparently already in
suspended state).
I think that this new issue can be addressed in a different patch, this
patch solves a more severe lockup. Opinions?
--
Kind regards,
Peter Wu
https://lekensteyn.nl
From: Peter Wu <[email protected]>
Date: Tue, 8 Dec 2015 12:17:42 +0100
> When an interface is brought up which was previously suspended (via
> runtime PM), it would hang. This happens because napi_disable is called
> before napi_enable.
>
> Solve this by avoiding napi_enable in the resume during open function
> (netif_running is true when open is called, IFF_UP is set after a
> successful open; netif_running is false when close is called, but IFF_UP
> is then still set).
>
> While at it, remove WORK_ENABLE check from rtl8152_open (introduced with
> the original change) because it cannot happen:
>
> - After this patch, runtime resume will not set it during rtl8152_open.
> - When link is up, rtl8152_open is not called.
> - When link is down during system/auto suspend/resume, it is not set.
>
> Fixes: 41cec84cf285 ("r8152: don't enable napi before rx ready")
> Link: https://lkml.kernel.org/r/20151205105912.GA1766@al
> Signed-off-by: Peter Wu <[email protected]>
> ---
> v2: moved rtl_runtime_suspend_enable from close to rtl8152_suspend
Applied, and queued up for -stable, thanks.
On Tue, 2015-12-08 at 15:33 +0100, Peter Wu wrote:
> Can you have a look? I guess that reset_resume needs different
> treatment, but don't know how to do it properly as suspend is not
> called
> before system reset (because the device is apparently already in
> suspended state).
>
> I think that this new issue can be addressed in a different patch,
> this
> patch solves a more severe lockup. Opinions?
We do not know in advance whether resume() or reset_resume()
will be called after S3. A driver must be ready for both.
Regards
Oliver
Peter Wu [mailto:[email protected]]
> Sent: Tuesday, December 08, 2015 10:33 PM
[...]
> I found another problem with runtime PM. When a device is suspended via
> autosuspend and a system suspend takes place, there is no network I/O
> after resume. Triggering a renegotiation (ethtool -r eth1) brings back
> network activity.
I think it is relative to the firmware. Could you try the driver from Realtek website?
Best Regards,
Hayes
On Tue, 2015-12-22 at 09:48 +0000, Hayes Wang wrote:
> Peter Wu [mailto:[email protected]]
> > Sent: Tuesday, December 08, 2015 10:33 PM
> [...]
> > I found another problem with runtime PM. When a device is suspended via
> > autosuspend and a system suspend takes place, there is no network I/O
> > after resume. Triggering a renegotiation (ethtool -r eth1) brings back
> > network activity.
>
> I think it is relative to the firmware. Could you try the driver from Realtek website?
Hi,
at the risk of repeating myself I must say that there is a logic flaw
in the driver. If you look at this code:
static int rtl8152_resume(struct usb_interface *intf)
{
struct r8152 *tp = usb_get_intfdata(intf);
mutex_lock(&tp->control);
if (!test_bit(SELECTIVE_SUSPEND, &tp->flags)) {
tp->rtl_ops.init(tp);
netif_device_attach(tp->netdev);
}
if (netif_running(tp->netdev) && tp->netdev->flags & IFF_UP) {
if (test_bit(SELECTIVE_SUSPEND, &tp->flags)) {
rtl_runtime_suspend_enable(tp, false);
clear_bit(SELECTIVE_SUSPEND, &tp->flags);
napi_disable(&tp->napi);
set_bit(WORK_ENABLE, &tp->flags);
if (netif_carrier_ok(tp->netdev))
rtl_start_rx(tp);
napi_enable(&tp->napi);
} else {
tp->rtl_ops.up(tp);
rtl8152_set_speed(tp, AUTONEG_ENABLE,
tp->mii.supports_gmii ?
SPEED_1000 : SPEED_100,
DUPLEX_FULL);
netif_carrier_off(tp->netdev);
set_bit(WORK_ENABLE, &tp->flags);
}
You need to understand that its use of the flag SELECTIVE_SUSPEND
is invalid. SELECTIVE_SUSPEND is used at two places in the driver.
Once in rtl8152_start_xmit(), where it is working but misnamed.
At that time you need to know whether the device is suspended.
To the device it does not matter whether the suspension is selective
or for the whole bus. It cannot tell. The driver just needs to know
whether it should resume the device if a packet to be transmitted
through it is given to the driver. So far all is well.
But at the time rtl8152_resume() is called the flag has become
meaningless. It tells you whether the device was selectively suspended
(we call that autosuspended, but the concept is the same), but you
take it to mean that it was _only_ selectively suspended. It does not
tell you that.
That matters a lot because the behavior of the host regarding powering
the bus during S3 and S4 is not defined. As I mentioned the device
can't tell whether it is selectively suspended. But it does notice
if its power supply is cut. In that case the driver must reinitialize
the device.
The current code does that conditional on
!test_bit(SELECTIVE_SUSPEND ... )
That is wrong because you need to do this if power was lost. The test
only tells you whether the device was selectively suspend before
power was lost (if power was lost). That is not the same thing at all.
The way the USB subsystem is designed is that it tells you whether power
had been cut by calling reset_resume() if power was cut or resume() if
power was kept.
If reset_resume() is called you must always execute this code:
tp->rtl_ops.init(tp);
and
tp->rtl_ops.up(tp);
rtl8152_set_speed(tp, AUTONEG_ENABLE,
tp->mii.supports_gmii ?
SPEED_1000 : SPEED_100,
DUPLEX_FULL);
The conditions used in rtl8152_resume() are wrong.
That is the reason "ethtool -r eth1" is reported to restore the device.
It triggers equivalent operations.
It is clear to me that you cannot get away with using the same operation
for resume() and reset_resume() in your driver. It is fundamentally
impossible. Firmware cannot fix it.
Sorry for the length of the explanation.
HTH
Oliver
Oliver Neukum [mailto:[email protected]]
[...]
> It is clear to me that you cannot get away with using the same operation
> for resume() and reset_resume() in your driver. It is fundamentally
> impossible. Firmware cannot fix it.
I would think how to fix it.
> Sorry for the length of the explanation.
Thanks for your response. I have some questions. What are the flows when
the system resume follows a system suspend which follows a autosuspend?
Are they as following?
1. suspend() with PMSG_IS_AUTO for autosuspned.
2. suspend() for system suspend.
3. resume() for system resume.
And, should the device exist autosuspend before (2)?
Best Regards,
Hayes
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m????????????I?
On Wed, 2015-12-23 at 03:31 +0000, Hayes Wang wrote:
> Oliver Neukum [mailto:[email protected]]
> [...]
> > It is clear to me that you cannot get away with using the same operation
> > for resume() and reset_resume() in your driver. It is fundamentally
> > impossible. Firmware cannot fix it.
>
> I would think how to fix it.
>
> > Sorry for the length of the explanation.
>
> Thanks for your response. I have some questions. What are the flows when
> the system resume follows a system suspend which follows a autosuspend?
> Are they as following?
>
> 1. suspend() with PMSG_IS_AUTO for autosuspned.
> 2. suspend() for system suspend.
> 3. resume() for system resume.
>
> And, should the device exist autosuspend before (2)?
No, step (2) does not exist. Calls to suspend() and [reset_]resume()
always balance. Usually a driver shouldn't care about system suspend.
The way the driver is currently coded will also fail for Port-Power Off.
Regards
Oliver
Oliver Neukum [mailto:[email protected]]
> Sent: Wednesday, December 23, 2015 4:20 PM
[...]
> No, step (2) does not exist. Calls to suspend() and [reset_]resume()
> always balance. Usually a driver shouldn't care about system suspend.
> The way the driver is currently coded will also fail for Port-Power Off.
It is different with Windows. The Windows would resume the device before
system suspend, if the system suspend follows the autosuspend.
Would this be a problem? After system suspend, the device may wake up
the system when receiving any packet, not only magic packet. The wake
events are different for system suspend and autosuspend. However, I
couldn't change the wake event, because the autosuspend occurs first,
and the suspend() is only called once.
Best Regards,
Hayes
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m????????????I?
On Wed, 2015-12-23 at 09:20 +0000, Hayes Wang wrote:
> Oliver Neukum [mailto:[email protected]]
> > Sent: Wednesday, December 23, 2015 4:20 PM
> [...]
> > No, step (2) does not exist. Calls to suspend() and [reset_]resume()
> > always balance. Usually a driver shouldn't care about system suspend.
> > The way the driver is currently coded will also fail for Port-Power Off.
>
> It is different with Windows. The Windows would resume the device before
> system suspend, if the system suspend follows the autosuspend.
>
> Would this be a problem? After system suspend, the device may wake up
> the system when receiving any packet, not only magic packet. The wake
> events are different for system suspend and autosuspend. However, I
> couldn't change the wake event, because the autosuspend occurs first,
> and the suspend() is only called once.
That is indeed a problem and I need to think a bit about finding
a good solution. If you are happy with an inelegant solution, you can
use a pm_notifier, which will tell you that the system is going
to suspend. This is documented:
https://www.kernel.org/doc/Documentation/power/notifiers.txt
HTH
Oliver
Oliver Neukum [mailto:[email protected]]
> Sent: Wednesday, December 23, 2015 6:46 PM
[...]
> That is indeed a problem and I need to think a bit about finding
> a good solution. If you are happy with an inelegant solution, you can
> use a pm_notifier, which will tell you that the system is going
> to suspend. This is documented:
>
> https://www.kernel.org/doc/Documentation/power/notifiers.txt
Thanks. I would study it after fixing the reset_resume() issue.
Best Regards,
Hayes
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m????????????I?
On Wed, 23 Dec 2015, Hayes Wang wrote:
> Oliver Neukum [mailto:[email protected]]
> > Sent: Wednesday, December 23, 2015 4:20 PM
> [...]
> > No, step (2) does not exist. Calls to suspend() and [reset_]resume()
> > always balance. Usually a driver shouldn't care about system suspend.
> > The way the driver is currently coded will also fail for Port-Power Off.
>
> It is different with Windows. The Windows would resume the device before
> system suspend, if the system suspend follows the autosuspend.
>
> Would this be a problem? After system suspend, the device may wake up
> the system when receiving any packet, not only magic packet. The wake
> events are different for system suspend and autosuspend. However, I
> couldn't change the wake event, because the autosuspend occurs first,
> and the suspend() is only called once.
I don't understand why the wakeup conditions are different. It seems
to me that the choice of which packets will generate a wakeup ought to
depend on the user's selection, not on the kind of suspend. For
instance, if the user says that only a magic packet should cause a
wakeup then that should be true for both runtime suspend and system
suspend.
To put it another way, as far as the device is concerned a suspend is
just a suspend -- there's no different between a runtime suspend and a
system suspend.
Alan Stern
On Wed, 2015-12-23 at 20:32 -0500, Alan Stern wrote:
> I don't understand why the wakeup conditions are different. It seems
> to me that the choice of which packets will generate a wakeup ought to
> depend on the user's selection, not on the kind of suspend. For
> instance, if the user says that only a magic packet should cause a
> wakeup then that should be true for both runtime suspend and system
> suspend.
>
> To put it another way, as far as the device is concerned a suspend is
> just a suspend -- there's no different between a runtime suspend and a
> system suspend.
This literally true, but the host and the driver care.
If we autosuspend a running network device, any packet
(maybe filtered for MAC) should cause a remote wake up,
else we'd lose packets.
But you cannot keep that setting if the system goes down
or any broadcast packet would resume the whole system.
Yet you cannot just disable remote wake up, as WoL packages
still must trigger a remote wake up.
So there are drivers which must change settings on devices
as the system goes to sleep, even if their devices have
already been autosuspended. We could use the notifier chains
for that. But can this solution be called elegant?
Merry Christmas
Oliver
On Thu, 24 Dec 2015, Oliver Neukum wrote:
> On Wed, 2015-12-23 at 20:32 -0500, Alan Stern wrote:
>
> > I don't understand why the wakeup conditions are different. It seems
> > to me that the choice of which packets will generate a wakeup ought to
> > depend on the user's selection, not on the kind of suspend. For
> > instance, if the user says that only a magic packet should cause a
> > wakeup then that should be true for both runtime suspend and system
> > suspend.
> >
> > To put it another way, as far as the device is concerned a suspend is
> > just a suspend -- there's no different between a runtime suspend and a
> > system suspend.
>
> This literally true, but the host and the driver care.
> If we autosuspend a running network device, any packet
> (maybe filtered for MAC) should cause a remote wake up,
> else we'd lose packets.
That's also true during system suspend.
> But you cannot keep that setting if the system goes down
> or any broadcast packet would resume the whole system.
> Yet you cannot just disable remote wake up, as WoL packages
> still must trigger a remote wake up.
This means that sometimes you want to avoid losing packets and other
times you do want to lose packets. That is a policy decision, and
therefore it should be made by the user, not the kernel.
> So there are drivers which must change settings on devices
> as the system goes to sleep, even if their devices have
> already been autosuspended. We could use the notifier chains
> for that. But can this solution be called elegant?
Instead of the driver trying to do this automatically, you could rely
on userspace telling the driver which packets should cause a wakeup.
The setting could be updated immediately before and after each system
suspend.
I admit this is more awkward than having the driver make a choice based
on the type of suspend. This is a case where the resources provided by
the PM core aren't adequate for what the driver needs. The PM core
distinguishes between wakeup enabled or disabled; it doesn't
distinguish among different levels of wakekup.
Alan Stern
On Thu, 2015-12-24 at 10:14 -0500, Alan Stern wrote:
> On Thu, 24 Dec 2015, Oliver Neukum wrote:
>
> > On Wed, 2015-12-23 at 20:32 -0500, Alan Stern wrote:
> > But you cannot keep that setting if the system goes down
> > or any broadcast packet would resume the whole system.
> > Yet you cannot just disable remote wake up, as WoL packages
> > still must trigger a remote wake up.
>
> This means that sometimes you want to avoid losing packets and other
> times you do want to lose packets. That is a policy decision, and
> therefore it should be made by the user, not the kernel.
Indeed it is and there is a tool for this with a defined
interface called "ethtool"
The problem here is not the policy decision, but implementing
it in kernel space.
> > So there are drivers which must change settings on devices
> > as the system goes to sleep, even if their devices have
> > already been autosuspended. We could use the notifier chains
> > for that. But can this solution be called elegant?
>
> Instead of the driver trying to do this automatically, you could rely
> on userspace telling the driver which packets should cause a wakeup.
It does.
> The setting could be updated immediately before and after each system
> suspend.
The API is so that user space sets the policy, which persists until
user space changes the setting and the kernel implements it. The
problem is that to do so the kernel needs to do IO to the device
as the system is about to suspend.
Thus the driver may need to resume the device and it needs to learn
that the system is about to go to sleep, even if the device it
manages is already autosuspended.
> I admit this is more awkward than having the driver make a choice based
> on the type of suspend. This is a case where the resources provided by
It also is a race condition, unless you want user space to disable
autosuspend as the system is about to go to sleep. And it makes
relatively little sense, as enabling remote wakeup is the last thing
we do before the device suspends. Setting the filters long before that
doesn't make much sense.
> the PM core aren't adequate for what the driver needs. The PM core
> distinguishes between wakeup enabled or disabled; it doesn't
> distinguish among different levels of wakekup.
True and sanely it cannot. We could only distinguish between drivers
which need their devices to be resumed before the system suspends and
the rest.
Or we tell driver coders to use the notifier chains.
Regards
Oliver
On Thu, 24 Dec 2015, Oliver Neukum wrote:
> On Thu, 2015-12-24 at 10:14 -0500, Alan Stern wrote:
> > On Thu, 24 Dec 2015, Oliver Neukum wrote:
> >
> > > On Wed, 2015-12-23 at 20:32 -0500, Alan Stern wrote:
>
> > > But you cannot keep that setting if the system goes down
> > > or any broadcast packet would resume the whole system.
> > > Yet you cannot just disable remote wake up, as WoL packages
> > > still must trigger a remote wake up.
> >
> > This means that sometimes you want to avoid losing packets and other
> > times you do want to lose packets. That is a policy decision, and
> > therefore it should be made by the user, not the kernel.
>
> Indeed it is and there is a tool for this with a defined
> interface called "ethtool"
No; ethtool affects the wakeup setting for system suspend, but not
for runtime suspend. I was referring to something that would specify
the setting for both cases. But perhaps that doesn't make sense,
because you never want to drop relevant packets during runtime suspend.
If you did, you would run "ifconfig down" instead.
> > the PM core aren't adequate for what the driver needs. The PM core
> > distinguishes between wakeup enabled or disabled; it doesn't
> > distinguish among different levels of wakekup.
>
> True and sanely it cannot. We could only distinguish between drivers
> which need their devices to be resumed before the system suspends and
> the rest.
> Or we tell driver coders to use the notifier chains.
"Resume before system suspend" sounds like a reasonable thing to
implement, for devices that have multiple levels of wakeup settings.
Would you like to post a proposal on linux-pm for this?
Alan Stern