2020-03-23 14:39:50

by Qais Yousef

[permalink] [raw]
Subject: lockdep warning in urb.c:363 usb_submit_urb

Hi

I've hit the following lockdep warning when I trigger hibernate on arm64
platform (Juno-r2)


echo suspend > /sys/power/disk
echo disk > /sys/power/state

I only had a usb flash drive attached to it. Let me know if you need more info.



# echo suspend > disk
# echo disk > state
[ 325.775152] PM: hibernation: hibernation entry
[ 325.780708] Filesystems sync: 0.000 seconds
[ 325.784976] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 325.793322] OOM killer disabled.
[ 325.797934] PM: hibernation: Preallocating image memory
[ 326.852851] PM: hibernation: Allocated 96124 pages for snapshot
[ 326.858826] PM: hibernation: Allocated 384496 kbytes in 1.04 seconds (369.70 MB/s)
[ 326.866447] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 326.876429] printk: Suspending console(s) (use no_console_suspend to debug)
[ 326.947751] Disabling non-boot CPUs ...
[ 326.948909] CPU1: shutdown
[ 326.948918] psci: CPU1 killed (polled 0 ms)
[ 326.951082] CPU2: shutdown
[ 326.952130] psci: CPU2 killed (polled 0 ms)
[ 326.954600] CPU3: shutdown
[ 326.954613] psci: CPU3 killed (polled 0 ms)
[ 326.957217] CPU4: shutdown
[ 326.957229] psci: CPU4 killed (polled 0 ms)
[ 326.958924] CPU5: shutdown
[ 326.958936] psci: CPU5 killed (polled 0 ms)
[ 326.959878] PM: hibernation: Creating image:
[ 326.959878] PM: hibernation: Need to copy 94789 pages
[ 326.959878] PM: hibernation: Image created (94789 pages copied)
[ 326.959994] Enabling non-boot CPUs ...
[ 326.973678] Detected PIPT I-cache on CPU1
[ 326.973728] CPU1: Booted secondary processor 0x0000000000 [0x410fd080]
[ 326.975172] CPU1 is up
[ 326.993051] Detected PIPT I-cache on CPU2
[ 326.993081] CPU2: Booted secondary processor 0x0000000001 [0x410fd080]
[ 326.993839] CPU2 is up
[ 327.007492] Detected VIPT I-cache on CPU3
[ 327.007557] CPU3: Booted secondary processor 0x0000000101 [0x410fd033]
[ 327.009075] CPU3 is up
[ 327.022636] Detected VIPT I-cache on CPU4
[ 327.022682] CPU4: Booted secondary processor 0x0000000102 [0x410fd033]
[ 327.024296] CPU4 is up
[ 327.038227] Detected VIPT I-cache on CPU5
[ 327.038272] CPU5: Booted secondary processor 0x0000000103 [0x410fd033]
[ 327.039921] CPU5 is up
[ 327.154849] usb usb2: runtime PM trying to activate child device usb2 but parent (7ffb0000.ohci) is not active
[ 327.290355] PM: Cannot find swap device, try swapon -a
[ 327.295533] PM: Cannot get swap writer
[ 327.480758] OOM killer enabled.
[ 327.483939] Restarting tasks ...
[ 327.484229] ------------[ cut here ]------------
[ 327.484664] done.
[ 327.487743] URB 000000000c121c1c submitted while active
[ 327.499622] WARNING: CPU: 1 PID: 296 at drivers/usb/core/urb.c:363 usb_submit_urb+0x3f0/0x520
[ 327.508191] Modules linked in:
[ 327.511262] CPU: 1 PID: 296 Comm: kworker/1:2 Not tainted 5.6.0-rc6 #533
[ 327.517993] Hardware name: ARM Juno development board (r2) (DT)
[ 327.523944] Workqueue: usb_hub_wq hub_event
[ 327.528148] pstate: 40000005 (nZcv daif -PAN -UAO)
[ 327.532959] pc : usb_submit_urb+0x3f0/0x520
[ 327.537160] lr : usb_submit_urb+0x3f0/0x520
[ 327.541360] sp : ffff80001891b8c0
[ 327.544687] x29: ffff80001891b8c0 x28: ffff000973f7a000
[ 327.550024] x27: ffff00097049f320 x26: 0000000000000003
[ 327.555361] x25: ffff8000101704e8 x24: ffff80001891bb48
[ 327.560697] x23: ffff80001323d000 x22: 0000000000000c00
[ 327.566033] x21: 0000000000000004 x20: 00000000fffffff0
[ 327.571369] x19: ffff0009703f0300 x18: 0000000000000000
[ 327.576705] x17: 0000000000000000 x16: 0000000000000000
[ 327.582041] x15: 0000000000000000 x14: 0000000000000000
[ 327.587376] x13: 0000000000000000 x12: 0000000000000000
[ 327.592712] x11: 0000000000000000 x10: 0000000000000000
[ 327.598048] x9 : ffff80001323da88 x8 : ffff00097ef0a798
[ 327.603384] x7 : ffff800010148c08 x6 : ffff00097eef7450
[ 327.608720] x5 : ffff00097eef7450 x4 : 0000000000000000
[ 327.614055] x3 : ffff00097ef06df0 x2 : 0000000000000001
[ 327.619391] x1 : 1798c844b4d7c300 x0 : 0000000000000000
[ 327.624727] Call trace:
[ 327.627183] usb_submit_urb+0x3f0/0x520
[ 327.631035] hub_activate+0x108/0x788
[ 327.634713] hub_resume+0x40/0x108
[ 327.638129] usb_resume_interface.isra.9+0x60/0x108
[ 327.643028] usb_resume_both+0xe4/0x140
[ 327.646881] usb_runtime_resume+0x24/0x30
[ 327.650910] __rpm_callback+0xdc/0x138
[ 327.654675] rpm_callback+0x34/0x98
[ 327.658178] rpm_resume+0x4a8/0x720
[ 327.661681] rpm_resume+0x50c/0x720
[ 327.665183] __pm_runtime_resume+0x4c/0xb8
[ 327.669297] usb_autopm_get_interface+0x28/0x60
[ 327.673848] hub_event+0x80/0x1368
[ 327.677266] process_one_work+0x2a4/0x748
[ 327.681292] worker_thread+0x48/0x498
[ 327.684970] kthread+0x13c/0x140
[ 327.688211] ret_from_fork+0x10/0x18
[ 327.691801] irq event stamp: 17114
[ 327.695219] hardirqs last enabled at (17113): [<ffff80001191b29c>] _raw_spin_unlock_irq+0x34/0x68
[ 327.704224] hardirqs last disabled at (17114): [<ffff8000119132d8>] __schedule+0xd0/0x7e8
[ 327.712442] softirqs last enabled at (16742): [<ffff8000100818a4>] __do_softirq+0x4bc/0x568
[ 327.720922] softirqs last disabled at (16731): [<ffff800010114064>] irq_exit+0x144/0x150
[ 327.729051] ---[ end trace 7d41436f96488c51 ]---
[ 327.733762] PM: hibernation: hibernation exit
sh: write error: No such device
# [ 327.739972] hub 2-0:1.0: activate --> -16



Thanks

--
Qais Yousef


2020-03-23 15:37:20

by Oliver Neukum

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

Am Montag, den 23.03.2020, 14:38 +0000 schrieb Qais Yousef:
> Hi
>
> I've hit the following lockdep warning when I trigger hibernate on arm64
> platform (Juno-r2)
>
>
> echo suspend > /sys/power/disk
> echo disk > /sys/power/state
>
> I only had a usb flash drive attached to it. Let me know if you need more info.

Hi,

that is not a lockdep issue, but the hub driver is not properly killing
its URB presumably. Yet, the driver looks correct to me. Please use
the additional patch and activate dynamic debugging for usbcore.

Regards
Oliver


Attachments:
0001-usb-hub-additional-debugging.patch (683.00 B)

2020-03-23 15:56:41

by Alan Stern

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Mon, 23 Mar 2020, Oliver Neukum wrote:

> Am Montag, den 23.03.2020, 14:38 +0000 schrieb Qais Yousef:
> > Hi
> >
> > I've hit the following lockdep warning when I trigger hibernate on arm64
> > platform (Juno-r2)
> >
> >
> > echo suspend > /sys/power/disk
> > echo disk > /sys/power/state
> >
> > I only had a usb flash drive attached to it. Let me know if you need more info.
>
> Hi,
>
> that is not a lockdep issue, but the hub driver is not properly killing
> its URB presumably. Yet, the driver looks correct to me. Please use
> the additional patch and activate dynamic debugging for usbcore.

Was the USB flash drive being used as a swap device for holding the
hibernation image? That's not likely to work very well. At least, I
doubt that it has been tested very much.

This diagnostic was suggested by the runtime PM error that occurred
when the system was trying to store the hibernation image. That's
probably when the hub driver's URB got restarted.

Alan Stern

2020-03-23 15:58:39

by Qais Yousef

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On 03/23/20 16:36, Oliver Neukum wrote:
> Am Montag, den 23.03.2020, 14:38 +0000 schrieb Qais Yousef:
> > Hi
> >
> > I've hit the following lockdep warning when I trigger hibernate on arm64
> > platform (Juno-r2)
> >
> >
> > echo suspend > /sys/power/disk
> > echo disk > /sys/power/state
> >
> > I only had a usb flash drive attached to it. Let me know if you need more info.
>
> Hi,
>
> that is not a lockdep issue, but the hub driver is not properly killing
> its URB presumably. Yet, the driver looks correct to me. Please use
> the additional patch and activate dynamic debugging for usbcore.

Yes sorry my bad, it's a WARN_ONCE().

I applied your patch and will reproduce, but meanwhile not sure if you noticed
this line in my previous email

[ 327.154849] usb usb2: runtime PM trying to activate child device usb2 but parent (7ffb0000.ohci) is not active

Thanks

--
Qais Yousef

2020-03-23 16:03:22

by Qais Yousef

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On 03/23/20 11:54, Alan Stern wrote:
> On Mon, 23 Mar 2020, Oliver Neukum wrote:
>
> > Am Montag, den 23.03.2020, 14:38 +0000 schrieb Qais Yousef:
> > > Hi
> > >
> > > I've hit the following lockdep warning when I trigger hibernate on arm64
> > > platform (Juno-r2)
> > >
> > >
> > > echo suspend > /sys/power/disk
> > > echo disk > /sys/power/state
> > >
> > > I only had a usb flash drive attached to it. Let me know if you need more info.
> >
> > Hi,
> >
> > that is not a lockdep issue, but the hub driver is not properly killing
> > its URB presumably. Yet, the driver looks correct to me. Please use
> > the additional patch and activate dynamic debugging for usbcore.
>
> Was the USB flash drive being used as a swap device for holding the
> hibernation image? That's not likely to work very well. At least, I
> doubt that it has been tested very much.
>
> This diagnostic was suggested by the runtime PM error that occurred
> when the system was trying to store the hibernation image. That's
> probably when the hub driver's URB got restarted.

Yes, sadly it's the only source of storage I have on that device.

When I ran the test I had swapoff, as you can see in the snippet below. When
swap is on I do hibernate and wakeup successfully. At least that's what appears
to be happening to me.

I get a similar splat regardless of swap is on or off.


[ 327.154849] usb usb2: runtime PM trying to activate child device usb2 but parent (7ffb0000.ohci) is not active
[ 327.290355] PM: Cannot find swap device, try swapon -a
[ 327.295533] PM: Cannot get swap writer
[ 327.480758] OOM killer enabled.
[ 327.483939] Restarting tasks ...
[ 327.484229] ------------[ cut here ]------------
[ 327.484664] done.
[ 327.487743] URB 000000000c121c1c submitted while active
[ 327.499622] WARNING: CPU: 1 PID: 296 at drivers/usb/core/urb.c:363 usb_submit_urb+0x3f0/0x520

Thanks

--
Qais Yousef

2020-03-23 16:18:52

by Oliver Neukum

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

Am Montag, den 23.03.2020, 11:54 -0400 schrieb Alan Stern:
> On Mon, 23 Mar 2020, Oliver Neukum wrote:
>
> > Am Montag, den 23.03.2020, 14:38 +0000 schrieb Qais Yousef:
> > > Hi
> > >
> > > I've hit the following lockdep warning when I trigger hibernate on arm64
> > > platform (Juno-r2)
> > >
> > >
> > > echo suspend > /sys/power/disk
> > > echo disk > /sys/power/state
> > >
> > > I only had a usb flash drive attached to it. Let me know if you need more info.
> >
> > Hi,
> >
> > that is not a lockdep issue, but the hub driver is not properly killing
> > its URB presumably. Yet, the driver looks correct to me. Please use
> > the additional patch and activate dynamic debugging for usbcore.
>
> Was the USB flash drive being used as a swap device for holding the
> hibernation image? That's not likely to work very well. At least, I
> doubt that it has been tested very much.

Right, but this is good. We are getting a test for something that needs
work. It does not really matetr why STD fails.

> This diagnostic was suggested by the runtime PM error that occurred
> when the system was trying to store the hibernation image. That's
> probably when the hub driver's URB got restarted.

AFAICT hub_quiesce() unconditionally calls usb_kill_urb(). So I'd like
to verify that case is really triggered.

Regards
Oliver

2020-03-23 16:22:14

by Oliver Neukum

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

Am Montag, den 23.03.2020, 15:57 +0000 schrieb Qais Yousef:
> On 03/23/20 16:36, Oliver Neukum wrote:
> > Am Montag, den 23.03.2020, 14:38 +0000 schrieb Qais Yousef:
> > > Hi
> > >
> > > I've hit the following lockdep warning when I trigger hibernate on arm64
> > > platform (Juno-r2)
> > >
> > >
> > > echo suspend > /sys/power/disk
> > > echo disk > /sys/power/state
> > >
> > > I only had a usb flash drive attached to it. Let me know if you need more info.
> >
> > Hi,
> >
> > that is not a lockdep issue, but the hub driver is not properly killing
> > its URB presumably. Yet, the driver looks correct to me. Please use
> > the additional patch and activate dynamic debugging for usbcore.
>
> Yes sorry my bad, it's a WARN_ONCE().
>
> I applied your patch and will reproduce, but meanwhile not sure if you noticed
> this line in my previous email
>
> [ 327.154849] usb usb2: runtime PM trying to activate child device usb2 but parent (7ffb0000.ohci) is not active

Hi,

yes I noticed, but that did not trigger the WARN(). One thing at a time
please.

Regards
Oliver

2020-03-23 17:30:56

by Qais Yousef

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

Hi Oliver

On 03/23/20 16:36, Oliver Neukum wrote:
> Am Montag, den 23.03.2020, 14:38 +0000 schrieb Qais Yousef:
> > Hi
> >
> > I've hit the following lockdep warning when I trigger hibernate on arm64
> > platform (Juno-r2)
> >
> >
> > echo suspend > /sys/power/disk
> > echo disk > /sys/power/state
> >
> > I only had a usb flash drive attached to it. Let me know if you need more info.
>
> Hi,
>
> that is not a lockdep issue, but the hub driver is not properly killing
> its URB presumably. Yet, the driver looks correct to me. Please use
> the additional patch and activate dynamic debugging for usbcore.

First time I use dynamic debugging, hopefully I've done correctly.


echo "file drivers/usb/* +p" > /sys/kernel/debug/dynamic_debug/control

$REPRODUCE

cat /sys/kernel/debug/dynamic_debug/control | grep usb > usb.debug

usb.debug is attached.

Thanks

--
Qais Yousef


Attachments:
(No filename) (925.00 B)
usb.debug (144.90 kB)
Download all attachments

2020-03-24 09:10:02

by Oliver Neukum

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

Am Montag, den 23.03.2020, 17:29 +0000 schrieb Qais Yousef:
> Hi Oliver

Hi,

> First time I use dynamic debugging, hopefully I've done correctly.

I am afraid not.

> echo "file drivers/usb/* +p" > /sys/kernel/debug/dynamic_debug/control

Overkill but correct. +mpf would be even better

> $REPRODUCE

Good

> cat /sys/kernel/debug/dynamic_debug/control | grep usb > usb.debug

No.

/sys/kernel/debug/dynamic_debug/control holds the collection of the
messages that may be triggered, but it does not tell you which messages
are triggered and in which order. The triggered messages end up
in syslog. So you would use 'dmesg'
I am afraid you redid the test correctly and then threw away the
result.
Could you redo it and just attach the output of dmesg?

Sorry
Oliver

2020-03-24 10:46:59

by Qais Yousef

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On 03/24/20 10:08, Oliver Neukum wrote:
> Am Montag, den 23.03.2020, 17:29 +0000 schrieb Qais Yousef:
> > Hi Oliver
>
> Hi,
>
> > First time I use dynamic debugging, hopefully I've done correctly.
>
> I am afraid not.
>
> > echo "file drivers/usb/* +p" > /sys/kernel/debug/dynamic_debug/control
>
> Overkill but correct. +mpf would be even better
>
> > $REPRODUCE
>
> Good
>
> > cat /sys/kernel/debug/dynamic_debug/control | grep usb > usb.debug
>
> No.
>
> /sys/kernel/debug/dynamic_debug/control holds the collection of the
> messages that may be triggered, but it does not tell you which messages
> are triggered and in which order. The triggered messages end up
> in syslog. So you would use 'dmesg'
> I am afraid you redid the test correctly and then threw away the
> result.
> Could you redo it and just attach the output of dmesg?
>
> Sorry

I should have stuck to what I know then. I misread the documentation. Hopefully
the attached looks better. I don't see the new debug you added emitted.

Thanks

--
Qais Yousef


Attachments:
(No filename) (1.05 kB)
usb.dmesg (55.97 kB)
Download all attachments

2020-03-24 13:32:37

by Oliver Neukum

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

Am Dienstag, den 24.03.2020, 10:46 +0000 schrieb Qais Yousef:
>
> I should have stuck to what I know then. I misread the documentation. Hopefully
> the attached looks better. I don't see the new debug you added emitted.

That is odd. Please try

echo "module usbcore +mfp" > /sys/kernel/debug/dynamic_debug/control

with the attached improved patch.

Regards
Oliver


Attachments:
0001-usb-hub-additional-debugging.patch (697.00 B)

2020-03-24 13:44:58

by Qais Yousef

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On 03/24/20 14:20, Oliver Neukum wrote:
> Am Dienstag, den 24.03.2020, 10:46 +0000 schrieb Qais Yousef:
> >
> > I should have stuck to what I know then. I misread the documentation. Hopefully
> > the attached looks better. I don't see the new debug you added emitted.
>
> That is odd. Please try
>
> echo "module usbcore +mfp" > /sys/kernel/debug/dynamic_debug/control
>
> with the attached improved patch.

Hmm still no luck


# history
0 echo "module usbcore +mfp" > /sys/kernel/debug/dynamic_debug/control
1 swapoff -a
2 echo suspend > /sys/power/disk
3 echo disk > /sys/power/state
4 dmesg > usb.dmesg





# grep "URB allocated" /sys/kernel/debug/dynamic_debug/control
drivers/usb/core/hub.c:1632 [usbcore]hub_configure =pmf "%p URB allocated \012"





$ git log -p
commit dfd1731f9a3e7592135d2a6b2a5c5e1640a7eea4 (HEAD)
Author: Oliver Neukum <[email protected]>
Date: Mon Mar 23 16:34:35 2020 +0100

usb: hub additional debugging

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 54cd8ef795ec..12ce2fdc4c2a 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -1629,6 +1629,7 @@ static int hub_configure(struct usb_hub *hub,
ret = -ENOMEM;
goto fail;
}
+ dev_dbg(hub_dev, "%p URB allocated \n", hub->urb);

usb_fill_int_urb(hub->urb, hdev, pipe, *hub->buffer, maxp, hub_irq,
hub, endpoint->bInterval);


Thanks

--
Qais Yousef


Attachments:
(No filename) (1.48 kB)
usb.dmesg (51.41 kB)
Download all attachments

2020-03-24 13:49:23

by Alan Stern

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Tue, 24 Mar 2020, Qais Yousef wrote:

> I should have stuck to what I know then. I misread the documentation. Hopefully
> the attached looks better. I don't see the new debug you added emitted.

These lines:

[ 158.113836] ohci_hcd:ohci_resume: ohci-platform 7ffb0000.ohci: powerup ports
[ 158.139682] usbcore:hcd_bus_resume: usb usb2: usb resume
[ 158.139715] ohci_hcd:ohci_rh_resume: ohci-platform 7ffb0000.ohci: resume root hub
...
[ 158.219604] usbcore:hub_resume: hub 2-0:1.0: hub_resume
[ 158.220482] usb usb2: runtime PM trying to activate child device usb2 but parent (7ffb0000.ohci) is not active

suggest there is a bug in the platform code. The 7ffb0000.ohci
platform device should already be resumed and active at this point.

Following that, this is suspicious:

[ 158.482995] PM: Cannot find swap device, try swapon -a
[ 158.488379] PM: Cannot get swap writer

Since there was never any device attached to the OHCI controller, this
error is not connected to the previous one. In fact, the swap device
was plugged into the EHCI controller; it was 1-1.1. But the log
doesn't contain anything about that device being suspended, resumed, or
disconnected. What happened to it?

[ 159.064094] OOM killer enabled.
[ 159.067351] Restarting tasks ...
[ 159.068831] usbcore:hub_event: hub 2-0:1.0: state 7 ports 1 chg 0000 evt 0000
[ 159.079921] usbcore:hub_event: hub 1-1:1.0: state 7 ports 4 chg 0000 evt 0000
[ 159.079959] usbcore:hub_event: hub 1-0:1.0: state 7 ports 1 chg 0000 evt 0000
[ 159.090776] done.
[ 159.097076] usbcore:hub_resume: hub 2-0:1.0: hub_resume
[ 159.102961] ------------[ cut here ]------------
[ 159.107805] URB (____ptrval____) submitted while active

And why was usb2 resumed twice in a row with intervening suspend?

Alan Stern

2020-03-24 13:53:00

by Alan Stern

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Tue, 24 Mar 2020, Qais Yousef wrote:

> On 03/24/20 14:20, Oliver Neukum wrote:
> > Am Dienstag, den 24.03.2020, 10:46 +0000 schrieb Qais Yousef:
> > >
> > > I should have stuck to what I know then. I misread the documentation. Hopefully
> > > the attached looks better. I don't see the new debug you added emitted.
> >
> > That is odd. Please try
> >
> > echo "module usbcore +mfp" > /sys/kernel/debug/dynamic_debug/control
> >
> > with the attached improved patch.
>
> Hmm still no luck
>
>
> # history
> 0 echo "module usbcore +mfp" > /sys/kernel/debug/dynamic_debug/control
> 1 swapoff -a
> 2 echo suspend > /sys/power/disk
> 3 echo disk > /sys/power/state
> 4 dmesg > usb.dmesg

What happens if you omit step 1 (the swapoff)?

> $ git log -p
> commit dfd1731f9a3e7592135d2a6b2a5c5e1640a7eea4 (HEAD)
> Author: Oliver Neukum <[email protected]>
> Date: Mon Mar 23 16:34:35 2020 +0100
>
> usb: hub additional debugging
>
> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index 54cd8ef795ec..12ce2fdc4c2a 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -1629,6 +1629,7 @@ static int hub_configure(struct usb_hub *hub,
> ret = -ENOMEM;
> goto fail;
> }
> + dev_dbg(hub_dev, "%p URB allocated \n", hub->urb);
>
> usb_fill_int_urb(hub->urb, hdev, pipe, *hub->buffer, maxp, hub_irq,
> hub, endpoint->bInterval);

Oliver, by the way, %p isn't a good way to get pointer values for
debugging. Its output depends on how the system is configured. Use
%px instead (see Documentation/core-api/printk-formats.rst).

Alan Stern

2020-03-24 14:07:10

by Qais Yousef

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On 03/24/20 09:52, Alan Stern wrote:
> On Tue, 24 Mar 2020, Qais Yousef wrote:
>
> > On 03/24/20 14:20, Oliver Neukum wrote:
> > > Am Dienstag, den 24.03.2020, 10:46 +0000 schrieb Qais Yousef:
> > > >
> > > > I should have stuck to what I know then. I misread the documentation. Hopefully
> > > > the attached looks better. I don't see the new debug you added emitted.
> > >
> > > That is odd. Please try
> > >
> > > echo "module usbcore +mfp" > /sys/kernel/debug/dynamic_debug/control
> > >
> > > with the attached improved patch.
> >
> > Hmm still no luck
> >
> >
> > # history
> > 0 echo "module usbcore +mfp" > /sys/kernel/debug/dynamic_debug/control
> > 1 swapoff -a
> > 2 echo suspend > /sys/power/disk
> > 3 echo disk > /sys/power/state
> > 4 dmesg > usb.dmesg
>
> What happens if you omit step 1 (the swapoff)?

It seems to hibernate (suspend) successfully. If I omit that step I must setup
a wakealarm to trigger the wakeup, but that's it.

I attached the dmesg; I didn't reboot the system in between.


# history
0 echo "module usbcore +mfp" > /sys/kernel/debug/dynamic_debug/control
1 swapoff -a
2 echo suspend > /sys/power/disk
3 echo disk > /sys/power/state
4 dmesg > usb.dmesg
5 history
6 grep URB /sys/kernel/debug/dynamic_debug/control
7 grep "URB allocated" /sys/kernel/debug/dynamic_debug/control
8 swapon -a
9 echo +60 > /sys/class/rtc/rtc0/wakealarm
10 echo disk > /sys/power/state
11 dmesg > usb.dmesg


Thanks

--
Qais Yousef

>
> > $ git log -p
> > commit dfd1731f9a3e7592135d2a6b2a5c5e1640a7eea4 (HEAD)
> > Author: Oliver Neukum <[email protected]>
> > Date: Mon Mar 23 16:34:35 2020 +0100
> >
> > usb: hub additional debugging
> >
> > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> > index 54cd8ef795ec..12ce2fdc4c2a 100644
> > --- a/drivers/usb/core/hub.c
> > +++ b/drivers/usb/core/hub.c
> > @@ -1629,6 +1629,7 @@ static int hub_configure(struct usb_hub *hub,
> > ret = -ENOMEM;
> > goto fail;
> > }
> > + dev_dbg(hub_dev, "%p URB allocated \n", hub->urb);
> >
> > usb_fill_int_urb(hub->urb, hdev, pipe, *hub->buffer, maxp, hub_irq,
> > hub, endpoint->bInterval);
>
> Oliver, by the way, %p isn't a good way to get pointer values for
> debugging. Its output depends on how the system is configured. Use
> %px instead (see Documentation/core-api/printk-formats.rst).
>
> Alan Stern
>


Attachments:
(No filename) (2.49 kB)
usb.dmesg (80.05 kB)
Download all attachments

2020-03-24 15:57:45

by Alan Stern

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Tue, 24 Mar 2020, Qais Yousef wrote:

> On 03/24/20 09:52, Alan Stern wrote:
> > On Tue, 24 Mar 2020, Qais Yousef wrote:
> >
> > > On 03/24/20 14:20, Oliver Neukum wrote:
> > > > Am Dienstag, den 24.03.2020, 10:46 +0000 schrieb Qais Yousef:
> > > > >
> > > > > I should have stuck to what I know then. I misread the documentation. Hopefully
> > > > > the attached looks better. I don't see the new debug you added emitted.
> > > >
> > > > That is odd. Please try
> > > >
> > > > echo "module usbcore +mfp" > /sys/kernel/debug/dynamic_debug/control
> > > >
> > > > with the attached improved patch.
> > >
> > > Hmm still no luck
> > >
> > >
> > > # history
> > > 0 echo "module usbcore +mfp" > /sys/kernel/debug/dynamic_debug/control
> > > 1 swapoff -a
> > > 2 echo suspend > /sys/power/disk
> > > 3 echo disk > /sys/power/state
> > > 4 dmesg > usb.dmesg
> >
> > What happens if you omit step 1 (the swapoff)?
>
> It seems to hibernate (suspend) successfully. If I omit that step I must setup
> a wakealarm to trigger the wakeup, but that's it.

You don't have any other wakeup sources? Like a power button?

> I attached the dmesg; I didn't reboot the system in between.
>
>
> # history
> 0 echo "module usbcore +mfp" > /sys/kernel/debug/dynamic_debug/control
> 1 swapoff -a
> 2 echo suspend > /sys/power/disk
> 3 echo disk > /sys/power/state
> 4 dmesg > usb.dmesg
> 5 history
> 6 grep URB /sys/kernel/debug/dynamic_debug/control
> 7 grep "URB allocated" /sys/kernel/debug/dynamic_debug/control
> 8 swapon -a
> 9 echo +60 > /sys/class/rtc/rtc0/wakealarm
> 10 echo disk > /sys/power/state
> 11 dmesg > usb.dmesg

This certainly reinforces the initial impression that the cause of the
warnings is a bug in the platform code. You should ask the appropriate
maintainer.

However, an equally troubling question is why the usb2 bus never got
suspended in the first place. To solve that, you may need to enable
dynamic debugging in the Power Management core (i.e., "file
drivers/base/power/* +p").

Alan Stern

2020-03-24 17:23:32

by Qais Yousef

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On 03/24/20 11:56, Alan Stern wrote:
> On Tue, 24 Mar 2020, Qais Yousef wrote:
>
> > On 03/24/20 09:52, Alan Stern wrote:
> > > On Tue, 24 Mar 2020, Qais Yousef wrote:
> > >
> > > > On 03/24/20 14:20, Oliver Neukum wrote:
> > > > > Am Dienstag, den 24.03.2020, 10:46 +0000 schrieb Qais Yousef:
> > > > > >
> > > > > > I should have stuck to what I know then. I misread the documentation. Hopefully
> > > > > > the attached looks better. I don't see the new debug you added emitted.
> > > > >
> > > > > That is odd. Please try
> > > > >
> > > > > echo "module usbcore +mfp" > /sys/kernel/debug/dynamic_debug/control
> > > > >
> > > > > with the attached improved patch.
> > > >
> > > > Hmm still no luck
> > > >
> > > >
> > > > # history
> > > > 0 echo "module usbcore +mfp" > /sys/kernel/debug/dynamic_debug/control
> > > > 1 swapoff -a
> > > > 2 echo suspend > /sys/power/disk
> > > > 3 echo disk > /sys/power/state
> > > > 4 dmesg > usb.dmesg
> > >
> > > What happens if you omit step 1 (the swapoff)?
> >
> > It seems to hibernate (suspend) successfully. If I omit that step I must setup
> > a wakealarm to trigger the wakeup, but that's it.
>
> You don't have any other wakeup sources? Like a power button?

Not sure if it's hooked correctly as a wakeup source. But as UK is now getting
lockedown, I don't think I'll be seeing the board for a while and serial
console is my only friend :-)

I can hard reboot remotely reliably though.

>
> > I attached the dmesg; I didn't reboot the system in between.
> >
> >
> > # history
> > 0 echo "module usbcore +mfp" > /sys/kernel/debug/dynamic_debug/control
> > 1 swapoff -a
> > 2 echo suspend > /sys/power/disk
> > 3 echo disk > /sys/power/state
> > 4 dmesg > usb.dmesg
> > 5 history
> > 6 grep URB /sys/kernel/debug/dynamic_debug/control
> > 7 grep "URB allocated" /sys/kernel/debug/dynamic_debug/control
> > 8 swapon -a
> > 9 echo +60 > /sys/class/rtc/rtc0/wakealarm
> > 10 echo disk > /sys/power/state
> > 11 dmesg > usb.dmesg
>
> This certainly reinforces the initial impression that the cause of the
> warnings is a bug in the platform code. You should ask the appropriate
> maintainer.

The device-tree compatible node returns "generic-ohci".
drivers/usb/host/ohci-platform.c returns you as the maintainer :-)

>
> However, an equally troubling question is why the usb2 bus never got
> suspended in the first place. To solve that, you may need to enable
> dynamic debugging in the Power Management core (i.e., "file
> drivers/base/power/* +p").

Thanks Alan. I'll run with extra debug and send back.

Cheers

--
Qais Yousef

2020-03-24 19:23:04

by Alan Stern

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Tue, 24 Mar 2020, Qais Yousef wrote:

> On 03/24/20 11:56, Alan Stern wrote:

> > This certainly reinforces the initial impression that the cause of the
> > warnings is a bug in the platform code. You should ask the appropriate
> > maintainer.
>
> The device-tree compatible node returns "generic-ohci".
> drivers/usb/host/ohci-platform.c returns you as the maintainer :-)

I'm the maintainer of the driver for the device. But the device
structure itself (the one named 7ffb0000.ohci) gets created by
device-tree -- that's what I was referring to.

Here's the first error message:

usb usb2: runtime PM trying to activate child device usb2 but parent (7ffb0000.ohci) is not active

The runtime PM status of 7ffb0000.ohci is set in ohci_platform_probe(),
which does:

pm_runtime_set_active(&dev->dev);

The runtime PM status can change, and there aren't any debugging
statements in ohci_platform_suspend() or ohci_platform_resume() (or
ohci_suspend()/ohci_resume() in ohci-hcd.c, for that matter). Maybe
you can add some so we can see if anything strange is going on.

Any maybe you can find out exactly where that error message is coming
from by calling dump_stack() immediately after the dev_err() line
(approximately line 1198 in drivers/base/power/runtime.c).

(Also, you might want to turn off rcutorture. It adds a lot of
messages to the system log that are irrelevant for our purposes.)

Alan Stern

2020-03-25 15:01:05

by Qais Yousef

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On 03/24/20 15:22, Alan Stern wrote:
> On Tue, 24 Mar 2020, Qais Yousef wrote:
>
> > On 03/24/20 11:56, Alan Stern wrote:
>
> > > This certainly reinforces the initial impression that the cause of the
> > > warnings is a bug in the platform code. You should ask the appropriate
> > > maintainer.
> >
> > The device-tree compatible node returns "generic-ohci".
> > drivers/usb/host/ohci-platform.c returns you as the maintainer :-)
>
> I'm the maintainer of the driver for the device. But the device
> structure itself (the one named 7ffb0000.ohci) gets created by
> device-tree -- that's what I was referring to.
>
> Here's the first error message:
>
> usb usb2: runtime PM trying to activate child device usb2 but parent (7ffb0000.ohci) is not active
>
> The runtime PM status of 7ffb0000.ohci is set in ohci_platform_probe(),
> which does:
>
> pm_runtime_set_active(&dev->dev);
>
> The runtime PM status can change, and there aren't any debugging
> statements in ohci_platform_suspend() or ohci_platform_resume() (or
> ohci_suspend()/ohci_resume() in ohci-hcd.c, for that matter). Maybe
> you can add some so we can see if anything strange is going on.
>
> Any maybe you can find out exactly where that error message is coming
> from by calling dump_stack() immediately after the dev_err() line
> (approximately line 1198 in drivers/base/power/runtime.c).
>
> (Also, you might want to turn off rcutorture. It adds a lot of
> messages to the system log that are irrelevant for our purposes.)

Thanks for all the hints Alan.

I think I figured it out, the below patch seems to fix it for me. Looking
at other drivers resume functions it seems we're missing the
pm_runtime_disable()->set_active()->enable() dance. Doing that fixes the
warning and the dev_err() in driver/base/power.

I don't see xhci-plat.c doing that, I wonder if it needs it too.

I'm not well versed about the details and the rules here. So my fix could be
a hack, though it does seem the right thing to do.

I wonder why the power core doesn't handle this transparently..

Requested dmesg is attached too.


diff --git a/drivers/usb/host/ohci-platform.c b/drivers/usb/host/ohci-platform.c
index 7addfc2cbadc..eb92c8092fae 100644
--- a/drivers/usb/host/ohci-platform.c
+++ b/drivers/usb/host/ohci-platform.c
@@ -299,6 +299,10 @@ static int ohci_platform_resume(struct device *dev)
}

ohci_resume(hcd, false);
+
+ pm_runtime_disable(dev);
+ pm_runtime_set_active(dev);
+ pm_runtime_enable(dev);
return 0;
}
#endif /* CONFIG_PM_SLEEP */


Thanks

--
Qais Yousef


Attachments:
(No filename) (2.62 kB)
usb.dmesg (49.12 kB)
Download all attachments

2020-03-25 20:50:20

by Alan Stern

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Wed, 25 Mar 2020, Qais Yousef wrote:

> Thanks for all the hints Alan.
>
> I think I figured it out, the below patch seems to fix it for me. Looking
> at other drivers resume functions it seems we're missing the
> pm_runtime_disable()->set_active()->enable() dance. Doing that fixes the
> warning and the dev_err() in driver/base/power.

Ah, yes. This should have been added years ago; guess I forgot. :-(

> I don't see xhci-plat.c doing that, I wonder if it needs it too.
>
> I'm not well versed about the details and the rules here. So my fix could be
> a hack, though it does seem the right thing to do.
>
> I wonder why the power core doesn't handle this transparently..

Initially, we didn't want the PM core to do this automatically because
we thought some devices might want to remain runtime-suspended
following a system resume, and only the device driver would know what
to do.

Raphael, now that we have the direct_complete mechanism, can we revisit
this? Should the PM core automatically call pm_runtime_set_active() if
dev->power.direct_complete isn't set? Perhaps in device_resume_early()
prior to the pm_runtime_enable() call?

It's possible we discussed this and decided against it at the time when
direct_complete was added, but if so I don't remember what was said.

Alan Stern

> diff --git a/drivers/usb/host/ohci-platform.c b/drivers/usb/host/ohci-platform.c
> index 7addfc2cbadc..eb92c8092fae 100644
> --- a/drivers/usb/host/ohci-platform.c
> +++ b/drivers/usb/host/ohci-platform.c
> @@ -299,6 +299,10 @@ static int ohci_platform_resume(struct device *dev)
> }
>
> ohci_resume(hcd, false);
> +
> + pm_runtime_disable(dev);
> + pm_runtime_set_active(dev);
> + pm_runtime_enable(dev);
> return 0;
> }
> #endif /* CONFIG_PM_SLEEP */
>
>
> Thanks
>
> --
> Qais Yousef


2020-03-25 21:28:59

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Wed, Mar 25, 2020 at 9:49 PM Alan Stern <[email protected]> wrote:
>
> On Wed, 25 Mar 2020, Qais Yousef wrote:
>
> > Thanks for all the hints Alan.
> >
> > I think I figured it out, the below patch seems to fix it for me. Looking
> > at other drivers resume functions it seems we're missing the
> > pm_runtime_disable()->set_active()->enable() dance. Doing that fixes the
> > warning and the dev_err() in driver/base/power.
>
> Ah, yes. This should have been added years ago; guess I forgot. :-(
>
> > I don't see xhci-plat.c doing that, I wonder if it needs it too.
> >
> > I'm not well versed about the details and the rules here. So my fix could be
> > a hack, though it does seem the right thing to do.
> >
> > I wonder why the power core doesn't handle this transparently..
>
> Initially, we didn't want the PM core to do this automatically because
> we thought some devices might want to remain runtime-suspended
> following a system resume, and only the device driver would know what
> to do.
>
> Raphael, now that we have the direct_complete mechanism, can we revisit
> this? Should the PM core automatically call pm_runtime_set_active() if
> dev->power.direct_complete isn't set? Perhaps in device_resume_early()
> prior to the pm_runtime_enable() call?
>
> It's possible we discussed this and decided against it at the time when
> direct_complete was added, but if so I don't remember what was said.

Me neither. :-)

That said complexity has grown since then and there are the
DPM_FLAG_SMART_SUSPEND and DPM_FLAG_LEAVE_SUSPENDED flags that can be
used to control that behavior to some extent.

Setting DPM_FLAG_SMART_SUSPEND alone, in particular, causes
pm_runtime_set_active() to be called at the noirq stage of device
resume either by the core or by bus types (e.g. PCI) etc.

It looks like ohci-platform might use DPM_FLAG_SMART_SUSPEND, but I
need to take a closer look at that (possibly later this week).

Cheers!


>
> > diff --git a/drivers/usb/host/ohci-platform.c b/drivers/usb/host/ohci-platform.c
> > index 7addfc2cbadc..eb92c8092fae 100644
> > --- a/drivers/usb/host/ohci-platform.c
> > +++ b/drivers/usb/host/ohci-platform.c
> > @@ -299,6 +299,10 @@ static int ohci_platform_resume(struct device *dev)
> > }
> >
> > ohci_resume(hcd, false);
> > +
> > + pm_runtime_disable(dev);
> > + pm_runtime_set_active(dev);
> > + pm_runtime_enable(dev);
> > return 0;
> > }
> > #endif /* CONFIG_PM_SLEEP */
> >
> >
> > Thanks
> >
> > --
> > Qais Yousef
>
>

2020-03-26 12:28:27

by Qais Yousef

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On 03/25/20 22:28, Rafael J. Wysocki wrote:
> On Wed, Mar 25, 2020 at 9:49 PM Alan Stern <[email protected]> wrote:
> >
> > On Wed, 25 Mar 2020, Qais Yousef wrote:
> >
> > > Thanks for all the hints Alan.
> > >
> > > I think I figured it out, the below patch seems to fix it for me. Looking
> > > at other drivers resume functions it seems we're missing the
> > > pm_runtime_disable()->set_active()->enable() dance. Doing that fixes the
> > > warning and the dev_err() in driver/base/power.
> >
> > Ah, yes. This should have been added years ago; guess I forgot. :-(
> >
> > > I don't see xhci-plat.c doing that, I wonder if it needs it too.
> > >
> > > I'm not well versed about the details and the rules here. So my fix could be
> > > a hack, though it does seem the right thing to do.
> > >
> > > I wonder why the power core doesn't handle this transparently..
> >
> > Initially, we didn't want the PM core to do this automatically because
> > we thought some devices might want to remain runtime-suspended
> > following a system resume, and only the device driver would know what
> > to do.
> >
> > Raphael, now that we have the direct_complete mechanism, can we revisit
> > this? Should the PM core automatically call pm_runtime_set_active() if
> > dev->power.direct_complete isn't set? Perhaps in device_resume_early()
> > prior to the pm_runtime_enable() call?
> >
> > It's possible we discussed this and decided against it at the time when
> > direct_complete was added, but if so I don't remember what was said.
>
> Me neither. :-)
>
> That said complexity has grown since then and there are the
> DPM_FLAG_SMART_SUSPEND and DPM_FLAG_LEAVE_SUSPENDED flags that can be
> used to control that behavior to some extent.
>
> Setting DPM_FLAG_SMART_SUSPEND alone, in particular, causes
> pm_runtime_set_active() to be called at the noirq stage of device
> resume either by the core or by bus types (e.g. PCI) etc.
>
> It looks like ohci-platform might use DPM_FLAG_SMART_SUSPEND, but I
> need to take a closer look at that (possibly later this week).

Okay I take it this was root caused correctly and now it's a question of which
is a better fix.

Thanks!

--
Qais Yousef

2020-03-27 20:45:44

by Alan Stern

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Thu, 26 Mar 2020, Qais Yousef wrote:

> On 03/25/20 22:28, Rafael J. Wysocki wrote:
> > On Wed, Mar 25, 2020 at 9:49 PM Alan Stern <[email protected]> wrote:

> > > Raphael, now that we have the direct_complete mechanism, can we revisit
> > > this? Should the PM core automatically call pm_runtime_set_active() if
> > > dev->power.direct_complete isn't set? Perhaps in device_resume_early()
> > > prior to the pm_runtime_enable() call?
> > >
> > > It's possible we discussed this and decided against it at the time when
> > > direct_complete was added, but if so I don't remember what was said.
> >
> > Me neither. :-)
> >
> > That said complexity has grown since then and there are the
> > DPM_FLAG_SMART_SUSPEND and DPM_FLAG_LEAVE_SUSPENDED flags that can be
> > used to control that behavior to some extent.
> >
> > Setting DPM_FLAG_SMART_SUSPEND alone, in particular, causes
> > pm_runtime_set_active() to be called at the noirq stage of device
> > resume either by the core or by bus types (e.g. PCI) etc.
> >
> > It looks like ohci-platform might use DPM_FLAG_SMART_SUSPEND, but I
> > need to take a closer look at that (possibly later this week).
>
> Okay I take it this was root caused correctly and now it's a question of which
> is a better fix.

Indeed.

Raphael, I've been going over the PM core code, trying to figure out
what it's really doing. It's kind of a mess.

A large part of the problem is related to an inconsistency between the
documentation and the code. include/linux/pm.h says that
DPM_FLAG_SMART_SUSPEND tells bus types and PM domains about what the
driver wants. This strongly implies that the PM core will ignore
SMART_SUSPEND. But in fact the core does check that flag and takes its
own actions if the device has no subsystem-level callbacks!

Furthermore, the PM core's actions don't seem to make sense. If the
flag is set and the device is runtime-suspended when the system sleep
begins, the core will skip issuing the suspend_late and suspend_noirq
callbacks to the driver. But it doesn't skip issuing the suspend
callback! I can't figure that out. Furthermore, the decisions about
whether to skip the resume_noirq, resume_early, and resume callbacks
are based on different criteria from the decisions on the suspend side.

That's not all: The SMART_SUSPEND decisions completely ignore the value
of DPM_FLAG_NEVER_SKIP! NEVER_SKIP affects only the direct_completion
pathway.

SMART_SUSPEND seems to have two different meanings. (1) If the device
is already in runtime suspend when a system sleep starts, skip the
suspend_late and suspend_noirq callbacks. (2) Under certain (similar)
circumstances, skip the resume callbacks. The documentation only
mentions (1) but the code also handles (2).

Other things in there also seem strange. device_prepare() does a
WARN_ON if either SMART_SUSPEND or LEAVE_SUSPENDED is set and the
device is not runtime-PM-enabled. That's understandable, but it's also
racy. A system sleep can begin at any time; how can a driver know when
it is safe to disable a device's runtime PM briefly?

When device_prepare() calculates the power.direct_complete flag, it
checks to see whether the device is currently in runtime suspend in
some cases but not in others, as in the code added by your commit
c62ec4610c40 ("PM / core: Fix direct_complete handling for devices
with no callbacks"). Since the runtime-PM state is going to checked in
__device_suspend() anyway, we shouldn't need to check it here at all.

At a couple of points in the code, THAW and RESTORE events are each
treatedly specially, with no explanation.

The power.may_skip_resume flag is used in only one place, when
LEAVE_SUSPENDED is set and there are subsystem-level callbacks. In
particular, it is _not_ used by dev_pm_may_skip_resume(). That seems
highly suspicious at best.

I think it would be worthwhile to expend some serious effort
straightening all this stuff out. Perhaps we could start with a more
explicit description of what is supposed to happen at each step.
(Things to be careful about include phrases like "leave suspended",
which is not the same as "don't call the resume callbacks", even though
the two are easily conflated.)

What do you think?

Alan Stern

2020-03-28 14:15:53

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Friday, March 27, 2020 9:45:09 PM CET Alan Stern wrote:
> On Thu, 26 Mar 2020, Qais Yousef wrote:
>
> > On 03/25/20 22:28, Rafael J. Wysocki wrote:
> > > On Wed, Mar 25, 2020 at 9:49 PM Alan Stern <[email protected]> wrote:
>
> > > > Raphael, now that we have the direct_complete mechanism, can we revisit
> > > > this? Should the PM core automatically call pm_runtime_set_active() if
> > > > dev->power.direct_complete isn't set? Perhaps in device_resume_early()
> > > > prior to the pm_runtime_enable() call?
> > > >
> > > > It's possible we discussed this and decided against it at the time when
> > > > direct_complete was added, but if so I don't remember what was said.
> > >
> > > Me neither. :-)
> > >
> > > That said complexity has grown since then and there are the
> > > DPM_FLAG_SMART_SUSPEND and DPM_FLAG_LEAVE_SUSPENDED flags that can be
> > > used to control that behavior to some extent.
> > >
> > > Setting DPM_FLAG_SMART_SUSPEND alone, in particular, causes
> > > pm_runtime_set_active() to be called at the noirq stage of device
> > > resume either by the core or by bus types (e.g. PCI) etc.
> > >
> > > It looks like ohci-platform might use DPM_FLAG_SMART_SUSPEND, but I
> > > need to take a closer look at that (possibly later this week).
> >
> > Okay I take it this was root caused correctly and now it's a question of which
> > is a better fix.
>
> Indeed.
>
> Raphael, I've been going over the PM core code, trying to figure out
> what it's really doing. It's kind of a mess.

Well, sorry about that.

> A large part of the problem is related to an inconsistency between the
> documentation and the code. include/linux/pm.h says that
> DPM_FLAG_SMART_SUSPEND tells bus types and PM domains about what the
> driver wants. This strongly implies that the PM core will ignore
> SMART_SUSPEND. But in fact the core does check that flag and takes its
> own actions if the device has no subsystem-level callbacks!

Right, which is because in those cases there is no "middle layer" between
the driver and the core and if you want the driver to work both with
something like genpd or the ACPI PM domain and without anything like that,
the core needs to take those actions for consistency.

> Furthermore, the PM core's actions don't seem to make sense. If the
> flag is set and the device is runtime-suspended when the system sleep
> begins, the core will skip issuing the suspend_late and suspend_noirq
> callbacks to the driver. But it doesn't skip issuing the suspend
> callback! I can't figure that out.

That's because if the core gets to executing ->suspend_late, PM-runtime has
been disabled for the device and if the device is runtime-suspended at that
point, so (at least if SMART_SUSPEND is set for the device) there is no reason
to do anything more to it.

> Furthermore, the decisions about
> whether to skip the resume_noirq, resume_early, and resume callbacks
> are based on different criteria from the decisions on the suspend side.

Right, because there are drivers that don't want devices to stay in suspend
after system resume even though they have been left in suspend by it.

Arguably, they could be left in suspend and then resumed after the completion
of system suspend, but that would add quite a bit of latency if the device
needs to be accessed right after the system suspend is complete.

> That's not all: The SMART_SUSPEND decisions completely ignore the value
> of DPM_FLAG_NEVER_SKIP! NEVER_SKIP affects only the direct_completion
> pathway.

As documented AFAICS.

> SMART_SUSPEND seems to have two different meanings. (1) If the device
> is already in runtime suspend when a system sleep starts, skip the
> suspend_late and suspend_noirq callbacks. (2) Under certain (similar)
> circumstances, skip the resume callbacks. The documentation only
> mentions (1) but the code also handles (2).

That's because (2) is the THAW case and I was distracted before I got
to documenting it properly. Sorry.

The problem is that if you leave the device in runtime suspend, calling
->freeze_late() or ->freeze_noirq() on it is not useful and if you have
skipped those, running the corresponding "thaw" callbacks is not useful
either (what would they do, specifically?).

There is a whole problem of whether or not devices should be left in
runtime suspend during hibernation and I have not had a chance to get
to the bottom of that yet.

> Other things in there also seem strange. device_prepare() does a
> WARN_ON if either SMART_SUSPEND or LEAVE_SUSPENDED is set and the
> device is not runtime-PM-enabled. That's understandable, but it's also
> racy.

I guess you mean the check in device_prepare().

> A system sleep can begin at any time; how can a driver know when
> it is safe to disable a device's runtime PM briefly?

Well, fair enough, but then I'm not sure if there is a good place for this
check at all, because drivers can briefly disable PM-runtime at any time in
theory.

> When device_prepare() calculates the power.direct_complete flag, it
> checks to see whether the device is currently in runtime suspend in
> some cases but not in others, as in the code added by your commit
> c62ec4610c40 ("PM / core: Fix direct_complete handling for devices
> with no callbacks"). Since the runtime-PM state is going to checked in
> __device_suspend() anyway, we shouldn't need to check it here at all.

I guess the point is that in theory the device can be runtime-suspended
between device_prepare() and _device_suspend(), is by checking the status
in the former, we lose the opportunity to leave it in suspend if that
happens.

OK, fair enough.

> At a couple of points in the code, THAW and RESTORE events are each
> treatedly specially, with no explanation.

Right, which is related to the kind of work in progress situation regarding
the flags and hibernation mentioned above. Again, sorry about that.

> The power.may_skip_resume flag is used in only one place, when
> LEAVE_SUSPENDED is set and there are subsystem-level callbacks. In
> particular, it is _not_ used by dev_pm_may_skip_resume(). That seems
> highly suspicious at best.

That's because it's for the middle-layer (subsystem-level) code to let the
core know that skipping the resume would be OK.

The core doesn't need that flag when it decides by itself.

> I think it would be worthwhile to expend some serious effort
> straightening all this stuff out. Perhaps we could start with a more
> explicit description of what is supposed to happen at each step.
> (Things to be careful about include phrases like "leave suspended",
> which is not the same as "don't call the resume callbacks", even though
> the two are easily conflated.)
>
> What do you think?

I am certainly not going to reject any help. :-)

Also, I'm not against clarifying anything that is not clear enough.

Cheers!



2020-03-28 19:59:28

by Alan Stern

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Sat, 28 Mar 2020, Rafael J. Wysocki wrote:

> On Friday, March 27, 2020 9:45:09 PM CET Alan Stern wrote:

> > Raphael, I've been going over the PM core code, trying to figure out
> > what it's really doing. It's kind of a mess.
>
> Well, sorry about that.
>
> > A large part of the problem is related to an inconsistency between the
> > documentation and the code. include/linux/pm.h says that
> > DPM_FLAG_SMART_SUSPEND tells bus types and PM domains about what the
> > driver wants. This strongly implies that the PM core will ignore
> > SMART_SUSPEND. But in fact the core does check that flag and takes its
> > own actions if the device has no subsystem-level callbacks!
>
> Right, which is because in those cases there is no "middle layer" between
> the driver and the core and if you want the driver to work both with
> something like genpd or the ACPI PM domain and without anything like that,
> the core needs to take those actions for consistency.

Sure, the core is acting as a proxy for the missing subsystem
callbacks. Still, it should be documented properly.

Also, couldn't we simplify the behavior? Right now the core checks
that there are no subsystem-level callbacks for any of _early, _late,
and _noirq variants before skipping a callback. Couldn't we forget
about all that checking and simply skip the device-level callbacks?
(Yes, I realize this could lead to inconsistent behavior if the
subsystem has some callbacks defined but not others -- but subsystems
should never do that, at least, not if it would lead to trouble.)

Another issue is that the documentation exists in two separate places:
include/linux/pm.h and Documentation/driver-api/devices.rst (plus a
brief mention in Documentation/power/runtime_pm.rst). This leads to
incompleteness and inconsistencies. Ideally there would be a complete
explanation in one place (probably the devices.rst file) and the others
would refer to it.

> > Furthermore, the PM core's actions don't seem to make sense. If the
> > flag is set and the device is runtime-suspended when the system sleep
> > begins, the core will skip issuing the suspend_late and suspend_noirq
> > callbacks to the driver. But it doesn't skip issuing the suspend
> > callback! I can't figure that out.
>
> That's because if the core gets to executing ->suspend_late, PM-runtime has
> been disabled for the device and if the device is runtime-suspended at that
> point, so (at least if SMART_SUSPEND is set for the device) there is no reason
> to do anything more to it.

But if SMART_SUSPEND is set and the device is runtime-suspended, why
issue the ->suspend callback? Why not just do pm_runtime_disable()
then (to prevent the device from resuming) and skip the callback?

> > Furthermore, the decisions about
> > whether to skip the resume_noirq, resume_early, and resume callbacks
> > are based on different criteria from the decisions on the suspend side.
>
> Right, because there are drivers that don't want devices to stay in suspend
> after system resume even though they have been left in suspend by it.

This suggests that sometimes we may want to issue non-matching
callbacks. For example, ->resume_noirq, ->resume_early, and ->resume
but not ->suspend, ->suspend_late, or ->suspend_noirq. Is that what
you are saying?

> Arguably, they could be left in suspend and then resumed after the completion
> of system suspend, but that would add quite a bit of latency if the device
> needs to be accessed right after the system suspend is complete.
>
> > That's not all: The SMART_SUSPEND decisions completely ignore the value
> > of DPM_FLAG_NEVER_SKIP! NEVER_SKIP affects only the direct_completion
> > pathway.
>
> As documented AFAICS.

But highly confusing. Maybe we can change the name to, say,
DPM_FLAG_NO_DIRECT_COMPLETE.

> > SMART_SUSPEND seems to have two different meanings. (1) If the device
> > is already in runtime suspend when a system sleep starts, skip the
> > suspend_late and suspend_noirq callbacks. (2) Under certain (similar)
> > circumstances, skip the resume callbacks. The documentation only
> > mentions (1) but the code also handles (2).
>
> That's because (2) is the THAW case and I was distracted before I got
> to documenting it properly. Sorry.
>
> The problem is that if you leave the device in runtime suspend, calling
> ->freeze_late() or ->freeze_noirq() on it is not useful and if you have
> skipped those, running the corresponding "thaw" callbacks is not useful
> either (what would they do, specifically?).
>
> There is a whole problem of whether or not devices should be left in
> runtime suspend during hibernation and I have not had a chance to get
> to the bottom of that yet.

Not only that. The distinction between SMART_SUSPEND and
direct_complete is rather subtle, and it doesn't seem to be carefully
explained anywhere. In fact, I'm not sure I understand it myself. :-)
For example, the direct_complete mechanism is very careful about not
leaving a device in runtime suspend if a descendant (or other dependent
device) will remain active. Does SMART_SUSPEND behave similarly? If
so, it isn't documented.

Besides, it seems like a mistake to try controlling (1) and (2)
together (i.e., with one flag). Can we do a better job of
separating the functions of SMART_SUSPEND and LEAVE_SUSPENDED?

> > Other things in there also seem strange. device_prepare() does a
> > WARN_ON if either SMART_SUSPEND or LEAVE_SUSPENDED is set and the
> > device is not runtime-PM-enabled. That's understandable, but it's also
> > racy.
>
> I guess you mean the check in device_prepare().
>
> > A system sleep can begin at any time; how can a driver know when
> > it is safe to disable a device's runtime PM briefly?
>
> Well, fair enough, but then I'm not sure if there is a good place for this
> check at all, because drivers can briefly disable PM-runtime at any time in
> theory.

There probably isn't a good place for it. We could just get rid of the
WARN. I've never heard of it triggering.

> > When device_prepare() calculates the power.direct_complete flag, it
> > checks to see whether the device is currently in runtime suspend in
> > some cases but not in others, as in the code added by your commit
> > c62ec4610c40 ("PM / core: Fix direct_complete handling for devices
> > with no callbacks"). Since the runtime-PM state is going to checked in
> > __device_suspend() anyway, we shouldn't need to check it here at all.
>
> I guess the point is that in theory the device can be runtime-suspended
> between device_prepare() and _device_suspend(), is by checking the status
> in the former, we lose the opportunity to leave it in suspend if that
> happens.
>
> OK, fair enough.
>
> > At a couple of points in the code, THAW and RESTORE events are each
> > treatedly specially, with no explanation.
>
> Right, which is related to the kind of work in progress situation regarding
> the flags and hibernation mentioned above. Again, sorry about that.

I haven't thought about those issues as much as you have. Still, it
seems obvious that the FREEZE/THAW phases should be very happy to leave
devices in runtime suspend throughout (without even worrying about
wakeup settings), and the RESTORE phase should always bring everything
back out of runtime suspend.

What to do during the POWEROFF phase isn't so clear, because it depends
on how the platform handles the poweroff transition.

> > The power.may_skip_resume flag is used in only one place, when
> > LEAVE_SUSPENDED is set and there are subsystem-level callbacks. In
> > particular, it is _not_ used by dev_pm_may_skip_resume(). That seems
> > highly suspicious at best.
>
> That's because it's for the middle-layer (subsystem-level) code to let the
> core know that skipping the resume would be OK.
>
> The core doesn't need that flag when it decides by itself.

This may be another situation where changing a name would make things
clearer. One doesn't immediately recognize that
dev_pm_may_skip_resume() applies only in cases where there is no
subsystem-level callback.

> > I think it would be worthwhile to expend some serious effort
> > straightening all this stuff out. Perhaps we could start with a more
> > explicit description of what is supposed to happen at each step.
> > (Things to be careful about include phrases like "leave suspended",
> > which is not the same as "don't call the resume callbacks", even though
> > the two are easily conflated.)
> >
> > What do you think?
>
> I am certainly not going to reject any help. :-)
>
> Also, I'm not against clarifying anything that is not clear enough.

Okay, let's start with direct_complete. The direct_complete mechanism
is applied to the SUSPEND and RESUME phases under the following
conditions:

DPM_FLAG_NEVER_SKIP (or better, DPM_FLAG_NO_DIRECT_COMPLETE)
is clear; [Incidentally, since a driver can set this flag
whenever its ->prepare routine returns 0, why do we need
DPM_FLAG_SMART_PREPARE?]

Either the device has no system-PM callbacks at all or else the
->prepare callback returns a positive value;

All of the device's descendants and dependents also want to use
direct_complete;

Neither the device nor any of its descendants/dependents is
enabled for wakeup;

The device is runtime suspended just before the ->suspend
callback would normally be issued.

When the mechanism applies, none of the suspend or resume callbacks (in
any of their normal, _early, _late, or _noirq variants) are issued,
only ->complete. Consequently the device remains in runtime suspend
throughout the system sleep.

Currently, direct_complete is never applied during any of the system
hibernation phases (FREEZE, THAW, POWEROFF, RESTORE). This may change
in the future.

Is this description correct and complete? Can you give a similarly
succinct outline for how SMART_SUSPEND and LEAVE_SUSPENDED should work?
And also describe how they differ from direct_complete and how they
interact with it? (For example, how does setting both flags differ
from returning a positive value from ->prepare?)

Alan Stern

2020-03-29 09:21:11

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Sat, Mar 28, 2020 at 8:58 PM Alan Stern <[email protected]> wrote:
>
> On Sat, 28 Mar 2020, Rafael J. Wysocki wrote:
>
> > On Friday, March 27, 2020 9:45:09 PM CET Alan Stern wrote:
>
> > > Raphael, I've been going over the PM core code, trying to figure out
> > > what it's really doing. It's kind of a mess.
> >
> > Well, sorry about that.
> >
> > > A large part of the problem is related to an inconsistency between the
> > > documentation and the code. include/linux/pm.h says that
> > > DPM_FLAG_SMART_SUSPEND tells bus types and PM domains about what the
> > > driver wants. This strongly implies that the PM core will ignore
> > > SMART_SUSPEND. But in fact the core does check that flag and takes its
> > > own actions if the device has no subsystem-level callbacks!
> >
> > Right, which is because in those cases there is no "middle layer" between
> > the driver and the core and if you want the driver to work both with
> > something like genpd or the ACPI PM domain and without anything like that,
> > the core needs to take those actions for consistency.
>
> Sure, the core is acting as a proxy for the missing subsystem
> callbacks. Still, it should be documented properly.
>
> Also, couldn't we simplify the behavior? Right now the core checks
> that there are no subsystem-level callbacks for any of _early, _late,
> and _noirq variants before skipping a callback. Couldn't we forget
> about all that checking and simply skip the device-level callbacks?
> (Yes, I realize this could lead to inconsistent behavior if the
> subsystem has some callbacks defined but not others -- but subsystems
> should never do that, at least, not if it would lead to trouble.)

In quite a few cases the middle layer has nothing specific to do in a
given phase of suspend/resume, but the driver may.

Subsystems haven't been required to provide callbacks for all phases
so far, so this change would require some modifications in there.

I actually prefer the core to do more, even if that means more
complexity in it, to avoid possible subtle differences in behavior
between subsystems.

> Another issue is that the documentation exists in two separate places:
> include/linux/pm.h and Documentation/driver-api/devices.rst (plus a
> brief mention in Documentation/power/runtime_pm.rst). This leads to
> incompleteness and inconsistencies. Ideally there would be a complete
> explanation in one place (probably the devices.rst file) and the others
> would refer to it.

OK

> > > Furthermore, the PM core's actions don't seem to make sense. If the
> > > flag is set and the device is runtime-suspended when the system sleep
> > > begins, the core will skip issuing the suspend_late and suspend_noirq
> > > callbacks to the driver. But it doesn't skip issuing the suspend
> > > callback! I can't figure that out.
> >
> > That's because if the core gets to executing ->suspend_late, PM-runtime has
> > been disabled for the device and if the device is runtime-suspended at that
> > point, so (at least if SMART_SUSPEND is set for the device) there is no reason
> > to do anything more to it.
>
> But if SMART_SUSPEND is set and the device is runtime-suspended, why
> issue the ->suspend callback?

The driver itself or the middle-layer may want to resume the device.

Arguably, it may do that in ->prepare() too, but see below.

> Why not just do pm_runtime_disable()
> then (to prevent the device from resuming) and skip the callback?

Because another driver may want to runtime-resume that device in order
to use it for something before ->suspend_late(). Of course, you may
argue that this means a missing device link or similar, so it is not
clear-cut.

The general rule is that "synchronous" PM-runtime can be expected to
work before ->suspend_late(), so ->suspend() callbacks should be able
to use it safely in all cases in principle.

That expectation goes against direct_complete in some cases, so
drivers need to set NEVER_SKIP (or whatever it will be called in the
future) to avoid that problem.

> > > Furthermore, the decisions about
> > > whether to skip the resume_noirq, resume_early, and resume callbacks
> > > are based on different criteria from the decisions on the suspend side.
> >
> > Right, because there are drivers that don't want devices to stay in suspend
> > after system resume even though they have been left in suspend by it.
>
> This suggests that sometimes we may want to issue non-matching
> callbacks. For example, ->resume_noirq, ->resume_early, and ->resume
> but not ->suspend, ->suspend_late, or ->suspend_noirq. Is that what
> you are saying?

Yes.

As per devices.rst:

"the driver must be prepared to
cope with the invocation of its system-wide resume callbacks back-to-back with
its ``->runtime_suspend`` one (without the intervening ``->runtime_resume`` and
so on) and the final state of the device must reflect the "active" runtime PM
status in that case."

> > Arguably, they could be left in suspend and then resumed after the completion
> > of system suspend, but that would add quite a bit of latency if the device
> > needs to be accessed right after the system suspend is complete.
> >
> > > That's not all: The SMART_SUSPEND decisions completely ignore the value
> > > of DPM_FLAG_NEVER_SKIP! NEVER_SKIP affects only the direct_completion
> > > pathway.
> >
> > As documented AFAICS.
>
> But highly confusing. Maybe we can change the name to, say,
> DPM_FLAG_NO_DIRECT_COMPLETE.

Sure, if that helps. :-)

> > > SMART_SUSPEND seems to have two different meanings. (1) If the device
> > > is already in runtime suspend when a system sleep starts, skip the
> > > suspend_late and suspend_noirq callbacks. (2) Under certain (similar)
> > > circumstances, skip the resume callbacks. The documentation only
> > > mentions (1) but the code also handles (2).
> >
> > That's because (2) is the THAW case and I was distracted before I got
> > to documenting it properly. Sorry.
> >
> > The problem is that if you leave the device in runtime suspend, calling
> > ->freeze_late() or ->freeze_noirq() on it is not useful and if you have
> > skipped those, running the corresponding "thaw" callbacks is not useful
> > either (what would they do, specifically?).
> >
> > There is a whole problem of whether or not devices should be left in
> > runtime suspend during hibernation and I have not had a chance to get
> > to the bottom of that yet.
>
> Not only that. The distinction between SMART_SUSPEND and
> direct_complete is rather subtle, and it doesn't seem to be carefully
> explained anywhere. In fact, I'm not sure I understand it myself. :-)
> For example, the direct_complete mechanism is very careful about not
> leaving a device in runtime suspend if a descendant (or other dependent
> device) will remain active. Does SMART_SUSPEND behave similarly? If
> so, it isn't documented.

The difference is that SMART_SUSPEND allows the ->suspend callback to
be invoked which may decide to resume the device (or reconfigure it
for system wakeup if that doesn't require resuming it). IOW, this
means "I can cope with a runtime-suspended device going forward".
[But if the device is still runtime-suspended during ->suspend_late(),
its configuration is regarded as "final".]

In turn, direct_complete means something like "if this device is
runtime-suspended, leave it as is and don't touch it during the whole
suspend-resume cycle".

> Besides, it seems like a mistake to try controlling (1) and (2)
> together (i.e., with one flag). Can we do a better job of
> separating the functions of SMART_SUSPEND and LEAVE_SUSPENDED?
>
> > > Other things in there also seem strange. device_prepare() does a
> > > WARN_ON if either SMART_SUSPEND or LEAVE_SUSPENDED is set and the
> > > device is not runtime-PM-enabled. That's understandable, but it's also
> > > racy.
> >
> > I guess you mean the check in device_prepare().
> >
> > > A system sleep can begin at any time; how can a driver know when
> > > it is safe to disable a device's runtime PM briefly?
> >
> > Well, fair enough, but then I'm not sure if there is a good place for this
> > check at all, because drivers can briefly disable PM-runtime at any time in
> > theory.
>
> There probably isn't a good place for it. We could just get rid of the
> WARN. I've never heard of it triggering.

OK

> > > When device_prepare() calculates the power.direct_complete flag, it
> > > checks to see whether the device is currently in runtime suspend in
> > > some cases but not in others, as in the code added by your commit
> > > c62ec4610c40 ("PM / core: Fix direct_complete handling for devices
> > > with no callbacks"). Since the runtime-PM state is going to checked in
> > > __device_suspend() anyway, we shouldn't need to check it here at all.
> >
> > I guess the point is that in theory the device can be runtime-suspended
> > between device_prepare() and _device_suspend(), is by checking the status
> > in the former, we lose the opportunity to leave it in suspend if that
> > happens.
> >
> > OK, fair enough.
> >
> > > At a couple of points in the code, THAW and RESTORE events are each
> > > treatedly specially, with no explanation.
> >
> > Right, which is related to the kind of work in progress situation regarding
> > the flags and hibernation mentioned above. Again, sorry about that.
>
> I haven't thought about those issues as much as you have. Still, it
> seems obvious that the FREEZE/THAW phases should be very happy to leave
> devices in runtime suspend throughout (without even worrying about
> wakeup settings), and the RESTORE phase should always bring everything
> back out of runtime suspend.

These were exactly my original thoughts, but then when I started to
consider possible interactions the restore kernel (which also carries
out the "freeze" transition before jumping into the image kernel), it
became less clear.

The concerns is basically whether or not attempting to power on
devices that are already powered on can always be guaranteed to work.

> What to do during the POWEROFF phase isn't so clear, because it depends
> on how the platform handles the poweroff transition.

POWEROFF is exactly analogous to SUSPEND AFAICS.

> > > The power.may_skip_resume flag is used in only one place, when
> > > LEAVE_SUSPENDED is set and there are subsystem-level callbacks. In
> > > particular, it is _not_ used by dev_pm_may_skip_resume(). That seems
> > > highly suspicious at best.
> >
> > That's because it's for the middle-layer (subsystem-level) code to let the
> > core know that skipping the resume would be OK.
> >
> > The core doesn't need that flag when it decides by itself.
>
> This may be another situation where changing a name would make things
> clearer. One doesn't immediately recognize that
> dev_pm_may_skip_resume() applies only in cases where there is no
> subsystem-level callback.

Fair enough.

> > > I think it would be worthwhile to expend some serious effort
> > > straightening all this stuff out. Perhaps we could start with a more
> > > explicit description of what is supposed to happen at each step.
> > > (Things to be careful about include phrases like "leave suspended",
> > > which is not the same as "don't call the resume callbacks", even though
> > > the two are easily conflated.)
> > >
> > > What do you think?
> >
> > I am certainly not going to reject any help. :-)
> >
> > Also, I'm not against clarifying anything that is not clear enough.
>
> Okay, let's start with direct_complete. The direct_complete mechanism
> is applied to the SUSPEND and RESUME phases under the following
> conditions:
>
> DPM_FLAG_NEVER_SKIP (or better, DPM_FLAG_NO_DIRECT_COMPLETE)
> is clear; [Incidentally, since a driver can set this flag
> whenever its ->prepare routine returns 0, why do we need
> DPM_FLAG_SMART_PREPARE?]

Because the former allows the driver to avoid providing a ->prepare
callback always returning 1.

> Either the device has no system-PM callbacks at all or else the
> ->prepare callback returns a positive value;

Why so?

> All of the device's descendants and dependents also want to use
> direct_complete;

Yes.

> Neither the device nor any of its descendants/dependents is
> enabled for wakeup;

Yes.

> The device is runtime suspended just before the ->suspend
> callback would normally be issued.

Yes.

> When the mechanism applies, none of the suspend or resume callbacks (in
> any of their normal, _early, _late, or _noirq variants) are issued,
> only ->complete. Consequently the device remains in runtime suspend
> throughout the system sleep.
>
> Currently, direct_complete is never applied during any of the system
> hibernation phases (FREEZE, THAW, POWEROFF, RESTORE). This may change
> in the future.
>
> Is this description correct and complete?

It is mostly. :-)

> Can you give a similarly
> succinct outline for how SMART_SUSPEND and LEAVE_SUSPENDED should work?
> And also describe how they differ from direct_complete and how they
> interact with it? (For example, how does setting both flags differ
> from returning a positive value from ->prepare?)

I will, but I need some time to do that. Stay tuned.

Cheers!

2020-03-29 14:02:51

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Sun, Mar 29, 2020 at 11:16 AM Rafael J. Wysocki <[email protected]> wrote:
>

[cut]

> >
> > But if SMART_SUSPEND is set and the device is runtime-suspended, why
> > issue the ->suspend callback?
>
> The driver itself or the middle-layer may want to resume the device.
>
> Arguably, it may do that in ->prepare() too,

Not really.

The problem is that that device_prepare() is executed synchronously
for all devices, so if multiple devices needed to be resumed, the
latency would accumulate if that happened in device_prepare().

Cheers!

2020-03-29 16:28:15

by Alan Stern

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Sun, 29 Mar 2020, Rafael J. Wysocki wrote:

> On Sat, Mar 28, 2020 at 8:58 PM Alan Stern <[email protected]> wrote:

> > > > A large part of the problem is related to an inconsistency between the
> > > > documentation and the code. include/linux/pm.h says that
> > > > DPM_FLAG_SMART_SUSPEND tells bus types and PM domains about what the
> > > > driver wants. This strongly implies that the PM core will ignore
> > > > SMART_SUSPEND. But in fact the core does check that flag and takes its
> > > > own actions if the device has no subsystem-level callbacks!
> > >
> > > Right, which is because in those cases there is no "middle layer" between
> > > the driver and the core and if you want the driver to work both with
> > > something like genpd or the ACPI PM domain and without anything like that,
> > > the core needs to take those actions for consistency.
> >
> > Sure, the core is acting as a proxy for the missing subsystem
> > callbacks. Still, it should be documented properly.
> >
> > Also, couldn't we simplify the behavior? Right now the core checks
> > that there are no subsystem-level callbacks for any of _early, _late,
> > and _noirq variants before skipping a callback. Couldn't we forget
> > about all that checking and simply skip the device-level callbacks?
> > (Yes, I realize this could lead to inconsistent behavior if the
> > subsystem has some callbacks defined but not others -- but subsystems
> > should never do that, at least, not if it would lead to trouble.)
>
> In quite a few cases the middle layer has nothing specific to do in a
> given phase of suspend/resume, but the driver may.
>
> Subsystems haven't been required to provide callbacks for all phases
> so far, so this change would require some modifications in there.
>
> I actually prefer the core to do more, even if that means more
> complexity in it, to avoid possible subtle differences in behavior
> between subsystems.

What I meant was that it might be reasonable to get rid of the
no_subsys_cb checks. For example, change __device_suspend_noirq() as
follows:

- no_subsys_cb = !dpm_subsys_suspend_late_cb(dev, state, NULL);
-
- if (dev_pm_smart_suspend_and_suspended(dev) && no_subsys_cb)
+ if (dev_pm_smart_suspend_and_suspended(dev))
goto Skip;

with similar changes to the _suspend_late, _resume_noirq, and
_resume_early. In each stage, we would bypass the driver's callback
if SMART_SUSPEND was set and there was no subsystem-level callback for
_that_ stage -- rather than there being no subsystem-level callbacks
for _any_ of the stages.

> > > > Furthermore, the PM core's actions don't seem to make sense. If the
> > > > flag is set and the device is runtime-suspended when the system sleep
> > > > begins, the core will skip issuing the suspend_late and suspend_noirq
> > > > callbacks to the driver. But it doesn't skip issuing the suspend
> > > > callback! I can't figure that out.
> > >
> > > That's because if the core gets to executing ->suspend_late, PM-runtime has
> > > been disabled for the device and if the device is runtime-suspended at that
> > > point, so (at least if SMART_SUSPEND is set for the device) there is no reason
> > > to do anything more to it.
> >
> > But if SMART_SUSPEND is set and the device is runtime-suspended, why
> > issue the ->suspend callback?
>
> The driver itself or the middle-layer may want to resume the device.
>
> Arguably, it may do that in ->prepare() too, but see below.
>
> > Why not just do pm_runtime_disable()
> > then (to prevent the device from resuming) and skip the callback?
>
> Because another driver may want to runtime-resume that device in order
> to use it for something before ->suspend_late(). Of course, you may
> argue that this means a missing device link or similar, so it is not
> clear-cut.
>
> The general rule is that "synchronous" PM-runtime can be expected to
> work before ->suspend_late(), so ->suspend() callbacks should be able
> to use it safely in all cases in principle.

(With one exception: Since the PM core does pm_runtime_get_noresume()
during the prepare stage, going _into_ runtime suspend is impossible
during ->prepare and ->suspend. Of course, drivers can always call
their runtime_suspend routines directly, but that wouldn't affect the
power.runtime_status value.)

> That expectation goes against direct_complete in some cases, so
> drivers need to set NEVER_SKIP (or whatever it will be called in the
> future) to avoid that problem.

Ah, okay. Very well, let's spell this out explicitly in the
documentation; it's an important difference.

> > > > Furthermore, the decisions about
> > > > whether to skip the resume_noirq, resume_early, and resume callbacks
> > > > are based on different criteria from the decisions on the suspend side.
> > >
> > > Right, because there are drivers that don't want devices to stay in suspend
> > > after system resume even though they have been left in suspend by it.
> >
> > This suggests that sometimes we may want to issue non-matching
> > callbacks. For example, ->resume_noirq, ->resume_early, and ->resume
> > but not ->suspend, ->suspend_late, or ->suspend_noirq. Is that what
> > you are saying?
>
> Yes.
>
> As per devices.rst:
>
> "the driver must be prepared to
> cope with the invocation of its system-wide resume callbacks back-to-back with
> its ``->runtime_suspend`` one (without the intervening ``->runtime_resume`` and
> so on) and the final state of the device must reflect the "active" runtime PM
> status in that case."

Here would also be a good place to mention the difference between "keep
the device in runtime suspend" and "not issue the various resume
callbacks". In theory, a subsystem and a driver could collaborate to
make their resume-side callbacks keep the device runtime-suspended, so
these two concepts are not identical.

Alternatively, we could specify that this sort of thing is never
allowed: When the ->resume callback returns, the device _must_ be
powered-up and runtime-active. If we do this, then the _only_ way to
avoid powering up the device (and to let it remain in runtime suspend)
is for the core to skip issuing the resume-side callbacks. Or at
least, skip issuing the ->resume callback.

> > > > SMART_SUSPEND seems to have two different meanings. (1) If the device
> > > > is already in runtime suspend when a system sleep starts, skip the
> > > > suspend_late and suspend_noirq callbacks. (2) Under certain (similar)
> > > > circumstances, skip the resume callbacks. The documentation only
> > > > mentions (1) but the code also handles (2).
> > >
> > > That's because (2) is the THAW case and I was distracted before I got
> > > to documenting it properly. Sorry.
> > >
> > > The problem is that if you leave the device in runtime suspend, calling
> > > ->freeze_late() or ->freeze_noirq() on it is not useful and if you have
> > > skipped those, running the corresponding "thaw" callbacks is not useful
> > > either (what would they do, specifically?).
> > >
> > > There is a whole problem of whether or not devices should be left in
> > > runtime suspend during hibernation and I have not had a chance to get
> > > to the bottom of that yet.
> >
> > Not only that. The distinction between SMART_SUSPEND and
> > direct_complete is rather subtle, and it doesn't seem to be carefully
> > explained anywhere. In fact, I'm not sure I understand it myself. :-)
> > For example, the direct_complete mechanism is very careful about not
> > leaving a device in runtime suspend if a descendant (or other dependent
> > device) will remain active. Does SMART_SUSPEND behave similarly? If
> > so, it isn't documented.
>
> The difference is that SMART_SUSPEND allows the ->suspend callback to
> be invoked which may decide to resume the device (or reconfigure it
> for system wakeup if that doesn't require resuming it). IOW, this
> means "I can cope with a runtime-suspended device going forward".
> [But if the device is still runtime-suspended during ->suspend_late(),
> its configuration is regarded as "final".]
>
> In turn, direct_complete means something like "if this device is
> runtime-suspended, leave it as is and don't touch it during the whole
> suspend-resume cycle".

Right; let's spell this out in the documentation too.

> > > > At a couple of points in the code, THAW and RESTORE events are each
> > > > treatedly specially, with no explanation.
> > >
> > > Right, which is related to the kind of work in progress situation regarding
> > > the flags and hibernation mentioned above. Again, sorry about that.
> >
> > I haven't thought about those issues as much as you have. Still, it
> > seems obvious that the FREEZE/THAW phases should be very happy to leave
> > devices in runtime suspend throughout (without even worrying about
> > wakeup settings), and the RESTORE phase should always bring everything
> > back out of runtime suspend.
>
> These were exactly my original thoughts, but then when I started to
> consider possible interactions the restore kernel (which also carries
> out the "freeze" transition before jumping into the image kernel), it
> became less clear.
>
> The concerns is basically whether or not attempting to power on
> devices that are already powered on can always be guaranteed to work.

This doesn't affect THAW, because during THAW the driver knows what
state the device is in. It only affects RESTORE. But during RESTORE
the driver really doesn't know anything about the device state, even
with the current code. The restore kernel doesn't even know whether
the boot kernel put the device through a FREEZE transition, because
it's possible that the driver was in a module that hadn't been loaded
yet when the resume-from-hibernation started.

Thus, drivers face this problem already. I don't think we need to
worry about it.

> > What to do during the POWEROFF phase isn't so clear, because it depends
> > on how the platform handles the poweroff transition.
>
> POWEROFF is exactly analogous to SUSPEND AFAICS.

The difference is that on many platforms (such as desktop PCs) the
POWEROFF callbacks don't have to power-down the device, because the
firmware will power _everything_ off (except the devices needed for
wakeup, of course). But on other platforms this isn't true, so on them
POWEROFF does need to behave like SUSPEND.

> > Okay, let's start with direct_complete. The direct_complete mechanism
> > is applied to the SUSPEND and RESUME phases under the following
> > conditions:
> >
> > DPM_FLAG_NEVER_SKIP (or better, DPM_FLAG_NO_DIRECT_COMPLETE)
> > is clear; [Incidentally, since a driver can set this flag
> > whenever its ->prepare routine returns 0, why do we need
> > DPM_FLAG_SMART_PREPARE?]
>
> Because the former allows the driver to avoid providing a ->prepare
> callback always returning 1.

Do you mean NEVER_SKIP allows the driver to avoid providing a ->prepare
callback which always returns _0_? If that's not what you meant, I
don't understand.

> > Either the device has no system-PM callbacks at all or else the
> > ->prepare callback returns a positive value;
>
> Why so?

Isn't that exactly what __device_prepare() does? After your latest
patch, we have:

dev->power.direct_complete = state.event == PM_EVENT_SUSPEND &&
(ret > 0 || dev->power.no_pm_callbacks) &&
!dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);

which is exactly what I said, isn't it?

> > All of the device's descendants and dependents also want to use
> > direct_complete;
>
> Yes.
>
> > Neither the device nor any of its descendants/dependents is
> > enabled for wakeup;
>
> Yes.
>
> > The device is runtime suspended just before the ->suspend
> > callback would normally be issued.
>
> Yes.
>
> > When the mechanism applies, none of the suspend or resume callbacks (in
> > any of their normal, _early, _late, or _noirq variants) are issued,
> > only ->complete. Consequently the device remains in runtime suspend
> > throughout the system sleep.
> >
> > Currently, direct_complete is never applied during any of the system
> > hibernation phases (FREEZE, THAW, POWEROFF, RESTORE). This may change
> > in the future.
> >
> > Is this description correct and complete?
>
> It is mostly. :-)

I forgot to mention that if power.syscore is set then none of these
mechanisms apply because none of the callbacks are issued. Does
anything else need to be changed?

> > Can you give a similarly
> > succinct outline for how SMART_SUSPEND and LEAVE_SUSPENDED should work?
> > And also describe how they differ from direct_complete and how they
> > interact with it? (For example, how does setting both flags differ
> > from returning a positive value from ->prepare?)
>
> I will, but I need some time to do that. Stay tuned.

You bet!

Alan Stern

2020-04-03 15:06:41

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Sunday, March 29, 2020 6:27:38 PM CEST Alan Stern wrote:
> On Sun, 29 Mar 2020, Rafael J. Wysocki wrote:
>
> > On Sat, Mar 28, 2020 at 8:58 PM Alan Stern <[email protected]> wrote:

[cut]

> > > Can you give a similarly
> > > succinct outline for how SMART_SUSPEND and LEAVE_SUSPENDED should work?
> > > And also describe how they differ from direct_complete and how they
> > > interact with it? (For example, how does setting both flags differ
> > > from returning a positive value from ->prepare?)
> >
> > I will, but I need some time to do that. Stay tuned.
>
> You bet!

Sorry for the delay, too much distraction nowadays.

I'll address the other points in your message separately.

The rules for SMART_SUSPEND are as follows:

(a) If SMART_SUSPEND is set and the device is runtime-suspended during system
suspend, it is not expected to be resumed by the core or the middle layer
(subsystem) code unless the latter has a specific reason to do that (e.g.
it knows that the device needs to be reconfigured which cannot be done
without resuming it).

The device can still be resumed when it is needed to suspend a dependent
device, but that cannot happen before the "late suspend" phase.

(b) Drivers that set SMART_SUSPEND are allowed to reuse their PM-runtime
callbacks for system-wide suspend and resume.

That is, they can point either the ->suspend_late or the ->suspend_noirq
callback pointer to the same function as ->runtime_suspend and they can
point either the ->resume_noirq or ->the resume_early callback to the'
same function as ->runtime_resume.

(c) Drivers that set SMART_SUSPEND are alwo allowed to provide special
simplified callbacks for the "freeze" and "thaw" transitions during
hibernation (and restore) and (if they do so) special callbacks for the
"restore" phase.

[OK, I realize that (b) and (c) are not documented, see the notes below.]

Because of (a), if the device with SMART_SUSPEND set is still runtime-suspended
during the "late" phase of suspend, the core will not invoke the driver's
"late" and "noirq" suspend callbacks directly (*). Middle layer (subsystem)
code is expected to behave accordingly.

Because of (b), if the "late" and "noirq" driver callbacks were skipped during
the "freeze" transition, the core will also avoid invoking the "noirq" and
"early" callbacks provided by the driver during the "thaw" transition and
the callbacks during the "restore" transition will be executed unconditionally
(**). Middle layer code is expected to behave accordingly.

Notes:

1. I have considered splitting SMART_SUSPEND into two or even three flags
so that (a), (b) and (c) are each associated with a separate flag, but
then I would expect the majority of users to use all of them anyway.

2. LEAVE_SUSPENDED (which may be better renamed to SKIP_RESUME) is kind of
expected to be used along with SMART_SUSPEND unless there is a good enough
reason to avoid using it. I admit that this isn't really straightforward,
maybe the default behavior should be to skip the resume and there should be
FORCE_RESUME instead of LEAVE_SUSPENDED.

3. (*) Under the assumption that either ->suspend_late or ->suspend_noirq
points to the same routine as ->runtime_suspend (and the other is NULL),
invokig that callback for a runtime-suspended device is technically invalid.
In turn, under the assumption that either ->resume_early or ->resume_noirq
points to the same routine as ->runtime_resume (and the other is NULL), it is
valid to invoke that callback if the late/noirq suspend was skipped.

4. (**) If the "freeze" and "thaw" callbacks are simplified, they cannot be
run back-to-back with ->runtime_resume and ->runtime_suspend, respectively.
Thus if "freeze" is skippend, "thaw" must be skipped too. However,
"restore" needs to be prepared to be invoked after "freeze" or
->runtime_suspend (and the state of the device may not match the
callback that ran previously), so it must be special.

5. I agree that skipping the driver level of callbacks depending on what is
provided by the middle layer is inconsistent, but I wanted to take the
users of pm_runtime_force_suspend/resume() into account by letting those
things run.

It would be more consistent to expect middle layer code (bus types, PM
domains) to provide either all of the noirq/early/late callbacks, or none
of them and make SMART_SUSPEND and pm_runtime_force_suspend/resume()
mutually exclusive.

Cheers!



2020-04-03 16:14:44

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Friday, April 3, 2020 5:04:16 PM CEST Rafael J. Wysocki wrote:
> On Sunday, March 29, 2020 6:27:38 PM CEST Alan Stern wrote:
> > On Sun, 29 Mar 2020, Rafael J. Wysocki wrote:
> >
> > > On Sat, Mar 28, 2020 at 8:58 PM Alan Stern <[email protected]> wrote:
>
> [cut]
>
> > > > Can you give a similarly
> > > > succinct outline for how SMART_SUSPEND and LEAVE_SUSPENDED should work?
> > > > And also describe how they differ from direct_complete and how they
> > > > interact with it? (For example, how does setting both flags differ
> > > > from returning a positive value from ->prepare?)
> > >
> > > I will, but I need some time to do that. Stay tuned.
> >
> > You bet!
>
> Sorry for the delay, too much distraction nowadays.
>
> I'll address the other points in your message separately.
>
> The rules for SMART_SUSPEND are as follows:
>
> (a) If SMART_SUSPEND is set and the device is runtime-suspended during system
> suspend, it is not expected to be resumed by the core or the middle layer
> (subsystem) code unless the latter has a specific reason to do that (e.g.
> it knows that the device needs to be reconfigured which cannot be done
> without resuming it).
>
> The device can still be resumed when it is needed to suspend a dependent
> device, but that cannot happen before the "late suspend" phase.

s/cannot/must/

> (b) Drivers that set SMART_SUSPEND are allowed to reuse their PM-runtime
> callbacks for system-wide suspend and resume.
>
> That is, they can point either the ->suspend_late or the ->suspend_noirq
> callback pointer to the same function as ->runtime_suspend and they can
> point either the ->resume_noirq or ->the resume_early callback to the'
> same function as ->runtime_resume.
>
> (c) Drivers that set SMART_SUSPEND are alwo allowed to provide special

s/alwo/also/

> simplified callbacks for the "freeze" and "thaw" transitions during
> hibernation (and restore) and (if they do so) special callbacks for the
> "restore" phase.
>
> [OK, I realize that (b) and (c) are not documented, see the notes below.]
>
> Because of (a), if the device with SMART_SUSPEND set is still runtime-suspended
> during the "late" phase of suspend, the core will not invoke the driver's
> "late" and "noirq" suspend callbacks directly (*). Middle layer (subsystem)
> code is expected to behave accordingly.
>
> Because of (b), if the "late" and "noirq" driver callbacks were skipped during
> the "freeze" transition, the core will also avoid invoking the "noirq" and
> "early" callbacks provided by the driver during the "thaw" transition and
> the callbacks during the "restore" transition will be executed unconditionally
> (**). Middle layer code is expected to behave accordingly.
>
> Notes:
>
> 1. I have considered splitting SMART_SUSPEND into two or even three flags
> so that (a), (b) and (c) are each associated with a separate flag, but
> then I would expect the majority of users to use all of them anyway.
>
> 2. LEAVE_SUSPENDED (which may be better renamed to SKIP_RESUME) is kind of
> expected to be used along with SMART_SUSPEND unless there is a good enough
> reason to avoid using it. I admit that this isn't really straightforward,
> maybe the default behavior should be to skip the resume and there should be
> FORCE_RESUME instead of LEAVE_SUSPENDED.
>
> 3. (*) Under the assumption that either ->suspend_late or ->suspend_noirq
> points to the same routine as ->runtime_suspend (and the other is NULL),
> invokig that callback for a runtime-suspended device is technically invalid.
> In turn, under the assumption that either ->resume_early or ->resume_noirq
> points to the same routine as ->runtime_resume (and the other is NULL), it is
> valid to invoke that callback if the late/noirq suspend was skipped.
>
> 4. (**) If the "freeze" and "thaw" callbacks are simplified, they cannot be
> run back-to-back with ->runtime_resume and ->runtime_suspend, respectively.

That is, ->freeze -> ->runtime_resume would be invalid and
->runtime_suspend -> ->thaw would be invalid.

> Thus if "freeze" is skippend, "thaw" must be skipped too. However,
> "restore" needs to be prepared to be invoked after "freeze" or
> ->runtime_suspend (and the state of the device may not match the
> callback that ran previously), so it must be special.
>
> 5. I agree that skipping the driver level of callbacks depending on what is
> provided by the middle layer is inconsistent, but I wanted to take the
> users of pm_runtime_force_suspend/resume() into account by letting those
> things run.
>
> It would be more consistent to expect middle layer code (bus types, PM
> domains) to provide either all of the noirq/early/late callbacks, or none
> of them and make SMART_SUSPEND and pm_runtime_force_suspend/resume()
> mutually exclusive.
>
> Cheers!



2020-04-03 16:42:43

by Alan Stern

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Fri, 3 Apr 2020, Rafael J. Wysocki wrote:

> I'll address the other points in your message separately.
>
> The rules for SMART_SUSPEND are as follows:

These rules are a sort of high-level overview. They don't (for the
most part) say exactly what will happen, in terms of which callbacks
will be issued and under what circumstances. I have tried to provide
such a description inline below.

> (a) If SMART_SUSPEND is set and the device is runtime-suspended during system
> suspend, it is not expected to be resumed by the core or the middle layer
> (subsystem) code unless the latter has a specific reason to do that (e.g.
> it knows that the device needs to be reconfigured which cannot be done
> without resuming it).
>
> The device can still be resumed when it is needed to suspend a dependent
> device, but that cannot happen before the "late suspend" phase.

Don't you mean it cannot happen during or after the "late suspend"
phase? (More precisely, it cannot happen after the time when the core
would issue the device's ->suspend_late callback, but it can happen
between that time and the time when the "late suspend" phase began --
for example, from within the dependent device's ->suspend_late
callback.) After all, __device_suspend_late() calls
__pm_runtime_disable(), following which the device _cannot_ be runtime
resumed.

Putting this in operational terms, it seems to say that if
SMART_SUSPEND is set and the device is runtime-suspended at the times
when the core would normally issue the driver's ->suspend_late or
->suspend_noirq callback, then the core will skip the callback (with a
similar requirement for subsystems). Correct?

> (b) Drivers that set SMART_SUSPEND are allowed to reuse their PM-runtime
> callbacks for system-wide suspend and resume.
>
> That is, they can point either the ->suspend_late or the ->suspend_noirq
> callback pointer to the same function as ->runtime_suspend and they can
> point either the ->resume_noirq or ->the resume_early callback to the'
> same function as ->runtime_resume.

Well, in theory any driver or subsystem can do this whenever it wants
to, regardless of any flag settings. It's then up to the driver or
subsystem to make sure the callback "does the right thing".

What I'm concerned about now is: What guarantees can the core give to
the driver and subsystem, so that they will know what is necessary in
order to "do the right thing"?

> (c) Drivers that set SMART_SUSPEND are alwo allowed to provide special
> simplified callbacks for the "freeze" and "thaw" transitions during
> hibernation (and restore) and (if they do so) special callbacks for the
> "restore" phase.

What do you mean by "simplified"?

As I see it, the suspend-side callbacks are generally responsible for
four things:

1. Quiesce the device (finish ongoing I/O and do not allow any
more to start).

2. Save the current device state.

3. Install the appropriate wakeup settings.

4. Put the device into low-power mode.

(Not explicitly listed: Perform a runtime-resume if needed in order to
carry out these four items.)

During a SUSPEND transition, we usually expect all four to happen.
During a FREEZE transition, we only require 1. During a POWEROFF
transition we require 1 and 3, and possibly 4 (depending on how the
platform handles poweroff).

Similar requirements apply to the resume-side callbacks. (But note
that RESTORE is not the inverse of POWEROFF; it is more like an inverse
of FREEZE with the added complication that the device's initial state
is unknown.)

What changes to this analysis would SMART_SUSPEND allow? None if the
device is runtime-active. But if the device is runtime-suspended and
the wakeup settings don't need to be changed, then presumably none of
the four items are necessary.

Is this what you mean?

> [OK, I realize that (b) and (c) are not documented, see the notes below.]
>
> Because of (a), if the device with SMART_SUSPEND set is still runtime-suspended
> during the "late" phase of suspend, the core will not invoke the driver's
> "late" and "noirq" suspend callbacks directly (*). Middle layer (subsystem)
> code is expected to behave accordingly.

Okay, this agrees with what I wrote above.

> Because of (b), if the "late" and "noirq" driver callbacks were skipped during
> the "freeze" transition, the core will also avoid invoking the "noirq" and
> "early" callbacks provided by the driver during the "thaw" transition and
> the callbacks during the "restore" transition will be executed unconditionally
> (**). Middle layer code is expected to behave accordingly.

All right. To summarize: If the driver's ->freeze_late callback is
skipped then the driver's ->thaw-early will be skipped, and similarly
for ->freeze_noirq and ->thaw_noirq. But RESTORE callbacks are never
skipped. Correct?

However, the most difficult transitions are SUSPEND and RESUME. Is it
accurate to say that if the driver's ->suspend_late callback is skipped
then the driver's ->resume_early will be skipped, and similarly for
->suspend_noirq and ->resume_noirq?

> Notes:
>
> 1. I have considered splitting SMART_SUSPEND into two or even three flags
> so that (a), (b) and (c) are each associated with a separate flag, but
> then I would expect the majority of users to use all of them anyway.
>
> 2. LEAVE_SUSPENDED (which may be better renamed to SKIP_RESUME) is kind of
> expected to be used along with SMART_SUSPEND unless there is a good enough
> reason to avoid using it. I admit that this isn't really straightforward,
> maybe the default behavior should be to skip the resume and there should be
> FORCE_RESUME instead of LEAVE_SUSPENDED.

One question not addressed above (in fact, the original reason for
getting you involved in this discussion): What about the device's
power.runtime_status? Shall we say that that core will call
pm_runtime_set_active() at some point before issuing the ->complete
callback unless some combination of flags is set? And what should that
combination be?

After all, we expect that most drivers will want their devices to be in
the runtime-active state at the end of a system sleep or hibernation.
It makes sense for the core to do the necessary housekeeping.

> 3. (*) Under the assumption that either ->suspend_late or ->suspend_noirq
> points to the same routine as ->runtime_suspend (and the other is NULL),
> invokig that callback for a runtime-suspended device is technically invalid.

Does this invalidate anything I wrote above?

> In turn, under the assumption that either ->resume_early or ->resume_noirq
> points to the same routine as ->runtime_resume (and the other is NULL), it is
> valid to invoke that callback if the late/noirq suspend was skipped.

In other words, it's okay for the core either to issue or skip those
callbacks. Presumably the decision will be made based on some flag
setting?

> 4. (**) If the "freeze" and "thaw" callbacks are simplified, they cannot be
> run back-to-back with ->runtime_resume and ->runtime_suspend, respectively.
> Thus if "freeze" is skippend, "thaw" must be skipped too. However,
> "restore" needs to be prepared to be invoked after "freeze" or
> ->runtime_suspend (and the state of the device may not match the
> callback that ran previously), so it must be special.
>
> 5. I agree that skipping the driver level of callbacks depending on what is
> provided by the middle layer is inconsistent, but I wanted to take the
> users of pm_runtime_force_suspend/resume() into account by letting those
> things run.
>
> It would be more consistent to expect middle layer code (bus types, PM
> domains) to provide either all of the noirq/early/late callbacks, or none
> of them and make SMART_SUSPEND and pm_runtime_force_suspend/resume()
> mutually exclusive.

I don't have a clear idea of how pm_runtime_force_suspend/resume() gets
used. Are we better off ignoring it for the time being?

Alan Stern

2020-04-03 17:44:47

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Sunday, March 29, 2020 6:27:38 PM CEST Alan Stern wrote:
> On Sun, 29 Mar 2020, Rafael J. Wysocki wrote:
>
> > On Sat, Mar 28, 2020 at 8:58 PM Alan Stern <[email protected]> wrote:
>
> > > > > A large part of the problem is related to an inconsistency between the
> > > > > documentation and the code. include/linux/pm.h says that
> > > > > DPM_FLAG_SMART_SUSPEND tells bus types and PM domains about what the
> > > > > driver wants. This strongly implies that the PM core will ignore
> > > > > SMART_SUSPEND. But in fact the core does check that flag and takes its
> > > > > own actions if the device has no subsystem-level callbacks!
> > > >
> > > > Right, which is because in those cases there is no "middle layer" between
> > > > the driver and the core and if you want the driver to work both with
> > > > something like genpd or the ACPI PM domain and without anything like that,
> > > > the core needs to take those actions for consistency.
> > >
> > > Sure, the core is acting as a proxy for the missing subsystem
> > > callbacks. Still, it should be documented properly.
> > >
> > > Also, couldn't we simplify the behavior? Right now the core checks
> > > that there are no subsystem-level callbacks for any of _early, _late,
> > > and _noirq variants before skipping a callback. Couldn't we forget
> > > about all that checking and simply skip the device-level callbacks?
> > > (Yes, I realize this could lead to inconsistent behavior if the
> > > subsystem has some callbacks defined but not others -- but subsystems
> > > should never do that, at least, not if it would lead to trouble.)
> >
> > In quite a few cases the middle layer has nothing specific to do in a
> > given phase of suspend/resume, but the driver may.
> >
> > Subsystems haven't been required to provide callbacks for all phases
> > so far, so this change would require some modifications in there.
> >
> > I actually prefer the core to do more, even if that means more
> > complexity in it, to avoid possible subtle differences in behavior
> > between subsystems.
>
> What I meant was that it might be reasonable to get rid of the
> no_subsys_cb checks. For example, change __device_suspend_noirq() as
> follows:
>
> - no_subsys_cb = !dpm_subsys_suspend_late_cb(dev, state, NULL);
> -
> - if (dev_pm_smart_suspend_and_suspended(dev) && no_subsys_cb)
> + if (dev_pm_smart_suspend_and_suspended(dev))
> goto Skip;
>
> with similar changes to the _suspend_late, _resume_noirq, and
> _resume_early. In each stage, we would bypass the driver's callback
> if SMART_SUSPEND was set and there was no subsystem-level callback for
> _that_ stage -- rather than there being no subsystem-level callbacks
> for _any_ of the stages.

I understand that.

As mentioned in the other message, I attempted to allow
pm_runtime_force_suspend/resume() to be used along with setting SMART_SUSPEND,
but that looks like a mistake now.

I agree that skipping the driver-level callbacks regardless of what is provided
by the subsystem would be more consistent.

> > > > > Furthermore, the PM core's actions don't seem to make sense. If the
> > > > > flag is set and the device is runtime-suspended when the system sleep
> > > > > begins, the core will skip issuing the suspend_late and suspend_noirq
> > > > > callbacks to the driver. But it doesn't skip issuing the suspend
> > > > > callback! I can't figure that out.
> > > >
> > > > That's because if the core gets to executing ->suspend_late, PM-runtime has
> > > > been disabled for the device and if the device is runtime-suspended at that
> > > > point, so (at least if SMART_SUSPEND is set for the device) there is no reason
> > > > to do anything more to it.
> > >
> > > But if SMART_SUSPEND is set and the device is runtime-suspended, why
> > > issue the ->suspend callback?
> >
> > The driver itself or the middle-layer may want to resume the device.
> >
> > Arguably, it may do that in ->prepare() too, but see below.
> >
> > > Why not just do pm_runtime_disable()
> > > then (to prevent the device from resuming) and skip the callback?
> >
> > Because another driver may want to runtime-resume that device in order
> > to use it for something before ->suspend_late(). Of course, you may
> > argue that this means a missing device link or similar, so it is not
> > clear-cut.
> >
> > The general rule is that "synchronous" PM-runtime can be expected to
> > work before ->suspend_late(), so ->suspend() callbacks should be able
> > to use it safely in all cases in principle.
>
> (With one exception: Since the PM core does pm_runtime_get_noresume()
> during the prepare stage, going _into_ runtime suspend is impossible
> during ->prepare and ->suspend. Of course, drivers can always call
> their runtime_suspend routines directly, but that wouldn't affect the
> power.runtime_status value.)
>
> > That expectation goes against direct_complete in some cases, so
> > drivers need to set NEVER_SKIP (or whatever it will be called in the
> > future) to avoid that problem.
>
> Ah, okay. Very well, let's spell this out explicitly in the
> documentation; it's an important difference.
>
> > > > > Furthermore, the decisions about
> > > > > whether to skip the resume_noirq, resume_early, and resume callbacks
> > > > > are based on different criteria from the decisions on the suspend side.
> > > >
> > > > Right, because there are drivers that don't want devices to stay in suspend
> > > > after system resume even though they have been left in suspend by it.
> > >
> > > This suggests that sometimes we may want to issue non-matching
> > > callbacks. For example, ->resume_noirq, ->resume_early, and ->resume
> > > but not ->suspend, ->suspend_late, or ->suspend_noirq. Is that what
> > > you are saying?
> >
> > Yes.
> >
> > As per devices.rst:
> >
> > "the driver must be prepared to
> > cope with the invocation of its system-wide resume callbacks back-to-back with
> > its ``->runtime_suspend`` one (without the intervening ``->runtime_resume`` and
> > so on) and the final state of the device must reflect the "active" runtime PM
> > status in that case."
>
> Here would also be a good place to mention the difference between "keep
> the device in runtime suspend" and "not issue the various resume
> callbacks". In theory, a subsystem and a driver could collaborate to
> make their resume-side callbacks keep the device runtime-suspended, so
> these two concepts are not identical.
>
> Alternatively, we could specify that this sort of thing is never
> allowed: When the ->resume callback returns, the device _must_ be
> powered-up and runtime-active. If we do this, then the _only_ way to
> avoid powering up the device (and to let it remain in runtime suspend)
> is for the core to skip issuing the resume-side callbacks. Or at
> least, skip issuing the ->resume callback.

Basically, all devices with SMART_SUSPEND set whose late/noirq suspend callbacks
were skipped can be left in suspend during system-wide resume by skipping their
callbacks, so that they can be resumed by PM-runtime (that becomes kind of like
direct-complete at that point), but some drivers may not want that for earlier
device response after system-wide resume (if it is resumed by the system-wide
code, it will be immediately ready when user space is unfrozen).

It is expected that LEAVE_SUSPENDED will be used along with SMART_SUSPEND unless
the above is the case.

> > > > > SMART_SUSPEND seems to have two different meanings. (1) If the device
> > > > > is already in runtime suspend when a system sleep starts, skip the
> > > > > suspend_late and suspend_noirq callbacks. (2) Under certain (similar)
> > > > > circumstances, skip the resume callbacks. The documentation only
> > > > > mentions (1) but the code also handles (2).
> > > >
> > > > That's because (2) is the THAW case and I was distracted before I got
> > > > to documenting it properly. Sorry.
> > > >
> > > > The problem is that if you leave the device in runtime suspend, calling
> > > > ->freeze_late() or ->freeze_noirq() on it is not useful and if you have
> > > > skipped those, running the corresponding "thaw" callbacks is not useful
> > > > either (what would they do, specifically?).
> > > >
> > > > There is a whole problem of whether or not devices should be left in
> > > > runtime suspend during hibernation and I have not had a chance to get
> > > > to the bottom of that yet.
> > >
> > > Not only that. The distinction between SMART_SUSPEND and
> > > direct_complete is rather subtle, and it doesn't seem to be carefully
> > > explained anywhere. In fact, I'm not sure I understand it myself. :-)
> > > For example, the direct_complete mechanism is very careful about not
> > > leaving a device in runtime suspend if a descendant (or other dependent
> > > device) will remain active. Does SMART_SUSPEND behave similarly? If
> > > so, it isn't documented.
> >
> > The difference is that SMART_SUSPEND allows the ->suspend callback to
> > be invoked which may decide to resume the device (or reconfigure it
> > for system wakeup if that doesn't require resuming it). IOW, this
> > means "I can cope with a runtime-suspended device going forward".
> > [But if the device is still runtime-suspended during ->suspend_late(),
> > its configuration is regarded as "final".]
> >
> > In turn, direct_complete means something like "if this device is
> > runtime-suspended, leave it as is and don't touch it during the whole
> > suspend-resume cycle".
>
> Right; let's spell this out in the documentation too.
>
> > > > > At a couple of points in the code, THAW and RESTORE events are each
> > > > > treatedly specially, with no explanation.
> > > >
> > > > Right, which is related to the kind of work in progress situation regarding
> > > > the flags and hibernation mentioned above. Again, sorry about that.
> > >
> > > I haven't thought about those issues as much as you have. Still, it
> > > seems obvious that the FREEZE/THAW phases should be very happy to leave
> > > devices in runtime suspend throughout (without even worrying about
> > > wakeup settings), and the RESTORE phase should always bring everything
> > > back out of runtime suspend.
> >
> > These were exactly my original thoughts, but then when I started to
> > consider possible interactions the restore kernel (which also carries
> > out the "freeze" transition before jumping into the image kernel), it
> > became less clear.
> >
> > The concerns is basically whether or not attempting to power on
> > devices that are already powered on can always be guaranteed to work.
>
> This doesn't affect THAW, because during THAW the driver knows what
> state the device is in. It only affects RESTORE. But during RESTORE
> the driver really doesn't know anything about the device state, even
> with the current code. The restore kernel doesn't even know whether
> the boot kernel put the device through a FREEZE transition, because
> it's possible that the driver was in a module that hadn't been loaded
> yet when the resume-from-hibernation started.
>
> Thus, drivers face this problem already. I don't think we need to
> worry about it.

OK

> > > What to do during the POWEROFF phase isn't so clear, because it depends
> > > on how the platform handles the poweroff transition.
> >
> > POWEROFF is exactly analogous to SUSPEND AFAICS.
>
> The difference is that on many platforms (such as desktop PCs) the
> POWEROFF callbacks don't have to power-down the device, because the
> firmware will power _everything_ off (except the devices needed for
> wakeup, of course). But on other platforms this isn't true, so on them
> POWEROFF does need to behave like SUSPEND.

And there are platforms where the firmware turns off everything (except for
wakeup devices) at the end of system-wide suspend too.

There really isn't that much of a difference in general.

> > > Okay, let's start with direct_complete. The direct_complete mechanism
> > > is applied to the SUSPEND and RESUME phases under the following
> > > conditions:
> > >
> > > DPM_FLAG_NEVER_SKIP (or better, DPM_FLAG_NO_DIRECT_COMPLETE)
> > > is clear; [Incidentally, since a driver can set this flag
> > > whenever its ->prepare routine returns 0, why do we need
> > > DPM_FLAG_SMART_PREPARE?]
> >
> > Because the former allows the driver to avoid providing a ->prepare
> > callback always returning 1.
>
> Do you mean NEVER_SKIP allows the driver to avoid providing a ->prepare
> callback which always returns _0_? If that's not what you meant, I
> don't understand.

Yes, I thought 0 and wrote 1, sorry.

> > > Either the device has no system-PM callbacks at all or else the
> > > ->prepare callback returns a positive value;
> >
> > Why so?
>
> Isn't that exactly what __device_prepare() does? After your latest
> patch, we have:
>
> dev->power.direct_complete = state.event == PM_EVENT_SUSPEND &&
> (ret > 0 || dev->power.no_pm_callbacks) &&
> !dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
>
> which is exactly what I said, isn't it?

I misread what you wrote, so agreed.

> > > All of the device's descendants and dependents also want to use
> > > direct_complete;
> >
> > Yes.
> >
> > > Neither the device nor any of its descendants/dependents is
> > > enabled for wakeup;
> >
> > Yes.
> >
> > > The device is runtime suspended just before the ->suspend
> > > callback would normally be issued.
> >
> > Yes.
> >
> > > When the mechanism applies, none of the suspend or resume callbacks (in
> > > any of their normal, _early, _late, or _noirq variants) are issued,
> > > only ->complete. Consequently the device remains in runtime suspend
> > > throughout the system sleep.
> > >
> > > Currently, direct_complete is never applied during any of the system
> > > hibernation phases (FREEZE, THAW, POWEROFF, RESTORE). This may change
> > > in the future.
> > >
> > > Is this description correct and complete?
> >
> > It is mostly. :-)
>
> I forgot to mention that if power.syscore is set then none of these
> mechanisms apply because none of the callbacks are issued. Does
> anything else need to be changed?

No, I don't think so.



2020-04-03 18:35:00

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Friday, April 3, 2020 6:41:05 PM CEST Alan Stern wrote:
> On Fri, 3 Apr 2020, Rafael J. Wysocki wrote:
>
> > I'll address the other points in your message separately.
> >
> > The rules for SMART_SUSPEND are as follows:
>
> These rules are a sort of high-level overview. They don't (for the
> most part) say exactly what will happen, in terms of which callbacks
> will be issued and under what circumstances. I have tried to provide
> such a description inline below.
>
> > (a) If SMART_SUSPEND is set and the device is runtime-suspended during system
> > suspend, it is not expected to be resumed by the core or the middle layer
> > (subsystem) code unless the latter has a specific reason to do that (e.g.
> > it knows that the device needs to be reconfigured which cannot be done
> > without resuming it).
> >
> > The device can still be resumed when it is needed to suspend a dependent
> > device, but that cannot happen before the "late suspend" phase.
>
> Don't you mean it cannot happen during or after the "late suspend"
> phase?

Yes, I do.

> (More precisely, it cannot happen after the time when the core
> would issue the device's ->suspend_late callback, but it can happen
> between that time and the time when the "late suspend" phase began --
> for example, from within the dependent device's ->suspend_late
> callback.) After all, __device_suspend_late() calls
> __pm_runtime_disable(), following which the device _cannot_ be runtime
> resumed.

Right.

> Putting this in operational terms, it seems to say that if
> SMART_SUSPEND is set and the device is runtime-suspended at the times
> when the core would normally issue the driver's ->suspend_late or
> ->suspend_noirq callback, then the core will skip the callback (with a
> similar requirement for subsystems). Correct?

Yes.

> > (b) Drivers that set SMART_SUSPEND are allowed to reuse their PM-runtime
> > callbacks for system-wide suspend and resume.
> >
> > That is, they can point either the ->suspend_late or the ->suspend_noirq
> > callback pointer to the same function as ->runtime_suspend and they can
> > point either the ->resume_noirq or ->the resume_early callback to the'
> > same function as ->runtime_resume.
>
> Well, in theory any driver or subsystem can do this whenever it wants
> to, regardless of any flag settings.

Not exactly.

Say the driver wants to point both ->runtime_suspend and ->suspend_late to
the same function.

If the bus type doesn't provide system-wide PM callbacks at all (which is
the case for some bus types), that only works if the device is never
runtime-suspended when ->suspend_late is about to run, because otherwise
the function in question needs to check the context in which it is running
(PM-runtime vs system-wide and runtime-suspended vs runtime-active in the
latter case) which at least is awkward and hard to get right.

> It's then up to the driver or
> subsystem to make sure the callback "does the right thing".

In theory.

> What I'm concerned about now is: What guarantees can the core give to
> the driver and subsystem, so that they will know what is necessary in
> order to "do the right thing"?

I'm not sure what you mean.

If the subsystem provides callbacks, the core will run them regardless.

If it does not provide callbacks, the core will skip ->suspend_late and
->suspend_noirq for the driver and the device will remain suspended.

If SMART_SUSPEND is not set, the core will execute all of the callbacks
that are present.

> > (c) Drivers that set SMART_SUSPEND are alwo allowed to provide special
> > simplified callbacks for the "freeze" and "thaw" transitions during
> > hibernation (and restore) and (if they do so) special callbacks for the
> > "restore" phase.
>
> What do you mean by "simplified"?

Avoiding actual PM.

> As I see it, the suspend-side callbacks are generally responsible for
> four things:
>
> 1. Quiesce the device (finish ongoing I/O and do not allow any
> more to start).
>
> 2. Save the current device state.
>
> 3. Install the appropriate wakeup settings.
>
> 4. Put the device into low-power mode.
>
> (Not explicitly listed: Perform a runtime-resume if needed in order to
> carry out these four items.)
>
> During a SUSPEND transition, we usually expect all four to happen.
> During a FREEZE transition, we only require 1.

That's what I mean by "simplified".

> During a POWEROFF
> transition we require 1 and 3, and possibly 4 (depending on how the
> platform handles poweroff).

But doing 2 is not a bug AFAICS.

> Similar requirements apply to the resume-side callbacks. (But note
> that RESTORE is not the inverse of POWEROFF; it is more like an inverse
> of FREEZE with the added complication that the device's initial state
> is unknown.)

It actually isn't even an inverse of FREEZE. It is like RESUME with the
additional requirements that (a) it can never be skipped and (b) the
device need not be in a low-power state when it runs (the initial state
of it is unknown if you will).

> What changes to this analysis would SMART_SUSPEND allow? None if the
> device is runtime-active. But if the device is runtime-suspended and
> the wakeup settings don't need to be changed, then presumably none of
> the four items are necessary.
>
> Is this what you mean?

No.

What I meant was that even if the driver pointed ->runtime_suspend and
->suspend_late (say) to the same function and it pointed ->resume_early
and ->runtime_resume to the same function, it didn't have to point
->freeze_late and ->thaw_early to the same pair of functions, respectively.

It can point ->freeze_late and ->thaw_early to a pair of different functions
that only quiesce the device and reverse that, respectively.

> > [OK, I realize that (b) and (c) are not documented, see the notes below.]
> >
> > Because of (a), if the device with SMART_SUSPEND set is still runtime-suspended
> > during the "late" phase of suspend, the core will not invoke the driver's
> > "late" and "noirq" suspend callbacks directly (*). Middle layer (subsystem)
> > code is expected to behave accordingly.
>
> Okay, this agrees with what I wrote above.
>
> > Because of (b), if the "late" and "noirq" driver callbacks were skipped during
> > the "freeze" transition, the core will also avoid invoking the "noirq" and
> > "early" callbacks provided by the driver during the "thaw" transition and
> > the callbacks during the "restore" transition will be executed unconditionally
> > (**). Middle layer code is expected to behave accordingly.
>
> All right. To summarize: If the driver's ->freeze_late callback is
> skipped then the driver's ->thaw-early will be skipped, and similarly
> for ->freeze_noirq and ->thaw_noirq. But RESTORE callbacks are never
> skipped. Correct?

Yes.

> However, the most difficult transitions are SUSPEND and RESUME. Is it
> accurate to say that if the driver's ->suspend_late callback is skipped
> then the driver's ->resume_early will be skipped, and similarly for
> ->suspend_noirq and ->resume_noirq?

If LEAVE_SUSPENDED is set in addition to SMART_SUSPEND, then yes.

> > Notes:
> >
> > 1. I have considered splitting SMART_SUSPEND into two or even three flags
> > so that (a), (b) and (c) are each associated with a separate flag, but
> > then I would expect the majority of users to use all of them anyway.
> >
> > 2. LEAVE_SUSPENDED (which may be better renamed to SKIP_RESUME) is kind of
> > expected to be used along with SMART_SUSPEND unless there is a good enough
> > reason to avoid using it. I admit that this isn't really straightforward,
> > maybe the default behavior should be to skip the resume and there should be
> > FORCE_RESUME instead of LEAVE_SUSPENDED.
>
> One question not addressed above (in fact, the original reason for
> getting you involved in this discussion): What about the device's
> power.runtime_status? Shall we say that that core will call
> pm_runtime_set_active() at some point before issuing the ->complete
> callback unless some combination of flags is set? And what should that
> combination be?
>
> After all, we expect that most drivers will want their devices to be in
> the runtime-active state at the end of a system sleep or hibernation.
> It makes sense for the core to do the necessary housekeeping.

The core will set the PM-runtime status to "active" in device_resume_noirq()
if (a) the subsystem callbacks are not invoked (otherwise the subsystem is
responsible for doing that) and (b) if the driver's callback not skipped
(in which case its ->resume_early callback will not be skipped too).

> > 3. (*) Under the assumption that either ->suspend_late or ->suspend_noirq
> > points to the same routine as ->runtime_suspend (and the other is NULL),
> > invokig that callback for a runtime-suspended device is technically invalid.
>
> Does this invalidate anything I wrote above?

I don't think so. It is the reason why driver callbacks are skipped for
runtime-suspended devices.

> > In turn, under the assumption that either ->resume_early or ->resume_noirq
> > points to the same routine as ->runtime_resume (and the other is NULL), it is
> > valid to invoke that callback if the late/noirq suspend was skipped.
>
> In other words, it's okay for the core either to issue or skip those
> callbacks. Presumably the decision will be made based on some flag
> setting?

Yes. A flag combined with the PM-runtime status of the device in
device_suspend_noirq().

> > 4. (**) If the "freeze" and "thaw" callbacks are simplified, they cannot be
> > run back-to-back with ->runtime_resume and ->runtime_suspend, respectively.
> > Thus if "freeze" is skippend, "thaw" must be skipped too. However,
> > "restore" needs to be prepared to be invoked after "freeze" or
> > ->runtime_suspend (and the state of the device may not match the
> > callback that ran previously), so it must be special.
> >
> > 5. I agree that skipping the driver level of callbacks depending on what is
> > provided by the middle layer is inconsistent, but I wanted to take the
> > users of pm_runtime_force_suspend/resume() into account by letting those
> > things run.
> >
> > It would be more consistent to expect middle layer code (bus types, PM
> > domains) to provide either all of the noirq/early/late callbacks, or none
> > of them and make SMART_SUSPEND and pm_runtime_force_suspend/resume()
> > mutually exclusive.
>
> I don't have a clear idea of how pm_runtime_force_suspend/resume() gets
> used. Are we better off ignoring it for the time being?

Yes, we are.



2020-04-03 20:17:21

by Alan Stern

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

For the most part we seem to be in agreement.

On Fri, 3 Apr 2020, Rafael J. Wysocki wrote:

> On Friday, April 3, 2020 6:41:05 PM CEST Alan Stern wrote:
> > On Fri, 3 Apr 2020, Rafael J. Wysocki wrote:

> > > (b) Drivers that set SMART_SUSPEND are allowed to reuse their PM-runtime
> > > callbacks for system-wide suspend and resume.
> > >
> > > That is, they can point either the ->suspend_late or the ->suspend_noirq
> > > callback pointer to the same function as ->runtime_suspend and they can
> > > point either the ->resume_noirq or ->the resume_early callback to the'
> > > same function as ->runtime_resume.
> >
> > Well, in theory any driver or subsystem can do this whenever it wants
> > to, regardless of any flag settings.
>
> Not exactly.
>
> Say the driver wants to point both ->runtime_suspend and ->suspend_late to
> the same function.
>
> If the bus type doesn't provide system-wide PM callbacks at all (which is
> the case for some bus types), that only works if the device is never
> runtime-suspended when ->suspend_late is about to run, because otherwise
> the function in question needs to check the context in which it is running
> (PM-runtime vs system-wide and runtime-suspended vs runtime-active in the
> latter case) which at least is awkward and hard to get right.
>
> > It's then up to the driver or
> > subsystem to make sure the callback "does the right thing".
>
> In theory.

Okay. In any case, this is about what drivers should do, not about
what the core should do.

> > What I'm concerned about now is: What guarantees can the core give to
> > the driver and subsystem, so that they will know what is necessary in
> > order to "do the right thing"?
>
> I'm not sure what you mean.
>
> If the subsystem provides callbacks, the core will run them regardless.
>
> If it does not provide callbacks, the core will skip ->suspend_late and
> ->suspend_noirq for the driver and the device will remain suspended.
>
> If SMART_SUSPEND is not set, the core will execute all of the callbacks
> that are present.

All right, then those are the guarantees I was thinking of.

> > > (c) Drivers that set SMART_SUSPEND are alwo allowed to provide special
> > > simplified callbacks for the "freeze" and "thaw" transitions during
> > > hibernation (and restore) and (if they do so) special callbacks for the
> > > "restore" phase.
> >
> > What do you mean by "simplified"?
>
> Avoiding actual PM.
>
> > As I see it, the suspend-side callbacks are generally responsible for
> > four things:
> >
> > 1. Quiesce the device (finish ongoing I/O and do not allow any
> > more to start).
> >
> > 2. Save the current device state.
> >
> > 3. Install the appropriate wakeup settings.
> >
> > 4. Put the device into low-power mode.
> >
> > (Not explicitly listed: Perform a runtime-resume if needed in order to
> > carry out these four items.)
> >
> > During a SUSPEND transition, we usually expect all four to happen.

Based on what you said elsewhere, 4 may not be needed for SUSPEND
(depending on the platform).

> > During a FREEZE transition, we only require 1.

Actually, FREEZE should do 2 as well. Doing all four is acceptable,
though not optimal.

> That's what I mean by "simplified".
>
> > During a POWEROFF
> > transition we require 1 and 3, and possibly 4 (depending on how the
> > platform handles poweroff).
>
> But doing 2 is not a bug AFAICS.

Agreed.

> > Similar requirements apply to the resume-side callbacks. (But note
> > that RESTORE is not the inverse of POWEROFF; it is more like an inverse
> > of FREEZE with the added complication that the device's initial state
> > is unknown.)
>
> It actually isn't even an inverse of FREEZE. It is like RESUME with the
> additional requirements that (a) it can never be skipped and (b) the
> device need not be in a low-power state when it runs (the initial state
> of it is unknown if you will).

Let's put it like this: The resume-side callbacks should have the
overall effect of bringing the device back to its initial state, with
the following exceptions and complications:

Unless SMART_SUSPEND and LEAVE_SUSPEND are both set, a device
that was in runtime suspend before the suspend_late phase
must end up being runtime-active after the matching RESUME.

Unless SMART_SUSPEND is set, a device that was in runtime
suspend before the freeze_late phase must end up being
runtime-active after the matching THAW.

[I'm not so sure about this. Wouldn't it make more sense to treat
_every_ device as though SMART_SUSPEND was set for FREEZE/THAW
transitions, and require subsystems to do the same?]

After RESTORE, _every_ device must end up being runtime
active.

In general, each resume-side callback should undo the effect
of the matching suspend-side callback. However, because of
the requirements mentioned in the preceding sentences,
sometimes a resume-side callback will be issued even though
the matching suspend-side callback was skipped -- i.e., when
a device that starts out runtime-suspended ends up being
runtime-active.

How does that sound?

> > What changes to this analysis would SMART_SUSPEND allow? None if the
> > device is runtime-active. But if the device is runtime-suspended and
> > the wakeup settings don't need to be changed, then presumably none of
> > the four items are necessary.
> >
> > Is this what you mean?
>
> No.
>
> What I meant was that even if the driver pointed ->runtime_suspend and
> ->suspend_late (say) to the same function and it pointed ->resume_early
> and ->runtime_resume to the same function, it didn't have to point
> ->freeze_late and ->thaw_early to the same pair of functions, respectively.
>
> It can point ->freeze_late and ->thaw_early to a pair of different functions
> that only quiesce the device and reverse that, respectively.

Again, that describes what drivers or subsystems should do, not what
the core will do.

> > > [OK, I realize that (b) and (c) are not documented, see the notes below.]
> > >
> > > Because of (a), if the device with SMART_SUSPEND set is still runtime-suspended
> > > during the "late" phase of suspend, the core will not invoke the driver's
> > > "late" and "noirq" suspend callbacks directly (*). Middle layer (subsystem)
> > > code is expected to behave accordingly.
> >
> > Okay, this agrees with what I wrote above.
> >
> > > Because of (b), if the "late" and "noirq" driver callbacks were skipped during
> > > the "freeze" transition, the core will also avoid invoking the "noirq" and
> > > "early" callbacks provided by the driver during the "thaw" transition and
> > > the callbacks during the "restore" transition will be executed unconditionally
> > > (**). Middle layer code is expected to behave accordingly.
> >
> > All right. To summarize: If the driver's ->freeze_late callback is
> > skipped then the driver's ->thaw-early will be skipped, and similarly
> > for ->freeze_noirq and ->thaw_noirq. But RESTORE callbacks are never
> > skipped. Correct?
>
> Yes.

And this will be true whether or not LEAVE_SUSPENDED is set, right?

> > However, the most difficult transitions are SUSPEND and RESUME. Is it
> > accurate to say that if the driver's ->suspend_late callback is skipped
> > then the driver's ->resume_early will be skipped, and similarly for
> > ->suspend_noirq and ->resume_noirq?
>
> If LEAVE_SUSPENDED is set in addition to SMART_SUSPEND, then yes.
>
> > > Notes:
> > >
> > > 1. I have considered splitting SMART_SUSPEND into two or even three flags
> > > so that (a), (b) and (c) are each associated with a separate flag, but
> > > then I would expect the majority of users to use all of them anyway.
> > >
> > > 2. LEAVE_SUSPENDED (which may be better renamed to SKIP_RESUME) is kind of
> > > expected to be used along with SMART_SUSPEND unless there is a good enough
> > > reason to avoid using it. I admit that this isn't really straightforward,
> > > maybe the default behavior should be to skip the resume and there should be
> > > FORCE_RESUME instead of LEAVE_SUSPENDED.
> >
> > One question not addressed above (in fact, the original reason for
> > getting you involved in this discussion): What about the device's
> > power.runtime_status? Shall we say that that core will call
> > pm_runtime_set_active() at some point before issuing the ->complete
> > callback unless some combination of flags is set? And what should that
> > combination be?
> >
> > After all, we expect that most drivers will want their devices to be in
> > the runtime-active state at the end of a system sleep or hibernation.
> > It makes sense for the core to do the necessary housekeeping.
>
> The core will set the PM-runtime status to "active" in device_resume_noirq()
> if (a) the subsystem callbacks are not invoked (otherwise the subsystem is
> responsible for doing that) and (b) if the driver's callback not skipped
> (in which case its ->resume_early callback will not be skipped too).

Are you certain you want the subsystem callback to be responsible for
setting the runtime status to "active"? Isn't this an example of
something the core could do in order to help simplify subsystems?

> > > 3. (*) Under the assumption that either ->suspend_late or ->suspend_noirq
> > > points to the same routine as ->runtime_suspend (and the other is NULL),
> > > invokig that callback for a runtime-suspended device is technically invalid.
> >
> > Does this invalidate anything I wrote above?
>
> I don't think so. It is the reason why driver callbacks are skipped for
> runtime-suspended devices.

And this brings up another thing the core might do to help simplify
drivers and subsystems: If SMART_SUSPEND isn't set and the device is in
runtime suspend, couldn't the core do a pm_runtime_resume before
issuing the ->suspend or ->suspend_late callback?

> > > In turn, under the assumption that either ->resume_early or ->resume_noirq
> > > points to the same routine as ->runtime_resume (and the other is NULL), it is
> > > valid to invoke that callback if the late/noirq suspend was skipped.
> >
> > In other words, it's okay for the core either to issue or skip those
> > callbacks. Presumably the decision will be made based on some flag
> > setting?
>
> Yes. A flag combined with the PM-runtime status of the device in
> device_suspend_noirq().
>
> > > 4. (**) If the "freeze" and "thaw" callbacks are simplified, they cannot be
> > > run back-to-back with ->runtime_resume and ->runtime_suspend, respectively.
> > > Thus if "freeze" is skippend, "thaw" must be skipped too. However,
> > > "restore" needs to be prepared to be invoked after "freeze" or
> > > ->runtime_suspend (and the state of the device may not match the
> > > callback that ran previously), so it must be special.
> > >
> > > 5. I agree that skipping the driver level of callbacks depending on what is
> > > provided by the middle layer is inconsistent, but I wanted to take the
> > > users of pm_runtime_force_suspend/resume() into account by letting those
> > > things run.
> > >
> > > It would be more consistent to expect middle layer code (bus types, PM
> > > domains) to provide either all of the noirq/early/late callbacks, or none
> > > of them and make SMART_SUSPEND and pm_runtime_force_suspend/resume()
> > > mutually exclusive.
> >
> > I don't have a clear idea of how pm_runtime_force_suspend/resume() gets
> > used. Are we better off ignoring it for the time being?
>
> Yes, we are.

We're converging on a final answer!

Alan Stern

2020-04-06 16:46:32

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Friday, April 3, 2020 10:15:09 PM CEST Alan Stern wrote:
> For the most part we seem to be in agreement.

Indeed.

> On Fri, 3 Apr 2020, Rafael J. Wysocki wrote:
>
> > On Friday, April 3, 2020 6:41:05 PM CEST Alan Stern wrote:
> > > On Fri, 3 Apr 2020, Rafael J. Wysocki wrote:
>
> > > > (b) Drivers that set SMART_SUSPEND are allowed to reuse their PM-runtime
> > > > callbacks for system-wide suspend and resume.
> > > >
> > > > That is, they can point either the ->suspend_late or the ->suspend_noirq
> > > > callback pointer to the same function as ->runtime_suspend and they can
> > > > point either the ->resume_noirq or ->the resume_early callback to the'
> > > > same function as ->runtime_resume.
> > >
> > > Well, in theory any driver or subsystem can do this whenever it wants
> > > to, regardless of any flag settings.
> >
> > Not exactly.
> >
> > Say the driver wants to point both ->runtime_suspend and ->suspend_late to
> > the same function.
> >
> > If the bus type doesn't provide system-wide PM callbacks at all (which is
> > the case for some bus types), that only works if the device is never
> > runtime-suspended when ->suspend_late is about to run, because otherwise
> > the function in question needs to check the context in which it is running
> > (PM-runtime vs system-wide and runtime-suspended vs runtime-active in the
> > latter case) which at least is awkward and hard to get right.
> >
> > > It's then up to the driver or
> > > subsystem to make sure the callback "does the right thing".
> >
> > In theory.
>
> Okay. In any case, this is about what drivers should do, not about
> what the core should do.

These two things are related to each other though.

> > > What I'm concerned about now is: What guarantees can the core give to
> > > the driver and subsystem, so that they will know what is necessary in
> > > order to "do the right thing"?
> >
> > I'm not sure what you mean.
> >
> > If the subsystem provides callbacks, the core will run them regardless.
> >
> > If it does not provide callbacks, the core will skip ->suspend_late and
> > ->suspend_noirq for the driver and the device will remain suspended.
> >
> > If SMART_SUSPEND is not set, the core will execute all of the callbacks
> > that are present.
>
> All right, then those are the guarantees I was thinking of.

OK

> > > > (c) Drivers that set SMART_SUSPEND are alwo allowed to provide special
> > > > simplified callbacks for the "freeze" and "thaw" transitions during
> > > > hibernation (and restore) and (if they do so) special callbacks for the
> > > > "restore" phase.
> > >
> > > What do you mean by "simplified"?
> >
> > Avoiding actual PM.
> >
> > > As I see it, the suspend-side callbacks are generally responsible for
> > > four things:
> > >
> > > 1. Quiesce the device (finish ongoing I/O and do not allow any
> > > more to start).
> > >
> > > 2. Save the current device state.
> > >
> > > 3. Install the appropriate wakeup settings.
> > >
> > > 4. Put the device into low-power mode.
> > >
> > > (Not explicitly listed: Perform a runtime-resume if needed in order to
> > > carry out these four items.)
> > >
> > > During a SUSPEND transition, we usually expect all four to happen.
>
> Based on what you said elsewhere, 4 may not be needed for SUSPEND
> (depending on the platform).

Right.

> > > During a FREEZE transition, we only require 1.
>
> Actually, FREEZE should do 2 as well. Doing all four is acceptable,
> though not optimal.

Right.

> > That's what I mean by "simplified".
> >
> > > During a POWEROFF
> > > transition we require 1 and 3, and possibly 4 (depending on how the
> > > platform handles poweroff).
> >
> > But doing 2 is not a bug AFAICS.
>
> Agreed.
>
> > > Similar requirements apply to the resume-side callbacks. (But note
> > > that RESTORE is not the inverse of POWEROFF; it is more like an inverse
> > > of FREEZE with the added complication that the device's initial state
> > > is unknown.)
> >
> > It actually isn't even an inverse of FREEZE. It is like RESUME with the
> > additional requirements that (a) it can never be skipped and (b) the
> > device need not be in a low-power state when it runs (the initial state
> > of it is unknown if you will).
>
> Let's put it like this: The resume-side callbacks should have the
> overall effect of bringing the device back to its initial state, with
> the following exceptions and complications:
>
> Unless SMART_SUSPEND and LEAVE_SUSPEND are both set, a device
> that was in runtime suspend before the suspend_late phase
> must end up being runtime-active after the matching RESUME.
>
> Unless SMART_SUSPEND is set, a device that was in runtime
> suspend before the freeze_late phase must end up being
> runtime-active after the matching THAW.

Correct.

> [I'm not so sure about this. Wouldn't it make more sense to treat
> _every_ device as though SMART_SUSPEND was set for FREEZE/THAW
> transitions, and require subsystems to do the same?]

Drivers may expect devices to be runtime-active when their suspend
callbacks are invoked unless they set SMART_SUSPEND. IOW, without
SMART_SUSPEND set the device should not be left in runtime suspend
during system-wide suspend at all unless direct-complete is applied
to it.

> After RESTORE, _every_ device must end up being runtime
> active.

Correct.

> In general, each resume-side callback should undo the effect
> of the matching suspend-side callback. However, because of
> the requirements mentioned in the preceding sentences,
> sometimes a resume-side callback will be issued even though
> the matching suspend-side callback was skipped -- i.e., when
> a device that starts out runtime-suspended ends up being
> runtime-active.
>
> How does that sound?

It is correct, but in general the other way around is possible too.
That is, a suspend-side callback may be issued without the matching
resume-side one and the device's PM runtime status may be changed
if LEAVE_SUSPENDED is set and SMART_SUSPEND is unset.

> > > What changes to this analysis would SMART_SUSPEND allow? None if the
> > > device is runtime-active. But if the device is runtime-suspended and
> > > the wakeup settings don't need to be changed, then presumably none of
> > > the four items are necessary.
> > >
> > > Is this what you mean?
> >
> > No.
> >
> > What I meant was that even if the driver pointed ->runtime_suspend and
> > ->suspend_late (say) to the same function and it pointed ->resume_early
> > and ->runtime_resume to the same function, it didn't have to point
> > ->freeze_late and ->thaw_early to the same pair of functions, respectively.
> >
> > It can point ->freeze_late and ->thaw_early to a pair of different functions
> > that only quiesce the device and reverse that, respectively.
>
> Again, that describes what drivers or subsystems should do, not what
> the core will do.
>
> > > > [OK, I realize that (b) and (c) are not documented, see the notes below.]
> > > >
> > > > Because of (a), if the device with SMART_SUSPEND set is still runtime-suspended
> > > > during the "late" phase of suspend, the core will not invoke the driver's
> > > > "late" and "noirq" suspend callbacks directly (*). Middle layer (subsystem)
> > > > code is expected to behave accordingly.
> > >
> > > Okay, this agrees with what I wrote above.
> > >
> > > > Because of (b), if the "late" and "noirq" driver callbacks were skipped during
> > > > the "freeze" transition, the core will also avoid invoking the "noirq" and
> > > > "early" callbacks provided by the driver during the "thaw" transition and
> > > > the callbacks during the "restore" transition will be executed unconditionally
> > > > (**). Middle layer code is expected to behave accordingly.
> > >
> > > All right. To summarize: If the driver's ->freeze_late callback is
> > > skipped then the driver's ->thaw-early will be skipped, and similarly
> > > for ->freeze_noirq and ->thaw_noirq. But RESTORE callbacks are never
> > > skipped. Correct?
> >
> > Yes.
>
> And this will be true whether or not LEAVE_SUSPENDED is set, right?

Right.

> > > However, the most difficult transitions are SUSPEND and RESUME. Is it
> > > accurate to say that if the driver's ->suspend_late callback is skipped
> > > then the driver's ->resume_early will be skipped, and similarly for
> > > ->suspend_noirq and ->resume_noirq?
> >
> > If LEAVE_SUSPENDED is set in addition to SMART_SUSPEND, then yes.
> >
> > > > Notes:
> > > >
> > > > 1. I have considered splitting SMART_SUSPEND into two or even three flags
> > > > so that (a), (b) and (c) are each associated with a separate flag, but
> > > > then I would expect the majority of users to use all of them anyway.
> > > >
> > > > 2. LEAVE_SUSPENDED (which may be better renamed to SKIP_RESUME) is kind of
> > > > expected to be used along with SMART_SUSPEND unless there is a good enough
> > > > reason to avoid using it. I admit that this isn't really straightforward,
> > > > maybe the default behavior should be to skip the resume and there should be
> > > > FORCE_RESUME instead of LEAVE_SUSPENDED.
> > >
> > > One question not addressed above (in fact, the original reason for
> > > getting you involved in this discussion): What about the device's
> > > power.runtime_status? Shall we say that that core will call
> > > pm_runtime_set_active() at some point before issuing the ->complete
> > > callback unless some combination of flags is set? And what should that
> > > combination be?
> > >
> > > After all, we expect that most drivers will want their devices to be in
> > > the runtime-active state at the end of a system sleep or hibernation.
> > > It makes sense for the core to do the necessary housekeeping.
> >
> > The core will set the PM-runtime status to "active" in device_resume_noirq()
> > if (a) the subsystem callbacks are not invoked (otherwise the subsystem is
> > responsible for doing that) and (b) if the driver's callback not skipped
> > (in which case its ->resume_early callback will not be skipped too).
>
> Are you certain you want the subsystem callback to be responsible for
> setting the runtime status to "active"? Isn't this an example of
> something the core could do in order to help simplify subsystems?

The rationale here is that whoever decides whether or not to skip the
driver-level callbacks, should also set the PM-runtime status of the
device to match that decision.

> > > > 3. (*) Under the assumption that either ->suspend_late or ->suspend_noirq
> > > > points to the same routine as ->runtime_suspend (and the other is NULL),
> > > > invokig that callback for a runtime-suspended device is technically invalid.
> > >
> > > Does this invalidate anything I wrote above?
> >
> > I don't think so. It is the reason why driver callbacks are skipped for
> > runtime-suspended devices.
>
> And this brings up another thing the core might do to help simplify
> drivers and subsystems: If SMART_SUSPEND isn't set and the device is in
> runtime suspend, couldn't the core do a pm_runtime_resume before
> issuing the ->suspend or ->suspend_late callback?

It could, but sometimes that is not desirable. Like when the drivver points its
suspend callback to pm_runtime_force_suspend().

> > > > In turn, under the assumption that either ->resume_early or ->resume_noirq
> > > > points to the same routine as ->runtime_resume (and the other is NULL), it is
> > > > valid to invoke that callback if the late/noirq suspend was skipped.
> > >
> > > In other words, it's okay for the core either to issue or skip those
> > > callbacks. Presumably the decision will be made based on some flag
> > > setting?
> >
> > Yes. A flag combined with the PM-runtime status of the device in
> > device_suspend_noirq().
> >
> > > > 4. (**) If the "freeze" and "thaw" callbacks are simplified, they cannot be
> > > > run back-to-back with ->runtime_resume and ->runtime_suspend, respectively.
> > > > Thus if "freeze" is skippend, "thaw" must be skipped too. However,
> > > > "restore" needs to be prepared to be invoked after "freeze" or
> > > > ->runtime_suspend (and the state of the device may not match the
> > > > callback that ran previously), so it must be special.
> > > >
> > > > 5. I agree that skipping the driver level of callbacks depending on what is
> > > > provided by the middle layer is inconsistent, but I wanted to take the
> > > > users of pm_runtime_force_suspend/resume() into account by letting those
> > > > things run.
> > > >
> > > > It would be more consistent to expect middle layer code (bus types, PM
> > > > domains) to provide either all of the noirq/early/late callbacks, or none
> > > > of them and make SMART_SUSPEND and pm_runtime_force_suspend/resume()
> > > > mutually exclusive.
> > >
> > > I don't have a clear idea of how pm_runtime_force_suspend/resume() gets
> > > used. Are we better off ignoring it for the time being?
> >
> > Yes, we are.
>
> We're converging on a final answer!

I think so.

In the meantime I have created a git branch with changes to simplify the code,
rename some things and clarify the documentation a bit:

git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
pm-sleep-core

(https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=pm-sleep-core
for web access).

I'm going to post these changes as patches soon.

Cheers!



2020-04-06 20:25:49

by Alan Stern

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Mon, 6 Apr 2020, Rafael J. Wysocki wrote:

> In the meantime I have created a git branch with changes to simplify the code,
> rename some things and clarify the documentation a bit:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> pm-sleep-core
>
> (https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=pm-sleep-core
> for web access).
>
> I'm going to post these changes as patches soon.

All right, those are some significant changes. It'll take me a little
while to absorb them.

> On Friday, April 3, 2020 10:15:09 PM CEST Alan Stern wrote:

> > Let's put it like this: The resume-side callbacks should have the
> > overall effect of bringing the device back to its initial state, with
> > the following exceptions and complications:
> >
> > Unless SMART_SUSPEND and LEAVE_SUSPEND are both set, a device
> > that was in runtime suspend before the suspend_late phase
> > must end up being runtime-active after the matching RESUME.
> >
> > Unless SMART_SUSPEND is set, a device that was in runtime
> > suspend before the freeze_late phase must end up being
> > runtime-active after the matching THAW.
>
> Correct.
>
> > [I'm not so sure about this. Wouldn't it make more sense to treat
> > _every_ device as though SMART_SUSPEND was set for FREEZE/THAW
> > transitions, and require subsystems to do the same?]
>
> Drivers may expect devices to be runtime-active when their suspend
> callbacks are invoked unless they set SMART_SUSPEND. IOW, without
> SMART_SUSPEND set the device should not be left in runtime suspend
> during system-wide suspend at all unless direct-complete is applied
> to it.

[Let's confine this discussion to the not-direct-complete case.]

Okay, say that SMART_SUSPEND isn't set and the device is initially
runtime-suspended. Since the core knows all this, shouldn't the core
then call pm_runtime_resume() immediately before ->suspend? Why leave
this up to subsystems or drivers (which can easily get it wrong --
not to mention all the code duplication it would require)?

Also, doesn't it make sense for some subsystems or drivers to want
their devices to remain in runtime suspend throughout a FREEZE/THAW
transition but not throughout a SUSPEND/RESUME transition? With only a
single SMART_SUSPEND flag, how can we accomodate this desire?

Finally, my description above says that LEAVE_SUSPENDED matters for
SUSPEND/RESUME but not for FREEZE/THAW. Is that really what you have
in mind?

> > After RESTORE, _every_ device must end up being runtime
> > active.
>
> Correct.
>
> > In general, each resume-side callback should undo the effect
> > of the matching suspend-side callback. However, because of
> > the requirements mentioned in the preceding sentences,
> > sometimes a resume-side callback will be issued even though
> > the matching suspend-side callback was skipped -- i.e., when
> > a device that starts out runtime-suspended ends up being
> > runtime-active.
> >
> > How does that sound?
>
> It is correct, but in general the other way around is possible too.
> That is, a suspend-side callback may be issued without the matching
> resume-side one and the device's PM runtime status may be changed
> if LEAVE_SUSPENDED is set and SMART_SUSPEND is unset.

This is inconsistent with what I wrote above (the "Unless SMART_SUSPEND
and LEAVE_SUSPENDED are both set" part). Are you saying that text
should be changed?

> > Are you certain you want the subsystem callback to be responsible for
> > setting the runtime status to "active"? Isn't this an example of
> > something the core could do in order to help simplify subsystems?
>
> The rationale here is that whoever decides whether or not to skip the
> driver-level callbacks, should also set the PM-runtime status of the
> device to match that decision.

Well, that's not really a fair description. The decision about
skipping driver-level callbacks is being made right here, by us, now.
(Or if you prefer, by the developers who originally added the
SMART_SUSPEND flag.) We require subsystems to obey the decisions being
outlined in this discussion.

Given that fact, this is again a case of having the core do something
rather than forcing subsystems/drivers to do it (possibly getting it
wrong and certainly creating a lot of code duplication).

If a subsystem really wants to override our decision, it can always
call pm_runtime_set_{active|suspended} to override the core's setting.

> > And this brings up another thing the core might do to help simplify
> > drivers and subsystems: If SMART_SUSPEND isn't set and the device is in
> > runtime suspend, couldn't the core do a pm_runtime_resume before
> > issuing the ->suspend or ->suspend_late callback?
>
> It could, but sometimes that is not desirable. Like when the drivver points its
> suspend callback to pm_runtime_force_suspend().

This seems to contradict what you wrote above: "Drivers may expect
devices to be runtime-active when their suspend callbacks are invoked
unless they set SMART_SUSPEND. IOW, without SMART_SUSPEND set the
device should not be left in runtime suspend during system-wide suspend
at all unless direct-complete is applied to it."

If you stand by that statement then drivers should never point their
suspend callback to pm_runtime_force_suspend() unless they also set
SMART_SUSPEND.

Alan Stern

2020-04-09 18:49:02

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Monday, April 6, 2020 10:25:08 PM CEST Alan Stern wrote:
> On Mon, 6 Apr 2020, Rafael J. Wysocki wrote:
>
> > In the meantime I have created a git branch with changes to simplify the code,
> > rename some things and clarify the documentation a bit:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> > pm-sleep-core
> >
> > (https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=pm-sleep-core
> > for web access).
> >
> > I'm going to post these changes as patches soon.
>
> All right, those are some significant changes. It'll take me a little
> while to absorb them.
>
> > On Friday, April 3, 2020 10:15:09 PM CEST Alan Stern wrote:
>
> > > Let's put it like this: The resume-side callbacks should have the
> > > overall effect of bringing the device back to its initial state, with
> > > the following exceptions and complications:
> > >
> > > Unless SMART_SUSPEND and LEAVE_SUSPEND are both set, a device
> > > that was in runtime suspend before the suspend_late phase
> > > must end up being runtime-active after the matching RESUME.
> > >
> > > Unless SMART_SUSPEND is set, a device that was in runtime
> > > suspend before the freeze_late phase must end up being
> > > runtime-active after the matching THAW.
> >
> > Correct.
> >
> > > [I'm not so sure about this. Wouldn't it make more sense to treat
> > > _every_ device as though SMART_SUSPEND was set for FREEZE/THAW
> > > transitions, and require subsystems to do the same?]
> >
> > Drivers may expect devices to be runtime-active when their suspend
> > callbacks are invoked unless they set SMART_SUSPEND. IOW, without
> > SMART_SUSPEND set the device should not be left in runtime suspend
> > during system-wide suspend at all unless direct-complete is applied
> > to it.
>
> [Let's confine this discussion to the not-direct-complete case.]
>
> Okay, say that SMART_SUSPEND isn't set and the device is initially
> runtime-suspended. Since the core knows all this, shouldn't the core
> then call pm_runtime_resume() immediately before ->suspend? Why leave
> this up to subsystems or drivers (which can easily get it wrong --
> not to mention all the code duplication it would require)?

I would agree in principle, but that has been done by subsystems forever and
(at least in some cases) drivers on bus types like platform on i2c (where
subsystem-level PM callbacks are not provided in general unless there is a PM
domain doing that) don't expect the devices to be resumed and they
decide whether or not to do that themselves.

Making the core resume the runtime-suspended devices during system-wide
suspend, would require those drivers to adapt and it is rather hard to
even estimate how many of them there are.

> Also, doesn't it make sense for some subsystems or drivers to want
> their devices to remain in runtime suspend throughout a FREEZE/THAW
> transition but not throughout a SUSPEND/RESUME transition? With only a
> single SMART_SUSPEND flag, how can we accomodate this desire?

That's a fair statement, but in general it is more desirable to optimize
suspend/resume than to optimize hibernation, so the latter is not a priority.

I'm not ruling out adding one more flag specific to hibernation or similar
in the future.

> Finally, my description above says that LEAVE_SUSPENDED matters for
> SUSPEND/RESUME but not for FREEZE/THAW. Is that really what you have
> in mind?

Yes, it is. LEAVE_SUSPENDED really does not apply to hibernation at all.

> > > After RESTORE, _every_ device must end up being runtime
> > > active.
> >
> > Correct.
> >
> > > In general, each resume-side callback should undo the effect
> > > of the matching suspend-side callback. However, because of
> > > the requirements mentioned in the preceding sentences,
> > > sometimes a resume-side callback will be issued even though
> > > the matching suspend-side callback was skipped -- i.e., when
> > > a device that starts out runtime-suspended ends up being
> > > runtime-active.
> > >
> > > How does that sound?
> >
> > It is correct, but in general the other way around is possible too.
> > That is, a suspend-side callback may be issued without the matching
> > resume-side one and the device's PM runtime status may be changed
> > if LEAVE_SUSPENDED is set and SMART_SUSPEND is unset.
>
> This is inconsistent with what I wrote above (the "Unless SMART_SUSPEND
> and LEAVE_SUSPENDED are both set" part). Are you saying that text
> should be changed?

Yes, in fact SMART_SUSPEND need not be set for resume callbacks to be skipped.

LEAVE_SUSPENDED must be set for that to happen (except for hibernation) and it
may be sufficient if the subsystem sets power.may_skip_resume in addition.

> > > Are you certain you want the subsystem callback to be responsible for
> > > setting the runtime status to "active"? Isn't this an example of
> > > something the core could do in order to help simplify subsystems?
> >
> > The rationale here is that whoever decides whether or not to skip the
> > driver-level callbacks, should also set the PM-runtime status of the
> > device to match that decision.
>
> Well, that's not really a fair description. The decision about
> skipping driver-level callbacks is being made right here, by us, now.
> (Or if you prefer, by the developers who originally added the
> SMART_SUSPEND flag.) We require subsystems to obey the decisions being
> outlined in this discussion.
>
> Given that fact, this is again a case of having the core do something
> rather than forcing subsystems/drivers to do it (possibly getting it
> wrong and certainly creating a lot of code duplication).
>
> If a subsystem really wants to override our decision, it can always
> call pm_runtime_set_{active|suspended} to override the core's setting.

OK, fair enough.

I've incorporated this into the changes on the pm-sleep-core branch
mentioned before.

> > > And this brings up another thing the core might do to help simplify
> > > drivers and subsystems: If SMART_SUSPEND isn't set and the device is in
> > > runtime suspend, couldn't the core do a pm_runtime_resume before
> > > issuing the ->suspend or ->suspend_late callback?
> >
> > It could, but sometimes that is not desirable. Like when the drivver points its
> > suspend callback to pm_runtime_force_suspend().
>
> This seems to contradict what you wrote above: "Drivers may expect
> devices to be runtime-active when their suspend callbacks are invoked
> unless they set SMART_SUSPEND. IOW, without SMART_SUSPEND set the
> device should not be left in runtime suspend during system-wide suspend
> at all unless direct-complete is applied to it."
>
> If you stand by that statement then drivers should never point their
> suspend callback to pm_runtime_force_suspend() unless they also set
> SMART_SUSPEND.

OK, let me rephrase.

Some drivers that don't use SMART_SUSPEND expect the devices to be runtime-active
when their system-wide PM callbacks run, but the other drivers do not have such
expectations, because the subsystems they work with have never resumed devices
during system-wide suspend.

SMART_SUSPEND is not needed for the latter category of drivers, but it is for
the former and I want the behavior when SMART_SUSPEND *is* set to be consistent
across the core and subsystems, while the other case have never been so.

Cheers!



2020-04-11 02:43:28

by Alan Stern

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

Okay, this is my attempt to summarize what we have been discussing.
But first: There is a dev_pm_skip_resume() helper routine which
subsystems can call to see whether resume-side _early and _noirq driver
callbacks should be skipped. But there is no corresponding
dev_pm_skip_suspend() helper routine. Let's add one, or rename
dev_pm_smart_suspend_and_suspended() to dev_pm_skip_suspend().

Given that, here's my understanding of what should happen. (I'm
assuming the direct_complete mechanism is not being used.) This tries
to describe what we _want_ to happen, which is not always the same as
what the current code actually _does_.

During the suspend side, for each of the
{suspend,freeze,poweroff}_{late,noirq} phases: If
dev_pm_skip_suspend() returns true then the subsystem should
not invoke the driver's callback, and if there is no subsystem
callback then the core will not invoke the driver's callback.

During the resume side, for each of the
{resume,thaw,restore}_{early,noirq} phases: If
dev_pm_skip_resume() returns true then the subsystem should
not invoke the driver's callback, and if there is no subsystem
callback then the core will not invoke the driver's callback.

dev_pm_skip_suspend() will return "true" if SMART_SUSPEND is
set and the device's runtime status is "suspended".

power.must_resume gets set following the suspend-side _noirq
phase if power.usage_count > 1 (indicating the device was
in active use before the start of the sleep transition) or
power.must_resume is set for any of the device's dependents.

dev_pm_skip_resume() will return "false" if the current
transition is RESTORE or power.must_resume is set. Otherwise:
It will return true if the current transition is THAW,
SMART_SUSPEND is set, and the device's runtime status is
"suspended". It will return "true" if the current transition is
RESUME, SMART_SUSPEND and MAY_SKIP_RESUME are both set, and
the device's runtime status is "suspended". For a RESUME
transition, it will also return "true" if MAY_SKIP_RESUME and
power.may_skip_resume are both set, regardless of
SMART_SUSPEND or the current runtime status.

At the start of the {resume,thaw,restore}_noirq phase, if
dev_pm_skip_resume() returns true then the core will set the
runtime status to "suspended". Otherwise it will set the
runtime status to "active". If this is not what the subsystem
or driver wants, it must update the runtime status itself.

Comments and differences with respect to the code in your pm-sleep-core
branch:

I'm not sure whether we should specify other conditions for
setting power.must_resume.

dev_pm_skip_resume() doesn't compute the value described
above. I'm pretty sure the existing code is wrong.

device_resume_noirq() checks
dev_pm_smart_suspend_and_suspended() before setting the
runtime PM status to "active", contrary to the text above.
The difference shows up in the case where SMART_SUSPEND is
clear but the runtime PM status is "suspended". Don't we want
to set the status to "active" in this case? Or is there some
need to accomodate legacy drivers here? In any case, wouldn't
it be good enough to check just the SMART_SUSPEND flag?

__device_suspend_late() sets power.may_skip_resume, contrary
to the comment in include/linux/pm.h: That flag is supposed to
be set by subsystems, not the core.

I'm not at all sure that this algorithm won't run into trouble
at some point when it tries to set a device's runtime status
to "active" but the parent's status is set to "suspended".
And I'm not sure this problem can be fixed by adjusting
power.must_resume.

Alan Stern

2020-04-15 01:14:14

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Saturday, April 11, 2020 4:41:14 AM CEST Alan Stern wrote:
> Okay, this is my attempt to summarize what we have been discussing.
> But first: There is a dev_pm_skip_resume() helper routine which
> subsystems can call to see whether resume-side _early and _noirq driver
> callbacks should be skipped. But there is no corresponding
> dev_pm_skip_suspend() helper routine. Let's add one, or rename
> dev_pm_smart_suspend_and_suspended() to dev_pm_skip_suspend().

OK

> Given that, here's my understanding of what should happen. (I'm
> assuming the direct_complete mechanism is not being used.) This tries
> to describe what we _want_ to happen, which is not always the same as
> what the current code actually _does_.

OK

> During the suspend side, for each of the
> {suspend,freeze,poweroff}_{late,noirq} phases: If
> dev_pm_skip_suspend() returns true then the subsystem should
> not invoke the driver's callback, and if there is no subsystem
> callback then the core will not invoke the driver's callback.
>
> During the resume side, for each of the
> {resume,thaw,restore}_{early,noirq} phases: If
> dev_pm_skip_resume() returns true then the subsystem should
> not invoke the driver's callback, and if there is no subsystem
> callback then the core will not invoke the driver's callback.
>
> dev_pm_skip_suspend() will return "true" if SMART_SUSPEND is
> set and the device's runtime status is "suspended".

Agreed with the above.

> power.must_resume gets set following the suspend-side _noirq
> phase if power.usage_count > 1 (indicating the device was
> in active use before the start of the sleep transition) or
> power.must_resume is set for any of the device's dependents.

Or MAY_SKIP_RESUME is unset (which means that the driver does not
allow its resume callbacks to be skipped), or power.may_skip_resume
is unset (which means that the subsystem does not allow the
driver callbacks to be skipped).

> dev_pm_skip_resume() will return "false" if the current
> transition is RESTORE or power.must_resume is set. Otherwise:
> It will return true if the current transition is THAW,
> SMART_SUSPEND is set, and the device's runtime status is
> "suspended".

The other way around. That is:

dev_pm_skip_resume() will return "true" if the current transition is
THAW and dev_pm_skip_suspend() returns "true" for that device (so
SMART_SUSPEND is set, and the device's runtime status is "suspended",
as per the definition of that function above).

Otherwise, it will return "true" if the current transition is RESTORE
(which means that all devices are resumed) or power.must_resume is not
set (so this particular device need not be resumed).

> It will return "true" if the current transition is
> RESUME, SMART_SUSPEND and MAY_SKIP_RESUME are both set, and
> the device's runtime status is "suspended".

Unless MAY_SKIP_RESUME is unset for at least one of its descendants (or
dependent devices).

> For a RESUME
> transition, it will also return "true" if MAY_SKIP_RESUME and
> power.may_skip_resume are both set, regardless of
> SMART_SUSPEND or the current runtime status.

And if the device was not in active use before suspend (as per its usage
counter) or MAY_SKIP_RESUME is unset for at least one of its descendants (or
dependent devices in general).

> At the start of the {resume,thaw,restore}_noirq phase, if
> dev_pm_skip_resume() returns true then the core will set the
> runtime status to "suspended". Otherwise it will set the
> runtime status to "active". If this is not what the subsystem
> or driver wants, it must update the runtime status itself.

Right.

> Comments and differences with respect to the code in your pm-sleep-core
> branch:
>
> I'm not sure whether we should specify other conditions for
> setting power.must_resume.

IMO we should.

Otherwise it is rather hard to catch the case in which one of the
device's descendants has MAY_SKIP_RESUME unset (and so the device
needs to be resumed).

> dev_pm_skip_resume() doesn't compute the value described
> above. I'm pretty sure the existing code is wrong.

Well, we don't seem to have reached an agreement on some details
above ...

Cheers!



2020-04-15 21:36:29

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

Note to self: avoid replying to technical messages late in the night ...

On Mon, Apr 13, 2020 at 11:32 PM Rafael J. Wysocki <[email protected]> wrote:
>
> On Saturday, April 11, 2020 4:41:14 AM CEST Alan Stern wrote:
> > Okay, this is my attempt to summarize what we have been discussing.
> > But first: There is a dev_pm_skip_resume() helper routine which
> > subsystems can call to see whether resume-side _early and _noirq driver
> > callbacks should be skipped. But there is no corresponding
> > dev_pm_skip_suspend() helper routine. Let's add one, or rename
> > dev_pm_smart_suspend_and_suspended() to dev_pm_skip_suspend().
>
> OK
>
> > Given that, here's my understanding of what should happen. (I'm
> > assuming the direct_complete mechanism is not being used.) This tries
> > to describe what we _want_ to happen, which is not always the same as
> > what the current code actually _does_.
>
> OK
>
> > During the suspend side, for each of the
> > {suspend,freeze,poweroff}_{late,noirq} phases: If
> > dev_pm_skip_suspend() returns true then the subsystem should
> > not invoke the driver's callback, and if there is no subsystem
> > callback then the core will not invoke the driver's callback.
> >
> > During the resume side, for each of the
> > {resume,thaw,restore}_{early,noirq} phases: If
> > dev_pm_skip_resume() returns true then the subsystem should
> > not invoke the driver's callback, and if there is no subsystem
> > callback then the core will not invoke the driver's callback.
> >
> > dev_pm_skip_suspend() will return "true" if SMART_SUSPEND is
> > set and the device's runtime status is "suspended".
>
> Agreed with the above.
>
> > power.must_resume gets set following the suspend-side _noirq
> > phase if power.usage_count > 1 (indicating the device was
> > in active use before the start of the sleep transition) or
> > power.must_resume is set for any of the device's dependents.
>
> Or MAY_SKIP_RESUME is unset (which means that the driver does not
> allow its resume callbacks to be skipped), or power.may_skip_resume
> is unset (which means that the subsystem does not allow the
> driver callbacks to be skipped).
>
> > dev_pm_skip_resume() will return "false" if the current
> > transition is RESTORE or power.must_resume is set. Otherwise:
> > It will return true if the current transition is THAW,
> > SMART_SUSPEND is set, and the device's runtime status is
> > "suspended".
>
> The other way around. That is:
>
> dev_pm_skip_resume() will return "true" if the current transition is
> THAW and dev_pm_skip_suspend() returns "true" for that device (so
> SMART_SUSPEND is set, and the device's runtime status is "suspended",
> as per the definition of that function above).

The above is what I wanted to say ->

> Otherwise, it will return "true" if the current transition is RESTORE
> (which means that all devices are resumed) or power.must_resume is not
> set (so this particular device need not be resumed).

-> but this isn't. In particular, I messed up the RESTORE part, so it
should read:

Otherwise, it will return "true" if the current transition is *not*
RESTORE (in which case all devices would be resumed) *and*
power.must_resume is not set (so this particular device need not be
resumed).

Sorry about that.

> > It will return "true" if the current transition is
> > RESUME, SMART_SUSPEND and MAY_SKIP_RESUME are both set, and
> > the device's runtime status is "suspended".
>
> Unless MAY_SKIP_RESUME is unset for at least one of its descendants (or
> dependent devices).

That should include the power.may_skip_resume flag, so as to read as follows:

Unless MAY_SKIP_RESUME is unset or power.may_skip_resume is unset for
at least one of its descendants (or dependent devices).

> > For a RESUME
> > transition, it will also return "true" if MAY_SKIP_RESUME and
> > power.may_skip_resume are both set, regardless of
> > SMART_SUSPEND or the current runtime status.
>
> And if the device was not in active use before suspend (as per its usage
> counter) or MAY_SKIP_RESUME is unset for at least one of its descendants (or
> dependent devices in general).

And analogously here, so what I really should have written is:

And if the device was not in active use before suspend (as per its
usage counter) or MAY_SKIP_RESUME or power.may_skip_resume is unset
for at least one of its descendants (or dependent devices in general).

> > At the start of the {resume,thaw,restore}_noirq phase, if
> > dev_pm_skip_resume() returns true then the core will set the
> > runtime status to "suspended". Otherwise it will set the
> > runtime status to "active". If this is not what the subsystem
> > or driver wants, it must update the runtime status itself.
>
> Right.
>
> > Comments and differences with respect to the code in your pm-sleep-core
> > branch:
> >
> > I'm not sure whether we should specify other conditions for
> > setting power.must_resume.
>
> IMO we should.

In fact, this is part of the implementation and it helps to
"propagate" the "must resume" condition to the parent and the
first-order suppliers of the device (which is sufficient, because
their power.must_resume "propagates" in the same way and so on).

IOW, the important piece is what the return value of
dev_pm_skip_resume() should be in particular conditions and that
return value is computed with the help of power.must_resume (and it
might have been computed in a different, possibly less efficient,
way).

> Otherwise it is rather hard to catch the case in which one of the
> device's descendants has MAY_SKIP_RESUME unset (and so the device
> needs to be resumed).
>
> > dev_pm_skip_resume() doesn't compute the value described
> > above. I'm pretty sure the existing code is wrong.
>
> Well, we don't seem to have reached an agreement on some details
> above ...

Sorry for failing to be careful enough ...

2020-04-15 21:47:34

by Alan Stern

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Tue, 14 Apr 2020, Rafael J. Wysocki wrote:

> Note to self: avoid replying to technical messages late in the night ...
>
> On Mon, Apr 13, 2020 at 11:32 PM Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Saturday, April 11, 2020 4:41:14 AM CEST Alan Stern wrote:
> > > Okay, this is my attempt to summarize what we have been discussing.
> > > But first: There is a dev_pm_skip_resume() helper routine which
> > > subsystems can call to see whether resume-side _early and _noirq driver
> > > callbacks should be skipped. But there is no corresponding
> > > dev_pm_skip_suspend() helper routine. Let's add one, or rename
> > > dev_pm_smart_suspend_and_suspended() to dev_pm_skip_suspend().
> >
> > OK
> >
> > > Given that, here's my understanding of what should happen. (I'm
> > > assuming the direct_complete mechanism is not being used.) This tries
> > > to describe what we _want_ to happen, which is not always the same as
> > > what the current code actually _does_.
> >
> > OK
> >
> > > During the suspend side, for each of the
> > > {suspend,freeze,poweroff}_{late,noirq} phases: If
> > > dev_pm_skip_suspend() returns true then the subsystem should
> > > not invoke the driver's callback, and if there is no subsystem
> > > callback then the core will not invoke the driver's callback.
> > >
> > > During the resume side, for each of the
> > > {resume,thaw,restore}_{early,noirq} phases: If
> > > dev_pm_skip_resume() returns true then the subsystem should
> > > not invoke the driver's callback, and if there is no subsystem
> > > callback then the core will not invoke the driver's callback.
> > >
> > > dev_pm_skip_suspend() will return "true" if SMART_SUSPEND is
> > > set and the device's runtime status is "suspended".
> >
> > Agreed with the above.
> >
> > > power.must_resume gets set following the suspend-side _noirq
> > > phase if power.usage_count > 1 (indicating the device was
> > > in active use before the start of the sleep transition) or
> > > power.must_resume is set for any of the device's dependents.
> >
> > Or MAY_SKIP_RESUME is unset (which means that the driver does not
> > allow its resume callbacks to be skipped), or power.may_skip_resume
> > is unset (which means that the subsystem does not allow the
> > driver callbacks to be skipped).

Are you certain about that? It contradicts what you said earlier, that
MAY_SKIP_RESUME doesn't affect THAW transitions. Also, it would mean
that a device whose subsystem doesn't know about power.may_skip_resume
would never be allowed to stay in runtime suspend.

> > > dev_pm_skip_resume() will return "false" if the current
> > > transition is RESTORE or power.must_resume is set. Otherwise:
> > > It will return true if the current transition is THAW,
> > > SMART_SUSPEND is set, and the device's runtime status is
> > > "suspended".
> >
> > The other way around. That is:
> >
> > dev_pm_skip_resume() will return "true" if the current transition is
> > THAW and dev_pm_skip_suspend() returns "true" for that device (so
> > SMART_SUSPEND is set, and the device's runtime status is "suspended",
> > as per the definition of that function above).
>
> The above is what I wanted to say ->

So for THAW, dev_pm_skip_resume() can return "true" even if
power.must_resume is set? That doesn't seem right.

> > Otherwise, it will return "true" if the current transition is RESTORE
> > (which means that all devices are resumed) or power.must_resume is not
> > set (so this particular device need not be resumed).
>
> -> but this isn't. In particular, I messed up the RESTORE part, so it
> should read:
>
> Otherwise, it will return "true" if the current transition is *not*
> RESTORE (in which case all devices would be resumed) *and*
> power.must_resume is not set (so this particular device need not be
> resumed).
>
> Sorry about that.

For the RESTORE and THAW cases that is exactly the same as what I
wrote, apart from the THAW issue noted above.

> > > It will return "true" if the current transition is
> > > RESUME, SMART_SUSPEND and MAY_SKIP_RESUME are both set, and
> > > the device's runtime status is "suspended".
> >
> > Unless MAY_SKIP_RESUME is unset for at least one of its descendants (or
> > dependent devices).
>
> That should include the power.may_skip_resume flag, so as to read as follows:
>
> Unless MAY_SKIP_RESUME is unset or power.may_skip_resume is unset for
> at least one of its descendants (or dependent devices).

What about the runtime PM usage counter?

> > > For a RESUME
> > > transition, it will also return "true" if MAY_SKIP_RESUME and
> > > power.may_skip_resume are both set, regardless of
> > > SMART_SUSPEND or the current runtime status.
> >
> > And if the device was not in active use before suspend (as per its usage
> > counter) or MAY_SKIP_RESUME is unset for at least one of its descendants (or
> > dependent devices in general).
>
> And analogously here, so what I really should have written is:
>
> And if the device was not in active use before suspend (as per its
> usage counter) or MAY_SKIP_RESUME or power.may_skip_resume is unset
> for at least one of its descendants (or dependent devices in general).

In other words, for RESUME transitions you want the MAY_SKIP_RESUME and
power.may_skip_resume restrictions to propagate up from dependent
devices. And of course, the way to do that is by adding them into the
power.must_resume flag.

How do you want to handle the usage counter restriction.
Should that also propagate upward?

And how should the result of dev_pm_skip_resume() be affected by
SMART_SUSPEND for RESUME transitions?

Maybe this is getting confusing because of the way I organized it.
Let's try like this:

Transition Conditions for dev_pm_skip_resume() to return "true"
---------- ----------------------------------------------------

RESTORE Never

THAW power.must_resume is clear (which requires
MAY_SKIP_RESUME and power.may_skip_resume to be set and
the runtime usage counter to be = 1, and which
propagates up from dependent devices)
SMART_SUSPEND is set,
runtime status is "suspended"

RESUME Same as THAW? Or maybe don't require SMART_SUSPEND?
(But if SMART_SUSPEND is clear, how could the runtime
status be "suspended"?)

I can't really tell what you want, because your comments at various
times have been inconsistent.

Alan Stern

> > > At the start of the {resume,thaw,restore}_noirq phase, if
> > > dev_pm_skip_resume() returns true then the core will set the
> > > runtime status to "suspended". Otherwise it will set the
> > > runtime status to "active". If this is not what the subsystem
> > > or driver wants, it must update the runtime status itself.
> >
> > Right.
> >
> > > Comments and differences with respect to the code in your pm-sleep-core
> > > branch:
> > >
> > > I'm not sure whether we should specify other conditions for
> > > setting power.must_resume.
> >
> > IMO we should.
>
> In fact, this is part of the implementation and it helps to
> "propagate" the "must resume" condition to the parent and the
> first-order suppliers of the device (which is sufficient, because
> their power.must_resume "propagates" in the same way and so on).
>
> IOW, the important piece is what the return value of
> dev_pm_skip_resume() should be in particular conditions and that
> return value is computed with the help of power.must_resume (and it
> might have been computed in a different, possibly less efficient,
> way).
>
> > Otherwise it is rather hard to catch the case in which one of the
> > device's descendants has MAY_SKIP_RESUME unset (and so the device
> > needs to be resumed).
> >
> > > dev_pm_skip_resume() doesn't compute the value described
> > > above. I'm pretty sure the existing code is wrong.
> >
> > Well, we don't seem to have reached an agreement on some details
> > above ...
>
> Sorry for failing to be careful enough ...


2020-04-16 01:12:30

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Tuesday, April 14, 2020 7:47:35 PM CEST Alan Stern wrote:
> On Tue, 14 Apr 2020, Rafael J. Wysocki wrote:
>
> > Note to self: avoid replying to technical messages late in the night ...
> >
> > On Mon, Apr 13, 2020 at 11:32 PM Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > On Saturday, April 11, 2020 4:41:14 AM CEST Alan Stern wrote:
> > > > Okay, this is my attempt to summarize what we have been discussing.
> > > > But first: There is a dev_pm_skip_resume() helper routine which
> > > > subsystems can call to see whether resume-side _early and _noirq driver
> > > > callbacks should be skipped. But there is no corresponding
> > > > dev_pm_skip_suspend() helper routine. Let's add one, or rename
> > > > dev_pm_smart_suspend_and_suspended() to dev_pm_skip_suspend().
> > >
> > > OK
> > >
> > > > Given that, here's my understanding of what should happen. (I'm
> > > > assuming the direct_complete mechanism is not being used.) This tries
> > > > to describe what we _want_ to happen, which is not always the same as
> > > > what the current code actually _does_.
> > >
> > > OK
> > >
> > > > During the suspend side, for each of the
> > > > {suspend,freeze,poweroff}_{late,noirq} phases: If
> > > > dev_pm_skip_suspend() returns true then the subsystem should
> > > > not invoke the driver's callback, and if there is no subsystem
> > > > callback then the core will not invoke the driver's callback.
> > > >
> > > > During the resume side, for each of the
> > > > {resume,thaw,restore}_{early,noirq} phases: If
> > > > dev_pm_skip_resume() returns true then the subsystem should
> > > > not invoke the driver's callback, and if there is no subsystem
> > > > callback then the core will not invoke the driver's callback.
> > > >
> > > > dev_pm_skip_suspend() will return "true" if SMART_SUSPEND is
> > > > set and the device's runtime status is "suspended".
> > >
> > > Agreed with the above.
> > >
> > > > power.must_resume gets set following the suspend-side _noirq
> > > > phase if power.usage_count > 1 (indicating the device was
> > > > in active use before the start of the sleep transition) or
> > > > power.must_resume is set for any of the device's dependents.
> > >
> > > Or MAY_SKIP_RESUME is unset (which means that the driver does not
> > > allow its resume callbacks to be skipped), or power.may_skip_resume
> > > is unset (which means that the subsystem does not allow the
> > > driver callbacks to be skipped).
>
> Are you certain about that? It contradicts what you said earlier, that
> MAY_SKIP_RESUME doesn't affect THAW transitions.

Yes, MAY_SKIP_RESUME, as well as power.may_skip_resume for that matter, really
should not affect the THAW transition at all. I overlooked that when I was
writing the above (and earlier).

This means that the dev_pm_skip_resume() logic really is relatively
straightforward:
- If the current transition is RESTORE, return "false".
- Otherwise, if the current transition is THAW, return the return value
of dev_pm_skip_suspend().
- Otherwise (so the current transition is RESUME which is the only remaining
case), return the logical negation of power.must_resume.

> Also, it would mean
> that a device whose subsystem doesn't know about power.may_skip_resume
> would never be allowed to stay in runtime suspend.

Not really, because I want the core to set power.may_skip_resume for the
devices for which dev_pm_skip_suspend() returns "true" if the "suspend_late"
subsystem-level callback is not present. [It might be more consistent
to simply set it for all devices for which dev_pm_skip_suspend() returns
"true" and let the subsystems update it should they want to? IOW, the
default value of power.may_skip_resume could be the return value of
dev_pm_skip_suspend()?]

> > > > dev_pm_skip_resume() will return "false" if the current
> > > > transition is RESTORE or power.must_resume is set. Otherwise:
> > > > It will return true if the current transition is THAW,
> > > > SMART_SUSPEND is set, and the device's runtime status is
> > > > "suspended".
> > >
> > > The other way around. That is:
> > >
> > > dev_pm_skip_resume() will return "true" if the current transition is
> > > THAW and dev_pm_skip_suspend() returns "true" for that device (so
> > > SMART_SUSPEND is set, and the device's runtime status is "suspended",
> > > as per the definition of that function above).
> >
> > The above is what I wanted to say ->
>
> So for THAW, dev_pm_skip_resume() can return "true" even if
> power.must_resume is set? That doesn't seem right.

But it cannot be the other way around.

For example, invoking ->thaw_early() from the driver without the corresponding
->freeze_late() would be a bug in general, unless they point to the same
routines as ->runtime_resume() and ->runtime_suspend() (or equivalent),
respectively, but that need not be the case.

> > > Otherwise, it will return "true" if the current transition is RESTORE
> > > (which means that all devices are resumed) or power.must_resume is not
> > > set (so this particular device need not be resumed).
> >
> > -> but this isn't. In particular, I messed up the RESTORE part, so it
> > should read:
> >
> > Otherwise, it will return "true" if the current transition is *not*
> > RESTORE (in which case all devices would be resumed) *and*
> > power.must_resume is not set (so this particular device need not be
> > resumed).
> >
> > Sorry about that.
>
> For the RESTORE and THAW cases that is exactly the same as what I
> wrote, apart from the THAW issue noted above.

OK then.

> > > > It will return "true" if the current transition is
> > > > RESUME, SMART_SUSPEND and MAY_SKIP_RESUME are both set, and
> > > > the device's runtime status is "suspended".
> > >
> > > Unless MAY_SKIP_RESUME is unset for at least one of its descendants (or
> > > dependent devices).
> >
> > That should include the power.may_skip_resume flag, so as to read as follows:
> >
> > Unless MAY_SKIP_RESUME is unset or power.may_skip_resume is unset for
> > at least one of its descendants (or dependent devices).
>
> What about the runtime PM usage counter?

Yes, it applies to that too.

Of course, if dev_pm_skip_suspend() returns "true", the usage counter cannot
be greater than 1 (for the given device as well as for any dependent devices).

> > > > For a RESUME
> > > > transition, it will also return "true" if MAY_SKIP_RESUME and
> > > > power.may_skip_resume are both set, regardless of
> > > > SMART_SUSPEND or the current runtime status.
> > >
> > > And if the device was not in active use before suspend (as per its usage
> > > counter) or MAY_SKIP_RESUME is unset for at least one of its descendants (or
> > > dependent devices in general).
> >
> > And analogously here, so what I really should have written is:
> >
> > And if the device was not in active use before suspend (as per its
> > usage counter) or MAY_SKIP_RESUME or power.may_skip_resume is unset
> > for at least one of its descendants (or dependent devices in general).
>
> In other words, for RESUME transitions you want the MAY_SKIP_RESUME and
> power.may_skip_resume restrictions to propagate up from dependent
> devices.

Yes, I do.

> And of course, the way to do that is by adding them into the
> power.must_resume flag.

Right.

> How do you want to handle the usage counter restriction.
> Should that also propagate upward?

Yes, it should.

> And how should the result of dev_pm_skip_resume() be affected by
> SMART_SUSPEND for RESUME transitions?

Not directly, just through power.must_resume.

> Maybe this is getting confusing because of the way I organized it.
> Let's try like this:
>
> Transition Conditions for dev_pm_skip_resume() to return "true"
> ---------- ----------------------------------------------------
>
> RESTORE Never

Right.

> THAW power.must_resume is clear (which requires
> MAY_SKIP_RESUME and power.may_skip_resume to be set and
> the runtime usage counter to be = 1, and which
> propagates up from dependent devices)
> SMART_SUSPEND is set,
> runtime status is "suspended"

Like I said above:

THAW dev_pm_skip_suspend() returns "true".

>
> RESUME Same as THAW? Or maybe don't require SMART_SUSPEND?
> (But if SMART_SUSPEND is clear, how could the runtime
> status be "suspended"?)

RESUME power.must_resume is clear (which requires
MAY_SKIP_RESUME and power.may_skip_resume to be set and
the runtime usage counter to be = 1, and which
propagates up from dependent devices)

Nothing else is really strictly required IMO.

>
> I can't really tell what you want, because your comments at various
> times have been inconsistent.

Sorry for the inconsistencies, I hope that it's more clear now.

Cheers!



2020-04-16 15:22:58

by Alan Stern

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

Thanks for all your help straightening this out. I think the end
result will be a distinct improvement over the old code.

On Thu, 16 Apr 2020, Rafael J. Wysocki wrote:

> This means that the dev_pm_skip_resume() logic really is relatively
> straightforward:
> - If the current transition is RESTORE, return "false".
> - Otherwise, if the current transition is THAW, return the return value
> of dev_pm_skip_suspend().
> - Otherwise (so the current transition is RESUME which is the only remaining
> case), return the logical negation of power.must_resume.
>
> > Also, it would mean
> > that a device whose subsystem doesn't know about power.may_skip_resume
> > would never be allowed to stay in runtime suspend.
>
> Not really, because I want the core to set power.may_skip_resume for the
> devices for which dev_pm_skip_suspend() returns "true" if the "suspend_late"
> subsystem-level callback is not present. [It might be more consistent
> to simply set it for all devices for which dev_pm_skip_suspend() returns
> "true" and let the subsystems update it should they want to? IOW, the
> default value of power.may_skip_resume could be the return value of
> dev_pm_skip_suspend()?]

How about this? Let's set power.may_skip_resume to "true" for each
device before issuing ->prepare. The subsystem can set it to "false"
if it wants to during any of the suspend-side callbacks. Following the
->suspend_noirq callback, the core will do the equivalent of:

dev->power.may_skip_resume &= dev_pm_skip_suspend(dev);

before propagating the flag. Any subsystem changes to support this
should be minimal, since only ACPI and PCI currently use
may_skip_resume.

> > What about the runtime PM usage counter?
>
> Yes, it applies to that too.
>
> Of course, if dev_pm_skip_suspend() returns "true", the usage counter cannot
> be greater than 1 (for the given device as well as for any dependent devices).

Well, in theory the subsystem could call pm_runtime_get_noresume(). I
can't imagine why it would want to, though.

So here's what we've got:

> > Transition Conditions for dev_pm_skip_resume() to return "true"
> > ---------- ----------------------------------------------------
> >
> > RESTORE Never
>
> Right.

> THAW dev_pm_skip_suspend() returns "true".

> RESUME power.must_resume is clear (which requires
> MAY_SKIP_RESUME and power.may_skip_resume to be set and
> the runtime usage counter to be = 1, and which
> propagates up from dependent devices)
>
> Nothing else is really strictly required IMO.

This seems very clear and simple. And I will repeat here some of the
things posted earlier, to make the description more complete:

During the suspend side, for each of the
{suspend,freeze,poweroff}_{late,noirq} phases: If
dev_pm_skip_suspend() returns true then the subsystem should
not invoke the driver's callback, and if there is no subsystem
callback then the core will not invoke the driver's callback.

During the resume side, for each of the
{resume,thaw,restore}_{early,noirq} phases: If
dev_pm_skip_resume() returns true then the subsystem should
not invoke the driver's callback, and if there is no subsystem
callback then the core will not invoke the driver's callback.

dev_pm_skip_suspend() will return "true" if SMART_SUSPEND is
set and the device's runtime status is "suspended".

For dev_pm_skip_resume() and power.must_resume, see above.

At the start of the {resume,thaw,restore}_noirq phase, if
dev_pm_skip_resume() returns true then the core will set the
runtime status to "suspended". Otherwise it will set the
runtime status to "active". If this is not what the subsystem
or driver wants, it must update the runtime status itself.

For this to work properly, we will have to rely on subsystems/drivers
to call pm_runtime_resume() during the suspend/freeze transition if
SMART_SUSPEND is clear. Otherwise we could have the following
scenario:

Device A has a child B, and both are runtime suspended when hibernation
starts. Suppose that the SMART_SUSPEND flag is set for A but not for
B, and suppose that B's subsystem/driver neglects to call
pm_runtime_resume() during the FREEZE transition. Then during the THAW
transition, dev_pm_skip_resume() will return "true" for A and "false"
for B. This will lead to an error when the core tries to set B's
runtime status to "active" while A's status is "suspended".

One way to avoid this is to have the core make the pm_runtime_resume()
call, but you have said that you don't like that approach. Any
suggestions?

Should the core take some special action following ->freeze_noirq if
the runtime status is "suspended" and SMART_SUSPEND is clear?

Alan Stern

2020-04-17 10:00:57

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Thursday, April 16, 2020 5:18:15 PM CEST Alan Stern wrote:
> Thanks for all your help straightening this out. I think the end
> result will be a distinct improvement over the old code.

Yes, I believe so.

> On Thu, 16 Apr 2020, Rafael J. Wysocki wrote:
>
> > This means that the dev_pm_skip_resume() logic really is relatively
> > straightforward:
> > - If the current transition is RESTORE, return "false".
> > - Otherwise, if the current transition is THAW, return the return value
> > of dev_pm_skip_suspend().
> > - Otherwise (so the current transition is RESUME which is the only remaining
> > case), return the logical negation of power.must_resume.
> >
> > > Also, it would mean
> > > that a device whose subsystem doesn't know about power.may_skip_resume
> > > would never be allowed to stay in runtime suspend.
> >
> > Not really, because I want the core to set power.may_skip_resume for the
> > devices for which dev_pm_skip_suspend() returns "true" if the "suspend_late"
> > subsystem-level callback is not present. [It might be more consistent
> > to simply set it for all devices for which dev_pm_skip_suspend() returns
> > "true" and let the subsystems update it should they want to? IOW, the
> > default value of power.may_skip_resume could be the return value of
> > dev_pm_skip_suspend()?]
>
> How about this? Let's set power.may_skip_resume to "true" for each
> device before issuing ->prepare.

Yes, it can be set to 'true' by default for all devices.

It doesn't need to be before ->prepare, it can be before ->suspend (as it
is now).

> The subsystem can set it to "false"
> if it wants to during any of the suspend-side callbacks. Following the
> ->suspend_noirq callback, the core will do the equivalent of:
>
> dev->power.may_skip_resume &= dev_pm_skip_suspend(dev);
>
> before propagating the flag. Any subsystem changes to support this
> should be minimal, since only ACPI and PCI currently use
> may_skip_resume.

IMO it can be simpler even.

Because power.may_skip_resume is taken into account along with
MAY_SKIP_RESUME and the driver setting the latter must be prepared
for skipping its resume callbacks regardless of the suspend side of
things, they may always be skipped (and the device may be left in
suspend accordingly) if there is a reason to avoid doing that.

The core doesn't know about those reasons, so it has no reason to
touch power.may_skip_resume after setting it at the outset and then
whoever sees a reason why these callbacks should run (the subsystem
or the driver) needs to clear power.may_skip_resume (and clearing it
more than once obviously makes no difference).

> > > What about the runtime PM usage counter?
> >
> > Yes, it applies to that too.
> >
> > Of course, if dev_pm_skip_suspend() returns "true", the usage counter cannot
> > be greater than 1 (for the given device as well as for any dependent devices).
>
> Well, in theory the subsystem could call pm_runtime_get_noresume(). I
> can't imagine why it would want to, though.

Indeed.

> So here's what we've got:
>
> > > Transition Conditions for dev_pm_skip_resume() to return "true"
> > > ---------- ----------------------------------------------------
> > >
> > > RESTORE Never
> >
> > Right.
>
> > THAW dev_pm_skip_suspend() returns "true".
>
> > RESUME power.must_resume is clear (which requires
> > MAY_SKIP_RESUME and power.may_skip_resume to be set and
> > the runtime usage counter to be = 1, and which
> > propagates up from dependent devices)
> >
> > Nothing else is really strictly required IMO.
>
> This seems very clear and simple. And I will repeat here some of the
> things posted earlier, to make the description more complete:
>
> During the suspend side, for each of the
> {suspend,freeze,poweroff}_{late,noirq} phases: If
> dev_pm_skip_suspend() returns true then the subsystem should
> not invoke the driver's callback, and if there is no subsystem
> callback then the core will not invoke the driver's callback.
>
> During the resume side, for each of the
> {resume,thaw,restore}_{early,noirq} phases: If
> dev_pm_skip_resume() returns true then the subsystem should
> not invoke the driver's callback, and if there is no subsystem
> callback then the core will not invoke the driver's callback.
>
> dev_pm_skip_suspend() will return "true" if SMART_SUSPEND is
> set and the device's runtime status is "suspended".
>
> For dev_pm_skip_resume() and power.must_resume, see above.
>
> At the start of the {resume,thaw,restore}_noirq phase, if
> dev_pm_skip_resume() returns true then the core will set the
> runtime status to "suspended". Otherwise it will set the
> runtime status to "active". If this is not what the subsystem
> or driver wants, it must update the runtime status itself.
>
> For this to work properly, we will have to rely on subsystems/drivers
> to call pm_runtime_resume() during the suspend/freeze transition if
> SMART_SUSPEND is clear.

That has been the case forever, though.

> Otherwise we could have the following scenario:
>
> Device A has a child B, and both are runtime suspended when hibernation
> starts. Suppose that the SMART_SUSPEND flag is set for A but not for
> B, and suppose that B's subsystem/driver neglects to call
> pm_runtime_resume() during the FREEZE transition. Then during the THAW
> transition, dev_pm_skip_resume() will return "true" for A and "false"
> for B. This will lead to an error when the core tries to set B's
> runtime status to "active" while A's status is "suspended".
>
> One way to avoid this is to have the core make the pm_runtime_resume()
> call, but you have said that you don't like that approach. Any
> suggestions?

Because the core has not been calling pm_runtime_resume() during system-wide
suspend for devices with SMART_SUSPEND clear, that should not be changed or
we'll see regressions.

I know for a fact that some drivers expect the core to be doing nothing
with respect to that.

> Should the core take some special action following ->freeze_noirq if
> the runtime status is "suspended" and SMART_SUSPEND is clear?

Again, anything like that would change the current behavior which may
not be expected by at least some drivers, so I wouldn't change that.

IOW, SMART_SUSPEND clear means to the core that *it* need not care about
the suspend side at all (because somebody else will do that).



2020-04-17 16:14:34

by Alan Stern

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Fri, 17 Apr 2020, Rafael J. Wysocki wrote:

> On Thursday, April 16, 2020 5:18:15 PM CEST Alan Stern wrote:

> > > IOW, the
> > > default value of power.may_skip_resume could be the return value of
> > > dev_pm_skip_suspend()?]
> >
> > How about this? Let's set power.may_skip_resume to "true" for each
> > device before issuing ->prepare.
>
> Yes, it can be set to 'true' by default for all devices.
>
> It doesn't need to be before ->prepare, it can be before ->suspend (as it
> is now).

I suggested doing it before ->prepare so that subsystems can clear
power.may_skip_resume in their ->prepare callbacks. If you think the
ability to do that isn't important then fine, initialize the flag
before ->suspend.

> > The subsystem can set it to "false"
> > if it wants to during any of the suspend-side callbacks. Following the
> > ->suspend_noirq callback, the core will do the equivalent of:
> >
> > dev->power.may_skip_resume &= dev_pm_skip_suspend(dev);
> >
> > before propagating the flag. Any subsystem changes to support this
> > should be minimal, since only ACPI and PCI currently use
> > may_skip_resume.
>
> IMO it can be simpler even.
>
> Because power.may_skip_resume is taken into account along with
> MAY_SKIP_RESUME and the driver setting the latter must be prepared
> for skipping its resume callbacks regardless of the suspend side of
> things, they may always be skipped (and the device may be left in
> suspend accordingly) if there is a reason to avoid doing that.
>
> The core doesn't know about those reasons, so it has no reason to
> touch power.may_skip_resume after setting it at the outset and then
> whoever sees a reason why these callbacks should run (the subsystem
> or the driver) needs to clear power.may_skip_resume (and clearing it
> more than once obviously makes no difference).

I was trying to implement your suggestion of making the default for
power.may_skip_resume be the return value of dev_pm_skip_suspend().
However, making the default value be "true" is indeed simpler, and I
think it would work okay.

> > So here's what we've got:
> >
> > > > Transition Conditions for dev_pm_skip_resume() to return "true"
> > > > ---------- ----------------------------------------------------
> > > >
> > > > RESTORE Never
> > >
> > > Right.
> >
> > > THAW dev_pm_skip_suspend() returns "true".
> >
> > > RESUME power.must_resume is clear (which requires
> > > MAY_SKIP_RESUME and power.may_skip_resume to be set and
> > > the runtime usage counter to be = 1, and which
> > > propagates up from dependent devices)
> > >
> > > Nothing else is really strictly required IMO.
> >
> > This seems very clear and simple. And I will repeat here some of the
> > things posted earlier, to make the description more complete:
> >
> > During the suspend side, for each of the
> > {suspend,freeze,poweroff}_{late,noirq} phases: If
> > dev_pm_skip_suspend() returns true then the subsystem should
> > not invoke the driver's callback, and if there is no subsystem
> > callback then the core will not invoke the driver's callback.
> >
> > During the resume side, for each of the
> > {resume,thaw,restore}_{early,noirq} phases: If
> > dev_pm_skip_resume() returns true then the subsystem should
> > not invoke the driver's callback, and if there is no subsystem
> > callback then the core will not invoke the driver's callback.
> >
> > dev_pm_skip_suspend() will return "true" if SMART_SUSPEND is
> > set and the device's runtime status is "suspended".
> >
> > For dev_pm_skip_resume() and power.must_resume, see above.
> >
> > At the start of the {resume,thaw,restore}_noirq phase, if
> > dev_pm_skip_resume() returns true then the core will set the
> > runtime status to "suspended". Otherwise it will set the
> > runtime status to "active". If this is not what the subsystem
> > or driver wants, it must update the runtime status itself.
> >
> > For this to work properly, we will have to rely on subsystems/drivers
> > to call pm_runtime_resume() during the suspend/freeze transition if
> > SMART_SUSPEND is clear.
>
> That has been the case forever, though.

I'm not so sure about that. The existing PM core code doesn't ever get
into a situation where it tries to set a device's runtime status to
"active" while the parent's status is "suspended".

> > Otherwise we could have the following scenario:
> >
> > Device A has a child B, and both are runtime suspended when hibernation
> > starts. Suppose that the SMART_SUSPEND flag is set for A but not for
> > B, and suppose that B's subsystem/driver neglects to call
> > pm_runtime_resume() during the FREEZE transition. Then during the THAW
> > transition, dev_pm_skip_resume() will return "true" for A and "false"
> > for B. This will lead to an error when the core tries to set B's
> > runtime status to "active" while A's status is "suspended".
> >
> > One way to avoid this is to have the core make the pm_runtime_resume()
> > call, but you have said that you don't like that approach. Any
> > suggestions?
>
> Because the core has not been calling pm_runtime_resume() during system-wide
> suspend for devices with SMART_SUSPEND clear, that should not be changed or
> we'll see regressions.
>
> I know for a fact that some drivers expect the core to be doing nothing
> with respect to that.
>
> > Should the core take some special action following ->freeze_noirq if
> > the runtime status is "suspended" and SMART_SUSPEND is clear?
>
> Again, anything like that would change the current behavior which may
> not be expected by at least some drivers, so I wouldn't change that.
>
> IOW, SMART_SUSPEND clear means to the core that *it* need not care about
> the suspend side at all (because somebody else will do that).

But the core _does_ need to care, because if somebody else fails to
take care of the suspend side then the core would trigger the WARN() in
pm_runtime_enable() for the parent device. I guess we could consider
such a WARN() to be a symptom of a bug in the driver or subsystem,
rather than in the core; is that how you want to handle the scenario
above?

This approach doesn't seem robust. I can easily imagine cases where
the parent's driver is aware of SMART_SUSPEND but the child's driver
isn't. Currently we don't require the child's driver to call
pm_runtime_resume(). Do you really want to consider all such cases to
be bugs?

Basically, I'm saying that if the core allows things to arrive at a
situation where we can come out of THAW with a runtime-suspended parent
and a runtime-active child, it really should be considered to be the
core's fault.

Alan Stern

2020-04-17 21:58:16

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Friday, April 17, 2020 6:10:19 PM CEST Alan Stern wrote:
> On Fri, 17 Apr 2020, Rafael J. Wysocki wrote:
>
> > On Thursday, April 16, 2020 5:18:15 PM CEST Alan Stern wrote:
>

[cut]

> > > So here's what we've got:
> > >
> > > > > Transition Conditions for dev_pm_skip_resume() to return "true"
> > > > > ---------- ----------------------------------------------------
> > > > >
> > > > > RESTORE Never
> > > >
> > > > Right.
> > >
> > > > THAW dev_pm_skip_suspend() returns "true".
> > >
> > > > RESUME power.must_resume is clear (which requires
> > > > MAY_SKIP_RESUME and power.may_skip_resume to be set and
> > > > the runtime usage counter to be = 1, and which
> > > > propagates up from dependent devices)
> > > >
> > > > Nothing else is really strictly required IMO.
> > >
> > > This seems very clear and simple. And I will repeat here some of the
> > > things posted earlier, to make the description more complete:
> > >
> > > During the suspend side, for each of the
> > > {suspend,freeze,poweroff}_{late,noirq} phases: If
> > > dev_pm_skip_suspend() returns true then the subsystem should
> > > not invoke the driver's callback, and if there is no subsystem
> > > callback then the core will not invoke the driver's callback.
> > >
> > > During the resume side, for each of the
> > > {resume,thaw,restore}_{early,noirq} phases: If
> > > dev_pm_skip_resume() returns true then the subsystem should
> > > not invoke the driver's callback, and if there is no subsystem
> > > callback then the core will not invoke the driver's callback.
> > >
> > > dev_pm_skip_suspend() will return "true" if SMART_SUSPEND is
> > > set and the device's runtime status is "suspended".
> > >
> > > For dev_pm_skip_resume() and power.must_resume, see above.
> > >
> > > At the start of the {resume,thaw,restore}_noirq phase, if
> > > dev_pm_skip_resume() returns true then the core will set the
> > > runtime status to "suspended". Otherwise it will set the
> > > runtime status to "active". If this is not what the subsystem
> > > or driver wants, it must update the runtime status itself.

There is one detail here that I missed, sorry about that.

Actually, the core can only set the runtime status to "active" for
devices where dev_pm_skip_suspend() returns 'true'.

First, if the device is not "suspended", its status is "active" already
anyway.

Second, if the device has SMART_SUSPEND clear, the driver may not expect
its runtime status to change from "suspended" to "active" during system-wide
resume-type transitions (the driver's system-wide PM callbacks may use
the runtime status to determine what to do and changing the status this
way may confuse that).

[Actually, the drivers that set neither SMART_SUSPEND nor MAY_SKIP_RESUME
may not expect the runtime status to change during system-wide resume-type
transitions at all, but there is the corner case when the driver can set
MAY_SKIP_RESUME without setting SMART_SUSPEND. In that case its "noirq"
and "early" resume callbacks may be skipped and then it should expect
the runtime status to sometimes change from "active" to "suspended" during
RESUME transitions, but it may still not expect to see changes the other way
around, as in that case all of its callbacks are going to be invoked and
apply the internal runtime status handling mentioned above.]

So overall:

At the start of the {resume,thaw,restore}_noirq phase, if
dev_pm_skip_resume() returns true ,then the core will set the
runtime status to "suspended". Otherwise, if dev_pm_skip_suspend()
also returns true, then the core will set the runtime status to "active".
If this is not what the subsystem or driver wants, it must update the
runtime status itself.

> > > For this to work properly, we will have to rely on subsystems/drivers
> > > to call pm_runtime_resume() during the suspend/freeze transition if
> > > SMART_SUSPEND is clear.
> >
> > That has been the case forever, though.
>
> I'm not so sure about that. The existing PM core code doesn't ever get
> into a situation where it tries to set a device's runtime status to
> "active" while the parent's status is "suspended".

I'm assuming that you refer to the scenario below.

> > > Otherwise we could have the following scenario:
> > >
> > > Device A has a child B, and both are runtime suspended when hibernation
> > > starts. Suppose that the SMART_SUSPEND flag is set for A but not for
> > > B, and suppose that B's subsystem/driver neglects to call
> > > pm_runtime_resume() during the FREEZE transition. Then during the THAW
> > > transition, dev_pm_skip_resume() will return "true" for A and "false"
> > > for B. This will lead to an error when the core tries to set B's
> > > runtime status to "active" while A's status is "suspended".

That cannot happen, because dev_pm_smart_suspend() also returns 'false' for B
and so its runtime status will not be changed to "active".

> > > One way to avoid this is to have the core make the pm_runtime_resume()
> > > call, but you have said that you don't like that approach. Any
> > > suggestions?
> >
> > Because the core has not been calling pm_runtime_resume() during system-wide
> > suspend for devices with SMART_SUSPEND clear, that should not be changed or
> > we'll see regressions.
> >
> > I know for a fact that some drivers expect the core to be doing nothing
> > with respect to that.
> >
> > > Should the core take some special action following ->freeze_noirq if
> > > the runtime status is "suspended" and SMART_SUSPEND is clear?
> >
> > Again, anything like that would change the current behavior which may
> > not be expected by at least some drivers, so I wouldn't change that.
> >
> > IOW, SMART_SUSPEND clear means to the core that *it* need not care about
> > the suspend side at all (because somebody else will do that).
>
> But the core _does_ need to care, because if somebody else fails to
> take care of the suspend side then the core would trigger the WARN() in
> pm_runtime_enable() for the parent device. I guess we could consider
> such a WARN() to be a symptom of a bug in the driver or subsystem,
> rather than in the core; is that how you want to handle the scenario
> above?
>
> This approach doesn't seem robust. I can easily imagine cases where
> the parent's driver is aware of SMART_SUSPEND but the child's driver
> isn't. Currently we don't require the child's driver to call
> pm_runtime_resume(). Do you really want to consider all such cases to
> be bugs?
>
> Basically, I'm saying that if the core allows things to arrive at a
> situation where we can come out of THAW with a runtime-suspended parent
> and a runtime-active child, it really should be considered to be the
> core's fault.

AFAICS, this particular one is not an issue.

BTW, I have updated my pm-sleep-core branch to reflect what appears to be
the current state-of-the-art to me.

I'm going to post a v2 of this patch series over the weekend for reference.

Cheers!



2020-04-17 23:39:39

by Alan Stern

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Fri, 17 Apr 2020, Rafael J. Wysocki wrote:

> There is one detail here that I missed, sorry about that.
>
> Actually, the core can only set the runtime status to "active" for
> devices where dev_pm_skip_suspend() returns 'true'.
>
> First, if the device is not "suspended", its status is "active" already
> anyway.
>
> Second, if the device has SMART_SUSPEND clear, the driver may not expect
> its runtime status to change from "suspended" to "active" during system-wide
> resume-type transitions (the driver's system-wide PM callbacks may use
> the runtime status to determine what to do and changing the status this
> way may confuse that).
>
> [Actually, the drivers that set neither SMART_SUSPEND nor MAY_SKIP_RESUME
> may not expect the runtime status to change during system-wide resume-type
> transitions at all, but there is the corner case when the driver can set
> MAY_SKIP_RESUME without setting SMART_SUSPEND. In that case its "noirq"
> and "early" resume callbacks may be skipped and then it should expect
> the runtime status to sometimes change from "active" to "suspended" during
> RESUME transitions, but it may still not expect to see changes the other way
> around, as in that case all of its callbacks are going to be invoked and
> apply the internal runtime status handling mentioned above.]
>
> So overall:
>
> At the start of the {resume,thaw,restore}_noirq phase, if
> dev_pm_skip_resume() returns true ,then the core will set the
> runtime status to "suspended". Otherwise, if dev_pm_skip_suspend()
> also returns true, then the core will set the runtime status to "active".
> If this is not what the subsystem or driver wants, it must update the
> runtime status itself.

Sigh. The bug which prompted this whole thread was when I forgot to
set the runtime PM status back to "active" in one of my drivers. I was
hoping the core could handle it for me automatically.

I guess the answer is always to set the SMART_SUSPEND flag.


> > > > For this to work properly, we will have to rely on subsystems/drivers
> > > > to call pm_runtime_resume() during the suspend/freeze transition if
> > > > SMART_SUSPEND is clear.
> > >
> > > That has been the case forever, though.
> >
> > I'm not so sure about that. The existing PM core code doesn't ever get
> > into a situation where it tries to set a device's runtime status to
> > "active" while the parent's status is "suspended".
>
> I'm assuming that you refer to the scenario below.
>
> > > > Otherwise we could have the following scenario:
> > > >
> > > > Device A has a child B, and both are runtime suspended when hibernation
> > > > starts. Suppose that the SMART_SUSPEND flag is set for A but not for
> > > > B, and suppose that B's subsystem/driver neglects to call
> > > > pm_runtime_resume() during the FREEZE transition. Then during the THAW
> > > > transition, dev_pm_skip_resume() will return "true" for A and "false"
> > > > for B. This will lead to an error when the core tries to set B's
> > > > runtime status to "active" while A's status is "suspended".
>
> That cannot happen, because dev_pm_smart_suspend() also returns 'false' for B
> and so its runtime status will not be changed to "active".

Yes, your change to dev_pm_skip_resume() will prevent the problem from
arising.


> BTW, I have updated my pm-sleep-core branch to reflect what appears to be
> the current state-of-the-art to me.
>
> I'm going to post a v2 of this patch series over the weekend for reference.

Okay, I'll check it out.

By the way, if you don't mind I may want to do some editing of
devices.rst.

Alan Stern


2020-04-18 09:41:44

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Sat, Apr 18, 2020 at 1:37 AM Alan Stern <[email protected]> wrote:
>
> On Fri, 17 Apr 2020, Rafael J. Wysocki wrote:
>
> > There is one detail here that I missed, sorry about that.
> >
> > Actually, the core can only set the runtime status to "active" for
> > devices where dev_pm_skip_suspend() returns 'true'.
> >
> > First, if the device is not "suspended", its status is "active" already
> > anyway.
> >
> > Second, if the device has SMART_SUSPEND clear, the driver may not expect
> > its runtime status to change from "suspended" to "active" during system-wide
> > resume-type transitions (the driver's system-wide PM callbacks may use
> > the runtime status to determine what to do and changing the status this
> > way may confuse that).
> >
> > [Actually, the drivers that set neither SMART_SUSPEND nor MAY_SKIP_RESUME
> > may not expect the runtime status to change during system-wide resume-type
> > transitions at all, but there is the corner case when the driver can set
> > MAY_SKIP_RESUME without setting SMART_SUSPEND. In that case its "noirq"
> > and "early" resume callbacks may be skipped and then it should expect
> > the runtime status to sometimes change from "active" to "suspended" during
> > RESUME transitions, but it may still not expect to see changes the other way
> > around, as in that case all of its callbacks are going to be invoked and
> > apply the internal runtime status handling mentioned above.]
> >
> > So overall:
> >
> > At the start of the {resume,thaw,restore}_noirq phase, if
> > dev_pm_skip_resume() returns true ,then the core will set the
> > runtime status to "suspended". Otherwise, if dev_pm_skip_suspend()
> > also returns true, then the core will set the runtime status to "active".
> > If this is not what the subsystem or driver wants, it must update the
> > runtime status itself.
>
> Sigh. The bug which prompted this whole thread was when I forgot to
> set the runtime PM status back to "active" in one of my drivers. I was
> hoping the core could handle it for me automatically.
>
> I guess the answer is always to set the SMART_SUSPEND flag.

I would say so. :-)

> > > > > For this to work properly, we will have to rely on subsystems/drivers
> > > > > to call pm_runtime_resume() during the suspend/freeze transition if
> > > > > SMART_SUSPEND is clear.
> > > >
> > > > That has been the case forever, though.
> > >
> > > I'm not so sure about that. The existing PM core code doesn't ever get
> > > into a situation where it tries to set a device's runtime status to
> > > "active" while the parent's status is "suspended".
> >
> > I'm assuming that you refer to the scenario below.
> >
> > > > > Otherwise we could have the following scenario:
> > > > >
> > > > > Device A has a child B, and both are runtime suspended when hibernation
> > > > > starts. Suppose that the SMART_SUSPEND flag is set for A but not for
> > > > > B, and suppose that B's subsystem/driver neglects to call
> > > > > pm_runtime_resume() during the FREEZE transition. Then during the THAW
> > > > > transition, dev_pm_skip_resume() will return "true" for A and "false"
> > > > > for B. This will lead to an error when the core tries to set B's
> > > > > runtime status to "active" while A's status is "suspended".
> >
> > That cannot happen, because dev_pm_smart_suspend() also returns 'false' for B
> > and so its runtime status will not be changed to "active".
>
> Yes, your change to dev_pm_skip_resume() will prevent the problem from
> arising.
>
>
> > BTW, I have updated my pm-sleep-core branch to reflect what appears to be
> > the current state-of-the-art to me.
> >
> > I'm going to post a v2 of this patch series over the weekend for reference.
>
> Okay, I'll check it out.
>
> By the way, if you don't mind I may want to do some editing of
> devices.rst.

Sure, please feel free to do that.

Cheers!

2020-04-20 20:28:32

by Alan Stern

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On Wed, 25 Mar 2020, Alan Stern wrote:

> On Wed, 25 Mar 2020, Qais Yousef wrote:
>
> > Thanks for all the hints Alan.
> >
> > I think I figured it out, the below patch seems to fix it for me. Looking
> > at other drivers resume functions it seems we're missing the
> > pm_runtime_disable()->set_active()->enable() dance. Doing that fixes the
> > warning and the dev_err() in driver/base/power.
>
> Ah, yes. This should have been added years ago; guess I forgot. :-(
>
> > I don't see xhci-plat.c doing that, I wonder if it needs it too.
> >
> > I'm not well versed about the details and the rules here. So my fix could be
> > a hack, though it does seem the right thing to do.
> >
> > I wonder why the power core doesn't handle this transparently..
>
> Initially, we didn't want the PM core to do this automatically because
> we thought some devices might want to remain runtime-suspended
> following a system resume, and only the device driver would know what
> to do.

Qais:

So it looks like the discussion with Rafael will lead to changes in the
PM core, but they won't go into the -stable kernels, and they won't
directly fix the problem here.

In the meantime, why don't you write up your patch below and submit it
properly? Even better, create similar patches for ehci-platform.c and
xhci-plat.c and submit them too.

Alan Stern

> > diff --git a/drivers/usb/host/ohci-platform.c b/drivers/usb/host/ohci-platform.c
> > index 7addfc2cbadc..eb92c8092fae 100644
> > --- a/drivers/usb/host/ohci-platform.c
> > +++ b/drivers/usb/host/ohci-platform.c
> > @@ -299,6 +299,10 @@ static int ohci_platform_resume(struct device *dev)
> > }
> >
> > ohci_resume(hcd, false);
> > +
> > + pm_runtime_disable(dev);
> > + pm_runtime_set_active(dev);
> > + pm_runtime_enable(dev);
> > return 0;
> > }
> > #endif /* CONFIG_PM_SLEEP */
> >
> >
> > Thanks
> >
> > --
> > Qais Yousef
>
>
>

2020-04-21 11:17:14

by Qais Yousef

[permalink] [raw]
Subject: Re: lockdep warning in urb.c:363 usb_submit_urb

On 04/20/20 16:26, Alan Stern wrote:
> On Wed, 25 Mar 2020, Alan Stern wrote:
>
> > On Wed, 25 Mar 2020, Qais Yousef wrote:
> >
> > > Thanks for all the hints Alan.
> > >
> > > I think I figured it out, the below patch seems to fix it for me. Looking
> > > at other drivers resume functions it seems we're missing the
> > > pm_runtime_disable()->set_active()->enable() dance. Doing that fixes the
> > > warning and the dev_err() in driver/base/power.
> >
> > Ah, yes. This should have been added years ago; guess I forgot. :-(
> >
> > > I don't see xhci-plat.c doing that, I wonder if it needs it too.
> > >
> > > I'm not well versed about the details and the rules here. So my fix could be
> > > a hack, though it does seem the right thing to do.
> > >
> > > I wonder why the power core doesn't handle this transparently..
> >
> > Initially, we didn't want the PM core to do this automatically because
> > we thought some devices might want to remain runtime-suspended
> > following a system resume, and only the device driver would know what
> > to do.
>
> Qais:
>
> So it looks like the discussion with Rafael will lead to changes in the
> PM core, but they won't go into the -stable kernels, and they won't
> directly fix the problem here.
>
> In the meantime, why don't you write up your patch below and submit it
> properly? Even better, create similar patches for ehci-platform.c and
> xhci-plat.c and submit them too.

Sure.

Thanks

--
Qais Yousef

>
> Alan Stern
>
> > > diff --git a/drivers/usb/host/ohci-platform.c b/drivers/usb/host/ohci-platform.c
> > > index 7addfc2cbadc..eb92c8092fae 100644
> > > --- a/drivers/usb/host/ohci-platform.c
> > > +++ b/drivers/usb/host/ohci-platform.c
> > > @@ -299,6 +299,10 @@ static int ohci_platform_resume(struct device *dev)
> > > }
> > >
> > > ohci_resume(hcd, false);
> > > +
> > > + pm_runtime_disable(dev);
> > > + pm_runtime_set_active(dev);
> > > + pm_runtime_enable(dev);
> > > return 0;
> > > }
> > > #endif /* CONFIG_PM_SLEEP */
> > >
> > >
> > > Thanks
> > >
> > > --
> > > Qais Yousef
> >
> >
> >
>