2019-06-05 08:41:42

by Jon Hunter

[permalink] [raw]
Subject: [REGRESSION v5.2-rc] SUNRPC: Declare RPC timers as TIMER_DEFERRABLE (431235818bc3)

Hi Trond,

I have been noticing intermittent failures with a system suspend test on
some of our machines that have a NFS mounted root file-system. Bisecting
this issue points to your commit 431235818bc3 ("SUNRPC: Declare RPC
timers as TIMER_DEFERRABLE") and reverting this on top of v5.2-rc3 does
appear to resolve the problem.

The cause of the suspend failure appears to be a long delay observed
sometimes when resuming from suspend, and this is causing our test to
timeout. For example, in a failing case I see something like the
following ...

[ 69.667385] PM: suspend entry (deep)

[ 69.675642] Filesystems sync: 0.000 seconds

[ 69.684983] Freezing user space processes ... (elapsed 0.001 seconds) done.

[ 69.697880] OOM killer disabled.

[ 69.705670] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.

[ 69.719043] printk: Suspending console(s) (use no_console_suspend to debug)

[ 69.758911] Disabling non-boot CPUs ...

[ 69.761875] IRQ 17: no longer affine to CPU3

[ 69.762609] Entering suspend state LP1

[ 69.762636] Enabling non-boot CPUs ...

[ 69.763600] CPU1 is up

[ 69.764517] CPU2 is up

[ 69.765532] CPU3 is up

[ 69.845832] mmc1: queuing unknown CIS tuple 0x80 (50 bytes)

[ 69.854223] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)

[ 69.857238] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)

[ 69.892700] mmc1: queuing unknown CIS tuple 0x02 (1 bytes)

[ 70.407286] OOM killer enabled.

[ 70.414674] Restarting tasks ... done.

[ 70.423232] PM: suspend exit

[ 73.533252] asix 1-1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1

[ 105.461852] nfs: server 192.168.99.1 not responding, still trying

[ 105.462347] nfs: server 192.168.99.1 not responding, still trying

[ 105.484809] nfs: server 192.168.99.1 OK

[ 105.486454] nfs: server 192.168.99.1 OK


So it would appear that making these timers deferrable is having an impact
when resuming from suspend. Do you have any thoughts on this?

Thanks
Jon


2019-06-05 22:02:02

by Trond Myklebust

[permalink] [raw]
Subject: Re: [REGRESSION v5.2-rc] SUNRPC: Declare RPC timers as TIMER_DEFERRABLE (431235818bc3)

On Wed, 2019-06-05 at 09:40 +0100, Jon Hunter wrote:
> Hi Trond,
>
> I have been noticing intermittent failures with a system suspend test
> on
> some of our machines that have a NFS mounted root file-system.
> Bisecting
> this issue points to your commit 431235818bc3 ("SUNRPC: Declare RPC
> timers as TIMER_DEFERRABLE") and reverting this on top of v5.2-rc3
> does
> appear to resolve the problem.
>
> The cause of the suspend failure appears to be a long delay observed
> sometimes when resuming from suspend, and this is causing our test to
> timeout. For example, in a failing case I see something like the
> following ...
>
> [ 69.667385] PM: suspend entry (deep)
>
> [ 69.675642] Filesystems sync: 0.000 seconds
>
> [ 69.684983] Freezing user space processes ... (elapsed 0.001
> seconds) done.
>
> [ 69.697880] OOM killer disabled.
>
> [ 69.705670] Freezing remaining freezable tasks ... (elapsed 0.001
> seconds) done.
>
> [ 69.719043] printk: Suspending console(s) (use no_console_suspend
> to debug)
>
> [ 69.758911] Disabling non-boot CPUs ...
>
> [ 69.761875] IRQ 17: no longer affine to CPU3
>
> [ 69.762609] Entering suspend state LP1
>
> [ 69.762636] Enabling non-boot CPUs ...
>
> [ 69.763600] CPU1 is up
>
> [ 69.764517] CPU2 is up
>
> [ 69.765532] CPU3 is up
>
> [ 69.845832] mmc1: queuing unknown CIS tuple 0x80 (50 bytes)
>
> [ 69.854223] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)
>
> [ 69.857238] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)
>
> [ 69.892700] mmc1: queuing unknown CIS tuple 0x02 (1 bytes)
>
> [ 70.407286] OOM killer enabled.
>
> [ 70.414674] Restarting tasks ... done.
>
> [ 70.423232] PM: suspend exit
>
> [ 73.533252] asix 1-1:1.0 eth0: link up, 100Mbps, full-duplex, lpa
> 0xCDE1
>
> [ 105.461852] nfs: server 192.168.99.1 not responding, still trying
>
> [ 105.462347] nfs: server 192.168.99.1 not responding, still trying
>
> [ 105.484809] nfs: server 192.168.99.1 OK
>
> [ 105.486454] nfs: server 192.168.99.1 OK
>
>
> So it would appear that making these timers deferrable is having an
> impact
> when resuming from suspend. Do you have any thoughts on this?
>

I'd be OK with just reverting this patch if it is causing a performance
issue.

Anna?

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-06-18 09:37:19

by Jon Hunter

[permalink] [raw]
Subject: Re: [REGRESSION v5.2-rc] SUNRPC: Declare RPC timers as TIMER_DEFERRABLE (431235818bc3)

Trond, Anna,

On 12/06/2019 15:23, Jon Hunter wrote:
>
> On 05/06/2019 23:01, Trond Myklebust wrote:
>
> ...
>
>> I'd be OK with just reverting this patch if it is causing a performance
>> issue.
>>
>> Anna?
>
> Any update on this?

I have not seen any update on this. Do you plan to revert this?

We are getting ever closer to v5.2 and this problem still persists.

Thanks
Jon

--
nvpublic

2019-06-19 13:38:30

by Jon Hunter

[permalink] [raw]
Subject: Re: [REGRESSION v5.2-rc] SUNRPC: Declare RPC timers as TIMER_DEFERRABLE (431235818bc3)


On 18/06/2019 21:18, Anna Schumaker wrote:
>
> On Tue, 2019-06-18 at 10:35 +0100, Jon Hunter wrote:
>> Trond, Anna,
>>
>> On 12/06/2019 15:23, Jon Hunter wrote:
>>> On 05/06/2019 23:01, Trond Myklebust wrote:
>>>
>>> ...
>>>
>>>> I'd be OK with just reverting this patch if it is causing a
>>>> performance
>>>> issue.
>>>>
>>>> Anna?
>>>
>>> Any update on this?
>>
>> I have not seen any update on this. Do you plan to revert this?
>>
>> We are getting ever closer to v5.2 and this problem still persists.
>
> Hi Jon,
>
> Sorry it took me so long to see this. I've applied the revert and
> pushed it out to my linux-next branch. I'm planning to send it with
> some other bugfixes this week.
Thanks Anna!

Jon
--
nvpublic