2013-10-17 20:42:21

by Helge Deller

[permalink] [raw]
Subject: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU?

I'm seeing a regression with current kernel git head when using NFS-mounts.
Architecture in my case is parisc, although I don't think that this is relevant.
At least kernel 3.10 (and I think 3.11) didn't showed that problem.

The symtom is, that "top" shows high usage of either kswapd0 or kswapd1.
Here is an output with kswapd1:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
37 root 20 0 0 0 0 R 91.8 0.0 63:00.40 kswapd1
28448 root 20 0 3252 1428 1060 R 15.3 0.0 0:00.09 top
1 root 20 0 2784 988 852 S 0.0 0.0 0:09.95 init

This is what ps shows:
lsXXXX:~# ps -ef | grep mount
root 1181 1 0 14:51 ? 00:00:18 /usr/sbin/automount --pid-file /var/run/autofs.pid
root 25331 1181 0 21:25 ? 00:00:00 /bin/mount -n -t nfs -s -o nolock,rw,hard,intr homes:/unixhome1 /net/home1
root 25332 25331 0 21:25 ? 00:00:00 /sbin/mount.nfs homes:/unixhome1 /net/home1 -s -n -o rw,nolock,hard,intr

And using sysrq to show the blocked tasks I get in syslog:
SysRq : Show Blocked State
mount.nfs D 00000000401040c0 0 25332 25331 0x00000010
Backtrace:
[<0000000040113a68>] __schedule+0x500/0x810

I know it's not a problem of the NFS server, since the same mount is still ok on other machines.
The NFS directory was already mounted and in use when this mount happened again (called by cron-job).

Any ideas?

Helge


2013-10-17 21:07:32

by Myklebust, Trond

[permalink] [raw]
Subject: Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU?

On Thu, 2013-10-17 at 22:42 +-0200, Helge Deller wrote:
+AD4- I'm seeing a regression with current kernel git head when using NFS-mounts.
+AD4- Architecture in my case is parisc, although I don't think that this is relevant.
+AD4- At least kernel 3.10 (and I think 3.11) didn't showed that problem.
+AD4-
+AD4- The symtom is, that +ACI-top+ACI- shows high usage of either kswapd0 or kswapd1.
+AD4- Here is an output with kswapd1:
+AD4- PID USER PR NI VIRT RES SHR S +ACU-CPU +ACU-MEM TIME+- COMMAND
+AD4- 37 root 20 0 0 0 0 R 91.8 0.0 63:00.40 kswapd1
+AD4- 28448 root 20 0 3252 1428 1060 R 15.3 0.0 0:00.09 top
+AD4- 1 root 20 0 2784 988 852 S 0.0 0.0 0:09.95 init
+AD4-
+AD4- This is what ps shows:
+AD4- lsXXXX:+AH4AIw- ps -ef +AHw- grep mount
+AD4- root 1181 1 0 14:51 ? 00:00:18 /usr/sbin/automount --pid-file /var/run/autofs.pid
+AD4- root 25331 1181 0 21:25 ? 00:00:00 /bin/mount -n -t nfs -s -o nolock,rw,hard,intr homes:/unixhome1 /net/home1
+AD4- root 25332 25331 0 21:25 ? 00:00:00 /sbin/mount.nfs homes:/unixhome1 /net/home1 -s -n -o rw,nolock,hard,intr
+AD4-
+AD4- And using sysrq to show the blocked tasks I get in syslog:
+AD4- SysRq : Show Blocked State
+AD4- mount.nfs D 00000000401040c0 0 25332 25331 0x00000010
+AD4- Backtrace:
+AD4- +AFsAPA-0000000040113a68+AD4AXQ- +AF8AXw-schedule+-0x500/0x810
+AD4-
+AD4- I know it's not a problem of the NFS server, since the same mount is still ok on other machines.
+AD4- The NFS directory was already mounted and in use when this mount happened again (called by cron-job).
+AD4-
+AD4- Any ideas?

If the NFS directory is already mounted, then why is the automounter
trying to mount it a second time?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust+AEA-netapp.com
http://www.netapp.com

2013-10-18 19:26:33

by Helge Deller

[permalink] [raw]
Subject: Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU?

On 10/17/2013 11:07 PM, Myklebust, Trond wrote:
> On Thu, 2013-10-17 at 22:42 +0200, Helge Deller wrote:
>> I'm seeing a regression with current kernel git head when using NFS-mounts.
>> Architecture in my case is parisc, although I don't think that this is relevant.
>> At least kernel 3.10 (and I think 3.11) didn't showed that problem.
>>
>> The symtom is, that "top" shows high usage of either kswapd0 or kswapd1.
>> Here is an output with kswapd1:
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 37 root 20 0 0 0 0 R 91.8 0.0 63:00.40 kswapd1
>> 28448 root 20 0 3252 1428 1060 R 15.3 0.0 0:00.09 top
>> 1 root 20 0 2784 988 852 S 0.0 0.0 0:09.95 init
>>
>> This is what ps shows:
>> lsXXXX:~# ps -ef | grep mount
>> root 1181 1 0 14:51 ? 00:00:18 /usr/sbin/automount --pid-file /var/run/autofs.pid
>> root 25331 1181 0 21:25 ? 00:00:00 /bin/mount -n -t nfs -s -o nolock,rw,hard,intr homes:/unixhome1 /net/home1
>> root 25332 25331 0 21:25 ? 00:00:00 /sbin/mount.nfs homes:/unixhome1 /net/home1 -s -n -o rw,nolock,hard,intr
>>
>> And using sysrq to show the blocked tasks I get in syslog:
>> SysRq : Show Blocked State
>> mount.nfs D 00000000401040c0 0 25332 25331 0x00000010
>> Backtrace:
>> [<0000000040113a68>] __schedule+0x500/0x810
>>
>> I know it's not a problem of the NFS server, since the same mount is still ok on other machines.
>> The NFS directory was already mounted and in use when this mount happened again (called by cron-job).
>>
>> Any ideas?
>
> If the NFS directory is already mounted, then why is the automounter
> trying to mount it a second time?

I was wrong in this.
The directory wasn't mounted yet (or at least it was unmounted in the meantime before the new
mount.nfs was called).

I'm now not even sure, that the high kswapd is really triggered by the NFS problem,
because I now have another machine with the blocked NFS-mount, but without
the high kswapd usage.

Nevertheless, the blocked nfs mount tasks really make me wonder. There is clearly
some kind of regression since it doesn't happen with older kernels.

Helge

2013-10-18 19:36:27

by Myklebust, Trond

[permalink] [raw]
Subject: Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU?

On Fri, 2013-10-18 at 21:26 +-0200, Helge Deller wrote:
+AD4- On 10/17/2013 11:07 PM, Myklebust, Trond wrote:
+AD4- +AD4- On Thu, 2013-10-17 at 22:42 +020-, Helge Deller wrote:
+AD4- +AD4APg- I'm seeing a regression with current kernel git head when using NFS-mounts.
+AD4- +AD4APg- Architecture in my case is parisc, although I don't think that this is relevant.
+AD4- +AD4APg- At least kernel 3.10 (and I think 3.11) didn't showed that problem.
+AD4- +AD4APg-
+AD4- +AD4APg- The symtom is, that +ACI-top+ACI- shows high usage of either kswapd0 or kswapd1.
+AD4- +AD4APg- Here is an output with kswapd1:
+AD4- +AD4APg- PID USER PR NI VIRT RES SHR S +ACU-CPU +ACU-MEM TIME COMMAND
+AD4- +AD4APg- 37 root 20 0 0 0 0 R 91.8 0.0 63:00.40 kswapd1
+AD4- +AD4APg- 28448 root 20 0 3252 1428 1060 R 15.3 0.0 0:00.09 top
+AD4- +AD4APg- 1 root 20 0 2784 988 852 S 0.0 0.0 0:09.95 init
+AD4- +AD4APg-
+AD4- +AD4APg- This is what ps shows:
+AD4- +AD4APg- lsXXXX:+AH4AIw- ps -ef +AHw- grep mount
+AD4- +AD4APg- root 1181 1 0 14:51 ? 00:00:18 /usr/sbin/automount --pid-file /var/run/autofs.pid
+AD4- +AD4APg- root 25331 1181 0 21:25 ? 00:00:00 /bin/mount -n -t nfs -s -o nolock,rw,hard,intr homes:/unixhome1 /net/home1
+AD4- +AD4APg- root 25332 25331 0 21:25 ? 00:00:00 /sbin/mount.nfs homes:/unixhome1 /net/home1 -s -n -o rw,nolock,hard,intr
+AD4- +AD4APg-
+AD4- +AD4APg- And using sysrq to show the blocked tasks I get in syslog:
+AD4- +AD4APg- SysRq : Show Blocked State
+AD4- +AD4APg- mount.nfs D 00000000401040c0 0 25332 25331 0x00000010
+AD4- +AD4APg- Backtrace:
+AD4- +AD4APg- +AFsAPA-0000000040113a68+AD4AXQ- +AF8AXw-schedule+0x500/0x810-
+AD4- +AD4APg-
+AD4- +AD4APg- I know it's not a problem of the NFS server, since the same mount is still ok on other machines.
+AD4- +AD4APg- The NFS directory was already mounted and in use when this mount happened again (called by cron-job).
+AD4- +AD4APg-
+AD4- +AD4APg- Any ideas?
+AD4- +AD4-
+AD4- +AD4- If the NFS directory is already mounted, then why is the automounter
+AD4- +AD4- trying to mount it a second time?
+AD4-
+AD4- I was wrong in this.
+AD4- The directory wasn't mounted yet (or at least it was unmounted in the meantime before the new
+AD4- mount.nfs was called).
+AD4-
+AD4- I'm now not even sure, that the high kswapd is really triggered by the NFS problem,
+AD4- because I now have another machine with the blocked NFS-mount, but without
+AD4- the high kswapd usage.
+AD4-
+AD4- Nevertheless, the blocked nfs mount tasks really make me wonder. There is clearly
+AD4- some kind of regression since it doesn't happen with older kernels.

Have you ever reproduced it without the automounter?

Also, could you please try a sysRQ-t the next time it happens, so that
we can get a trace of where the mount program is hanging. Knowing that
the mount is stuck in +ACIAXwBf-schedule()+ACI- is not really interesting unless we
know from where that was called.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust+AEA-netapp.com
http://www.netapp.com

2013-10-18 20:03:26

by Helge Deller

[permalink] [raw]
Subject: Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU?

On 10/18/2013 09:36 PM, Myklebust, Trond wrote:
> On Fri, 2013-10-18 at 21:26 +0200, Helge Deller wrote:
>> On 10/17/2013 11:07 PM, Myklebust, Trond wrote:
>>> On Thu, 2013-10-17 at 22:42 m, Helge Deller wrote:
>>>> I'm seeing a regression with current kernel git head when using NFS-mounts.
>>>> Architecture in my case is parisc, although I don't think that this is relevant.
>>>> At least kernel 3.10 (and I think 3.11) didn't showed that problem.
>>>>
>>>> The symtom is, that "top" shows high usage of either kswapd0 or kswapd1.
>>>> Here is an output with kswapd1:
>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME COMMAND
>>>> 37 root 20 0 0 0 0 R 91.8 0.0 63:00.40 kswapd1
>>>> 28448 root 20 0 3252 1428 1060 R 15.3 0.0 0:00.09 top
>>>> 1 root 20 0 2784 988 852 S 0.0 0.0 0:09.95 init
>>>>
>>>> This is what ps shows:
>>>> lsXXXX:~# ps -ef | grep mount
>>>> root 1181 1 0 14:51 ? 00:00:18 /usr/sbin/automount --pid-file /var/run/autofs.pid
>>>> root 25331 1181 0 21:25 ? 00:00:00 /bin/mount -n -t nfs -s -o nolock,rw,hard,intr homes:/unixhome1 /net/home1
>>>> root 25332 25331 0 21:25 ? 00:00:00 /sbin/mount.nfs homes:/unixhome1 /net/home1 -s -n -o rw,nolock,hard,intr
>>>>
>>>> And using sysrq to show the blocked tasks I get in syslog:
>>>> SysRq : Show Blocked State
>>>> mount.nfs D 00000000401040c0 0 25332 25331 0x00000010
>>>> Backtrace:
>>>> [<0000000040113a68>] __schedule
>>>>
>>>> I know it's not a problem of the NFS server, since the same mount is still ok on other machines.
>>>> The NFS directory was already mounted and in use when this mount happened again (called by cron-job).
>>>>
>>>> Any ideas?
>>>
>>> If the NFS directory is already mounted, then why is the automounter
>>> trying to mount it a second time?
>>
>> I was wrong in this.
>> The directory wasn't mounted yet (or at least it was unmounted in the meantime before the new
>> mount.nfs was called).
>>
>> I'm now not even sure, that the high kswapd is really triggered by the NFS problem,
>> because I now have another machine with the blocked NFS-mount, but without
>> the high kswapd usage.
>>
>> Nevertheless, the blocked nfs mount tasks really make me wonder. There is clearly
>> some kind of regression since it doesn't happen with older kernels.
>
> Have you ever reproduced it without the automounter?

No, because it happens only after quite some time (>12h) and only if I have it
under pressure (load is >9 on a 4-way box).

I'll try it as soon as possible.

> Also, could you please try a sysRQ-t the next time it happens, so that
> we can get a trace of where the mount program is hanging. Knowing that
> the mount is stuck in "__schedule()" is not really interesting unless we
> know from where that was called.

Actually, the machine was still running in this state.
Here is sysrq-t:
[112009.084000] mount S 00000000401040c0 0 25331 1 0x00000010
[112009.084000] Backtrace:
[112009.084000] [<0000000040113a68>] __schedule+0x500/0x810
[112009.232000]
[112009.232000] mount.nfs D 00000000401040c0 0 25332 25331 0x00000010
[112009.232000] Backtrace:
[112009.232000] [<0000000040113a68>] __schedule+0x500/0x810

Helge

2013-10-18 20:12:19

by Myklebust, Trond

[permalink] [raw]
Subject: Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU?

On Fri, 2013-10-18 at 22:03 +-0200, Helge Deller wrote:
+AD4- On 10/18/2013 09:36 PM, Myklebust, Trond wrote:
+AD4- +AD4- Also, could you please try a sysRQ-t the next time it happens, so that
+AD4- +AD4- we can get a trace of where the mount program is hanging. Knowing that
+AD4- +AD4- the mount is stuck in +ACIAXwBf-schedule()+ACI- is not really interesting unless we
+AD4- +AD4- know from where that was called.
+AD4-
+AD4- Actually, the machine was still running in this state.
+AD4- Here is sysrq-t:
+AD4- +AFs-112009.084000+AF0- mount S 00000000401040c0 0 25331 1 0x00000010
+AD4- +AFs-112009.084000+AF0- Backtrace:
+AD4- +AFs-112009.084000+AF0- +AFsAPA-0000000040113a68+AD4AXQ- +AF8AXw-schedule+0x500/0x810-
+AD4- +AFs-112009.232000+AF0-
+AD4- +AFs-112009.232000+AF0- mount.nfs D 00000000401040c0 0 25332 25331 0x00000010
+AD4- +AFs-112009.232000+AF0- Backtrace:
+AD4- +AFs-112009.232000+AF0- +AFsAPA-0000000040113a68+AD4AXQ- +AF8AXw-schedule+0x500/0x810-

That makes no sense unless sysrq-t works differently on parisc than on
other platforms. I'd expect the backtrace to at least include a system
call. Parisc experts?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust+AEA-netapp.com
http://www.netapp.com

2013-10-19 18:27:49

by Helge Deller

[permalink] [raw]
Subject: Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU?

On 10/18/2013 10:12 PM, Myklebust, Trond wrote:
> On Fri, 2013-10-18 at 22:03 +0200, Helge Deller wrote:
>> On 10/18/2013 09:36 PM, Myklebust, Trond wrote:
>>> Also, could you please try a sysRQ-t the next time it happens, so that
>>> we can get a trace of where the mount program is hanging. Knowing that
>>> the mount is stuck in "__schedule()" is not really interesting unless we
>>> know from where that was called.
>>
>> Actually, the machine was still running in this state.
>> Here is sysrq-t:
>> [112009.084000] mount S 00000000401040c0 0 25331 1 0x00000010
>> [112009.084000] Backtrace:
>> [112009.084000] [<0000000040113a68>] __schedule
>> [112009.232000]
>> [112009.232000] mount.nfs D 00000000401040c0 0 25332 25331 0x00000010
>> [112009.232000] Backtrace:
>> [112009.232000] [<0000000040113a68>] __schedule
>
> That makes no sense unless sysrq-t works differently on parisc than on
> other platforms. I'd expect the backtrace to at least include a system
> call. Parisc experts?

sysrq-t doesn't work differently on parisc. For other processes I do get
a backtrace like the one on x86_64.
That's the main reason why I asked for ideas here on the list.
I do see the stuck process, but don't see any indications where it comes from.

Helge

2013-10-31 19:45:58

by Helge Deller

[permalink] [raw]
Subject: Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU?

On 10/19/2013 08:27 PM, Helge Deller wrote:
> On 10/18/2013 10:12 PM, Myklebust, Trond wrote:
>> On Fri, 2013-10-18 at 22:03 m, Helge Deller wrote:
>>> On 10/18/2013 09:36 PM, Myklebust, Trond wrote:
>>>> Also, could you please try a sysRQ-t the next time it happens, so that
>>>> we can get a trace of where the mount program is hanging. Knowing that
>>>> the mount is stuck in "__schedule()" is not really interesting unless we
>>>> know from where that was called.
>>>
>>> Actually, the machine was still running in this state.
>>> Here is sysrq-t:
>>> [112009.084000] mount S 00000000401040c0 0 25331 1 0x00000010
>>> [112009.084000] Backtrace:
>>> [112009.084000] [<0000000040113a68>] __schedule
>>> [112009.232000]
>>> [112009.232000] mount.nfs D 00000000401040c0 0 25332 25331 0x00000010
>>> [112009.232000] Backtrace:
>>> [112009.232000] [<0000000040113a68>] __schedule
>>
>> That makes no sense unless sysrq-t works differently on parisc than on
>> other platforms. I'd expect the backtrace to at least include a system
>> call. Parisc experts?
>
> sysrq-t doesn't work differently on parisc. For other processes I do get
> a backtrace like the one on x86_64.
> That's the main reason why I asked for ideas here on the list.
> I do see the stuck process, but don't see any indications where it comes from.

I'm pretty sure that the regression (kswapd using 100% CPU) I reported here is fixed by this patch:
http://git.kernel.org/cgit/linux/kernel/git/deller/parisc-linux.git/commit/?id=c56b097af26cb11c1f49a4311ba538c825666fed

I will start some testing...

Helge