LinuxLists.cc - NFS client large rsize/wsize (tcp?) problems

2012-12-30 12:58:56

Subject: NFS client large rsize/wsize (tcp?) problems

Hello All,

I am an almost complete NOOB on this matter, so please be gentle ;-) I do
believe there is some sort of problem inside the NFS code though.

Background: I have:
- linux server x86_64, vanilla kernel 3.6.7 sharing a few exports
- several set-top-boxes running linux, arch is mipsel32, they're
almost vanilla, but they have proprietary closed source drivers for the
DVB frontends:
* MaxDigital XP1000, kernel 3.5.1
* DMM DM8000, kernel 3.2.0
* VU+ Ultimo, kernel 3.1.1

These all suffer from the same problem. When they have a share mounted
with default parameters, using tcp, they crash sooner or later, notably
after heavy share access. The dm8000 has its GUI killed by the
OOM-killer whilst the xp1000 and the ultimo simply lock up.

The OOM-killer reports it needs blocks of 128k (probably for NFS, but it
doesn't say it), but can't find them. For the other STBs it's not
clear as they lock up.

I've "discovered" a few interesting things:
- adding swap to the dm8000 makes the problem almost go away, although
without NFS it definitely doesn't need swap, ever.
- when I ran my laptop (x86_64!) with a slightly older kernel (2.6.35
iirc) from a rescue cd, at a certain point I also got nasty dmesg
reports and the "dd" process got stuck in D state, this was reproducible
over reboots.
- all clients work flawlessly (over extended periods of time) if
mounted using udp and smaller rsize/wsize values (max 32k). Tcp seems to
work as well, as long as the size values are kept under 32k.
- the x86_64 laptop also worked fine when mounted this way
- so apparently it's not a stb/mipsel32/proprietary driver issue.
- stb's running older kernels (notably 2.6.18) don't suffer from this
problem
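
To see which transfer sizes a client actually ended up with, the mount options listed in /proc/mounts can be inspected. The following is a minimal sketch, assuming the usual `rsize=`, `wsize=` and `proto=` option names appear there; the sample line, server name and mount point are hypothetical, and the function name is only illustrative.

```python
def nfs_transfer_sizes(mounts_text):
    """Return {mountpoint: {"rsize", "wsize", "proto"}} for NFS client mounts."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mountpoint, fstype, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = {}
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            opts[key] = value
        result[fields[1]] = {
            "rsize": int(opts.get("rsize", 0)),
            "wsize": int(opts.get("wsize", 0)),
            "proto": opts.get("proto", ""),
        }
    return result


# Hypothetical sample; on a real client, pass open("/proc/mounts").read() instead.
SAMPLE = (
    "server:/export /media/net nfs "
    "rw,relatime,vers=3,rsize=131072,wsize=131072,proto=tcp 0 0\n"
    "rootfs / rootfs rw 0 0\n"
)

if __name__ == "__main__":
    for mountpoint, info in nfs_transfer_sizes(SAMPLE).items():
        too_big = max(info["rsize"], info["wsize"]) > 32768
        print(mountpoint, info, "-> above 32k" if too_big else "-> 32k or below")
```

A mount that shows values above 32768 here would be one of the configurations that misbehaves in the tests described above.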

Can anyone please enlighten me? I can't find similar reports other than
from fellow stb users.

Thanks!

2013-01-02 21:44:02

by J. Bruce Fields

Subject: Re: NFS client large rsize/wsize (tcp?) problems

On Wed, Jan 02, 2013 at 08:21:37PM +0100, Erik Slagter wrote:
> On 02-01-13 19:47, Myklebust, Trond wrote:
>
> > You probably have a NIC that doesn't support scatter-gather.
>
> I am not 100% sure, but as it's a satellite set-top-box, with very basic
> ethernet connectivity, I'd say you're right on the spot.
>
> Is there a way to work around that?
>
> >> As a temporary workaround (for "dumb users" that don't know what a mount
> >> option is, yes it's awful!) I'd like to modify the kernel of the clients
> >> to negotiate a smaller buffer size, 32k would probably suffice. I've had
> >> a few shots but have not been successful yet, can you give me a pointer
> >> please?
>
> > man nfsmount.conf
>
> Thanks for the hint, I didn't know that! Unfortunately I can't use it,
> because the mount command on the stb is a "busybox" version, so it's
> very basic, and doesn't check this file (checked it using strace...)

You can also configure this server-side by writing to
/proc/fs/nfsd/max_block_size (or your distro may have some config file
where that's set). But then of course it limits all clients whether
they need the workaround or not.
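
A server-side change along those lines could look like the sketch below. It is not a tested procedure: it assumes root privileges, that the nfsd filesystem is mounted at /proc/fs/nfsd, and that the value may only be changed while no nfsd threads are running. The helper names are made up for illustration.

```python
import os

# Path named above; it only exists on a server with the nfsd filesystem mounted.
MAX_BLOCK_SIZE_PATH = "/proc/fs/nfsd/max_block_size"


def get_max_block_size(path=MAX_BLOCK_SIZE_PATH):
    """Read the current maximum block size in bytes."""
    with open(path) as f:
        return int(f.read().strip())


def set_max_block_size(size_bytes, path=MAX_BLOCK_SIZE_PATH):
    """Write a new maximum block size and return the value read back."""
    with open(path, "w") as f:
        f.write(f"{size_bytes}\n")
    return get_max_block_size(path)


if __name__ == "__main__":
    # Only reads here; call set_max_block_size(32768) deliberately, as root.
    if os.path.exists(MAX_BLOCK_SIZE_PATH):
        print("current max_block_size:", get_max_block_size())
    else:
        print("nfsd filesystem not available on this machine")
```

Reading the value back after writing shows what the kernel actually accepted, which may differ from what was requested.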

--b.

2013-01-02 19:21:41

by Erik Slagter

Subject: Re: NFS client large rsize/wsize (tcp?) problems

On 02-01-13 19:47, Myklebust, Trond wrote:

> You probably have a NIC that doesn't support scatter-gather.

I am not 100% sure, but as it's a satellite set-top-box, with very basic
ethernet connectivity, I'd say you're right on the spot.

Is there a way to work around that?

>> As a temporary workaround (for "dumb users" that don't know what a mount
>> option is, yes it's awful!) I'd like to modify the kernel of the clients
>> to negotiate a smaller buffer size, 32k would probably suffice. I've had
>> a few shots but have not been successful yet, can you give me a pointer
>> please?

> man nfsmount.conf

Thanks for the hint, I didn't know that! Unfortunately I can't use it,
because the mount command on the stb is a "busybox" version, so it's
very basic, and doesn't check this file (checked it using strace...)

2013-01-02 18:47:32

by Myklebust, Trond

Subject: Re: NFS client large rsize/wsize (tcp?) problems

On Wed, 2013-01-02 at 19:37 +0100, Erik Slagter wrote:
> On 02-01-13 19:21, J. Bruce Fields wrote:
>
> >> The OOM-killer reports it needs blocks of 128k (probably for NFS,
> >> but it doesn't say it), but can't find them.
> >
> > Details? (Could you show us the log messages?) Anything else
> > interesting in the logs before then? (E.g. any "order-n allocation
> > failed" messages?)
>
> Hmmm, that will be tricky. The one box that produces OOM-messages has
> this after about a week of usage, and they only log in memory :-(
>
> Ah, I've found one!
>
> > enigma2 invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
> > Call Trace:
> > [<80485708>] dump_stack+0x8/0x34
> > [<80081f60>] dump_header.isra.9+0x88/0x1a4
> > [<80082268>] oom_kill_process.constprop.16+0xc4/0x2b8
> > [<800828c4>] out_of_memory+0x2a8/0x3a8
> > [<80085e78>] __alloc_pages_nodemask+0x640/0x654
> > [<8048683c>] cache_alloc_refill+0x350/0x668
> > [<800b1f10>] kmem_cache_alloc+0xe0/0x104
> > [<80185360>] nfs_create_request+0x40/0x178
> > [<80187544>] readpage_async_filler+0x9c/0x1bc
> > [<80089b98>] read_cache_pages+0xe4/0x144
> > [<801886ac>] nfs_readpages+0xd4/0x1cc
> > [<80089928>] __do_page_cache_readahead+0x218/0x2e4
> > [<80089d58>] ra_submit+0x28/0x34
> > [<8008a138>] page_cache_sync_readahead+0x48/0x70
> > [<80080ae0>] generic_file_aio_read+0x55c/0x858
> > [<80179560>] nfs_file_read+0xac/0x194
> > [<800b5004>] do_sync_read+0xb8/0x120
> > [<800b5ca0>] vfs_read+0xa0/0x180
> > [<800b5dcc>] sys_read+0x4c/0x90
> > [<8000c61c>] stack_done+0x20/0x40
> >
> > Mem-Info:
> > Normal per-cpu:
> > CPU 0: hi: 90, btch: 15 usd: 14
> > CPU 1: hi: 90, btch: 15 usd: 0
> > active_anon:22459 inactive_anon:57 isolated_anon:0
> > active_file:972 inactive_file:1968 isolated_file:0
> > unevictable:0 dirty:0 writeback:144 unstable:0
> > free:501 slab_reclaimable:526 slab_unreclaimable:2701
> > mapped:686 shmem:142 pagetables:137 bounce:0
> > Normal free:2004kB min:2036kB low:2544kB high:3052kB active_anon:89836kB inactive_anon:228kB active_file:3888kB inactive_file:7872kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:260096kB mlocked:0kB dirty:0kB writeback:576kB mapped:2744kB shmem:568kB slab_reclaimable:2104kB slab_unreclaimable:10804kB kernel_stack:792kB pagetables:548kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:14594 all_unreclaimable? yes
> > lowmem_reserve[]: 0 0
> > Normal: 317*4kB 90*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2004kB
> > 3101 total pagecache pages
> > 0 pages in swap cache
> > Swap cache stats: add 0, delete 0, find 0/0
> > Free swap = 0kB
> > Total swap = 0kB
> > 65536 pages RAM
> > 28149 pages reserved
> > 3039 pages shared
> > 33680 pages non-shared
> > [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
> > [ 254] 0 254 474 16 1 0 0 wdog
> > [ 263] 0 263 1225 88 0 0 0 tpmd
> > [ 327] 0 327 1026 255 1 0 0 nmbd
> > [ 329] 0 329 1803 175 1 0 0 smbd
> > [ 349] 0 349 1803 175 0 0 0 smbd
> > [ 372] 1 372 499 19 1 0 0 portmap
> > [ 383] 998 383 762 37 1 0 0 dbus-daemon
> > [ 387] 0 387 666 24 1 0 0 dropbear
> > [ 392] 0 392 664 48 0 0 0 crond
> > [ 398] 0 398 758 22 1 0 0 inetd
> > [ 401] 0 401 664 35 1 0 0 syslogd
> > [ 403] 0 403 664 52 0 0 0 klogd
> > [ 410] 997 410 922 95 1 0 0 avahi-daemon
> > [ 411] 997 411 922 42 0 0 0 avahi-daemon
> > [ 7811] 65534 7811 7424 187 1 0 0 msgd
> > [ 7819] 0 7819 1266 45 0 0 0 oscam
> > [ 7820] 0 7820 6733 2491 1 0 0 oscam
> > [ 7821] 0 7821 664 16 1 0 0 enigma2.sh
> > [ 7828] 0 7828 44920 19651 1 0 0 enigma2
> > Out of memory: Kill process 7828 (enigma2) score 496 or sacrifice child
> > Killed process 7828 (enigma2) total-vm:179680kB, anon-rss:77180kB, file-rss:1424kB
>
> The other boxes simply lock up.
>
> This does NOT happen with NFS mounted using smaller buffers!

You probably have a NIC that doesn't support scatter-gather.
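
Whether a given interface offers scatter-gather can usually be read from the `ethtool -k` feature list. The sketch below parses that output; it assumes the `ethtool` binary is present on the box (it may well not be on a set-top-box), `eth0` is a placeholder interface name, and the sample output is hypothetical.

```python
import subprocess


def scatter_gather_state(ethtool_output):
    """Return 'on', 'off', or None from `ethtool -k` style output."""
    for line in ethtool_output.splitlines():
        name, _, value = line.strip().partition(":")
        if name == "scatter-gather":
            words = value.split()
            return words[0] if words else None
    return None


def query_interface(ifname="eth0"):
    # "eth0" is a placeholder; this call needs the ethtool binary installed.
    out = subprocess.run(["ethtool", "-k", ifname],
                         capture_output=True, text=True, check=True).stdout
    return scatter_gather_state(out)


# Hypothetical output, for illustration only.
SAMPLE = "Features for eth0:\nrx-checksumming: off\nscatter-gather: off\n"

if __name__ == "__main__":
    print("scatter-gather:", scatter_gather_state(SAMPLE))
```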

> >> I've "discovered" a few interesting things:
> >> - adding swap to the dm8000 makes the problem almost go away,
> >> although without NFS it definitely doesn't need swap, ever.
> >> - when I ran my laptop (x86_64!) with a slightly older kernel
> >> (2.6.35 iirc) from a rescue cd, at a certain point I also got nasty
> >> dmesg reports and the "dd" process got stuck in D state, this was
> >> reproducible over reboots.
> >
> > Why do you believe that's the same problem?
>
> Because all are solved with smaller nfs mount buffers. That is as much
> as I understand.
>
> > OK, thanks for the reports, let us know if you're able to narrow it down
> > farther. It's not familiar off the top of my head.
>
> Okay, at least it's good to know it's not a known problem with a known
> solution / workaround. I hope the kernel message helps.
>
> As a temporary workaround (for "dumb users" that don't know what a mount
> option is, yes it's awful!) I'd like to modify the kernel of the clients
> to negotiate a smaller buffer size, 32k would probably suffice. I've had
> a few shots but have not been successful yet, can you give me a pointer
> please?
>

man nfsmount.conf

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
http://www.netapp.com

2013-01-02 18:21:50

by J. Bruce Fields

Subject: Re: NFS client large rsize/wsize (tcp?) problems

On Sun, Dec 30, 2012 at 01:53:18PM +0100, Erik Slagter wrote:
> Hello All,
>
> I am an almost complete NOOB on this matter, so please be gentle ;-)

Hah! Hah! Hah!

> I
> do believe there is some sort of problem inside the NFS code though.
>
> Background: I have:
> - linux server x86_64, vanilla kernel 3.6.7 sharing a few exports
> - several set-top-boxes running linux, arch is mipsel32, they're
> almost vanilla, but they have proprietary closed source drivers for
> the DVB frontends:
> * MaxDigital XP1000, kernel 3.5.1
> * DMM DM8000, kernel 3.2.0
> * VU+ Ultimo, kernel 3.1.1
>
> These all suffer from the same problem. When they have a share
> mounted with default parameters, using tcp, they crash sooner or
> later, notably after heavy share access. The dm8000 has its GUI
> killed by the OOM-killer whilst the xp1000 and the ultimo simply
> lock up.
>
> The OOM-killer reports it needs blocks of 128k (probably for NFS,
> but it doesn't say it), but can't find them.

Details? (Could you show us the log messages?) Anything else
interesting in the logs before then? (E.g. any "order-n allocation
failed" messages?)

> For the other STBs
> it's not clear as they lock up.
>
> I've "discovered" a few interesting things:
> - adding swap to the dm8000 makes the problem almost go away,
> although without NFS it definitely doesn't need swap, ever.
> - when I ran my laptop (x86_64!) with a slightly older kernel
> (2.6.35 iirc) from a rescue cd, at a certain point I also got nasty
> dmesg reports and the "dd" process got stuck in D state, this was
> reproducible over reboots.

Why do you believe that's the same problem?

> - all clients work flawlessly (over extended periods of time) if
> mounted using udp and smaller rsize/wsize values (max 32k). Tcp
> seems to work as well, as long as the size values are kept under
> 32k.
> - the x86_64 laptop also worked fine when mounted this way
> - so apparently it's not a stb/mipsel32/proprietary driver issue.
> - stb's running older kernels (notably 2.6.18) don't suffer from
> this problem

OK, thanks for the reports, let us know if you're able to narrow it down
farther. It's not familiar off the top of my head.

--b.

>
> Can anyone please enlighten me? I can't find similar reports other
> than from fellow stb users.
>
> Thanks!

2013-01-02 18:38:03

by Erik Slagter

Subject: Re: NFS client large rsize/wsize (tcp?) problems

On 02-01-13 19:21, J. Bruce Fields wrote:

>> The OOM-killer reports it needs blocks of 128k (probably for NFS,
>> but it doesn't say it), but can't find them.
>
> Details? (Could you show us the log messages?) Anything else
> interesting in the logs before then? (E.g. any "order-n allocation
> failed" messages?)

Hmmm, that will be tricky. The one box that produces OOM-messages has
this after about a week of usage, and they only log in memory :-(

Ah, I've found one!

> enigma2 invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
> Call Trace:
> [<80485708>] dump_stack+0x8/0x34
> [<80081f60>] dump_header.isra.9+0x88/0x1a4
> [<80082268>] oom_kill_process.constprop.16+0xc4/0x2b8
> [<800828c4>] out_of_memory+0x2a8/0x3a8
> [<80085e78>] __alloc_pages_nodemask+0x640/0x654
> [<8048683c>] cache_alloc_refill+0x350/0x668
> [<800b1f10>] kmem_cache_alloc+0xe0/0x104
> [<80185360>] nfs_create_request+0x40/0x178
> [<80187544>] readpage_async_filler+0x9c/0x1bc
> [<80089b98>] read_cache_pages+0xe4/0x144
> [<801886ac>] nfs_readpages+0xd4/0x1cc
> [<80089928>] __do_page_cache_readahead+0x218/0x2e4
> [<80089d58>] ra_submit+0x28/0x34
> [<8008a138>] page_cache_sync_readahead+0x48/0x70
> [<80080ae0>] generic_file_aio_read+0x55c/0x858
> [<80179560>] nfs_file_read+0xac/0x194
> [<800b5004>] do_sync_read+0xb8/0x120
> [<800b5ca0>] vfs_read+0xa0/0x180
> [<800b5dcc>] sys_read+0x4c/0x90
> [<8000c61c>] stack_done+0x20/0x40
>
> Mem-Info:
> Normal per-cpu:
> CPU 0: hi: 90, btch: 15 usd: 14
> CPU 1: hi: 90, btch: 15 usd: 0
> active_anon:22459 inactive_anon:57 isolated_anon:0
> active_file:972 inactive_file:1968 isolated_file:0
> unevictable:0 dirty:0 writeback:144 unstable:0
> free:501 slab_reclaimable:526 slab_unreclaimable:2701
> mapped:686 shmem:142 pagetables:137 bounce:0
> Normal free:2004kB min:2036kB low:2544kB high:3052kB active_anon:89836kB inactive_anon:228kB active_file:3888kB inactive_file:7872kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:260096kB mlocked:0kB dirty:0kB writeback:576kB mapped:2744kB shmem:568kB slab_reclaimable:2104kB slab_unreclaimable:10804kB kernel_stack:792kB pagetables:548kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:14594 all_unreclaimable? yes
> lowmem_reserve[]: 0 0
> Normal: 317*4kB 90*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2004kB
> 3101 total pagecache pages
> 0 pages in swap cache
> Swap cache stats: add 0, delete 0, find 0/0
> Free swap = 0kB
> Total swap = 0kB
> 65536 pages RAM
> 28149 pages reserved
> 3039 pages shared
> 33680 pages non-shared
> [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
> [ 254] 0 254 474 16 1 0 0 wdog
> [ 263] 0 263 1225 88 0 0 0 tpmd
> [ 327] 0 327 1026 255 1 0 0 nmbd
> [ 329] 0 329 1803 175 1 0 0 smbd
> [ 349] 0 349 1803 175 0 0 0 smbd
> [ 372] 1 372 499 19 1 0 0 portmap
> [ 383] 998 383 762 37 1 0 0 dbus-daemon
> [ 387] 0 387 666 24 1 0 0 dropbear
> [ 392] 0 392 664 48 0 0 0 crond
> [ 398] 0 398 758 22 1 0 0 inetd
> [ 401] 0 401 664 35 1 0 0 syslogd
> [ 403] 0 403 664 52 0 0 0 klogd
> [ 410] 997 410 922 95 1 0 0 avahi-daemon
> [ 411] 997 411 922 42 0 0 0 avahi-daemon
> [ 7811] 65534 7811 7424 187 1 0 0 msgd
> [ 7819] 0 7819 1266 45 0 0 0 oscam
> [ 7820] 0 7820 6733 2491 1 0 0 oscam
> [ 7821] 0 7821 664 16 1 0 0 enigma2.sh
> [ 7828] 0 7828 44920 19651 1 0 0 enigma2
> Out of memory: Kill process 7828 (enigma2) score 496 or sacrifice child
> Killed process 7828 (enigma2) total-vm:179680kB, anon-rss:77180kB, file-rss:1424kB
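
The figures in the report above can be cross-checked with a short sketch. It uses only two lines copied from the log; the 4 kB page size is an assumption, consistent with `free:501` pages matching `free:2004kB`. The function names are illustrative only.

```python
import re

# Two lines copied verbatim from the OOM report.
HEADER = ("enigma2 invoked oom-killer: gfp_mask=0xd0, order=0, "
          "oom_adj=0, oom_score_adj=0")
BUDDY = ("Normal: 317*4kB 90*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB "
         "0*512kB 0*1024kB 0*2048kB 0*4096kB = 2004kB")

PAGE_KB = 4  # assumption: 4 kB pages (501 free pages == 2004 kB in the report)


def requested_kb(header):
    """Size of the failing allocation: one page shifted left by its order."""
    order = int(re.search(r"order=(\d+)", header).group(1))
    return PAGE_KB << order


def free_blocks(buddy):
    """Return {block_size_kb: count} from the per-zone free list line."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", buddy)}


blocks = free_blocks(BUDDY)
total_kb = sum(size * count for size, count in blocks.items())
largest_kb = max(size for size, count in blocks.items() if count > 0)

print("requested:", requested_kb(HEADER), "kB")
print("free total:", total_kb, "kB")
print("largest free block:", largest_kb, "kB")
```

Run as-is, this reports a 4 kB request (order=0), 2004 kB free in total (below the 2036 kB `min` watermark shown in the report) and no free block larger than 16 kB. One possible reading is that the box was short of memory overall rather than failing on a single 128k allocation, but nothing in the log settles that.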

The other boxes simply lock up.

This does NOT happen with NFS mounted using smaller buffers!

>> I've "discovered" a few interesting things:
>> - adding swap to the dm8000 makes the problem almost go away,
>> although without NFS it definitely doesn't need swap, ever.
>> - when I ran my laptop (x86_64!) with a slightly older kernel
>> (2.6.35 iirc) from a rescue cd, at a certain point I also got nasty
>> dmesg reports and the "dd" process got stuck in D state, this was
>> reproducible over reboots.
>
> Why do you believe that's the same problem?

Because all are solved with smaller nfs mount buffers. That is as much
as I understand.

> OK, thanks for the reports, let us know if you're able to narrow it down
> farther. It's not familiar off the top of my head.

Okay, at least it's good to know it's not a known problem with a known
solution / workaround. I hope the kernel message helps.

As a temporary workaround (for "dumb users" that don't know what a mount
option is, yes it's awful!) I'd like to modify the kernel of the clients
to negotiate a smaller buffer size, 32k would probably suffice. I've had
a few shots but have not been successful yet, can you give me a pointer
please?

Thanks!