2024-03-24 20:49:14

by Jan Schunk

Subject: Aw: Re: [External] : nfsd: memory leak when client does many file operations

The "heavy usage" is a simple script runinng on the client and does the following:
1. Create a empty git repository on the share
2. Unpacking a tar.gz archive (Qnap GPL source code)
3. Remove some folders/files
4. Use diff to compare it with an older version
5. commit them to the git
6. Repeat at step 2 with next archive
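
A rough sketch of the script; the mount point, the archive location and the "older version" tree are placeholders, not my exact paths:

#!/bin/sh
# minimal sketch of the reproducer loop (all paths are placeholders)
set -e
cd /mnt/share
git init repo
cd repo
for archive in /srv/archives/*.tar.gz; do
    tar xzf "$archive"                                        # 2. unpack onto the share
    rm -rf docs/ toolchain/                                   # 3. remove some folders/files (placeholder names)
    diff -r /srv/older-version . > /tmp/changes.diff || true  # 4. compare with an older version
    git add -A                                                # 5. commit the result
    git commit -q -m "import $(basename "$archive")"
done                                                          # 6. continue with the next archive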

On my armhf NAS the other memory-consuming workload is an SMB server.
On the test VM the other memory-consuming workload is a GNOME desktop.

But it does not make much difference if I stop other services; it just takes a bit longer until the same issue happens.
The size of swap also does not make a difference.

> Sent: Sunday, 24.03.2024 at 21:14
> From: "Chuck Lever III" <[email protected]>
> To: "Jan Schunk" <[email protected]>
> Cc: "Jeff Layton" <[email protected]>, "Neil Brown" <[email protected]>, "Olga Kornievskaia" <[email protected]>, "Dai Ngo" <[email protected]>, "Tom Talpey" <[email protected]>, "Linux NFS Mailing List" <[email protected]>, "[email protected]" <[email protected]>
> Subject: Re: [External] : nfsd: memory leak when client does many file operations
>
>
>
> > On Mar 24, 2024, at 3:57 PM, Jan Schunk <[email protected]> wrote:
> >
> > Issue found on: v6.5.13 v6.6.13, v6.6.14, v6.6.20 and v6.8.1
> > Not found on: v6.4, v6.1.82 and below
> > Architectures: amd64 and arm(hf)
> >
> > Steps to reproduce:
> > - Create a VM with 1GB RAM
> > - Install Debian 12
> > - Install linux-image-6.6.13+bpo-amd64-unsigned and nfs-kernel-server
> > - Export some folder
> > On the client:
> > - Mount the share
> > - Run a script that produces heavy usage on the share (like unpacking large tar archives that contain many small files into a git repository and committing them)
>
> Hi Jan, thanks for the report.
>
> The "produce heavy usage" instruction here is pretty vague.
> I run CI testing with kmemleak enabled, and have not seen
> any leaks on recent kernels when running the git regression
> tests, which are similar to this kind of workload.
>
> Can you try to narrow the reproducer for us, even just a
> little? What client action exactly is triggering the memory
> leak? Is there any other workload on your NFS server that
> might be consuming memory?
>
>
> > On my setup it takes 20-40 hours until the memory is full and the oom-killer gets invoked by nfsd to kill other processes. The memory stays full and the system reboots:
> >
> > [121969.590000] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,task=dbus-daemon,pid=454,uid=101
> > [121969.600000] Out of memory: Killed process 454 (dbus-daemon) total-vm:6196kB, anon-rss:128kB, file-rss:1408kB, shmem-rss:0kB, UID:101 pgtables:12kB oom_score_adj:-900
> > [121971.700000] oom_reaper: reaped process 454 (dbus-daemon), now anon-rss:0kB, file-rss:64kB, shmem-rss:0kB
> > [121971.920000] nfsd invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
> > [121971.930000] CPU: 1 PID: 537 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
> > [121971.930000] Hardware name: Freescale LS1024A
> > [121971.940000] unwind_backtrace from show_stack+0xb/0xc
> > [121971.940000] show_stack from dump_stack_lvl+0x2b/0x34
> > [121971.950000] dump_stack_lvl from dump_header+0x35/0x212
> > [121971.950000] dump_header from out_of_memory+0x317/0x34c
> > [121971.960000] out_of_memory from __alloc_pages+0x8e7/0xbb0
> > [121971.970000] __alloc_pages from __alloc_pages_bulk+0x26d/0x3d8
> > [121971.970000] __alloc_pages_bulk from svc_recv+0x9d/0x7d4
> > [121971.980000] svc_recv from nfsd+0x7d/0xd4
> > [121971.980000] nfsd from kthread+0xb9/0xcc
> > [121971.990000] kthread from ret_from_fork+0x11/0x1c
> > [121971.990000] Exception stack(0xc2cadfb0 to 0xc2cadff8)
> > [121971.990000] dfa0: 00000000 00000000 00000000 00000000
> > [121972.000000] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > [121972.010000] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > [121972.020000] Mem-Info:
> > [121972.020000] active_anon:101 inactive_anon:127 isolated_anon:29
> > [121972.020000] active_file:1200 inactive_file:1204 isolated_file:98
> > [121972.020000] unevictable:394 dirty:296 writeback:17
> > [121972.020000] slab_reclaimable:13680 slab_unreclaimable:4350
> > [121972.020000] mapped:637 shmem:4 pagetables:414
> > [121972.020000] sec_pagetables:0 bounce:0
> > [121972.020000] kernel_misc_reclaimable:0
> > [121972.020000] free:7279 free_pcp:184 free_cma:1094
> > [121972.060000] Node 0 active_anon:404kB inactive_anon:508kB active_file:4736kB inactive_file:4884kB unevictable:1576kB isolated(anon):116kB isolated(file):388kB mapped:2548kB dirty:1184kB writeback:68kB shmem:16kB writeback_tmp:0kB kernel_stack:1088kB pagetables:1656kB sec_pagetables:0kB all_unreclaimable? no
> > [121972.090000] Normal free:29116kB boost:18432kB min:26624kB low:28672kB high:30720kB reserved_highatomic:0KB active_anon:404kB inactive_anon:712kB active_file:4788kB inactive_file:4752kB unevictable:1576kB writepending:1252kB present:1048576kB managed:1011988kB mlocked:1576kB bounce:0kB free_pcp:736kB local_pcp:236kB free_cma:4376kB
> > [121972.120000] lowmem_reserve[]: 0 0
> > [121972.120000] Normal: 2137*4kB (UEC) 1173*8kB (UEC) 529*16kB (UEC) 19*32kB (UC) 7*64kB (C) 5*128kB (C) 2*256kB (C) 1*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 29116kB
> > [121972.140000] 2991 total pagecache pages
> > [121972.140000] 166 pages in swap cache
> > [121972.140000] Free swap = 93424kB
> > [121972.150000] Total swap = 102396kB
> > [121972.150000] 262144 pages RAM
> > [121972.150000] 0 pages HighMem/MovableOnly
> > [121972.160000] 9147 pages reserved
> > [121972.160000] 4096 pages cma reserved
> > [121972.160000] Unreclaimable slab info:
> > [121972.170000] Name Used Total
> > [121972.170000] bio-88 64KB 64KB
> > [121972.180000] TCPv6 61KB 61KB
> > [121972.180000] bio-76 16KB 16KB
> > [121972.190000] bio-188 11KB 11KB
> > [121972.190000] nfs_read_data 22KB 22KB
> > [121972.200000] kioctx 15KB 15KB
> > [121972.200000] posix_timers_cache 7KB 7KB
> > [121972.210000] UDP 63KB 63KB
> > [121972.220000] tw_sock_TCP 3KB 3KB
> > [121972.220000] request_sock_TCP 3KB 3KB
> > [121972.230000] TCP 62KB 62KB
> > [121972.230000] bio-168 7KB 7KB
> > [121972.240000] ep_head 8KB 8KB
> > [121972.240000] request_queue 15KB 15KB
> > [121972.250000] bio-124 18KB 40KB
> > [121972.250000] biovec-max 264KB 264KB
> > [121972.260000] biovec-128 63KB 63KB
> > [121972.260000] biovec-64 157KB 157KB
> > [121972.270000] skbuff_small_head 94KB 94KB
> > [121972.270000] skbuff_fclone_cache 55KB 63KB
> > [121972.280000] skbuff_head_cache 59KB 59KB
> > [121972.280000] fsnotify_mark_connector 16KB 28KB
> > [121972.290000] sigqueue 19KB 31KB
> > [121972.300000] shmem_inode_cache 1622KB 1662KB
> > [121972.300000] kernfs_iattrs_cache 15KB 15KB
> > [121972.310000] kernfs_node_cache 2107KB 2138KB
> > [121972.310000] filp 259KB 315KB
> > [121972.320000] net_namespace 30KB 30KB
> > [121972.320000] uts_namespace 15KB 15KB
> > [121972.330000] vma_lock 143KB 179KB
> > [121972.330000] vm_area_struct 459KB 553KB
> > [121972.340000] sighand_cache 191KB 220KB
> > [121972.340000] task_struct 378KB 446KB
> > [121972.350000] anon_vma_chain 753KB 804KB
> > [121972.360000] anon_vma 170KB 207KB
> > [121972.360000] trace_event_file 83KB 83KB
> > [121972.370000] mm_struct 157KB 173KB
> > [121972.370000] vmap_area 217KB 354KB
> > [121972.380000] kmalloc-8k 224KB 224KB
> > [121972.380000] kmalloc-4k 860KB 992KB
> > [121972.390000] kmalloc-2k 352KB 352KB
> > [121972.390000] kmalloc-1k 563KB 576KB
> > [121972.400000] kmalloc-512 936KB 936KB
> > [121972.400000] kmalloc-256 196KB 240KB
> > [121972.410000] kmalloc-192 160KB 169KB
> > [121972.410000] kmalloc-128 546KB 764KB
> > [121972.420000] kmalloc-64 1213KB 1288KB
> > [121972.420000] kmem_cache_node 12KB 12KB
> > [121972.430000] kmem_cache 16KB 16KB
> > [121972.440000] Tasks state (memory values in pages):
> > [121972.440000] [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
> > [121972.450000] [ 209] 0 209 5140 320 0 320 0 16384 480 -1000 systemd-udevd
> > [121972.460000] [ 230] 998 230 2887 55 32 23 0 18432 0 0 systemd-network
> > [121972.470000] [ 420] 0 420 596 0 0 0 0 6144 22 0 mdadm
> > [121972.490000] [ 421] 102 421 1393 56 32 24 0 10240 0 0 rpcbind
> > [121972.500000] [ 429] 996 429 3695 17 0 17 0 20480 0 0 systemd-resolve
> > [121972.510000] [ 433] 0 433 494 51 0 51 0 8192 0 0 rpc.idmapd
> > [121972.520000] [ 434] 0 434 743 92 33 59 0 8192 7 0 nfsdcld
> > [121972.530000] [ 451] 0 451 390 0 0 0 0 6144 0 0 acpid
> > [121972.540000] [ 453] 105 453 1380 50 32 18 0 10240 18 0 avahi-daemon
> > [121972.550000] [ 454] 101 454 1549 16 0 16 0 12288 32 -900 dbus-daemon
> > [121972.560000] [ 466] 0 466 3771 60 0 60 0 14336 0 0 irqbalance
> > [121972.570000] [ 475] 0 475 6269 32 32 0 0 18432 0 0 rsyslogd
> > [121972.590000] [ 487] 105 487 1347 68 38 30 0 10240 0 0 avahi-daemon
> > [121972.600000] [ 492] 0 492 1765 0 0 0 0 12288 0 0 cron
> > [121972.610000] [ 493] 0 493 2593 0 0 0 0 16384 0 0 wpa_supplicant
> > [121972.620000] [ 494] 0 494 607 0 0 0 0 8192 32 0 atd
> > [121972.630000] [ 506] 0 506 1065 25 0 25 0 10240 0 0 rpc.mountd
> > [121972.640000] [ 514] 103 514 809 25 0 25 0 8192 0 0 rpc.statd
> > [121972.650000] [ 522] 0 522 999 31 0 31 0 10240 0 0 agetty
> > [121972.660000] [ 524] 0 524 1540 28 0 28 0 12288 0 0 agetty
> > [121972.670000] [ 525] 0 525 9098 56 32 24 0 34816 0 0 unattended-upgr
> > [121972.690000] [ 526] 0 526 2621 320 0 320 0 14336 192 -1000 sshd
> > [121972.700000] [ 539] 0 539 849 32 32 0 0 8192 0 0 in.tftpd
> > [121972.710000] [ 544] 113 544 4361 6 6 0 0 16384 25 0 chronyd
> > [121972.720000] [ 546] 0 546 16816 62 32 30 0 45056 0 0 winbindd
> > [121972.730000] [ 552] 0 552 16905 59 32 27 0 45056 3 0 winbindd
> > [121972.740000] [ 559] 0 559 17849 94 32 30 32 49152 4 0 smbd
> > [121972.750000] [ 572] 0 572 17409 40 16 24 0 43008 11 0 smbd-notifyd
> > [121972.760000] [ 573] 0 573 17412 16 16 0 0 43008 24 0 cleanupd
> > [121972.770000] [ 584] 0 584 3036 20 0 20 0 16384 4 0 sshd
> > [121972.780000] [ 589] 0 589 16816 32 2 30 0 40960 21 0 winbindd
> > [121972.790000] [ 590] 0 590 27009 47 23 24 0 65536 21 0 smbd
> > [121972.810000] [ 597] 501 597 3344 91 32 59 0 20480 0 100 systemd
> > [121972.820000] [ 653] 501 653 3036 0 0 0 0 16384 33 0 sshd
> > [121972.830000] [ 656] 501 656 1938 93 32 61 0 12288 9 0 bash
> > [121972.840000] [ 704] 0 704 395 352 64 288 0 6144 0 -1000 watchdog
> > [121972.850000] [ 738] 501 738 2834 12 0 12 0 16384 6 0 top
> > [121972.860000] [ 4750] 0 4750 4218 44 26 18 0 18432 11 0 proftpd
> > [121972.870000] [ 4768] 0 4768 401 31 0 31 0 6144 0 0 apt.systemd.dai
> > [121972.880000] [ 4772] 0 4772 401 31 0 31 0 6144 0 0 apt.systemd.dai
> > [121972.890000] [ 4778] 0 4778 13556 54 0 54 0 59392 26 0 apt-get
> > [121972.900000] Out of memory and no killable processes...
> > [121972.910000] Kernel panic - not syncing: System is deadlocked on memory
> > [121972.920000] CPU: 1 PID: 537 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
> > [121972.920000] Hardware name: Freescale LS1024A
> > [121972.930000] unwind_backtrace from show_stack+0xb/0xc
> > [121972.930000] show_stack from dump_stack_lvl+0x2b/0x34
> > [121972.940000] dump_stack_lvl from panic+0xbf/0x264
> > [121972.940000] panic from out_of_memory+0x33f/0x34c
> > [121972.950000] out_of_memory from __alloc_pages+0x8e7/0xbb0
> > [121972.950000] __alloc_pages from __alloc_pages_bulk+0x26d/0x3d8
> > [121972.960000] __alloc_pages_bulk from svc_recv+0x9d/0x7d4
> > [121972.960000] svc_recv from nfsd+0x7d/0xd4
> > [121972.970000] nfsd from kthread+0xb9/0xcc
> > [121972.970000] kthread from ret_from_fork+0x11/0x1c
> > [121972.980000] Exception stack(0xc2cadfb0 to 0xc2cadff8)
> > [121972.980000] dfa0: 00000000 00000000 00000000 00000000
> > [121972.990000] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > [121973.000000] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > [121973.010000] CPU0: stopping
> > [121973.010000] CPU: 0 PID: 540 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
> > [121973.010000] Hardware name: Freescale LS1024A
> > [121973.010000] unwind_backtrace from show_stack+0xb/0xc
> > [121973.010000] show_stack from dump_stack_lvl+0x2b/0x34
> > [121973.010000] dump_stack_lvl from do_handle_IPI+0x151/0x178
> > [121973.010000] do_handle_IPI from ipi_handler+0x13/0x18
> > [121973.010000] ipi_handler from handle_percpu_devid_irq+0x55/0x144
> > [121973.010000] handle_percpu_devid_irq from generic_handle_domain_irq+0x17/0x20
> > [121973.010000] generic_handle_domain_irq from gic_handle_irq+0x5f/0x70
> > [121973.010000] gic_handle_irq from generic_handle_arch_irq+0x27/0x34
> > [121973.010000] generic_handle_arch_irq from call_with_stack+0xd/0x10
> > [121973.010000] Rebooting in 90 seconds..
>
> --
> Chuck Lever
>
>


2024-03-24 21:13:26

by Chuck Lever

Subject: Re: [External] : nfsd: memory leak when client does many file operations


> On Mar 24, 2024, at 4:48 PM, Jan Schunk <[email protected]> wrote:
>
> The "heavy usage" is a simple script runinng on the client and does the following:
> 1. Create a empty git repository on the share
> 2. Unpacking a tar.gz archive (Qnap GPL source code)
> 3. Remove some folders/files
> 4. Use diff to compare it with an older version
> 5. commit them to the git
> 6. Repeat at step 2 with next archive
>
> On my armhf NAS the other memory consuming workload is an SMB server.

I'm not sure any of us has a Freescale system to try this ...


> On the test VM the other memory consuming workload is a GNOME desktop.

... and so I'm hoping this VM is an x86_64 system.


> But it does not make much difference if I stop other services; it just takes a bit longer until the same issue happens.
> The size of swap also does not make a difference.

What is the nfsd thread count on the server? 'pgrep -c nfsd'

What version of NFS does your client mount with?

What is the speed of the network between your client and server?

What is the type of the exported file system?

Do you use NFS with Kerberos?
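
Roughly, commands like these would answer most of those; the interface name and export path below are only examples:

  pgrep -c nfsd                 # nfsd thread count on the server
  nfsstat -m                    # on the client: NFS version and mount options actually in use
  ethtool eth0 | grep Speed     # link speed (substitute your interface name)
  df -T /path/to/export         # type of the exported file system
  grep sec= /etc/exports        # a sec=krb5* option would indicate Kerberos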


>> Sent: Sunday, 24.03.2024 at 21:14
>> From: "Chuck Lever III" <[email protected]>
>> To: "Jan Schunk" <[email protected]>
>> Cc: "Jeff Layton" <[email protected]>, "Neil Brown" <[email protected]>, "Olga Kornievskaia" <[email protected]>, "Dai Ngo" <[email protected]>, "Tom Talpey" <[email protected]>, "Linux NFS Mailing List" <[email protected]>, "[email protected]" <[email protected]>
>> Subject: Re: [External] : nfsd: memory leak when client does many file operations
>>
>>
>>
>>> On Mar 24, 2024, at 3:57 PM, Jan Schunk <[email protected]> wrote:
>>>
>>> Issue found on: v6.5.13 v6.6.13, v6.6.14, v6.6.20 and v6.8.1
>>> Not found on: v6.4, v6.1.82 and below
>>> Architectures: amd64 and arm(hf)
>>>
>>> Steps to reproduce:
>>> - Create a VM with 1GB RAM
>>> - Install Debian 12
>>> - Install linux-image-6.6.13+bpo-amd64-unsigned and nfs-kernel-server
>>> - Export some folder
>>> On the client:
>>> - Mount the share
>>> - Run a script that produces heavy usage on the share (like unpacking large tar archives that contain many small files into a git repository and committing them)
>>
>> Hi Jan, thanks for the report.
>>
>> The "produce heavy usage" instruction here is pretty vague.
>> I run CI testing with kmemleak enabled, and have not seen
>> any leaks on recent kernels when running the git regression
>> tests, which are similar to this kind of workload.
>>
>> Can you try to narrow the reproducer for us, even just a
>> little? What client action exactly is triggering the memory
>> leak? Is there any other workload on your NFS server that
>> might be consuming memory?
>>
>>
>>> On my setup it takes 20-40 hours until the memory is full and the oom-killer gets invoked by nfsd to kill other processes. The memory stays full and the system reboots:
>>>
>>> [121969.590000] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,task=dbus-daemon,pid=454,uid=101
>>> [121969.600000] Out of memory: Killed process 454 (dbus-daemon) total-vm:6196kB, anon-rss:128kB, file-rss:1408kB, shmem-rss:0kB, UID:101 pgtables:12kB oom_score_adj:-900
>>> [121971.700000] oom_reaper: reaped process 454 (dbus-daemon), now anon-rss:0kB, file-rss:64kB, shmem-rss:0kB
>>> [121971.920000] nfsd invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
>>> [121971.930000] CPU: 1 PID: 537 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
>>> [121971.930000] Hardware name: Freescale LS1024A
>>> [121971.940000] unwind_backtrace from show_stack+0xb/0xc
>>> [121971.940000] show_stack from dump_stack_lvl+0x2b/0x34
>>> [121971.950000] dump_stack_lvl from dump_header+0x35/0x212
>>> [121971.950000] dump_header from out_of_memory+0x317/0x34c
>>> [121971.960000] out_of_memory from __alloc_pages+0x8e7/0xbb0
>>> [121971.970000] __alloc_pages from __alloc_pages_bulk+0x26d/0x3d8
>>> [121971.970000] __alloc_pages_bulk from svc_recv+0x9d/0x7d4
>>> [121971.980000] svc_recv from nfsd+0x7d/0xd4
>>> [121971.980000] nfsd from kthread+0xb9/0xcc
>>> [121971.990000] kthread from ret_from_fork+0x11/0x1c
>>> [121971.990000] Exception stack(0xc2cadfb0 to 0xc2cadff8)
>>> [121971.990000] dfa0: 00000000 00000000 00000000 00000000
>>> [121972.000000] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>> [121972.010000] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
>>> [121972.020000] Mem-Info:
>>> [121972.020000] active_anon:101 inactive_anon:127 isolated_anon:29
>>> [121972.020000] active_file:1200 inactive_file:1204 isolated_file:98
>>> [121972.020000] unevictable:394 dirty:296 writeback:17
>>> [121972.020000] slab_reclaimable:13680 slab_unreclaimable:4350
>>> [121972.020000] mapped:637 shmem:4 pagetables:414
>>> [121972.020000] sec_pagetables:0 bounce:0
>>> [121972.020000] kernel_misc_reclaimable:0
>>> [121972.020000] free:7279 free_pcp:184 free_cma:1094
>>> [121972.060000] Node 0 active_anon:404kB inactive_anon:508kB active_file:4736kB inactive_file:4884kB unevictable:1576kB isolated(anon):116kB isolated(file):388kB mapped:2548kB dirty:1184kB writeback:68kB shmem:16kB writeback_tmp:0kB kernel_stack:1088kB pagetables:1656kB sec_pagetables:0kB all_unreclaimable? no
>>> [121972.090000] Normal free:29116kB boost:18432kB min:26624kB low:28672kB high:30720kB reserved_highatomic:0KB active_anon:404kB inactive_anon:712kB active_file:4788kB inactive_file:4752kB unevictable:1576kB writepending:1252kB present:1048576kB managed:1011988kB mlocked:1576kB bounce:0kB free_pcp:736kB local_pcp:236kB free_cma:4376kB
>>> [121972.120000] lowmem_reserve[]: 0 0
>>> [121972.120000] Normal: 2137*4kB (UEC) 1173*8kB (UEC) 529*16kB (UEC) 19*32kB (UC) 7*64kB (C) 5*128kB (C) 2*256kB (C) 1*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 29116kB
>>> [121972.140000] 2991 total pagecache pages
>>> [121972.140000] 166 pages in swap cache
>>> [121972.140000] Free swap = 93424kB
>>> [121972.150000] Total swap = 102396kB
>>> [121972.150000] 262144 pages RAM
>>> [121972.150000] 0 pages HighMem/MovableOnly
>>> [121972.160000] 9147 pages reserved
>>> [121972.160000] 4096 pages cma reserved
>>> [121972.160000] Unreclaimable slab info:
>>> [121972.170000] Name Used Total
>>> [121972.170000] bio-88 64KB 64KB
>>> [121972.180000] TCPv6 61KB 61KB
>>> [121972.180000] bio-76 16KB 16KB
>>> [121972.190000] bio-188 11KB 11KB
>>> [121972.190000] nfs_read_data 22KB 22KB
>>> [121972.200000] kioctx 15KB 15KB
>>> [121972.200000] posix_timers_cache 7KB 7KB
>>> [121972.210000] UDP 63KB 63KB
>>> [121972.220000] tw_sock_TCP 3KB 3KB
>>> [121972.220000] request_sock_TCP 3KB 3KB
>>> [121972.230000] TCP 62KB 62KB
>>> [121972.230000] bio-168 7KB 7KB
>>> [121972.240000] ep_head 8KB 8KB
>>> [121972.240000] request_queue 15KB 15KB
>>> [121972.250000] bio-124 18KB 40KB
>>> [121972.250000] biovec-max 264KB 264KB
>>> [121972.260000] biovec-128 63KB 63KB
>>> [121972.260000] biovec-64 157KB 157KB
>>> [121972.270000] skbuff_small_head 94KB 94KB
>>> [121972.270000] skbuff_fclone_cache 55KB 63KB
>>> [121972.280000] skbuff_head_cache 59KB 59KB
>>> [121972.280000] fsnotify_mark_connector 16KB 28KB
>>> [121972.290000] sigqueue 19KB 31KB
>>> [121972.300000] shmem_inode_cache 1622KB 1662KB
>>> [121972.300000] kernfs_iattrs_cache 15KB 15KB
>>> [121972.310000] kernfs_node_cache 2107KB 2138KB
>>> [121972.310000] filp 259KB 315KB
>>> [121972.320000] net_namespace 30KB 30KB
>>> [121972.320000] uts_namespace 15KB 15KB
>>> [121972.330000] vma_lock 143KB 179KB
>>> [121972.330000] vm_area_struct 459KB 553KB
>>> [121972.340000] sighand_cache 191KB 220KB
>>> [121972.340000] task_struct 378KB 446KB
>>> [121972.350000] anon_vma_chain 753KB 804KB
>>> [121972.360000] anon_vma 170KB 207KB
>>> [121972.360000] trace_event_file 83KB 83KB
>>> [121972.370000] mm_struct 157KB 173KB
>>> [121972.370000] vmap_area 217KB 354KB
>>> [121972.380000] kmalloc-8k 224KB 224KB
>>> [121972.380000] kmalloc-4k 860KB 992KB
>>> [121972.390000] kmalloc-2k 352KB 352KB
>>> [121972.390000] kmalloc-1k 563KB 576KB
>>> [121972.400000] kmalloc-512 936KB 936KB
>>> [121972.400000] kmalloc-256 196KB 240KB
>>> [121972.410000] kmalloc-192 160KB 169KB
>>> [121972.410000] kmalloc-128 546KB 764KB
>>> [121972.420000] kmalloc-64 1213KB 1288KB
>>> [121972.420000] kmem_cache_node 12KB 12KB
>>> [121972.430000] kmem_cache 16KB 16KB
>>> [121972.440000] Tasks state (memory values in pages):
>>> [121972.440000] [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
>>> [121972.450000] [ 209] 0 209 5140 320 0 320 0 16384 480 -1000 systemd-udevd
>>> [121972.460000] [ 230] 998 230 2887 55 32 23 0 18432 0 0 systemd-network
>>> [121972.470000] [ 420] 0 420 596 0 0 0 0 6144 22 0 mdadm
>>> [121972.490000] [ 421] 102 421 1393 56 32 24 0 10240 0 0 rpcbind
>>> [121972.500000] [ 429] 996 429 3695 17 0 17 0 20480 0 0 systemd-resolve
>>> [121972.510000] [ 433] 0 433 494 51 0 51 0 8192 0 0 rpc.idmapd
>>> [121972.520000] [ 434] 0 434 743 92 33 59 0 8192 7 0 nfsdcld
>>> [121972.530000] [ 451] 0 451 390 0 0 0 0 6144 0 0 acpid
>>> [121972.540000] [ 453] 105 453 1380 50 32 18 0 10240 18 0 avahi-daemon
>>> [121972.550000] [ 454] 101 454 1549 16 0 16 0 12288 32 -900 dbus-daemon
>>> [121972.560000] [ 466] 0 466 3771 60 0 60 0 14336 0 0 irqbalance
>>> [121972.570000] [ 475] 0 475 6269 32 32 0 0 18432 0 0 rsyslogd
>>> [121972.590000] [ 487] 105 487 1347 68 38 30 0 10240 0 0 avahi-daemon
>>> [121972.600000] [ 492] 0 492 1765 0 0 0 0 12288 0 0 cron
>>> [121972.610000] [ 493] 0 493 2593 0 0 0 0 16384 0 0 wpa_supplicant
>>> [121972.620000] [ 494] 0 494 607 0 0 0 0 8192 32 0 atd
>>> [121972.630000] [ 506] 0 506 1065 25 0 25 0 10240 0 0 rpc.mountd
>>> [121972.640000] [ 514] 103 514 809 25 0 25 0 8192 0 0 rpc.statd
>>> [121972.650000] [ 522] 0 522 999 31 0 31 0 10240 0 0 agetty
>>> [121972.660000] [ 524] 0 524 1540 28 0 28 0 12288 0 0 agetty
>>> [121972.670000] [ 525] 0 525 9098 56 32 24 0 34816 0 0 unattended-upgr
>>> [121972.690000] [ 526] 0 526 2621 320 0 320 0 14336 192 -1000 sshd
>>> [121972.700000] [ 539] 0 539 849 32 32 0 0 8192 0 0 in.tftpd
>>> [121972.710000] [ 544] 113 544 4361 6 6 0 0 16384 25 0 chronyd
>>> [121972.720000] [ 546] 0 546 16816 62 32 30 0 45056 0 0 winbindd
>>> [121972.730000] [ 552] 0 552 16905 59 32 27 0 45056 3 0 winbindd
>>> [121972.740000] [ 559] 0 559 17849 94 32 30 32 49152 4 0 smbd
>>> [121972.750000] [ 572] 0 572 17409 40 16 24 0 43008 11 0 smbd-notifyd
>>> [121972.760000] [ 573] 0 573 17412 16 16 0 0 43008 24 0 cleanupd
>>> [121972.770000] [ 584] 0 584 3036 20 0 20 0 16384 4 0 sshd
>>> [121972.780000] [ 589] 0 589 16816 32 2 30 0 40960 21 0 winbindd
>>> [121972.790000] [ 590] 0 590 27009 47 23 24 0 65536 21 0 smbd
>>> [121972.810000] [ 597] 501 597 3344 91 32 59 0 20480 0 100 systemd
>>> [121972.820000] [ 653] 501 653 3036 0 0 0 0 16384 33 0 sshd
>>> [121972.830000] [ 656] 501 656 1938 93 32 61 0 12288 9 0 bash
>>> [121972.840000] [ 704] 0 704 395 352 64 288 0 6144 0 -1000 watchdog
>>> [121972.850000] [ 738] 501 738 2834 12 0 12 0 16384 6 0 top
>>> [121972.860000] [ 4750] 0 4750 4218 44 26 18 0 18432 11 0 proftpd
>>> [121972.870000] [ 4768] 0 4768 401 31 0 31 0 6144 0 0 apt.systemd.dai
>>> [121972.880000] [ 4772] 0 4772 401 31 0 31 0 6144 0 0 apt.systemd.dai
>>> [121972.890000] [ 4778] 0 4778 13556 54 0 54 0 59392 26 0 apt-get
>>> [121972.900000] Out of memory and no killable processes...
>>> [121972.910000] Kernel panic - not syncing: System is deadlocked on memory
>>> [121972.920000] CPU: 1 PID: 537 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
>>> [121972.920000] Hardware name: Freescale LS1024A
>>> [121972.930000] unwind_backtrace from show_stack+0xb/0xc
>>> [121972.930000] show_stack from dump_stack_lvl+0x2b/0x34
>>> [121972.940000] dump_stack_lvl from panic+0xbf/0x264
>>> [121972.940000] panic from out_of_memory+0x33f/0x34c
>>> [121972.950000] out_of_memory from __alloc_pages+0x8e7/0xbb0
>>> [121972.950000] __alloc_pages from __alloc_pages_bulk+0x26d/0x3d8
>>> [121972.960000] __alloc_pages_bulk from svc_recv+0x9d/0x7d4
>>> [121972.960000] svc_recv from nfsd+0x7d/0xd4
>>> [121972.970000] nfsd from kthread+0xb9/0xcc
>>> [121972.970000] kthread from ret_from_fork+0x11/0x1c
>>> [121972.980000] Exception stack(0xc2cadfb0 to 0xc2cadff8)
>>> [121972.980000] dfa0: 00000000 00000000 00000000 00000000
>>> [121972.990000] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>> [121973.000000] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
>>> [121973.010000] CPU0: stopping
>>> [121973.010000] CPU: 0 PID: 540 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
>>> [121973.010000] Hardware name: Freescale LS1024A
>>> [121973.010000] unwind_backtrace from show_stack+0xb/0xc
>>> [121973.010000] show_stack from dump_stack_lvl+0x2b/0x34
>>> [121973.010000] dump_stack_lvl from do_handle_IPI+0x151/0x178
>>> [121973.010000] do_handle_IPI from ipi_handler+0x13/0x18
>>> [121973.010000] ipi_handler from handle_percpu_devid_irq+0x55/0x144
>>> [121973.010000] handle_percpu_devid_irq from generic_handle_domain_irq+0x17/0x20
>>> [121973.010000] generic_handle_domain_irq from gic_handle_irq+0x5f/0x70
>>> [121973.010000] gic_handle_irq from generic_handle_arch_irq+0x27/0x34
>>> [121973.010000] generic_handle_arch_irq from call_with_stack+0xd/0x10
>>> [121973.010000] Rebooting in 90 seconds..
>>
>> --
>> Chuck Lever
>>
>>

--
Chuck Lever


2024-03-24 21:40:33

by Jan Schunk

Subject: Aw: Re: [External] : nfsd: memory leak when client does many file operations

Yes, the VM is x86_64.

"pgrep -c nfsd" says: 9

I use NFS version 3.

All network ports are connected with 1GBit/s.

The exported file system is ext4.

I do not use any authentication.

The mount options in /etc/fstab are:
rw,noatime,nfsvers=3,proto=tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768,noauto

The line in /etc/exports:
/export/data3 192.168.0.0/16(fsid=<uuid>,rw,no_root_squash,async,no_subtree_check)
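
Put together, the client's fstab entry looks roughly like this (the server name and mount point are placeholders here):

server:/export/data3  /mnt/data3  nfs  rw,noatime,nfsvers=3,proto=tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768,noauto  0  0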


> Sent: Sunday, 24.03.2024 at 22:10
> From: "Chuck Lever III" <[email protected]>
> To: "Jan Schunk" <[email protected]>
> Cc: "Jeff Layton" <[email protected]>, "Neil Brown" <[email protected]>, "Olga Kornievskaia" <[email protected]>, "Dai Ngo" <[email protected]>, "Tom Talpey" <[email protected]>, "Linux NFS Mailing List" <[email protected]>, "[email protected]" <[email protected]>
> Subject: Re: [External] : nfsd: memory leak when client does many file operations
>
>
> > On Mar 24, 2024, at 4:48 PM, Jan Schunk <[email protected]> wrote:
> >
> > The "heavy usage" is a simple script runinng on the client and does the following:
> > 1. Create a empty git repository on the share
> > 2. Unpacking a tar.gz archive (Qnap GPL source code)
> > 3. Remove some folders/files
> > 4. Use diff to compare it with an older version
> > 5. commit them to the git
> > 6. Repeat at step 2 with next archive
> >
> > On my armhf NAS the other memory consuming workload is an SMB server.
>
> I'm not sure any of us has a Freescale system to try this ...
>
>
> > On the test VM the other memory consuming workload is a GNOME desktop.
>
> ... and so I'm hoping this VM is an x86_64 system.
>
>
> > But it does not make much difference if I stop other services; it just takes a bit longer until the same issue happens.
> > The size of swap also does not make a difference.
>
> What is the nfsd thread count on the server? 'pgrep -c nfsd'
>
> What version of NFS does your client mount with?
>
> What is the speed of the network between your client and server?
>
> What is the type of the exported file system?
>
> Do you use NFS with Kerberos?
>
>
> >> Sent: Sunday, 24.03.2024 at 21:14
> >> From: "Chuck Lever III" <[email protected]>
> >> To: "Jan Schunk" <[email protected]>
> >> Cc: "Jeff Layton" <[email protected]>, "Neil Brown" <[email protected]>, "Olga Kornievskaia" <[email protected]>, "Dai Ngo" <[email protected]>, "Tom Talpey" <[email protected]>, "Linux NFS Mailing List" <[email protected]>, "[email protected]" <[email protected]>
> >> Subject: Re: [External] : nfsd: memory leak when client does many file operations
> >>
> >>
> >>
> >>> On Mar 24, 2024, at 3:57 PM, Jan Schunk <[email protected]> wrote:
> >>>
> >>> Issue found on: v6.5.13 v6.6.13, v6.6.14, v6.6.20 and v6.8.1
> >>> Not found on: v6.4, v6.1.82 and below
> >>> Architectures: amd64 and arm(hf)
> >>>
> >>> Steps to reproduce:
> >>> - Create a VM with 1GB RAM
> >>> - Install Debian 12
> >>> - Install linux-image-6.6.13+bpo-amd64-unsigned and nfs-kernel-server
> >>> - Export some folder
> >>> On the client:
> >>> - Mount the share
> >>> - Run a script that produces heavy usage on the share (like unpacking large tar archives that contain many small files into a git repository and committing them)
> >>
> >> Hi Jan, thanks for the report.
> >>
> >> The "produce heavy usage" instruction here is pretty vague.
> >> I run CI testing with kmemleak enabled, and have not seen
> >> any leaks on recent kernels when running the git regression
> >> tests, which are similar to this kind of workload.
> >>
> >> Can you try to narrow the reproducer for us, even just a
> >> little? What client action exactly is triggering the memory
> >> leak? Is there any other workload on your NFS server that
> >> might be consuming memory?
> >>
> >>
> >>> On my setup it takes 20-40 hours until the memory is full and the oom-killer gets invoked by nfsd to kill other processes. The memory stays full and the system reboots:
> >>>
> >>> [121969.590000] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,task=dbus-daemon,pid=454,uid=101
> >>> [121969.600000] Out of memory: Killed process 454 (dbus-daemon) total-vm:6196kB, anon-rss:128kB, file-rss:1408kB, shmem-rss:0kB, UID:101 pgtables:12kB oom_score_adj:-900
> >>> [121971.700000] oom_reaper: reaped process 454 (dbus-daemon), now anon-rss:0kB, file-rss:64kB, shmem-rss:0kB
> >>> [121971.920000] nfsd invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
> >>> [121971.930000] CPU: 1 PID: 537 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
> >>> [121971.930000] Hardware name: Freescale LS1024A
> >>> [121971.940000] unwind_backtrace from show_stack+0xb/0xc
> >>> [121971.940000] show_stack from dump_stack_lvl+0x2b/0x34
> >>> [121971.950000] dump_stack_lvl from dump_header+0x35/0x212
> >>> [121971.950000] dump_header from out_of_memory+0x317/0x34c
> >>> [121971.960000] out_of_memory from __alloc_pages+0x8e7/0xbb0
> >>> [121971.970000] __alloc_pages from __alloc_pages_bulk+0x26d/0x3d8
> >>> [121971.970000] __alloc_pages_bulk from svc_recv+0x9d/0x7d4
> >>> [121971.980000] svc_recv from nfsd+0x7d/0xd4
> >>> [121971.980000] nfsd from kthread+0xb9/0xcc
> >>> [121971.990000] kthread from ret_from_fork+0x11/0x1c
> >>> [121971.990000] Exception stack(0xc2cadfb0 to 0xc2cadff8)
> >>> [121971.990000] dfa0: 00000000 00000000 00000000 00000000
> >>> [121972.000000] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> >>> [121972.010000] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> >>> [121972.020000] Mem-Info:
> >>> [121972.020000] active_anon:101 inactive_anon:127 isolated_anon:29
> >>> [121972.020000] active_file:1200 inactive_file:1204 isolated_file:98
> >>> [121972.020000] unevictable:394 dirty:296 writeback:17
> >>> [121972.020000] slab_reclaimable:13680 slab_unreclaimable:4350
> >>> [121972.020000] mapped:637 shmem:4 pagetables:414
> >>> [121972.020000] sec_pagetables:0 bounce:0
> >>> [121972.020000] kernel_misc_reclaimable:0
> >>> [121972.020000] free:7279 free_pcp:184 free_cma:1094
> >>> [121972.060000] Node 0 active_anon:404kB inactive_anon:508kB active_file:4736kB inactive_file:4884kB unevictable:1576kB isolated(anon):116kB isolated(file):388kB mapped:2548kB dirty:1184kB writeback:68kB shmem:16kB writeback_tmp:0kB kernel_stack:1088kB pagetables:1656kB sec_pagetables:0kB all_unreclaimable? no
> >>> [121972.090000] Normal free:29116kB boost:18432kB min:26624kB low:28672kB high:30720kB reserved_highatomic:0KB active_anon:404kB inactive_anon:712kB active_file:4788kB inactive_file:4752kB unevictable:1576kB writepending:1252kB present:1048576kB managed:1011988kB mlocked:1576kB bounce:0kB free_pcp:736kB local_pcp:236kB free_cma:4376kB
> >>> [121972.120000] lowmem_reserve[]: 0 0
> >>> [121972.120000] Normal: 2137*4kB (UEC) 1173*8kB (UEC) 529*16kB (UEC) 19*32kB (UC) 7*64kB (C) 5*128kB (C) 2*256kB (C) 1*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 29116kB
> >>> [121972.140000] 2991 total pagecache pages
> >>> [121972.140000] 166 pages in swap cache
> >>> [121972.140000] Free swap = 93424kB
> >>> [121972.150000] Total swap = 102396kB
> >>> [121972.150000] 262144 pages RAM
> >>> [121972.150000] 0 pages HighMem/MovableOnly
> >>> [121972.160000] 9147 pages reserved
> >>> [121972.160000] 4096 pages cma reserved
> >>> [121972.160000] Unreclaimable slab info:
> >>> [121972.170000] Name Used Total
> >>> [121972.170000] bio-88 64KB 64KB
> >>> [121972.180000] TCPv6 61KB 61KB
> >>> [121972.180000] bio-76 16KB 16KB
> >>> [121972.190000] bio-188 11KB 11KB
> >>> [121972.190000] nfs_read_data 22KB 22KB
> >>> [121972.200000] kioctx 15KB 15KB
> >>> [121972.200000] posix_timers_cache 7KB 7KB
> >>> [121972.210000] UDP 63KB 63KB
> >>> [121972.220000] tw_sock_TCP 3KB 3KB
> >>> [121972.220000] request_sock_TCP 3KB 3KB
> >>> [121972.230000] TCP 62KB 62KB
> >>> [121972.230000] bio-168 7KB 7KB
> >>> [121972.240000] ep_head 8KB 8KB
> >>> [121972.240000] request_queue 15KB 15KB
> >>> [121972.250000] bio-124 18KB 40KB
> >>> [121972.250000] biovec-max 264KB 264KB
> >>> [121972.260000] biovec-128 63KB 63KB
> >>> [121972.260000] biovec-64 157KB 157KB
> >>> [121972.270000] skbuff_small_head 94KB 94KB
> >>> [121972.270000] skbuff_fclone_cache 55KB 63KB
> >>> [121972.280000] skbuff_head_cache 59KB 59KB
> >>> [121972.280000] fsnotify_mark_connector 16KB 28KB
> >>> [121972.290000] sigqueue 19KB 31KB
> >>> [121972.300000] shmem_inode_cache 1622KB 1662KB
> >>> [121972.300000] kernfs_iattrs_cache 15KB 15KB
> >>> [121972.310000] kernfs_node_cache 2107KB 2138KB
> >>> [121972.310000] filp 259KB 315KB
> >>> [121972.320000] net_namespace 30KB 30KB
> >>> [121972.320000] uts_namespace 15KB 15KB
> >>> [121972.330000] vma_lock 143KB 179KB
> >>> [121972.330000] vm_area_struct 459KB 553KB
> >>> [121972.340000] sighand_cache 191KB 220KB
> >>> [121972.340000] task_struct 378KB 446KB
> >>> [121972.350000] anon_vma_chain 753KB 804KB
> >>> [121972.360000] anon_vma 170KB 207KB
> >>> [121972.360000] trace_event_file 83KB 83KB
> >>> [121972.370000] mm_struct 157KB 173KB
> >>> [121972.370000] vmap_area 217KB 354KB
> >>> [121972.380000] kmalloc-8k 224KB 224KB
> >>> [121972.380000] kmalloc-4k 860KB 992KB
> >>> [121972.390000] kmalloc-2k 352KB 352KB
> >>> [121972.390000] kmalloc-1k 563KB 576KB
> >>> [121972.400000] kmalloc-512 936KB 936KB
> >>> [121972.400000] kmalloc-256 196KB 240KB
> >>> [121972.410000] kmalloc-192 160KB 169KB
> >>> [121972.410000] kmalloc-128 546KB 764KB
> >>> [121972.420000] kmalloc-64 1213KB 1288KB
> >>> [121972.420000] kmem_cache_node 12KB 12KB
> >>> [121972.430000] kmem_cache 16KB 16KB
> >>> [121972.440000] Tasks state (memory values in pages):
> >>> [121972.440000] [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
> >>> [121972.450000] [ 209] 0 209 5140 320 0 320 0 16384 480 -1000 systemd-udevd
> >>> [121972.460000] [ 230] 998 230 2887 55 32 23 0 18432 0 0 systemd-network
> >>> [121972.470000] [ 420] 0 420 596 0 0 0 0 6144 22 0 mdadm
> >>> [121972.490000] [ 421] 102 421 1393 56 32 24 0 10240 0 0 rpcbind
> >>> [121972.500000] [ 429] 996 429 3695 17 0 17 0 20480 0 0 systemd-resolve
> >>> [121972.510000] [ 433] 0 433 494 51 0 51 0 8192 0 0 rpc.idmapd
> >>> [121972.520000] [ 434] 0 434 743 92 33 59 0 8192 7 0 nfsdcld
> >>> [121972.530000] [ 451] 0 451 390 0 0 0 0 6144 0 0 acpid
> >>> [121972.540000] [ 453] 105 453 1380 50 32 18 0 10240 18 0 avahi-daemon
> >>> [121972.550000] [ 454] 101 454 1549 16 0 16 0 12288 32 -900 dbus-daemon
> >>> [121972.560000] [ 466] 0 466 3771 60 0 60 0 14336 0 0 irqbalance
> >>> [121972.570000] [ 475] 0 475 6269 32 32 0 0 18432 0 0 rsyslogd
> >>> [121972.590000] [ 487] 105 487 1347 68 38 30 0 10240 0 0 avahi-daemon
> >>> [121972.600000] [ 492] 0 492 1765 0 0 0 0 12288 0 0 cron
> >>> [121972.610000] [ 493] 0 493 2593 0 0 0 0 16384 0 0 wpa_supplicant
> >>> [121972.620000] [ 494] 0 494 607 0 0 0 0 8192 32 0 atd
> >>> [121972.630000] [ 506] 0 506 1065 25 0 25 0 10240 0 0 rpc.mountd
> >>> [121972.640000] [ 514] 103 514 809 25 0 25 0 8192 0 0 rpc.statd
> >>> [121972.650000] [ 522] 0 522 999 31 0 31 0 10240 0 0 agetty
> >>> [121972.660000] [ 524] 0 524 1540 28 0 28 0 12288 0 0 agetty
> >>> [121972.670000] [ 525] 0 525 9098 56 32 24 0 34816 0 0 unattended-upgr
> >>> [121972.690000] [ 526] 0 526 2621 320 0 320 0 14336 192 -1000 sshd
> >>> [121972.700000] [ 539] 0 539 849 32 32 0 0 8192 0 0 in.tftpd
> >>> [121972.710000] [ 544] 113 544 4361 6 6 0 0 16384 25 0 chronyd
> >>> [121972.720000] [ 546] 0 546 16816 62 32 30 0 45056 0 0 winbindd
> >>> [121972.730000] [ 552] 0 552 16905 59 32 27 0 45056 3 0 winbindd
> >>> [121972.740000] [ 559] 0 559 17849 94 32 30 32 49152 4 0 smbd
> >>> [121972.750000] [ 572] 0 572 17409 40 16 24 0 43008 11 0 smbd-notifyd
> >>> [121972.760000] [ 573] 0 573 17412 16 16 0 0 43008 24 0 cleanupd
> >>> [121972.770000] [ 584] 0 584 3036 20 0 20 0 16384 4 0 sshd
> >>> [121972.780000] [ 589] 0 589 16816 32 2 30 0 40960 21 0 winbindd
> >>> [121972.790000] [ 590] 0 590 27009 47 23 24 0 65536 21 0 smbd
> >>> [121972.810000] [ 597] 501 597 3344 91 32 59 0 20480 0 100 systemd
> >>> [121972.820000] [ 653] 501 653 3036 0 0 0 0 16384 33 0 sshd
> >>> [121972.830000] [ 656] 501 656 1938 93 32 61 0 12288 9 0 bash
> >>> [121972.840000] [ 704] 0 704 395 352 64 288 0 6144 0 -1000 watchdog
> >>> [121972.850000] [ 738] 501 738 2834 12 0 12 0 16384 6 0 top
> >>> [121972.860000] [ 4750] 0 4750 4218 44 26 18 0 18432 11 0 proftpd
> >>> [121972.870000] [ 4768] 0 4768 401 31 0 31 0 6144 0 0 apt.systemd.dai
> >>> [121972.880000] [ 4772] 0 4772 401 31 0 31 0 6144 0 0 apt.systemd.dai
> >>> [121972.890000] [ 4778] 0 4778 13556 54 0 54 0 59392 26 0 apt-get
> >>> [121972.900000] Out of memory and no killable processes...
> >>> [121972.910000] Kernel panic - not syncing: System is deadlocked on memory
> >>> [121972.920000] CPU: 1 PID: 537 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
> >>> [121972.920000] Hardware name: Freescale LS1024A
> >>> [121972.930000] unwind_backtrace from show_stack+0xb/0xc
> >>> [121972.930000] show_stack from dump_stack_lvl+0x2b/0x34
> >>> [121972.940000] dump_stack_lvl from panic+0xbf/0x264
> >>> [121972.940000] panic from out_of_memory+0x33f/0x34c
> >>> [121972.950000] out_of_memory from __alloc_pages+0x8e7/0xbb0
> >>> [121972.950000] __alloc_pages from __alloc_pages_bulk+0x26d/0x3d8
> >>> [121972.960000] __alloc_pages_bulk from svc_recv+0x9d/0x7d4
> >>> [121972.960000] svc_recv from nfsd+0x7d/0xd4
> >>> [121972.970000] nfsd from kthread+0xb9/0xcc
> >>> [121972.970000] kthread from ret_from_fork+0x11/0x1c
> >>> [121972.980000] Exception stack(0xc2cadfb0 to 0xc2cadff8)
> >>> [121972.980000] dfa0: 00000000 00000000 00000000 00000000
> >>> [121972.990000] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> >>> [121973.000000] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> >>> [121973.010000] CPU0: stopping
> >>> [121973.010000] CPU: 0 PID: 540 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
> >>> [121973.010000] Hardware name: Freescale LS1024A
> >>> [121973.010000] unwind_backtrace from show_stack+0xb/0xc
> >>> [121973.010000] show_stack from dump_stack_lvl+0x2b/0x34
> >>> [121973.010000] dump_stack_lvl from do_handle_IPI+0x151/0x178
> >>> [121973.010000] do_handle_IPI from ipi_handler+0x13/0x18
> >>> [121973.010000] ipi_handler from handle_percpu_devid_irq+0x55/0x144
> >>> [121973.010000] handle_percpu_devid_irq from generic_handle_domain_irq+0x17/0x20
> >>> [121973.010000] generic_handle_domain_irq from gic_handle_irq+0x5f/0x70
> >>> [121973.010000] gic_handle_irq from generic_handle_arch_irq+0x27/0x34
> >>> [121973.010000] generic_handle_arch_irq from call_with_stack+0xd/0x10
> >>> [121973.010000] Rebooting in 90 seconds..
> >>
> >> --
> >> Chuck Lever
> >>
> >>
>
> --
> Chuck Lever
>
>

2024-03-24 22:13:54

by Chuck Lever

Subject: Re: [External] : nfsd: memory leak when client does many file operations



> On Mar 24, 2024, at 5:39 PM, Jan Schunk <[email protected]> wrote:
>
> Yes, the VM is x86_64.
>
> "pgrep -c nfsd" says: 9
>
> I use NFS version 3.
>
> All network ports are connected with 1GBit/s.
>
> The exported file system is ext4.
>
> I do not use any authentication.
>
> The mount options in /etc/fstab are:
> rw,noatime,nfsvers=3,proto=tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768,noauto
>
> The line in /etc/exports:
> /export/data3 192.168.0.0/16(fsid=<uuid>,rw,no_root_squash,async,no_subtree_check)

Is it possible to reproduce this issue without the "noatime"
mount option and without the "async" export option?
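
In other words, something roughly like the following, followed by "exportfs -ra" on the server and a remount on the client (server name, mount point and fsid are placeholders taken from your description):

# client /etc/fstab, with "noatime" dropped:
server:/export/data3  /mnt/data3  nfs  rw,nfsvers=3,proto=tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768,noauto  0  0

# server /etc/exports, with "async" replaced by the default "sync":
/export/data3 192.168.0.0/16(fsid=<uuid>,rw,no_root_squash,sync,no_subtree_check)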


>> Sent: Sunday, 24.03.2024 at 22:10
>> From: "Chuck Lever III" <[email protected]>
>> To: "Jan Schunk" <[email protected]>
>> Cc: "Jeff Layton" <[email protected]>, "Neil Brown" <[email protected]>, "Olga Kornievskaia" <[email protected]>, "Dai Ngo" <[email protected]>, "Tom Talpey" <[email protected]>, "Linux NFS Mailing List" <[email protected]>, "[email protected]" <[email protected]>
>> Subject: Re: [External] : nfsd: memory leak when client does many file operations
>>
>>
>>> On Mar 24, 2024, at 4:48 PM, Jan Schunk <[email protected]> wrote:
>>>
>>> The "heavy usage" is a simple script runinng on the client and does the following:
>>> 1. Create a empty git repository on the share
>>> 2. Unpacking a tar.gz archive (Qnap GPL source code)
>>> 3. Remove some folders/files
>>> 4. Use diff to compare it with an older version
>>> 5. commit them to the git
>>> 6. Repeat at step 2 with next archive
>>>
>>> On my armhf NAS the other memory consuming workload is an SMB server.
>>
>> I'm not sure any of us has a Freescale system to try this ...
>>
>>
>>> On the test VM the other memory consuming workload is a GNOME desktop.
>>
>> ... and so I'm hoping this VM is an x86_64 system.
>>
>>
>>> But it does not make much difference if I stop other services; it just takes a bit longer until the same issue happens.
>>> The size of swap also does not make a difference.
>>
>> What is the nfsd thread count on the server? 'pgrep -c nfsd'
>>
>> What version of NFS does your client mount with?
>>
>> What is the speed of the network between your client and server?
>>
>> What is the type of the exported file system?
>>
>> Do you use NFS with Kerberos?
>>
>>
>>>> Sent: Sunday, 24.03.2024 at 21:14
>>>> From: "Chuck Lever III" <[email protected]>
>>>> To: "Jan Schunk" <[email protected]>
>>>> Cc: "Jeff Layton" <[email protected]>, "Neil Brown" <[email protected]>, "Olga Kornievskaia" <[email protected]>, "Dai Ngo" <[email protected]>, "Tom Talpey" <[email protected]>, "Linux NFS Mailing List" <[email protected]>, "[email protected]" <[email protected]>
>>>> Subject: Re: [External] : nfsd: memory leak when client does many file operations
>>>>
>>>>
>>>>
>>>>> On Mar 24, 2024, at 3:57 PM, Jan Schunk <[email protected]> wrote:
>>>>>
>>>>> Issue found on: v6.5.13 v6.6.13, v6.6.14, v6.6.20 and v6.8.1
>>>>> Not found on: v6.4, v6.1.82 and below
>>>>> Architectures: amd64 and arm(hf)
>>>>>
>>>>> Steps to reproduce:
>>>>> - Create a VM with 1GB RAM
>>>>> - Install Debian 12
>>>>> - Install linux-image-6.6.13+bpo-amd64-unsigned and nfs-kernel-server
>>>>> - Export some folder
>>>>> On the client:
>>>>> - Mount the share
>>>>> - Run a script that produces heavy usage on the share (like unpacking large tar archives that contain many small files into a git repository and committing them)
>>>>
>>>> Hi Jan, thanks for the report.
>>>>
>>>> The "produce heavy usage" instruction here is pretty vague.
>>>> I run CI testing with kmemleak enabled, and have not seen
>>>> any leaks on recent kernels when running the git regression
>>>> tests, which are similar to this kind of workload.
>>>>
>>>> Can you try to narrow the reproducer for us, even just a
>>>> little? What client action exactly is triggering the memory
>>>> leak? Is there any other workload on your NFS server that
>>>> might be consuming memory?
>>>>
>>>>
>>>>> On my setup it takes 20-40 hours until the memory is full and the oom-killer gets invoked by nfsd to kill other processes. The memory stays full and the system reboots:
>>>>>
>>>>> [121969.590000] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,task=dbus-daemon,pid=454,uid=101
>>>>> [121969.600000] Out of memory: Killed process 454 (dbus-daemon) total-vm:6196kB, anon-rss:128kB, file-rss:1408kB, shmem-rss:0kB, UID:101 pgtables:12kB oom_score_adj:-900
>>>>> [121971.700000] oom_reaper: reaped process 454 (dbus-daemon), now anon-rss:0kB, file-rss:64kB, shmem-rss:0kB
>>>>> [121971.920000] nfsd invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
>>>>> [121971.930000] CPU: 1 PID: 537 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
>>>>> [121971.930000] Hardware name: Freescale LS1024A
>>>>> [121971.940000] unwind_backtrace from show_stack+0xb/0xc
>>>>> [121971.940000] show_stack from dump_stack_lvl+0x2b/0x34
>>>>> [121971.950000] dump_stack_lvl from dump_header+0x35/0x212
>>>>> [121971.950000] dump_header from out_of_memory+0x317/0x34c
>>>>> [121971.960000] out_of_memory from __alloc_pages+0x8e7/0xbb0
>>>>> [121971.970000] __alloc_pages from __alloc_pages_bulk+0x26d/0x3d8
>>>>> [121971.970000] __alloc_pages_bulk from svc_recv+0x9d/0x7d4
>>>>> [121971.980000] svc_recv from nfsd+0x7d/0xd4
>>>>> [121971.980000] nfsd from kthread+0xb9/0xcc
>>>>> [121971.990000] kthread from ret_from_fork+0x11/0x1c
>>>>> [121971.990000] Exception stack(0xc2cadfb0 to 0xc2cadff8)
>>>>> [121971.990000] dfa0: 00000000 00000000 00000000 00000000
>>>>> [121972.000000] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>>>> [121972.010000] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
>>>>> [121972.020000] Mem-Info:
>>>>> [121972.020000] active_anon:101 inactive_anon:127 isolated_anon:29
>>>>> [121972.020000] active_file:1200 inactive_file:1204 isolated_file:98
>>>>> [121972.020000] unevictable:394 dirty:296 writeback:17
>>>>> [121972.020000] slab_reclaimable:13680 slab_unreclaimable:4350
>>>>> [121972.020000] mapped:637 shmem:4 pagetables:414
>>>>> [121972.020000] sec_pagetables:0 bounce:0
>>>>> [121972.020000] kernel_misc_reclaimable:0
>>>>> [121972.020000] free:7279 free_pcp:184 free_cma:1094
>>>>> [121972.060000] Node 0 active_anon:404kB inactive_anon:508kB active_file:4736kB inactive_file:4884kB unevictable:1576kB isolated(anon):116kB isolated(file):388kB mapped:2548kB dirty:1184kB writeback:68kB shmem:16kB writeback_tmp:0kB kernel_stack:1088kB pagetables:1656kB sec_pagetables:0kB all_unreclaimable? no
>>>>> [121972.090000] Normal free:29116kB boost:18432kB min:26624kB low:28672kB high:30720kB reserved_highatomic:0KB active_anon:404kB inactive_anon:712kB active_file:4788kB inactive_file:4752kB unevictable:1576kB writepending:1252kB present:1048576kB managed:1011988kB mlocked:1576kB bounce:0kB free_pcp:736kB local_pcp:236kB free_cma:4376kB
>>>>> [121972.120000] lowmem_reserve[]: 0 0
>>>>> [121972.120000] Normal: 2137*4kB (UEC) 1173*8kB (UEC) 529*16kB (UEC) 19*32kB (UC) 7*64kB (C) 5*128kB (C) 2*256kB (C) 1*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 29116kB
>>>>> [121972.140000] 2991 total pagecache pages
>>>>> [121972.140000] 166 pages in swap cache
>>>>> [121972.140000] Free swap = 93424kB
>>>>> [121972.150000] Total swap = 102396kB
>>>>> [121972.150000] 262144 pages RAM
>>>>> [121972.150000] 0 pages HighMem/MovableOnly
>>>>> [121972.160000] 9147 pages reserved
>>>>> [121972.160000] 4096 pages cma reserved
>>>>> [121972.160000] Unreclaimable slab info:
>>>>> [121972.170000] Name Used Total
>>>>> [121972.170000] bio-88 64KB 64KB
>>>>> [121972.180000] TCPv6 61KB 61KB
>>>>> [121972.180000] bio-76 16KB 16KB
>>>>> [121972.190000] bio-188 11KB 11KB
>>>>> [121972.190000] nfs_read_data 22KB 22KB
>>>>> [121972.200000] kioctx 15KB 15KB
>>>>> [121972.200000] posix_timers_cache 7KB 7KB
>>>>> [121972.210000] UDP 63KB 63KB
>>>>> [121972.220000] tw_sock_TCP 3KB 3KB
>>>>> [121972.220000] request_sock_TCP 3KB 3KB
>>>>> [121972.230000] TCP 62KB 62KB
>>>>> [121972.230000] bio-168 7KB 7KB
>>>>> [121972.240000] ep_head 8KB 8KB
>>>>> [121972.240000] request_queue 15KB 15KB
>>>>> [121972.250000] bio-124 18KB 40KB
>>>>> [121972.250000] biovec-max 264KB 264KB
>>>>> [121972.260000] biovec-128 63KB 63KB
>>>>> [121972.260000] biovec-64 157KB 157KB
>>>>> [121972.270000] skbuff_small_head 94KB 94KB
>>>>> [121972.270000] skbuff_fclone_cache 55KB 63KB
>>>>> [121972.280000] skbuff_head_cache 59KB 59KB
>>>>> [121972.280000] fsnotify_mark_connector 16KB 28KB
>>>>> [121972.290000] sigqueue 19KB 31KB
>>>>> [121972.300000] shmem_inode_cache 1622KB 1662KB
>>>>> [121972.300000] kernfs_iattrs_cache 15KB 15KB
>>>>> [121972.310000] kernfs_node_cache 2107KB 2138KB
>>>>> [121972.310000] filp 259KB 315KB
>>>>> [121972.320000] net_namespace 30KB 30KB
>>>>> [121972.320000] uts_namespace 15KB 15KB
>>>>> [121972.330000] vma_lock 143KB 179KB
>>>>> [121972.330000] vm_area_struct 459KB 553KB
>>>>> [121972.340000] sighand_cache 191KB 220KB
>>>>> [121972.340000] task_struct 378KB 446KB
>>>>> [121972.350000] anon_vma_chain 753KB 804KB
>>>>> [121972.360000] anon_vma 170KB 207KB
>>>>> [121972.360000] trace_event_file 83KB 83KB
>>>>> [121972.370000] mm_struct 157KB 173KB
>>>>> [121972.370000] vmap_area 217KB 354KB
>>>>> [121972.380000] kmalloc-8k 224KB 224KB
>>>>> [121972.380000] kmalloc-4k 860KB 992KB
>>>>> [121972.390000] kmalloc-2k 352KB 352KB
>>>>> [121972.390000] kmalloc-1k 563KB 576KB
>>>>> [121972.400000] kmalloc-512 936KB 936KB
>>>>> [121972.400000] kmalloc-256 196KB 240KB
>>>>> [121972.410000] kmalloc-192 160KB 169KB
>>>>> [121972.410000] kmalloc-128 546KB 764KB
>>>>> [121972.420000] kmalloc-64 1213KB 1288KB
>>>>> [121972.420000] kmem_cache_node 12KB 12KB
>>>>> [121972.430000] kmem_cache 16KB 16KB
>>>>> [121972.440000] Tasks state (memory values in pages):
>>>>> [121972.440000] [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
>>>>> [121972.450000] [ 209] 0 209 5140 320 0 320 0 16384 480 -1000 systemd-udevd
>>>>> [121972.460000] [ 230] 998 230 2887 55 32 23 0 18432 0 0 systemd-network
>>>>> [121972.470000] [ 420] 0 420 596 0 0 0 0 6144 22 0 mdadm
>>>>> [121972.490000] [ 421] 102 421 1393 56 32 24 0 10240 0 0 rpcbind
>>>>> [121972.500000] [ 429] 996 429 3695 17 0 17 0 20480 0 0 systemd-resolve
>>>>> [121972.510000] [ 433] 0 433 494 51 0 51 0 8192 0 0 rpc.idmapd
>>>>> [121972.520000] [ 434] 0 434 743 92 33 59 0 8192 7 0 nfsdcld
>>>>> [121972.530000] [ 451] 0 451 390 0 0 0 0 6144 0 0 acpid
>>>>> [121972.540000] [ 453] 105 453 1380 50 32 18 0 10240 18 0 avahi-daemon
>>>>> [121972.550000] [ 454] 101 454 1549 16 0 16 0 12288 32 -900 dbus-daemon
>>>>> [121972.560000] [ 466] 0 466 3771 60 0 60 0 14336 0 0 irqbalance
>>>>> [121972.570000] [ 475] 0 475 6269 32 32 0 0 18432 0 0 rsyslogd
>>>>> [121972.590000] [ 487] 105 487 1347 68 38 30 0 10240 0 0 avahi-daemon
>>>>> [121972.600000] [ 492] 0 492 1765 0 0 0 0 12288 0 0 cron
>>>>> [121972.610000] [ 493] 0 493 2593 0 0 0 0 16384 0 0 wpa_supplicant
>>>>> [121972.620000] [ 494] 0 494 607 0 0 0 0 8192 32 0 atd
>>>>> [121972.630000] [ 506] 0 506 1065 25 0 25 0 10240 0 0 rpc.mountd
>>>>> [121972.640000] [ 514] 103 514 809 25 0 25 0 8192 0 0 rpc.statd
>>>>> [121972.650000] [ 522] 0 522 999 31 0 31 0 10240 0 0 agetty
>>>>> [121972.660000] [ 524] 0 524 1540 28 0 28 0 12288 0 0 agetty
>>>>> [121972.670000] [ 525] 0 525 9098 56 32 24 0 34816 0 0 unattended-upgr
>>>>> [121972.690000] [ 526] 0 526 2621 320 0 320 0 14336 192 -1000 sshd
>>>>> [121972.700000] [ 539] 0 539 849 32 32 0 0 8192 0 0 in.tftpd
>>>>> [121972.710000] [ 544] 113 544 4361 6 6 0 0 16384 25 0 chronyd
>>>>> [121972.720000] [ 546] 0 546 16816 62 32 30 0 45056 0 0 winbindd
>>>>> [121972.730000] [ 552] 0 552 16905 59 32 27 0 45056 3 0 winbindd
>>>>> [121972.740000] [ 559] 0 559 17849 94 32 30 32 49152 4 0 smbd
>>>>> [121972.750000] [ 572] 0 572 17409 40 16 24 0 43008 11 0 smbd-notifyd
>>>>> [121972.760000] [ 573] 0 573 17412 16 16 0 0 43008 24 0 cleanupd
>>>>> [121972.770000] [ 584] 0 584 3036 20 0 20 0 16384 4 0 sshd
>>>>> [121972.780000] [ 589] 0 589 16816 32 2 30 0 40960 21 0 winbindd
>>>>> [121972.790000] [ 590] 0 590 27009 47 23 24 0 65536 21 0 smbd
>>>>> [121972.810000] [ 597] 501 597 3344 91 32 59 0 20480 0 100 systemd
>>>>> [121972.820000] [ 653] 501 653 3036 0 0 0 0 16384 33 0 sshd
>>>>> [121972.830000] [ 656] 501 656 1938 93 32 61 0 12288 9 0 bash
>>>>> [121972.840000] [ 704] 0 704 395 352 64 288 0 6144 0 -1000 watchdog
>>>>> [121972.850000] [ 738] 501 738 2834 12 0 12 0 16384 6 0 top
>>>>> [121972.860000] [ 4750] 0 4750 4218 44 26 18 0 18432 11 0 proftpd
>>>>> [121972.870000] [ 4768] 0 4768 401 31 0 31 0 6144 0 0 apt.systemd.dai
>>>>> [121972.880000] [ 4772] 0 4772 401 31 0 31 0 6144 0 0 apt.systemd.dai
>>>>> [121972.890000] [ 4778] 0 4778 13556 54 0 54 0 59392 26 0 apt-get
>>>>> [121972.900000] Out of memory and no killable processes...
>>>>> [121972.910000] Kernel panic - not syncing: System is deadlocked on memory
>>>>> [121972.920000] CPU: 1 PID: 537 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
>>>>> [121972.920000] Hardware name: Freescale LS1024A
>>>>> [121972.930000] unwind_backtrace from show_stack+0xb/0xc
>>>>> [121972.930000] show_stack from dump_stack_lvl+0x2b/0x34
>>>>> [121972.940000] dump_stack_lvl from panic+0xbf/0x264
>>>>> [121972.940000] panic from out_of_memory+0x33f/0x34c
>>>>> [121972.950000] out_of_memory from __alloc_pages+0x8e7/0xbb0
>>>>> [121972.950000] __alloc_pages from __alloc_pages_bulk+0x26d/0x3d8
>>>>> [121972.960000] __alloc_pages_bulk from svc_recv+0x9d/0x7d4
>>>>> [121972.960000] svc_recv from nfsd+0x7d/0xd4
>>>>> [121972.970000] nfsd from kthread+0xb9/0xcc
>>>>> [121972.970000] kthread from ret_from_fork+0x11/0x1c
>>>>> [121972.980000] Exception stack(0xc2cadfb0 to 0xc2cadff8)
>>>>> [121972.980000] dfa0: 00000000 00000000 00000000 00000000
>>>>> [121972.990000] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>>>> [121973.000000] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
>>>>> [121973.010000] CPU0: stopping
>>>>> [121973.010000] CPU: 0 PID: 540 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
>>>>> [121973.010000] Hardware name: Freescale LS1024A
>>>>> [121973.010000] unwind_backtrace from show_stack+0xb/0xc
>>>>> [121973.010000] show_stack from dump_stack_lvl+0x2b/0x34
>>>>> [121973.010000] dump_stack_lvl from do_handle_IPI+0x151/0x178
>>>>> [121973.010000] do_handle_IPI from ipi_handler+0x13/0x18
>>>>> [121973.010000] ipi_handler from handle_percpu_devid_irq+0x55/0x144
>>>>> [121973.010000] handle_percpu_devid_irq from generic_handle_domain_irq+0x17/0x20
>>>>> [121973.010000] generic_handle_domain_irq from gic_handle_irq+0x5f/0x70
>>>>> [121973.010000] gic_handle_irq from generic_handle_arch_irq+0x27/0x34
>>>>> [121973.010000] generic_handle_arch_irq from call_with_stack+0xd/0x10
>>>>> [121973.010000] Rebooting in 90 seconds..
>>>>
>>>> --
>>>> Chuck Lever
>>>>
>>>>
>>
>> --
>> Chuck Lever
>>
>>

--
Chuck Lever


2024-03-25 19:58:58

by Jan Schunk

Subject: Aw: Re: [External] : nfsd: memory leak when client does many file operations

The VM has now been running for 20 hours with 512MB RAM, no desktop, without the "noatime" mount option, and without the "async" export option.

Currently there is no issue, but the memory usage is still constantly growing. It may just take longer before something happens. A rough sketch of a logging loop for tracking where the memory goes follows the top snapshots below.

top - 00:49:49 up 3 min, 1 user, load average: 0,21, 0,19, 0,09
Tasks: 111 total, 1 running, 110 sleeping, 0 stopped, 0 zombie
%CPU(s): 0,2 us, 0,3 sy, 0,0 ni, 99,5 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
MiB Spch: 467,0 total, 302,3 free, 89,3 used, 88,1 buff/cache
MiB Swap: 975,0 total, 975,0 free, 0,0 used. 377,7 avail Spch

top - 15:05:39 up 14:19, 1 user, load average: 1,87, 1,72, 1,65
Tasks: 104 total, 1 running, 103 sleeping, 0 stopped, 0 zombie
%CPU(s): 0,2 us, 4,9 sy, 0,0 ni, 53,3 id, 39,0 wa, 0,0 hi, 2,6 si, 0,0 st
MiB Spch: 467,0 total, 21,2 free, 147,1 used, 310,9 buff/cache
MiB Swap: 975,0 total, 952,9 free, 22,1 used. 319,9 avail Spch

top - 20:48:16 up 20:01, 1 user, load average: 5,02, 2,72, 2,08
Tasks: 104 total, 5 running, 99 sleeping, 0 stopped, 0 zombie
%CPU(s): 0,2 us, 46,4 sy, 0,0 ni, 11,9 id, 2,3 wa, 0,0 hi, 39,2 si, 0,0 st
MiB Spch: 467,0 total, 16,9 free, 190,8 used, 271,6 buff/cache
MiB Swap: 975,0 total, 952,9 free, 22,1 used. 276,2 avail Spch
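
To make that growth easier to correlate with the NFS workload, it may help to log memory and nfsd/sunrpc slab usage periodically during the run. A minimal sketch (the log path and interval are arbitrary placeholders, not part of this setup):

#!/bin/sh
# Log overall memory and nfsd/rpc-related slab usage once a minute so
# the growth over a multi-hour run can be matched against NFS activity.
LOG=/var/tmp/nfsd-mem.log        # hypothetical log location
while true; do
        date >> "$LOG"
        grep -E 'MemFree|MemAvailable|Slab' /proc/meminfo >> "$LOG"
        grep -E 'nfsd|rpc' /proc/slabinfo >> "$LOG"
        echo "----" >> "$LOG"
        sleep 60
done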

> Gesendet: Sonntag, den 24.03.2024 um 23:13 Uhr
> Von: "Chuck Lever III" <[email protected]>
> An: "Jan Schunk" <[email protected]>
> Cc: "Jeff Layton" <[email protected]>, "Neil Brown" <[email protected]>, "Olga Kornievskaia" <[email protected]>, "Dai Ngo" <[email protected]>, "Tom Talpey" <[email protected]>, "Linux NFS Mailing List" <[email protected]>, "[email protected]" <[email protected]>
> Betreff: Re: [External] : nfsd: memory leak when client does many file operations
>
>
>
> > On Mar 24, 2024, at 5:39 PM, Jan Schunk <[email protected]> wrote:
> >
> > Yes, the VM is x86_64.
> >
> > "pgrep -c nfsd" says: 9
> >
> > I use NFS version 3.
> >
> > All network ports are connected with 1GBit/s.
> >
> > The exported file system is ext4.
> >
> > I do not use any authentication.
> >
> > The mount options in /etc/fstab are:
> > rw,noatime,nfsvers=3,proto=tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768,noauto
> >
> > The line in /etc/exports:
> > /export/data3 192.168.0.0/16(fsid=<uuid>,rw,no_root_squash,async,no_subtree_check)
>
> Is it possible to reproduce this issue without the "noatime"
> mount option and without the "async" export option?
>
>
> >> Gesendet: Sonntag, den 24.03.2024 um 22:10 Uhr
> >> Von: "Chuck Lever III" <[email protected]>
> >> An: "Jan Schunk" <[email protected]>
> >> Cc: "Jeff Layton" <[email protected]>, "Neil Brown" <[email protected]>, "Olga Kornievskaia" <[email protected]>, "Dai Ngo" <[email protected]>, "Tom Talpey" <[email protected]>, "Linux NFS Mailing List" <[email protected]>, "[email protected]" <[email protected]>
> >> Betreff: Re: [External] : nfsd: memory leak when client does many file operations
> >>
> >>
> >>> On Mar 24, 2024, at 4:48 PM, Jan Schunk <[email protected]> wrote:
> >>>
> >>> The "heavy usage" is a simple script runinng on the client and does the following:
> >>> 1. Create a empty git repository on the share
> >>> 2. Unpacking a tar.gz archive (Qnap GPL source code)
> >>> 3. Remove some folders/files
> >>> 4. Use diff to compare it with an older version
> >>> 5. commit them to the git
> >>> 6. Repeat at step 2 with next archive
> >>>
> >>> On my armhf NAS the other memory consuming workload is an SMB server.
> >>
> >> I'm not sure any of us has a Freescale system to try this ...
> >>
> >>
> >>> On the test VM the other memory consuming workload is a GNOME desktop.
> >>
> >> ... and so I'm hoping this VM is an x86_64 system.
> >>
> >>
> >>> But it does not make much difference if I stop other services it just takes a bit longer until the same issue happens.
> >>> The size of swap also does not make a difference.
> >>
> >> What is the nfsd thread count on the server? 'pgrep -c nfsd'
> >>
> >> What version of NFS does your client mount with?
> >>
> >> What is the speed of the network between your client and server?
> >>
> >> What is the type of the exported file system?
> >>
> >> Do you use NFS with Kerberos?
> >>
> >>
> >>>> Gesendet: Sonntag, den 24.03.2024 um 21:14 Uhr
> >>>> Von: "Chuck Lever III" <[email protected]>
> >>>> An: "Jan Schunk" <[email protected]>
> >>>> Cc: "Jeff Layton" <[email protected]>, "Neil Brown" <[email protected]>, "Olga Kornievskaia" <[email protected]>, "Dai Ngo" <[email protected]>, "Tom Talpey" <[email protected]>, "Linux NFS Mailing List" <[email protected]>, "[email protected]" <[email protected]>
> >>>> Betreff: Re: [External] : nfsd: memory leak when client does many file operations
> >>>>
> >>>>
> >>>>
> >>>>> On Mar 24, 2024, at 3:57 PM, Jan Schunk <[email protected]> wrote:
> >>>>>
> >>>>> Issue found on: v6.5.13 v6.6.13, v6.6.14, v6.6.20 and v6.8.1
> >>>>> Not found on: v6.4, v6.1.82 and below
> >>>>> Architectures: amd64 and arm(hf)
> >>>>>
> >>>>> Steps to reproduce:
> >>>>> - Create a VM with 1GB RAM
> >>>>> - Install Debian 12
> >>>>> - Install linux-image-6.6.13+bpo-amd64-unsigned and nfs-kernel-server
> >>>>> - Export some folder
> >>>>> On the client:
> >>>>> - Mount the share
> >>>>> - Run a script that does produce heavy usage on the share (like unpacking large tar archives that cointain many small files into a git and commiting them)
> >>>>
> >>>> Hi Jan, thanks for the report.
> >>>>
> >>>> The "produce heavy usage" instruction here is pretty vague.
> >>>> I run CI testing with kmemleak enabled, and have not seen
> >>>> any leaks on recent kernels when running the git regression
> >>>> tests, which are similar to this kind of workload.
> >>>>
> >>>> Can you try to narrow the reproducer for us, even just a
> >>>> little? What client action exactly is triggering the memory
> >>>> leak? Is there any other workload on your NFS server that
> >>>> might be consuming memory?
> >>>>
> >>>>
> >>>>> On my setup it takes 20-40 hours until the memory is full and oom-kill gets hired by nfsd to kill other processes. the memory stays full and the system reboots:
> >>>>>
> >>>>> [121969.590000] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,task=dbus-daemon,pid=454,uid=101
> >>>>> [121969.600000] Out of memory: Killed process 454 (dbus-daemon) total-vm:6196kB, anon-rss:128kB, file-rss:1408kB, shmem-rss:0kB, UID:101 pgtables:12kB oom_score_adj:-900
> >>>>> [121971.700000] oom_reaper: reaped process 454 (dbus-daemon), now anon-rss:0kB, file-rss:64kB, shmem-rss:0kB
> >>>>> [121971.920000] nfsd invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
> >>>>> [121971.930000] CPU: 1 PID: 537 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
> >>>>> [121971.930000] Hardware name: Freescale LS1024A
> >>>>> [121971.940000] unwind_backtrace from show_stack+0xb/0xc
> >>>>> [121971.940000] show_stack from dump_stack_lvl+0x2b/0x34
> >>>>> [121971.950000] dump_stack_lvl from dump_header+0x35/0x212
> >>>>> [121971.950000] dump_header from out_of_memory+0x317/0x34c
> >>>>> [121971.960000] out_of_memory from __alloc_pages+0x8e7/0xbb0
> >>>>> [121971.970000] __alloc_pages from __alloc_pages_bulk+0x26d/0x3d8
> >>>>> [121971.970000] __alloc_pages_bulk from svc_recv+0x9d/0x7d4
> >>>>> [121971.980000] svc_recv from nfsd+0x7d/0xd4
> >>>>> [121971.980000] nfsd from kthread+0xb9/0xcc
> >>>>> [121971.990000] kthread from ret_from_fork+0x11/0x1c
> >>>>> [121971.990000] Exception stack(0xc2cadfb0 to 0xc2cadff8)
> >>>>> [121971.990000] dfa0: 00000000 00000000 00000000 00000000
> >>>>> [121972.000000] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> >>>>> [121972.010000] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> >>>>> [121972.020000] Mem-Info:
> >>>>> [121972.020000] active_anon:101 inactive_anon:127 isolated_anon:29
> >>>>> [121972.020000] active_file:1200 inactive_file:1204 isolated_file:98
> >>>>> [121972.020000] unevictable:394 dirty:296 writeback:17
> >>>>> [121972.020000] slab_reclaimable:13680 slab_unreclaimable:4350
> >>>>> [121972.020000] mapped:637 shmem:4 pagetables:414
> >>>>> [121972.020000] sec_pagetables:0 bounce:0
> >>>>> [121972.020000] kernel_misc_reclaimable:0
> >>>>> [121972.020000] free:7279 free_pcp:184 free_cma:1094
> >>>>> [121972.060000] Node 0 active_anon:404kB inactive_anon:508kB active_file:4736kB inactive_file:4884kB unevictable:1576kB isolated(anon):116kB isolated(file):388kB mapped:2548kB dirty:1184kB writeback:68kB shmem:16kB writeback_tmp:0kB kernel_stack:1088kB pagetables:1656kB sec_pagetables:0kB all_unreclaimable? no
> >>>>> [121972.090000] Normal free:29116kB boost:18432kB min:26624kB low:28672kB high:30720kB reserved_highatomic:0KB active_anon:404kB inactive_anon:712kB active_file:4788kB inactive_file:4752kB unevictable:1576kB writepending:1252kB present:1048576kB managed:1011988kB mlocked:1576kB bounce:0kB free_pcp:736kB local_pcp:236kB free_cma:4376kB
> >>>>> [121972.120000] lowmem_reserve[]: 0 0
> >>>>> [121972.120000] Normal: 2137*4kB (UEC) 1173*8kB (UEC) 529*16kB (UEC) 19*32kB (UC) 7*64kB (C) 5*128kB (C) 2*256kB (C) 1*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 29116kB
> >>>>> [121972.140000] 2991 total pagecache pages
> >>>>> [121972.140000] 166 pages in swap cache
> >>>>> [121972.140000] Free swap = 93424kB
> >>>>> [121972.150000] Total swap = 102396kB
> >>>>> [121972.150000] 262144 pages RAM
> >>>>> [121972.150000] 0 pages HighMem/MovableOnly
> >>>>> [121972.160000] 9147 pages reserved
> >>>>> [121972.160000] 4096 pages cma reserved
> >>>>> [121972.160000] Unreclaimable slab info:
> >>>>> [121972.170000] Name Used Total
> >>>>> [121972.170000] bio-88 64KB 64KB
> >>>>> [121972.180000] TCPv6 61KB 61KB
> >>>>> [121972.180000] bio-76 16KB 16KB
> >>>>> [121972.190000] bio-188 11KB 11KB
> >>>>> [121972.190000] nfs_read_data 22KB 22KB
> >>>>> [121972.200000] kioctx 15KB 15KB
> >>>>> [121972.200000] posix_timers_cache 7KB 7KB
> >>>>> [121972.210000] UDP 63KB 63KB
> >>>>> [121972.220000] tw_sock_TCP 3KB 3KB
> >>>>> [121972.220000] request_sock_TCP 3KB 3KB
> >>>>> [121972.230000] TCP 62KB 62KB
> >>>>> [121972.230000] bio-168 7KB 7KB
> >>>>> [121972.240000] ep_head 8KB 8KB
> >>>>> [121972.240000] request_queue 15KB 15KB
> >>>>> [121972.250000] bio-124 18KB 40KB
> >>>>> [121972.250000] biovec-max 264KB 264KB
> >>>>> [121972.260000] biovec-128 63KB 63KB
> >>>>> [121972.260000] biovec-64 157KB 157KB
> >>>>> [121972.270000] skbuff_small_head 94KB 94KB
> >>>>> [121972.270000] skbuff_fclone_cache 55KB 63KB
> >>>>> [121972.280000] skbuff_head_cache 59KB 59KB
> >>>>> [121972.280000] fsnotify_mark_connector 16KB 28KB
> >>>>> [121972.290000] sigqueue 19KB 31KB
> >>>>> [121972.300000] shmem_inode_cache 1622KB 1662KB
> >>>>> [121972.300000] kernfs_iattrs_cache 15KB 15KB
> >>>>> [121972.310000] kernfs_node_cache 2107KB 2138KB
> >>>>> [121972.310000] filp 259KB 315KB
> >>>>> [121972.320000] net_namespace 30KB 30KB
> >>>>> [121972.320000] uts_namespace 15KB 15KB
> >>>>> [121972.330000] vma_lock 143KB 179KB
> >>>>> [121972.330000] vm_area_struct 459KB 553KB
> >>>>> [121972.340000] sighand_cache 191KB 220KB
> >>>>> [121972.340000] task_struct 378KB 446KB
> >>>>> [121972.350000] anon_vma_chain 753KB 804KB
> >>>>> [121972.360000] anon_vma 170KB 207KB
> >>>>> [121972.360000] trace_event_file 83KB 83KB
> >>>>> [121972.370000] mm_struct 157KB 173KB
> >>>>> [121972.370000] vmap_area 217KB 354KB
> >>>>> [121972.380000] kmalloc-8k 224KB 224KB
> >>>>> [121972.380000] kmalloc-4k 860KB 992KB
> >>>>> [121972.390000] kmalloc-2k 352KB 352KB
> >>>>> [121972.390000] kmalloc-1k 563KB 576KB
> >>>>> [121972.400000] kmalloc-512 936KB 936KB
> >>>>> [121972.400000] kmalloc-256 196KB 240KB
> >>>>> [121972.410000] kmalloc-192 160KB 169KB
> >>>>> [121972.410000] kmalloc-128 546KB 764KB
> >>>>> [121972.420000] kmalloc-64 1213KB 1288KB
> >>>>> [121972.420000] kmem_cache_node 12KB 12KB
> >>>>> [121972.430000] kmem_cache 16KB 16KB
> >>>>> [121972.440000] Tasks state (memory values in pages):
> >>>>> [121972.440000] [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
> >>>>> [121972.450000] [ 209] 0 209 5140 320 0 320 0 16384 480 -1000 systemd-udevd
> >>>>> [121972.460000] [ 230] 998 230 2887 55 32 23 0 18432 0 0 systemd-network
> >>>>> [121972.470000] [ 420] 0 420 596 0 0 0 0 6144 22 0 mdadm
> >>>>> [121972.490000] [ 421] 102 421 1393 56 32 24 0 10240 0 0 rpcbind
> >>>>> [121972.500000] [ 429] 996 429 3695 17 0 17 0 20480 0 0 systemd-resolve
> >>>>> [121972.510000] [ 433] 0 433 494 51 0 51 0 8192 0 0 rpc.idmapd
> >>>>> [121972.520000] [ 434] 0 434 743 92 33 59 0 8192 7 0 nfsdcld
> >>>>> [121972.530000] [ 451] 0 451 390 0 0 0 0 6144 0 0 acpid
> >>>>> [121972.540000] [ 453] 105 453 1380 50 32 18 0 10240 18 0 avahi-daemon
> >>>>> [121972.550000] [ 454] 101 454 1549 16 0 16 0 12288 32 -900 dbus-daemon
> >>>>> [121972.560000] [ 466] 0 466 3771 60 0 60 0 14336 0 0 irqbalance
> >>>>> [121972.570000] [ 475] 0 475 6269 32 32 0 0 18432 0 0 rsyslogd
> >>>>> [121972.590000] [ 487] 105 487 1347 68 38 30 0 10240 0 0 avahi-daemon
> >>>>> [121972.600000] [ 492] 0 492 1765 0 0 0 0 12288 0 0 cron
> >>>>> [121972.610000] [ 493] 0 493 2593 0 0 0 0 16384 0 0 wpa_supplicant
> >>>>> [121972.620000] [ 494] 0 494 607 0 0 0 0 8192 32 0 atd
> >>>>> [121972.630000] [ 506] 0 506 1065 25 0 25 0 10240 0 0 rpc.mountd
> >>>>> [121972.640000] [ 514] 103 514 809 25 0 25 0 8192 0 0 rpc.statd
> >>>>> [121972.650000] [ 522] 0 522 999 31 0 31 0 10240 0 0 agetty
> >>>>> [121972.660000] [ 524] 0 524 1540 28 0 28 0 12288 0 0 agetty
> >>>>> [121972.670000] [ 525] 0 525 9098 56 32 24 0 34816 0 0 unattended-upgr
> >>>>> [121972.690000] [ 526] 0 526 2621 320 0 320 0 14336 192 -1000 sshd
> >>>>> [121972.700000] [ 539] 0 539 849 32 32 0 0 8192 0 0 in.tftpd
> >>>>> [121972.710000] [ 544] 113 544 4361 6 6 0 0 16384 25 0 chronyd
> >>>>> [121972.720000] [ 546] 0 546 16816 62 32 30 0 45056 0 0 winbindd
> >>>>> [121972.730000] [ 552] 0 552 16905 59 32 27 0 45056 3 0 winbindd
> >>>>> [121972.740000] [ 559] 0 559 17849 94 32 30 32 49152 4 0 smbd
> >>>>> [121972.750000] [ 572] 0 572 17409 40 16 24 0 43008 11 0 smbd-notifyd
> >>>>> [121972.760000] [ 573] 0 573 17412 16 16 0 0 43008 24 0 cleanupd
> >>>>> [121972.770000] [ 584] 0 584 3036 20 0 20 0 16384 4 0 sshd
> >>>>> [121972.780000] [ 589] 0 589 16816 32 2 30 0 40960 21 0 winbindd
> >>>>> [121972.790000] [ 590] 0 590 27009 47 23 24 0 65536 21 0 smbd
> >>>>> [121972.810000] [ 597] 501 597 3344 91 32 59 0 20480 0 100 systemd
> >>>>> [121972.820000] [ 653] 501 653 3036 0 0 0 0 16384 33 0 sshd
> >>>>> [121972.830000] [ 656] 501 656 1938 93 32 61 0 12288 9 0 bash
> >>>>> [121972.840000] [ 704] 0 704 395 352 64 288 0 6144 0 -1000 watchdog
> >>>>> [121972.850000] [ 738] 501 738 2834 12 0 12 0 16384 6 0 top
> >>>>> [121972.860000] [ 4750] 0 4750 4218 44 26 18 0 18432 11 0 proftpd
> >>>>> [121972.870000] [ 4768] 0 4768 401 31 0 31 0 6144 0 0 apt.systemd.dai
> >>>>> [121972.880000] [ 4772] 0 4772 401 31 0 31 0 6144 0 0 apt.systemd.dai
> >>>>> [121972.890000] [ 4778] 0 4778 13556 54 0 54 0 59392 26 0 apt-get
> >>>>> [121972.900000] Out of memory and no killable processes...
> >>>>> [121972.910000] Kernel panic - not syncing: System is deadlocked on memory
> >>>>> [121972.920000] CPU: 1 PID: 537 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
> >>>>> [121972.920000] Hardware name: Freescale LS1024A
> >>>>> [121972.930000] unwind_backtrace from show_stack+0xb/0xc
> >>>>> [121972.930000] show_stack from dump_stack_lvl+0x2b/0x34
> >>>>> [121972.940000] dump_stack_lvl from panic+0xbf/0x264
> >>>>> [121972.940000] panic from out_of_memory+0x33f/0x34c
> >>>>> [121972.950000] out_of_memory from __alloc_pages+0x8e7/0xbb0
> >>>>> [121972.950000] __alloc_pages from __alloc_pages_bulk+0x26d/0x3d8
> >>>>> [121972.960000] __alloc_pages_bulk from svc_recv+0x9d/0x7d4
> >>>>> [121972.960000] svc_recv from nfsd+0x7d/0xd4
> >>>>> [121972.970000] nfsd from kthread+0xb9/0xcc
> >>>>> [121972.970000] kthread from ret_from_fork+0x11/0x1c
> >>>>> [121972.980000] Exception stack(0xc2cadfb0 to 0xc2cadff8)
> >>>>> [121972.980000] dfa0: 00000000 00000000 00000000 00000000
> >>>>> [121972.990000] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> >>>>> [121973.000000] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> >>>>> [121973.010000] CPU0: stopping
> >>>>> [121973.010000] CPU: 0 PID: 540 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
> >>>>> [121973.010000] Hardware name: Freescale LS1024A
> >>>>> [121973.010000] unwind_backtrace from show_stack+0xb/0xc
> >>>>> [121973.010000] show_stack from dump_stack_lvl+0x2b/0x34
> >>>>> [121973.010000] dump_stack_lvl from do_handle_IPI+0x151/0x178
> >>>>> [121973.010000] do_handle_IPI from ipi_handler+0x13/0x18
> >>>>> [121973.010000] ipi_handler from handle_percpu_devid_irq+0x55/0x144
> >>>>> [121973.010000] handle_percpu_devid_irq from generic_handle_domain_irq+0x17/0x20
> >>>>> [121973.010000] generic_handle_domain_irq from gic_handle_irq+0x5f/0x70
> >>>>> [121973.010000] gic_handle_irq from generic_handle_arch_irq+0x27/0x34
> >>>>> [121973.010000] generic_handle_arch_irq from call_with_stack+0xd/0x10
> >>>>> [121973.010000] Rebooting in 90 seconds..
> >>>>
> >>>> --
> >>>> Chuck Lever
> >>>>
> >>>>
> >>
> >> --
> >> Chuck Lever
> >>
> >>
>
> --
> Chuck Lever
>
>

2024-03-25 20:13:16

by Chuck Lever

[permalink] [raw]
Subject: Re: [External] : nfsd: memory leak when client does many file operations



> On Mar 25, 2024, at 3:55 PM, Jan Schunk <[email protected]> wrote:
>
> The VM is now running 20 hours with 512MB RAM, no desktop, without the "noatime" mount option and without the "async" export option.
>
> Currently there is no issue, but the memory usage is still contantly growing. It may just take longer before something happens.
>
> top - 00:49:49 up 3 min, 1 user, load average: 0,21, 0,19, 0,09
> Tasks: 111 total, 1 running, 110 sleeping, 0 stopped, 0 zombie
> %CPU(s): 0,2 us, 0,3 sy, 0,0 ni, 99,5 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
> MiB Spch: 467,0 total, 302,3 free, 89,3 used, 88,1 buff/cache
> MiB Swap: 975,0 total, 975,0 free, 0,0 used. 377,7 avail Spch
>
> top - 15:05:39 up 14:19, 1 user, load average: 1,87, 1,72, 1,65
> Tasks: 104 total, 1 running, 103 sleeping, 0 stopped, 0 zombie
> %CPU(s): 0,2 us, 4,9 sy, 0,0 ni, 53,3 id, 39,0 wa, 0,0 hi, 2,6 si, 0,0 st
> MiB Spch: 467,0 total, 21,2 free, 147,1 used, 310,9 buff/cache
> MiB Swap: 975,0 total, 952,9 free, 22,1 used. 319,9 avail Spch
>
> top - 20:48:16 up 20:01, 1 user, load average: 5,02, 2,72, 2,08
> Tasks: 104 total, 5 running, 99 sleeping, 0 stopped, 0 zombie
> %CPU(s): 0,2 us, 46,4 sy, 0,0 ni, 11,9 id, 2,3 wa, 0,0 hi, 39,2 si, 0,0 st
> MiB Spch: 467,0 total, 16,9 free, 190,8 used, 271,6 buff/cache
> MiB Swap: 975,0 total, 952,9 free, 22,1 used. 276,2 avail Spch

I don't see anything in your original memory dump that
might account for this. But I'm at a loss because I'm
a kernel developer, not a support guy -- I don't have
any tools or expertise that can troubleshoot a system
without rebuilding a kernel with instrumentation. My
first instinct is to tell you to bisect between v6.3
and v6.4, or at least enable kmemleak, but I'm guessing
you don't build your own kernels.

My only recourse at this point would be to try to
reproduce it myself, but unfortunately I've just
upgraded my whole lab to Fedora 39, and there's a grub
bug that prevents booting any custom-built kernel
on my hardware.

So I'm stuck until I can nail that down. Anyone else
care to help out?


--
Chuck Lever


2024-03-25 20:28:05

by Jan Schunk

[permalink] [raw]
Subject: Aw: Re: [External] : nfsd: memory leak when client does many file operations

I am building my own kernels, but I have never tried kmemleak. Is it just a Kconfig option?
What do you mean by "bisect between v6.3 and v6.4"?
Everything up to and including v6.4 is OK; the problem starts at v6.5.

I have also looked at some of the code already, but there were huge changes to mm in v6.5 and v6.6, so it is hard for me to compare against older versions and find the commit(s) that may be causing the issue.

Btw. thanks for guiding me so far.

> Gesendet: Montag, den 25.03.2024 um 21:11 Uhr
> Von: "Chuck Lever III" <[email protected]>
> An: "Jan Schunk" <[email protected]>
> Cc: "Jeff Layton" <[email protected]>, "Neil Brown" <[email protected]>, "Olga Kornievskaia" <[email protected]>, "Dai Ngo" <[email protected]>, "Tom Talpey" <[email protected]>, "Linux NFS Mailing List" <[email protected]>, "[email protected]" <[email protected]>
> Betreff: Re: [External] : nfsd: memory leak when client does many file operations
>
>
>
> > On Mar 25, 2024, at 3:55 PM, Jan Schunk <[email protected]> wrote:
> >
> > The VM is now running 20 hours with 512MB RAM, no desktop, without the "noatime" mount option and without the "async" export option.
> >
> > Currently there is no issue, but the memory usage is still contantly growing. It may just take longer before something happens.
> >
> > top - 00:49:49 up 3 min, 1 user, load average: 0,21, 0,19, 0,09
> > Tasks: 111 total, 1 running, 110 sleeping, 0 stopped, 0 zombie
> > %CPU(s): 0,2 us, 0,3 sy, 0,0 ni, 99,5 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
> > MiB Spch: 467,0 total, 302,3 free, 89,3 used, 88,1 buff/cache
> > MiB Swap: 975,0 total, 975,0 free, 0,0 used. 377,7 avail Spch
> >
> > top - 15:05:39 up 14:19, 1 user, load average: 1,87, 1,72, 1,65
> > Tasks: 104 total, 1 running, 103 sleeping, 0 stopped, 0 zombie
> > %CPU(s): 0,2 us, 4,9 sy, 0,0 ni, 53,3 id, 39,0 wa, 0,0 hi, 2,6 si, 0,0 st
> > MiB Spch: 467,0 total, 21,2 free, 147,1 used, 310,9 buff/cache
> > MiB Swap: 975,0 total, 952,9 free, 22,1 used. 319,9 avail Spch
> >
> > top - 20:48:16 up 20:01, 1 user, load average: 5,02, 2,72, 2,08
> > Tasks: 104 total, 5 running, 99 sleeping, 0 stopped, 0 zombie
> > %CPU(s): 0,2 us, 46,4 sy, 0,0 ni, 11,9 id, 2,3 wa, 0,0 hi, 39,2 si, 0,0 st
> > MiB Spch: 467,0 total, 16,9 free, 190,8 used, 271,6 buff/cache
> > MiB Swap: 975,0 total, 952,9 free, 22,1 used. 276,2 avail Spch
>
> I don't see anything in your original memory dump that
> might account for this. But I'm at a loss because I'm
> a kernel developer, not a support guy -- I don't have
> any tools or expertise that can troubleshoot a system
> without rebuilding a kernel with instrumentation. My
> first instinct is to tell you to bisect between v6.3
> and v6.4, or at least enable kmemleak, but I'm guessing
> you don't build your own kernels.
>
> My only recourse at this point would be to try to
> reproduce it myself, but unfortunately I've just
> upgraded my whole lab to Fedora 39, and there's a grub
> bug that prevents booting any custom-built kernel
> on my hardware.
>
> So I'm stuck until I can nail that down. Anyone else
> care to help out?
>
>
> --
> Chuck Lever
>
>

2024-03-25 20:37:39

by Chuck Lever

[permalink] [raw]
Subject: Re: [External] : nfsd: memory leak when client does many file operations



> On Mar 25, 2024, at 4:26 PM, Jan Schunk <[email protected]> wrote:
>
> I am building my own kernels, but I never tried kmemleak, is this just a Kconfig option?

Location:
-> Kernel hacking
-> Memory Debugging
(1) -> Kernel memory leak detector (DEBUG_KMEMLEAK [=n])
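
Once a kernel is rebuilt with that option, the detector is driven through debugfs. A rough sketch of the usual workflow (assuming debugfs is mounted at /sys/kernel/debug):

# Run the NFS workload for a while, then trigger a scan
echo scan > /sys/kernel/debug/kmemleak

# Suspected leaks, with allocation backtraces, show up here
cat /sys/kernel/debug/kmemleak

# Clear the current list before the next test run
echo clear > /sys/kernel/debug/kmemleak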


> What do you mean with "bisect between v6.3 and v6.4"?

After you "git clone" the kernel source:

$ git bisect start v6.4 v6.3

Build the kernel and test. If the test fails:

$ cd <your kernel source tree>; git bisect bad

If the test succeeds:

$ cd <your kernel source tree>; git bisect good

Rebuild and try again until it lands on the first broken commit.
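
If the good/bad check can itself be scripted (which may be awkward here, since each step needs a rebuild, a reboot and a long soak test), git can drive the whole loop. A sketch, with the test script name as a placeholder:

# check-nfsd-leak.sh is a hypothetical script that exits 0 when the
# kernel under test looks good and non-zero when the leak shows up.
git bisect run ./check-nfsd-leak.sh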


> Everything including v6.4 is OK, the problem starts at v6.5.

I misremembered. Use "$ git bisect start v6.5 v6.4" then.


> I also looked at some code already but there are huge changes to mm that happened in v6.5 and v6.6 so for me it is heavy to compare it with older versions to find one or more commits that may cause the issue.

Bisection is a mechanical test-based process. You don't need
to look at code until you've reached the first bad commit.

--
Chuck Lever


2024-03-26 11:15:43

by Benjamin Coddington

[permalink] [raw]
Subject: Re: [External] : nfsd: memory leak when client does many file operations

On 25 Mar 2024, at 16:11, Chuck Lever III wrote:

>> On Mar 25, 2024, at 3:55 PM, Jan Schunk <[email protected]> wrote:
>>
>> The VM is now running 20 hours with 512MB RAM, no desktop, without the "noatime" mount option and without the "async" export option.
>>
>> Currently there is no issue, but the memory usage is still contantly growing. It may just take longer before something happens.
>>
>> top - 00:49:49 up 3 min, 1 user, load average: 0,21, 0,19, 0,09
>> Tasks: 111 total, 1 running, 110 sleeping, 0 stopped, 0 zombie
>> %CPU(s): 0,2 us, 0,3 sy, 0,0 ni, 99,5 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
>> MiB Spch: 467,0 total, 302,3 free, 89,3 used, 88,1 buff/cache
>> MiB Swap: 975,0 total, 975,0 free, 0,0 used. 377,7 avail Spch
>>
>> top - 15:05:39 up 14:19, 1 user, load average: 1,87, 1,72, 1,65
>> Tasks: 104 total, 1 running, 103 sleeping, 0 stopped, 0 zombie
>> %CPU(s): 0,2 us, 4,9 sy, 0,0 ni, 53,3 id, 39,0 wa, 0,0 hi, 2,6 si, 0,0 st
>> MiB Spch: 467,0 total, 21,2 free, 147,1 used, 310,9 buff/cache
>> MiB Swap: 975,0 total, 952,9 free, 22,1 used. 319,9 avail Spch
>>
>> top - 20:48:16 up 20:01, 1 user, load average: 5,02, 2,72, 2,08
>> Tasks: 104 total, 5 running, 99 sleeping, 0 stopped, 0 zombie
>> %CPU(s): 0,2 us, 46,4 sy, 0,0 ni, 11,9 id, 2,3 wa, 0,0 hi, 39,2 si, 0,0 st
>> MiB Spch: 467,0 total, 16,9 free, 190,8 used, 271,6 buff/cache
>> MiB Swap: 975,0 total, 952,9 free, 22,1 used. 276,2 avail Spch
>
> I don't see anything in your original memory dump that
> might account for this. But I'm at a loss because I'm
> a kernel developer, not a support guy -- I don't have
> any tools or expertise that can troubleshoot a system
> without rebuilding a kernel with instrumentation. My
> first instinct is to tell you to bisect between v6.3
> and v6.4, or at least enable kmemleak, but I'm guessing
> you don't build your own kernels.
>
> My only recourse at this point would be to try to
> reproduce it myself, but unfortunately I've just
> upgraded my whole lab to Fedora 39, and there's a grub
> bug that prevents booting any custom-built kernel
> on my hardware.
>
> So I'm stuck until I can nail that down. Anyone else
> care to help out?

Sure - I can throw a few things at it.

Can we dig into which memory slabs might be growing? Something like:

watch -d "cat /proc/slabinfo | grep nfsd"

.. for a bit might show what is growing.

Then use a systemtap script like the one below to trace the allocations - use:

stap -v --all-modules kmem_alloc.stp <slab_name>

Ben


8<---------------------------- save as kmem_alloc.stp ----------------------------

# This script displays the number of allocations from the given slab and the backtraces leading up to them.

global slab = @1
global stats, stacks

probe kernel.function("kmem_cache_alloc") {
        if (kernel_string($s->name) == slab) {
                stats[execname()] <<< 1
                stacks[execname(),kernel_string($s->name),backtrace()] <<< 1
        }
}

# Exit after 10 seconds
# probe timer.ms(10000) { exit () }

probe end {
        printf("Number of %s slab allocations by process\n", slab)
        foreach ([exec] in stats) {
                printf("%s:\t%d\n",exec,@count(stats[exec]))
        }
        printf("\nBacktrace of processes when allocating\n")
        foreach ([proc,cache,bt] in stacks) {
                printf("Exec: %s Name: %s Count: %d\n",proc,cache,@count(stacks[proc,cache,bt]))
                print_stack(bt)
                printf("\n-------------------------------------------------------\n\n")
        }
}
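
As a usage example, if the slabinfo watch were to show kmalloc-512 climbing steadily (that slab name is only an illustration, not a finding from this thread), the script would be run as:

stap -v --all-modules kmem_alloc.stp kmalloc-512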


2024-03-26 16:50:43

by Jan Schunk

[permalink] [raw]
Subject: Aw: Re: [External] : nfsd: memory leak when client does many file operations

Thanks, I will do some tests with DEBUG_KMEMLEAK enabled and run git bisect now.

> Gesendet: Montag, den 25.03.2024 um 21:36 Uhr
> Von: "Chuck Lever III" <[email protected]>
> An: "Jan Schunk" <[email protected]>
> Cc: "Jeff Layton" <[email protected]>, "Neil Brown" <[email protected]>, "Olga Kornievskaia" <[email protected]>, "Dai Ngo" <[email protected]>, "Tom Talpey" <[email protected]>, "Linux NFS Mailing List" <[email protected]>, "[email protected]" <[email protected]>
> Betreff: Re: [External] : nfsd: memory leak when client does many file operations
>
>
>
> > On Mar 25, 2024, at 4:26 PM, Jan Schunk <[email protected]> wrote:
> >
> > I am building my own kernels, but I never tried kmemleak, is this just a Kconfig option?
>
> Location:
> -> Kernel hacking
> -> Memory Debugging
> (1) -> Kernel memory leak detector (DEBUG_KMEMLEAK [=n])
>
>
> > What do you mean with "bisect between v6.3 and v6.4"?
>
> After you "git clone" the kernel source:
>
> $ git bisect start v6.4 v6.3
>
> Build the kernel and test. If the test fails:
>
> $ cd <your kernel source tree>; git bisect bad
>
> If the test succeeds:
>
> $ cd <your kernel source tree>; git bisect good
>
> Rebuild and try again until it lands on the first broken commit.
>
>
> > Everything including v6.4 is OK, the problem starts at v6.5.
>
> I misremembered. Use "$ git bisect start v6.5 v6.4" then.
>
>
> > I also looked at some code already but there are huge changes to mm that happened in v6.5 and v6.6 so for me it is heavy to compare it with older versions to find one or more commits that may cause the issue.
>
> Bisection is a mechanical test-based process. You don't need
> to look at code until you've reached the first bad commit.
>
> --
> Chuck Lever
>
>

2024-03-28 22:03:51

by Jan Schunk

[permalink] [raw]
Subject: Aw: Re: [External] : nfsd: memory leak when client does many file operations

Inside the VM I was not able to reproduce the issue on v6.5.x, so I am concentrating on v6.6.x.

Current status:

$ git bisect start v6.6 v6.5
Bisecting: 7882 revisions left to test after this (roughly 13 steps)
[a1c19328a160c80251868dbd80066dce23d07995] Merge tag 'soc-arm-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

--
$ git bisect good
Bisecting: 3935 revisions left to test after this (roughly 12 steps)
[e4f1b8202fb59c56a3de7642d50326923670513f] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

--
$ git bisect bad
Bisecting: 2014 revisions left to test after this (roughly 11 steps)
[e0152e7481c6c63764d6ea8ee41af5cf9dfac5e9] Merge tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

--
$ git bisect bad
Bisecting: 975 revisions left to test after this (roughly 10 steps)
[4a3b1007eeb26b2bb7ae4d734cc8577463325165] Merge tag 'pinctrl-v6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

--
$ git bisect good
Bisecting: 476 revisions left to test after this (roughly 9 steps)
[4debf77169ee459c46ec70e13dc503bc25efd7d2] Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

--
$ git bisect good
Bisecting: 237 revisions left to test after this (roughly 8 steps)
[e7e9423db459423d3dcb367217553ad9ededadc9] Merge tag 'v6.6-vfs.super.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
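
Since each of these steps needs a rebuild, a reboot and a long soak test, saving the bisect state between runs seems worthwhile. A small sketch (the file name is just an example):

# Record the decisions made so far
git bisect log > ~/nfsd-bisect.log

# If the tree has to be reset or moved to another machine, the same
# point in the bisection can be restored with:
git bisect replay ~/nfsd-bisect.log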

> Gesendet: Montag, den 25.03.2024 um 21:36 Uhr
> Von: "Chuck Lever III" <[email protected]>
> An: "Jan Schunk" <[email protected]>
> Cc: "Jeff Layton" <[email protected]>, "Neil Brown" <[email protected]>, "Olga Kornievskaia" <[email protected]>, "Dai Ngo" <[email protected]>, "Tom Talpey" <[email protected]>, "Linux NFS Mailing List" <[email protected]>, "[email protected]" <[email protected]>
> Betreff: Re: [External] : nfsd: memory leak when client does many file operations
>
>
>
> > On Mar 25, 2024, at 4:26 PM, Jan Schunk <[email protected]> wrote:
> >
> > I am building my own kernels, but I never tried kmemleak, is this just a Kconfig option?
>
> Location:
> -> Kernel hacking
> -> Memory Debugging
> (1) -> Kernel memory leak detector (DEBUG_KMEMLEAK [=n])
>
>
> > What do you mean with "bisect between v6.3 and v6.4"?
>
> After you "git clone" the kernel source:
>
> $ git bisect start v6.4 v6.3
>
> Build the kernel and test. If the test fails:
>
> $ cd <your kernel source tree>; git bisect bad
>
> If the test succeeds:
>
> $ cd <your kernel source tree>; git bisect good
>
> Rebuild and try again until it lands on the first broken commit.
>
>
> > Everything including v6.4 is OK, the problem starts at v6.5.
>
> I misremembered. Use "$ git bisect start v6.5 v6.4" then.
>
>
> > I also looked at some code already but there are huge changes to mm that happened in v6.5 and v6.6 so for me it is heavy to compare it with older versions to find one or more commits that may cause the issue.
>
> Bisection is a mechanical test-based process. You don't need
> to look at code until you've reached the first bad commit.
>
> --
> Chuck Lever
>
>