2024-03-29 00:30:47

by Chuck Lever

Subject: Re: [External] : nfsd: memory leak when client does many file operations



> On Mar 28, 2024, at 6:03 PM, Jan Schunk <[email protected]> wrote:
>
> Inside the VM I was not able to reproduce the issue on v6.5.x so I keep concentrating on v6.6.x.
>
> Current status:
>
> $ git bisect start v6.6 v6.5
> Bisecting: 7882 revisions left to test after this (roughly 13 steps)
> [a1c19328a160c80251868dbd80066dce23d07995] Merge tag 'soc-arm-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
>
> --
> $ git bisect good
> Bisecting: 3935 revisions left to test after this (roughly 12 steps)
> [e4f1b8202fb59c56a3de7642d50326923670513f] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
>
> --
> $ git bisect bad
> Bisecting: 2014 revisions left to test after this (roughly 11 steps)
> [e0152e7481c6c63764d6ea8ee41af5cf9dfac5e9] Merge tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
>
> --
> $ git bisect bad
> Bisecting: 975 revisions left to test after this (roughly 10 steps)
> [4a3b1007eeb26b2bb7ae4d734cc8577463325165] Merge tag 'pinctrl-v6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
>
> --
> $ git bisect good
> Bisecting: 476 revisions left to test after this (roughly 9 steps)
> [4debf77169ee459c46ec70e13dc503bc25efd7d2] Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
>
> --
> $ git bisect good
> Bisecting: 237 revisions left to test after this (roughly 8 steps)
> [e7e9423db459423d3dcb367217553ad9ededadc9] Merge tag 'v6.6-vfs.super.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Good, keep going.

I've tried replicating the free memory loss here, using the
git regression suite on my nfsd-fixes branch. Taking a
meminfo sample between each of four test runs, the only
clear downward trend I see is:

free:3019839 < start
free:2858438 < after first run
free:2836058 < after second run
free:2822077 < after third run
free:2797143 < after fourth run

All other metrics seem to vary arbitrarily.
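
For anyone who wants to compare their own samples, the per-run
drop in the figures above can be worked out with a short script.
This is only a sketch: the names parse_samples and deltas are
made up for illustration, the input is just the five lines quoted
above, and since the unit of the "free" counter is not stated
here, it reports raw differences only.

```python
# Samples copied verbatim from the message above. The unit is not stated
# there, so this script reports raw differences and converts nothing.
SAMPLES = """\
free:3019839 < start
free:2858438 < after first run
free:2836058 < after second run
free:2822077 < after third run
free:2797143 < after fourth run
"""


def parse_samples(text):
    """Return (label, value) pairs from lines shaped like 'free:N < label'."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        counter, _, label = line.partition("<")
        _, _, value = counter.partition(":")
        pairs.append((label.strip(), int(value)))
    return pairs


def deltas(pairs):
    """Return (label, drop) for each sample relative to the one before it."""
    return [
        (label, prev - cur)
        for (_, prev), (label, cur) in zip(pairs, pairs[1:])
    ]


if __name__ == "__main__":
    pairs = parse_samples(SAMPLES)
    for label, drop in deltas(pairs):
        print(f"{label}: -{drop}")
    print(f"total: -{pairs[0][1] - pairs[-1][1]}")
```

On these samples the first run accounts for 161401 of the total
222696 drop, and each later run loses between 13981 and 24934.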

The only slightly suspicious slab I see is buffer_head.
/sys/kernel/debug/kmemleak has a single entry in it, not
related to NFSD.

At this point I suspect the issue is not related to NFSD,
SUNRPC, or any particular slab cache, but instead involves
orphaned whole pages. Your bisect still seems like the
best shot at localizing the misbehavior.


--
Chuck Lever