2019-07-09 08:00:08

by Anton Ivanov

Subject: Fwd: NFS Caching broken in 4.19.37


Forwarding to maintainers (apologies, did not cc on first send).

A.

-------- Forwarded Message --------
Subject: NFS Caching broken in 4.19.37
Date: Mon, 8 Jul 2019 19:19:54 +0100
From: Anton Ivanov <[email protected]>
Organization: Cambridge Greys
To: Linux Kernel Mailing List <[email protected]>

Hi list,

NFS caching appears broken in 4.19.37.

The more cores/threads, the easier it is to reproduce. Tested with
identical results on Ryzen 1600 and 1600X.

1. Mount an openwrt build tree over NFS v4
2. Run make -j `cat /proc/cpuinfo | grep vendor | wc -l` ; make clean in
a loop
3. Result after 3-4 iterations:

State on the client

ls -laF
/var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm

total 8
drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../

State as seen on the server (mounted via NFS from localhost):

ls -laF
/var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm
total 12
drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../
-rw-r--r-- 1 anivanov anivanov 32 Jul 8 11:40 ipcbuf.h

Actual state on the filesystem:

ls -laF
/exports/work/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm
total 12
drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../
-rw-r--r-- 1 anivanov anivanov 32 Jul 8 11:40 ipcbuf.h

So the client has quite clearly lost the plot. After telling it to drop
caches, re-reading the directory shows the file present.
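
A minimal sketch of that check, for anyone trying to confirm the same
symptom. The function names are hypothetical, and writing to
/proc/sys/vm/drop_caches is an assumption: the report does not say which
mechanism was used to drop caches. dirs_match compares two paths visible
on the same machine (for example the NFS mount and the exported
directory on the server).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not from the report.

list_names() {
    # One entry name per line, sorted; includes dotfiles, omits . and ..
    ls -A "$1" | LC_ALL=C sort
}

dirs_match() {
    # Succeeds (exit 0) when both directories list the same entry names.
    [ "$(list_names "$1")" = "$(list_names "$2")" ]
}

drop_caches() {
    # Assumed mechanism; needs root. Writing 3 asks the kernel to free
    # the page cache as well as dentries and inodes.
    sync
    echo 3 > /proc/sys/vm/drop_caches
}
```

If dirs_match fails before drop_caches and succeeds afterwards, the
stale view was on the caching side rather than on disk.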

It is possible to reproduce this using a linux kernel tree too; it just
takes many more iterations, at least 10.
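
The build/clean loop from the steps above could be scripted roughly as
follows. This is a sketch under assumptions: repro_loop and its two
arguments are hypothetical, and nproc (coreutils) stands in for counting
vendor lines in /proc/cpuinfo, which gives the same number on the x86
machines mentioned here.

```shell
#!/bin/sh
# Sketch only: repro_loop is a hypothetical helper, not from the report.

repro_loop() {
    tree=$1         # build tree located on the NFS mount
    iterations=$2   # how many build/clean rounds to run
    jobs=$(nproc)   # one make job per online CPU
    i=1
    while [ "$i" -le "$iterations" ]; do
        echo "iteration $i"
        # As in "make -j N ; make clean", clean runs even if the build fails.
        make -C "$tree" -j "$jobs"
        make -C "$tree" clean
        i=$((i + 1))
    done
}
```

Usage would be along the lines of repro_loop /path/to/tree 4, followed
by comparing directory listings on the client and the server.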

Both client and server run 4.19.37 from Debian buster. This is filed as
Debian bug 931500. I originally thought it was autofs related, but
IMHO something is fundamentally broken in NFS caching, resulting in
cache corruption.

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/