2008-01-21 03:27:23

by Erez Zadok

Subject: [NFS] nfs2/3 ESTALE bug on mount point (v2.6.24-rc8)

Since around 2.6.24-rc5 I've had an occasional problem: I get an ESTALE
error on the mount point after setting up a localhost-exported mount point
and trying to mkdir something there (this is part of my setup scripts prior
to running unionfs regression tests).

The problem doesn't exist in 2.6.23 or earlier stable kernels. It doesn't
appear in nfs4 either, only nfs2 and nfs3.
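To compare the two affected protocol versions with an otherwise identical setup, the NFS mount step can be parameterized by version. This is only a sketch: nfs_mount_cmd is a hypothetical helper (not part of the script below) that prints the command instead of running it, and it assumes the same /n/export and /n/lower paths and the nfsvers= mount option that the script uses. NFSv4 is left out because its export setup may differ.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the NFS mount command for one
# protocol version and one branch index. Paths mirror the script below.
nfs_mount_cmd() {
    vers=$1   # NFS protocol version: 2 or 3
    c=$2      # branch index, as in /n/export/b$c
    echo "mount -t nfs -o nfsvers=$vers localhost:/n/export/b$c /n/lower/b$c"
}

# Show the command for both affected versions on branch 0.
for v in 2 3; do
    nfs_mount_cmd "$v" 0
done
```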

The problem is seen intermittently, and is probably some form of a race. I
was finally able to narrow it down a bit. I was able to write a shell
script that for me reproduces the problem within a few minutes (I tried it
on v2.6.24-rc8-74-ga7da60f and several different machine configurations).

I've included the shell script below. Hopefully you can use it to track the
problem down. The mkdir command in the middle of the script is the one
that'll eventually cause an ESTALE error and cause the script to abort; you
can run "df" afterward to see the stale mount points.
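The "df" check after an abort can be wrapped in a small helper. A minimal sketch, assuming the error text contains the word "stale" (the same assumption the script's own grep makes); has_stale is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the text on stdin mentions a stale handle.
# Matching is case-insensitive, like the grep used in the script.
has_stale() {
    grep -qi 'stale'
}

# df reports errors on stderr, so fold stderr into the pipe before checking.
if df 2>&1 | has_stale; then
    echo "stale mount point(s) found"
fi
```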

Note: anecdotally, the one factor that seems to make the bug appear sooner
is increasing the total number of mounts that the script below creates
($MAX in the script).
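To try different mount counts without editing the script each time, the two tunables could take their values from the environment. This is a sketch of one way to do it, not how the script is written: the defaults are the script's hard-coded values, and it assumes enough /dev/loopN devices exist for the chosen MAX.

```shell
#!/bin/sh
# Assumption: MAX and COUNT may be overridden from the environment;
# if unset, fall back to the values hard-coded in the script.
MAX=${MAX:-6}
COUNT=${COUNT:-1000}
echo "using MAX=$MAX COUNT=$COUNT"
```

With these two lines in place of the fixed assignments, a run with more mounts would look like "MAX=7 ./script.sh", where script.sh stands for whatever name the script is saved under.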

Hope this helps.

Thanks,
Erez.


#!/bin/bash
# script to tickle a "stale filehandle" mount-point bug in nfs2/3
# Erez Zadok.

# mount flags
FLAGS=no_root_squash,rw,async
# highest nfs mount index (mounts are b0..b$MAX, each using a loop device)
MAX=6
# total no. of times to try test
COUNT=1000

# echo a command, run it, and abort the whole script if it fails
function runcmd
{
    echo "CMD: $*"
    "$@"
    ret=$?
    test $ret -ne 0 && exit $ret
}

function doit
{
    for c in `seq 0 $MAX`; do
        runcmd dd if=/dev/zero of=/tmp/fs.$$.$c bs=1024k count=1 seek=100
        runcmd losetup /dev/loop$c /tmp/fs.$$.$c
        runcmd mkfs -t ext2 -q /dev/loop$c
        runcmd mkdir -p /n/export/b$c
        runcmd mount -t ext2 /dev/loop$c /n/export/b$c
        runcmd exportfs -o $FLAGS localhost:/n/export/b$c
        runcmd mkdir -p /n/lower/b$c
        runcmd mount -t nfs -o nfsvers=3 localhost:/n/export/b$c /n/lower/b$c
    done

    # this mkdir command will eventually cause an ESTALE error on the mnt pt
    for c in `seq 0 $MAX`; do
        runcmd mkdir -p /n/lower/b$c/dir
    done

    # check if "df" prints "stale file handle"
    for i in `seq 1 10` ; do
        sleep 0.1
        echo -n "."
        if test -n "`df 2>&1 | grep -i stale`" ; then
            df
            exit 123
        fi
    done
    echo

    for c in `seq 0 $MAX`; do
        runcmd umount /n/lower/b$c
        runcmd exportfs -u localhost:/n/export/b$c
        runcmd umount /n/export/b$c
        runcmd losetup -d /dev/loop$c
        runcmd rm -f /tmp/fs.$$.$c
    done
}

count=$COUNT
while test $count -gt 0 ; do
    echo "------------------------------------------------------------------"
    echo "COUNT $count"
    doit
    let count=count-1
done
##############################################################################
