2024-05-30 20:09:22

by Yifei Liu

Subject: Inquiry about unmounting issues in Linux kernel NFS

Hi,

I am Yifei Liu, a student at Stony Brook University working on
Metis, a project that uses model checking to test file systems. Part
of our testing involves mounting the file system, performing an
operation, validating the system state, and then unmounting it. I've
noticed inconsistent behavior with NFSv3 and NFSv4 on Linux kernels
v5.15 and v6.3.0 under Ubuntu 22.04: sometimes unmounting the
underlying server file system after unexporting it takes 25-35
seconds because umount keeps failing with EBUSY ("target is busy"),
while other times it succeeds almost instantly. This happens with
any of ext4, Btrfs, or XFS as the underlying file system for NFS.

Here is the process I followed:

1. Mount the underlying server file system:
   1a) Run mkfs.ext4 on a 256 KiB brd ramdisk or a regular disk partition
   1b) Mount ext4 on /mnt/server

2. Export the file system:
   exportfs -o rw,sync,no_root_squash localhost:/mnt/server

3. Mount NFS on the client:
   mount -t nfs -o rw,nolock,vers=4,proto=tcp localhost:/mnt/server /mnt/local

4. Unmount NFS on the client:
   umount /mnt/local

5. Unexport the server file system:
   exportfs -u localhost:/mnt/server

6. Unmount the underlying file system:
   umount /mnt/server
   (This sometimes succeeds instantly and is sometimes delayed by up
   to 30 seconds.)
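
The delay in step 6 can be measured by retrying umount until it
succeeds and printing the elapsed time. The following is only a
minimal sketch: retry_timed is a hypothetical helper name, and the
interval and timeout in the usage comment are assumed values, not
measurements.

```shell
#!/bin/bash
# Hypothetical helper: retry a command until it succeeds and report the
# elapsed wall-clock time in whole seconds (bash's $SECONDS counter).
# Usage: retry_timed <interval_seconds> <timeout_seconds> <command> [args...]
retry_timed() {
    local interval=$1 timeout=$2
    shift 2
    local start=$SECONDS
    until "$@"; do
        # Give up once the timeout has elapsed
        if (( SECONDS - start >= timeout )); then
            echo "gave up after $((SECONDS - start)) seconds" >&2
            return 1
        fi
        sleep "$interval"
    done
    echo "succeeded after $((SECONDS - start)) seconds"
}

# Example for step 6 (needs root; values are assumptions):
#   retry_timed 1 120 umount /mnt/server
```

Retrying every second gives finer resolution than a 5-second retry
interval, and the timeout keeps the loop from spinning forever if
the unmount never succeeds.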

I have also embedded a shell script below that replicates this process
in a loop 10 times. You are likely to encounter delayed unmounts most
of the time. Is this behavior expected, or are there some other steps
I am missing?

#!/bin/bash

# This script reproduces the busy-unmount issue with NFSv4 and NFSv3
# when the NFS server and client are on the same machine

# Pre-defined variables. Note that the server and local mount points
# will be recreated.
# Make sure the ramdisk device and directories are not used by other
# processes or file systems.
SERVER_MNT_DIR="/mnt/server"
LOCAL_MNT_DIR="/mnt/local"
EXT4_RAMDISK="/dev/ram0"
SLEEP_SECONDS=5

# Set up device and mount points
setup() {
    # If the brd ramdisk kernel module is already loaded, unload it first
    if lsmod | grep -q "^brd"; then
        echo "brd module is loaded. Unloading it now."
        if rmmod brd; then
            echo "Successfully removed brd module."
        else
            echo "Failed to remove brd module."
            exit 1
        fi
    fi

    # Load brd module with 256 KiB ramdisk size (rd_size is in KiB)
    if modprobe brd rd_size=256; then
        echo "Successfully loaded brd module."
    else
        echo "Failed to load brd module."
        exit 1
    fi

    # If the server mount point is already mounted, unmount it
    if test -n "$(mount | grep "$SERVER_MNT_DIR")"; then
        umount "$SERVER_MNT_DIR" || exit $?
    fi

    # If the local mount point is already mounted, unmount it
    if test -n "$(mount | grep "$LOCAL_MNT_DIR")"; then
        umount "$LOCAL_MNT_DIR" || exit $?
    fi

    # Remove the server mount point if it exists; it is recreated below
    if test -d "$SERVER_MNT_DIR"; then
        rm -rf "$SERVER_MNT_DIR"
    fi

    # Remove the local mount point if it exists; it is recreated below
    if test -d "$LOCAL_MNT_DIR"; then
        rm -rf "$LOCAL_MNT_DIR"
    fi

    # Create mount points and set permissions.
    # Exit with an explicit status: after echo, $? would be echo's status (0).
    mkdir -p "$SERVER_MNT_DIR" || { echo "Failed to create directory $SERVER_MNT_DIR"; exit 1; }
    mkdir -p "$LOCAL_MNT_DIR" || { echo "Failed to create directory $LOCAL_MNT_DIR"; exit 1; }
    chmod 755 "$SERVER_MNT_DIR" || { echo "Failed to set permissions for $SERVER_MNT_DIR"; exit 1; }
    chmod 755 "$LOCAL_MNT_DIR" || { echo "Failed to set permissions for $LOCAL_MNT_DIR"; exit 1; }

    # Check if NFS kernel server is running, start it if not
    if systemctl is-active --quiet nfs-kernel-server; then
        echo "NFS kernel server is already running."
    else
        echo "Starting NFS kernel server..."
        systemctl start nfs-kernel-server
        # Sleep for a while to make sure the NFS server is started
        sleep 20
        echo "NFS kernel server started."
    fi

    # Create ext4 file system on ramdisk for the NFS server export path
    # (MKFS_FLAGS is intentionally unquoted so it splits into separate options)
    MKFS_FLAGS="-F -v -E lazy_itable_init=0,lazy_journal_init=0"
    mkfs.ext4 ${MKFS_FLAGS} "$EXT4_RAMDISK" || exit $?
}

# Run the setup function
setup

# Loop 10 times to reproduce the unmount EBUSY issue
loop_max=10

for ((i = 1; i <= loop_max; i++)); do
    echo " ---------- Loop ID: $i ---------- "

    mount -t ext4 "$EXT4_RAMDISK" "$SERVER_MNT_DIR" || exit $?

    exportfs -o rw,sync,no_root_squash "localhost:$SERVER_MNT_DIR" || exit $?
    # Mount with NFSv4 (change vers=4 to vers=3 to test NFSv3)
    mount -t nfs -o rw,nolock,vers=4,proto=tcp \
        "localhost:$SERVER_MNT_DIR" "$LOCAL_MNT_DIR" || exit $?

    date > "$LOCAL_MNT_DIR/date.txt" || exit $?

    umount "$LOCAL_MNT_DIR" || exit $?
    exportfs -u "localhost:$SERVER_MNT_DIR" || exit $?

    # Try to unmount; if it fails (EBUSY), sleep for $SLEEP_SECONDS and retry
    total_sleep=0
    while true; do
        # A "target is busy" error is expected here
        if umount "$SERVER_MNT_DIR"; then
            echo "Unmount succeeded with $total_sleep seconds of sleep."
            break
        else
            echo "Unmount failed, sleeping for $SLEEP_SECONDS seconds..."
            total_sleep=$((total_sleep + SLEEP_SECONDS))
            sleep "$SLEEP_SECONDS"
        fi
    done

done
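
One caveat about the setup checks above: piping mount into grep is a
substring match, so a path such as /mnt/server would also match
/mnt/server2. An exact-match check could look like the sketch below.
is_mounted is a hypothetical helper name that is not part of the
script, and the sketch assumes paths without spaces or trailing
slashes.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the given directory appears exactly
# as a mount point (second field) in /proc/self/mounts.
# Assumption: the path contains no spaces or backslashes (the kernel
# escapes spaces as \040 in this file) and has no trailing slash.
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' /proc/self/mounts
}

# Example:
#   if is_mounted /mnt/server; then umount /mnt/server || exit $?; fi
```

The mountpoint -q command from util-linux performs a similar check
where it is available.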

Thank you for your time and help,

Yifei Liu
File Systems and Storage Lab (FSL)
Department of Computer Science
Stony Brook University