2003-01-16 10:51:26

by Oleg Drokin

Subject: [2.4] VFS locking problem during concurrent link/unlink

Hello!

Debugging a reiserfs problem that can be demonstrated with a script created by
Zygo Blaxell, I started to wonder whether the problem shown is really a reiserfs
fault and not a VFS one.
Though Zygo claims the script only produces problems on reiserfs, I am now trying
it myself on ext2 (which will take some time).

Debugging shows that reiserfs_link is sometimes called for inodes whose
i_nlink is zero (and whose data has already been deleted).
So my current guess at what's going on is this:

process 1                            process 2
sys_unlink("a/b")                    sys_link("a/b", "c/d")
down(i_sem of "a")                   down(i_sem of "c")

lock_kernel()
reiserfs_unlink("a/b")
  decreases i_nlink of "a/b" to
  zero and removes the name "b"
  from "a"
unlock_kernel()
d_delete("a/b")
(*)
                                     lock_kernel()
                                     reiserfs_link()
                                       at this point we do the usual
                                       stuff, but the inode's i_nlink
                                       is zero and the file data is
                                       already removed, which indicates
                                       reiserfs_delete_inode() was
                                       already called at (*)
                                     unlock_kernel()

So my question is: "Is it really OK that sys_link/vfs_link does not
take the semaphore on the parent directory of the original path?", or
should we actually put a workaround in the reiserfs code to avoid such
a situation?

Bye,
Oleg
PS: Here's the script:
#!/bin/bash

# Create an empty filesystem:

mkreiserfs -f -f /dev/hdb2
mount /dev/hdb2 /data1 -t reiserfs

cd /data1

# Script used to control the load average. Note that as written the loops
# below will keep spawning new processes, so we need some way to throttle
# them. Change the '-lt 10' to another number to change the number
# of processes.

cat <<'LC' > loadcheck && chmod 755 loadcheck
#!/bin/sh
read av1 av5 av15 rest < /proc/loadavg
echo -n "Load Average: $av1 ... "
av1=${av1%.*}
if [ $av1 -lt 10 ]; then
    echo OK
    exit 0
else
    echo "Whoa, Nellie!"
    exit 1
fi
LC
# Create directories used by test
mkdir foo bar
mkdir foo/etc foo/usr foo/var

# Start up some rsyncs. I use /etc, /usr, and /var because there's a
# good mixture of files with some hardlinks between them, and on a normal
# Linux system some of them change from time to time.

while sleep 1m; do
    ./loadcheck || continue
    for x in usr etc var; do
        rsync -avxHS --delete /$x/. foo/$x/. &
    done
done &

# Start up some cp -al's and rm -rf's. Note there are two concurrent
# sets of 'cp's and two concurrent sets of 'rm's, and each of those
# has different instances of 'cp' and 'rm' running at different times.

for x in 1 2; do
    while sleep 1m; do
        ./loadcheck || continue
        cp -al foo bar/`date +%s` &
    done &
    while sleep 1m; do
        ./loadcheck || continue
        for x in bar/*; do
            rm -rf $x
            sleep 1m
        done &
    done &
done &




2003-01-16 15:31:30

by Chris Mason

Subject: Re: [2.4] VFS locking problem during concurrent link/unlink

On Thu, 2003-01-16 at 06:00, Oleg Drokin wrote:
> Hello!
>
> Debugging a reiserfs problem that can be demonstrated with a script created by
> Zygo Blaxell, I started to wonder whether the problem shown is really a reiserfs
> fault and not a VFS one.
> Though Zygo claims the script only produces problems on reiserfs, I am now trying
> it myself on ext2 (which will take some time).
>
> Debugging shows that reiserfs_link is sometimes called for inodes whose
> i_nlink is zero (and whose data has already been deleted).
> So my current guess at what's going on is this:
>

No, this is a reiserfs bug, since we schedule after doing the link checks
in reiserfs_link and reiserfs_unlink. I sent a patch to reiserfs-dev a
while ago; I'll pull it out of the SuSE kernel and rediff against
2.4.20.

-chris


2003-01-16 15:35:00

by Oleg Drokin

Subject: Re: [2.4] VFS locking problem during concurrent link/unlink

Hello!

On Thu, Jan 16, 2003 at 10:39:41AM -0500, Chris Mason wrote:
> > Debugging a reiserfs problem that can be demonstrated with a script created by
> > Zygo Blaxell, I started to wonder whether the problem shown is really a reiserfs
> > fault and not a VFS one.
> > Though Zygo claims the script only produces problems on reiserfs, I am now trying
> > it myself on ext2 (which will take some time).
> >
> > Debugging shows that reiserfs_link is sometimes called for inodes whose
> > i_nlink is zero (and whose data has already been deleted).
> > So my current guess at what's going on is this:
> No, this is a reiserfs bug, since we schedule after doing the link checks
> in reiserfs_link and reiserfs_unlink. I sent a patch to reiserfs-dev a
> while ago; I'll pull it out of the SuSE kernel and rediff against
> 2.4.20.

Yes, we do.
But on the other hand, I've put a check at the beginning of reiserfs_link,
and I am still seeing these links on inodes with i_nlink == 0.

Bye,
Oleg

2003-01-16 15:53:49

by Chris Mason

Subject: Re: [2.4] VFS locking problem during concurrent link/unlink

On Thu, 2003-01-16 at 10:43, Oleg Drokin wrote:
> Hello!
>
> On Thu, Jan 16, 2003 at 10:39:41AM -0500, Chris Mason wrote:
> > > Debugging a reiserfs problem that can be demonstrated with a script created by
> > > Zygo Blaxell, I started to wonder whether the problem shown is really a reiserfs
> > > fault and not a VFS one.
> > > Though Zygo claims the script only produces problems on reiserfs, I am now trying
> > > it myself on ext2 (which will take some time).
> > >
> > > Debugging shows that reiserfs_link is sometimes called for inodes whose
> > > i_nlink is zero (and whose data has already been deleted).
> > > So my current guess at what's going on is this:
> > No, this is a reiserfs bug, since we schedule after doing the link checks
> > in reiserfs_link and reiserfs_unlink. I sent a patch to reiserfs-dev a
> > while ago; I'll pull it out of the SuSE kernel and rediff against
> > 2.4.20.
>
> Yes, we do.
> But on the other hand, I've put a check at the beginning of reiserfs_link,
> and I am still seeing these links on inodes with i_nlink == 0.

That's because we don't inc the link count in reiserfs_link before we
schedule. The bug works a little like this:

link count at 1
reiserfs_link: make a new directory entry for the link, schedule()
reiserfs_unlink: dec link count to zero, remove the file's stat data
reiserfs_link: inc link count, return thinking the stat data is still there

All of which leads to expanding chaos, as we process this link pointing
to nowhere but still have a valid in-RAM inode pointing to it.

-chris


2003-01-16 15:57:42

by Nikita Danilov

Subject: Re: [2.4] VFS locking problem during concurrent link/unlink

Chris Mason writes:
> On Thu, 2003-01-16 at 10:43, Oleg Drokin wrote:

[...]

>
> link count at 1
> reiserfs_link: make new directory entry for link, schedule()
> reiserfs_unlink: dec link count to zero, remove file stat data
> reiserfs_link: inc link count, return thinking the stat data is still
> there

What protects ext2 from doing the same on SMP?

>
> All of which leads to expanding chaos as we process this link pointing
> to nowhere but still have a valid in ram inode pointing to it.
>
> -chris

Nikita.


2003-01-16 15:58:33

by Oleg Drokin

Subject: Re: [2.4] VFS locking problem during concurrent link/unlink

Hello!

On Thu, Jan 16, 2003 at 11:02:08AM -0500, Chris Mason wrote:
> > Yes, we do.
> > But on the other hand, I've put a check at the beginning of reiserfs_link,
> > and I am still seeing these links on inodes with i_nlink == 0.
> That's because we don't inc the link count in reiserfs_link before we
> schedule. The bug works a little like this:
> link count at 1
> reiserfs_link: make a new directory entry for the link, schedule()
> reiserfs_unlink: dec link count to zero, remove the file's stat data
> reiserfs_link: inc link count, return thinking the stat data is still there

That was my original assumption yesterday,
but the debug prints I put in place did not confirm it.
Also, if we have a dentry pinned in memory (by sys_link), the inode should not
be deleted, so the stat data should stay in place, as Nikita argues.

Bye,
Oleg

2003-01-16 16:14:23

by Chris Mason

Subject: Re: [2.4] VFS locking problem during concurrent link/unlink

On Thu, 2003-01-16 at 11:06, Nikita Danilov wrote:
> Chris Mason writes:
> > On Thu, 2003-01-16 at 10:43, Oleg Drokin wrote:
>
> [...]
>
> >
> > link count at 1
> > reiserfs_link: make a new directory entry for the link, schedule()
> > reiserfs_unlink: dec link count to zero, remove the file's stat data
> > reiserfs_link: inc link count, return thinking the stat data is still there
>
> What protects ext2 from doing the same on SMP?
>

They inc the link count in ext2_link before scheduling. The patch below
is what I had in mind, but is untested.

-chris

--- 1.24/fs/reiserfs/namei.c	Fri Aug  9 11:22:33 2002
+++ edited/fs/reiserfs/namei.c	Thu Jan 16 11:20:11 2003
@@ -748,6 +748,7 @@
     int windex ;
     struct reiserfs_transaction_handle th ;
     int jbegin_count;
+    unsigned long savelink;

     inode = dentry->d_inode;

@@ -783,11 +784,20 @@
         inode->i_nlink = 1;
     }

+    inode->i_nlink--;
+
+    /*
+     * we schedule before doing the add_save_link call, save the link
+     * count so we don't race
+     */
+    savelink = inode->i_nlink;
+
+
     retval = reiserfs_cut_from_item (&th, &path, &(de.de_entry_key), dir, NULL, 0);
-    if (retval < 0)
+    if (retval < 0) {
+        inode->i_nlink++;
         goto end_unlink;
-
-    inode->i_nlink--;
+    }
     inode->i_ctime = CURRENT_TIME;
     reiserfs_update_sd (&th, inode);

@@ -796,7 +806,7 @@
     dir->i_ctime = dir->i_mtime = CURRENT_TIME;
     reiserfs_update_sd (&th, dir);

-    if (!inode->i_nlink)
+    if (!savelink)
         /* prevent file from getting lost */
         add_save_link (&th, inode, 0/* not truncate */);

@@ -900,6 +910,12 @@
         //FIXME: sd_nlink is 32 bit for new files
         return -EMLINK;
     }
+    if (inode->i_nlink == 0) {
+        return -ENOENT;
+    }
+
+    /* inc before scheduling so reiserfs_unlink knows we are here */
+    inode->i_nlink++;

     journal_begin(&th, dir->i_sb, jbegin_count) ;
     windex = push_journal_writer("reiserfs_link") ;
@@ -912,12 +928,12 @@
     reiserfs_update_inode_transaction(dir) ;

     if (retval) {
+        inode->i_nlink--;
         pop_journal_writer(windex) ;
         journal_end(&th, dir->i_sb, jbegin_count) ;
         return retval;
     }

-    inode->i_nlink++;
     ctime = CURRENT_TIME;
     inode->i_ctime = ctime;
     reiserfs_update_sd (&th, inode);
@@ -992,6 +1008,7 @@
     int jbegin_count ;
     umode_t old_inode_mode;
     time_t ctime;
+    unsigned long savelink = 1;


     /* two balancings: old name removal, new name insertion or "save" link,
@@ -1161,6 +1178,7 @@
     } else {
         new_dentry_inode->i_nlink--;
     }
+    savelink = new_dentry_inode->i_nlink;
     ctime = CURRENT_TIME;
     new_dentry_inode->i_ctime = ctime;
 }
@@ -1196,7 +1214,7 @@
     reiserfs_update_sd (&th, new_dir);

     if (new_dentry_inode) {
-        if (new_dentry_inode->i_nlink == 0)
+        if (savelink == 0)
             add_save_link (&th, new_dentry_inode, 0/* not truncate */);
         reiserfs_update_sd (&th, new_dentry_inode);
     }



2003-01-16 16:54:49

by Oleg Drokin

Subject: Re: [2.4] VFS locking problem during concurrent link/unlink

Hello!

On Thu, Jan 16, 2003 at 11:22:32AM -0500, Chris Mason wrote:

> They inc the link count in ext2_link before scheduling. The patch below
> is what I had in mind, but is untested.

It does not change anything.

Bye,
Oleg

2003-01-16 17:11:57

by Oleg Drokin

Subject: Re: [2.4] VFS locking problem during concurrent link/unlink

Hello!

On Thu, Jan 16, 2003 at 08:03:41PM +0300, Oleg Drokin wrote:
> > They inc the link count in ext2_link before scheduling. The patch below
> > is what I had in mind, but is untested.
> It does not change anything.

Err, I mean I can still reproduce the problem with this patch.

Bye,
Oleg