2011-05-20 05:59:41

by Erez Zadok

[permalink] [raw]
Subject: Re: [PATCH 5/7] overlay filesystem (inode.c bad error path)

Miklos,

I tried your overlayfs.v9 git repo w/ racer, using two separate ext3 filesystems (one for lowerdir and another for upperdir). I got the WARN_ON in ovl_permission to trigger within about 10 minutes of testing. Looking at the code, I see a problem in returning w/o cleaning up an dput-ing the alias dentry. Simple patch enclosed below.

Cheers,
Erez.


diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 3c15d54..6c70f57 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -80,7 +82,8 @@ int ovl_permission(struct inode *inode, int mask, unsigned int flags)
realinode = ACCESS_ONCE(realdentry->d_inode);
if (!realinode) {
WARN_ON(!(flags & IPERM_FLAG_RCU));
- return -ENOENT;
+ err = -ENOENT;
+ goto out_dput;
}

if (mask & MAY_WRITE) {


2011-05-20 05:55:18

by Erez Zadok

[permalink] [raw]
Subject: Re: [PATCH 5/7] overlay filesystem (inode.c bad error path)

On May 20, 2011, at 1:39 AM, Erez Zadok wrote:

> Miklos,
>
> I tried your overlayfs.v9 git repo w/ racer, using two separate ext3 filesystems (one for lowerdir and another for upperdir). I got the WARN_ON in ovl_permission to trigger within about 10 minutes of testing. Looking at the code, I see a problem in returning w/o cleaning up an dput-ing the alias dentry. Simple patch enclosed below.
>
> Cheers,
> Erez.

Niklos, I forgot to mention that I had to apply a small fix to a VFS bug in fs/namei.c:1362, where a mix of symlinks and renames, using racer, triggers an BUG_ON at the VFS layer (very reproducible). Without this fix, racer oopses in the VFS well before it gets to trigger overlayfs bugs. It's a pity 2.6.39 was released with this very reproducible *VFS* level bug (doesn't anyone run racer+fsx+ltp before releasing a new kernel?)

Anyway, here's the small patch here. Clearly viro/hch need to review this "fix" b/c I'm not sure it's really the right one

Cheers,
Erez.


VFS: move BUG_ON test for symlink nd->depth after current->link_count test

This solves a bug in nested_symlink (which was rewritten from
do_follow_link), and follows the order of depth tests that existed before.
The bug triggers a BUG_ON in fs/namei.c:1346, when running racer with
symlink and rename ops.

Signed-off-by: Erez Zadok <[email protected]>
diff --git a/fs/namei.c b/fs/namei.c
index 017c3fa..7a93387 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1343,12 +1343,12 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
{
int res;

- BUG_ON(nd->depth >= MAX_NESTED_LINKS);
if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
path_put_conditional(path, nd);
path_put(&nd->path);
return -ELOOP;
}
+ BUG_ON(nd->depth >= MAX_NESTED_LINKS);

nd->depth++;
current->link_count++;

2011-05-20 08:54:12

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH 5/7] overlay filesystem (inode.c bad error path)

Erez Zadok <[email protected]> writes:

> I tried your overlayfs.v9 git repo w/ racer, using two separate ext3
> filesystems (one for lowerdir and another for upperdir). I got the
> WARN_ON in ovl_permission to trigger within about 10 minutes of
> testing. Looking at the code, I see a problem in returning w/o
> cleaning up an dput-ing the alias dentry. Simple patch enclosed
> below.

Hmm, thanks. The more interesting question is: why does that WARN_ON
trigger? I'll look into it.

Thanks,
Miklos

2011-05-20 14:17:19

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH 5/7] overlay filesystem (inode.c bad error path)

Miklos Szeredi <[email protected]> writes:

> Erez Zadok <[email protected]> writes:
>
>> I tried your overlayfs.v9 git repo w/ racer, using two separate ext3
>> filesystems (one for lowerdir and another for upperdir). I got the
>> WARN_ON in ovl_permission to trigger within about 10 minutes of
>> testing. Looking at the code, I see a problem in returning w/o
>> cleaning up an dput-ing the alias dentry. Simple patch enclosed
>> below.
>
> Hmm, thanks. The more interesting question is: why does that WARN_ON
> trigger? I'll look into it.

I think I found the cause of all the bug and oopsen you are seeing.

Overlayfs expects upper and lower dentries to be always positive, it
never stores negative dentries there, there's no point, instead it
stores NULL.

There are basically two ways a positive dentry can become negative:

A) dentry becomes unhashed with d_count == 0

B) d_delete with d_count == 1

Case A is not possible in our case, since overlayfs keeps a ref on the
upper/lower dentries for the lifetime of the overlayfs dentry.

Case B is however possible, since no extra ref is taken before calling
vfs_unlink/vfs_rmdir. So it looks like this is being triggered.

This is easy to solve, just grab a ref to upperdentry before
unlink/rmdir. Equivalent is if we grab an extra reference from the
start. The below patch does this.

With the patch I can't trigger the bugs anymore.

Erez, could you please also check if reverting your patches and applying
this one fixes all the bugs?

Thanks,
Miklos



commit 9192816148e2c6b1d610226b1fc1c04c36216370
Author: Miklos Szeredi <[email protected]>
Date: Fri May 20 16:07:34 2011 +0200

ovl: don't allow upperdentry to go negative

Upperdentry can become negative if the file/directory is removed,
since d_delete checks for d_count == 1 (we are the only user of this
dentry) and changes it to negative in that case.

However users of overlayfs expect upper and lower dentries to be
either NULL or positive, and finding a negative dentry will trigger a
WARNING or Oops.

Fix this by keeping double reference on upperdentry which prevents
d_delete() turning the dentry into negative.

Reported-by: Erez Zadok <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index a9a09a6..c9db954 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -27,6 +27,10 @@ struct ovl_fs {
};

struct ovl_entry {
+ /*
+ * Keep "double reference" on upper dentries, so that
+ * d_delete() doesn't think it's OK to reset d_inode to NULL.
+ */
struct dentry *__upperdentry;
struct dentry *lowerdentry;
union {
@@ -152,8 +156,9 @@ void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)

WARN_ON(!mutex_is_locked(&upperdentry->d_parent->d_inode->i_mutex));
WARN_ON(oe->__upperdentry);
+ BUG_ON(!upperdentry->d_inode);
smp_wmb();
- oe->__upperdentry = upperdentry;
+ oe->__upperdentry = dget(upperdentry);
}

void ovl_dentry_version_inc(struct dentry *dentry)
@@ -218,6 +223,7 @@ static void ovl_dentry_release(struct dentry *dentry)

if (oe) {
dput(oe->__upperdentry);
+ dput(oe->__upperdentry);
dput(oe->lowerdentry);
call_rcu(&oe->rcu, ovl_entry_free);
}
@@ -326,7 +332,7 @@ int ovl_do_lookup(struct dentry *dentry)
}

if (upperdentry)
- oe->__upperdentry = upperdentry;
+ oe->__upperdentry = dget(upperdentry);

if (lowerdentry)
oe->lowerdentry = lowerdentry;
@@ -533,7 +539,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
mntput(upperpath.mnt);
mntput(lowerpath.mnt);

- oe->__upperdentry = upperpath.dentry;
+ oe->__upperdentry = dget(upperpath.dentry);
oe->lowerdentry = lowerpath.dentry;

root_dentry->d_fsdata = oe;

2011-05-20 14:25:24

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH 5/7] overlay filesystem (inode.c bad error path)

Erez Zadok <[email protected]> writes:
> Niklos, I forgot to mention that I had to apply a small fix to a VFS
> bug in fs/namei.c:1362, where a mix of symlinks and renames, using
> racer, triggers an BUG_ON at the VFS layer (very reproducible).
> Without this fix, racer oopses in the VFS well before it gets to
> trigger overlayfs bugs. It's a pity 2.6.39 was released with this
> very reproducible *VFS* level bug (doesn't anyone run racer+fsx+ltp
> before releasing a new kernel?)
>
> Anyway, here's the small patch here. Clearly viro/hch need to review
> this "fix" b/c I'm not sure it's really the right one

Patch looks good to me.

Acked-by: Miklos Szeredi <[email protected]>

If you've got a patch fixing a serious bug, you'd better send it to
Linus and Andrew (as well as the VFS maintainers), where it'll get more
prompt evaluation.

You can still do that, also adding "CC: [email protected]" to the SOB
block in the patch so that the fix quickly makes it into the 2.6.39
stable series.

Thanks,
Miklos


>
> VFS: move BUG_ON test for symlink nd->depth after current->link_count test
>
> This solves a bug in nested_symlink (which was rewritten from
> do_follow_link), and follows the order of depth tests that existed before.
> The bug triggers a BUG_ON in fs/namei.c:1346, when running racer with
> symlink and rename ops.
>
> Signed-off-by: Erez Zadok <[email protected]>
> diff --git a/fs/namei.c b/fs/namei.c
> index 017c3fa..7a93387 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1343,12 +1343,12 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
> {
> int res;
>
> - BUG_ON(nd->depth >= MAX_NESTED_LINKS);
> if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
> path_put_conditional(path, nd);
> path_put(&nd->path);
> return -ELOOP;
> }
> + BUG_ON(nd->depth >= MAX_NESTED_LINKS);
>
> nd->depth++;
> current->link_count++;