2007-09-25 23:15:19

by Miklos Szeredi

[permalink] [raw]
Subject: missing mnt_drop_write() on open error

I get this at umount, if there was a failed open():

WARNING: at fs/namespace.c:586 __mntput()

I think the problem is that may_open() calls mnt_want_write(), but if
open doesn't succeed, mnt_drop_write() will not be called.

Miklos


2007-09-25 23:24:55

by Dave Hansen

[permalink] [raw]
Subject: Re: missing mnt_drop_write() on open error

On Wed, 2007-09-26 at 01:14 +0200, Miklos Szeredi wrote:
> I get this at umount, if there was a failed open():
>
> WARNING: at fs/namespace.c:586 __mntput()
>
> I think the problem is that may_open() calls mnt_want_write(), but if
> open doesn't succeed, mnt_drop_write() will not be called.

I see what you're talking about, and it does look like a bug to me. I'm
working on establishing where the 'struct file' gets created in relation
to all of this, and where best we should back out the write count.

I should have a patch in a few.

-- Dave

2007-09-26 01:21:34

by Dave Hansen

[permalink] [raw]
Subject: Re: missing mnt_drop_write() on open error

On Wed, 2007-09-26 at 01:14 +0200, Miklos Szeredi wrote:
> I get this at umount, if there was a failed open():
>
> WARNING: at fs/namespace.c:586 __mntput()
>
> I think the problem is that may_open() calls mnt_want_write(), but if
> open doesn't succeed, mnt_drop_write() will not be called.

Does this help? I'm also thinking that we should change the open_namei*
functions to simply return 'struct file *'. Those are the only users
other than NFS, and forcing the return of a file like that will force
users to do the fput() on it if they don't want it any more. We'd just
need to make sure no new may_open() users pop up. Any thoughts on that?

--

The r/o bind mount patches have an error in them:
they unconditionally take a mnt_want_write() on
all may_open() requests for writeable files. They
mistakenly do not mnt_drop_write() if may_open()
encounters an error.

---

lxc-dave/fs/namei.c | 39 ++++++++++++++++++++++++++++++---------
1 file changed, 30 insertions(+), 9 deletions(-)

diff -puN fs/namei.c~drop-write-if-may_open-fails fs/namei.c
--- lxc/fs/namei.c~drop-write-if-may_open-fails 2007-09-25 18:16:59.000000000 -0700
+++ lxc-dave/fs/namei.c 2007-09-25 18:16:59.000000000 -0700
@@ -1594,11 +1594,23 @@ int vfs_create(struct inode *dir, struct
return error;
}

+/*
+ * If you call this with FMODE_WRITE set in flag,
+ * and get a successful return, you will have
+ * acquired a write on nd->mnt.
+ *
+ * Basically, if you use this function, you either
+ * need to fix up that write count manually, or just
+ * return a 'struct file' from your function. When
+ * you __fput() that 'struct file' it will get fixed
+ * up automatically.
+ */
int may_open(struct nameidata *nd, int acc_mode, int flag)
{
struct dentry *dentry = nd->dentry;
struct inode *inode = dentry->d_inode;
int error;
+ int took_mnt_write = 0;

if (!inode)
return -ENOENT;
@@ -1624,42 +1636,48 @@ int may_open(struct nameidata *nd, int a
} else if (flag & FMODE_WRITE) {
/*
* effectively: !special_file()
- * balanced by __fput()
+ * Balanced by __fput() and
+ * released on error at the end
+ * of this function.
*/
error = mnt_want_write(nd->mnt);
if (error)
return error;
+ took_mnt_write = 1;
}

error = vfs_permission(nd, acc_mode);
if (error)
- return error;
+ goto out_err;
/*
* An append-only file must be opened in append mode for writing.
*/
if (IS_APPEND(inode)) {
+ error = -EPERM;
if ((flag & FMODE_WRITE) && !(flag & O_APPEND))
- return -EPERM;
+ goto out_err;
+ error = -EPERM;
if (flag & O_TRUNC)
- return -EPERM;
+ goto out_err;
}

/* O_NOATIME can only be set by the owner or superuser */
+ error = -EPERM;
if (flag & O_NOATIME)
if (!is_owner_or_cap(inode))
- return -EPERM;
+ goto out_err;

/*
* Ensure there are no outstanding leases on the file.
*/
error = break_lease(inode, flag);
if (error)
- return error;
+ goto out_err;

if (flag & O_TRUNC) {
error = get_write_access(inode);
if (error)
- return error;
+ goto out_err;

/*
* Refuse to truncate files with mandatory locks held on them.
@@ -1672,12 +1690,15 @@ int may_open(struct nameidata *nd, int a
}
put_write_access(inode);
if (error)
- return error;
+ goto out_err;
} else
if (flag & FMODE_WRITE)
DQUOT_INIT(inode);

- return 0;
+out_err:
+ if (error && took_mnt_write)
+ mnt_drop_write(nd->mnt);
+ return error;
}

static int open_namei_create(struct nameidata *nd, struct path *path,

-- Dave

2007-09-26 08:41:33

by Miklos Szeredi

[permalink] [raw]
Subject: Re: missing mnt_drop_write() on open error

> On Wed, 2007-09-26 at 01:14 +0200, Miklos Szeredi wrote:
> > I get this at umount, if there was a failed open():
> >
> > WARNING: at fs/namespace.c:586 __mntput()
> >
> > I think the problem is that may_open() calls mnt_want_write(), but if
> > open doesn't succeed, mnt_drop_write() will not be called.
>
> Does this help?

It didn't fix it for me, but the patch looks OK.

In __dentry_open() there's still a few places where fput() won't be
called, notably when ->open fails, which is what I'm triggering I
think.

Also even more horrible things can happen because of the
nd->intent.open.file thing. For example if the lookup routine calls
lookup_instantiate_filp(), and after this, but before may_open() some
error happens, then release_open_intent() will call fput() on the
file, which will cause mnt_drop_write() to be called, even though a
matching mnt_want_write() hasn't yet been called. Ugly, eh?

> I'm also thinking that we should change the open_namei*
> functions to simply return 'struct file *'. Those are the only users
> other than NFS, and forcing the return of a file like that will force
> users to do the fput() on it if they don't want it any more. We'd just
> need to make sure no new may_open() users pop up. Any thoughts on that?

Yeah, something needs to be done with open, because currently it's way
too convoluted.

Miklos

2007-09-26 15:07:56

by Christoph Hellwig

[permalink] [raw]
Subject: Re: missing mnt_drop_write() on open error

On Wed, Sep 26, 2007 at 10:38:22AM +0200, Miklos Szeredi wrote:
> Also even more horrible things can happen because of the
> nd->intent.open.file thing. For example if the lookup routine calls
> lookup_instantiate_filp(), and after this, but before may_open() some
> error happens, then release_open_intent() will call fput() on the
> file, which will cause mnt_drop_write() to be called, even though a
> matching mnt_want_write() hasn't yet been called. Ugly, eh?

It's more than horrible. I wouldn't mind using the word utter crap
for it. It's defintively on my todo list to get rid of this junk
again. It managed to get in without proper review unfortunately.

>
> > I'm also thinking that we should change the open_namei*
> > functions to simply return 'struct file *'. Those are the only users
> > other than NFS, and forcing the return of a file like that will force
> > users to do the fput() on it if they don't want it any more. We'd just
> > need to make sure no new may_open() users pop up. Any thoughts on that?
>
> Yeah, something needs to be done with open, because currently it's way
> too convoluted.

I have some changes for open_namei pending and I'll see if incorporating
this makes sense.

2007-09-26 16:14:46

by Dave Hansen

[permalink] [raw]
Subject: Re: missing mnt_drop_write() on open error

On Wed, 2007-09-26 at 10:38 +0200, Miklos Szeredi wrote:
> > On Wed, 2007-09-26 at 01:14 +0200, Miklos Szeredi wrote:
> > > I get this at umount, if there was a failed open():
> > >
> > > WARNING: at fs/namespace.c:586 __mntput()
> > >
> > > I think the problem is that may_open() calls mnt_want_write(), but if
> > > open doesn't succeed, mnt_drop_write() will not be called.
> >
> > Does this help?
>
> It didn't fix it for me, but the patch looks OK.
>
> In __dentry_open() there's still a few places where fput() won't be
> called, notably when ->open fails, which is what I'm triggering I
> think.
>
> Also even more horrible things can happen because of the
> nd->intent.open.file thing. For example if the lookup routine calls
> lookup_instantiate_filp(), and after this, but before may_open() some
> error happens, then release_open_intent() will call fput() on the
> file, which will cause mnt_drop_write() to be called, even though a
> matching mnt_want_write() hasn't yet been called. Ugly, eh?

I used to have a patch that didn't completely trust that all files with
FMODE_WRITE set to have taken a write on the mnt. I think I used a flag
to indicate whether or not a particular file had a mnt_want_write() done
on its behalf. It somewhat artificially keeps the mnt write count
balanced, but I think it will let us detect when things like this go on.

> > I'm also thinking that we should change the open_namei*
> > functions to simply return 'struct file *'. Those are the only users
> > other than NFS, and forcing the return of a file like that will force
> > users to do the fput() on it if they don't want it any more. We'd just
> > need to make sure no new may_open() users pop up. Any thoughts on that?
>
> Yeah, something needs to be done with open, because currently it's way
> too convoluted.

Sounds like Christoph has some ideas...

-- Dave

2007-09-26 16:19:19

by Dave Hansen

[permalink] [raw]
Subject: Re: missing mnt_drop_write() on open error

On Wed, 2007-09-26 at 10:38 +0200, Miklos Szeredi wrote:
> In __dentry_open() there's still a few places where fput() won't be
> called, notably when ->open fails, which is what I'm triggering I
> think.
>
> Also even more horrible things can happen because of the
> nd->intent.open.file thing. For example if the lookup routine calls
> lookup_instantiate_filp(), and after this, but before may_open() some
> error happens, then release_open_intent() will call fput() on the
> file, which will cause mnt_drop_write() to be called, even though a
> matching mnt_want_write() hasn't yet been called. Ugly, eh?

I'm not sure it is _that_ horrible. ;)

Do you see any reason we can't just shadow the
get/put_write_access(inode) calls with mnt_want/drop_write() calls? I
think they're always matched.

-- Dave

2007-09-26 17:53:17

by Miklos Szeredi

[permalink] [raw]
Subject: Re: missing mnt_drop_write() on open error

> On Wed, 2007-09-26 at 10:38 +0200, Miklos Szeredi wrote:
> > In __dentry_open() there's still a few places where fput() won't be
> > called, notably when ->open fails, which is what I'm triggering I
> > think.
> >
> > Also even more horrible things can happen because of the
> > nd->intent.open.file thing. For example if the lookup routine calls
> > lookup_instantiate_filp(), and after this, but before may_open() some
> > error happens, then release_open_intent() will call fput() on the
> > file, which will cause mnt_drop_write() to be called, even though a
> > matching mnt_want_write() hasn't yet been called. Ugly, eh?
>
> I'm not sure it is _that_ horrible. ;)
>
> Do you see any reason we can't just shadow the
> get/put_write_access(inode) calls with mnt_want/drop_write() calls? I
> think they're always matched.

Maybe. Can we do the mnt_want_write() from __dentry_open(), instead
of may_open()? That would be a lot cleaner.

Btw, may_open() doesn't do mnt_want_write() around the truncation if
file is opened with O_TRUNC | O_RDONLY.

Miklos

2007-09-26 18:09:18

by Dave Hansen

[permalink] [raw]
Subject: Re: missing mnt_drop_write() on open error

On Wed, 2007-09-26 at 19:50 +0200, Miklos Szeredi wrote:
> Maybe. Can we do the mnt_want_write() from __dentry_open(), instead
> of may_open()? That would be a lot cleaner.

I'll explore that. It may make very good sense.

> Btw, may_open() doesn't do mnt_want_write() around the truncation if
> file is opened with O_TRUNC | O_RDONLY.

What's the path to may_open() in that case? open_namei() should wrap
all callers other than nfs, and it does:

/* O_TRUNC implies we need access checks for write permissions */
if (flag & O_TRUNC)
acc_mode |= MAY_WRITE;

Which should trigger the may_open() code.

later in open_namei():
...
ok:
error = may_open(nd, acc_mode, flag);
if (error)
goto exit;
return 0;


-- Dave

2007-09-26 18:59:28

by Miklos Szeredi

[permalink] [raw]
Subject: Re: missing mnt_drop_write() on open error

> > Btw, may_open() doesn't do mnt_want_write() around the truncation if
> > file is opened with O_TRUNC | O_RDONLY.
>
> What's the path to may_open() in that case? open_namei() should wrap
> all callers other than nfs, and it does:
>
> /* O_TRUNC implies we need access checks for write permissions */
> if (flag & O_TRUNC)
> acc_mode |= MAY_WRITE;
>
> Which should trigger the may_open() code.

Ah, I missed that.

Thanks,
Miklos