2008-08-01 02:14:25

by NeilBrown

[permalink] [raw]
Subject: Re: [RFC] Reinstate NFS exportability for JFFS2.

On Friday August 1, [email protected] wrote:
> On Fri, 2008-08-01 at 10:16 +1000, Neil Brown wrote:
> > On Thursday July 31, [email protected] wrote:
> > > On Fri, 2008-05-02 at 11:38 +1000, Neil Brown wrote:
> > > > Why is there a deadlock here?
> >
> > I was really hoping you would answer this question.
>
> It's because the nfsd readdirplus code recurses into the file system.
> >From the file system's ->readdir() function we call back into nfsd's
> encode_entry(), which then calls back into the file system's ->lookup()
> function so that it can generate a filehandle.
>
> For any file system which has its own internal locking -- which includes
> jffs2, btrfs, xfs, jfs, gfs*, ocfs* and probably others -- that
> recursive call back into the file system will deadlock.
>
> In the specific case of JFFS2, we need that internal locking because of
> lock ordering constraints with the garbage collection -- we need to take
> the allocator lock _before_ the per-inode lock, which means we can't use
> the generic inode->i_mutex for that purpose. That's documented in
> fs/jffs2/README.Locking. I know fewer details about the other affected
> file systems.

It sounds to me like the core problem here is that the locking regime
imposed by the VFS simply isn't suitable for JFFS2 .. and a bunch of
other filesystems.

This seems to suggest that the VFS should be changed.
Superficially, it seems that changing the locking rules so that the
callee takes i_mutex rather than the caller taking it would help and,
in the case of JFFS2, make f->sem redundant. Does that match your
understanding?

That is obviously a big change and one that should not be made
lightly, but if it was to benefit a number of the newer filesystems, then
it would seem like the appropriate way to go.

Clearly we need a short term solution too as we don't want to wait for
VFS locking rules to be renegotiated. The idea of a "lock is owned by
me" check is appealing to me as it is a small, localised change that
would easily be removed if/when the locking we "fixed" properly.

In the JFFS2 case I imagine this taking the following form:

- new field in jffs2_inode_info "struct task_struct *sem_owner",
initialised to NULL
- in jffs2_readdir after locking ->sem, set ->sem_owner to current.
- before unlocking ->sem, set ->sem_owner to NULL
- in jffs2_lookup, check if "dir_f->sem_owner == current"
If it does, set a flag reminding us not to drop the lock,
else mutex_lock(&dir_f->sem);

This should fix the problem with very little cost either in
performance or elegance. And if/when the locking rules were changed,
the accompanying code review would immediately notice this and remove
it.

NeilBrown