2008-08-01 23:11:58

by NeilBrown

[permalink] [raw]
Subject: Re: [RFC] Reinstate NFS exportability for JFFS2.

On Friday August 1, [email protected] wrote:
> On Fri, Aug 01, 2008 at 12:14:17PM +1000, Neil Brown wrote:
> > It sounds to me like the core problem here is that the locking regime
> > imposed by the VFS simply isn't suitable for JFFS2 .. and a bunch of
> > other filesystems.
> >
> > This seems to suggest that the VFS should be changed.
> > Superficially, it seems that changing the locking rules so that the
> > callee takes i_mutex rather than the caller taking it would help and,
> > in the case of JFFS2, make f->sem redundant. Does that match your
> > understanding?
> Huh? How could that possibly help? You are not changing the sequence
> of operations on locks, you only slightly modify the timing; how could
> that possibly eliminate a deadlock? Moreover, how the _hell_ could
> making exclusion area smaller make f->sem redundant?

By moving the locking of i_mutex inside the filesystem methods, I
reasoned that the filesystem would have more control and be able to do
things in the order that it wanted.

I gathered from the JFFS locking document - admitted a very brief
reading - that it the VFS took i_mutex at awkward times for JFFS2,
and the for this reason, JFFS2 introduced f->sem, described as:

"This is the JFFS2-internal equivalent of the inode mutex i->i_sem."

If i_mutex became internal, maybe JFFS2 wouldn't need an internal

But I did miss an important point.
The reason that the current code works with many filesystems is that
lookup_one_len *doesn't* take i_mutex before calling ->lookup. It
assumes that it is already held. So pushing the locking down into
->lookup would actually break other filesystems without fixing this
one. My Bad.

So it seems like recursion into the filesystem is a Bad Thing and we
need a different approach.
I really don't like NFSD having to cache all the names that it gets
from readdir and then hand them back to lookup one at a time, though
it might end up being the answer..

What about this (I know you're going to hate it)??

A new open flags: O_READDIRPLUS.
If ->readdir finds that it is set on filp->f_flags, then it does a
lookup on each name it finds and makes sure they are all in the
Then when NFSD calls lookup_one_len inside the filldir callback, it
will find the answer in the cache and not need to recurse into the

We would need some way to be sure that the new dentry didn't get
flushed from the dcache before NFSD called lookup_one_len. I guess
if ->readdir held a reference on it while filldir was called that
would be an adequate guarantee.

This would then make O_READDIRPLUS available to userspace too.
"ls -l" could use it and thus give the filesystem an opportunity to
optimise things.
->readdir could:
load a block of the directory
schedule reads for all the inodes in parallel
attach them to the dcache in series, using whatever locking
was needed and no more.

I'm actually starting to like the idea. Which means it is probably