2008-08-01 00:40:35

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC] Reinstate NFS exportability for JFFS2.

On Fri, 2008-08-01 at 10:16 +1000, Neil Brown wrote:
> On Thursday July 31, [email protected] wrote:
> > On Fri, 2008-05-02 at 11:38 +1000, Neil Brown wrote:
> > > Why is there a deadlock here?
>
> I was really hoping you would answer this question.

It's because the nfsd readdirplus code recurses into the file system.
>From the file system's ->readdir() function we call back into nfsd's
encode_entry(), which then calls back into the file system's ->lookup()
function so that it can generate a filehandle.

For any file system which has its own internal locking -- which includes
jffs2, btrfs, xfs, jfs, gfs*, ocfs* and probably others -- that
recursive call back into the file system will deadlock.

In the specific case of JFFS2, we need that internal locking because of
lock ordering constraints with the garbage collection -- we need to take
the allocator lock _before_ the per-inode lock, which means we can't use
the generic inode->i_mutex for that purpose. That's documented in
fs/jffs2/README.Locking. I know fewer details about the other affected
file systems.

> I can see the sense in your approach, but it does still seem a bit
> hackish. I would like to understand the details of the problem enough
> to be confident that there is no less-hackish solution.

I was thinking that we could perhaps get away with an extension to
readdir() that passes the filehandle to its filldir callback too. I'm
not sure if that's sufficient for NFS4 though.

Perhaps we could add a ->lookup_fh() method which is guaranteed to be
called _only_ from within ->readdir(), so it can have (or _lack_) the
appropriate locking.

For now though, this approach moves the problem into the nfsd code where
it belongs -- the VFS never calls into the file system recursively like
this, in the general case.

--
dwmw2



2008-08-01 00:53:26

by Chuck Lever III

[permalink] [raw]
Subject: Re: [RFC] Reinstate NFS exportability for JFFS2.

On Thu, Jul 31, 2008 at 8:40 PM, David Woodhouse <[email protected]> wrote:
> On Fri, 2008-08-01 at 10:16 +1000, Neil Brown wrote:
>> On Thursday July 31, [email protected] wrote:
>> > On Fri, 2008-05-02 at 11:38 +1000, Neil Brown wrote:
>> > > Why is there a deadlock here?
>>
>> I was really hoping you would answer this question.
>
> It's because the nfsd readdirplus code recurses into the file system.
> >From the file system's ->readdir() function we call back into nfsd's
> encode_entry(), which then calls back into the file system's ->lookup()
> function so that it can generate a filehandle.
>
> For any file system which has its own internal locking -- which includes
> jffs2, btrfs, xfs, jfs, gfs*, ocfs* and probably others -- that
> recursive call back into the file system will deadlock.
>
> In the specific case of JFFS2, we need that internal locking because of
> lock ordering constraints with the garbage collection -- we need to take
> the allocator lock _before_ the per-inode lock, which means we can't use
> the generic inode->i_mutex for that purpose. That's documented in
> fs/jffs2/README.Locking. I know fewer details about the other affected
> file systems.
>
>> I can see the sense in your approach, but it does still seem a bit
>> hackish. I would like to understand the details of the problem enough
>> to be confident that there is no less-hackish solution.
>
> I was thinking that we could perhaps get away with an extension to
> readdir() that passes the filehandle to its filldir callback too. I'm
> not sure if that's sufficient for NFS4 though.

For now it is sufficient, IMO. NFSv4 doesn't implement a readdirplus
operation, and the performance benefits of NFSv3 readdirplus are
equivocal -- there isn't a strong desire to replicate the complexity
of NFSv3 readdirplus in NFSv4. I'm not even sure you can do it even
with a single compound RPC, so even in the long run NFSv4 may not ever
have the locking issues that NFSv3 does here.

I agree with Neil, though -- as a reviewer, I think the architecture
of your solution is valid, but would need to audit these pretty
closely to get a sense of the individual correctness/appropriateness
of each change.

--
"Alright guard, begin the unnecessarily slow-moving dipping mechanism."
--Dr. Evil