2008-09-23 17:04:03

by J. Bruce Fields

Subject: Re: NFS regression? Odd delays and lockups accessing an NFS export.

On Tue, Sep 23, 2008 at 12:33:09PM +0100, Ian Campbell wrote:
> On Tue, 2008-09-23 at 08:59 +0100, Ian Campbell wrote:
> > I've found that the problem was backported into the stable stream since
> > I cannot reproduce the issue with 2.6.26 but I can with 2.6.26.5. This
> > is quite useful since there are only 3 relevant looking changesets in
> > that range. I will bisect between these before confirming the culprit on
> > mainline.

Could you double-check that this is reproducible with this commit
applied, and not reproducible when it's not?

I suppose it's not impossible that this could be triggering the problem
in some very roundabout way, but it seems a bit out of left field--so I
wonder whether one of the bisection points could have gotten marked good
when it should have been bad, or vice-versa.

> It reports:
>
> daedfbe2a67628a40076a6c75fb945c60f608a2e is first bad commit
> commit daedfbe2a67628a40076a6c75fb945c60f608a2e
> Author: Trond Myklebust <[email protected]>
> Date: Wed Jun 11 17:39:04 2008 -0400
>
> NFS: Ensure we zap only the access and acl caches when setting new acls
>
> commit f41f741838480aeaa3a189cff6e210503cf9c42d upstream
>
> ...and ensure that we obey the NFS_INO_INVALID_ACL flag when retrieving the
> acls.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
> I'm just about to build f41f741838480aeaa3a189cff6e210503cf9c42d and the
> one before and try those.
>
> I'm not using ACLs as far as I am aware.

I think commands like "ls" try to get posix acls these days, so it's
possible that the nfs3_proc_getacl code at least might be getting
called. Why that would matter I can't see.
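
To illustrate the kind of ACL lookup meant here, a minimal sketch, assuming
a Linux client where POSIX access ACLs are exposed through the
"system.posix_acl_access" extended attribute. The helper name
has_posix_acl is hypothetical, and this is not the code "ls" itself uses;
it only shows how an unprivileged tool can end up asking the filesystem
for ACLs even when nobody has set any.

```python
import errno
import os

# Extended attribute that holds a file's POSIX access ACL on Linux.
ACL_XATTR = "system.posix_acl_access"


def has_posix_acl(path):
    """Return True if `path` carries a POSIX access ACL (Linux only).

    Hypothetical helper for illustration; not taken from any real tool.
    """
    try:
        os.getxattr(path, ACL_XATTR)
        return True
    except OSError as e:
        # ENODATA: no ACL is set on this file.
        # ENOTSUP: the filesystem does not support ACL xattrs at all.
        if e.errno in (errno.ENODATA, errno.ENOTSUP):
            return False
        raise
```

Calling has_posix_acl() on a file under the NFS mount issues the same sort
of query, so the ACL path may be exercised without any ACLs configured.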

--b.


2008-09-26 15:37:24

by Ian Campbell

Subject: Re: NFS regression? Odd delays and lockups accessing an NFS export.

On Tue, 2008-09-23 at 13:03 -0400, J. Bruce Fields wrote:
> On Tue, Sep 23, 2008 at 12:33:09PM +0100, Ian Campbell wrote:
> > On Tue, 2008-09-23 at 08:59 +0100, Ian Campbell wrote:
> > > I've found that the problem was backported into the stable stream since
> > > I cannot reproduce the issue with 2.6.26 but I can with 2.6.26.5. This
> > > is quite useful since there are only 3 relevant looking changesets in
> > > that range. I will bisect between these before confirming the culprit on
> > > mainline.
>
> Could you double-check that this is reproducible with this commit
> applied, and not reproducible when it's not?

I've reproduced with exactly commit
f41f741838480aeaa3a189cff6e210503cf9c42d on trunk and am now running
2e96d2867245668dbdb973729288cf69b9fafa66 which is the changeset
immediately before.

> I suppose it's not impossible that this could be triggering the problem
> in some very roundabout way, but it seems a bit out of left field--so I
> wonder whether one of the bisection points could have gotten marked good
> when it should have been bad, or vice-versa.

It's possible; the good case is naturally quite hard to establish with
100% certainty. I declared v2.6.26 OK after an uptime of 4 days and 19
hours, compared with failure normally within 1-2 days. It's possible I
was premature in doing so. I'll run 2e96d2867 for at least a full week
before reporting back.

Ian.
--
Ian Campbell

Unix is mature OS, windows is still in diapers and they smell badly.
-- Rafael Skodlar

