2013-04-25 15:32:00

by Joakim Tjernlund

Subject: Re: NFS loop on 3.4.39

Joakim Tjernlund/Transmode wrote on 2013/04/24 15:16:26:
>
> "Myklebust, Trond" <[email protected]> wrote on 2013/04/23 16:18:07:
> >
> > On Tue, 2013-04-23 at 16:14 +0200, Joakim Tjernlund wrote:
> > > "Myklebust, Trond" <[email protected]> wrote on 2013/04/23
> > > 15:52:06:
> > > >
> > > > On Tue, 2013-04-23 at 15:38 +0200, Joakim Tjernlund wrote:
> > > > > So, it happened again. Just when hitting search on
> > > > > bugs.gentoo.org in firefox 17.0.3
> > > > >
> > > > > This time I got a NFS loop with NFS4ERR_BAD_STATEID looping over
> > > > > and over again and FF was hung. Not posting the logs as it does
> > > > > not appear to do any good. Nothing in dmesg either.
> > > > >
> > > > > Noticed this patch on the NFS list:
> > > > > http://marc.info/?l=linux-nfs&m=136643651710066&w=2
> > > > > I wonder if that could be a potential cure and if so, could it
> > > > > be backported to 3.4?
> > > >
> > > > It is in the testing branch on
> > > >
> > > > http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=summary
> > > >
> > > > if you want to try it out. I'm not planning on backporting
> > > > anything that hasn't been labelled with a Cc: stable in that
> > > > branch.
> > >
> > > Well, we won't use tip of linus tree in production so there is
> > > little point to use your testing branch. However it looks like a
> > > trivial backport so I can test it on my client easily.
> >
> > The point of testing would not be to discover if you can use Linus'
> > tree in production, but rather to see if the problem is already fixed
> > upstream. If it is, we can bisect to figure out which patch is the
> > fix.
> >
> > > Even the NFS server if required, is the above referenced patch for
> > > NFS client/server or both? Any chance this is the culprit?
> >
> > That's a client patch.

> Tried 3.4.41+above nfs patch and also 3.8.8, they both have the
> NFS loop problem.
>
> Now I am at your
> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=summary,
> testing branch
> With any luck the error will show soon.
>
> Question though the loop I see, could it be a NFS server bug ?
> If so it does matter what I do on my client I guess.

Ran the testing branch of
http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=summary
for a day without problems.

Then I went back to 3.4.41 +
http://marc.info/?l=linux-nfs&m=136643651710066&w=2 +
http://marc.info/?l=linux-nfs&m=136674349127504&w=2
this morning and have been using it all day without problems. It is a
good start but not conclusive yet.

Is http://marc.info/?l=linux-nfs&m=136674349127504&w=2 supposed to
fix my type of problem?

Jocke





2013-04-25 15:59:04

by Myklebust, Trond

Subject: Re: NFS loop on 3.4.39

On Thu, 2013-04-25 at 17:31 +0200, Joakim Tjernlund wrote:
> Joakim Tjernlund/Transmode wrote on 2013/04/24 15:16:26:
> >
> > "Myklebust, Trond" <[email protected]> wrote on 2013/04/23 16:18:07:
> > >
> > > On Tue, 2013-04-23 at 16:14 +0200, Joakim Tjernlund wrote:
> > > > "Myklebust, Trond" <[email protected]> wrote on 2013/04/23
> > > > 15:52:06:
> > > > >
> > > > > On Tue, 2013-04-23 at 15:38 +0200, Joakim Tjernlund wrote:
> > > > > > So, it happened again. Just when hitting search on
> > > > > > bugs.gentoo.org in firefox 17.0.3
> > > > > >
> > > > > > This time I got a NFS loop with NFS4ERR_BAD_STATEID looping over
> > > > > > and over again and FF was hung. Not posting the logs as it does
> > > > > > not appear to do any good. Nothing in dmesg either.
> > > > > >
> > > > > > Noticed this patch on the NFS list:
> > > > > > http://marc.info/?l=linux-nfs&m=136643651710066&w=2
> > > > > > I wonder if that could be a potential cure and if so, could it
> > > > > > be backported to 3.4?
> > > > >
> > > > > It is in the testing branch on
> > > > >
> > > > > http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=summary
> > > > >
> > > > > if you want to try it out. I'm not planning on backporting
> > > > > anything that hasn't been labelled with a Cc: stable in that
> > > > > branch.
> > > >
> > > > Well, we won't use tip of linus tree in production so there is
> > > > little point to use your testing branch. However it looks like a
> > > > trivial backport so I can test it on my client easily.
> > >
> > > The point of testing would not be to discover if you can use Linus'
> > > tree in production, but rather to see if the problem is already fixed
> > > upstream. If it is, we can bisect to figure out which patch is the
> > > fix.
> > >
> > > > Even the NFS server if required, is the above referenced patch for
> > > > NFS client/server or both? Any chance this is the culprit?
> > >
> > > That's a client patch.
>
> > Tried 3.4.41+above nfs patch and also 3.8.8, they both have the
> > NFS loop problem.
> >
> > Now I am at your
> > http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=summary,
> > testing branch
> > With any luck the error will show soon.
> >
> > Question though the loop I see, could it be a NFS server bug ?
> > If so it does matter what I do on my client I guess.
>
> Ran the testing branch of
> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=summary
> for a day without problems.
>
> Then I went back to 3.4.41 +
> http://marc.info/?l=linux-nfs&m=136643651710066&w=2 +
> http://marc.info/?l=linux-nfs&m=136674349127504&w=2
> this morning and have been using it all day without problems. It is a
> good start but not conclusive yet.
>
> Is http://marc.info/?l=linux-nfs&m=136674349127504&w=2 supposed to
> fix my type of problem?

No. That's a follow up patch to commit
92b40e93849e29f9ca661de6442bb66282738bf7 (NFSv4: Use the open stateid if
the delegation has the wrong mode).


--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2013-04-25 16:12:33

by Joakim Tjernlund

Subject: Re: NFS loop on 3.4.39

"Myklebust, Trond" <[email protected]> wrote on 2013/04/25
17:59:01:
>
> On Thu, 2013-04-25 at 17:31 +0200, Joakim Tjernlund wrote:
> > Joakim Tjernlund/Transmode wrote on 2013/04/24 15:16:26:
> > >
> > > "Myklebust, Trond" <[email protected]> wrote on 2013/04/23 16:18:07:
> > > >
> > > > On Tue, 2013-04-23 at 16:14 +0200, Joakim Tjernlund wrote:
> > > > > "Myklebust, Trond" <[email protected]> wrote on 2013/04/23
> > > > > 15:52:06:
> > > > > >
> > > > > > On Tue, 2013-04-23 at 15:38 +0200, Joakim Tjernlund wrote:
> > > > > > > So, it happened again. Just when hitting search on
> > > > > > > bugs.gentoo.org in firefox 17.0.3
> > > > > > >
> > > > > > > This time I got a NFS loop with NFS4ERR_BAD_STATEID looping
> > > > > > > over and over again and FF was hung. Not posting the logs as
> > > > > > > it does not appear to do any good. Nothing in dmesg either.
> > > > > > >
> > > > > > > Noticed this patch on the NFS list:
> > > > > > > http://marc.info/?l=linux-nfs&m=136643651710066&w=2
> > > > > > > I wonder if that could be a potential cure and if so, could
> > > > > > > it be backported to 3.4?
> > > > > >
> > > > > > It is in the testing branch on
> > > > > >
> > > > > > http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=summary
> > > > > >
> > > > > > if you want to try it out. I'm not planning on backporting
> > > > > > anything that hasn't been labelled with a Cc: stable in that
> > > > > > branch.
> > > > >
> > > > > Well, we won't use tip of linus tree in production so there is
> > > > > little point to use your testing branch. However it looks like a
> > > > > trivial backport so I can test it on my client easily.
> > > >
> > > > The point of testing would not be to discover if you can use
> > > > Linus' tree in production, but rather to see if the problem is
> > > > already fixed upstream. If it is, we can bisect to figure out
> > > > which patch is the fix.
> > > >
> > > > > Even the NFS server if required, is the above referenced patch
> > > > > for NFS client/server or both? Any chance this is the culprit?
> > > >
> > > > That's a client patch.
> >
> > > Tried 3.4.41+above nfs patch and also 3.8.8, they both have the
> > > NFS loop problem.
> > >
> > > Now I am at your
> > > http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=summary,
> > > testing branch
> > > With any luck the error will show soon.
> > >
> > > Question though the loop I see, could it be a NFS server bug ?
> > > If so it does matter what I do on my client I guess.
> >
> > Ran the testing branch of
> > http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=summary
> > for a day without problems.
> >
> > Then I went back to 3.4.41 +
> > http://marc.info/?l=linux-nfs&m=136643651710066&w=2 +
> > http://marc.info/?l=linux-nfs&m=136674349127504&w=2
> > this morning and have been using it all day without problems. It is a
> > good start but not conclusive yet.
> >
> > Is http://marc.info/?l=linux-nfs&m=136674349127504&w=2 supposed to
> > fix my type of problem?
>
> No. That's a follow up patch to commit
> 92b40e93849e29f9ca661de6442bb66282738bf7 (NFSv4: Use the open stateid if
> the delegation has the wrong mode).

Hmm, that commit is the first one I listed,
http://marc.info/?l=linux-nfs&m=136643651710066&w=2
and I know that using only that one does NOT fix the problem. I was
hoping that both of them together could be the answer?

Jocke