2013-07-25 13:52:09

by Larry Keegan

Subject: nfs client: Now you see it, now you don't

Dear Chaps,

I am experiencing some inexplicable NFS behaviour which I would like to
run past you.

I have a linux NFS server running kernel 3.10.2 and some clients
running the same. The server is actually a pair of identical
machines serving up a small number of ext4 filesystems atop drbd. They
don't do much apart from serve home directories and deliver mail
into them. These have worked just fine for aeons.

The problem I am seeing is that for the past month or so, on and off,
one NFS client starts reporting stale NFS file handles on some part of
the directory tree exported by the NFS server. During the outage the
other parts of the same export remain unaffected. Then, some ten
minutes to an hour later they're back to normal. Access to the affected
sub-directories remains possible from the server (both directly and via
nfs) and from other clients. There do not appear to be any errors on
the underlying ext4 filesystems.

Each NFS client seems to get the heebie-jeebies over some directory or
other pretty much independently. The problem affects all of the
filesystems exported by the NFS server, but clearly I notice it first
in home directories, and in particular in my dot subdirectories for
things like my mail client and browser. I'd say something's up the
spout about 20% of the time.

The server and clients are using nfs4, although for a while I tried
nfs3 without any appreciable difference. I do not have CONFIG_FSCACHE
set.

I wonder if anyone could tell me if they have ever come across this
before, or what debugging settings might help me diagnose the problem?
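
The only debugging knobs I have turned up so far are the rpcdebug
flags. I assume the incantation would be something along these lines
(the choice of modules and flags is just my guess, so do correct me):

# rpcdebug -m nfs -s all     # NFS client debugging, on the client
# rpcdebug -m rpc -s all     # sunrpc debugging, on either end
# rpcdebug -m nfsd -s all    # nfsd debugging, on the server
# rpcdebug -m nfs -c all     # and switch it all off again afterwards

but perhaps there is something more targeted than drinking from that
particular firehose.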

Yours,

Larry


2013-07-26 13:11:53

by Jeff Layton

Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Fri, 26 Jul 2013 12:41:01 +0000
Larry Keegan <[email protected]> wrote:

> On Thu, 25 Jul 2013 14:18:28 -0400
> Jeff Layton <[email protected]> wrote:
> > On Thu, 25 Jul 2013 17:05:26 +0000
> > Larry Keegan <[email protected]> wrote:
> >
> > > On Thu, 25 Jul 2013 10:11:43 -0400
> > > Jeff Layton <[email protected]> wrote:
> > > > On Thu, 25 Jul 2013 13:45:15 +0000
> > > > Larry Keegan <[email protected]> wrote:
> > > >
> > > > > Dear Chaps,
> > > > >
> > > > > I am experiencing some inexplicable NFS behaviour which I would
> > > > > like to run past you.
> > > >
> > > > Were these machines running older kernels before this started
> > > > happening? What kernel did you upgrade from if so?
> > > >
> > > [snip out my long rambling reply]
> > > > What might be helpful is to do some network captures when the
> > > > problem occurs. What we want to know is whether the ESTALE errors
> > > > are coming from the server, or if the client is generating them.
> > > > That'll narrow down where we need to look for problems.
> > >
> > > As it was giving me gyp during typing I tried to capture some NFS
> > > traffic. Unfortunately claws-mail started a mail box check in the
> > > middle of this and the problem disappeared! Normally it's claws
> > > which starts this. It'll come along again soon enough and I'll send
> > > a trace.
> > >
> > Ok, we had a number of changes to how ESTALE errors are handled over
> > the last few releases. When you mentioned 3.10, I had assumed that you
> > might be hitting a regression in one of those, but those went in well
> > after the 3.4 series.
> >
> > Captures are probably your best bet. My suspicion is that the server
> > is returning these ESTALE errors occasionally, but it would be best to
> > have you confirm that. They may also help make sense of why it's
> > occurring...
> >
>
> Dear Jeff,
>
> I now have a good and a bad packet capture. I can run them through
> tshark -V but if I do this, they're really long, so I'm wondering how
> best to post them. I've posted the summaries below.
>
> The set-up is as follows: I'm running a few xterms on my desktop (the
> affected client) as well as claws-mail using the mailmbox plugin.
> Claws keeps a cache of the mailbox in .clawsmail/tagsdb/<foldername>.
> From time to time I blast a load of mail into these mail boxes using
> procmail. This seems to demonstrate the problem most of the time. After
> a few minutes everything gets back to normal.
>
> The actual mail is being delivered on my file server pair directly
> into /home/larry/Mail/<foldername>. Both file servers use automount to
> mount the same filesystem and attempt to deliver mail into the boxes
> simultaneously. Clearly the .lock files stop them stomping on each
> other. This works well.
>
> When it's in the mood to work, the test session on my desktop looks
> like this:
>
> # ls .claws-mail/tagsdb
> #mailmbox #mh
> # _
>
> When it doesn't it looks like this:
>
> # ls .claws-mail/tagsdb
> ls: cannot open directory .claws-mail/tagsdb: Stale NFS file handle
> # _
>
> I captured the packets on the network desktop. All else was quiet on
> the network, at least as far as TCP traffic was concerned. Here are the
> summaries:
>
> # tshark -r good tcp
> 10 1.304139000 10.1.1.139 -> 10.1.1.173 NFS 238 V4 Call ACCESS FH:0x4e5465ab, [Check: RD LU MD XT DL]
> 11 1.304653000 10.1.1.173 -> 10.1.1.139 NFS 194 V4 Reply (Call In 10) ACCESS, [Allowed: RD LU MD XT DL]
> 12 1.304694000 10.1.1.139 -> 10.1.1.173 TCP 66 gdoi > nfs [ACK] Seq=173 Ack=129 Win=3507 Len=0 TSval=119293240 TSecr=440910222
> 13 1.304740000 10.1.1.139 -> 10.1.1.173 NFS 250 V4 Call LOOKUP DH:0x4e5465ab/tagsdb
> 14 1.305225000 10.1.1.173 -> 10.1.1.139 NFS 310 V4 Reply (Call In 13) LOOKUP
> 15 1.305283000 10.1.1.139 -> 10.1.1.173 NFS 238 V4 Call ACCESS FH:0x4e5465ab, [Check: RD LU MD XT DL]
> 16 1.305798000 10.1.1.173 -> 10.1.1.139 NFS 194 V4 Reply (Call In 15) ACCESS, [Allowed: RD LU MD XT DL]
> 17 1.305835000 10.1.1.139 -> 10.1.1.173 NFS 250 V4 Call LOOKUP DH:0x4e5465ab/tagsdb
> 18 1.306330000 10.1.1.173 -> 10.1.1.139 NFS 310 V4 Reply (Call In 17) LOOKUP
> 19 1.306373000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0x445c531a
> 20 1.306864000 10.1.1.173 -> 10.1.1.139 NFS 262 V4 Reply (Call In 19) GETATTR
> 21 1.346003000 10.1.1.139 -> 10.1.1.173 TCP 66 gdoi > nfs [ACK] Seq=877 Ack=941 Win=3507 Len=0 TSval=119293282 TSecr=440910225
> # tshark -r bad tcp
> 14 2.078769000 10.1.1.139 -> 10.1.1.173 NFS 238 V4 Call ACCESS FH:0x76aee435, [Check: RD LU MD XT DL]
> 15 2.079266000 10.1.1.173 -> 10.1.1.139 NFS 194 V4 Reply (Call In 14) ACCESS, [Allowed: RD LU MD XT DL]
> 16 2.079296000 10.1.1.139 -> 10.1.1.173 TCP 66 gdoi > nfs [ACK] Seq=173 Ack=129 Win=3507 Len=0 TSval=180576023 TSecr=502193004
> 17 2.079338000 10.1.1.139 -> 10.1.1.173 NFS 238 V4 Call ACCESS FH:0x4e5465ab, [Check: RD LU MD XT DL]
> 18 2.079797000 10.1.1.173 -> 10.1.1.139 NFS 194 V4 Reply (Call In 17) ACCESS, [Allowed: RD LU MD XT DL]
> 19 2.079834000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0xb12cdc45
> 20 2.080331000 10.1.1.173 -> 10.1.1.139 NFS 262 V4 Reply (Call In 19) GETATTR
> 21 2.080410000 10.1.1.139 -> 10.1.1.173 NFS 250 V4 Call LOOKUP DH:0x4e5465ab/tagsdb
> 22 2.080903000 10.1.1.173 -> 10.1.1.139 NFS 310 V4 Reply (Call In 21) LOOKUP
> 23 2.080982000 10.1.1.139 -> 10.1.1.173 NFS 226 V4 Call GETATTR FH:0xb12cdc45
> 24 2.081477000 10.1.1.173 -> 10.1.1.139 NFS 162 V4 Reply (Call In 23) GETATTR
> 25 2.081509000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0xb12cdc45
> 26 2.082010000 10.1.1.173 -> 10.1.1.139 NFS 178 V4 Reply (Call In 25) GETATTR
> 27 2.082040000 10.1.1.139 -> 10.1.1.173 NFS 226 V4 Call GETATTR FH:0xb12cdc45
> 28 2.082542000 10.1.1.173 -> 10.1.1.139 NFS 142 V4 Reply (Call In 27) GETATTR
> 29 2.089525000 10.1.1.139 -> 10.1.1.173 NFS 226 V4 Call GETATTR FH:0xb12cdc45
> 30 2.089996000 10.1.1.173 -> 10.1.1.139 NFS 162 V4 Reply (Call In 29) GETATTR
> 31 2.090028000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0xb12cdc45
> 32 2.090529000 10.1.1.173 -> 10.1.1.139 NFS 262 V4 Reply (Call In 31) GETATTR
> 33 2.090577000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0x4e5465ab
> 34 2.091061000 10.1.1.173 -> 10.1.1.139 NFS 262 V4 Reply (Call In 33) GETATTR
> 35 2.091110000 10.1.1.139 -> 10.1.1.173 NFS 250 V4 Call LOOKUP DH:0x4e5465ab/tagsdb
> 36 2.091593000 10.1.1.173 -> 10.1.1.139 NFS 310 V4 Reply (Call In 35) LOOKUP
> 37 2.091657000 10.1.1.139 -> 10.1.1.173 NFS 226 V4 Call GETATTR FH:0xb12cdc45
> 38 2.092126000 10.1.1.173 -> 10.1.1.139 NFS 162 V4 Reply (Call In 37) GETATTR
> 39 2.092157000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0xb12cdc45
> 40 2.092658000 10.1.1.173 -> 10.1.1.139 NFS 178 V4 Reply (Call In 39) GETATTR
> 41 2.092684000 10.1.1.139 -> 10.1.1.173 NFS 226 V4 Call GETATTR FH:0xb12cdc45
> 42 2.093150000 10.1.1.173 -> 10.1.1.139 NFS 142 V4 Reply (Call In 41) GETATTR
> 43 2.100520000 10.1.1.139 -> 10.1.1.173 NFS 226 V4 Call GETATTR FH:0xb12cdc45
> 44 2.101014000 10.1.1.173 -> 10.1.1.139 NFS 162 V4 Reply (Call In 43) GETATTR
> 45 2.101040000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0xb12cdc45
> 46 2.101547000 10.1.1.173 -> 10.1.1.139 NFS 262 V4 Reply (Call In 45) GETATTR
> 47 2.141500000 10.1.1.139 -> 10.1.1.173 TCP 66 gdoi > nfs [ACK] Seq=2657 Ack=2289 Win=3507 Len=0 TSval=180576086 TSecr=502193026
> # _
>
> The first thing that strikes me is the bad trace is much longer. This
> strikes me as reasonable because as well as the ESTALE problem I've
> noticed that the whole system seems sluggish. claws-mail is
> particularly so because it keeps saving my typing into a drafts
> mailbox, and because claws doesn't really understand traditional
> mboxes, it spends an inordinate amount of time locking and unlocking
> the boxes for each message in them. Claws also spews tracebacks
> frequently and it crashes from time to time, something it never did
> before the ESTALE problem occurred.
>
> Yours,
>
> Larry

I'm afraid I can't tell much from the above output. I don't see any
ESTALE errors there, but you can get similar issues if (for instance)
certain attributes of a file change. You mentioned that this is a DRBD
cluster, are you "floating" IP addresses between cluster nodes here? If
so, do your problems occur around the times that that's happening?
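
Rather than posting the full tshark -V decodes, it might be enough to
grep them for error statuses, along these lines (a quick-and-dirty
sketch rather than a proper display filter):

# tshark -r bad -V | grep -iE 'stale|nfs4err'

If the server really is handing back NFS4ERR_STALE it should show up
there; if nothing turns up, that points more towards the client.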

Also, what sort of filesystem is being exported here?

--
Jeff Layton <[email protected]>

2013-07-25 18:18:32

by Jeff Layton

Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Thu, 25 Jul 2013 17:05:26 +0000
Larry Keegan <[email protected]> wrote:

> On Thu, 25 Jul 2013 10:11:43 -0400
> Jeff Layton <[email protected]> wrote:
> > On Thu, 25 Jul 2013 13:45:15 +0000
> > Larry Keegan <[email protected]> wrote:
> >
> > > Dear Chaps,
> > >
> > > I am experiencing some inexplicable NFS behaviour which I would
> > > like to run past you.
> > >
> > > I have a linux NFS server running kernel 3.10.2 and some clients
> > > running the same. The server is actually a pair of identical
> > > machines serving up a small number of ext4 filesystems atop drbd.
> > > They don't do much apart from serve home directories and deliver
> > > mail into them. These have worked just fine for aeons.
> > >
> > > The problem I am seeing is that for the past month or so, on and
> > > off, one NFS client starts reporting stale NFS file handles on some
> > > part of the directory tree exported by the NFS server. During the
> > > outage the other parts of the same export remain unaffected. Then,
> > > some ten minutes to an hour later they're back to normal. Access to
> > > the affected sub-directories remains possible from the server (both
> > > directly and via nfs) and from other clients. There do not appear
> > > to be any errors on the underlying ext4 filesystems.
> > >
> > > Each NFS client seems to get the heebie-jeebies over some directory
> > > or other pretty much independently. The problem affects all of the
> > > filesystems exported by the NFS server, but clearly I notice it
> > > first in home directories, and in particular in my dot
> > > subdirectories for things like my mail client and browser. I'd say
> > > something's up the spout about 20% of the time.
> > >
> > > The server and clients are using nfs4, although for a while I tried
> > > nfs3 without any appreciable difference. I do not have
> > > CONFIG_FSCACHE set.
> > >
> > > I wonder if anyone could tell me if they have ever come across this
> > > before, or what debugging settings might help me diagnose the
> > > problem?
> > >
> > > Yours,
> > >
> > > Larry
> >
> >
> > Were these machines running older kernels before this started
> > happening? What kernel did you upgrade from if so?
> >
>
> Dear Jeff,
>
> The full story is this:
>
> I had a pair of boxes running kernel 3.4.3 with the aforementioned drbd
> pacemaker malarkey and some clients running the same.
>
> Then I upgraded the machines by moving from plain old dos partitions to
> gpt. This necessitated a complete reload of everything, but there were
> no software changes. I can be sure that nothing else was changed
> because I build my entire operating system in one ginormous makefile.
>
> Rapidly afterwards I switched the motherboards for ones with more PCI
> slots. There were no software changes except those relating to MAC
> addresses.
>
> Next I moved from 100Mbit to gigabit hubs. Then the problems started.
>
> The symptoms were much as I've described but I didn't see them that
> way. Instead I assumed the entire filesystem had gone to pot and tried
> to unmount it from the client. Fatal mistake. umount hung. I was left
> with an entry in /proc/mounts showing the affected mountpoints as
> "/home/larry\040(deleted)" for example. It was impossible to get rid of
> this and I had to reboot the box. Unfortunately the problem
> snowballed and affected all my NFS clients and the file servers, so
> they had to be bounced too.
>
> Anyway, to cut a long story short, this problem seemed to me to be a
> file server problem so I replaced network cards, swapped hubs,
> checked filesystems, you name it, but I never experienced any actual
> network connectivity problems, only NFS problems. As I had a kernel 3.4.4
> upgrade scheduled, I upgraded all the hosts. No change.
>
> Then I upgraded everything to kernel 3.4.51. No change.
>
> Then I tried mounting using NFS version 3. It could be argued the
> frequency of gyp reduced, but the substance remained.
>
> Then I bit the bullet and tried kernel 3.10. No change. I noticed that
> NFS_V4_1 was on so I turned it off and re-tested. No change. Then
> I tried 3.10.1 and 3.10.2. No change.
>
> I've played with the kernel options to remove FSCACHE, not that I was
> using it, and that's about it.
>
> Are there any (client or server) kernel options which I should know
> about?
>
> > What might be helpful is to do some network captures when the problem
> > occurs. What we want to know is whether the ESTALE errors are coming
> > from the server, or if the client is generating them. That'll narrow
> > down where we need to look for problems.
>
> As it was giving me gyp during typing I tried to capture some NFS
> traffic. Unfortunately claws-mail started a mail box check in the
> middle of this and the problem disappeared! Normally it's claws which
> starts this. It'll come along again soon enough and I'll send a trace.
>
> Thank you for your help.
>
> Yours,
>
> Larry.

Ok, we had a number of changes to how ESTALE errors are handled over
the last few releases. When you mentioned 3.10, I had assumed that you
might be hitting a regression in one of those, but those went in well
after the 3.4 series.

Captures are probably your best bet. My suspicion is that the server is
returning these ESTALE errors occasionally, but it would be best to
have you confirm that. They may also help make sense of why it's
occurring...

--
Jeff Layton <[email protected]>

2013-07-31 19:50:22

by Larry Keegan

Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Wed, 31 Jul 2013 10:03:28 -0400
"J. Bruce Fields" <[email protected]> wrote:
> On Fri, Jul 26, 2013 at 10:25:10PM +0000, Larry Keegan wrote:
> > As far as NFS client arrangements are concerned, both of the NFS
> > server machines also function as NFS clients, so /home/larry works
> > on them in the same way as it does on any other NFS client on the
> > network. It is just that the NFS servers also run my postfix MTAs.
>
> It's unrelated to your ESTALE problem, but note that a setup like this
> may be prone to deadlock. (The client may need to write to the server
> to free up memory. The server may need memory to service the write.
> If the server and client are on the same machine, this can deadlock.)
>
> --b.
>

Dear Bruce,

Perhaps you can clear something up for me. If I understand you
correctly, the following commands might lead to deadlock:

nfsserver# mount localhost:/filesystem /mnt
nfsserver# memory-eater &
[1] 1234
nfsserver# echo tip it over the edge > /mnt/file

but that it won't deadlock if there is memory to spare. The reason I
ask is I'd always assumed that any 'spare' memory in an active Linux
system would end up being consumed by the disc cache, and that the
cached pages are discarded or copied to disc when other parts of the
system want memory (or sooner), assuming there is memory available to do
that.

What I'm asking is whether this deadlock scenario is 'prone' to occur
whenever there are insufficient reclaimable pages free or whether this
can occur before that point? Can this deadlock occur even if the cache
is large enough to ensure that most of what it contains has been
written to disc already? IOW, ignoring the other parts of the O/S, if a
programme writes 100MB/sec maximum to an NFS mounted directory on the
same machine, and the NFS server commits its data to disc within 10
seconds say, would 4GB of RAM provide enough headroom to make this
deadlock unlikely?
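
My back-of-envelope arithmetic, for what it's worth: 100MB/sec
sustained for 10 seconds is roughly 1GB of dirty data in flight, which
is why 4GB struck me as a comfortable margin. Do say if that is the
wrong way to think about it.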

Yours,

Larry.

2013-07-25 17:05:31

by Larry Keegan

Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Thu, 25 Jul 2013 10:11:43 -0400
Jeff Layton <[email protected]> wrote:
> On Thu, 25 Jul 2013 13:45:15 +0000
> Larry Keegan <[email protected]> wrote:
>
> > Dear Chaps,
> >
> > I am experiencing some inexplicable NFS behaviour which I would
> > like to run past you.
> >
> > I have a linux NFS server running kernel 3.10.2 and some clients
> > running the same. The server is actually a pair of identical
> > machines serving up a small number of ext4 filesystems atop drbd.
> > They don't do much apart from serve home directories and deliver
> > mail into them. These have worked just fine for aeons.
> >
> > The problem I am seeing is that for the past month or so, on and
> > off, one NFS client starts reporting stale NFS file handles on some
> > part of the directory tree exported by the NFS server. During the
> > outage the other parts of the same export remain unaffected. Then,
> > some ten minutes to an hour later they're back to normal. Access to
> > the affected sub-directories remains possible from the server (both
> > directly and via nfs) and from other clients. There do not appear
> > to be any errors on the underlying ext4 filesystems.
> >
> > Each NFS client seems to get the heebie-jeebies over some directory
> > or other pretty much independently. The problem affects all of the
> > filesystems exported by the NFS server, but clearly I notice it
> > first in home directories, and in particular in my dot
> > subdirectories for things like my mail client and browser. I'd say
> > something's up the spout about 20% of the time.
> >
> > The server and clients are using nfs4, although for a while I tried
> > nfs3 without any appreciable difference. I do not have
> > CONFIG_FSCACHE set.
> >
> > I wonder if anyone could tell me if they have ever come across this
> > before, or what debugging settings might help me diagnose the
> > problem?
> >
> > Yours,
> >
> > Larry
>
>
> Were these machines running older kernels before this started
> happening? What kernel did you upgrade from if so?
>

Dear Jeff,

The full story is this:

I had a pair of boxes running kernel 3.4.3 with the aforementioned drbd
pacemaker malarkey and some clients running the same.

Then I upgraded the machines by moving from plain old dos partitions to
gpt. This necessitated a complete reload of everything, but there were
no software changes. I can be sure that nothing else was changed
because I build my entire operating system in one ginormous makefile.

Rapidly afterwards I switched the motherboards for ones with more PCI
slots. There were no software changes except those relating to MAC
addresses.

Next I moved from 100Mbit to gigabit hubs. Then the problems started.

The symptoms were much as I've described but I didn't see them that
way. Instead I assumed the entire filesystem had gone to pot and tried
to unmount it from the client. Fatal mistake. umount hung. I was left
with an entry in /proc/mounts showing the affected mountpoints as
"/home/larry\040(deleted)" for example. It was impossible to get rid of
this and I had to reboot the box. Unfortunately the problem
snowballed and affected all my NFS clients and the file servers, so
they had to be bounced too.

Anyway, to cut a long story short, this problem seemed to me to be a
file server problem so I replaced network cards, swapped hubs,
checked filesystems, you name it, but I never experienced any actual
network connectivity problems, only NFS problems. As I had a kernel 3.4.4
upgrade scheduled, I upgraded all the hosts. No change.

Then I upgraded everything to kernel 3.4.51. No change.

Then I tried mounting using NFS version 3. It could be argued the
frequency of gyp reduced, but the substance remained.

Then I bit the bullet and tried kernel 3.10. No change. I noticed that
NFS_V4_1 was on so I turned it off and re-tested. No change. Then
I tried 3.10.1 and 3.10.2. No change.

I've played with the kernel options to remove FSCACHE, not that I was
using it, and that's about it.

Are there any (client or server) kernel options which I should know
about?

> What might be helpful is to do some network captures when the problem
> occurs. What we want to know is whether the ESTALE errors are coming
> from the server, or if the client is generating them. That'll narrow
> down where we need to look for problems.

As it was giving me gyp during typing I tried to capture some NFS
traffic. Unfortunately claws-mail started a mail box check in the
middle of this and the problem disappeared! Normally it's claws which
starts this. It'll come along again soon enough and I'll send a trace.

Thank you for your help.

Yours,

Larry.

2013-07-25 14:11:11

by Jeff Layton

Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Thu, 25 Jul 2013 13:45:15 +0000
Larry Keegan <[email protected]> wrote:

> Dear Chaps,
>
> I am experiencing some inexplicable NFS behaviour which I would like to
> run past you.
>
> I have a linux NFS server running kernel 3.10.2 and some clients
> running the same. The server is actually a pair of identical
> machines serving up a small number of ext4 filesystems atop drbd. They
> don't do much apart from serve home directories and deliver mail
> into them. These have worked just fine for aeons.
>
> The problem I am seeing is that for the past month or so, on and off,
> one NFS client starts reporting stale NFS file handles on some part of
> the directory tree exported by the NFS server. During the outage the
> other parts of the same export remain unaffected. Then, some ten
> minutes to an hour later they're back to normal. Access to the affected
> sub-directories remains possible from the server (both directly and via
> nfs) and from other clients. There do not appear to be any errors on
> the underlying ext4 filesystems.
>
> Each NFS client seems to get the heebie-jeebies over some directory or
> other pretty much independently. The problem affects all of the
> filesystems exported by the NFS server, but clearly I notice it first
> in home directories, and in particular in my dot subdirectories for
> things like my mail client and browser. I'd say something's up the
> spout about 20% of the time.
>
> The server and clients are using nfs4, although for a while I tried
> nfs3 without any appreciable difference. I do not have CONFIG_FSCACHE
> set.
>
> I wonder if anyone could tell me if they have ever come across this
> before, or what debugging settings might help me diagnose the problem?
>
> Yours,
>
> Larry


Were these machines running older kernels before this started
happening? What kernel did you upgrade from if so?

What might be helpful is to do some network captures when the problem
occurs. What we want to know is whether the ESTALE errors are coming
from the server, or if the client is generating them. That'll narrow
down where we need to look for problems.
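
Something along these lines on the client should be enough to catch the
NFS traffic for later inspection (interface name and filename are just
examples):

# tcpdump -i eth0 -s 0 -w /tmp/nfs-estale.pcap port 2049

Reproduce the problem, stop the capture, and then the result can be
picked apart with wireshark or tshark at leisure.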

--
Jeff Layton <[email protected]>

2013-07-31 20:35:50

by J. Bruce Fields

Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Wed, Jul 31, 2013 at 07:50:17PM +0000, Larry Keegan wrote:
> On Wed, 31 Jul 2013 10:03:28 -0400
> "J. Bruce Fields" <[email protected]> wrote:
> > On Fri, Jul 26, 2013 at 10:25:10PM +0000, Larry Keegan wrote:
> > > As far as NFS client arrangements are concerned, both of the NFS
> > > server machines also function as NFS clients, so /home/larry works
> > > on them in the same way as it does on any other NFS client on the
> > > network. It is just that the NFS servers also run my postfix MTAs.
> >
> > It's unrelated to your ESTALE problem, but note that a setup like this
> > may be prone to deadlock. (The client may need to write to the server
> > to free up memory. The server may need memory to service the write.
> > If the server and client are on the same machine, this can deadlock.)
> >
> > --b.
> >
>
> Dear Bruce,
>
> Perhaps you can clear something up for me. If I understand you
> correctly, the following commands might lead to deadlock:
>
> nfsserver# mount localhost:/filesystem /mnt
> nfsserver# memory-eater &
> [1] 1234
> nfsserver# echo tip it over the edge > /mnt/file
>
> but that it won't deadlock if there is memory to spare. The reason I
> ask is I'd always assumed that any 'spare' memory in an active Linux
> system would end up being consumed by the disc cache, and that the
> cached pages are discarded or copied to disc when other parts of the

Note if the pages are backed by NFS files then "copying to disk" may
mean writing to the NFS server.

> system want memory (or sooner), assuming there is memory available to do
> that.
>
> What I'm asking is whether this deadlock scenario is 'prone' to occur
> whenever there are insufficient reclaimable pages free or whether this
> can occur before that point? Can this deadlock occur even if the cache
> is large enough to ensure that most of what it contains has been
> written to disc already? IOW, ignoring the other parts of the O/S, if a
> programme writes 100MB/sec maximum to an NFS mounted directory on the
> same machine, and the NFS server commits its data to disc within 10
> seconds say, would 4GB of RAM provide enough headroom to make this
> deadlock unlikely?

Sorry, I don't know.

--b.

2013-07-31 14:03:31

by J. Bruce Fields

Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Fri, Jul 26, 2013 at 10:25:10PM +0000, Larry Keegan wrote:
> On Fri, 26 Jul 2013 11:02:22 -0400
> "J. Bruce Fields" <[email protected]> wrote:
> > On Fri, Jul 26, 2013 at 09:12:25AM -0400, Jeff Layton wrote:
> > > On Fri, 26 Jul 2013 12:41:01 +0000
> > > Larry Keegan <[email protected]> wrote:
> > > > I now have a good and a bad packet capture. I can run them through
> > > > tshark -V but if I do this, they're really long, so I'm wondering
> > > > how best to post them. I've posted the summaries below.
> > > >
> > > > The set-up is as follows: I'm running a few xterms on my desktop
> > > > (the affected client) as well as claws-mail using the mailmbox
> > > > plugin. Claws keeps a cache of the mailbox
> > > > in .clawsmail/tagsdb/<foldername>. From time to time I blast a
> > > > load of mail into these mail boxes using procmail. This seems to
> > > > demonstrate the problem most of the time. After a few minutes
> > > > everything gets back to normal.
> > > >
> > > > The actual mail is being delivered on my file server pair directly
> > > > into /home/larry/Mail/<foldername>. Both file servers use
> > > > automount to mount the same filesystem
> >
> > Wait, I'm confused: that sounds like you're mounting the same ext4
> > filesystem from two different machines?
> >
> > --b.
> >
>
> Dear Bruce,
>
> I'm sorry, I didn't express myself clearly enough. I described my
> server-side NFS arrangements a few hours ago in a note to Jeff Layton.
> (I'm afraid I didn't catch your email until just now - NFS problems,
> you know). In summary, whereas I do have two NFS servers, only one has
> the filesystems mounted and exported at a time. The other just sees the
> underlying drbd device in secondary mode.

Got it, thanks for the clarification.

> It merely keeps the data on
> the block device up-to-date for when I cut over to it. I pretty much
> never do this unless I wish to reboot the active NFS server. To all
> intents and purposes, I only have one NFS server. I purposefully didn't
> use primary-primary drbd replication with OCFS or GFS2 because it
> was all too new when I set this up.
>
> As far as NFS client arrangements are concerned, both of the NFS server
> machines also function as NFS clients, so /home/larry works on them in
> the same way as it does on any other NFS client on the network. It is
> just that the NFS servers also run my postfix MTAs.

It's unrelated to your ESTALE problem, but note that a setup like this
may be prone to deadlock. (The client may need to write to the server
to free up memory. The server may need memory to service the write. If
the server and client are on the same machine, this can deadlock.)

--b.

> In turn, postfix
> delivers mail to my (multiple) inboxes under /home/larry/Mail/whatever.
>
> Mail comes in from my perimeter mail boxes in round-robin fashion to
> both the NFS server/client/postfix MTA machines, so inevitably yes, both
> of these machines automount /home/larry most of the time, but this is no
> different from, say, my desktop computer, which is also
> mounting /home/larry.
>
> The point about email delivery is that in this arrangement both my NFS
> server computers, with purely their NFS client hats on, contend to
> deliver messages into the same mail files on the same filesystem served
> by just one NFS server. For instance, the linux-nfs mailing list traffic
> all goes into one file. This often causes a lot of thumb-twiddling
> whilst the procmails try to get the lock on the mail file. It all
> happens too fast for me to notice, but I'm sure procmail would rather
> be sunning itself on the beach or something.
>
> The reason why email delivery seems to be a pretty consistent trigger
> for this problem is: a) there's bugger all else going on with all these
> NFS problems, b) the fact that the two NFS server/client/postfix boxes
> and claws-mail on my desktop box are all investigating and modifying
> the same files all the time, and c) I'm deliberately holding back
> inbound mail and releasing it in large batches to try to exercise this
> problem.
>
> The odd thing is that I haven't (yet) had any problems with the
> mailboxes themselves, only the state-files and caches that claws mail
> keeps under /home/whatever/.clawsmail. These are only ever accessed
> from my desktop machine.
>
> Yours,
>
> larry

2013-07-26 16:10:52

by Larry Keegan

Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Fri, 26 Jul 2013 09:12:25 -0400
Jeff Layton <[email protected]> wrote:
> On Fri, 26 Jul 2013 12:41:01 +0000
> Larry Keegan <[email protected]> wrote:
>
> > On Thu, 25 Jul 2013 14:18:28 -0400
> > Jeff Layton <[email protected]> wrote:
> > > On Thu, 25 Jul 2013 17:05:26 +0000
> > > Larry Keegan <[email protected]> wrote:
> > >
> > > > On Thu, 25 Jul 2013 10:11:43 -0400
> > > > Jeff Layton <[email protected]> wrote:
> > > > > On Thu, 25 Jul 2013 13:45:15 +0000
> > > > > Larry Keegan <[email protected]> wrote:
> > > > >
> > > > > > Dear Chaps,
> > > > > >
> > > > > > I am experiencing some inexplicable NFS behaviour which I
> > > > > > would like to run past you.
> > > > > What might be helpful is to do some network captures when the
> > > > > problem occurs. What we want to know is whether the ESTALE
> > > > > errors are coming from the server, or if the client is
> > > > > generating them. That'll narrow down where we need to look
> > > > > for problems.
> > > Ok, we had a number of changes to how ESTALE errors are handled
> > > over the last few releases. When you mentioned 3.10, I had
> > > assumed that you might be hitting a regression in one of those,
> > > but those went in well after the 3.4 series.
> > >
> > > Captures are probably your best bet. My suspicion is that the
> > > server is returning these ESTALE errors occasionally, but it
> > > would be best to have you confirm that. They may also help make
> > > sense of why it's occurring...
> > I now have a good and a bad packet capture. I can run them through
> > tshark -V but if I do this, they're really long, so I'm wondering
> > how best to post them. I've posted the summaries below.
> >
> > The first thing that strikes me is the bad trace is much longer.
> > This strikes me as reasonable because as well as the ESTALE problem
> > I've noticed that the whole system seems sluggish. claws-mail is
> > particularly so because it keeps saving my typing into a drafts
> > mailbox, and because claws doesn't really understand traditional
> > mboxes, it spends an inordinate amount of time locking and unlocking
> > the boxes for each message in them. Claws also spews tracebacks
> > frequently and it crashes from time to time, something it never did
> > before the ESTALE problem occurred.
> I'm afraid I can't tell much from the above output. I don't see any
> ESTALE errors there, but you can get similar issues if (for instance)
> certain attributes of a file change.

Such as might occur due to mail delivery?

> You mentioned that this is a DRBD
> cluster, are you "floating" IP addresses between cluster nodes here?
> If so, do your problems occur around the times that that's happening?
>
> Also, what sort of filesystem is being exported here?
>
The way my NFS servers are configured is as follows:

I have two identical boxes. They run lvm. There are two lvs on each
box called outer-nfs0 and outer-nfs1. These are kept in sync with drbd.
The contents of these volumes are encrypted with dmcrypt. The plaintext
of each volume is a pv. I have two inner volume groups, named nfs0 and
nfs1. These each contain one of those pvs. They are sliced into a dozen
or so lvs. The lvs each contain ext4 filesystems. Each filesystem
contains one or more home directories. Although each filesystem is
exported in its entirety, autofs only mounts subdirectories (for
example /home/larry on fs-nfs0:/export/nfs0/home00/larry). Exports are
arranged by editing the exports file and running 'exportfs -r' so
userspace is always in sync with the kernel.
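
To give a flavour of it, an export line and the matching automounter
map entry look roughly like this (paths, options and addresses are
illustrative rather than copied from my real files):

/export/nfs0/home00   10.1.1.0/24(rw,sync,no_subtree_check)
larry                 -fstype=nfs4   fs-nfs0:/export/nfs0/home00/larry

the first living in the exports file, the second in the relevant
automount map.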

Each nfs volume group is associated with its own IP address which is
switched along with the volume group. So, when one of my boxes can see
volume group nfs0 it will mount the volumes inside it and export all the
filesystems on that volume group via its own ip address. Thus, one
fileserver can export nothing, a dozen filesystems or two dozen
filesystems. The automounter map only ever refers to the switchable ip
addresses.

This arrangement keeps the complexity of the dmcrypt stuff low and is
moderately nippy. As for the switchover, I've merely arranged pacemaker
to 'ip addr del' and 'ip addr add' the switchable IP addresses, blast
out a few ARPs and Bob's your uncle. Occasionally I get a machine
which hangs for a couple of minutes, but mostly it's just a few
seconds. Until recently I haven't seen ESTALE errors.
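
Done by hand the switchover amounts to little more than this (address
and interface invented for the example):

old-active# ip addr del 10.1.1.173/24 dev eth0
new-active# ip addr add 10.1.1.173/24 dev eth0
new-active# arping -U -c 3 -I eth0 10.1.1.173

which is essentially what I have pacemaker doing for me.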

The way I see it, as far as our discussion goes, it looks like I have a
single NFS server with three IP addresses, and the server happens to
copy its data to another server just in case. I haven't switched over
since I last upgraded.

Having said that, I can see where you're coming from. My particular
configuration is unnecessarily complicated for testing this problem.
I shall configure some other boxes more straightforwardly and hammer
them. Are there any good nfs stress-tests you can suggest?

Yours,

Larry.

2013-07-26 22:25:17

by Larry Keegan

Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Fri, 26 Jul 2013 11:02:22 -0400
"J. Bruce Fields" <[email protected]> wrote:
> On Fri, Jul 26, 2013 at 09:12:25AM -0400, Jeff Layton wrote:
> > On Fri, 26 Jul 2013 12:41:01 +0000
> > Larry Keegan <[email protected]> wrote:
> > > I now have a good and a bad packet capture. I can run them through
> > > tshark -V but if I do this, they're really long, so I'm wondering
> > > how best to post them. I've posted the summaries below.
> > >
> > > The set-up is as follows: I'm running a few xterms on my desktop
> > > (the affected client) as well as claws-mail using the mailmbox
> > > plugin. Claws keeps a cache of the mailbox
> > > in .clawsmail/tagsdb/<foldername>. From time to time I blast a
> > > load of mail into these mail boxes using procmail. This seems to
> > > demonstrate the problem most of the time. After a few minutes
> > > everything gets back to normal.
> > >
> > > The actual mail is being delivered on my file server pair directly
> > > into /home/larry/Mail/<foldername>. Both file servers use
> > > automount to mount the same filesystem
>
> Wait, I'm confused: that sounds like you're mounting the same ext4
> filesystem from two different machines?
>
> --b.
>

Dear Bruce,

I'm sorry, I didn't express myself clearly enough. I described my
server-side NFS arrangements a few hours ago in a note to Jeff Layton.
(I'm afraid I didn't catch your email until just now - NFS problems,
you know). In summary, whereas I do have two NFS servers, only one has
the filesystems mounted and exported at a time. The other just sees the
underlying drbd device in secondary mode. It merely keeps the data on
the block device up-to-date for when I cut over to it. I pretty much
never do this unless I wish to reboot the active NFS server. To all
intents and purposes, I only have one NFS server. I purposefully didn't
use primary-primary drbd replication with OCFS or GFS2 because it
was all too new when I set this up.

As far as NFS client arrangements are concerned, both of the NFS server
machines also function as NFS clients, so /home/larry works on them in
the same way as it does on any other NFS client on the network. It is
just that the NFS servers also run my postfix MTAs. In turn, postfix
delivers mail to my (multiple) inboxes under /home/larry/Mail/whatever.

Mail comes in from my perimeter mail boxes in round-robin fashion to
both the NFS server/client/postfix MTA machines, so inevitably yes, both
of these machines automount /home/larry most of the time, but this is no
different from, say, my desktop computer, which is also
mounting /home/larry.

The point about email delivery is that in this arrangement both my NFS
server computers, with purely their NFS client hats on, contend to
deliver messages into the same mail files on the same filesystem served
by just one NFS server. For instance, the linux-nfs mailing list traffic
all goes into one file. This often causes a lot of thumb-twiddling
whilst the procmails try to get the lock on the mail file. It all
happens too fast for me to notice, but I'm sure procmail would rather
be sunning itself on the beach or something.

The reason why email delivery seems to be a pretty consistent trigger
for this problem is: a) there's bugger all else going on with all these
NFS problems, b) the fact that the two NFS server/client/postfix boxes
and claws-mail on my desktop box are all investigating and modifying
the same files all the time, and c) I'm deliberately holding back
inbound mail and releasing it in large batches to try to exercise this
problem.

The odd thing is that I haven't (yet) had any problems with the
mailboxes themselves, only the state-files and caches that claws mail
keeps under /home/whatever/.clawsmail. These are only ever accessed
from my desktop machine.

Yours,

larry

2013-07-25 14:41:18

by Myklebust, Trond

Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Thu, 2013-07-25 at 10:33 -0400, Jeff Layton wrote:
> On Thu, 25 Jul 2013 14:24:30 +0000
> "Myklebust, Trond" <[email protected]> wrote:
> 
> > On Thu, 2013-07-25 at 10:11 -0400, Jeff Layton wrote:
> > 
> > > What might be helpful is to do some network captures when the problem
> > > occurs. What we want to know is whether the ESTALE errors are coming
> > > from the server, or if the client is generating them. That'll narrow
> > > down where we need to look for problems.
> > 
> > Hmm... Shouldn't ESTALE always be repackaged as ENOENT by the VFS, now
> > that your patchset has gone upstream, Jeff?
> > 
> 
> I don't think so...
> 
> On something path-based then that might make sense (or maybe we should
> declare a new ERACE error like Al once suggested and return that). If
> you're doing a write() on a fd that you previously opened but the inode
> has disappeared on the server, then -ESTALE clearly seems valid.

EBADF would be a valid POSIX alternative. Your file descriptor is
clearly invalid if there is no file..

> There are other problematic cases too...
> 
> Suppose I do stat(".", ...); ? Does an -ENOENT error make sense at that point?
> 

On an XFS partition:
        [trondmy@leira tmp]$ mkdir gnurr
        [trondmy@leira tmp]$ cd gnurr
        [trondmy@leira gnurr]$ rmdir ../gnurr
        [trondmy@leira gnurr]$ pwd -P
        pwd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory

So yes, it's actually the preferred error for most filesystems.

> Also, since we only retry once on an ESTALE error, returning that is a
> pretty clear indicator that you raced with some other metadata
> operations. ENOENT is not as informative...
> 

Agreed, but ESTALE is not a valid POSIX error, so it is theoretically
non-portable.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2013-07-26 15:02:25

by J. Bruce Fields

Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Fri, Jul 26, 2013 at 09:12:25AM -0400, Jeff Layton wrote:
> On Fri, 26 Jul 2013 12:41:01 +0000
> Larry Keegan <[email protected]> wrote:
>
> > On Thu, 25 Jul 2013 14:18:28 -0400
> > Jeff Layton <[email protected]> wrote:
> > > On Thu, 25 Jul 2013 17:05:26 +0000
> > > Larry Keegan <[email protected]> wrote:
> > >
> > > > On Thu, 25 Jul 2013 10:11:43 -0400
> > > > Jeff Layton <[email protected]> wrote:
> > > > > On Thu, 25 Jul 2013 13:45:15 +0000
> > > > > Larry Keegan <[email protected]> wrote:
> > > > >
> > > > > > Dear Chaps,
> > > > > >
> > > > > > I am experiencing some inexplicable NFS behaviour which I would
> > > > > > like to run past you.
> > > > >
> > > > > Were these machines running older kernels before this started
> > > > > happening? What kernel did you upgrade from if so?
> > > > >
> > > > [snip out my long rambling reply]
> > > > > What might be helpful is to do some network captures when the
> > > > > problem occurs. What we want to know is whether the ESTALE errors
> > > > > are coming from the server, or if the client is generating them.
> > > > > That'll narrow down where we need to look for problems.
> > > >
> > > > As it was giving me gyp during typing I tried to capture some NFS
> > > > traffic. Unfortunately claws-mail started a mail box check in the
> > > > middle of this and the problem disappeared! Normally it's claws
> > > > which starts this. It'll come along again soon enough and I'll send
> > > > a trace.
> > > >
> > > Ok, we had a number of changes to how ESTALE errors are handled over
> > > the last few releases. When you mentioned 3.10, I had assumed that you
> > > might be hitting a regression in one of those, but those went in well
> > > after the 3.4 series.
> > >
> > > Captures are probably your best bet. My suspicion is that the server
> > > is returning these ESTALE errors occasionally, but it would be best to
> > > have you confirm that. They may also help make sense of why it's
> > > occurring...
> > >
> >
> > Dear Jeff,
> >
> > I now have a good and a bad packet capture. I can run them through
> > tshark -V but if I do this, they're really long, so I'm wondering how
> > best to post them. I've posted the summaries below.
> >
> > The set-up is as follows: I'm running a few xterms on my desktop (the
> > affected client) as well as claws-mail using the mailmbox plugin.
> > Claws keeps a cache of the mailbox in .clawsmail/tagsdb/<foldername>.
> > From time to time I blast a load of mail into these mail boxes using
> > procmail. This seems to demonstrate the problem most of the time. After
> > a few minutes everything gets back to normal.
> >
> > The actual mail is being delivered on my file server pair directly
> > into /home/larry/Mail/<foldername>. Both file servers use automount to
> > mount the same filesystem

Wait, I'm confused: that sounds like you're mounting the same ext4
filesystem from two different machines?

--b.

> > and attempt to deliver mail into the boxes
> > simultaneously. Clearly the .lock files stop them stomping on each
> > other. This works well.
> >
> > When it's in the mood to work, the test session on my desktop looks
> > like this:
> >
> > # ls .claws-mail/tagsdb
> > #mailmbox #mh
> > # _
> >
> > When it doesn't it looks like this:
> >
> > # ls .claws-mail/tagsdb
> > ls: cannot open directory .claws-mail/tagsdb: Stale NFS file handle
> > # _
> >
> > I captured the packets on the network desktop. All else was quiet on
> > the network, at least as far as TCP traffic was concerned. Here are the
> > summaries:
> >
> > # tshark -r good tcp
> > 10 1.304139000 10.1.1.139 -> 10.1.1.173 NFS 238 V4 Call ACCESS FH:0x4e5465ab, [Check: RD LU MD XT DL]
> > 11 1.304653000 10.1.1.173 -> 10.1.1.139 NFS 194 V4 Reply (Call In 10) ACCESS, [Allowed: RD LU MD XT DL]
> > 12 1.304694000 10.1.1.139 -> 10.1.1.173 TCP 66 gdoi > nfs [ACK] Seq=173 Ack=129 Win=3507 Len=0 TSval=119293240 TSecr=440910222
> > 13 1.304740000 10.1.1.139 -> 10.1.1.173 NFS 250 V4 Call LOOKUP DH:0x4e5465ab/tagsdb
> > 14 1.305225000 10.1.1.173 -> 10.1.1.139 NFS 310 V4 Reply (Call In 13) LOOKUP
> > 15 1.305283000 10.1.1.139 -> 10.1.1.173 NFS 238 V4 Call ACCESS FH:0x4e5465ab, [Check: RD LU MD XT DL]
> > 16 1.305798000 10.1.1.173 -> 10.1.1.139 NFS 194 V4 Reply (Call In 15) ACCESS, [Allowed: RD LU MD XT DL]
> > 17 1.305835000 10.1.1.139 -> 10.1.1.173 NFS 250 V4 Call LOOKUP DH:0x4e5465ab/tagsdb
> > 18 1.306330000 10.1.1.173 -> 10.1.1.139 NFS 310 V4 Reply (Call In 17) LOOKUP
> > 19 1.306373000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0x445c531a
> > 20 1.306864000 10.1.1.173 -> 10.1.1.139 NFS 262 V4 Reply (Call In 19) GETATTR
> > 21 1.346003000 10.1.1.139 -> 10.1.1.173 TCP 66 gdoi > nfs [ACK] Seq=877 Ack=941 Win=3507 Len=0 TSval=119293282 TSecr=440910225
> > # tshark -r bad tcp
> > 14 2.078769000 10.1.1.139 -> 10.1.1.173 NFS 238 V4 Call ACCESS FH:0x76aee435, [Check: RD LU MD XT DL]
> > 15 2.079266000 10.1.1.173 -> 10.1.1.139 NFS 194 V4 Reply (Call In 14) ACCESS, [Allowed: RD LU MD XT DL]
> > 16 2.079296000 10.1.1.139 -> 10.1.1.173 TCP 66 gdoi > nfs [ACK] Seq=173 Ack=129 Win=3507 Len=0 TSval=180576023 TSecr=502193004
> > 17 2.079338000 10.1.1.139 -> 10.1.1.173 NFS 238 V4 Call ACCESS FH:0x4e5465ab, [Check: RD LU MD XT DL]
> > 18 2.079797000 10.1.1.173 -> 10.1.1.139 NFS 194 V4 Reply (Call In 17) ACCESS, [Allowed: RD LU MD XT DL]
> > 19 2.079834000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0xb12cdc45
> > 20 2.080331000 10.1.1.173 -> 10.1.1.139 NFS 262 V4 Reply (Call In 19) GETATTR
> > 21 2.080410000 10.1.1.139 -> 10.1.1.173 NFS 250 V4 Call LOOKUP DH:0x4e5465ab/tagsdb
> > 22 2.080903000 10.1.1.173 -> 10.1.1.139 NFS 310 V4 Reply (Call In 21) LOOKUP
> > 23 2.080982000 10.1.1.139 -> 10.1.1.173 NFS 226 V4 Call GETATTR FH:0xb12cdc45
> > 24 2.081477000 10.1.1.173 -> 10.1.1.139 NFS 162 V4 Reply (Call In 23) GETATTR
> > 25 2.081509000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0xb12cdc45
> > 26 2.082010000 10.1.1.173 -> 10.1.1.139 NFS 178 V4 Reply (Call In 25) GETATTR
> > 27 2.082040000 10.1.1.139 -> 10.1.1.173 NFS 226 V4 Call GETATTR FH:0xb12cdc45
> > 28 2.082542000 10.1.1.173 -> 10.1.1.139 NFS 142 V4 Reply (Call In 27) GETATTR
> > 29 2.089525000 10.1.1.139 -> 10.1.1.173 NFS 226 V4 Call GETATTR FH:0xb12cdc45
> > 30 2.089996000 10.1.1.173 -> 10.1.1.139 NFS 162 V4 Reply (Call In 29) GETATTR
> > 31 2.090028000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0xb12cdc45
> > 32 2.090529000 10.1.1.173 -> 10.1.1.139 NFS 262 V4 Reply (Call In 31) GETATTR
> > 33 2.090577000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0x4e5465ab
> > 34 2.091061000 10.1.1.173 -> 10.1.1.139 NFS 262 V4 Reply (Call In 33) GETATTR
> > 35 2.091110000 10.1.1.139 -> 10.1.1.173 NFS 250 V4 Call LOOKUP DH:0x4e5465ab/tagsdb
> > 36 2.091593000 10.1.1.173 -> 10.1.1.139 NFS 310 V4 Reply (Call In 35) LOOKUP
> > 37 2.091657000 10.1.1.139 -> 10.1.1.173 NFS 226 V4 Call GETATTR FH:0xb12cdc45
> > 38 2.092126000 10.1.1.173 -> 10.1.1.139 NFS 162 V4 Reply (Call In 37) GETATTR
> > 39 2.092157000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0xb12cdc45
> > 40 2.092658000 10.1.1.173 -> 10.1.1.139 NFS 178 V4 Reply (Call In 39) GETATTR
> > 41 2.092684000 10.1.1.139 -> 10.1.1.173 NFS 226 V4 Call GETATTR FH:0xb12cdc45
> > 42 2.093150000 10.1.1.173 -> 10.1.1.139 NFS 142 V4 Reply (Call In 41) GETATTR
> > 43 2.100520000 10.1.1.139 -> 10.1.1.173 NFS 226 V4 Call GETATTR FH:0xb12cdc45
> > 44 2.101014000 10.1.1.173 -> 10.1.1.139 NFS 162 V4 Reply (Call In 43) GETATTR
> > 45 2.101040000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0xb12cdc45
> > 46 2.101547000 10.1.1.173 -> 10.1.1.139 NFS 262 V4 Reply (Call In 45) GETATTR
> > 47 2.141500000 10.1.1.139 -> 10.1.1.173 TCP 66 gdoi > nfs [ACK] Seq=2657 Ack=2289 Win=3507 Len=0 TSval=180576086 TSecr=502193026
> > # _
> >
> > The first thing that strikes me is the bad trace is much longer. This
> > strikes me as reasonable because as well as the ESTALE problem I've
> > noticed that the whole system seems sluggish. claws-mail is
> > particularly so because it keeps saving my typing into a drafts
> > mailbox, and because claws doesn't really understand traditional
> > mboxes, it spends an inordinate amount of time locking and unlocking
> > the boxes for each message in them. Claws also spews tracebacks
> > frequently and it crashes from time to time, something it never did
> > before the ESTALE problem occurred.
> >
> > Yours,
> >
> > Larry
>
> I'm afraid I can't tell much from the above output. I don't see any
> ESTALE errors there, but you can get similar issues if (for instance)
> certain attributes of a file change. You mentioned that this is a DRBD
> cluster, are you "floating" IP addresses between cluster nodes here? If
> so, do your problems occur around the times that that's happening?
>
> Also, what sort of filesystem is being exported here?
>
> --
> Jeff Layton <[email protected]>

2013-07-26 14:59:41

by J. Bruce Fields

Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Thu, Jul 25, 2013 at 05:05:26PM +0000, Larry Keegan wrote:
> On Thu, 25 Jul 2013 10:11:43 -0400
> Jeff Layton <[email protected]> wrote:
> > On Thu, 25 Jul 2013 13:45:15 +0000
> > Larry Keegan <[email protected]> wrote:
> >
> > > Dear Chaps,
> > >
> > > I am experiencing some inexplicable NFS behaviour which I would
> > > like to run past you.
> > >
> > > I have a linux NFS server running kernel 3.10.2 and some clients
> > > running the same. The server is actually a pair of identical
> > > machines serving up a small number of ext4 filesystems atop drbd.
> > > They don't do much apart from serve home directories and deliver
> > > mail into them. These have worked just fine for aeons.
> > >
> > > The problem I am seeing is that for the past month or so, on and
> > > off, one NFS client starts reporting stale NFS file handles on some
> > > part of the directory tree exported by the NFS server. During the
> > > outage the other parts of the same export remain unaffected. Then,
> > > some ten minutes to an hour later they're back to normal. Access to
> > > the affected sub-directories remains possible from the server (both
> > > directly and via nfs) and from other clients. There do not appear
> > > to be any errors on the underlying ext4 filesystems.
> > >
> > > Each NFS client seems to get the heebie-jeebies over some directory
> > > or other pretty much independently. The problem affects all of the
> > > filesystems exported by the NFS server, but clearly I notice it
> > > first in home directories, and in particular in my dot
> > > subdirectories for things like my mail client and browser. I'd say
> > > something's up the spout about 20% of the time.

And the problem affects just that one directory? Other files and
directories on the same filesystem continue to be accessible?

> > > The server and clients are using nfs4, although for a while I tried
> > > nfs3 without any appreciable difference. I do not have
> > > CONFIG_FSCACHE set.
> > >
> > > I wonder if anyone could tell me if they have ever come across this
> > > before, or what debugging settings might help me diagnose the
> > > problem?
> > >
> > > Yours,
> > >
> > > Larry
> >
> >
> > Were these machines running older kernels before this started
> > happening? What kernel did you upgrade from if so?
> >
>
> Dear Jeff,
>
> The full story is this:
>
> I had a pair of boxes running kernel 3.4.3 with the aforementioned drbd
> pacemaker malarkey and some clients running the same.
>
> Then I upgraded the machines by moving from plain old dos partitions to
> gpt. This necessitated a complete reload of everything, but there were
> no software changes. I can be sure that nothing else was changed
> because I build my entire operating system in one ginormous makefile.
>
> Rapidly afterwards I switched the motherboards for ones with more PCI
> slots. There were no software changes except those relating to MAC
> addresses.
>
> Next I moved from 100Mbit to gigabit hubs. Then the problems started.

So both the "good" and "bad" behavior were seen with the same 3.4.3
kernel?

...
> Anyway, to cut a long story short, this problem seemed to me to be a
> file server problem so I replaced network cards, swapped hubs,

Including reverting back to your original configuration with 100Mbit
hubs?

--b.

2013-07-26 23:21:17

by Larry Keegan

[permalink] [raw]
Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Fri, 26 Jul 2013 10:59:37 -0400
"J. Bruce Fields" <[email protected]> wrote:
> On Thu, Jul 25, 2013 at 05:05:26PM +0000, Larry Keegan wrote:
> > On Thu, 25 Jul 2013 10:11:43 -0400
> > Jeff Layton <[email protected]> wrote:
> > > On Thu, 25 Jul 2013 13:45:15 +0000
> > > Larry Keegan <[email protected]> wrote:
> > >
> > > > Dear Chaps,
> > > >
> > > > I am experiencing some inexplicable NFS behaviour which I would
> > > > like to run past you.
> > > >
> > > > I have a linux NFS server running kernel 3.10.2 and some clients
> > > > running the same. The server is actually a pair of identical
> > > > machines serving up a small number of ext4 filesystems atop
> > > > drbd. They don't do much apart from serve home directories and
> > > > deliver mail into them. These have worked just fine for aeons.
> > > >
> > > > The problem I am seeing is that for the past month or so, on and
> > > > off, one NFS client starts reporting stale NFS file handles on
> > > > some part of the directory tree exported by the NFS server.
> > > > During the outage the other parts of the same export remain
> > > > unaffected. Then, some ten minutes to an hour later they're
> > > > back to normal. Access to the affected sub-directories remains
> > > > possible from the server (both directly and via nfs) and from
> > > > other clients. There do not appear to be any errors on the
> > > > underlying ext4 filesystems.
> > > >
> > > > Each NFS client seems to get the heebie-jeebies over some
> > > > directory or other pretty much independently. The problem
> > > > affects all of the filesystems exported by the NFS server, but
> > > > clearly I notice it first in home directories, and in
> > > > particular in my dot subdirectories for things like my mail
> > > > client and browser. I'd say something's up the spout about 20%
> > > > of the time.
>
> And the problem affects just that one directory?

Yes. It's almost always .claws-mail/tagsdb. Sometimes
it's .claws-mail/mailmboxcache and sometimes it's (what you would
call) .mozilla. I suspect this is because very little else is being
actively changed.

> Other files and
> directories on the same filesystem continue to be accessible?

Spot on. Furthermore, whilst one client is returning ESTALE the others
are able to see and modify those same files as if there were no
problems at all.

After however long it takes, the client which was getting ESTALE on
those directories is back to normal. The client sees the latest version
of the files if those files have been changed by another client in the
meantime. IOW if I hadn't been there when the ESTALE had happened, I'd
never have noticed.

However, if another client (or the server itself with its client hat
on) starts to experience ESTALE on some directories or others, their
errors can start and end completely independently. So, for instance I
might have /home/larry/this/that inaccessible on one NFS client,
/home/larry/the/other inaccessible on another NFS client, and
/home/mary/quite/contrary on another NFS client. Each one bobs up
and down with no apparent timing relationship with the others.

> > > > The server and clients are using nfs4, although for a while I
> > > > tried nfs3 without any appreciable difference. I do not have
> > > > CONFIG_FSCACHE set.
> > > >
> > > > I wonder if anyone could tell me if they have ever come across
> > > > this before, or what debugging settings might help me diagnose
> > > > the problem?
> > > Were these machines running older kernels before this started
> > > happening? What kernel did you upgrade from if so?
> > The full story is this:
> >
> > I had a pair of boxes running kernel 3.4.3 with the aforementioned
> > drbd pacemaker malarkey and some clients running the same.
> >
> > Then I upgraded the machines by moving from plain old dos
> > partitions to gpt. This necessitated a complete reload of
> > everything, but there were no software changes. I can be sure that
> > nothing else was changed because I build my entire operating system
> > in one ginormous makefile.
> >
> > Rapidly afterwards I switched the motherboards for ones with more
> > PCI slots. There were no software changes except those relating to
> > MAC addresses.
> >
> > Next I moved from 100Mbit to gigabit hubs. Then the problems
> > started.
>
> So both the "good" and "bad" behavior were seen with the same 3.4.3
> kernel?

Yes. I'm now running 3.10.2, but yes, 3.10.1, 3.10, 3.4.4 and 3.4.3
all exhibit the same behaviour. I was running 3.10.2 when I made the
network captures I spoke of.

However, when I first noticed the problem with kernel 3.4.3 it affected
several filesystems and I thought the machines needed to be rebooted,
but since then I've been toughing it out. I don't suppose the
character of the problem has changed at all, but my experience of it
has, if that makes sense.

> > Anyway, to cut a long story short, this problem seemed to me to be a
> > file server problem so I replaced network cards, swapped hubs,
>
> Including reverting back to your original configuration with 100Mbit
> hubs?

No, guilty as charged. I haven't swapped back the /original/
hubs, and I haven't reconstructed the old hardware arrangement exactly
(it's a little difficult because those parts are now in use elsewhere),
but I've done what I considered to be equivalent tests. I'll do some
more swapping and see if I can shake something out.

Thank you for your suggestions.

Yours,

Larry.

2013-07-25 14:33:30

by Jeff Layton

[permalink] [raw]
Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Thu, 25 Jul 2013 14:24:30 +0000
"Myklebust, Trond" <[email protected]> wrote:

> On Thu, 2013-07-25 at 10:11 -0400, Jeff Layton wrote:
>
> > What might be helpful is to do some network captures when the problem
> > occurs. What we want to know is whether the ESTALE errors are coming
> > from the server, or if the client is generating them. That'll narrow
> > down where we need to look for problems.
>
> Hmm... Shouldn't ESTALE always be repackaged as ENOENT by the VFS, now
> that your patchset has gone upstream, Jeff?
>

I don't think so...

For something path-based, that might make sense (or maybe we should
declare a new ERACE error like Al once suggested and return that). If
you're doing a write() on a fd that you previously opened but the inode
has disappeared on the server, then -ESTALE clearly seems valid.

There are other problematic cases too...

Suppose I do stat(".", ...); does an -ENOENT error make sense at that point?

Also, since we only retry once on an ESTALE error, returning that is a
pretty clear indicator that you raced with some other metadata
operations. ENOENT is not as informative...
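
(For illustration only, and definitely not the kernel code: a userspace
sketch of that same retry-once idea for a path-based call. The helper
name is made up.)

#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/* Retry a path-based stat() once on ESTALE, mirroring the "retry
 * once" behaviour described above. A second ESTALE means we
 * genuinely raced with a rename/unlink on another client. */
static int stat_retry_estale(const char *path, struct stat *st)
{
	int ret = stat(path, st);

	if (ret == -1 && errno == ESTALE)
		ret = stat(path, st);	/* forces a fresh lookup */
	return ret;
}

int main(int argc, char **argv)
{
	struct stat st;

	if (argc > 1 && stat_retry_estale(argv[1], &st) == -1) {
		perror(argv[1]);
		return 1;
	}
	return 0;
}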

--
Jeff Layton <[email protected]>

2013-07-26 12:41:08

by Larry Keegan

[permalink] [raw]
Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Thu, 25 Jul 2013 14:18:28 -0400
Jeff Layton <[email protected]> wrote:
> On Thu, 25 Jul 2013 17:05:26 +0000
> Larry Keegan <[email protected]> wrote:
>
> > On Thu, 25 Jul 2013 10:11:43 -0400
> > Jeff Layton <[email protected]> wrote:
> > > On Thu, 25 Jul 2013 13:45:15 +0000
> > > Larry Keegan <[email protected]> wrote:
> > >
> > > > Dear Chaps,
> > > >
> > > > I am experiencing some inexplicable NFS behaviour which I would
> > > > like to run past you.
> > >
> > > Were these machines running older kernels before this started
> > > happening? What kernel did you upgrade from if so?
> > >
> > [snip out my long rambling reply]
> > > What might be helpful is to do some network captures when the
> > > problem occurs. What we want to know is whether the ESTALE errors
> > > are coming from the server, or if the client is generating them.
> > > That'll narrow down where we need to look for problems.
> >
> > As it was giving me gyp during typing I tried to capture some NFS
> > traffic. Unfortunately claws-mail started a mail box check in the
> > middle of this and the problem disappeared! Normally it's claws
> > which starts this. It'll come along again soon enough and I'll send
> > a trace.
> >
> Ok, we had a number of changes to how ESTALE errors are handled over
> the last few releases. When you mentioned 3.10, I had assumed that you
> might be hitting a regression in one of those, but those went in well
> after the 3.4 series.
>
> Captures are probably your best bet. My suspicion is that the server
> is returning these ESTALE errors occasionally, but it would be best to
> have you confirm that. They may also help make sense of why it's
> occurring...
>

Dear Jeff,

I now have a good and a bad packet capture. I can run them through
tshark -V, but the full output is really long, so I'm wondering how
best to post them. I've posted the summaries below.

The set-up is as follows: I'm running a few xterms on my desktop (the
affected client) as well as claws-mail using the mailmbox plugin.
Claws keeps a cache of the mailbox in .claws-mail/tagsdb/<foldername>.
From time to time I blast a load of mail into these mail boxes using
procmail. This seems to demonstrate the problem most of the time. After
a few minutes everything gets back to normal.

The actual mail is being delivered on my file server pair directly
into /home/larry/Mail/<foldername>. Both file servers use automount to
mount the same filesystem and attempt to deliver mail into the boxes
simultaneously. Clearly the .lock files stop them stomping on each
other. This works well.
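
(Aside, in case it is useful context: the usual NFS-safe dotlock is the
link()-and-check-nlink trick described in the open(2) man page, since
plain O_CREAT|O_EXCL was not reliable over older NFS versions. Below is
a rough sketch of that pattern only, with a made-up lock path; it is
not procmail's actual code, and a real implementation would also fold
the hostname into the temp file name and handle stale locks.)

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

/* Take "name.lock" the traditional NFS-safe way: create a unique
 * temp file in the same directory, link() it to the lock name, and
 * accept the lock if either link() succeeded or the temp file's
 * link count reached 2 (which covers a lost reply from the server). */
static int take_dotlock(const char *lockname)
{
	char tmp[512];
	struct stat st;
	int fd, ok;

	snprintf(tmp, sizeof(tmp), "%s.%ld", lockname, (long)getpid());

	fd = open(tmp, O_CREAT | O_WRONLY | O_EXCL, 0644);
	if (fd == -1)
		return -1;
	close(fd);

	ok = (link(tmp, lockname) == 0) ||
	     (stat(tmp, &st) == 0 && st.st_nlink == 2);

	unlink(tmp);		/* lockname itself now holds the lock */
	return ok ? 0 : -1;
}

int main(void)
{
	const char *lock = "/home/larry/Mail/somebox.lock";	/* hypothetical */

	if (take_dotlock(lock) == 0) {
		puts("locked");
		unlink(lock);	/* release */
	} else {
		perror("lock");
	}
	return 0;
}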

When it's in the mood to work, the test session on my desktop looks
like this:

# ls .claws-mail/tagsdb
#mailmbox #mh
# _

When it doesn't it looks like this:

# ls .claws-mail/tagsdb
ls: cannot open directory .claws-mail/tagsdb: Stale NFS file handle
# _

I captured the packets on the desktop. All else was quiet on
the network, at least as far as TCP traffic was concerned. Here are the
summaries:

# tshark -r good tcp
10 1.304139000 10.1.1.139 -> 10.1.1.173 NFS 238 V4 Call ACCESS FH:0x4e5465ab, [Check: RD LU MD XT DL]
11 1.304653000 10.1.1.173 -> 10.1.1.139 NFS 194 V4 Reply (Call In 10) ACCESS, [Allowed: RD LU MD XT DL]
12 1.304694000 10.1.1.139 -> 10.1.1.173 TCP 66 gdoi > nfs [ACK] Seq=173 Ack=129 Win=3507 Len=0 TSval=119293240 TSecr=440910222
13 1.304740000 10.1.1.139 -> 10.1.1.173 NFS 250 V4 Call LOOKUP DH:0x4e5465ab/tagsdb
14 1.305225000 10.1.1.173 -> 10.1.1.139 NFS 310 V4 Reply (Call In 13) LOOKUP
15 1.305283000 10.1.1.139 -> 10.1.1.173 NFS 238 V4 Call ACCESS FH:0x4e5465ab, [Check: RD LU MD XT DL]
16 1.305798000 10.1.1.173 -> 10.1.1.139 NFS 194 V4 Reply (Call In 15) ACCESS, [Allowed: RD LU MD XT DL]
17 1.305835000 10.1.1.139 -> 10.1.1.173 NFS 250 V4 Call LOOKUP DH:0x4e5465ab/tagsdb
18 1.306330000 10.1.1.173 -> 10.1.1.139 NFS 310 V4 Reply (Call In 17) LOOKUP
19 1.306373000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0x445c531a
20 1.306864000 10.1.1.173 -> 10.1.1.139 NFS 262 V4 Reply (Call In 19) GETATTR
21 1.346003000 10.1.1.139 -> 10.1.1.173 TCP 66 gdoi > nfs [ACK] Seq=877 Ack=941 Win=3507 Len=0 TSval=119293282 TSecr=440910225
# tshark -r bad tcp
14 2.078769000 10.1.1.139 -> 10.1.1.173 NFS 238 V4 Call ACCESS FH:0x76aee435, [Check: RD LU MD XT DL]
15 2.079266000 10.1.1.173 -> 10.1.1.139 NFS 194 V4 Reply (Call In 14) ACCESS, [Allowed: RD LU MD XT DL]
16 2.079296000 10.1.1.139 -> 10.1.1.173 TCP 66 gdoi > nfs [ACK] Seq=173 Ack=129 Win=3507 Len=0 TSval=180576023 TSecr=502193004
17 2.079338000 10.1.1.139 -> 10.1.1.173 NFS 238 V4 Call ACCESS FH:0x4e5465ab, [Check: RD LU MD XT DL]
18 2.079797000 10.1.1.173 -> 10.1.1.139 NFS 194 V4 Reply (Call In 17) ACCESS, [Allowed: RD LU MD XT DL]
19 2.079834000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0xb12cdc45
20 2.080331000 10.1.1.173 -> 10.1.1.139 NFS 262 V4 Reply (Call In 19) GETATTR
21 2.080410000 10.1.1.139 -> 10.1.1.173 NFS 250 V4 Call LOOKUP DH:0x4e5465ab/tagsdb
22 2.080903000 10.1.1.173 -> 10.1.1.139 NFS 310 V4 Reply (Call In 21) LOOKUP
23 2.080982000 10.1.1.139 -> 10.1.1.173 NFS 226 V4 Call GETATTR FH:0xb12cdc45
24 2.081477000 10.1.1.173 -> 10.1.1.139 NFS 162 V4 Reply (Call In 23) GETATTR
25 2.081509000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0xb12cdc45
26 2.082010000 10.1.1.173 -> 10.1.1.139 NFS 178 V4 Reply (Call In 25) GETATTR
27 2.082040000 10.1.1.139 -> 10.1.1.173 NFS 226 V4 Call GETATTR FH:0xb12cdc45
28 2.082542000 10.1.1.173 -> 10.1.1.139 NFS 142 V4 Reply (Call In 27) GETATTR
29 2.089525000 10.1.1.139 -> 10.1.1.173 NFS 226 V4 Call GETATTR FH:0xb12cdc45
30 2.089996000 10.1.1.173 -> 10.1.1.139 NFS 162 V4 Reply (Call In 29) GETATTR
31 2.090028000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0xb12cdc45
32 2.090529000 10.1.1.173 -> 10.1.1.139 NFS 262 V4 Reply (Call In 31) GETATTR
33 2.090577000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0x4e5465ab
34 2.091061000 10.1.1.173 -> 10.1.1.139 NFS 262 V4 Reply (Call In 33) GETATTR
35 2.091110000 10.1.1.139 -> 10.1.1.173 NFS 250 V4 Call LOOKUP DH:0x4e5465ab/tagsdb
36 2.091593000 10.1.1.173 -> 10.1.1.139 NFS 310 V4 Reply (Call In 35) LOOKUP
37 2.091657000 10.1.1.139 -> 10.1.1.173 NFS 226 V4 Call GETATTR FH:0xb12cdc45
38 2.092126000 10.1.1.173 -> 10.1.1.139 NFS 162 V4 Reply (Call In 37) GETATTR
39 2.092157000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0xb12cdc45
40 2.092658000 10.1.1.173 -> 10.1.1.139 NFS 178 V4 Reply (Call In 39) GETATTR
41 2.092684000 10.1.1.139 -> 10.1.1.173 NFS 226 V4 Call GETATTR FH:0xb12cdc45
42 2.093150000 10.1.1.173 -> 10.1.1.139 NFS 142 V4 Reply (Call In 41) GETATTR
43 2.100520000 10.1.1.139 -> 10.1.1.173 NFS 226 V4 Call GETATTR FH:0xb12cdc45
44 2.101014000 10.1.1.173 -> 10.1.1.139 NFS 162 V4 Reply (Call In 43) GETATTR
45 2.101040000 10.1.1.139 -> 10.1.1.173 NFS 230 V4 Call GETATTR FH:0xb12cdc45
46 2.101547000 10.1.1.173 -> 10.1.1.139 NFS 262 V4 Reply (Call In 45) GETATTR
47 2.141500000 10.1.1.139 -> 10.1.1.173 TCP 66 gdoi > nfs [ACK] Seq=2657 Ack=2289 Win=3507 Len=0 TSval=180576086 TSecr=502193026
# _

The first thing that strikes me is that the bad trace is much longer.
This seems reasonable because, as well as the ESTALE problem, I've
noticed that the whole system seems sluggish. claws-mail is
particularly so because it keeps saving my typing into a drafts
mailbox, and because claws doesn't really understand traditional
mboxes, it spends an inordinate amount of time locking and unlocking
the boxes for each message in them. Claws also spews tracebacks
frequently and it crashes from time to time, something it never did
before the ESTALE problem occurred.

Yours,

Larry

2013-07-25 14:24:31

by Myklebust, Trond

[permalink] [raw]
Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Thu, 2013-07-25 at 10:11 -0400, Jeff Layton wrote:

> What might be helpful is to do some network captures when the problem
> occurs. What we want to know is whether the ESTALE errors are coming
> from the server, or if the client is generating them. That'll narrow
> down where we need to look for problems.

Hmm... Shouldn't ESTALE always be repackaged as ENOENT by the VFS, now
that your patchset has gone upstream, Jeff?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2013-08-06 15:38:31

by Larry Keegan

[permalink] [raw]
Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Tue, 6 Aug 2013 09:34:07 -0400
"J. Bruce Fields" <[email protected]> wrote:
> On Tue, Aug 06, 2013 at 07:14:28AM -0400, Jeff Layton wrote:
> > On Tue, 6 Aug 2013 11:02:09 +0000
> > Larry Keegan <[email protected]> wrote:
> > > These figures seem reasonable for a single SATA HDD in concert
> > > with dmcrypt. Whilst I expected some degradation from exporting
> > > and mounting sync, I have to say that I'm truly flabbergasted by
> > > the difference between the sync and async figures. I can't help
> > > but think I am still suffering from some sort of configuration
> > > problem. Do the numbers from the NFS client seem unreasonable?
> > >
> >
> > That's expected. Performance is the tradeoff for tight cache
> > coherency.
> >
> > With -o sync, each write() syscall requires a round trip to the
> > server. They don't get batched and you can't issue them in
> > parallel. That has a terrible effect on write performance.
>
> Note also mounting -osync and exporting with the "sync" export option
> are entirely different things.
>
> The defaults are what you want (async mount option, sync export
> option) unless you've thought hard about it. This is especially true
> on the server, since the "async" export option makes it skip
> committing data to disk even when the protocol mandates it.

Righto. This makes sense. After a brief pause I'll see if I can tickle
the ESTALE problem under NFS 4.

With many thanks.

Yours,

Larry.

2013-08-06 13:34:11

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Tue, Aug 06, 2013 at 07:14:28AM -0400, Jeff Layton wrote:
> On Tue, 6 Aug 2013 11:02:09 +0000
> Larry Keegan <[email protected]> wrote:
> > These figures seem reasonable for a single SATA HDD in concert
> > with dmcrypt. Whilst I expected some degradation from exporting and
> > mounting sync, I have to say that I'm truly flabbergasted by the
> > difference between the sync and async figures. I can't help but
> > think I am still suffering from some sort of configuration
> > problem. Do the numbers from the NFS client seem unreasonable?
> >
>
> That's expected. Performance is the tradeoff for tight cache coherency.
>
> With -o sync, each write() syscall requires a round trip to the server.
> They don't get batched and you can't issue them in parallel. That has a
> terrible effect on write performance.

Note also mounting -osync and exporting with the "sync" export option
are entirely different things.

The defaults are what you want (async mount option, sync export option)
unless you've thought hard about it. This is especially true on the
server, since the "async" export option makes it skip committing data to
disk even when the protocol mandates it.

--b.

2013-08-06 11:02:13

by Larry Keegan

[permalink] [raw]
Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Fri, 26 Jul 2013 23:21:11 +0000
Larry Keegan <[email protected]> wrote:
> On Fri, 26 Jul 2013 10:59:37 -0400
> "J. Bruce Fields" <[email protected]> wrote:
> > On Thu, Jul 25, 2013 at 05:05:26PM +0000, Larry Keegan wrote:
> > > On Thu, 25 Jul 2013 10:11:43 -0400
> > > Jeff Layton <[email protected]> wrote:
> > > > On Thu, 25 Jul 2013 13:45:15 +0000
> > > > Larry Keegan <[email protected]> wrote:
> > > >
> > > > > Dear Chaps,
> > > > >
> > > > > I am experiencing some inexplicable NFS behaviour which I
> > > > > would like to run past you.
> > > > >
> > > > > I have a linux NFS server running kernel 3.10.2 and some
> > > > > clients running the same. The server is actually a pair of
> > > > > identical machines serving up a small number of ext4
> > > > > filesystems atop drbd. They don't do much apart from serve
> > > > > home directories and deliver mail into them. These have
> > > > > worked just fine for aeons.
> > > > >
> > > > > The problem I am seeing is that for the past month or so, on
> > > > > and off, one NFS client starts reporting stale NFS file
> > > > > handles on some part of the directory tree exported by the
> > > > > NFS server. During the outage the other parts of the same
> > > > > export remain unaffected. Then, some ten minutes to an hour
> > > > > later they're back to normal. Access to the affected
> > > > > sub-directories remains possible from the server (both
> > > > > directly and via nfs) and from other clients. There do not
> > > > > appear to be any errors on the underlying ext4 filesystems.
> > > > >
> > > > > Each NFS client seems to get the heebie-jeebies over some
> > > > > directory or other pretty much independently. The problem
> > > > > affects all of the filesystems exported by the NFS server, but
> > > > > clearly I notice it first in home directories, and in
> > > > > particular in my dot subdirectories for things like my mail
> > > > > client and browser. I'd say something's up the spout about 20%
> > > > > of the time.
> >
> > And the problem affects just that one directory?
>
> Yes. It's almost always .claws-mail/tagsdb. Sometimes
> it's .claws-mail/mailmboxcache and sometimes it's (what you would
> call) .mozilla. I suspect this is because very little else is being
> actively changed.
>
> > Other files and
> > directories on the same filesystem continue to be accessible?
>
> Spot on. Furthermore, whilst one client is returning ESTALE the others
> are able to see and modify those same files as if there were no
> problems at all.
>
> After however long it takes, the client which was getting ESTALE on
> those directories is back to normal. The client sees the latest
> version of the files if those files have been changed by another
> client in the meantime. IOW if I hadn't been there when the ESTALE
> had happened, I'd never have noticed.
>
> However, if another client (or the server itself with its client hat
> on) starts to experience ESTALE on some directories or others, their
> errors can start and end completely independently. So, for instance I
> might have /home/larry/this/that inaccessible on one NFS client,
> /home/larry/the/other inaccessible on another NFS client, and
> /home/mary/quite/contrary on another NFS client. Each one bobs up
> and down with no apparent timing relationship with the others.
>
> > > > > The server and clients are using nfs4, although for a while I
> > > > > tried nfs3 without any appreciable difference. I do not have
> > > > > CONFIG_FSCACHE set.
> > > > >
> > > > > I wonder if anyone could tell me if they have ever come across
> > > > > this before, or what debugging settings might help me diagnose
> > > > > the problem?
> > > > Were these machines running older kernels before this started
> > > > happening? What kernel did you upgrade from if so?
> > > The full story is this:
> > >
> > > I had a pair of boxes running kernel 3.4.3 with the aforementioned
> > > drbd pacemaker malarkey and some clients running the same.
> > >
> > > Then I upgraded the machines by moving from plain old dos
> > > partitions to gpt. This necessitated a complete reload of
> > > everything, but there were no software changes. I can be sure that
> > > nothing else was changed because I build my entire operating
> > > system in one ginormous makefile.
> > >
> > > Rapidly afterwards I switched the motherboards for ones with more
> > > PCI slots. There were no software changes except those relating to
> > > MAC addresses.
> > >
> > > Next I moved from 100Mbit to gigabit hubs. Then the problems
> > > started.
> >
> > So both the "good" and "bad" behavior were seen with the same 3.4.3
> > kernel?
>
> Yes. I'm now running 3.10.2, but yes, 3.10.1, 3.10, 3.4.4 and 3.4.3
> all exhibit the same behaviour. I was running 3.10.2 when I made the
> network captures I spoke of.
>
> However, when I first noticed the problem with kernel 3.4.3 it
> affected several filesystems and I thought the machines needed to be
> rebooted, but since then I've been toughing it out. I don't suppose
> the character of the problem has changed at all, but my experience of
> it has, if that makes sense.
>
> > > Anyway, to cut a long story short, this problem seemed to me to
> > > be a file server problem so I replaced network cards, swapped
> > > hubs,
> >
> > Including reverting back to your original configuration with 100Mbit
> > hubs?
>
> No, guilty as charged. I haven't swapped back the /original/
> hubs, and I haven't reconstructed the old hardware arrangement exactly
> (it's a little difficult because those parts are now in use
> elsewhere), but I've done what I considered to be equivalent tests.
> I'll do some more swapping and see if I can shake something out.
>
> Thank you for your suggestions.

Dear Chaps,

I've spent the last few days doing a variety of tests and I'm convinced
now that my hardware changes have nothing to do with the problem, and
that it only occurs when I'm using NFS 4. As it stands all my boxes are
running 3.10.3, have NFS 4 enabled in kernel but all NFS mounts are
performed with -o nfsvers=3. Everything is stable.

When I claimed earlier that I still had problems despite using NFS 3,
I think that one of the computers was still using NFS 4 unbeknownst to
me. I'm sorry for spouting guff.

Part of my testing involved using bonnie++. I was more than interested
to note that with NFS 3 performance can be truly abysmal if an NFS export
has the sync option set and then a client mounts it with -o sync. This
is a typical example of my tests:

client# bonnie++ -s 8g -m async
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.03e ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
async 8G 53912 85 76221 16 37415 9 42827 75 101754 5 201.6 0
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 9006 47 +++++ +++ 13676 40 8410 44 +++++ +++ 14587 39
async,8G,53912,85,76221,16,37415,9,42827,75,101754,5,201.6,0,16,9006,47,+++++,+++,13676,40,8410,44,+++++,+++,14587,39

client# bonnie++ -s 8g -m sync
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.03e ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
sync 8G 16288 29 3816 0 4358 1 55449 98 113439 6 344.2 1
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 922 4 29133 12 1809 4 918 4 2066 5 1907 4
sync,8G,16288,29,3816,0,4358,1,55449,98,113439,6,344.2,1,16,922,4,29133,12,1809,4,918,4,2066,5,1907,4

The above tests were conducted on the same client machine, having
4x2.5GHz CPU and 4GB of RAM, and against a server with 2x2.5GHz CPU
and 4GB of RAM. I'm using gigabit networking and have 0% packet loss.
The network is otherwise practically silent.

The underlying ext4 filesystem on the server, despite being encrypted
at the block device and mounted with -o barrier=1, yielded these
figures by way of comparison:

server# bonnie++ -s 8G -m raw
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.03e ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
raw 8G 66873 98 140602 17 46965 7 38474 75 102117 10 227.7 0
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
raw,8G,66873,98,140602,17,46965,7,38474,75,102117,10,227.7,0,16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

These figures seem reasonable for a single SATA HDD in concert
with dmcrypt. Whilst I expected some degradation from exporting and
mounting sync, I have to say that I'm truly flabbergasted by the
difference between the sync and async figures. I can't help but
think I am still suffering from some sort of configuration
problem. Do the numbers from the NFS client seem unreasonable?

Yours,

Larry.

2013-08-19 21:22:44

by Bruce Guenter

[permalink] [raw]
Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Fri, Jul 26, 2013 at 10:59:37AM -0400, J. Bruce Fields wrote:
> On Thu, Jul 25, 2013 at 05:05:26PM +0000, Larry Keegan wrote:
> > On Thu, 25 Jul 2013 10:11:43 -0400
> > Jeff Layton <[email protected]> wrote:
> > > On Thu, 25 Jul 2013 13:45:15 +0000
> > > Larry Keegan <[email protected]> wrote:
> > >
> > > > The problem I am seeing is that for the past month or so, on and
> > > > off, one NFS client starts reporting stale NFS file handles on some
> > > > part of the directory tree exported by the NFS server. During the
> > > > outage the other parts of the same export remain unaffected.
>
> And the problem affects just that one directory? Other files and
> directories on the same filesystem continue to be accessible?

FWIW I have also experienced the same problem. Randomly, some part of an
NFS (v4) mounted directory tree starts returning ESTALE, while the rest
of the tree remains accessible.

I am currently running linux 3.9.7 on the server, and have experienced
the problem on clients running linux 3.8.11 and 3.10. The underlying
filesystem is ext4 on dm-crypt.

Unlike the OP, I have not seen the behavior go away after a period of
time, although perhaps I didn't wait long enough. The only fix I found
is to unmount and re-mount (which gets to be a nuisance when one's home
directory is NFS mounted).

FWIW, during the most recent failure, I straced ls and noticed that the
stat on the failing directory succeeded, but opening it failed. I don't
know whether that's significant, but I didn't see it mentioned before.
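
(For anyone who wants to repeat that check without strace, a trivial
test program along these lines should do; point it at whichever
directory is currently misbehaving.)

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>

/* stat() the directory, then open() it, and report errno for each,
 * to show which call is actually returning ESTALE. */
int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : ".";
	struct stat st;
	int fd;

	if (stat(path, &st) == -1)
		printf("stat: %s\n", strerror(errno));
	else
		printf("stat: ok (ino %llu)\n", (unsigned long long)st.st_ino);

	fd = open(path, O_RDONLY | O_DIRECTORY);
	if (fd == -1)
		printf("open: %s\n", strerror(errno));
	else {
		printf("open: ok\n");
		close(fd);
	}
	return 0;
}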

--
Bruce Guenter <[email protected]> http://untroubled.org/



2013-08-06 11:13:51

by Jeff Layton

[permalink] [raw]
Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Tue, 6 Aug 2013 11:02:09 +0000
Larry Keegan <[email protected]> wrote:

> On Fri, 26 Jul 2013 23:21:11 +0000
> Larry Keegan <[email protected]> wrote:
> > On Fri, 26 Jul 2013 10:59:37 -0400
> > "J. Bruce Fields" <[email protected]> wrote:
> > > On Thu, Jul 25, 2013 at 05:05:26PM +0000, Larry Keegan wrote:
> > > > On Thu, 25 Jul 2013 10:11:43 -0400
> > > > Jeff Layton <[email protected]> wrote:
> > > > > On Thu, 25 Jul 2013 13:45:15 +0000
> > > > > Larry Keegan <[email protected]> wrote:
> > > > >
> > > > > > Dear Chaps,
> > > > > >
> > > > > > I am experiencing some inexplicable NFS behaviour which I
> > > > > > would like to run past you.
> > > > > >
> > > > > > I have a linux NFS server running kernel 3.10.2 and some
> > > > > > clients running the same. The server is actually a pair of
> > > > > > identical machines serving up a small number of ext4
> > > > > > filesystems atop drbd. They don't do much apart from serve
> > > > > > home directories and deliver mail into them. These have
> > > > > > worked just fine for aeons.
> > > > > >
> > > > > > The problem I am seeing is that for the past month or so, on
> > > > > > and off, one NFS client starts reporting stale NFS file
> > > > > > handles on some part of the directory tree exported by the
> > > > > > NFS server. During the outage the other parts of the same
> > > > > > export remain unaffected. Then, some ten minutes to an hour
> > > > > > later they're back to normal. Access to the affected
> > > > > > sub-directories remains possible from the server (both
> > > > > > directly and via nfs) and from other clients. There do not
> > > > > > appear to be any errors on the underlying ext4 filesystems.
> > > > > >
> > > > > > Each NFS client seems to get the heebie-jeebies over some
> > > > > > directory or other pretty much independently. The problem
> > > > > > affects all of the filesystems exported by the NFS server, but
> > > > > > clearly I notice it first in home directories, and in
> > > > > > particular in my dot subdirectories for things like my mail
> > > > > > client and browser. I'd say something's up the spout about 20%
> > > > > > of the time.
> > >
> > > And the problem affects just that one directory?
> >
> > Yes. It's almost always .claws-mail/tagsdb. Sometimes
> > it's .claws-mail/mailmboxcache and sometimes it's (what you would
> > call) .mozilla. I suspect this is because very little else is being
> > actively changed.
> >
> > > Other files and
> > > directories on the same filesystem continue to be accessible?
> >
> > Spot on. Furthermore, whilst one client is returning ESTALE the others
> > are able to see and modify those same files as if there were no
> > problems at all.
> >
> > After however long it takes, the client which was getting ESTALE on
> > those directories is back to normal. The client sees the latest
> > version of the files if those files have been changed by another
> > client in the meantime. IOW if I hadn't been there when the ESTALE
> > had happened, I'd never have noticed.
> >
> > However, if another client (or the server itself with its client hat
> > on) starts to experience ESTALE on some directories or others, their
> > errors can start and end completely independently. So, for instance I
> > might have /home/larry/this/that inaccessible on one NFS client,
> > /home/larry/the/other inaccessible on another NFS client, and
> > /home/mary/quite/contrary on another NFS client. Each one bobs up
> > and down with no apparent timing relationship with the others.
> >
> > > > > > The server and clients are using nfs4, although for a while I
> > > > > > tried nfs3 without any appreciable difference. I do not have
> > > > > > CONFIG_FSCACHE set.
> > > > > >
> > > > > > I wonder if anyone could tell me if they have ever come across
> > > > > > this before, or what debugging settings might help me diagnose
> > > > > > the problem?
> > > > > Were these machines running older kernels before this started
> > > > > happening? What kernel did you upgrade from if so?
> > > > The full story is this:
> > > >
> > > > I had a pair of boxes running kernel 3.4.3 with the aforementioned
> > > > drbd pacemaker malarkey and some clients running the same.
> > > >
> > > > Then I upgraded the machines by moving from plain old dos
> > > > partitions to gpt. This necessitated a complete reload of
> > > > everything, but there were no software changes. I can be sure that
> > > > nothing else was changed because I build my entire operating
> > > > system in one ginormous makefile.
> > > >
> > > > Rapidly afterwards I switched the motherboards for ones with more
> > > > PCI slots. There were no software changes except those relating to
> > > > MAC addresses.
> > > >
> > > > Next I moved from 100Mbit to gigabit hubs. Then the problems
> > > > started.
> > >
> > > So both the "good" and "bad" behavior were seen with the same 3.4.3
> > > kernel?
> >
> > Yes. I'm now running 3.10.2, but yes, 3.10.1, 3.10, 3.4.4 and 3.4.3
> > all exhibit the same behaviour. I was running 3.10.2 when I made the
> > network captures I spoke of.
> >
> > However, when I first noticed the problem with kernel 3.4.3 it
> > affected several filesystems and I thought the machines needed to be
> > rebooted, but since then I've been toughing it out. I don't suppose
> > the character of the problem has changed at all, but my experience of
> > it has, if that makes sense.
> >
> > > > Anyway, to cut a long story short, this problem seemed to me to
> > > > be a file server problem so I replaced network cards, swapped
> > > > hubs,
> > >
> > > Including reverting back to your original configuration with 100Mbit
> > > hubs?
> >
> > No, guilty as charged. I haven't swapped back the /original/
> > hubs, and I haven't reconstructed the old hardware arrangement exactly
> > (it's a little difficult because those parts are now in use
> > elsewhere), but I've done what I considered to be equivalent tests.
> > I'll do some more swapping and see if I can shake something out.
> >
> > Thank you for your suggestions.
>
> Dear Chaps,
>
> I've spent the last few days doing a variety of tests and I'm convinced
> now that my hardware changes have nothing to do with the problem, and
> that it only occurs when I'm using NFS 4. As it stands all my boxes are
> running 3.10.3, have NFS 4 enabled in kernel but all NFS mounts are
> performed with -o nfsvers=3. Everything is stable.
>
> When I claimed earlier that I still had problems despite using NFS 3,
> I think that one of the computers was still using NFS 4 unbeknownst to
> me. I'm sorry for spouting guff.
>
> Part of my testing involved using bonnie++. I was more than interested
> to note that with NFS 3 performance can be truly abysmal if an NFS export
> has the sync option set and then a client mounts it with -o sync. This
> is a typical example of my tests:
>
> client# bonnie++ -s 8g -m async
> Writing with putc()...done
> Writing intelligently...done
> Rewriting...done
> Reading with getc()...done
> Reading intelligently...done
> start 'em...done...done...done...
> Create files in sequential order...done.
> Stat files in sequential order...done.
> Delete files in sequential order...done.
> Create files in random order...done.
> Stat files in random order...done.
> Delete files in random order...done.
> Version 1.03e ------Sequential Output------ --Sequential Input- --Random-
> -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> async 8G 53912 85 76221 16 37415 9 42827 75 101754 5 201.6 0
> ------Sequential Create------ --------Random Create--------
> -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> 16 9006 47 +++++ +++ 13676 40 8410 44 +++++ +++ 14587 39
> async,8G,53912,85,76221,16,37415,9,42827,75,101754,5,201.6,0,16,9006,47,+++++,+++,13676,40,8410,44,+++++,+++,14587,39
>
> client# bonnie++ -s 8g -m sync
> Writing with putc()...done
> Writing intelligently...done
> Rewriting...done
> Reading with getc()...done
> Reading intelligently...done
> start 'em...done...done...done...
> Create files in sequential order...done.
> Stat files in sequential order...done.
> Delete files in sequential order...done.
> Create files in random order...done.
> Stat files in random order...done.
> Delete files in random order...done.
> Version 1.03e ------Sequential Output------ --Sequential Input- --Random-
> -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> sync 8G 16288 29 3816 0 4358 1 55449 98 113439 6 344.2 1
> ------Sequential Create------ --------Random Create--------
> -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> 16 922 4 29133 12 1809 4 918 4 2066 5 1907 4
> sync,8G,16288,29,3816,0,4358,1,55449,98,113439,6,344.2,1,16,922,4,29133,12,1809,4,918,4,2066,5,1907,4
>
> The above tests were conducted on the same client machine, having
> 4x2.5GHz CPU and 4GB of RAM, and against a server with 2x2.5GHz CPU
> and 4GB of RAM. I'm using gigabit networking and have 0% packet loss.
> The network is otherwise practically silent.
>
> The underlying ext4 filesystem on the server, despite being encrypted
> at the block device and mounted with -o barrier=1, yielded these
> figures by way of comparison:
>
> server# bonnie++ -s 8G -m raw
> Writing with putc()...done
> Writing intelligently...done
> Rewriting...done
> Reading with getc()...done
> Reading intelligently...done
> start 'em...done...done...done...
> Create files in sequential order...done.
> Stat files in sequential order...done.
> Delete files in sequential order...done.
> Create files in random order...done.
> Stat files in random order...done.
> Delete files in random order...done.
> Version 1.03e ------Sequential Output------ --Sequential Input- --Random-
> -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> raw 8G 66873 98 140602 17 46965 7 38474 75 102117 10 227.7 0
> ------Sequential Create------ --------Random Create--------
> -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> raw,8G,66873,98,140602,17,46965,7,38474,75,102117,10,227.7,0,16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++
>
> These figures seem reasonable for a single SATA HDD in concert
> with dmcrypt. Whilst I expected some degradation from exporting and
> mounting sync, I have to say that I'm truly flabbergasted by the
> difference between the sync and async figures. I can't help but
> think I am still suffering from some sort of configuration
> problem. Do the numbers from the NFS client seem unreasonable?
>

That's expected. Performance is the tradeoff for tight cache coherency.

With -o sync, each write() syscall requires a round trip to the server.
They don't get batched and you can't issue them in parallel. That has a
terrible effect on write performance.
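
(A rough illustration of the point, not a benchmark: an application
that does many small writes and a single fsync() at the end gives the
client room to batch, whereas with -o sync every one of the write()
calls below has to reach the server before it returns. The filename
is made up.)

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* 100 MB written as 4 KB chunks. On a default (async) client mount
 * these can be cached and pushed out as a few large WRITEs plus a
 * COMMIT at fsync() time; with -o sync each write() goes out
 * synchronously, which is where the numbers collapse. */
int main(void)
{
	char buf[4096];
	int fd, i;

	memset(buf, 'x', sizeof(buf));
	fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1) {
		perror("open");
		return 1;
	}
	for (i = 0; i < 25600; i++) {
		if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
			perror("write");
			return 1;
		}
	}
	if (fsync(fd) == -1)	/* one durability point instead of 25600 */
		perror("fsync");
	close(fd);
	return 0;
}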

--
Jeff Layton <[email protected]>