2006-08-08 22:24:49

by Xin Zhao

Subject: What's the NFS OOM problem?

I have often heard of the OOM problem in NFS, but I don't know what it is.
I am now developing an NFS-based system and have found that system memory
(on the server side) is used up too fast. I checked the code but didn't
find any memory leak, so I suspect I have run into the OOM issue.

Can someone help me with a brief description of the OOM issue?

Many many thanks!
-x


2006-08-09 02:33:19

by H. Peter Anvin

Subject: Re: What's the NFS OOM problem?

Xin Zhao wrote:
> I often heard of the OOM probelm in NFS, but don't know what it is.
> Now I am developing a NFS based system and found my system memory
> (server side) is used too fast. I checked the code but didn't find
> memory leaking. So I suspect I run into OOM issue.
>
> Can someone help me and give me a brief description on OOM issue?
>
> Many many thanks!

What I suspect you're talking about has to do with a network client
running out of memory and not being able to talk to the network. The
server isn't affected.

-hpa

2006-08-10 05:14:12

by Willy Tarreau

Subject: Re: What's the NFS OOM problem?

On Tue, Aug 08, 2006 at 06:24:47PM -0400, Xin Zhao wrote:
> I often heard of the OOM probelm in NFS, but don't know what it is.
> Now I am developing a NFS based system and found my system memory
> (server side) is used too fast. I checked the code but didn't find
> memory leaking. So I suspect I run into OOM issue.

I simply think that your cache is filling while your clients access
a lot of files. That's expected. You might also get quite a bunch of
dentries cached which will not be accounted for in meminfo. Check
/proc/meminfo for the cache+buffer size, and check /proc/slabinfo for
the number of dentries. The usual way to ensure this is only cache is
to allocate a large amount of memory (let's say all the system RAM
provided that everything can get swapped), then free it. You'll see
a lot of free memory after that.
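
As a rough sketch of the /proc/meminfo check described above (the helper
names `parse_meminfo` and `cache_summary` are made up for illustration, and
the code assumes only the usual "Name: value kB" line layout), something
like this adds up cache and buffers:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    fields = {}
    for line in text.splitlines():
        parts = line.replace(":", " ").split()
        # Expect "Name value [kB]"; skip anything that does not fit.
        if len(parts) >= 2 and parts[1].isdigit():
            fields[parts[0]] = int(parts[1])
    return fields


def cache_summary(fields):
    """Return (cache_plus_buffers_kB, free_kB) from parsed meminfo fields."""
    cache = fields.get("Cached", 0) + fields.get("Buffers", 0)
    return cache, fields.get("MemFree", 0)


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError:
        info = {}  # not on Linux, or /proc is not mounted
    cache, free = cache_summary(info)
    print(f"cache+buffers: {cache} kB, free: {free} kB")
```

If cache+buffers accounts for most of the "used" memory, the memory is
probably reclaimable cache rather than a leak.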

> Can someone help me and give me a brief description on OOM issue?

I don't know about any OOM issue related to NFS. At most it might happen
on the client (e.g. starting firefox from an NFS root) which might not have
enough memory for new network buffers, but I don't even know if it's
possible at all.

Regards,
Willy

2006-08-10 21:53:43

by Grant Coady

Subject: Re: What's the NFS OOM problem?

On Thu, 10 Aug 2006 06:57:11 +0200, Willy Tarreau <[email protected]> wrote:

>On Tue, Aug 08, 2006 at 06:24:47PM -0400, Xin Zhao wrote:
>> I often heard of the OOM probelm in NFS, but don't know what it is.
>> Now I am developing a NFS based system and found my system memory
>> (server side) is used too fast. I checked the code but didn't find
>> memory leaking. So I suspect I run into OOM issue.
>
>I simply think that you're cache is filling while your clients access
>a lot of files. That's expected. You might also get quite a bunch of
>dentries cached which will not be accounted for in meminfo. Check
>/proc/meminfo for the cache+buffer size, and check /proc/slabinfo for
>the number of dentries. The usual way to ensure this is only cache is
>to allocate a large amount of memory (let's say all the system RAM
>provided that everything can get swapped), then free it. You'll see
>a lot of free memory after that.
>
>> Can someone help me and give me a brief description on OOM issue?
>
>I don't know about any OOM issue related to NFS. At most it might happen
>on the client (eg: stating firefox from an NFS root) which might not have
>enough memory for new network buffers, but I don't even know if it's
>possible at all.

I once wrote a silly test script that put way too much work into ksoftirqd,
and the system slowed right down. It was some time ago and I forget the
details.

You could see the problem by monitoring `top` on both client and server,
watching the thing choke. I didn't document it; it seemed like a "don't
do that" situation at the time.

Grant.

2006-08-11 00:33:39

by NeilBrown

Subject: Re: What's the NFS OOM problem?

On Thursday August 10, [email protected] wrote:
>
> > Can someone help me and give me a brief description on OOM issue?
>
> I don't know about any OOM issue related to NFS. At most it might happen
> on the client (eg: stating firefox from an NFS root) which might not have
> enough memory for new network buffers, but I don't even know if it's
> possible at all.

We've had reports of OOM problems with NFS at SuSE.
The common factors seem to be lots of memory (6G+) and very large
files.
Tuning down /proc/sys/vm/dirty_*ratio seems to avoid the problem,
but I'm not very close to understanding what the real problem is.
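
For anyone wanting to see what these knobs are currently set to before
tuning them down, here is a minimal read-only sketch. The function name
`read_dirty_ratios` is hypothetical; the only assumption about the system
is that the dirty_*ratio files under /proc/sys/vm each hold a single
integer.

```python
import glob
import os


def read_dirty_ratios(vm_dir="/proc/sys/vm"):
    """Return {name: int} for every dirty_*ratio file found under vm_dir."""
    ratios = {}
    for path in sorted(glob.glob(os.path.join(vm_dir, "dirty_*ratio"))):
        try:
            with open(path) as f:
                ratios[os.path.basename(path)] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric file; skip it
    return ratios


if __name__ == "__main__":
    for name, value in read_dirty_ratios().items():
        print(f"{name} = {value}")
```

Changing the values needs root and is deliberately left out of the sketch.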

NeilBrown

2006-08-11 04:16:21

by Willy Tarreau

Subject: Re: What's the NFS OOM problem?

On Fri, Aug 11, 2006 at 10:33:32AM +1000, Neil Brown wrote:
> On Thursday August 10, [email protected] wrote:
> >
> > > Can someone help me and give me a brief description on OOM issue?
> >
> > I don't know about any OOM issue related to NFS. At most it might happen
> > on the client (eg: stating firefox from an NFS root) which might not have
> > enough memory for new network buffers, but I don't even know if it's
> > possible at all.
>
> We've had reports of OOM problems with NFS at SuSE.
> The common factors seem to be lots of memory (6G+) and very large
> files.

Just out of curiosity, does it happen on 32bit or 64bit machines (or both) ?

> Tuning down /proc/sys/vm/dirty_*ratio seems to avoid the problem,
> but I'm not very close to understanding what the real problem is.

The most important thing is to be aware of it ;-)

> NeilBrown

Thanks for the info,
Willy

2006-08-11 04:24:50

by NeilBrown

Subject: Re: What's the NFS OOM problem?

On Friday August 11, [email protected] wrote:
> On Fri, Aug 11, 2006 at 10:33:32AM +1000, Neil Brown wrote:

> > We've had reports of OOM problems with NFS at SuSE.
> > The common factors seem to be lots of memory (6G+) and very large
> > files.
>
> Just out of curiosity, does it happen on 32bit or 64bit machines (or both) ?

Both.
If it was just 32bit I'd be blaming highmem in a flash.
But it's not that easy :-(

NeilBrown

2006-08-11 08:48:48

by Peter Zijlstra

Subject: Re: What's the NFS OOM problem?

On Fri, 2006-08-11 at 10:33 +1000, Neil Brown wrote:
> On Thursday August 10, [email protected] wrote:
> >
> > > Can someone help me and give me a brief description on OOM issue?
> >
> > I don't know about any OOM issue related to NFS. At most it might happen
> > on the client (eg: stating firefox from an NFS root) which might not have
> > enough memory for new network buffers, but I don't even know if it's
> > possible at all.
>
> We've had reports of OOM problems with NFS at SuSE.
> The common factors seem to be lots of memory (6G+) and very large
> files.
> Tuning down /proc/sys/vm/dirty_*ratio seems to avoid the problem,
> but I'm not very close to understanding what the real problem is.

Would it not be related to mmap'ed files, where the client will not
properly track the dirty pages? This will make the reclaim code go crap
itself because suddenly not a single page is easily freeable anymore, all
pages are then found to be dirty and require writeback, which takes more
memory - i.e. allocate network packets, and wait for proper answer.

Andrew is currently carrying some patches that will avoid this problem by
virtue of tracking dirtying of mmap'ed pages. With these patches nr_dirty
is properly incremented and the pdflush logic should kick in and do its
thing.

This would explain why lowering dirty_*ratio would sometimes help, that
would kick off the pdflush thread earlier, which would then detect the
previously unknown dirty pages.


2006-08-14 02:03:27

by NeilBrown

Subject: Re: What's the NFS OOM problem?

On Friday August 11, [email protected] wrote:
> On Fri, 2006-08-11 at 10:33 +1000, Neil Brown wrote:
> > On Thursday August 10, [email protected] wrote:
> > >
> > > > Can someone help me and give me a brief description on OOM issue?
> > >
> > > I don't know about any OOM issue related to NFS. At most it might happen
> > > on the client (eg: stating firefox from an NFS root) which might not have
> > > enough memory for new network buffers, but I don't even know if it's
> > > possible at all.
> >
> > We've had reports of OOM problems with NFS at SuSE.
> > The common factors seem to be lots of memory (6G+) and very large
> > files.
> > Tuning down /proc/sys/vm/dirty_*ratio seems to avoid the problem,
> > but I'm not very close to understanding what the real problem is.
>
> Would it not be related to mmap'ed files, where the client will not
> properly track the dirty pages? This will make the reclaim code go crap
> itself because suddenly not a single page is easily freeable anymore, all
> pages are then found to be dirty and require writeback, which takes more
> memory - i.e. allocate network packets, and wait for proper answer.

I don't think mmap is being used, but I've asked for confirmation just
to be sure - thanks for the tip.

I have a reconstructed "/proc/meminfo" collected out of a crash-dump.
(note that this is from a 2.6.5 based kernel, though the same symptom
has been reported on a 2.6.16 based kernel).
It is below, plus the 'Unstable' number (which 2.6.5 doesn't show
normally).

It seems that 10Gig of the 16Gig is in 'writeback'.
It is my understanding that pages shouldn't stay in 'writeback' for
very long. They should get written and then (for nfs) moved to
Unstable.
The fact that 'Dirty' is zero suggests that there weren't any malloc
failures in starting writeback so I don't think the system is actually
OOM at this point (MemFree is 17Meg which is 1/1000th of total ram,
but some thousands of pages). But the machine has nevertheless seized
up.

I'm thinking that the very large number of 'writeback' pages on the
inactive list is slowing down shrink_list and associated functions
and nothing is progressing very fast. But I wonder why
'writeback' was allowed to get so high, and why it stays so high.

Looking at balance_dirty_pages again, I see that it only really
worries about the number of dirty pages. i.e. once enough pages have
been written, it breaks out of the loop, even if there are
heaps and heaps of writeback pages....

So on this 16Gig machine with dirty_ratio at the default of '40',
we happily let 6.4Gig get dirty and then start writing it out in
balance_dirty_pages. It will then flush out 6Meg for every 4 Meg
that is written. While nr_writeback stays high balance_dirty_pages will
keep flushing until nr_dirty hits zero. Then it will just flush out all
dirty pages every time it is called, thus keeping nr_writeback high.
There should be a 100msec pause each time balance_dirty_pages
is called at this stage. It is called for every 4Meg of data,
which would take a lot longer than 100msec to go out via NFS....
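
The 6.4Gig figure is just 40% of 16Gig. A small sketch of that arithmetic
follows; `dirty_threshold_kb` is a made-up helper, and treating the
threshold as a plain percentage of total RAM is a simplifying assumption
(the kernel bases it on the memory it considers usable for this purpose,
so real numbers will differ somewhat).

```python
KB_PER_GIB = 1024 * 1024


def dirty_threshold_kb(mem_kb, dirty_ratio_percent):
    """Approximate dirty threshold in kB: a plain percentage of mem_kb."""
    return mem_kb * dirty_ratio_percent // 100


if __name__ == "__main__":
    mem_kb = 16 * KB_PER_GIB  # the 16Gig machine discussed above
    for ratio in (40, 10, 5):
        limit = dirty_threshold_kb(mem_kb, ratio)
        print(f"dirty_ratio={ratio:2d}%  ->  {limit / KB_PER_GIB:.2f} GiB may be dirty")
```

Lowering the ratio shrinks the amount of data that can pile up before
writers are made to flush, which fits the observation that tuning it down
avoids the problem.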

Hmmm.. maybe balance_dirty_pages should wait for nr_writeback to
drop sometimes. Currently it has to write at least
sync_writeback_pages(). If it cannot find that many to write, it
stops.
Maybe if it cannot find that many, it should wait for nr_writeback to
drop by the corresponding number. That would mean that if
nr_pagewriteback got out-of-hand, writes would be throttled until it
came back in line.

But there is more to the story... (I hope you don't mind me rambling
on like this. It helps to have someone to explain the problem to).
When Writeback is really high, nfs doesn't make (much) progress in
getting it down again. Apparently rpciod is using lots of CPU time
and not sending many packets on the network...

The crash dump shows rpciod and lots of other processes as Runnable,
with very simple stack-tops (I don't have full details on hand) so
they are probably all trying to get some free memory and so going very
slowly (because the inactive list is so long with very little usable
on it). rpciod calls rpc_malloc which uses a mempool to avoid
starvation, but that doesn't avoid incredible slowness.

So here is my understanding of the problem that I am seeing:

1/ balance_dirty_pages will allow nr_writeback to grow
without bound. While it does work to decrease the number of
dirty pages, it does nothing about decreasing the number
of writeback pages. It should (in some situations) wait for
the number to decrease (blk_congestion_wait isn't strong enough by
itself).

2/ When there is a very large number of Writeback pages on the
inactive list, memory reclaim can go very slowly. Maybe Writeback
pages shouldn't be on the inactive list? Either that or we
need a strong limit on the number of Writeback pages.
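
To make point 1/ concrete, here is a deliberately crude toy model. It is
not the kernel's algorithm: `simulate`, its parameters, and the page counts
are all invented for illustration. A writer dirties pages each tick, any
dirty pages above the limit are handed to writeback, and the network drains
writeback at a fixed, slower rate. The optional `writeback_limit` stands in
for the idea of making writers wait when too much is in flight.

```python
def simulate(ticks, write_rate, net_rate, dirty_limit, writeback_limit=None):
    """Return the peak number of writeback pages seen over the run."""
    dirty = writeback = peak = 0
    for _ in range(ticks):
        # Hypothetical throttle: the writer stalls while too much is in flight.
        throttled = writeback_limit is not None and writeback >= writeback_limit
        if not throttled:
            dirty += write_rate
        # Only the dirty count is policed: the excess simply moves to writeback.
        if dirty > dirty_limit:
            writeback += dirty - dirty_limit
            dirty = dirty_limit
        # The network completes writeback at a fixed rate.
        writeback -= min(writeback, net_rate)
        peak = max(peak, writeback)
    return peak


if __name__ == "__main__":
    print("no writeback limit:", simulate(1000, 10, 2, 100))
    print("with limit of 50:  ", simulate(1000, 10, 2, 100, writeback_limit=50))
```

In this model the writeback count grows for as long as the writer outpaces
the network, unless something makes the writer wait on it.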


Comments/corrections very welcome. I'll see if I can find a way to
verify any of this with the customer...

Thanks for listening,
NeilBrown



MemTotal: 16154060 kB
MemFree: 17760 kB
Buffers: 16104 kB
Cached: 12956032 kB
SwapCached: 1224 kB
Active: 52 kB
Inactive: 12972740 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 16154060 kB
LowFree: 17760 kB
SwapTotal: 25173844 kB
SwapFree: 25119512 kB
Dirty: 0 kB
Writeback: 10999176 kB
Mapped: 4316 kB
Slab: 3135240 kB
Committed_AS: 112984 kB
PageTables: 1436 kB
VmallocTotal: 536870911 kB
VmallocUsed: 12828 kB
VmallocChunk: 536858083 kB

Unstable: 1326884 kB
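
To put the dump in proportion, the following sketch computes a few of the
fields as a share of MemTotal. The values are copied from the dump above;
the helper name `share_of_total` is made up for illustration.

```python
# Values in kB, copied from the reconstructed meminfo dump above.
dump_kb = {
    "MemTotal": 16154060,
    "MemFree": 17760,
    "Inactive": 12972740,
    "Dirty": 0,
    "Writeback": 10999176,
    "Unstable": 1326884,
}


def share_of_total(fields, name):
    """Percentage of MemTotal taken by the named field."""
    return 100.0 * fields[name] / fields["MemTotal"]


if __name__ == "__main__":
    for name in ("Writeback", "Unstable", "Inactive", "MemFree"):
        pct = share_of_total(dump_kb, name)
        print(f"{name:10s} {dump_kb[name]:>10d} kB  {pct:6.2f}%")
```

Writeback comes out at roughly 68% of RAM and MemFree at roughly a tenth of
a percent, matching the "10Gig of the 16Gig" and "1/1000th" figures in the
message.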

2006-08-15 18:24:46

by Roger Heflin

Subject: Re: What's the NFS OOM problem?

Neil Brown wrote:
> On Thursday August 10, [email protected] wrote:
>>> Can someone help me and give me a brief description on OOM issue?
>> I don't know about any OOM issue related to NFS. At most it might happen
>> on the client (eg: stating firefox from an NFS root) which might not have
>> enough memory for new network buffers, but I don't even know if it's
>> possible at all.
>
> We've had reports of OOM problems with NFS at SuSE.
> The common factors seem to be lots of memory (6G+) and very large
> files.
> Tuning down /proc/sys/vm/dirty_*ratio seems to avoid the problem,
> but I'm not very close to understanding what the real problem is.
>
> NeilBrown

I have noticed on SLES kernels that when the dirty_*ratio values are turned
down, it still uses a lot more memory than it should for writeback buffers.
It makes me think that with the default setting of 40% it may for some
reason be using all of memory and deadlocking. It does not seem like an
NFS-only issue, as I believe I have duplicated it with a fast lock
setup.

Checking writeback in /proc/meminfo does indicate that a lot more memory
is being used for write cache than should be.

Roger

2006-08-17 05:04:38

by NeilBrown

Subject: Re: What's the NFS OOM problem?

On Tuesday August 15, [email protected] wrote:
>
> I have noticed on SLES kernels that when the dirty_*ratios turned down it
> still uses alot more memory than it should work writeback buffers, it makes
> me think that with the default setting of 40% that it for some reason
> may be using all of memory and deadlocking. It does not seem like an
> NFS only issue, as I believe I have duplicated it with a fast lock
> setup.

We seem to have a little patch in SuSE kernels that might be making
the problem worse .... though I presume it was introduced for a
reason. I haven't managed to track what that reason was yet.

What is "a fast lock setup"?? I don't understand.

NeilBrown

2006-08-17 13:33:40

by Roger Heflin

Subject: Re: What's the NFS OOM problem?

Neil Brown wrote:
> On Tuesday August 15, [email protected] wrote:
>> I have noticed on SLES kernels that when the dirty_*ratios turned down it
>> still uses alot more memory than it should work writeback buffers, it makes
>> me think that with the default setting of 40% that it for some reason
>> may be using all of memory and deadlocking. It does not seem like an
>> NFS only issue, as I believe I have duplicated it with a fast lock
>> setup.
>
> We seem to have a little patch in SuSE kernels that might be making
> the problem worse .... though I presume it was introduced for a
> reason. I haven't managed to track what that reason was yet.
>
> What is "a fast lock setup"?? I don't understand.
>
> NeilBrown
>

I am not sure what I meant; I may have meant a fast disk setup, and
thought or typed the wrong thing. The machine I duplicated it with
had disks that would sustain 175MB/second (3 striped), 4 CPUs, and local
RAM of 32GB. The 2 CPU/4GB/100MB/second machine does not seem
to have the issue. Both machines are Opterons. I believe I duplicated
it under SP2; I know I duplicated it under SP3 and one of the
post-SP3 kernels. It did not occur under SP1.

Turning down the dirty*ratios seems to make it go away. When I
get a chance I will retest on SP2 and see if it happens there.

I do know (and this may be related) that if on a 32GB machine I
pagelock a large portion of RAM (say 28GB), that machine will deadlock
under high IO. The basic symptoms are similar to the writeback
issue: the machine responds to ping/sysrq, but logins fail, and any
new process creation fails.

Roger