2011-03-28 16:39:31

by Evgenii Lepikhin

Subject: Very aggressive memory reclaim

Hello,

I run a heavily loaded machine with 10M+ inodes on XFS, 50+ GB of
memory, intensive HDD traffic, and 20-50 forks per second, on vanilla
kernel 2.6.37.4. The problem is that the kernel frees memory very
aggressively.

For example:

25% of memory is used by processes
50% for page caches
7% for slabs, etc.
18% free.

That's bad, but it works. After a few hours:

25% of memory is used by processes
62% for page caches
7% for slabs, etc.
5% free.

Most files are cached and everything works perfectly. This is the moment
when the kernel decides to free some memory. After memory reclaim:

25% of memory is used by processes
25% for page caches(!)
7% for slabs, etc.
43% free(!)

The page cache is dropped and the server becomes too slow. This is the
beginning of a new cycle.

I didn't find any huge mallocs at that moment. It looks like, because of
the large number of small mallocs (forks), the kernel makes a pessimistic
forecast about future memory usage and frees too much memory. Are there
any options for tuning this? Any other ideas?

Thanks!


2011-03-28 17:42:53

by Steven Rostedt

Subject: Re: Very aggressive memory reclaim

[ Add Cc's of those that may help you ]

-- Steve

On Mon, Mar 28, 2011 at 08:39:29PM +0400, John Lepikhin wrote:
> Hello,
>
> I run a heavily loaded machine with 10M+ inodes on XFS, 50+ GB of
> memory, intensive HDD traffic, and 20-50 forks per second, on vanilla
> kernel 2.6.37.4. The problem is that the kernel frees memory very
> aggressively.
>
> For example:
>
> 25% of memory is used by processes
> 50% for page caches
> 7% for slabs, etc.
> 18% free.
>
> That's bad, but it works. After a few hours:
>
> 25% of memory is used by processes
> 62% for page caches
> 7% for slabs, etc.
> 5% free.
>
> Most files are cached and everything works perfectly. This is the moment
> when the kernel decides to free some memory. After memory reclaim:
>
> 25% of memory is used by processes
> 25% for page caches(!)
> 7% for slabs, etc.
> 43% free(!)
>
> The page cache is dropped and the server becomes too slow. This is the
> beginning of a new cycle.
>
> I didn't find any huge mallocs at that moment. It looks like, because of
> the large number of small mallocs (forks), the kernel makes a pessimistic
> forecast about future memory usage and frees too much memory. Are there
> any options for tuning this? Any other ideas?
>
> Thanks!

2011-03-28 21:53:59

by Dave Chinner

Subject: Re: Very aggressive memory reclaim

[cc xfs and mm lists]

On Mon, Mar 28, 2011 at 08:39:29PM +0400, John Lepikhin wrote:
> Hello,
>
> I run a heavily loaded machine with 10M+ inodes on XFS, 50+ GB of
> memory, intensive HDD traffic, and 20-50 forks per second, on vanilla
> kernel 2.6.37.4. The problem is that the kernel frees memory very
> aggressively.
>
> For example:
>
> 25% of memory is used by processes
> 50% for page caches
> 7% for slabs, etc.
> 18% free.
>
> That's bad, but it works. After a few hours:
>
> 25% of memory is used by processes
> 62% for page caches
> 7% for slabs, etc.
> 5% free.
>
> Most files are cached and everything works perfectly. This is the moment
> when the kernel decides to free some memory. After memory reclaim:
>
> 25% of memory is used by processes
> 25% for page caches(!)
> 7% for slabs, etc.
> 43% free(!)
>
> The page cache is dropped and the server becomes too slow. This is the
> beginning of a new cycle.
>
> I didn't find any huge mallocs at that moment. It looks like, because of
> the large number of small mallocs (forks), the kernel makes a pessimistic
> forecast about future memory usage and frees too much memory. Are there
> any options for tuning this? Any other ideas?

First it would be useful to determine why the VM is reclaiming so
much memory. If it is somewhat predictable when the excessive
reclaim is going to happen, it might be worth capturing an event
trace from the VM so we can see more precisely what it is doing
during this event. In that case, recording the kmem/* and vmscan/*
events is probably sufficient to tell us what memory allocations
triggered reclaim and how much reclaim was done on each event.
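
For example, something like this with ftrace would capture those events
(a sketch; it assumes debugfs is mounted at /sys/kernel/debug and the
output path is just illustrative):

$ cd /sys/kernel/debug/tracing
$ echo 'kmem:*' > set_event       # enable all kmem/* events
$ echo 'vmscan:*' >> set_event    # add all vmscan/* events
$ cat trace_pipe > /tmp/reclaim-trace.txt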

Cheers,

Dave.
--
Dave Chinner
[email protected]

2011-03-28 22:52:18

by Minchan Kim

Subject: Re: Very aggressive memory reclaim

On Tue, Mar 29, 2011 at 6:53 AM, Dave Chinner <[email protected]> wrote:
> [cc xfs and mm lists]
>
> On Mon, Mar 28, 2011 at 08:39:29PM +0400, John Lepikhin wrote:
>> Hello,
>>
>> I run a heavily loaded machine with 10M+ inodes on XFS, 50+ GB of
>> memory, intensive HDD traffic, and 20-50 forks per second, on vanilla
>> kernel 2.6.37.4. The problem is that the kernel frees memory very
>> aggressively.
>>
>> For example:
>>
>> 25% of memory is used by processes
>> 50% for page caches
>> 7% for slabs, etc.
>> 18% free.
>>
>> That's bad, but it works. After a few hours:
>>
>> 25% of memory is used by processes
>> 62% for page caches
>> 7% for slabs, etc.
>> 5% free.
>>
>> Most files are cached and everything works perfectly. This is the moment
>> when the kernel decides to free some memory. After memory reclaim:
>>
>> 25% of memory is used by processes
>> 25% for page caches(!)
>> 7% for slabs, etc.
>> 43% free(!)
>>
>> The page cache is dropped and the server becomes too slow. This is the
>> beginning of a new cycle.
>>
>> I didn't find any huge mallocs at that moment. It looks like, because of
>> the large number of small mallocs (forks), the kernel makes a pessimistic
>> forecast about future memory usage and frees too much memory. Are there
>> any options for tuning this? Any other ideas?
>
> First it would be useful to determine why the VM is reclaiming so
> much memory. If it is somewhat predictable when the excessive
> reclaim is going to happen, it might be worth capturing an event
> trace from the VM so we can see more precisely what it is doing
> during this event. In that case, recording the kmem/* and vmscan/*
> events is probably sufficient to tell us what memory allocations
> triggered reclaim and how much reclaim was done on each event.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]
>

Recently, we had a similar issue:
http://www.spinics.net/lists/linux-mm/msg12243.html
But it seems it was not merged; I don't know why, since I didn't
follow the thread. Maybe the Cc'ed guys can help you.

Is it a sudden big cache drop at one moment, or small cache drops
accumulated over a long time?
What are your zones' sizes?

Please attach the output of cat /proc/zoneinfo for others.
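
To see whether it is sudden or gradual, a sampling loop along these
lines would be enough (just an illustrative sketch; the interval and
log file name are arbitrary):

$ while sleep 60; do date; grep -E '^(MemFree|Cached|Slab):' /proc/meminfo; done >> meminfo.log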

2011-03-29 00:00:35

by Andi Kleen

Subject: Re: Very aggressive memory reclaim

Dave Chinner <[email protected]> writes:
>
> First it would be useful to determine why the VM is reclaiming so
> much memory. If it is somewhat predictable when the excessive
> reclaim is going to happen, it might be worth capturing an event

Often it's to get pages of a higher order. Just tracing alloc_pages
should tell you that.

There are a few other cases (like memory failure handling), but they're
more obscure.
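
For example, ftrace can filter for higher-order allocations directly
(a sketch; assumes the kmem tracepoints are compiled into the kernel
and debugfs is mounted):

$ cd /sys/kernel/debug/tracing
$ echo 'order > 0' > events/kmem/mm_page_alloc/filter
$ echo 1 > events/kmem/mm_page_alloc/enable
$ cat trace_pipe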

-Andi
--
[email protected] -- Speaking for myself only

2011-03-29 01:57:32

by Dave Chinner

Subject: Re: Very aggressive memory reclaim

On Mon, Mar 28, 2011 at 04:58:50PM -0700, Andi Kleen wrote:
> Dave Chinner <[email protected]> writes:
> >
> > First it would be useful to determine why the VM is reclaiming so
> > much memory. If it is somewhat predictable when the excessive
> > reclaim is going to happen, it might be worth capturing an event
>
> Often it's to get pages of a higher order. Just tracing alloc_pages
> should tell you that.

Yes, the kmem/mm_page_alloc tracepoint gives us that. But in case
that is not the cause, grabbing all the tracepoints I suggested is
more likely to indicate where the problem is. I'd rather get more
data than needed the first time around than have to do multiple
round trips because a single tracepoint doesn't tell us the cause...

Cheers,

Dave.
--
Dave Chinner
[email protected]

2011-03-29 02:55:07

by KOSAKI Motohiro

Subject: Re: Very aggressive memory reclaim

> Recently, we had a similar issue:
> http://www.spinics.net/lists/linux-mm/msg12243.html
> But it seems it was not merged; I don't know why, since I didn't
> follow the thread. Maybe the Cc'ed guys can help you.
>
> Is it a sudden big cache drop at one moment, or small cache drops
> accumulated over a long time?
> What are your zones' sizes?
>
> Please attach the output of cat /proc/zoneinfo for others.

If I remember correctly, 2.6.38 includes Mel's anti-aggressive-reclaim
patch, and the original report seems to be using 2.6.37.x.

John, can you try 2.6.38?


2011-03-29 07:23:00

by Evgenii Lepikhin

Subject: Re: Very aggressive memory reclaim

2011/3/29 Minchan Kim <[email protected]>:

> Please attach the result of cat /proc/zoneinfo for others.

See attachment. Right now I have no zoneinfo from the crisis time, but
I can capture it if required.


Attachments:
zoneinfo (14.31 kB)
meminfo (0.99 kB)

2011-03-29 07:26:04

by Evgenii Lepikhin

Subject: Re: Very aggressive memory reclaim

2011/3/29 Dave Chinner <[email protected]>:

> First it would be useful to determine why the VM is reclaiming so
> much memory. If it is somewhat predictable when the excessive
> reclaim is going to happen, it might be worth capturing an event
> trace from the VM so we can see more precisely what it is doing
> during this event. In that case, recording the kmem/* and vmscan/*
> events is probably sufficient to tell us what memory allocations
> triggered reclaim and how much reclaim was done on each event.

Do you mean I must add some debugging to the mm functions? I don't
know of any other way to catch such events.

2011-03-29 07:33:34

by Evgenii Lepikhin

Subject: Re: Very aggressive memory reclaim

2011/3/29 KOSAKI Motohiro <[email protected]>:

> If I remember correctly, 2.6.38 includes Mel's anti-aggressive-reclaim
> patch, and the original report seems to be using 2.6.37.x.
>
> John, can you try 2.6.38?

I'll ask my boss about it. Unfortunately, we found the opposite issue
with memory management + XFS (100M inodes) on 2.6.38: some objects in
the xfs_inode and dentry slabs seem to never be freed (at least
without "sync && echo 2 >.../drop_caches"). But this is not a
production machine running 24x7, so we don't care about it right now.
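
An easy way to watch whether those slabs ever shrink is something
like this (an illustrative one-liner; reading /proc/slabinfo may
require root):

$ watch -n 60 "grep -E 'xfs_inode|dentry' /proc/slabinfo"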

2011-03-29 08:59:50

by Avi Kivity

Subject: Re: Very aggressive memory reclaim

On 03/29/2011 09:26 AM, John Lepikhin wrote:
> 2011/3/29 Dave Chinner<[email protected]>:
>
> > First it would be useful to determine why the VM is reclaiming so
> > much memory. If it is somewhat predictable when the excessive
> > reclaim is going to happen, it might be worth capturing an event
> > trace from the VM so we can see more precisely what it is doing
> > during this event. In that case, recording the kmem/* and vmscan/*
> > events is probably sufficient to tell us what memory allocations
> > triggered reclaim and how much reclaim was done on each event.
>
> Do you mean I must add some debugging to the mm functions? I don't
> know of any other way to catch such events.

Download and build trace-cmd
(git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git),
and do

$ trace-cmd record -e kmem -e vmscan -b 30000

Hit Ctrl-C when done and post the output file generated in the
current directory.
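
For reference, the resulting trace.dat can also be inspected locally;
trace-cmd report reads it from the current directory by default:

$ trace-cmd report | less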

--
error compiling committee.c: too many arguments to function