2013-03-04 09:52:25

by Lenky Gao

Subject: Inactive memory keep growing and how to release it?

Hi,

When I run a test on CentOS 6.2 as follows:

#!/bin/bash

while true
do

file="/tmp/filetest"

echo $file

dd if=/dev/zero of=${file} bs=512 count=204800 &> /dev/null

sleep 5
done

the inactive memory keeps growing:

#cat /proc/meminfo | grep Inactive\(fi
Inactive(file): 420144 kB
...
#cat /proc/meminfo | grep Inactive\(fi
Inactive(file): 911912 kB
...
#cat /proc/meminfo | grep Inactive\(fi
Inactive(file): 1547484 kB
...

and I cannot reclaim it:

# cat /proc/meminfo | grep Inactive\(fi
Inactive(file): 1557684 kB
# echo 3 > /proc/sys/vm/drop_caches
# cat /proc/meminfo | grep Inactive\(fi
Inactive(file): 1520832 kB

I have tested other kernel versions, such as 2.6.30 and 3.6.11, and the
problem also exists there.

In this final state, I cannot kmalloc a large contiguous block of
memory, especially in interrupt context.
Can you give some tips to avoid this?

PS:
# uname -a
Linux localhost.localdomain 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6
19:48:22 GMT 2011 x86_64 x86_64 x86_64 GNU/Linux



--
Regards,

Lenky


2013-03-04 10:48:56

by Zlatko Calusic

Subject: Re: Inactive memory keep growing and how to release it?

On 04.03.2013 10:52, Lenky Gao wrote:
> Hi,
>
> When i just run a test on Centos 6.2 as follows:
>
> #!/bin/bash
>
> while true
> do
>
> file="/tmp/filetest"
>
> echo $file
>
> dd if=/dev/zero of=${file} bs=512 count=204800 &> /dev/null
>
> sleep 5
> done
>
> the inactive memory keep growing:
>
> #cat /proc/meminfo | grep Inactive\(fi
> Inactive(file): 420144 kB
> ...
> #cat /proc/meminfo | grep Inactive\(fi
> Inactive(file): 911912 kB
> ...
> #cat /proc/meminfo | grep Inactive\(fi
> Inactive(file): 1547484 kB
> ...
>
> and i cannot reclaim it:
>
> # cat /proc/meminfo | grep Inactive\(fi
> Inactive(file): 1557684 kB
> # echo 3 > /proc/sys/vm/drop_caches
> # cat /proc/meminfo | grep Inactive\(fi
> Inactive(file): 1520832 kB
>
> I have tested on other version kernel, such as 2.6.30 and .6.11, the
> problom also exists.
>
> When in the final situation, i cannot kmalloc a larger contiguous
> memory, especially in interrupt context.
> Can you give some tips to avoid this?
>

The drop_caches mechanism doesn't free dirty page cache pages. And your
bash script is creating a lot of dirty pages. Run it like this and see
if it helps your case:

sync; echo 3 > /proc/sys/vm/drop_caches
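
To see whether dirty pages are what keeps the cache from being dropped,
it may help to compare the Dirty and Inactive(file) counters around the
sync. A minimal sketch, assuming bash and awk; the helper names
meminfo_kb and drop_clean_caches are made up for illustration, and the
second one has to run as root:

```shell
#!/bin/bash

# Print the value (in kB) of one /proc/meminfo field, e.g. "Dirty" or
# "Inactive(file)". The optional second argument is a hypothetical
# convenience so the parser can be pointed at a saved copy of the file.
meminfo_kb() {
    local field=$1 src=${2:-/proc/meminfo}
    awk -v key="$field:" '$1 == key { print $2 }' "$src"
}

# Write dirty pages back first, then drop the clean page cache, dentries
# and inodes, printing the counters before and after. Needs root, so it
# is only defined here and not called automatically.
drop_clean_caches() {
    echo "before: Dirty=$(meminfo_kb Dirty) kB, Inactive(file)=$(meminfo_kb 'Inactive(file)') kB"
    sync
    echo 3 > /proc/sys/vm/drop_caches
    echo "after:  Dirty=$(meminfo_kb Dirty) kB, Inactive(file)=$(meminfo_kb 'Inactive(file)') kB"
}
```

If Dirty is close to zero after the sync and Inactive(file) still does
not shrink, dirty pages are probably not the explanation.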

Regards,
--
Zlatko

2013-03-04 12:21:41

by Lenky Gao

Subject: Re: Inactive memory keep growing and how to release it?

2013/3/4 Zlatko Calusic <[email protected]>:
>
> The drop_caches mechanism doesn't free dirty page cache pages. And your bash
> script is creating a lot of dirty pages. Run it like this and see if it
> helps your case:
>
> sync; echo 3 > /proc/sys/vm/drop_caches

Thanks for your advice.

The inactive memory still cannot be reclaimed after I execute the sync command:

# cat /proc/meminfo | grep Inactive\(file\);
Inactive(file): 882824 kB
# sync;
# echo 3 > /proc/sys/vm/drop_caches
# cat /proc/meminfo | grep Inactive\(file\);
Inactive(file): 777664 kB

I found that these pages become orphaned in this function, but I do not understand why:

/*
* If truncate cannot remove the fs-private metadata from the page, the page
* becomes orphaned. It will be left on the LRU and may even be mapped into
* user pagetables if we're racing with filemap_fault().
*
* We need to bale out if page->mapping is no longer equal to the original
* mapping. This happens a) when the VM reclaimed the page while we waited on
* its lock, b) when a concurrent invalidate_mapping_pages got there first and
* c) when tmpfs swizzles a page between a tmpfs inode and swapper_space.
*/
static int
truncate_complete_page(struct address_space *mapping, struct page *page)
{
...

My file system type is ext3, mounted with the option data=journal, and
the problem is easy to reproduce.
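
For anyone trying to reproduce this, the journalling mode actually in
effect can be read back from /proc/mounts. A small sketch, assuming bash
and awk; mount_data_mode is a made-up helper name, and the optional
second argument exists only so the parser can be tried on a saved copy:

```shell
#!/bin/bash

# Print the file system type and the data= journalling mode for a mount
# point, as listed in /proc/mounts (fields: device, mount point, type,
# options, dump, pass). Prints "(not listed)" when no data= option shows up.
mount_data_mode() {
    local mnt=$1 src=${2:-/proc/mounts}
    awk -v mnt="$mnt" '
        $2 == mnt {
            mode = "(not listed)"
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^data=/) mode = substr(opts[i], 6)
            print $3, mode
        }
    ' "$src"
}
```

On the setup described here, `mount_data_mode /` (or whichever mount
point holds /tmp) would be expected to report the journal mode.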


--
Regards,

Lenky

2013-03-09 02:14:24

by Will Huck

Subject: Re: Inactive memory keep growing and how to release it?

Cc experts. Hugh, Johannes,

On 03/04/2013 08:21 PM, Lenky Gao wrote:
> 2013/3/4 Zlatko Calusic <[email protected]>:
>> The drop_caches mechanism doesn't free dirty page cache pages. And your bash
>> script is creating a lot of dirty pages. Run it like this and see if it
>> helps your case:
>>
>> sync; echo 3 > /proc/sys/vm/drop_caches
> Thanks for your advice.
>
> The inactive memory still cannot be reclaimed after i execute the sync command:
>
> # cat /proc/meminfo | grep Inactive\(file\);
> Inactive(file): 882824 kB
> # sync;
> # echo 3 > /proc/sys/vm/drop_caches
> # cat /proc/meminfo | grep Inactive\(file\);
> Inactive(file): 777664 kB
>
> I find these page becomes orphaned in this function, but do not understand why:
>
> /*
> * If truncate cannot remove the fs-private metadata from the page, the page
> * becomes orphaned. It will be left on the LRU and may even be mapped into
> * user pagetables if we're racing with filemap_fault().
> *
> * We need to bale out if page->mapping is no longer equal to the original
> * mapping. This happens a) when the VM reclaimed the page while we waited on
> * its lock, b) when a concurrent invalidate_mapping_pages got there first and
> * c) when tmpfs swizzles a page between a tmpfs inode and swapper_space.
> */
> static int
> truncate_complete_page(struct address_space *mapping, struct page *page)
> {
> ...
>
> My file system type is ext3, mounted with the opteion data=journal and
> it is easy to reproduce.
>
>

2013-03-14 10:14:07

by Michal Hocko

Subject: Re: Inactive memory keep growing and how to release it?

On Mon 04-03-13 17:52:22, Lenky Gao wrote:
> Hi,
>
> When i just run a test on Centos 6.2 as follows:
>
> #!/bin/bash
>
> while true
> do
>
> file="/tmp/filetest"
>
> echo $file
>
> dd if=/dev/zero of=${file} bs=512 count=204800 &> /dev/null
>
> sleep 5
> done
>
> the inactive memory keep growing:
>
> #cat /proc/meminfo | grep Inactive\(fi
> Inactive(file): 420144 kB
> ...
> #cat /proc/meminfo | grep Inactive\(fi
> Inactive(file): 911912 kB
> ...
> #cat /proc/meminfo | grep Inactive\(fi
> Inactive(file): 1547484 kB
> ...
>
> and i cannot reclaim it:

How did you try to reclaim the memory? How much memory is still free?
Are you above the watermarks? (/proc/zoneinfo will tell you more.)
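
The comparison against the watermarks can be pulled out of
/proc/zoneinfo with a few lines of awk. This is only a sketch, assuming
bash and the usual "pages free / min / low / high" layout of that file;
zone_watermarks is a made-up helper name and all values are in pages:

```shell
#!/bin/bash

# For every zone, print the free page count next to the min/low/high
# watermarks. The optional argument is a hypothetical convenience for
# running the parser on a saved copy of /proc/zoneinfo.
zone_watermarks() {
    local src=${1:-/proc/zoneinfo}
    awk '
        $1 == "Node"                  { node = $2; sub(/,/, "", node); zone = $4; seen = 0 }
        $1 == "pages" && $2 == "free" { free = $3; seen = 1 }
        seen && $1 == "min"           { min = $2 }
        seen && $1 == "low"           { low = $2 }
        seen && $1 == "high"          {
            printf "node %s zone %s free=%s min=%s low=%s high=%s\n", node, zone, free, min, low, $2
            seen = 0
        }
    ' "$src"
}
```

A zone whose free count sits below its low watermark is one where
background reclaim should already be running.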

> # cat /proc/meminfo | grep Inactive\(fi
> Inactive(file): 1557684 kB
> # echo 3 > /proc/sys/vm/drop_caches
> # cat /proc/meminfo | grep Inactive\(fi
> Inactive(file): 1520832 kB
>
> I have tested on other version kernel, such as 2.6.30 and .6.11, the
> problom also exists.
>
> When in the final situation, i cannot kmalloc a larger contiguous
> memory, especially in interrupt context.

This could be related to memory fragmentation, and your kernel seems
to be too old to have memory compaction, which helps a lot in that
area.
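
One way to check for fragmentation is /proc/buddyinfo, which lists, per
zone, how many free blocks of each order (0, 1, 2, ...) are available. A
rough sketch, assuming bash and awk; free_blocks_at_least is a made-up
helper name and the order to ask for depends on the allocation size:

```shell
#!/bin/bash

# Sum the free blocks of at least the given order for each zone.
# In /proc/buddyinfo the per-order counts start at field 5, after
# "Node N, zone NAME". The optional second argument is a hypothetical
# convenience for running the parser on a saved copy of the file.
free_blocks_at_least() {
    local order=$1 src=${2:-/proc/buddyinfo}
    awk -v min_order="$order" '
        $1 == "Node" {
            total = 0
            for (i = 5 + min_order; i <= NF; i++) total += $i
            printf "%s %s zone %s: %d free blocks of order >= %d\n", $1, $2, $4, total, min_order
        }
    ' "$src"
}
```

For example, `free_blocks_at_least 4` counts the free blocks of 16 or
more contiguous pages. If that number is zero while plenty of memory is
free in small blocks, fragmentation is the likely reason the large
kmalloc fails.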

> Can you give some tips to avoid this?

One way would be to increase /proc/sys/vm/min_free_kbytes, which will
raise the watermarks so that reclaim starts sooner.
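
As a sketch of that tuning step, assuming bash and root privileges: the
helper names below are made up, and any target value (such as 65536) is
only an illustration, not a recommendation for a particular machine.

```shell
#!/bin/bash

# Print the current vm.min_free_kbytes value. The optional argument is a
# hypothetical convenience for pointing the helper at another file.
get_min_free_kbytes() {
    cat "${1:-/proc/sys/vm/min_free_kbytes}"
}

# Raise vm.min_free_kbytes to the given value, but only if the current
# setting is lower, so an already larger value is never reduced.
# Needs root when used on the real /proc file.
raise_min_free_kbytes() {
    local target=$1 file=${2:-/proc/sys/vm/min_free_kbytes}
    local current
    current=$(cat "$file")
    if [ "$current" -lt "$target" ]; then
        echo "$target" > "$file"
    fi
}
```

Usage would look like `raise_min_free_kbytes 65536`. The value should be
raised with care, since a setting that is too high for the amount of RAM
can itself push the machine into out-of-memory conditions.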

> PS:
> # uname -a
> Linux localhost.localdomain 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6
> 19:48:22 GMT 2011 x86_64 x86_64 x86_64 GNU/Linux

This is a really old kernel, and a distribution one as well, which might
contain a lot of patches on top of the core kernel. I would suggest
contacting Red Hat, or trying to reproduce the issue with a vanilla,
up-to-date kernel and reporting here.
--
Michal Hocko
SUSE Labs

2013-03-14 12:39:29

by Hillf Danton

Subject: Re: Inactive memory keep growing and how to release it?

On Sat, Mar 9, 2013 at 10:14 AM, Will Huck <[email protected]> wrote:
> Cc experts. Hugh, Johannes,
>
> On 03/04/2013 08:21 PM, Lenky Gao wrote:
>>
>> 2013/3/4 Zlatko Calusic <[email protected]>:
>>>
>>> The drop_caches mechanism doesn't free dirty page cache pages. And your
>>> bash
>>> script is creating a lot of dirty pages. Run it like this and see if it
>>> helps your case:
>>>
>>> sync; echo 3 > /proc/sys/vm/drop_caches
>>
>> Thanks for your advice.
>>
>> The inactive memory still cannot be reclaimed after i execute the sync
>> command:
>>
>> # cat /proc/meminfo | grep Inactive\(file\);
>> Inactive(file): 882824 kB
>> # sync;
>> # echo 3 > /proc/sys/vm/drop_caches
>> # cat /proc/meminfo | grep Inactive\(file\);
>> Inactive(file): 777664 kB
>>
>> I find these page becomes orphaned in this function, but do not understand
>> why:
>>
>> /*
>> * If truncate cannot remove the fs-private metadata from the page, the
>> page
>> * becomes orphaned. It will be left on the LRU and may even be mapped
>> into
>> * user pagetables if we're racing with filemap_fault().
>> *
>> * We need to bale out if page->mapping is no longer equal to the
>> original
>> * mapping. This happens a) when the VM reclaimed the page while we
>> waited on
>> * its lock, b) when a concurrent invalidate_mapping_pages got there
>> first and
>> * c) when tmpfs swizzles a page between a tmpfs inode and swapper_space.
>> */
>> static int
>> truncate_complete_page(struct address_space *mapping, struct page *page)
>> {
>> ...
>>
>> My file system type is ext3, mounted with the opteion data=journal and
>> it is easy to reproduce.
>>

Perhaps we have to take the page count into account for orphaned pages,
if this can be reproduced with mainline.

Hillf
---
--- a/mm/vmscan.c	Sun Mar 10 13:36:26 2013
+++ b/mm/vmscan.c	Thu Mar 14 20:29:40 2013
@@ -315,14 +315,14 @@ out:
 	return ret;
 }

-static inline int is_page_cache_freeable(struct page *page)
+static inline int is_page_cache_freeable(struct page *page, int has_mapping)
 {
 	/*
 	 * A freeable page cache page is referenced only by the caller
 	 * that isolated the page, the page cache radix tree and
 	 * optional buffer heads at page->private.
 	 */
-	return page_count(page) - page_has_private(page) == 2;
+	return page_count(page) - page_has_private(page) == has_mapping + 1;
 }

 static int may_write_to_queue(struct backing_dev_info *bdi,
@@ -393,7 +393,7 @@ static pageout_t pageout(struct page *pa
 	 * swap_backing_dev_info is bust: it doesn't reflect the
 	 * congestion state of the swapdevs. Easy to fix, if needed.
 	 */
-	if (!is_page_cache_freeable(page))
+	if (!is_page_cache_freeable(page, mapping ? 1 : 0))
 		return PAGE_KEEP;
 	if (!mapping) {
 		/*
--

2013-03-14 15:07:09

by Lenky Gao

Subject: Re: Inactive memory keep growing and how to release it?

On Thu, Mar 14, 2013 at 6:14 PM, Michal Hocko <[email protected]> wrote:
> One way would be to increase /proc/sys/vm/min_free_kbytes which will
> enlarge watermaks so the reclaim starts sooner.
>

Good tip, thanks. :)

> This is really an old kernel and also a distribution one which might
> contain a lot of patches on top of the core kernel. I would suggest to
> contact Redhat or try to reproduce the issue with the vanilla and
> up-to-date kernel and report here.

I have tested other vanilla kernel versions, such as 2.6.30 and 3.6.11; the
issue also exists there and is easy to reproduce.

Maybe I have found the answer to this question:

On Thu, Mar 14, 2013 at 4:00 PM, Lenky Gao <[email protected]> wrote:
> Hi Everyone,
>
> Maybe I have found the answer to this question. The author of JBD
> explained it in the comments:
>
> /*
> * When an ext3-ordered file is truncated, it is possible that many pages are
> * not successfully freed, because they are attached to a committing transaction.
> * After the transaction commits, these pages are left on the LRU, with no
> * ->mapping, and with attached buffers. These pages are trivially reclaimable
> * by the VM, but their apparent absence upsets the VM accounting, and it makes
> * the numbers in /proc/meminfo look odd.
> ...
> */
> static void release_buffer_page(struct buffer_head *bh)
> {
> struct page *page;
> ...

But my new question is: why not free those pages directly after the
transaction commits?

On Thu, Mar 14, 2013 at 8:39 PM, Hillf Danton <[email protected]> wrote:
> Perhaps we have to consider page count for orphan page if it
> could be reproduced with mainline.
>
> Hillf
> ---
> --- a/mm/vmscan.c Sun Mar 10 13:36:26 2013
> +++ b/mm/vmscan.c Thu Mar 14 20:29:40 2013
> @@ -315,14 +315,14 @@ out:
> return ret;
> }
>
> -static inline int is_page_cache_freeable(struct page *page)
> +static inline int is_page_cache_freeable(struct page *page, int has_mapping)
> {
> /*
> * A freeable page cache page is referenced only by the caller
> * that isolated the page, the page cache radix tree and
> * optional buffer heads at page->private.
> */
> - return page_count(page) - page_has_private(page) == 2;
> + return page_count(page) - page_has_private(page) == has_mapping + 1;
> }
>
> static int may_write_to_queue(struct backing_dev_info *bdi,
> @@ -393,7 +393,7 @@ static pageout_t pageout(struct page *pa
> * swap_backing_dev_info is bust: it doesn't reflect the
> * congestion state of the swapdevs. Easy to fix, if needed.
> */
> - if (!is_page_cache_freeable(page))
> + if (!is_page_cache_freeable(page, mapping ? 1 : 0))
> return PAGE_KEEP;
> if (!mapping) {
> /*

Thanks, I'll test it.

I am a total newbie regarding VMM and EXT/JBD; thanks to everyone
for your kind attention and help.

--
Regards,

Lenky

2013-03-15 08:41:54

by Simon Jeons

Subject: Re: Inactive memory keep growing and how to release it?

On 03/14/2013 06:14 PM, Michal Hocko wrote:
> On Mon 04-03-13 17:52:22, Lenky Gao wrote:
>> Hi,
>>
>> When i just run a test on Centos 6.2 as follows:
>>
>> #!/bin/bash
>>
>> while true
>> do
>>
>> file="/tmp/filetest"
>>
>> echo $file
>>
>> dd if=/dev/zero of=${file} bs=512 count=204800 &> /dev/null
>>
>> sleep 5
>> done
>>
>> the inactive memory keep growing:
>>
>> #cat /proc/meminfo | grep Inactive\(fi
>> Inactive(file): 420144 kB
>> ...
>> #cat /proc/meminfo | grep Inactive\(fi
>> Inactive(file): 911912 kB
>> ...
>> #cat /proc/meminfo | grep Inactive\(fi
>> Inactive(file): 1547484 kB
>> ...
>>
>> and i cannot reclaim it:
> How did you try to reclaim the memory? How much memory is still free?
> Are you above watermaks (/proc/zoneinfo will tell you more)
>
>> # cat /proc/meminfo | grep Inactive\(fi
>> Inactive(file): 1557684 kB
>> # echo 3 > /proc/sys/vm/drop_caches
>> # cat /proc/meminfo | grep Inactive\(fi
>> Inactive(file): 1520832 kB
>>
>> I have tested on other version kernel, such as 2.6.30 and .6.11, the
>> problom also exists.
>>
>> When in the final situation, i cannot kmalloc a larger contiguous
>> memory, especially in interrupt context.
> This could be related to the memory fragmentation and your kernel seem
> to be too large to have memory compaction which helps a lot in that
> area.
>
>> Can you give some tips to avoid this?
> One way would be to increase /proc/sys/vm/min_free_kbytes which will
> enlarge watermaks so the reclaim starts sooner.
>
>> PS:
>> # uname -a
>> Linux localhost.localdomain 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6
>> 19:48:22 GMT 2011 x86_64 x86_64 x86_64 GNU/Linux
> This is really an old kernel and also a distribution one which might
> contain a lot of patches on top of the core kernel. I would suggest to
> contact Redhat or try to reproduce the issue with the vanilla and

What's the meaning of vanilla?

> up-to-date kernel and report here.

2013-03-15 08:51:19

by Simon Jeons

Subject: Re: Inactive memory keep growing and how to release it?

On 03/14/2013 08:39 PM, Hillf Danton wrote:
> On Sat, Mar 9, 2013 at 10:14 AM, Will Huck <[email protected]> wrote:
>> Cc experts. Hugh, Johannes,
>>
>> On 03/04/2013 08:21 PM, Lenky Gao wrote:
>>> 2013/3/4 Zlatko Calusic <[email protected]>:
>>>> The drop_caches mechanism doesn't free dirty page cache pages. And your
>>>> bash
>>>> script is creating a lot of dirty pages. Run it like this and see if it
>>>> helps your case:
>>>>
>>>> sync; echo 3 > /proc/sys/vm/drop_caches
>>> Thanks for your advice.
>>>
>>> The inactive memory still cannot be reclaimed after i execute the sync
>>> command:
>>>
>>> # cat /proc/meminfo | grep Inactive\(file\);
>>> Inactive(file): 882824 kB
>>> # sync;
>>> # echo 3 > /proc/sys/vm/drop_caches
>>> # cat /proc/meminfo | grep Inactive\(file\);
>>> Inactive(file): 777664 kB
>>>
>>> I find these page becomes orphaned in this function, but do not understand
>>> why:
>>>
>>> /*
>>> * If truncate cannot remove the fs-private metadata from the page, the
>>> page
>>> * becomes orphaned. It will be left on the LRU and may even be mapped
>>> into
>>> * user pagetables if we're racing with filemap_fault().
>>> *
>>> * We need to bale out if page->mapping is no longer equal to the
>>> original
>>> * mapping. This happens a) when the VM reclaimed the page while we
>>> waited on
>>> * its lock, b) when a concurrent invalidate_mapping_pages got there
>>> first and
>>> * c) when tmpfs swizzles a page between a tmpfs inode and swapper_space.
>>> */
>>> static int
>>> truncate_complete_page(struct address_space *mapping, struct page *page)
>>> {
>>> ...
>>>
>>> My file system type is ext3, mounted with the opteion data=journal and
>>> it is easy to reproduce.
>>>
> Perhaps we have to consider page count for orphan page if it
> could be reproduced with mainline.

Why? /proc/sys/vm/drop_caches will call invalidate_mapping_pages()
instead of truncate_complete_page().

>
> Hillf
> ---
> --- a/mm/vmscan.c Sun Mar 10 13:36:26 2013
> +++ b/mm/vmscan.c Thu Mar 14 20:29:40 2013
> @@ -315,14 +315,14 @@ out:
> return ret;
> }
>
> -static inline int is_page_cache_freeable(struct page *page)
> +static inline int is_page_cache_freeable(struct page *page, int has_mapping)
> {
> /*
> * A freeable page cache page is referenced only by the caller
> * that isolated the page, the page cache radix tree and
> * optional buffer heads at page->private.
> */
> - return page_count(page) - page_has_private(page) == 2;
> + return page_count(page) - page_has_private(page) == has_mapping + 1;
> }

A page count of 2 accounts for the page cache and the isolator; why do you
check the mapping separately?

> static int may_write_to_queue(struct backing_dev_info *bdi,
> @@ -393,7 +393,7 @@ static pageout_t pageout(struct page *pa
> * swap_backing_dev_info is bust: it doesn't reflect the
> * congestion state of the swapdevs. Easy to fix, if needed.
> */
> - if (!is_page_cache_freeable(page))
> + if (!is_page_cache_freeable(page, mapping ? 1 : 0))
> return PAGE_KEEP;
> if (!mapping) {
> /*
> --
>

2013-03-15 15:00:38

by Theodore Ts'o

Subject: Re: Inactive memory keep growing and how to release it?

On Fri, Mar 15, 2013 at 04:41:41PM +0800, Simon Jeons wrote:
> >This is really an old kernel and also a distribution one which might
> >contain a lot of patches on top of the core kernel. I would suggest to
> >contact Redhat or try to reproduce the issue with the vanilla and
>
> What's the meaning of vanilla?

Vanilla means an up-to-date (i.e., non-prehistoric) kernel from
kernel.org, without any "Value Added" patches from a distribution.

See: https://www.kernel.org/

Regards,

- Ted