2009-07-05 09:23:07

by KOSAKI Motohiro

Subject: [PATCH 0/5] OOM analysis helper patches


The current OOM log doesn't provide sufficient memory usage information, which
causes confusion among the MM developers on lkml.

This patch series adds some memory usage information to the OOM log.

Enjoy.



2009-07-05 09:23:43

by KOSAKI Motohiro

Subject: [PATCH 1/5] add per-zone statistics to show_free_areas()

Subject: [PATCH] add per-zone statistics to show_free_areas()

Currently, show_free_areas() mainly displays system-wide memory usage, but it
doesn't display per-zone memory usage information.

However, if an OOM occurs in the DMA zone, the administrator definitely needs
to know the per-zone memory usage information.



Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/page_alloc.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2151,6 +2151,16 @@ void show_free_areas(void)
" inactive_file:%lukB"
" unevictable:%lukB"
" present:%lukB"
+ " mlocked:%lukB"
+ " dirty:%lukB"
+ " writeback:%lukB"
+ " mapped:%lukB"
+ " slab_reclaimable:%lukB"
+ " slab_unreclaimable:%lukB"
+ " pagetables:%lukB"
+ " unstable:%lukB"
+ " bounce:%lukB"
+ " writeback_tmp:%lukB"
" pages_scanned:%lu"
" all_unreclaimable? %s"
"\n",
@@ -2165,6 +2175,16 @@ void show_free_areas(void)
K(zone_page_state(zone, NR_INACTIVE_FILE)),
K(zone_page_state(zone, NR_UNEVICTABLE)),
K(zone->present_pages),
+ K(zone_page_state(zone, NR_MLOCK)),
+ K(zone_page_state(zone, NR_FILE_DIRTY)),
+ K(zone_page_state(zone, NR_WRITEBACK)),
+ K(zone_page_state(zone, NR_FILE_MAPPED)),
+ K(zone_page_state(zone, NR_SLAB_RECLAIMABLE)),
+ K(zone_page_state(zone, NR_SLAB_UNRECLAIMABLE)),
+ K(zone_page_state(zone, NR_PAGETABLE)),
+ K(zone_page_state(zone, NR_UNSTABLE_NFS)),
+ K(zone_page_state(zone, NR_BOUNCE)),
+ K(zone_page_state(zone, NR_WRITEBACK_TEMP)),
zone->pages_scanned,
(zone_is_all_unreclaimable(zone) ? "yes" : "no")
);

2009-07-05 09:24:18

by KOSAKI Motohiro

Subject: [PATCH 2/5] add buffer cache information to show_free_areas()

Subject: [PATCH] add buffer cache information to show_free_areas()

When an administrator analyzes the reason for a memory shortage from the OOM
log, they often need to know the remaining number of cache-like pages.

So show_free_areas() should display not only the page cache but also
the buffer cache.


Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/page_alloc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2118,7 +2118,7 @@ void show_free_areas(void)
printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
" inactive_file:%lu"
" unevictable:%lu"
- " dirty:%lu writeback:%lu unstable:%lu\n"
+ " dirty:%lu writeback:%lu buffer:%lu unstable:%lu\n"
" free:%lu slab_reclaimable:%lu slab_unreclaimable:%lu\n"
" mapped:%lu pagetables:%lu bounce:%lu\n",
global_page_state(NR_ACTIVE_ANON),
@@ -2128,6 +2128,7 @@ void show_free_areas(void)
global_page_state(NR_UNEVICTABLE),
global_page_state(NR_FILE_DIRTY),
global_page_state(NR_WRITEBACK),
+ K(nr_blockdev_pages()),
global_page_state(NR_UNSTABLE_NFS),
global_page_state(NR_FREE_PAGES),
global_page_state(NR_SLAB_RECLAIMABLE),

2009-07-05 09:24:57

by KOSAKI Motohiro

Subject: [PATCH 3/5] Show kernel stack usage to /proc/meminfo and OOM log

Subject: [PATCH] Show kernel stack usage to /proc/meminfo and OOM log

If the system has a lot of threads, kernel stacks consume a non-negligible
amount of memory. IOW, they create a lot of unaccounted memory.
A large amount of unaccounted memory makes memory-related trouble harder to
analyze.

So accounting for kernel stacks is useful.


Signed-off-by: KOSAKI Motohiro <[email protected]>
---
drivers/base/node.c | 3 +++
fs/proc/meminfo.c | 2 ++
include/linux/mmzone.h | 3 ++-
kernel/fork.c | 11 +++++++++++
mm/page_alloc.c | 3 +++
mm/vmstat.c | 1 +
6 files changed, 22 insertions(+), 1 deletion(-)

Index: b/fs/proc/meminfo.c
===================================================================
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -84,6 +84,7 @@ static int meminfo_proc_show(struct seq_
"Slab: %8lu kB\n"
"SReclaimable: %8lu kB\n"
"SUnreclaim: %8lu kB\n"
+ "KernelStack: %8lu kB\n"
"PageTables: %8lu kB\n"
#ifdef CONFIG_QUICKLIST
"Quicklists: %8lu kB\n"
@@ -128,6 +129,7 @@ static int meminfo_proc_show(struct seq_
global_page_state(NR_SLAB_UNRECLAIMABLE)),
K(global_page_state(NR_SLAB_RECLAIMABLE)),
K(global_page_state(NR_SLAB_UNRECLAIMABLE)),
+ global_page_state(NR_KERNEL_STACK) * THREAD_SIZE / 1024,
K(global_page_state(NR_PAGETABLE)),
#ifdef CONFIG_QUICKLIST
K(quicklist_total_size()),
Index: b/include/linux/mmzone.h
===================================================================
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -94,10 +94,11 @@ enum zone_stat_item {
NR_SLAB_RECLAIMABLE,
NR_SLAB_UNRECLAIMABLE,
NR_PAGETABLE, /* used for pagetables */
+ NR_KERNEL_STACK,
+ /* Second 128 byte cacheline */
NR_UNSTABLE_NFS, /* NFS unstable pages */
NR_BOUNCE,
NR_VMSCAN_WRITE,
- /* Second 128 byte cacheline */
NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */
#ifdef CONFIG_NUMA
NUMA_HIT, /* allocated in intended node */
Index: b/kernel/fork.c
===================================================================
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -137,9 +137,17 @@ struct kmem_cache *vm_area_cachep;
/* SLAB cache for mm_struct structures (tsk->mm) */
static struct kmem_cache *mm_cachep;

+static void account_kernel_stack(struct thread_info *ti, int account)
+{
+ struct zone *zone = page_zone(virt_to_page(ti));
+
+ mod_zone_page_state(zone, NR_KERNEL_STACK, account);
+}
+
void free_task(struct task_struct *tsk)
{
prop_local_destroy_single(&tsk->dirties);
+ account_kernel_stack(tsk->stack, -1);
free_thread_info(tsk->stack);
rt_mutex_debug_task_free(tsk);
ftrace_graph_exit_task(tsk);
@@ -255,6 +263,9 @@ static struct task_struct *dup_task_stru
tsk->btrace_seq = 0;
#endif
tsk->splice_pipe = NULL;
+
+ account_kernel_stack(ti, 1);
+
return tsk;

out:
Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2158,6 +2158,7 @@ void show_free_areas(void)
" mapped:%lukB"
" slab_reclaimable:%lukB"
" slab_unreclaimable:%lukB"
+ " kernel_stack:%lukB"
" pagetables:%lukB"
" unstable:%lukB"
" bounce:%lukB"
@@ -2182,6 +2183,8 @@ void show_free_areas(void)
K(zone_page_state(zone, NR_FILE_MAPPED)),
K(zone_page_state(zone, NR_SLAB_RECLAIMABLE)),
K(zone_page_state(zone, NR_SLAB_UNRECLAIMABLE)),
+ zone_page_state(zone, NR_KERNEL_STACK) *
+ THREAD_SIZE / 1024,
K(zone_page_state(zone, NR_PAGETABLE)),
K(zone_page_state(zone, NR_UNSTABLE_NFS)),
K(zone_page_state(zone, NR_BOUNCE)),
Index: b/mm/vmstat.c
===================================================================
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -639,6 +639,7 @@ static const char * const vmstat_text[]
"nr_slab_reclaimable",
"nr_slab_unreclaimable",
"nr_page_table_pages",
+ "nr_kernel_stack",
"nr_unstable",
"nr_bounce",
"nr_vmscan_write",
Index: b/drivers/base/node.c
===================================================================
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -85,6 +85,7 @@ static ssize_t node_read_meminfo(struct
"Node %d FilePages: %8lu kB\n"
"Node %d Mapped: %8lu kB\n"
"Node %d AnonPages: %8lu kB\n"
+ "Node %d KernelStack: %8lu kB\n"
"Node %d PageTables: %8lu kB\n"
"Node %d NFS_Unstable: %8lu kB\n"
"Node %d Bounce: %8lu kB\n"
@@ -116,6 +117,8 @@ static ssize_t node_read_meminfo(struct
nid, K(node_page_state(nid, NR_FILE_PAGES)),
nid, K(node_page_state(nid, NR_FILE_MAPPED)),
nid, K(node_page_state(nid, NR_ANON_PAGES)),
+ nid, node_page_state(nid, NR_KERNEL_STACK) *
+ THREAD_SIZE / 1024,
nid, K(node_page_state(nid, NR_PAGETABLE)),
nid, K(node_page_state(nid, NR_UNSTABLE_NFS)),
nid, K(node_page_state(nid, NR_BOUNCE)),

2009-07-05 09:25:39

by KOSAKI Motohiro

Subject: [PATCH 4/5] add isolate pages vmstat

Subject: [PATCH] add isolate pages vmstat

If the system has many threads or processes, concurrent reclaim can
isolate a very large number of pages.
Unfortunately, the current /proc/meminfo and OOM log can't show it.

This patch provides a way to show this information.


How to reproduce
-----------------------
% ./hackbench 140 process 1000
=> causes OOM

Active_anon:4419 active_file:120 inactive_anon:1418
inactive_file:61 unevictable:0 isolated:45311
^^^^^
dirty:0 writeback:580 unstable:0
free:27 slab_reclaimable:297 slab_unreclaimable:4050
mapped:221 kernel_stack:5758 pagetables:28219 bounce:0



Signed-off-by: KOSAKI Motohiro <[email protected]>
---
drivers/base/node.c | 2 ++
fs/proc/meminfo.c | 2 ++
include/linux/mmzone.h | 1 +
mm/page_alloc.c | 6 ++++--
mm/vmscan.c | 4 ++++
mm/vmstat.c | 2 +-
6 files changed, 14 insertions(+), 3 deletions(-)

Index: b/fs/proc/meminfo.c
===================================================================
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -65,6 +65,7 @@ static int meminfo_proc_show(struct seq_
"Active(file): %8lu kB\n"
"Inactive(file): %8lu kB\n"
"Unevictable: %8lu kB\n"
+ "IsolatedPages: %8lu kB\n"
"Mlocked: %8lu kB\n"
#ifdef CONFIG_HIGHMEM
"HighTotal: %8lu kB\n"
@@ -109,6 +110,7 @@ static int meminfo_proc_show(struct seq_
K(pages[LRU_ACTIVE_FILE]),
K(pages[LRU_INACTIVE_FILE]),
K(pages[LRU_UNEVICTABLE]),
+ K(global_page_state(NR_ISOLATED)),
K(global_page_state(NR_MLOCK)),
#ifdef CONFIG_HIGHMEM
K(i.totalhigh),
Index: b/include/linux/mmzone.h
===================================================================
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -100,6 +100,7 @@ enum zone_stat_item {
NR_BOUNCE,
NR_VMSCAN_WRITE,
NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */
+ NR_ISOLATED, /* Temporary isolated pages from lru */
#ifdef CONFIG_NUMA
NUMA_HIT, /* allocated in intended node */
NUMA_MISS, /* allocated in non intended node */
Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2116,8 +2116,7 @@ void show_free_areas(void)
}

printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
- " inactive_file:%lu"
- " unevictable:%lu"
+ " inactive_file:%lu unevictable:%lu isolated:%lu\n"
" dirty:%lu writeback:%lu buffer:%lu unstable:%lu\n"
" free:%lu slab_reclaimable:%lu slab_unreclaimable:%lu\n"
" mapped:%lu pagetables:%lu bounce:%lu\n",
@@ -2126,6 +2125,7 @@ void show_free_areas(void)
global_page_state(NR_INACTIVE_ANON),
global_page_state(NR_INACTIVE_FILE),
global_page_state(NR_UNEVICTABLE),
+ global_page_state(NR_ISOLATED),
global_page_state(NR_FILE_DIRTY),
global_page_state(NR_WRITEBACK),
K(nr_blockdev_pages()),
@@ -2151,6 +2151,7 @@ void show_free_areas(void)
" active_file:%lukB"
" inactive_file:%lukB"
" unevictable:%lukB"
+ " isolated:%lukB"
" present:%lukB"
" mlocked:%lukB"
" dirty:%lukB"
@@ -2176,6 +2177,7 @@ void show_free_areas(void)
K(zone_page_state(zone, NR_ACTIVE_FILE)),
K(zone_page_state(zone, NR_INACTIVE_FILE)),
K(zone_page_state(zone, NR_UNEVICTABLE)),
+ K(zone_page_state(zone, NR_ISOLATED)),
K(zone->present_pages),
K(zone_page_state(zone, NR_MLOCK)),
K(zone_page_state(zone, NR_FILE_DIRTY)),
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1082,6 +1082,7 @@ static unsigned long shrink_inactive_lis
-count[LRU_ACTIVE_ANON]);
__mod_zone_page_state(zone, NR_INACTIVE_ANON,
-count[LRU_INACTIVE_ANON]);
+ __mod_zone_page_state(zone, NR_ISOLATED, nr_taken);

if (scanning_global_lru(sc))
zone->pages_scanned += nr_scan;
@@ -1131,6 +1132,7 @@ static unsigned long shrink_inactive_lis
goto done;

spin_lock(&zone->lru_lock);
+ __mod_zone_page_state(zone, NR_ISOLATED, -nr_taken);
/*
* Put back any unfreeable pages.
*/
@@ -1232,6 +1234,7 @@ static void move_active_pages_to_lru(str
}
}
__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+ __mod_zone_page_state(zone, NR_ISOLATED, -pgmoved);
if (!is_active_lru(lru))
__count_vm_events(PGDEACTIVATE, pgmoved);
}
@@ -1267,6 +1270,7 @@ static void shrink_active_list(unsigned
__mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved);
else
__mod_zone_page_state(zone, NR_ACTIVE_ANON, -pgmoved);
+ __mod_zone_page_state(zone, NR_ISOLATED, pgmoved);
spin_unlock_irq(&zone->lru_lock);

pgmoved = 0; /* count referenced (mapping) mapped pages */
Index: b/mm/vmstat.c
===================================================================
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -644,7 +644,7 @@ static const char * const vmstat_text[]
"nr_bounce",
"nr_vmscan_write",
"nr_writeback_temp",
-
+ "nr_isolated_pages",
#ifdef CONFIG_NUMA
"numa_hit",
"numa_miss",
Index: b/drivers/base/node.c
===================================================================
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -73,6 +73,7 @@ static ssize_t node_read_meminfo(struct
"Node %d Active(file): %8lu kB\n"
"Node %d Inactive(file): %8lu kB\n"
"Node %d Unevictable: %8lu kB\n"
+ "Node %d IsolatedPages: %8lu kB\n"
"Node %d Mlocked: %8lu kB\n"
#ifdef CONFIG_HIGHMEM
"Node %d HighTotal: %8lu kB\n"
@@ -105,6 +106,7 @@ static ssize_t node_read_meminfo(struct
nid, K(node_page_state(nid, NR_ACTIVE_FILE)),
nid, K(node_page_state(nid, NR_INACTIVE_FILE)),
nid, K(node_page_state(nid, NR_UNEVICTABLE)),
+ nid, K(node_page_state(nid, NR_ISOLATED)),
nid, K(node_page_state(nid, NR_MLOCK)),
#ifdef CONFIG_HIGHMEM
nid, K(i.totalhigh),

2009-07-05 09:26:27

by KOSAKI Motohiro

Subject: [PATCH 5/5] add NR_ANON_PAGES to OOM log

Subject: [PATCH] add NR_ANON_PAGES to OOM log

show_free_areas() can display NR_FILE_PAGES, but it can't display
NR_ANON_PAGES.

This patch fixes that inconsistency.


Reported-by: Wu Fengguang <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/page_alloc.c | 1 +
1 file changed, 1 insertion(+)

Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2216,6 +2216,7 @@ void show_free_areas(void)
printk("= %lukB\n", K(total));
}

+ printk("%ld total anon pages\n", global_page_state(NR_ANON_PAGES));
printk("%ld total pagecache pages\n", global_page_state(NR_FILE_PAGES));

show_swap_cache_info();

2009-07-05 11:14:55

by Fengguang Wu

Subject: Re: [PATCH 1/5] add per-zone statistics to show_free_areas()

On Sun, Jul 05, 2009 at 07:09:55PM +0800, KOSAKI Motohiro wrote:
> > On Sun, Jul 05, 2009 at 05:23:35PM +0800, KOSAKI Motohiro wrote:
> > > Subject: [PATCH] add per-zone statistics to show_free_areas()
> > >
> > > Currently, show_free_areas() mainly displays system-wide memory usage, but it
> > > doesn't display per-zone memory usage information.
> > >
> > > However, if an OOM occurs in the DMA zone, the administrator definitely needs
> > > to know the per-zone memory usage information.
> >
> > DMA zone is normally lowmem-reserved. But I think the numbers still
> > make sense for DMA32.
> >
> > Acked-by: Wu Fengguang <[email protected]>
>
> Yes, x86_64 has DMA and DMA32, but almost all 64-bit architectures have
> a 2 or 4GB "DMA" zone.

Ah Yes!

> So I wrote the patch description using the generic name.

OK.

Thanks,
Fengguang

2009-07-05 11:22:18

by Fengguang Wu

Subject: Re: [PATCH 2/5] add buffer cache information to show_free_areas()

On Sun, Jul 05, 2009 at 05:24:07PM +0800, KOSAKI Motohiro wrote:
> Subject: [PATCH] add buffer cache information to show_free_areas()
>
> When an administrator analyzes the reason for a memory shortage from the OOM
> log, they often need to know the remaining number of cache-like pages.

nr_blockdev_pages() pages are also accounted in NR_FILE_PAGES.

> So show_free_areas() should display not only the page cache but also
> the buffer cache.

So if we are to add this, I'd suggest putting it close to the total
pagecache line:

printk("%ld total pagecache pages\n", global_page_state(NR_FILE_PAGES));
+ printk("%ld blkdev pagecache pages\n", nr_blockdev_pages());

Thanks,
Fengguang

2009-07-05 11:31:17

by KOSAKI Motohiro

Subject: Re: [PATCH 2/5] add buffer cache information to show_free_areas()

> On Sun, Jul 05, 2009 at 05:24:07PM +0800, KOSAKI Motohiro wrote:
> > Subject: [PATCH] add buffer cache information to show_free_areas()
> >
> > When an administrator analyzes the reason for a memory shortage from the OOM
> > log, they often need to know the remaining number of cache-like pages.
>
> nr_blockdev_pages() pages are also accounted in NR_FILE_PAGES.

Yes, I know.

> > So show_free_areas() should display not only the page cache but also
> > the buffer cache.
>
> So if we are to add this, I'd suggest putting it close to the total
> pagecache line:
>
> printk("%ld total pagecache pages\n", global_page_state(NR_FILE_PAGES));
> + printk("%ld blkdev pagecache pages\n", nr_blockdev_pages());

But this is intentional. May I explain why I chose the non-verbose area?
In a typical workload, buffer pages don't consume many pages, so
I feel that your idea makes the output too verbose. In addition, if there are
many buffer pages, the administrator wants to see the other I/O-related vmstat
counters at the same time.

That is why I chose the current position.


Thanks.


2009-07-05 11:06:08

by Fengguang Wu

Subject: Re: [PATCH 1/5] add per-zone statistics to show_free_areas()

On Sun, Jul 05, 2009 at 05:23:35PM +0800, KOSAKI Motohiro wrote:
> Subject: [PATCH] add per-zone statistics to show_free_areas()
>
> Currently, show_free_areas() mainly displays system-wide memory usage, but it
> doesn't display per-zone memory usage information.
>
> However, if an OOM occurs in the DMA zone, the administrator definitely needs
> to know the per-zone memory usage information.

DMA zone is normally lowmem-reserved. But I think the numbers still
make sense for DMA32.

Acked-by: Wu Fengguang <[email protected]>

>
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> mm/page_alloc.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> Index: b/mm/page_alloc.c
> ===================================================================
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2151,6 +2151,16 @@ void show_free_areas(void)
> " inactive_file:%lukB"
> " unevictable:%lukB"
> " present:%lukB"
> + " mlocked:%lukB"
> + " dirty:%lukB"
> + " writeback:%lukB"
> + " mapped:%lukB"
> + " slab_reclaimable:%lukB"
> + " slab_unreclaimable:%lukB"
> + " pagetables:%lukB"
> + " unstable:%lukB"
> + " bounce:%lukB"
> + " writeback_tmp:%lukB"
> " pages_scanned:%lu"
> " all_unreclaimable? %s"
> "\n",
> @@ -2165,6 +2175,16 @@ void show_free_areas(void)
> K(zone_page_state(zone, NR_INACTIVE_FILE)),
> K(zone_page_state(zone, NR_UNEVICTABLE)),
> K(zone->present_pages),
> + K(zone_page_state(zone, NR_MLOCK)),
> + K(zone_page_state(zone, NR_FILE_DIRTY)),
> + K(zone_page_state(zone, NR_WRITEBACK)),
> + K(zone_page_state(zone, NR_FILE_MAPPED)),
> + K(zone_page_state(zone, NR_SLAB_RECLAIMABLE)),
> + K(zone_page_state(zone, NR_SLAB_UNRECLAIMABLE)),
> + K(zone_page_state(zone, NR_PAGETABLE)),
> + K(zone_page_state(zone, NR_UNSTABLE_NFS)),
> + K(zone_page_state(zone, NR_BOUNCE)),
> + K(zone_page_state(zone, NR_WRITEBACK_TEMP)),
> zone->pages_scanned,
> (zone_is_all_unreclaimable(zone) ? "yes" : "no")
> );
>

2009-07-05 11:10:04

by KOSAKI Motohiro

Subject: Re: [PATCH 1/5] add per-zone statistics to show_free_areas()

> On Sun, Jul 05, 2009 at 05:23:35PM +0800, KOSAKI Motohiro wrote:
> > Subject: [PATCH] add per-zone statistics to show_free_areas()
> >
> > Currently, show_free_areas() mainly displays system-wide memory usage, but it
> > doesn't display per-zone memory usage information.
> >
> > However, if an OOM occurs in the DMA zone, the administrator definitely needs
> > to know the per-zone memory usage information.
>
> DMA zone is normally lowmem-reserved. But I think the numbers still
> make sense for DMA32.
>
> Acked-by: Wu Fengguang <[email protected]>

Yes, x86_64 has DMA and DMA32, but almost all 64-bit architectures have
a 2 or 4GB "DMA" zone.
So I wrote the patch description using the generic name.

Thanks.

2009-07-05 12:07:23

by Fengguang Wu

Subject: Re: [PATCH 2/5] add buffer cache information to show_free_areas()

On Sun, Jul 05, 2009 at 07:31:02PM +0800, KOSAKI Motohiro wrote:
> > On Sun, Jul 05, 2009 at 05:24:07PM +0800, KOSAKI Motohiro wrote:
> > > Subject: [PATCH] add buffer cache information to show_free_areas()
> > >
> > > When an administrator analyzes the reason for a memory shortage from the OOM
> > > log, they often need to know the remaining number of cache-like pages.
> >
> > nr_blockdev_pages() pages are also accounted in NR_FILE_PAGES.
>
> Yes, I know.
>
> > > So show_free_areas() should display not only the page cache but also
> > > the buffer cache.
> >
> > So if we are to add this, I'd suggest putting it close to the total
> > pagecache line:
> >
> > printk("%ld total pagecache pages\n", global_page_state(NR_FILE_PAGES));
> > + printk("%ld blkdev pagecache pages\n", nr_blockdev_pages());
>
> But this is intentional. May I explain why I chose the non-verbose area?
> In a typical workload, buffer pages don't consume many pages, so
> I feel that your idea makes the output too verbose. In addition, if there are
> many buffer pages, the administrator wants to see the other I/O-related vmstat
> counters at the same time.

> > > + " dirty:%lu writeback:%lu buffer:%lu unstable:%lu\n"

Now I see your point. It makes sense to put typically small numbers together.

Acked-by: Wu Fengguang <[email protected]>

Thanks,
Fengguang

2009-07-05 12:10:25

by Fengguang Wu

Subject: Re: [PATCH 4/5] add isolate pages vmstat

On Sun, Jul 05, 2009 at 05:25:32PM +0800, KOSAKI Motohiro wrote:
> Subject: [PATCH] add isolate pages vmstat
>
> If the system has many threads or processes, concurrent reclaim can
> isolate a very large number of pages.
> Unfortunately, the current /proc/meminfo and OOM log can't show it.
>
> This patch provides a way to show this information.
>
>
> How to reproduce
> -----------------------
> % ./hackbench 140 process 1000
> => causes OOM
>
> Active_anon:4419 active_file:120 inactive_anon:1418
> inactive_file:61 unevictable:0 isolated:45311
> ^^^^^
> dirty:0 writeback:580 unstable:0
> free:27 slab_reclaimable:297 slab_unreclaimable:4050
> mapped:221 kernel_stack:5758 pagetables:28219 bounce:0
>
>
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> drivers/base/node.c | 2 ++
> fs/proc/meminfo.c | 2 ++
> include/linux/mmzone.h | 1 +
> mm/page_alloc.c | 6 ++++--
> mm/vmscan.c | 4 ++++
> mm/vmstat.c | 2 +-
> 6 files changed, 14 insertions(+), 3 deletions(-)
>
> Index: b/fs/proc/meminfo.c
> ===================================================================
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -65,6 +65,7 @@ static int meminfo_proc_show(struct seq_
> "Active(file): %8lu kB\n"
> "Inactive(file): %8lu kB\n"
> "Unevictable: %8lu kB\n"
> + "IsolatedPages: %8lu kB\n"
> "Mlocked: %8lu kB\n"
> #ifdef CONFIG_HIGHMEM
> "HighTotal: %8lu kB\n"
> @@ -109,6 +110,7 @@ static int meminfo_proc_show(struct seq_
> K(pages[LRU_ACTIVE_FILE]),
> K(pages[LRU_INACTIVE_FILE]),
> K(pages[LRU_UNEVICTABLE]),
> + K(global_page_state(NR_ISOLATED)),

Glad to see you renamed it to NR_ISOLATED :)
But for the user visible name, how about IsolatedLRU?

Acked-by: Wu Fengguang <[email protected]>

> K(global_page_state(NR_MLOCK)),
> #ifdef CONFIG_HIGHMEM
> K(i.totalhigh),
> Index: b/include/linux/mmzone.h
> ===================================================================
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -100,6 +100,7 @@ enum zone_stat_item {
> NR_BOUNCE,
> NR_VMSCAN_WRITE,
> NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */
> + NR_ISOLATED, /* Temporary isolated pages from lru */
> #ifdef CONFIG_NUMA
> NUMA_HIT, /* allocated in intended node */
> NUMA_MISS, /* allocated in non intended node */
> Index: b/mm/page_alloc.c
> ===================================================================
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2116,8 +2116,7 @@ void show_free_areas(void)
> }
>
> printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
> - " inactive_file:%lu"
> - " unevictable:%lu"
> + " inactive_file:%lu unevictable:%lu isolated:%lu\n"
> " dirty:%lu writeback:%lu buffer:%lu unstable:%lu\n"
> " free:%lu slab_reclaimable:%lu slab_unreclaimable:%lu\n"
> " mapped:%lu pagetables:%lu bounce:%lu\n",
> @@ -2126,6 +2125,7 @@ void show_free_areas(void)
> global_page_state(NR_INACTIVE_ANON),
> global_page_state(NR_INACTIVE_FILE),
> global_page_state(NR_UNEVICTABLE),
> + global_page_state(NR_ISOLATED),
> global_page_state(NR_FILE_DIRTY),
> global_page_state(NR_WRITEBACK),
> K(nr_blockdev_pages()),
> @@ -2151,6 +2151,7 @@ void show_free_areas(void)
> " active_file:%lukB"
> " inactive_file:%lukB"
> " unevictable:%lukB"
> + " isolated:%lukB"
> " present:%lukB"
> " mlocked:%lukB"
> " dirty:%lukB"
> @@ -2176,6 +2177,7 @@ void show_free_areas(void)
> K(zone_page_state(zone, NR_ACTIVE_FILE)),
> K(zone_page_state(zone, NR_INACTIVE_FILE)),
> K(zone_page_state(zone, NR_UNEVICTABLE)),
> + K(zone_page_state(zone, NR_ISOLATED)),
> K(zone->present_pages),
> K(zone_page_state(zone, NR_MLOCK)),
> K(zone_page_state(zone, NR_FILE_DIRTY)),
> Index: b/mm/vmscan.c
> ===================================================================
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1082,6 +1082,7 @@ static unsigned long shrink_inactive_lis
> -count[LRU_ACTIVE_ANON]);
> __mod_zone_page_state(zone, NR_INACTIVE_ANON,
> -count[LRU_INACTIVE_ANON]);
> + __mod_zone_page_state(zone, NR_ISOLATED, nr_taken);
>
> if (scanning_global_lru(sc))
> zone->pages_scanned += nr_scan;
> @@ -1131,6 +1132,7 @@ static unsigned long shrink_inactive_lis
> goto done;
>
> spin_lock(&zone->lru_lock);
> + __mod_zone_page_state(zone, NR_ISOLATED, -nr_taken);
> /*
> * Put back any unfreeable pages.
> */
> @@ -1232,6 +1234,7 @@ static void move_active_pages_to_lru(str
> }
> }
> __mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
> + __mod_zone_page_state(zone, NR_ISOLATED, -pgmoved);
> if (!is_active_lru(lru))
> __count_vm_events(PGDEACTIVATE, pgmoved);
> }
> @@ -1267,6 +1270,7 @@ static void shrink_active_list(unsigned
> __mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved);
> else
> __mod_zone_page_state(zone, NR_ACTIVE_ANON, -pgmoved);
> + __mod_zone_page_state(zone, NR_ISOLATED, pgmoved);
> spin_unlock_irq(&zone->lru_lock);
>
> pgmoved = 0; /* count referenced (mapping) mapped pages */
> Index: b/mm/vmstat.c
> ===================================================================
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -644,7 +644,7 @@ static const char * const vmstat_text[]
> "nr_bounce",
> "nr_vmscan_write",
> "nr_writeback_temp",
> -
> + "nr_isolated_pages",
> #ifdef CONFIG_NUMA
> "numa_hit",
> "numa_miss",
> Index: b/drivers/base/node.c
> ===================================================================
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -73,6 +73,7 @@ static ssize_t node_read_meminfo(struct
> "Node %d Active(file): %8lu kB\n"
> "Node %d Inactive(file): %8lu kB\n"
> "Node %d Unevictable: %8lu kB\n"
> + "Node %d IsolatedPages: %8lu kB\n"
> "Node %d Mlocked: %8lu kB\n"
> #ifdef CONFIG_HIGHMEM
> "Node %d HighTotal: %8lu kB\n"
> @@ -105,6 +106,7 @@ static ssize_t node_read_meminfo(struct
> nid, K(node_page_state(nid, NR_ACTIVE_FILE)),
> nid, K(node_page_state(nid, NR_INACTIVE_FILE)),
> nid, K(node_page_state(nid, NR_UNEVICTABLE)),
> + nid, K(node_page_state(nid, NR_ISOLATED)),
> nid, K(node_page_state(nid, NR_MLOCK)),
> #ifdef CONFIG_HIGHMEM
> nid, K(i.totalhigh),
>

2009-07-05 12:13:25

by Fengguang Wu

Subject: Re: [PATCH 5/5] add NR_ANON_PAGES to OOM log

On Sun, Jul 05, 2009 at 05:26:18PM +0800, KOSAKI Motohiro wrote:
> Subject: [PATCH] add NR_ANON_PAGES to OOM log
>
> show_free_areas() can display NR_FILE_PAGES, but it can't display
> NR_ANON_PAGES.
>
> This patch fixes that inconsistency.
>
>
> Reported-by: Wu Fengguang <[email protected]>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> mm/page_alloc.c | 1 +
> 1 file changed, 1 insertion(+)
>
> Index: b/mm/page_alloc.c
> ===================================================================
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2216,6 +2216,7 @@ void show_free_areas(void)
> printk("= %lukB\n", K(total));
> }
>
> + printk("%ld total anon pages\n", global_page_state(NR_ANON_PAGES));
> printk("%ld total pagecache pages\n", global_page_state(NR_FILE_PAGES));

Can we put related items together, ie. this looks more friendly:

Anon:XXX active_anon:XXX inactive_anon:XXX
File:XXX active_file:XXX inactive_file:XXX

Thanks,
Fengguang

2009-07-05 12:21:27

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 5/5] add NR_ANON_PAGES to OOM log

> On Sun, Jul 05, 2009 at 05:26:18PM +0800, KOSAKI Motohiro wrote:
> > Subject: [PATCH] add NR_ANON_PAGES to OOM log
> >
> > show_free_areas can display NR_FILE_PAGES, but it can't display
> > NR_ANON_PAGES.
> >
> > this patch fix its inconsistency.
> >
> >
> > Reported-by: Wu Fengguang <[email protected]>
> > Signed-off-by: KOSAKI Motohiro <[email protected]>
> > ---
> > mm/page_alloc.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > Index: b/mm/page_alloc.c
> > ===================================================================
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2216,6 +2216,7 @@ void show_free_areas(void)
> > printk("= %lukB\n", K(total));
> > }
> >
> > + printk("%ld total anon pages\n", global_page_state(NR_ANON_PAGES));
> > printk("%ld total pagecache pages\n", global_page_state(NR_FILE_PAGES));
>
> Can we put related items together, ie. this looks more friendly:
>
> Anon:XXX active_anon:XXX inactive_anon:XXX
> File:XXX active_file:XXX inactive_file:XXX

Hmm. Actually, NR_ACTIVE_ANON + NR_INACTIVE_ANON != NR_ANON_PAGES.
tmpfs pages are accounted as FILE, but they stay on the anon LRU.

I think your proposed format easily causes confusion. It leads the reader
to imagine that Anon = active_anon + inactive_anon.

At least, I think we need to use another name.


2009-07-05 12:23:58

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 4/5] add isolate pages vmstat

> On Sun, Jul 05, 2009 at 05:25:32PM +0800, KOSAKI Motohiro wrote:
> > Subject: [PATCH] add isolate pages vmstat
> >
> > If the system have plenty threads or processes, concurrent reclaim can
> > isolate very much pages.
> > Unfortunately, current /proc/meminfo and OOM log can't show it.
> >
> > This patch provide the way of showing this information.
> >
> >
> > reproduce way
> > -----------------------
> > % ./hackbench 140 process 1000
> > => couse OOM
> >
> > Active_anon:4419 active_file:120 inactive_anon:1418
> > inactive_file:61 unevictable:0 isolated:45311
> > ^^^^^
> > dirty:0 writeback:580 unstable:0
> > free:27 slab_reclaimable:297 slab_unreclaimable:4050
> > mapped:221 kernel_stack:5758 pagetables:28219 bounce:0
> >
> >
> >
> > Signed-off-by: KOSAKI Motohiro <[email protected]>
> > ---
> > drivers/base/node.c | 2 ++
> > fs/proc/meminfo.c | 2 ++
> > include/linux/mmzone.h | 1 +
> > mm/page_alloc.c | 6 ++++--
> > mm/vmscan.c | 4 ++++
> > mm/vmstat.c | 2 +-
> > 6 files changed, 14 insertions(+), 3 deletions(-)
> >
> > Index: b/fs/proc/meminfo.c
> > ===================================================================
> > --- a/fs/proc/meminfo.c
> > +++ b/fs/proc/meminfo.c
> > @@ -65,6 +65,7 @@ static int meminfo_proc_show(struct seq_
> > "Active(file): %8lu kB\n"
> > "Inactive(file): %8lu kB\n"
> > "Unevictable: %8lu kB\n"
> > + "IsolatedPages: %8lu kB\n"
> > "Mlocked: %8lu kB\n"
> > #ifdef CONFIG_HIGHMEM
> > "HighTotal: %8lu kB\n"
> > @@ -109,6 +110,7 @@ static int meminfo_proc_show(struct seq_
> > K(pages[LRU_ACTIVE_FILE]),
> > K(pages[LRU_INACTIVE_FILE]),
> > K(pages[LRU_UNEVICTABLE]),
> > + K(global_page_state(NR_ISOLATED)),
>
> Glad to see you renamed it to NR_ISOLATED :)
> But for the user visible name, how about IsolatedLRU?

Ah, nice. Below is the updated patch.

Changelog
----------------
since v1
- rename "IsolatedPages" to "IsolatedLRU"


=================================
Subject: [PATCH] add isolate pages vmstat

If the system has plenty of threads or processes, concurrent reclaim can
isolate a very large number of pages.
Unfortunately, the current /proc/meminfo and OOM log can't show this.

This patch provides a way of showing this information.


How to reproduce
-----------------------
% ./hackbench 140 process 1000
   => causes OOM

Active_anon:4419 active_file:120 inactive_anon:1418
 inactive_file:61 unevictable:0 isolated:45311
                                         ^^^^^
 dirty:0 writeback:580 unstable:0
 free:27 slab_reclaimable:297 slab_unreclaimable:4050
 mapped:221 kernel_stack:5758 pagetables:28219 bounce:0



Signed-off-by: KOSAKI Motohiro <[email protected]>
Acked-by: Wu Fengguang <[email protected]>
---
drivers/base/node.c | 2 ++
fs/proc/meminfo.c | 2 ++
include/linux/mmzone.h | 1 +
mm/page_alloc.c | 6 ++++--
mm/vmscan.c | 4 ++++
mm/vmstat.c | 2 +-
6 files changed, 14 insertions(+), 3 deletions(-)

Index: b/fs/proc/meminfo.c
===================================================================
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -65,6 +65,7 @@ static int meminfo_proc_show(struct seq_
"Active(file): %8lu kB\n"
"Inactive(file): %8lu kB\n"
"Unevictable: %8lu kB\n"
+ "IsolatedLRU: %8lu kB\n"
"Mlocked: %8lu kB\n"
#ifdef CONFIG_HIGHMEM
"HighTotal: %8lu kB\n"
@@ -109,6 +110,7 @@ static int meminfo_proc_show(struct seq_
K(pages[LRU_ACTIVE_FILE]),
K(pages[LRU_INACTIVE_FILE]),
K(pages[LRU_UNEVICTABLE]),
+ K(global_page_state(NR_ISOLATED)),
K(global_page_state(NR_MLOCK)),
#ifdef CONFIG_HIGHMEM
K(i.totalhigh),
Index: b/include/linux/mmzone.h
===================================================================
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -100,6 +100,7 @@ enum zone_stat_item {
NR_BOUNCE,
NR_VMSCAN_WRITE,
NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */
+ NR_ISOLATED, /* Temporary isolated pages from lru */
#ifdef CONFIG_NUMA
NUMA_HIT, /* allocated in intended node */
NUMA_MISS, /* allocated in non intended node */
Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2116,8 +2116,7 @@ void show_free_areas(void)
}

printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
- " inactive_file:%lu"
- " unevictable:%lu"
+ " inactive_file:%lu unevictable:%lu isolated:%lu\n"
" dirty:%lu writeback:%lu buffer:%lu unstable:%lu\n"
" free:%lu slab_reclaimable:%lu slab_unreclaimable:%lu\n"
" mapped:%lu pagetables:%lu bounce:%lu\n",
@@ -2126,6 +2125,7 @@ void show_free_areas(void)
global_page_state(NR_INACTIVE_ANON),
global_page_state(NR_INACTIVE_FILE),
global_page_state(NR_UNEVICTABLE),
+ global_page_state(NR_ISOLATED),
global_page_state(NR_FILE_DIRTY),
global_page_state(NR_WRITEBACK),
K(nr_blockdev_pages()),
@@ -2151,6 +2151,7 @@ void show_free_areas(void)
" active_file:%lukB"
" inactive_file:%lukB"
" unevictable:%lukB"
+ " isolated:%lukB"
" present:%lukB"
" mlocked:%lukB"
" dirty:%lukB"
@@ -2176,6 +2177,7 @@ void show_free_areas(void)
K(zone_page_state(zone, NR_ACTIVE_FILE)),
K(zone_page_state(zone, NR_INACTIVE_FILE)),
K(zone_page_state(zone, NR_UNEVICTABLE)),
+ K(zone_page_state(zone, NR_ISOLATED)),
K(zone->present_pages),
K(zone_page_state(zone, NR_MLOCK)),
K(zone_page_state(zone, NR_FILE_DIRTY)),
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1082,6 +1082,7 @@ static unsigned long shrink_inactive_lis
-count[LRU_ACTIVE_ANON]);
__mod_zone_page_state(zone, NR_INACTIVE_ANON,
-count[LRU_INACTIVE_ANON]);
+ __mod_zone_page_state(zone, NR_ISOLATED, nr_taken);

if (scanning_global_lru(sc))
zone->pages_scanned += nr_scan;
@@ -1131,6 +1132,7 @@ static unsigned long shrink_inactive_lis
goto done;

spin_lock(&zone->lru_lock);
+ __mod_zone_page_state(zone, NR_ISOLATED, -nr_taken);
/*
* Put back any unfreeable pages.
*/
@@ -1232,6 +1234,7 @@ static void move_active_pages_to_lru(str
}
}
__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+ __mod_zone_page_state(zone, NR_ISOLATED, -pgmoved);
if (!is_active_lru(lru))
__count_vm_events(PGDEACTIVATE, pgmoved);
}
@@ -1267,6 +1270,7 @@ static void shrink_active_list(unsigned
__mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved);
else
__mod_zone_page_state(zone, NR_ACTIVE_ANON, -pgmoved);
+ __mod_zone_page_state(zone, NR_ISOLATED, pgmoved);
spin_unlock_irq(&zone->lru_lock);

pgmoved = 0; /* count referenced (mapping) mapped pages */
Index: b/mm/vmstat.c
===================================================================
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -644,7 +644,7 @@ static const char * const vmstat_text[]
"nr_bounce",
"nr_vmscan_write",
"nr_writeback_temp",
-
+ "nr_isolated_pages",
#ifdef CONFIG_NUMA
"numa_hit",
"numa_miss",
Index: b/drivers/base/node.c
===================================================================
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -73,6 +73,7 @@ static ssize_t node_read_meminfo(struct
"Node %d Active(file): %8lu kB\n"
"Node %d Inactive(file): %8lu kB\n"
"Node %d Unevictable: %8lu kB\n"
+ "Node %d IsolatedPages: %8lu kB\n"
"Node %d Mlocked: %8lu kB\n"
#ifdef CONFIG_HIGHMEM
"Node %d HighTotal: %8lu kB\n"
@@ -105,6 +106,7 @@ static ssize_t node_read_meminfo(struct
nid, K(node_page_state(nid, NR_ACTIVE_FILE)),
nid, K(node_page_state(nid, NR_INACTIVE_FILE)),
nid, K(node_page_state(nid, NR_UNEVICTABLE)),
+ nid, K(node_page_state(nid, NR_ISOLATED)),
nid, K(node_page_state(nid, NR_MLOCK)),
#ifdef CONFIG_HIGHMEM
nid, K(i.totalhigh),



2009-07-05 13:02:18

by Fengguang Wu

[permalink] [raw]
Subject: Re: [PATCH 5/5] add NR_ANON_PAGES to OOM log

On Sun, Jul 05, 2009 at 08:21:20PM +0800, KOSAKI Motohiro wrote:
> > On Sun, Jul 05, 2009 at 05:26:18PM +0800, KOSAKI Motohiro wrote:
> > > Subject: [PATCH] add NR_ANON_PAGES to OOM log
> > >
> > > show_free_areas can display NR_FILE_PAGES, but it can't display
> > > NR_ANON_PAGES.
> > >
> > > this patch fix its inconsistency.
> > >
> > >
> > > Reported-by: Wu Fengguang <[email protected]>
> > > Signed-off-by: KOSAKI Motohiro <[email protected]>
> > > ---
> > > mm/page_alloc.c | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > Index: b/mm/page_alloc.c
> > > ===================================================================
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -2216,6 +2216,7 @@ void show_free_areas(void)
> > > printk("= %lukB\n", K(total));
> > > }
> > >
> > > + printk("%ld total anon pages\n", global_page_state(NR_ANON_PAGES));
> > > printk("%ld total pagecache pages\n", global_page_state(NR_FILE_PAGES));
> >
> > Can we put related items together, ie. this looks more friendly:
> >
> > Anon:XXX active_anon:XXX inactive_anon:XXX
> > File:XXX active_file:XXX inactive_file:XXX
>
> hmmm. Actually NR_ACTIVE_ANON + NR_INACTIVE_ANON != NR_ANON_PAGES.
> tmpfs pages are accounted as FILE, but it is stay in anon lru.

Right, that's exactly the reason I propose to put them together: to
make the number of tmpfs pages obvious.

> I think your proposed format easily makes confusion. this format cause to
> imazine Anon = active_anon + inactive_anon.

Yes it may confuse normal users :(

> At least, we need to use another name, I think.

Hmm I find it hard to work out a good name.

But instead, it may be a good idea to explicitly compute the tmpfs
pages, because the excessive use of tmpfs pages could be a common
reason of OOM.

Thanks,
Fengguang

2009-07-05 13:19:57

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 5/5] add NR_ANON_PAGES to OOM log

>> > > + printk("%ld total anon pages\n", global_page_state(NR_ANON_PAGES));
>> > >   printk("%ld total pagecache pages\n", global_page_state(NR_FILE_PAGES));
>> >
>> > Can we put related items together, ie. this looks more friendly:
>> >
>> >         Anon:XXX active_anon:XXX inactive_anon:XXX
>> >         File:XXX active_file:XXX inactive_file:XXX
>>
>> hmmm. Actually NR_ACTIVE_ANON + NR_INACTIVE_ANON != NR_ANON_PAGES.
>> tmpfs pages are accounted as FILE, but it is stay in anon lru.
>
> Right, that's exactly the reason I propose to put them together: to
> make the number of tmpfs pages obvious.
>
>> I think your proposed format easily makes confusion. this format cause to
>> imazine Anon = active_anon + inactive_anon.
>
> Yes it may confuse normal users :(
>
>> At least, we need to use another name, I think.
>
> Hmm I find it hard to work out a good name.
>
> But instead, it may be a good idea to explicitly compute the tmpfs
> pages, because the excessive use of tmpfs pages could be a common
> reason of OOM.

Yeah, explicit tmpfs/shmem accounting is also useful for /proc/meminfo.

2009-07-05 14:51:51

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 4/5] add isolate pages vmstat

On Sun, Jul 5, 2009 at 9:23 PM, KOSAKI
Motohiro<[email protected]> wrote:
>> On Sun, Jul 05, 2009 at 05:25:32PM +0800, KOSAKI Motohiro wrote:
>> > Subject: [PATCH] add isolate pages vmstat
>> >
>> > If the system have plenty threads or processes, concurrent reclaim can
>> > isolate very much pages.
>> > Unfortunately, current /proc/meminfo and OOM log can't show it.
>> >
>> > This patch provide the way of showing this information.
>> >
>> >
>> > reproduce way
>> > -----------------------
>> > % ./hackbench 140 process 1000
>> >    => couse OOM
>> >
>> > Active_anon:4419 active_file:120 inactive_anon:1418
>> >  inactive_file:61 unevictable:0 isolated:45311
>> >                                          ^^^^^
>> >  dirty:0 writeback:580 unstable:0
>> >  free:27 slab_reclaimable:297 slab_unreclaimable:4050
>> >  mapped:221 kernel_stack:5758 pagetables:28219 bounce:0
>> >
>> >
>> >
>> > Signed-off-by: KOSAKI Motohiro <[email protected]>
>> > ---
>> >  drivers/base/node.c    |    2 ++
>> >  fs/proc/meminfo.c      |    2 ++
>> >  include/linux/mmzone.h |    1 +
>> >  mm/page_alloc.c        |    6 ++++--
>> >  mm/vmscan.c            |    4 ++++
>> >  mm/vmstat.c            |    2 +-
>> >  6 files changed, 14 insertions(+), 3 deletions(-)
>> >
>> > Index: b/fs/proc/meminfo.c
>> > ===================================================================
>> > --- a/fs/proc/meminfo.c
>> > +++ b/fs/proc/meminfo.c
>> > @@ -65,6 +65,7 @@ static int meminfo_proc_show(struct seq_
>> >             "Active(file):   %8lu kB\n"
>> >             "Inactive(file): %8lu kB\n"
>> >             "Unevictable:    %8lu kB\n"
>> > +           "IsolatedPages:  %8lu kB\n"
>> >             "Mlocked:        %8lu kB\n"
>> >  #ifdef CONFIG_HIGHMEM
>> >             "HighTotal:      %8lu kB\n"
>> > @@ -109,6 +110,7 @@ static int meminfo_proc_show(struct seq_
>> >             K(pages[LRU_ACTIVE_FILE]),
>> >             K(pages[LRU_INACTIVE_FILE]),
>> >             K(pages[LRU_UNEVICTABLE]),
>> > +           K(global_page_state(NR_ISOLATED)),
>>
>> Glad to see you renamed it to NR_ISOLATED :)
>> But for the user visible name, how about IsolatedLRU?
>
> Ah, nice.  below is update patch.
>
> Changelog
> ----------------
>  since v1
>    - rename "IsolatedPages" to "IsolatedLRU"
>
>
> =================================
> Subject: [PATCH] add isolate pages vmstat
>
> If the system have plenty threads or processes, concurrent reclaim can
> isolate very much pages.
> Unfortunately, current /proc/meminfo and OOM log can't show it.
>
> This patch provide the way of showing this information.
>
>
> reproduce way
> -----------------------
> % ./hackbench 140 process 1000
>   => couse OOM
>
> Active_anon:4419 active_file:120 inactive_anon:1418
>  inactive_file:61 unevictable:0 isolated:45311
>                                         ^^^^^
>  dirty:0 writeback:580 unstable:0
>  free:27 slab_reclaimable:297 slab_unreclaimable:4050
>  mapped:221 kernel_stack:5758 pagetables:28219 bounce:0
>
>
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> Acked-by: Wu Fengguang <[email protected]>
> ---
>  drivers/base/node.c    |    2 ++
>  fs/proc/meminfo.c      |    2 ++
>  include/linux/mmzone.h |    1 +
>  mm/page_alloc.c        |    6 ++++--
>  mm/vmscan.c            |    4 ++++
>  mm/vmstat.c            |    2 +-
>  6 files changed, 14 insertions(+), 3 deletions(-)
>
> Index: b/fs/proc/meminfo.c
> ===================================================================
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -65,6 +65,7 @@ static int meminfo_proc_show(struct seq_
>                "Active(file):   %8lu kB\n"
>                "Inactive(file): %8lu kB\n"
>                "Unevictable:    %8lu kB\n"
> +               "IsolatedLRU:    %8lu kB\n"
>                "Mlocked:        %8lu kB\n"
>  #ifdef CONFIG_HIGHMEM
>                "HighTotal:      %8lu kB\n"
> @@ -109,6 +110,7 @@ static int meminfo_proc_show(struct seq_
>                K(pages[LRU_ACTIVE_FILE]),
>                K(pages[LRU_INACTIVE_FILE]),
>                K(pages[LRU_UNEVICTABLE]),
> +               K(global_page_state(NR_ISOLATED)),
>                K(global_page_state(NR_MLOCK)),
>  #ifdef CONFIG_HIGHMEM
>                K(i.totalhigh),
> Index: b/include/linux/mmzone.h
> ===================================================================
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -100,6 +100,7 @@ enum zone_stat_item {
>        NR_BOUNCE,
>        NR_VMSCAN_WRITE,
>        NR_WRITEBACK_TEMP,      /* Writeback using temporary buffers */
> +       NR_ISOLATED,            /* Temporary isolated pages from lru */
>  #ifdef CONFIG_NUMA
>        NUMA_HIT,               /* allocated in intended node */
>        NUMA_MISS,              /* allocated in non intended node */
> Index: b/mm/page_alloc.c
> ===================================================================
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2116,8 +2116,7 @@ void show_free_areas(void)
>        }
>
>        printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
> -               " inactive_file:%lu"
> -               " unevictable:%lu"
> +               " inactive_file:%lu unevictable:%lu isolated:%lu\n"

It's good.
I have one suggestion.

I know this patch came from David's OOM problem a few days ago.

I think the total number of pages isolated across all LRUs doesn't help us much.
It just explains why [in]active[anon/file] is zero.

How about adding the number of isolated pages for each LRU?

IsolatedPages(file)
IsolatedPages(anon)

It would help us know the exact number for each LRU.
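
A minimal sketch of what per-LRU isolated accounting could look like, written
as standalone C rather than kernel code. The counter names follow the
NR_ISOLATED_ANON / NR_ISOLATED_FILE naming that appears later in this thread;
the `counters` array and the `isolate_pages()` / `putback_pages()` helpers are
hypothetical stand-ins for the zone vmstat counters and
`__mod_zone_page_state()`.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel's zone counters. */
enum stat_item {
	NR_INACTIVE_ANON,
	NR_INACTIVE_FILE,
	NR_ISOLATED_ANON,	/* must be immediately followed by ... */
	NR_ISOLATED_FILE,	/* ... so NR_ISOLATED_ANON + 1 is the file counter */
	NR_STAT_ITEMS
};

static long counters[NR_STAT_ITEMS];

/* Map an LRU type (file or anon) to its inactive-list counter. */
static enum stat_item inactive_item(bool file)
{
	return file ? NR_INACTIVE_FILE : NR_INACTIVE_ANON;
}

/* Take nr pages off the inactive list and record them as isolated. */
static void isolate_pages(bool file, long nr)
{
	counters[inactive_item(file)] -= nr;
	counters[NR_ISOLATED_ANON + file] += nr;
}

/* Return nr previously isolated pages to the inactive list. */
static void putback_pages(bool file, long nr)
{
	counters[NR_ISOLATED_ANON + file] -= nr;
	counters[inactive_item(file)] += nr;
}
```

Indexing with `NR_ISOLATED_ANON + file` only works while the two enum entries
stay adjacent and in this order, which is why the comment on the enum spells
that out.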

--
Kind regards,
Minchan Kim

2009-07-05 15:04:22

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 5/5] add NR_ANON_PAGES to OOM log

On Sun, Jul 5, 2009 at 10:19 PM, KOSAKI
Motohiro<[email protected]> wrote:
>>> > > + printk("%ld total anon pages\n", global_page_state(NR_ANON_PAGES));
>>> > >   printk("%ld total pagecache pages\n", global_page_state(NR_FILE_PAGES));
>>> >
>>> > Can we put related items together, ie. this looks more friendly:
>>> >
>>> >         Anon:XXX active_anon:XXX inactive_anon:XXX
>>> >         File:XXX active_file:XXX inactive_file:XXX
>>>
>>> hmmm. Actually NR_ACTIVE_ANON + NR_INACTIVE_ANON != NR_ANON_PAGES.
>>> tmpfs pages are accounted as FILE, but it is stay in anon lru.
>>
>> Right, that's exactly the reason I propose to put them together: to
>> make the number of tmpfs pages obvious.
>>
>>> I think your proposed format easily makes confusion. this format cause to
>>> imazine Anon = active_anon + inactive_anon.
>>
>> Yes it may confuse normal users :(
>>
>>> At least, we need to use another name, I think.
>>
>> Hmm I find it hard to work out a good name.
>>
>> But instead, it may be a good idea to explicitly compute the tmpfs
>> pages, because the excessive use of tmpfs pages could be a common
>> reason of OOM.
>
> Yeah,  explicite tmpfs/shmem accounting is also useful for /proc/meminfo.

Do we have to account it explicitly?

If we know the exact number of isolated pages for each LRU,

tmpfs/shmem = (NR_ACTIVE_ANON + NR_INACTIVE_ANON + isolate(anon)) -
NR_ANON_PAGES.

Are there any cases where the above equation is wrong?
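
The estimate above can be written out as a small standalone C helper. This is
only a sketch: `struct lru_snapshot` and `estimate_shmem_pages()` are
hypothetical names, and it assumes every page on the anon LRU is either
counted in NR_ANON_PAGES or is a tmpfs/shmem page. Anything else sitting on
that LRU would inflate the result.

```c
/* Hypothetical snapshot of the counters named in the equation, in pages. */
struct lru_snapshot {
	unsigned long active_anon;	/* NR_ACTIVE_ANON */
	unsigned long inactive_anon;	/* NR_INACTIVE_ANON */
	unsigned long isolated_anon;	/* pages isolated from the anon LRU */
	unsigned long anon_pages;	/* NR_ANON_PAGES */
};

/* tmpfs/shmem = (active_anon + inactive_anon + isolated_anon) - anon_pages */
static unsigned long estimate_shmem_pages(const struct lru_snapshot *s)
{
	unsigned long on_anon_lru =
		s->active_anon + s->inactive_anon + s->isolated_anon;

	/* Counters are sampled separately, so clamp instead of underflowing. */
	if (on_anon_lru < s->anon_pages)
		return 0;
	return on_anon_lru - s->anon_pages;
}
```

For example, with 4000 active, 1500 inactive and 500 isolated anon-LRU pages
against 5200 NR_ANON_PAGES, the helper attributes 800 pages to tmpfs/shmem.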




--
Kind regards,
Minchan Kim

2009-07-05 15:16:44

by Fengguang Wu

[permalink] [raw]
Subject: Re: [PATCH 5/5] add NR_ANON_PAGES to OOM log

On Sun, Jul 05, 2009 at 11:04:17PM +0800, Minchan Kim wrote:
> On Sun, Jul 5, 2009 at 10:19 PM, KOSAKI
> Motohiro<[email protected]> wrote:
> >>> > > + printk("%ld total anon pages\n", global_page_state(NR_ANON_PAGES));
> >>> > >   printk("%ld total pagecache pages\n", global_page_state(NR_FILE_PAGES));
> >>> >
> >>> > Can we put related items together, ie. this looks more friendly:
> >>> >
> >>> >         Anon:XXX active_anon:XXX inactive_anon:XXX
> >>> >         File:XXX active_file:XXX inactive_file:XXX
> >>>
> >>> hmmm. Actually NR_ACTIVE_ANON + NR_INACTIVE_ANON != NR_ANON_PAGES.
> >>> tmpfs pages are accounted as FILE, but it is stay in anon lru.
> >>
> >> Right, that's exactly the reason I propose to put them together: to
> >> make the number of tmpfs pages obvious.
> >>
> >>> I think your proposed format easily makes confusion. this format cause to
> >>> imazine Anon = active_anon + inactive_anon.
> >>
> >> Yes it may confuse normal users :(
> >>
> >>> At least, we need to use another name, I think.
> >>
> >> Hmm I find it hard to work out a good name.
> >>
> >> But instead, it may be a good idea to explicitly compute the tmpfs
> >> pages, because the excessive use of tmpfs pages could be a common
> >> reason of OOM.
> >
> > Yeah,  explicite tmpfs/shmem accounting is also useful for /proc/meminfo.
>
> Do we have to account it explicitly?

When OOM happens, one frequent question to ask is: are there too many
tmpfs/shmem pages? Exporting this number makes our oom-message-decoding
life easier :)

> If we know the exact isolate pages of each lru,
>
> tmpfs/shmem = (NR_ACTIVE_ANON + NR_INACTIVE_ANON + isolate(anon)) -
> NR_ANON_PAGES.
>
> Is there any cases above equation is wrong ?

That's right, but the calculation may be too complex (and boring) for
our little brain ;)

Thanks,
Fengguang

2009-07-05 15:27:27

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 5/5] add NR_ANON_PAGES to OOM log

On Mon, Jul 6, 2009 at 12:16 AM, Wu Fengguang<[email protected]> wrote:
> On Sun, Jul 05, 2009 at 11:04:17PM +0800, Minchan Kim wrote:
>> On Sun, Jul 5, 2009 at 10:19 PM, KOSAKI
>> Motohiro<[email protected]> wrote:
>> >>> > > + printk("%ld total anon pages\n", global_page_state(NR_ANON_PAGES));
>> >>> > >   printk("%ld total pagecache pages\n", global_page_state(NR_FILE_PAGES));
>> >>> >
>> >>> > Can we put related items together, ie. this looks more friendly:
>> >>> >
>> >>> >         Anon:XXX active_anon:XXX inactive_anon:XXX
>> >>> >         File:XXX active_file:XXX inactive_file:XXX
>> >>>
>> >>> hmmm. Actually NR_ACTIVE_ANON + NR_INACTIVE_ANON != NR_ANON_PAGES.
>> >>> tmpfs pages are accounted as FILE, but it is stay in anon lru.
>> >>
>> >> Right, that's exactly the reason I propose to put them together: to
>> >> make the number of tmpfs pages obvious.
>> >>
>> >>> I think your proposed format easily makes confusion. this format cause to
>> >>> imazine Anon = active_anon + inactive_anon.
>> >>
>> >> Yes it may confuse normal users :(
>> >>
>> >>> At least, we need to use another name, I think.
>> >>
>> >> Hmm I find it hard to work out a good name.
>> >>
>> >> But instead, it may be a good idea to explicitly compute the tmpfs
>> >> pages, because the excessive use of tmpfs pages could be a common
>> >> reason of OOM.
>> >
>> > Yeah,  explicite tmpfs/shmem accounting is also useful for /proc/meminfo.
>>
>> Do we have to account it explicitly?
>
> When OOM happens, one frequent question to ask is: are there too many
> tmpfs/shmem pages?  Exporting this number makes our oom-message-decoding
> life easier :)

Indeed.

>> If we know the exact isolate pages of each lru,
>>
>> tmpfs/shmem = (NR_ACTIVE_ANON + NR_INACTIVE_ANON + isolate(anon)) -
>> NR_ANON_PAGES.
>>
>> Is there any cases above equation is wrong ?
>
> That's right, but the calculation may be too complex (and boring) for
> our little brain ;)

Yes. If something changes in the future or we miss something, the above
equation may be wrong.
I wanted to avoid the overhead of the new accounting.

Anyway, I think it's not a big cost on a normal system.
So if you want to add the new accounting, I don't have any objection. :)

> Thanks,
> Fengguang
>



--
Kind regards,
Minchan Kim

2009-07-05 14:16:34

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 2/5] add buffer cache information to show_free_areas()

On Sun, Jul 5, 2009 at 6:24 PM, KOSAKI
Motohiro<[email protected]> wrote:
> Subject: [PATCH] add buffer cache information to show_free_areas()
>
> When administrator analysis memory shortage reason from OOM log, They
> often need to know rest number of cache like pages.
>
> Then, show_free_areas() shouldn't only display page cache, but also it
> should display buffer cache.
>
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
>  mm/page_alloc.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> Index: b/mm/page_alloc.c
> ===================================================================
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2118,7 +2118,7 @@ void show_free_areas(void)
>        printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
>                " inactive_file:%lu"
>                " unevictable:%lu"
> -               " dirty:%lu writeback:%lu unstable:%lu\n"
> +               " dirty:%lu writeback:%lu buffer:%lu unstable:%lu\n"
>                " free:%lu slab_reclaimable:%lu slab_unreclaimable:%lu\n"
>                " mapped:%lu pagetables:%lu bounce:%lu\n",
>                global_page_state(NR_ACTIVE_ANON),
> @@ -2128,6 +2128,7 @@ void show_free_areas(void)
>                global_page_state(NR_UNEVICTABLE),
>                global_page_state(NR_FILE_DIRTY),
>                global_page_state(NR_WRITEBACK),
> +               K(nr_blockdev_pages()),

Why do you show this number in kilobytes?
The others are already numbers of pages.

Is there a reason?


--
Kind regards,
Minchan Kim

2009-07-06 04:24:30

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 2/5] add buffer cache information to show_free_areas()

> > @@ -2118,7 +2118,7 @@ void show_free_areas(void)
> >        printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
> >                " inactive_file:%lu"
> >                " unevictable:%lu"
> > -               " dirty:%lu writeback:%lu unstable:%lu\n"
> > +               " dirty:%lu writeback:%lu buffer:%lu unstable:%lu\n"
> >                " free:%lu slab_reclaimable:%lu slab_unreclaimable:%lu\n"
> >                " mapped:%lu pagetables:%lu bounce:%lu\n",
> >                global_page_state(NR_ACTIVE_ANON),
> > @@ -2128,6 +2128,7 @@ void show_free_areas(void)
> >                global_page_state(NR_UNEVICTABLE),
> >                global_page_state(NR_FILE_DIRTY),
> >                global_page_state(NR_WRITEBACK),
> > +               K(nr_blockdev_pages()),
>
> Why do you show the number with kilobyte unit ?
> Others are already number of pages.
>
> Do you have any reason ?

Good catch. This is a simple mistake.
I'll fix it.
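
For reference, the unit mix-up is easy to see in isolation. The sketch below
reproduces the `K()` page-to-kilobyte conversion from mm/page_alloc.c as a
standalone C macro, assuming 4 KiB pages (a `PAGE_SHIFT` of 12); the wrapper
name `pages_to_kb()` is hypothetical.

```c
#define PAGE_SHIFT 12				/* assumption: 4 KiB pages */
#define K(x) ((x) << (PAGE_SHIFT - 10))		/* pages -> kilobytes */

/* Hypothetical wrapper so the conversion can be called like a function. */
static unsigned long pages_to_kb(unsigned long nr_pages)
{
	return K(nr_pages);
}
```

Under that assumption, 49 buffer pages wrapped in `K()` would be printed as
196, sitting next to fields such as dirty: and writeback: that are raw page
counts on the same line.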

2009-07-06 09:28:18

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 4/5] add isolate pages vmstat

> >        printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
> > -               " inactive_file:%lu"
> > -               " unevictable:%lu"
> > +               " inactive_file:%lu unevictable:%lu isolated:%lu\n"
>
> It's good.
> I have a one suggestion.
>
> I know this patch came from David's OOM problem a few days ago.
>
> I think total pages isolated of all lru doesn't help us much.
> It just represents why [in]active[anon/file] is zero.
>
> How about adding isolate page number per each lru ?
>
> IsolatedPages(file)
> IsolatedPages(anon)
>
> It can help knowing exact number of each lru.

Good suggestion!
Will fix.

2009-07-06 11:55:40

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 4/5] add isolate pages vmstat

> > >        printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
> > > -               " inactive_file:%lu"
> > > -               " unevictable:%lu"
> > > +               " inactive_file:%lu unevictable:%lu isolated:%lu\n"
> >
> > It's good.
> > I have a one suggestion.
> >
> > I know this patch came from David's OOM problem a few days ago.
> >
> > I think total pages isolated of all lru doesn't help us much.
> > It just represents why [in]active[anon/file] is zero.
> >
> > How about adding isolate page number per each lru ?
> >
> > IsolatedPages(file)
> > IsolatedPages(anon)
> >
> > It can help knowing exact number of each lru.
>
> Good suggestion!
> Will fix.


New version here.

thanks.


============ CUT HERE ===============
Subject: [PATCH] add isolate pages vmstat

If the system has plenty of threads or processes, concurrent reclaim can
isolate a very large number of pages.
Unfortunately, the current /proc/meminfo and OOM log can't show this.

This patch provides a way of showing this information.


How to reproduce
-----------------------
% ./hackbench 140 process 1000
   => causes OOM

Active_anon:146 active_file:41 inactive_anon:0
 inactive_file:0 unevictable:0
 isolated_anon:49245 isolated_file:113
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 dirty:0 writeback:0 buffer:49 unstable:0
 free:184 slab_reclaimable:276 slab_unreclaimable:5492
 mapped:87 pagetables:28239 bounce:0


Signed-off-by: KOSAKI Motohiro <[email protected]>
---
drivers/base/node.c | 4 ++++
fs/proc/meminfo.c | 4 ++++
include/linux/mmzone.h | 2 ++
mm/page_alloc.c | 10 ++++++++--
mm/vmscan.c | 5 +++++
mm/vmstat.c | 3 ++-
6 files changed, 25 insertions(+), 3 deletions(-)

Index: b/fs/proc/meminfo.c
===================================================================
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -65,6 +65,8 @@ static int meminfo_proc_show(struct seq_
"Active(file): %8lu kB\n"
"Inactive(file): %8lu kB\n"
"Unevictable: %8lu kB\n"
+ "Isolated(anon): %8lu kB\n"
+ "Isolated(file): %8lu kB\n"
"Mlocked: %8lu kB\n"
#ifdef CONFIG_HIGHMEM
"HighTotal: %8lu kB\n"
@@ -109,6 +111,8 @@ static int meminfo_proc_show(struct seq_
K(pages[LRU_ACTIVE_FILE]),
K(pages[LRU_INACTIVE_FILE]),
K(pages[LRU_UNEVICTABLE]),
+ K(global_page_state(NR_ISOLATED_ANON)),
+ K(global_page_state(NR_ISOLATED_FILE)),
K(global_page_state(NR_MLOCK)),
#ifdef CONFIG_HIGHMEM
K(i.totalhigh),
Index: b/include/linux/mmzone.h
===================================================================
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -100,6 +100,8 @@ enum zone_stat_item {
NR_BOUNCE,
NR_VMSCAN_WRITE,
NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */
+ NR_ISOLATED_ANON, /* Temporary isolated pages from anon lru */
+ NR_ISOLATED_FILE, /* Temporary isolated pages from file lru */
#ifdef CONFIG_NUMA
NUMA_HIT, /* allocated in intended node */
NUMA_MISS, /* allocated in non intended node */
Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2116,8 +2116,8 @@ void show_free_areas(void)
}

printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
- " inactive_file:%lu"
- " unevictable:%lu"
+ " inactive_file:%lu unevictable:%lu\n"
+ " isolated_anon:%lu isolated_file:%lu\n"
" dirty:%lu writeback:%lu buffer:%lu unstable:%lu\n"
" free:%lu slab_reclaimable:%lu slab_unreclaimable:%lu\n"
" mapped:%lu pagetables:%lu bounce:%lu\n",
@@ -2126,6 +2126,8 @@ void show_free_areas(void)
global_page_state(NR_INACTIVE_ANON),
global_page_state(NR_INACTIVE_FILE),
global_page_state(NR_UNEVICTABLE),
+ global_page_state(NR_ISOLATED_ANON),
+ global_page_state(NR_ISOLATED_FILE),
global_page_state(NR_FILE_DIRTY),
global_page_state(NR_WRITEBACK),
nr_blockdev_pages(),
@@ -2151,6 +2153,8 @@ void show_free_areas(void)
" active_file:%lukB"
" inactive_file:%lukB"
" unevictable:%lukB"
+ " isolated(anon):%lukB"
+ " isolated(file):%lukB"
" present:%lukB"
" mlocked:%lukB"
" dirty:%lukB"
@@ -2176,6 +2180,8 @@ void show_free_areas(void)
K(zone_page_state(zone, NR_ACTIVE_FILE)),
K(zone_page_state(zone, NR_INACTIVE_FILE)),
K(zone_page_state(zone, NR_UNEVICTABLE)),
+ K(zone_page_state(zone, NR_ISOLATED_ANON)),
+ K(zone_page_state(zone, NR_ISOLATED_FILE)),
K(zone->present_pages),
K(zone_page_state(zone, NR_MLOCK)),
K(zone_page_state(zone, NR_FILE_DIRTY)),
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1082,6 +1082,7 @@ static unsigned long shrink_inactive_lis
-count[LRU_ACTIVE_ANON]);
__mod_zone_page_state(zone, NR_INACTIVE_ANON,
-count[LRU_INACTIVE_ANON]);
+ __mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);

if (scanning_global_lru(sc))
zone->pages_scanned += nr_scan;
@@ -1131,6 +1132,7 @@ static unsigned long shrink_inactive_lis
goto done;

spin_lock(&zone->lru_lock);
+ __mod_zone_page_state(zone, NR_ISOLATED_ANON + file, -nr_taken);
/*
* Put back any unfreeable pages.
*/
@@ -1205,6 +1207,7 @@ static void move_active_pages_to_lru(str
unsigned long pgmoved = 0;
struct pagevec pvec;
struct page *page;
+ int file = is_file_lru(lru);

pagevec_init(&pvec, 1);

@@ -1232,6 +1235,7 @@ static void move_active_pages_to_lru(str
}
}
__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+ __mod_zone_page_state(zone, NR_ISOLATED_ANON + file, -pgmoved);
if (!is_active_lru(lru))
__count_vm_events(PGDEACTIVATE, pgmoved);
}
@@ -1267,6 +1271,7 @@ static void shrink_active_list(unsigned
__mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved);
else
__mod_zone_page_state(zone, NR_ACTIVE_ANON, -pgmoved);
+ __mod_zone_page_state(zone, NR_ISOLATED_ANON + file, pgmoved);
spin_unlock_irq(&zone->lru_lock);

pgmoved = 0; /* count referenced (mapping) mapped pages */
Index: b/mm/vmstat.c
===================================================================
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -644,7 +644,8 @@ static const char * const vmstat_text[]
"nr_bounce",
"nr_vmscan_write",
"nr_writeback_temp",
-
+ "nr_isolated_anon",
+ "nr_isolated_file",
#ifdef CONFIG_NUMA
"numa_hit",
"numa_miss",
Index: b/drivers/base/node.c
===================================================================
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -73,6 +73,8 @@ static ssize_t node_read_meminfo(struct
"Node %d Active(file): %8lu kB\n"
"Node %d Inactive(file): %8lu kB\n"
"Node %d Unevictable: %8lu kB\n"
+ "Node %d Isolated(anon): %8lu kB\n"
+ "Node %d Isolated(file): %8lu kB\n"
"Node %d Mlocked: %8lu kB\n"
#ifdef CONFIG_HIGHMEM
"Node %d HighTotal: %8lu kB\n"
@@ -105,6 +107,8 @@ static ssize_t node_read_meminfo(struct
nid, K(node_page_state(nid, NR_ACTIVE_FILE)),
nid, K(node_page_state(nid, NR_INACTIVE_FILE)),
nid, K(node_page_state(nid, NR_UNEVICTABLE)),
+ nid, K(node_page_state(nid, NR_ISOLATED_ANON)),
+ nid, K(node_page_state(nid, NR_ISOLATED_FILE)),
nid, K(node_page_state(nid, NR_MLOCK)),
#ifdef CONFIG_HIGHMEM
nid, K(i.totalhigh),
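
Editorial note on the mmzone.h and mm/vmstat.c hunks above: the strings in vmstat_text[] are matched to enum zone_stat_item by position, which is presumably why the two new names are added at the same place and in the same order as the two new enum entries. The following is a minimal userspace sketch of that pairing, not kernel code. The enum is a shortened copy with hypothetical DEMO_ names, and stat_name() is a helper invented for illustration.

```c
#include <assert.h>

/* Shortened, hypothetical stand-in for enum zone_stat_item. */
enum demo_zone_stat_item {
	DEMO_NR_BOUNCE,
	DEMO_NR_VMSCAN_WRITE,
	DEMO_NR_WRITEBACK_TEMP,
	DEMO_NR_ISOLATED_ANON,	/* new in this patch */
	DEMO_NR_ISOLATED_FILE,	/* new in this patch */
	DEMO_NR_STAT_ITEMS
};

/* Names are looked up by index, so the order must match the enum. */
static const char * const demo_vmstat_text[] = {
	"nr_bounce",
	"nr_vmscan_write",
	"nr_writeback_temp",
	"nr_isolated_anon",
	"nr_isolated_file",
};

/* Compile-time check (C11) that no name is missing or extra. */
_Static_assert(sizeof(demo_vmstat_text) / sizeof(demo_vmstat_text[0])
	       == DEMO_NR_STAT_ITEMS, "name table out of sync with enum");

/* Hypothetical helper: map a counter index to its exported name. */
static const char *stat_name(enum demo_zone_stat_item item)
{
	assert(item < DEMO_NR_STAT_ITEMS);
	return demo_vmstat_text[item];
}
```

The size check only catches a missing or extra entry; a swapped pair of names would still compile, so the ordering itself has to be reviewed by eye.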



2009-07-07 00:02:54

by Minchan Kim

Subject: Re: [PATCH 4/5] add isolate pages vmstat


> ============ CUT HERE ===============
> Subject: [PATCH] add isolate pages vmstat
>
> If the system have plenty threads or processes, concurrent reclaim can
> isolate very much pages.
> Unfortunately, current /proc/meminfo and OOM log can't show it.
>
> This patch provide the way of showing this information.
>
>
> reproduce way
> -----------------------
> % ./hackbench 140 process 1000
> => couse OOM
>
> Active_anon:146 active_file:41 inactive_anon:0
> inactive_file:0 unevictable:0
> isolated_anon:49245 isolated_file:113
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> dirty:0 writeback:0 buffer:49 unstable:0
> free:184 slab_reclaimable:276 slab_unreclaimable:5492
> mapped:87 pagetables:28239 bounce:0
>
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> drivers/base/node.c | 4 ++++
> fs/proc/meminfo.c | 4 ++++
> include/linux/mmzone.h | 2 ++
> mm/page_alloc.c | 10 ++++++++--
> mm/vmscan.c | 5 +++++
> mm/vmstat.c | 3 ++-
> 6 files changed, 25 insertions(+), 3 deletions(-)
>
> Index: b/fs/proc/meminfo.c
> ===================================================================
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -65,6 +65,8 @@ static int meminfo_proc_show(struct seq_
> "Active(file): %8lu kB\n"
> "Inactive(file): %8lu kB\n"
> "Unevictable: %8lu kB\n"
> + "Isolated(anon): %8lu kB\n"
> + "Isolated(file): %8lu kB\n"
> "Mlocked: %8lu kB\n"
> #ifdef CONFIG_HIGHMEM
> "HighTotal: %8lu kB\n"
> @@ -109,6 +111,8 @@ static int meminfo_proc_show(struct seq_
> K(pages[LRU_ACTIVE_FILE]),
> K(pages[LRU_INACTIVE_FILE]),
> K(pages[LRU_UNEVICTABLE]),
> + K(global_page_state(NR_ISOLATED_ANON)),
> + K(global_page_state(NR_ISOLATED_FILE)),
> K(global_page_state(NR_MLOCK)),
> #ifdef CONFIG_HIGHMEM
> K(i.totalhigh),
> Index: b/include/linux/mmzone.h
> ===================================================================
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -100,6 +100,8 @@ enum zone_stat_item {
> NR_BOUNCE,
> NR_VMSCAN_WRITE,
> NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */
> + NR_ISOLATED_ANON, /* Temporary isolated pages from anon lru */
> + NR_ISOLATED_FILE, /* Temporary isolated pages from file lru */
> #ifdef CONFIG_NUMA
> NUMA_HIT, /* allocated in intended node */
> NUMA_MISS, /* allocated in non intended node */
> Index: b/mm/page_alloc.c
> ===================================================================
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2116,8 +2116,8 @@ void show_free_areas(void)
> }
>
> printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
> - " inactive_file:%lu"
> - " unevictable:%lu"
> + " inactive_file:%lu unevictable:%lu\n"
> + " isolated_anon:%lu isolated_file:%lu\n"
> " dirty:%lu writeback:%lu buffer:%lu unstable:%lu\n"
> " free:%lu slab_reclaimable:%lu slab_unreclaimable:%lu\n"
> " mapped:%lu pagetables:%lu bounce:%lu\n",
> @@ -2126,6 +2126,8 @@ void show_free_areas(void)
> global_page_state(NR_INACTIVE_ANON),
> global_page_state(NR_INACTIVE_FILE),
> global_page_state(NR_UNEVICTABLE),
> + global_page_state(NR_ISOLATED_ANON),
> + global_page_state(NR_ISOLATED_FILE),
> global_page_state(NR_FILE_DIRTY),
> global_page_state(NR_WRITEBACK),
> nr_blockdev_pages(),
> @@ -2151,6 +2153,8 @@ void show_free_areas(void)
> " active_file:%lukB"
> " inactive_file:%lukB"
> " unevictable:%lukB"
> + " isolated(anon):%lukB"
> + " isolated(file):%lukB"
> " present:%lukB"
> " mlocked:%lukB"
> " dirty:%lukB"
> @@ -2176,6 +2180,8 @@ void show_free_areas(void)
> K(zone_page_state(zone, NR_ACTIVE_FILE)),
> K(zone_page_state(zone, NR_INACTIVE_FILE)),
> K(zone_page_state(zone, NR_UNEVICTABLE)),
> + K(zone_page_state(zone, NR_ISOLATED_ANON)),
> + K(zone_page_state(zone, NR_ISOLATED_FILE)),
> K(zone->present_pages),
> K(zone_page_state(zone, NR_MLOCK)),
> K(zone_page_state(zone, NR_FILE_DIRTY)),
> Index: b/mm/vmscan.c
> ===================================================================
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1082,6 +1082,7 @@ static unsigned long shrink_inactive_lis
> -count[LRU_ACTIVE_ANON]);
> __mod_zone_page_state(zone, NR_INACTIVE_ANON,
> -count[LRU_INACTIVE_ANON]);
> + __mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);

Lumpy reclaim can isolate both file and anon pages here, whatever the value of 'file'.
How about using the count[NR_LRU_LISTS] array instead?
--
Kind regards,
Minchan Kim
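
Editorial note: the review comment above can be illustrated with a small userspace sketch. It is not the kernel code; the enum, struct and function names are hypothetical, and the per-list counts are made up. It contrasts charging all of nr_taken to one counter chosen by a file flag with splitting the total by the per-list counts, which matters once a single isolation pass can return a mix of anon and file pages.

```c
#include <assert.h>

/* Hypothetical mirror of the kernel's LRU list indices. */
enum demo_lru_list {
	DEMO_LRU_INACTIVE_ANON,
	DEMO_LRU_ACTIVE_ANON,
	DEMO_LRU_INACTIVE_FILE,
	DEMO_LRU_ACTIVE_FILE,
	DEMO_NR_LRU_LISTS
};

struct demo_isolated {
	unsigned long anon;
	unsigned long file;
};

/* First version of the patch: charge everything to one type. */
static struct demo_isolated account_by_flag(int file, unsigned long nr_taken)
{
	struct demo_isolated iso = { 0, 0 };

	if (file)
		iso.file = nr_taken;
	else
		iso.anon = nr_taken;
	return iso;
}

/* Suggested alternative: split by the per-list counts. */
static struct demo_isolated account_by_count(const unsigned int *count)
{
	struct demo_isolated iso;

	iso.anon = count[DEMO_LRU_ACTIVE_ANON] + count[DEMO_LRU_INACTIVE_ANON];
	iso.file = count[DEMO_LRU_ACTIVE_FILE] + count[DEMO_LRU_INACTIVE_FILE];
	return iso;
}

/* Sanity check: both schemes must account for every isolated page. */
static void check_split(struct demo_isolated iso, unsigned long nr_taken)
{
	assert(iso.anon + iso.file == nr_taken);
}
```

With 20 inactive file pages and 12 anon pages isolated in one pass, account_by_flag(1, 32) reports 32 file pages and no anon pages, while account_by_count() reports 12 anon and 20 file.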

2009-07-07 00:09:47

by KOSAKI Motohiro

Subject: Re: [PATCH 4/5] add isolate pages vmstat

>
> > ============ CUT HERE ===============
> > Subject: [PATCH] add isolate pages vmstat
> >
> > If the system have plenty threads or processes, concurrent reclaim can
> > isolate very much pages.
> > Unfortunately, current /proc/meminfo and OOM log can't show it.
> >
> > This patch provide the way of showing this information.
> >
> >
> > reproduce way
> > -----------------------
> > % ./hackbench 140 process 1000
> > => couse OOM
> >
> > Active_anon:146 active_file:41 inactive_anon:0
> > inactive_file:0 unevictable:0
> > isolated_anon:49245 isolated_file:113
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > dirty:0 writeback:0 buffer:49 unstable:0
> > free:184 slab_reclaimable:276 slab_unreclaimable:5492
> > mapped:87 pagetables:28239 bounce:0
> >
> >
> > Signed-off-by: KOSAKI Motohiro <[email protected]>
> > ---
> > drivers/base/node.c | 4 ++++
> > fs/proc/meminfo.c | 4 ++++
> > include/linux/mmzone.h | 2 ++
> > mm/page_alloc.c | 10 ++++++++--
> > mm/vmscan.c | 5 +++++
> > mm/vmstat.c | 3 ++-
> > 6 files changed, 25 insertions(+), 3 deletions(-)
> >
> > Index: b/fs/proc/meminfo.c
> > ===================================================================
> > --- a/fs/proc/meminfo.c
> > +++ b/fs/proc/meminfo.c
> > @@ -65,6 +65,8 @@ static int meminfo_proc_show(struct seq_
> > "Active(file): %8lu kB\n"
> > "Inactive(file): %8lu kB\n"
> > "Unevictable: %8lu kB\n"
> > + "Isolated(anon): %8lu kB\n"
> > + "Isolated(file): %8lu kB\n"
> > "Mlocked: %8lu kB\n"
> > #ifdef CONFIG_HIGHMEM
> > "HighTotal: %8lu kB\n"
> > @@ -109,6 +111,8 @@ static int meminfo_proc_show(struct seq_
> > K(pages[LRU_ACTIVE_FILE]),
> > K(pages[LRU_INACTIVE_FILE]),
> > K(pages[LRU_UNEVICTABLE]),
> > + K(global_page_state(NR_ISOLATED_ANON)),
> > + K(global_page_state(NR_ISOLATED_FILE)),
> > K(global_page_state(NR_MLOCK)),
> > #ifdef CONFIG_HIGHMEM
> > K(i.totalhigh),
> > Index: b/include/linux/mmzone.h
> > ===================================================================
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -100,6 +100,8 @@ enum zone_stat_item {
> > NR_BOUNCE,
> > NR_VMSCAN_WRITE,
> > NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */
> > + NR_ISOLATED_ANON, /* Temporary isolated pages from anon lru */
> > + NR_ISOLATED_FILE, /* Temporary isolated pages from file lru */
> > #ifdef CONFIG_NUMA
> > NUMA_HIT, /* allocated in intended node */
> > NUMA_MISS, /* allocated in non intended node */
> > Index: b/mm/page_alloc.c
> > ===================================================================
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2116,8 +2116,8 @@ void show_free_areas(void)
> > }
> >
> > printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
> > - " inactive_file:%lu"
> > - " unevictable:%lu"
> > + " inactive_file:%lu unevictable:%lu\n"
> > + " isolated_anon:%lu isolated_file:%lu\n"
> > " dirty:%lu writeback:%lu buffer:%lu unstable:%lu\n"
> > " free:%lu slab_reclaimable:%lu slab_unreclaimable:%lu\n"
> > " mapped:%lu pagetables:%lu bounce:%lu\n",
> > @@ -2126,6 +2126,8 @@ void show_free_areas(void)
> > global_page_state(NR_INACTIVE_ANON),
> > global_page_state(NR_INACTIVE_FILE),
> > global_page_state(NR_UNEVICTABLE),
> > + global_page_state(NR_ISOLATED_ANON),
> > + global_page_state(NR_ISOLATED_FILE),
> > global_page_state(NR_FILE_DIRTY),
> > global_page_state(NR_WRITEBACK),
> > nr_blockdev_pages(),
> > @@ -2151,6 +2153,8 @@ void show_free_areas(void)
> > " active_file:%lukB"
> > " inactive_file:%lukB"
> > " unevictable:%lukB"
> > + " isolated(anon):%lukB"
> > + " isolated(file):%lukB"
> > " present:%lukB"
> > " mlocked:%lukB"
> > " dirty:%lukB"
> > @@ -2176,6 +2180,8 @@ void show_free_areas(void)
> > K(zone_page_state(zone, NR_ACTIVE_FILE)),
> > K(zone_page_state(zone, NR_INACTIVE_FILE)),
> > K(zone_page_state(zone, NR_UNEVICTABLE)),
> > + K(zone_page_state(zone, NR_ISOLATED_ANON)),
> > + K(zone_page_state(zone, NR_ISOLATED_FILE)),
> > K(zone->present_pages),
> > K(zone_page_state(zone, NR_MLOCK)),
> > K(zone_page_state(zone, NR_FILE_DIRTY)),
> > Index: b/mm/vmscan.c
> > ===================================================================
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1082,6 +1082,7 @@ static unsigned long shrink_inactive_lis
> > -count[LRU_ACTIVE_ANON]);
> > __mod_zone_page_state(zone, NR_INACTIVE_ANON,
> > -count[LRU_INACTIVE_ANON]);
> > + __mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);
>
> Lumpy can reclaim file + anon anywhere.
> How about using count[NR_LRU_LISTS]?

Ah yes, good catch.


2009-07-07 01:20:04

by KOSAKI Motohiro

Subject: Re: [PATCH 4/5] add isolate pages vmstat

> > > Index: b/mm/vmscan.c
> > > ===================================================================
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -1082,6 +1082,7 @@ static unsigned long shrink_inactive_lis
> > > -count[LRU_ACTIVE_ANON]);
> > > __mod_zone_page_state(zone, NR_INACTIVE_ANON,
> > > -count[LRU_INACTIVE_ANON]);
> > > + __mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);
> >
> > Lumpy can reclaim file + anon anywhere.
> > How about using count[NR_LRU_LISTS]?
>
> Ah yes, good catch.

Fixed.

Subject: [PATCH] add isolate pages vmstat

If the system has many threads or processes, concurrent reclaim can
isolate a very large number of pages.
Unfortunately, the current /proc/meminfo and OOM log can't show this.

This patch provides a way to show this information.


How to reproduce
-----------------------
% ./hackbench 140 process 1000
=> causes OOM

Active_anon:146 active_file:41 inactive_anon:0
inactive_file:0 unevictable:0
isolated_anon:49245 isolated_file:113
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dirty:0 writeback:0 buffer:49 unstable:0
free:184 slab_reclaimable:276 slab_unreclaimable:5492
mapped:87 pagetables:28239 bounce:0


Signed-off-by: KOSAKI Motohiro <[email protected]>
---
drivers/base/node.c | 4 ++++
fs/proc/meminfo.c | 4 ++++
include/linux/mmzone.h | 2 ++
mm/page_alloc.c | 10 ++++++++--
mm/vmscan.c | 13 +++++++++++++
mm/vmstat.c | 3 ++-
6 files changed, 33 insertions(+), 3 deletions(-)

Index: b/fs/proc/meminfo.c
===================================================================
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -65,6 +65,8 @@ static int meminfo_proc_show(struct seq_
"Active(file): %8lu kB\n"
"Inactive(file): %8lu kB\n"
"Unevictable: %8lu kB\n"
+ "Isolated(anon): %8lu kB\n"
+ "Isolated(file): %8lu kB\n"
"Mlocked: %8lu kB\n"
#ifdef CONFIG_HIGHMEM
"HighTotal: %8lu kB\n"
@@ -109,6 +111,8 @@ static int meminfo_proc_show(struct seq_
K(pages[LRU_ACTIVE_FILE]),
K(pages[LRU_INACTIVE_FILE]),
K(pages[LRU_UNEVICTABLE]),
+ K(global_page_state(NR_ISOLATED_ANON)),
+ K(global_page_state(NR_ISOLATED_FILE)),
K(global_page_state(NR_MLOCK)),
#ifdef CONFIG_HIGHMEM
K(i.totalhigh),
Index: b/include/linux/mmzone.h
===================================================================
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -100,6 +100,8 @@ enum zone_stat_item {
NR_BOUNCE,
NR_VMSCAN_WRITE,
NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */
+ NR_ISOLATED_ANON, /* Temporary isolated pages from anon lru */
+ NR_ISOLATED_FILE, /* Temporary isolated pages from file lru */
#ifdef CONFIG_NUMA
NUMA_HIT, /* allocated in intended node */
NUMA_MISS, /* allocated in non intended node */
Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2116,8 +2116,8 @@ void show_free_areas(void)
}

printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
- " inactive_file:%lu"
- " unevictable:%lu"
+ " inactive_file:%lu unevictable:%lu\n"
+ " isolated_anon:%lu isolated_file:%lu\n"
" dirty:%lu writeback:%lu buffer:%lu unstable:%lu\n"
" free:%lu slab_reclaimable:%lu slab_unreclaimable:%lu\n"
" mapped:%lu pagetables:%lu bounce:%lu\n",
@@ -2126,6 +2126,8 @@ void show_free_areas(void)
global_page_state(NR_INACTIVE_ANON),
global_page_state(NR_INACTIVE_FILE),
global_page_state(NR_UNEVICTABLE),
+ global_page_state(NR_ISOLATED_ANON),
+ global_page_state(NR_ISOLATED_FILE),
global_page_state(NR_FILE_DIRTY),
global_page_state(NR_WRITEBACK),
nr_blockdev_pages(),
@@ -2151,6 +2153,8 @@ void show_free_areas(void)
" active_file:%lukB"
" inactive_file:%lukB"
" unevictable:%lukB"
+ " isolated(anon):%lukB"
+ " isolated(file):%lukB"
" present:%lukB"
" mlocked:%lukB"
" dirty:%lukB"
@@ -2176,6 +2180,8 @@ void show_free_areas(void)
K(zone_page_state(zone, NR_ACTIVE_FILE)),
K(zone_page_state(zone, NR_INACTIVE_FILE)),
K(zone_page_state(zone, NR_UNEVICTABLE)),
+ K(zone_page_state(zone, NR_ISOLATED_ANON)),
+ K(zone_page_state(zone, NR_ISOLATED_FILE)),
K(zone->present_pages),
K(zone_page_state(zone, NR_MLOCK)),
K(zone_page_state(zone, NR_FILE_DIRTY)),
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1067,6 +1067,8 @@ static unsigned long shrink_inactive_lis
unsigned long nr_active;
unsigned int count[NR_LRU_LISTS] = { 0, };
int mode = lumpy_reclaim ? ISOLATE_BOTH : ISOLATE_INACTIVE;
+ unsigned long nr_anon;
+ unsigned long nr_file;

nr_taken = sc->isolate_pages(sc->swap_cluster_max,
&page_list, &nr_scan, sc->order, mode,
@@ -1083,6 +1085,12 @@ static unsigned long shrink_inactive_lis
__mod_zone_page_state(zone, NR_INACTIVE_ANON,
-count[LRU_INACTIVE_ANON]);

+ nr_anon = count[LRU_ACTIVE_ANON] + count[LRU_INACTIVE_ANON];
+ nr_file = count[LRU_ACTIVE_FILE] + count[LRU_INACTIVE_FILE];
+
+ __mod_zone_page_state(zone, NR_ISOLATED_ANON, nr_anon);
+ __mod_zone_page_state(zone, NR_ISOLATED_FILE, nr_file);
+
if (scanning_global_lru(sc))
zone->pages_scanned += nr_scan;

@@ -1131,6 +1139,8 @@ static unsigned long shrink_inactive_lis
goto done;

spin_lock(&zone->lru_lock);
+ __mod_zone_page_state(zone, NR_ISOLATED_ANON, -nr_anon);
+ __mod_zone_page_state(zone, NR_ISOLATED_FILE, -nr_file);
/*
* Put back any unfreeable pages.
*/
@@ -1205,6 +1215,7 @@ static void move_active_pages_to_lru(str
unsigned long pgmoved = 0;
struct pagevec pvec;
struct page *page;
+ int file = is_file_lru(lru);

pagevec_init(&pvec, 1);

@@ -1232,6 +1243,7 @@ static void move_active_pages_to_lru(str
}
}
__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+ __mod_zone_page_state(zone, NR_ISOLATED_ANON + file, -pgmoved);
if (!is_active_lru(lru))
__count_vm_events(PGDEACTIVATE, pgmoved);
}
@@ -1267,6 +1279,7 @@ static void shrink_active_list(unsigned
__mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved);
else
__mod_zone_page_state(zone, NR_ACTIVE_ANON, -pgmoved);
+ __mod_zone_page_state(zone, NR_ISOLATED_ANON + file, pgmoved);
spin_unlock_irq(&zone->lru_lock);

pgmoved = 0; /* count referenced (mapping) mapped pages */
Index: b/mm/vmstat.c
===================================================================
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -644,7 +644,8 @@ static const char * const vmstat_text[]
"nr_bounce",
"nr_vmscan_write",
"nr_writeback_temp",
-
+ "nr_isolated_anon",
+ "nr_isolated_file",
#ifdef CONFIG_NUMA
"numa_hit",
"numa_miss",
Index: b/drivers/base/node.c
===================================================================
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -73,6 +73,8 @@ static ssize_t node_read_meminfo(struct
"Node %d Active(file): %8lu kB\n"
"Node %d Inactive(file): %8lu kB\n"
"Node %d Unevictable: %8lu kB\n"
+ "Node %d Isolated(anon): %8lu kB\n"
+ "Node %d Isolated(file): %8lu kB\n"
"Node %d Mlocked: %8lu kB\n"
#ifdef CONFIG_HIGHMEM
"Node %d HighTotal: %8lu kB\n"
@@ -105,6 +107,8 @@ static ssize_t node_read_meminfo(struct
nid, K(node_page_state(nid, NR_ACTIVE_FILE)),
nid, K(node_page_state(nid, NR_INACTIVE_FILE)),
nid, K(node_page_state(nid, NR_UNEVICTABLE)),
+ nid, K(node_page_state(nid, NR_ISOLATED_ANON)),
+ nid, K(node_page_state(nid, NR_ISOLATED_FILE)),
nid, K(node_page_state(nid, NR_MLOCK)),
#ifdef CONFIG_HIGHMEM
nid, K(i.totalhigh),




2009-07-07 01:22:59

by KOSAKI Motohiro

Subject: Re: [PATCH 5/5] add NR_ANON_PAGES to OOM log

> On Sun, Jul 05, 2009 at 08:21:20PM +0800, KOSAKI Motohiro wrote:
> > > On Sun, Jul 05, 2009 at 05:26:18PM +0800, KOSAKI Motohiro wrote:
> > > > Subject: [PATCH] add NR_ANON_PAGES to OOM log
> > > >
> > > > show_free_areas can display NR_FILE_PAGES, but it can't display
> > > > NR_ANON_PAGES.
> > > >
> > > > this patch fix its inconsistency.
> > > >
> > > >
> > > > Reported-by: Wu Fengguang <[email protected]>
> > > > Signed-off-by: KOSAKI Motohiro <[email protected]>
> > > > ---
> > > > mm/page_alloc.c | 1 +
> > > > 1 file changed, 1 insertion(+)
> > > >
> > > > Index: b/mm/page_alloc.c
> > > > ===================================================================
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -2216,6 +2216,7 @@ void show_free_areas(void)
> > > > printk("= %lukB\n", K(total));
> > > > }
> > > >
> > > > + printk("%ld total anon pages\n", global_page_state(NR_ANON_PAGES));
> > > > printk("%ld total pagecache pages\n", global_page_state(NR_FILE_PAGES));
> > >
> > > Can we put related items together, ie. this looks more friendly:
> > >
> > > Anon:XXX active_anon:XXX inactive_anon:XXX
> > > File:XXX active_file:XXX inactive_file:XXX
> >
> > hmmm. Actually NR_ACTIVE_ANON + NR_INACTIVE_ANON != NR_ANON_PAGES.
> > tmpfs pages are accounted as FILE, but it is stay in anon lru.
>
> Right, that's exactly the reason I propose to put them together: to
> make the number of tmpfs pages obvious.

How about this?

==================================================
Subject: [PATCH] add shmem vmstat

Recently, we have faced several OOM problems caused by a large GEM cache. More
generally, a large amount of shmem/tmpfs can cause memory shortage.

End users therefore want to know how much memory is used by shmem.


Signed-off-by: KOSAKI Motohiro <[email protected]>
---
drivers/base/node.c | 2 ++
fs/proc/meminfo.c | 2 ++
include/linux/mmzone.h | 1 +
mm/filemap.c | 4 ++++
mm/page_alloc.c | 9 ++++++---
mm/vmstat.c | 1 +
6 files changed, 16 insertions(+), 3 deletions(-)

Index: b/drivers/base/node.c
===================================================================
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -87,6 +87,7 @@ static ssize_t node_read_meminfo(struct
"Node %d FilePages: %8lu kB\n"
"Node %d Mapped: %8lu kB\n"
"Node %d AnonPages: %8lu kB\n"
+ "Node %d Shmem: %8lu kB\n"
"Node %d KernelStack: %8lu kB\n"
"Node %d PageTables: %8lu kB\n"
"Node %d NFS_Unstable: %8lu kB\n"
@@ -121,6 +122,7 @@ static ssize_t node_read_meminfo(struct
nid, K(node_page_state(nid, NR_FILE_PAGES)),
nid, K(node_page_state(nid, NR_FILE_MAPPED)),
nid, K(node_page_state(nid, NR_ANON_PAGES)),
+ nid, K(node_page_state(nid, NR_SHMEM)),
nid, node_page_state(nid, NR_KERNEL_STACK) *
THREAD_SIZE / 1024,
nid, K(node_page_state(nid, NR_PAGETABLE)),
Index: b/fs/proc/meminfo.c
===================================================================
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -83,6 +83,7 @@ static int meminfo_proc_show(struct seq_
"Writeback: %8lu kB\n"
"AnonPages: %8lu kB\n"
"Mapped: %8lu kB\n"
+ "Shmem: %8lu kB\n"
"Slab: %8lu kB\n"
"SReclaimable: %8lu kB\n"
"SUnreclaim: %8lu kB\n"
@@ -129,6 +130,7 @@ static int meminfo_proc_show(struct seq_
K(global_page_state(NR_WRITEBACK)),
K(global_page_state(NR_ANON_PAGES)),
K(global_page_state(NR_FILE_MAPPED)),
+ K(global_page_state(NR_SHMEM)),
K(global_page_state(NR_SLAB_RECLAIMABLE) +
global_page_state(NR_SLAB_UNRECLAIMABLE)),
K(global_page_state(NR_SLAB_RECLAIMABLE)),
Index: b/include/linux/mmzone.h
===================================================================
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -102,6 +102,7 @@ enum zone_stat_item {
NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */
NR_ISOLATED_ANON, /* Temporary isolated pages from anon lru */
NR_ISOLATED_FILE, /* Temporary isolated pages from file lru */
+ NR_SHMEM, /* shmem pages (included tmpfs/GEM pages) */
#ifdef CONFIG_NUMA
NUMA_HIT, /* allocated in intended node */
NUMA_MISS, /* allocated in non intended node */
Index: b/mm/filemap.c
===================================================================
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -120,6 +120,8 @@ void __remove_from_page_cache(struct pag
page->mapping = NULL;
mapping->nrpages--;
__dec_zone_page_state(page, NR_FILE_PAGES);
+ if (PageSwapBacked(page))
+ __dec_zone_page_state(page, NR_SHMEM);
BUG_ON(page_mapped(page));

/*
@@ -476,6 +478,8 @@ int add_to_page_cache_locked(struct page
if (likely(!error)) {
mapping->nrpages++;
__inc_zone_page_state(page, NR_FILE_PAGES);
+ if (PageSwapBacked(page))
+ __inc_zone_page_state(page, NR_SHMEM);
spin_unlock_irq(&mapping->tree_lock);
} else {
page->mapping = NULL;
Index: b/mm/vmstat.c
===================================================================
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -646,6 +646,7 @@ static const char * const vmstat_text[]
"nr_writeback_temp",
"nr_isolated_anon",
"nr_isolated_file",
+ "nr_shmem",
#ifdef CONFIG_NUMA
"numa_hit",
"numa_miss",
Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2118,9 +2118,9 @@ void show_free_areas(void)
printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
" inactive_file:%lu unevictable:%lu\n"
" isolated_anon:%lu isolated_file:%lu\n"
- " dirty:%lu writeback:%lu buffer:%lu unstable:%lu\n"
+ " dirty:%lu writeback:%lu buffer:%lu shmem:%lu\n"
" free:%lu slab_reclaimable:%lu slab_unreclaimable:%lu\n"
- " mapped:%lu pagetables:%lu bounce:%lu\n",
+ " mapped:%lu pagetables:%lu unstable:%lu bounce:%lu\n",
global_page_state(NR_ACTIVE_ANON),
global_page_state(NR_ACTIVE_FILE),
global_page_state(NR_INACTIVE_ANON),
@@ -2131,12 +2131,13 @@ void show_free_areas(void)
global_page_state(NR_FILE_DIRTY),
global_page_state(NR_WRITEBACK),
nr_blockdev_pages(),
- global_page_state(NR_UNSTABLE_NFS),
+ global_page_state(NR_SHMEM),
global_page_state(NR_FREE_PAGES),
global_page_state(NR_SLAB_RECLAIMABLE),
global_page_state(NR_SLAB_UNRECLAIMABLE),
global_page_state(NR_FILE_MAPPED),
global_page_state(NR_PAGETABLE),
+ global_page_state(NR_UNSTABLE_NFS),
global_page_state(NR_BOUNCE));

for_each_populated_zone(zone) {
@@ -2160,6 +2161,7 @@ void show_free_areas(void)
" dirty:%lukB"
" writeback:%lukB"
" mapped:%lukB"
+ " shmem:%lukB"
" slab_reclaimable:%lukB"
" slab_unreclaimable:%lukB"
" kernel_stack:%lukB"
@@ -2187,6 +2189,7 @@ void show_free_areas(void)
K(zone_page_state(zone, NR_FILE_DIRTY)),
K(zone_page_state(zone, NR_WRITEBACK)),
K(zone_page_state(zone, NR_FILE_MAPPED)),
+ K(zone_page_state(zone, NR_SHMEM)),
K(zone_page_state(zone, NR_SLAB_RECLAIMABLE)),
K(zone_page_state(zone, NR_SLAB_UNRECLAIMABLE)),
zone_page_state(zone, NR_KERNEL_STACK) *
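
Editorial note: the mm/filemap.c hunks in the patch above keep NR_SHMEM in step with NR_FILE_PAGES whenever a swap-backed page enters or leaves the page cache. A minimal userspace sketch of that bookkeeping follows. The structs and function names are hypothetical stand-ins, and the swap_backed field only plays the role of PageSwapBacked(); none of this is the kernel's API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page. */
struct demo_page {
	bool swap_backed;	/* plays the role of PageSwapBacked() */
};

/* Hypothetical stand-in for the two zone counters involved. */
struct demo_zone_stat {
	long nr_file_pages;
	long nr_shmem;
};

/* Mirrors the add_to_page_cache_locked() hunk. */
static void demo_add_to_page_cache(struct demo_zone_stat *zs,
				   const struct demo_page *page)
{
	zs->nr_file_pages++;
	if (page->swap_backed)
		zs->nr_shmem++;
}

/* Mirrors the __remove_from_page_cache() hunk. */
static void demo_remove_from_page_cache(struct demo_zone_stat *zs,
					const struct demo_page *page)
{
	zs->nr_file_pages--;
	if (page->swap_backed)
		zs->nr_shmem--;
	/* shmem pages are a subset of the page cache pages. */
	assert(zs->nr_shmem >= 0 && zs->nr_shmem <= zs->nr_file_pages);
}
```

In this model shmem pages are always counted twice, once as file pages and once as shmem, so nr_shmem can never exceed nr_file_pages.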



2009-07-07 01:48:20

by Minchan Kim

Subject: Re: [PATCH 4/5] add isolate pages vmstat

It looks good to me.
Thanks for your effort. I added my Reviewed-by. :)

Let me leave one side note.
This accounting feature is a result of the direct reclaim bomb.
If we prevent the direct reclaim bomb, I think this feature can be removed.

As far as I know, Rik or Wu is working on a patch for throttling direct reclaim.

On Tue, 7 Jul 2009 10:19:53 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> > > > Index: b/mm/vmscan.c
> > > > ===================================================================
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -1082,6 +1082,7 @@ static unsigned long shrink_inactive_lis
> > > > -count[LRU_ACTIVE_ANON]);
> > > > __mod_zone_page_state(zone, NR_INACTIVE_ANON,
> > > > -count[LRU_INACTIVE_ANON]);
> > > > + __mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);
> > >
> > > Lumpy can reclaim file + anon anywhere.
> > > How about using count[NR_LRU_LISTS]?
> >
> > Ah yes, good catch.
>
> Fixed.
>
> Subject: [PATCH] add isolate pages vmstat
>
> If the system have plenty threads or processes, concurrent reclaim can
> isolate very much pages.
> Unfortunately, current /proc/meminfo and OOM log can't show it.
>
> This patch provide the way of showing this information.
>
>
> reproduce way
> -----------------------
> % ./hackbench 140 process 1000
> => couse OOM
>
> Active_anon:146 active_file:41 inactive_anon:0
> inactive_file:0 unevictable:0
> isolated_anon:49245 isolated_file:113
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> dirty:0 writeback:0 buffer:49 unstable:0
> free:184 slab_reclaimable:276 slab_unreclaimable:5492
> mapped:87 pagetables:28239 bounce:0
>
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>

--
Kind regards,
Minchan Kim

2009-07-07 02:12:42

by KOSAKI Motohiro

Subject: Re: [PATCH 4/5] add isolate pages vmstat

> It looks good to me.
> Thanks for your effort. I added my review sign. :)
>
> Let remain one side note.
> This accounting feature results from direct reclaim bomb.
> If we prevent direct reclaim bomb, I think this feature can be removed.

Hmmm. I disagree.
Isolated pages can exceed 1GB on server systems.
Who wants more than 1GB of unaccounted memory?
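
Editorial note: to put the page counts in this thread into bytes, here is a small sketch of the pages-to-kB conversion. It assumes 4 KiB pages (PAGE_SHIFT of 12), which the thread does not state, and the DEMO_ macros and function names are hypothetical; DEMO_K() only imitates the shape of the K() helper used in show_free_areas().

```c
#include <assert.h>

/* Assumption: 4 KiB pages, i.e. a page shift of 12. */
#define DEMO_PAGE_SHIFT 12

/* Same shape as the K() helper in the patch: pages -> kB. */
#define DEMO_K(x) ((x) << (DEMO_PAGE_SHIFT - 10))

/* Convert a page count to kilobytes. */
static unsigned long pages_to_kb(unsigned long pages)
{
	/* Guard against overflow of the left shift. */
	assert(pages <= (~0UL >> (DEMO_PAGE_SHIFT - 10)));
	return DEMO_K(pages);
}

/* Convert a size in kilobytes to whole pages (rounding down). */
static unsigned long kb_to_pages(unsigned long kb)
{
	return kb >> (DEMO_PAGE_SHIFT - 10);
}
```

Under that assumption, the isolated_anon:49245 figure from the hackbench log is 196980 kB, roughly 192 MB, and 1 GB of isolated memory corresponds to 262144 pages.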



>
> As I know, Rik or Wu is making patch for throttling direct reclaim.
>
> On Tue, 7 Jul 2009 10:19:53 +0900 (JST)
> KOSAKI Motohiro <[email protected]> wrote:
>
> > > > > Index: b/mm/vmscan.c
> > > > > ===================================================================
> > > > > --- a/mm/vmscan.c
> > > > > +++ b/mm/vmscan.c
> > > > > @@ -1082,6 +1082,7 @@ static unsigned long shrink_inactive_lis
> > > > > -count[LRU_ACTIVE_ANON]);
> > > > > __mod_zone_page_state(zone, NR_INACTIVE_ANON,
> > > > > -count[LRU_INACTIVE_ANON]);
> > > > > + __mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);
> > > >
> > > > Lumpy can reclaim file + anon anywhere.
> > > > How about using count[NR_LRU_LISTS]?
> > >
> > > Ah yes, good catch.
> >
> > Fixed.
> >
> > Subject: [PATCH] add isolate pages vmstat
> >
> > If the system have plenty threads or processes, concurrent reclaim can
> > isolate very much pages.
> > Unfortunately, current /proc/meminfo and OOM log can't show it.
> >
> > This patch provide the way of showing this information.
> >
> >
> > reproduce way
> > -----------------------
> > % ./hackbench 140 process 1000
> > => couse OOM
> >
> > Active_anon:146 active_file:41 inactive_anon:0
> > inactive_file:0 unevictable:0
> > isolated_anon:49245 isolated_file:113
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > dirty:0 writeback:0 buffer:49 unstable:0
> > free:184 slab_reclaimable:276 slab_unreclaimable:5492
> > mapped:87 pagetables:28239 bounce:0
> >
> >
> > Signed-off-by: KOSAKI Motohiro <[email protected]>
> Reviewed-by: Minchan Kim <[email protected]>
>
> --
> Kind regards,
> Minchan Kim


2009-07-07 02:31:29

by Minchan Kim

Subject: Re: [PATCH 4/5] add isolate pages vmstat

On Tue, 7 Jul 2009 11:12:33 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> > It looks good to me.
> > Thanks for your effort. I added my review sign. :)
> >
> > Let remain one side note.
> > This accounting feature results from direct reclaim bomb.
> > If we prevent direct reclaim bomb, I think this feature can be removed.
>
> Hmmm. I disagree.
> isolated pages can become more than >1GB on server systems.
> Who want >1GB unaccountable memory?

I am okay with it if, as you said, this can happen even without a reclaim
bomb on server systems that have a lot of memory.

--
Kind regards,
Minchan Kim

2009-07-07 03:03:22

by Rik van Riel

Subject: Re: [PATCH 4/5] add isolate pages vmstat

Minchan Kim wrote:
> It looks good to me.
> Thanks for your effort. I added my review sign. :)
>
> Let remain one side note.
> This accounting feature results from direct reclaim bomb.
> If we prevent direct reclaim bomb, I think this feature can be removed.
>
> As I know, Rik or Wu is making patch for throttling direct reclaim.

My plan is to build the patch on top of these patches,
so I'm waiting for them to settle :)

Acked-by: Rik van Riel <[email protected]>

--
All rights reversed.

2009-07-07 10:55:19

by Fengguang Wu

Subject: Re: [PATCH 5/5] add NR_ANON_PAGES to OOM log

On Tue, Jul 07, 2009 at 09:22:48AM +0800, KOSAKI Motohiro wrote:
> > On Sun, Jul 05, 2009 at 08:21:20PM +0800, KOSAKI Motohiro wrote:
> > > > On Sun, Jul 05, 2009 at 05:26:18PM +0800, KOSAKI Motohiro wrote:
> > > > > Subject: [PATCH] add NR_ANON_PAGES to OOM log
> > > > >
> > > > > show_free_areas can display NR_FILE_PAGES, but it can't display
> > > > > NR_ANON_PAGES.
> > > > >
> > > > > this patch fix its inconsistency.
> > > > >
> > > > >
> > > > > Reported-by: Wu Fengguang <[email protected]>
> > > > > Signed-off-by: KOSAKI Motohiro <[email protected]>
> > > > > ---
> > > > > mm/page_alloc.c | 1 +
> > > > > 1 file changed, 1 insertion(+)
> > > > >
> > > > > Index: b/mm/page_alloc.c
> > > > > ===================================================================
> > > > > --- a/mm/page_alloc.c
> > > > > +++ b/mm/page_alloc.c
> > > > > @@ -2216,6 +2216,7 @@ void show_free_areas(void)
> > > > > printk("= %lukB\n", K(total));
> > > > > }
> > > > >
> > > > > + printk("%ld total anon pages\n", global_page_state(NR_ANON_PAGES));
> > > > > printk("%ld total pagecache pages\n", global_page_state(NR_FILE_PAGES));
> > > >
> > > > Can we put related items together, ie. this looks more friendly:
> > > >
> > > > Anon:XXX active_anon:XXX inactive_anon:XXX
> > > > File:XXX active_file:XXX inactive_file:XXX
> > >
> > > hmmm. Actually NR_ACTIVE_ANON + NR_INACTIVE_ANON != NR_ANON_PAGES.
> > > tmpfs pages are accounted as FILE, but it is stay in anon lru.
> >
> > Right, that's exactly the reason I propose to put them together: to
> > make the number of tmpfs pages obvious.
>
> How about this?
>
> ==================================================
> Subject: [PATCH] add shmem vmstat
>
> Recently, We faced several OOM problem by plenty GEM cache. and generally,
> plenty Shmem/Tmpfs potentially makes memory shortage problem.
>
> Then, End-user want to know how much memory used by shmem.

Thanks for doing this. I think it's convenient to export shmem/tmpfs
pages in the /proc interfaces.

I noticed that you ignored migrate_page_move_mapping(), which may move
the file page from one zone to another. Another question: why did you
choose to maintain one more ZVC counter instead of computing it from
the existing counters, i.e. Minchan's equation tmpfs/shmem =
(NR_ACTIVE_ANON + NR_INACTIVE_ANON + isolate(anon)) - NR_ANON_PAGES?
The reason should at least be mentioned in the changelog.
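
The equation above can be sketched in plain C as follows. This is only an
illustration, not kernel code: struct lru_snapshot and
estimate_shmem_pages() are hypothetical names, and the fields stand in for
the values the kernel would read from the corresponding counters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of the counters named in the equation (in pages). */
struct lru_snapshot {
	unsigned long active_anon;	/* stands in for NR_ACTIVE_ANON */
	unsigned long inactive_anon;	/* stands in for NR_INACTIVE_ANON */
	unsigned long isolated_anon;	/* stands in for isolate(anon) */
	unsigned long anon_pages;	/* stands in for NR_ANON_PAGES */
};

/*
 * tmpfs/shmem estimate: everything sitting on (or isolated from) the anon
 * LRU, minus the pages accounted as anonymous.
 */
unsigned long estimate_shmem_pages(const struct lru_snapshot *s)
{
	unsigned long on_anon_lru;

	assert(s != NULL);
	on_anon_lru = s->active_anon + s->inactive_anon + s->isolated_anon;

	/* The counters are not sampled atomically, so clamp at zero. */
	if (on_anon_lru < s->anon_pages)
		return 0;
	return on_anon_lru - s->anon_pages;
}
```

With 1000 active, 500 inactive and 20 isolated anon-LRU pages against 1200
anonymous pages, the estimate comes out as 320 pages.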

Thanks,
Fengguang

> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> drivers/base/node.c | 2 ++
> fs/proc/meminfo.c | 2 ++
> include/linux/mmzone.h | 1 +
> mm/filemap.c | 4 ++++
> mm/page_alloc.c | 9 ++++++---
> mm/vmstat.c | 1 +
> 6 files changed, 16 insertions(+), 3 deletions(-)
>
> Index: b/drivers/base/node.c
> ===================================================================
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -87,6 +87,7 @@ static ssize_t node_read_meminfo(struct
> "Node %d FilePages: %8lu kB\n"
> "Node %d Mapped: %8lu kB\n"
> "Node %d AnonPages: %8lu kB\n"
> + "Node %d Shmem: %8lu kB\n"
> "Node %d KernelStack: %8lu kB\n"
> "Node %d PageTables: %8lu kB\n"
> "Node %d NFS_Unstable: %8lu kB\n"
> @@ -121,6 +122,7 @@ static ssize_t node_read_meminfo(struct
> nid, K(node_page_state(nid, NR_FILE_PAGES)),
> nid, K(node_page_state(nid, NR_FILE_MAPPED)),
> nid, K(node_page_state(nid, NR_ANON_PAGES)),
> + nid, K(node_page_state(nid, NR_SHMEM)),
> nid, node_page_state(nid, NR_KERNEL_STACK) *
> THREAD_SIZE / 1024,
> nid, K(node_page_state(nid, NR_PAGETABLE)),
> Index: b/fs/proc/meminfo.c
> ===================================================================
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -83,6 +83,7 @@ static int meminfo_proc_show(struct seq_
> "Writeback: %8lu kB\n"
> "AnonPages: %8lu kB\n"
> "Mapped: %8lu kB\n"
> + "Shmem: %8lu kB\n"
> "Slab: %8lu kB\n"
> "SReclaimable: %8lu kB\n"
> "SUnreclaim: %8lu kB\n"
> @@ -129,6 +130,7 @@ static int meminfo_proc_show(struct seq_
> K(global_page_state(NR_WRITEBACK)),
> K(global_page_state(NR_ANON_PAGES)),
> K(global_page_state(NR_FILE_MAPPED)),
> + K(global_page_state(NR_SHMEM)),
> K(global_page_state(NR_SLAB_RECLAIMABLE) +
> global_page_state(NR_SLAB_UNRECLAIMABLE)),
> K(global_page_state(NR_SLAB_RECLAIMABLE)),
> Index: b/include/linux/mmzone.h
> ===================================================================
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -102,6 +102,7 @@ enum zone_stat_item {
> NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */
> NR_ISOLATED_ANON, /* Temporary isolated pages from anon lru */
> NR_ISOLATED_FILE, /* Temporary isolated pages from file lru */
> + NR_SHMEM, /* shmem pages (included tmpfs/GEM pages) */
> #ifdef CONFIG_NUMA
> NUMA_HIT, /* allocated in intended node */
> NUMA_MISS, /* allocated in non intended node */
> Index: b/mm/filemap.c
> ===================================================================
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -120,6 +120,8 @@ void __remove_from_page_cache(struct pag
> page->mapping = NULL;
> mapping->nrpages--;
> __dec_zone_page_state(page, NR_FILE_PAGES);
> + if (PageSwapBacked(page))
> + __dec_zone_page_state(page, NR_SHMEM);
> BUG_ON(page_mapped(page));
>
> /*
> @@ -476,6 +478,8 @@ int add_to_page_cache_locked(struct page
> if (likely(!error)) {
> mapping->nrpages++;
> __inc_zone_page_state(page, NR_FILE_PAGES);
> + if (PageSwapBacked(page))
> + __inc_zone_page_state(page, NR_SHMEM);
> spin_unlock_irq(&mapping->tree_lock);
> } else {
> page->mapping = NULL;
> Index: b/mm/vmstat.c
> ===================================================================
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -646,6 +646,7 @@ static const char * const vmstat_text[]
> "nr_writeback_temp",
> "nr_isolated_anon",
> "nr_isolated_file",
> + "nr_shmem",
> #ifdef CONFIG_NUMA
> "numa_hit",
> "numa_miss",
> Index: b/mm/page_alloc.c
> ===================================================================
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2118,9 +2118,9 @@ void show_free_areas(void)
> printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
> " inactive_file:%lu unevictable:%lu\n"
> " isolated_anon:%lu isolated_file:%lu\n"
> - " dirty:%lu writeback:%lu buffer:%lu unstable:%lu\n"
> + " dirty:%lu writeback:%lu buffer:%lu shmem:%lu\n"
> " free:%lu slab_reclaimable:%lu slab_unreclaimable:%lu\n"
> - " mapped:%lu pagetables:%lu bounce:%lu\n",
> + " mapped:%lu pagetables:%lu unstable:%lu bounce:%lu\n",
> global_page_state(NR_ACTIVE_ANON),
> global_page_state(NR_ACTIVE_FILE),
> global_page_state(NR_INACTIVE_ANON),
> @@ -2131,12 +2131,13 @@ void show_free_areas(void)
> global_page_state(NR_FILE_DIRTY),
> global_page_state(NR_WRITEBACK),
> nr_blockdev_pages(),
> - global_page_state(NR_UNSTABLE_NFS),
> + global_page_state(NR_SHMEM),
> global_page_state(NR_FREE_PAGES),
> global_page_state(NR_SLAB_RECLAIMABLE),
> global_page_state(NR_SLAB_UNRECLAIMABLE),
> global_page_state(NR_FILE_MAPPED),
> global_page_state(NR_PAGETABLE),
> + global_page_state(NR_UNSTABLE_NFS),
> global_page_state(NR_BOUNCE));
>
> for_each_populated_zone(zone) {
> @@ -2160,6 +2161,7 @@ void show_free_areas(void)
> " dirty:%lukB"
> " writeback:%lukB"
> " mapped:%lukB"
> + " shmem:%lukB"
> " slab_reclaimable:%lukB"
> " slab_unreclaimable:%lukB"
> " kernel_stack:%lukB"
> @@ -2187,6 +2189,7 @@ void show_free_areas(void)
> K(zone_page_state(zone, NR_FILE_DIRTY)),
> K(zone_page_state(zone, NR_WRITEBACK)),
> K(zone_page_state(zone, NR_FILE_MAPPED)),
> + K(zone_page_state(zone, NR_SHMEM)),
> K(zone_page_state(zone, NR_SLAB_RECLAIMABLE)),
> K(zone_page_state(zone, NR_SLAB_UNRECLAIMABLE)),
> zone_page_state(zone, NR_KERNEL_STACK) *
>
>
>

2009-07-07 13:51:46

by Fengguang Wu

Subject: Re: [PATCH 4/5] add isolate pages vmstat

On Tue, Jul 07, 2009 at 09:19:53AM +0800, KOSAKI Motohiro wrote:
> > > > Index: b/mm/vmscan.c
> > > > ===================================================================
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -1082,6 +1082,7 @@ static unsigned long shrink_inactive_lis
> > > > -count[LRU_ACTIVE_ANON]);
> > > > __mod_zone_page_state(zone, NR_INACTIVE_ANON,
> > > > -count[LRU_INACTIVE_ANON]);
> > > > + __mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);
> > >
> > > Lumpy can reclaim file + anon anywhere.
> > > How about using count[NR_LRU_LISTS]?
> >
> > Ah yes, good catch.
>
> Fixed.
>
> Subject: [PATCH] add isolate pages vmstat
>
> If the system have plenty threads or processes, concurrent reclaim can
> isolate very much pages.
> Unfortunately, current /proc/meminfo and OOM log can't show it.
>
> This patch provide the way of showing this information.

Acked-by: Wu Fengguang <[email protected]>

> printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
> - " inactive_file:%lu"
> - " unevictable:%lu"
> + " inactive_file:%lu unevictable:%lu\n"
> + " isolated_anon:%lu isolated_file:%lu\n"

How about
active_anon inactive_anon isolated_anon
active_file inactive_file isolated_file
?

Thanks,
Fengguang

2009-07-07 13:57:30

by Fengguang Wu

Subject: Re: [PATCH 5/5] add NR_ANON_PAGES to OOM log

On Tue, Jul 07, 2009 at 09:22:48AM +0800, KOSAKI Motohiro wrote:
> > On Sun, Jul 05, 2009 at 08:21:20PM +0800, KOSAKI Motohiro wrote:
> > > > On Sun, Jul 05, 2009 at 05:26:18PM +0800, KOSAKI Motohiro wrote:

> @@ -2118,9 +2118,9 @@ void show_free_areas(void)
> printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
> " inactive_file:%lu unevictable:%lu\n"
> " isolated_anon:%lu isolated_file:%lu\n"
> - " dirty:%lu writeback:%lu buffer:%lu unstable:%lu\n"
> + " dirty:%lu writeback:%lu buffer:%lu shmem:%lu\n"

btw, nfs unstable pages are related to writeback pages, so it may be
better to put "unstable" right after "writeback" (as it was)?

Thanks,
Fengguang


> " free:%lu slab_reclaimable:%lu slab_unreclaimable:%lu\n"
> - " mapped:%lu pagetables:%lu bounce:%lu\n",
> + " mapped:%lu pagetables:%lu unstable:%lu bounce:%lu\n",
> global_page_state(NR_ACTIVE_ANON),
> global_page_state(NR_ACTIVE_FILE),
> global_page_state(NR_INACTIVE_ANON),
> @@ -2131,12 +2131,13 @@ void show_free_areas(void)
> global_page_state(NR_FILE_DIRTY),
> global_page_state(NR_WRITEBACK),
> nr_blockdev_pages(),
> - global_page_state(NR_UNSTABLE_NFS),
> + global_page_state(NR_SHMEM),
> global_page_state(NR_FREE_PAGES),
> global_page_state(NR_SLAB_RECLAIMABLE),
> global_page_state(NR_SLAB_UNRECLAIMABLE),
> global_page_state(NR_FILE_MAPPED),
> global_page_state(NR_PAGETABLE),
> + global_page_state(NR_UNSTABLE_NFS),
> global_page_state(NR_BOUNCE));

2009-07-07 16:33:46

by Christoph Lameter

Subject: Re: [PATCH 1/5] add per-zone statistics to show_free_areas()

On Sun, 5 Jul 2009, KOSAKI Motohiro wrote:

> Subject: [PATCH] add per-zone statistics to show_free_areas()
>
> Currently, show_free_area() mainly display system memory usage. but it
> doesn't display per-zone memory usage information.

An attempt to rewrite the description:

show_free_areas() displays only a limited number of zone counters. This
patch includes additional counters in the display to allow easier
debugging. This may be especially useful if an OOM is due to running out
of DMA memory.

Reviewed-by: Christoph Lameter <[email protected]>

2009-07-07 16:37:47

by Christoph Lameter

Subject: Re: [PATCH 3/5] Show kernel stack usage to /proc/meminfo and OOM log

On Sun, 5 Jul 2009, KOSAKI Motohiro wrote:

> Subject: [PATCH] Show kernel stack usage to /proc/meminfo and OOM log
>
> if the system have a lot of thread, kernel stack consume unignorable large size
> memory. IOW, it make a lot of unaccountable memory.
> Tons unaccountable memory bring to harder analyse memory related trouble.
>
> Then, kernel stack account is useful.

The amount of memory allocated to kernel stacks can become significant and
cause OOM conditions. However, we do not display the amount of memory
consumed by stacks.

Add code to display the amount of memory used for stacks in /proc/meminfo.

Reviewed-by: <[email protected]>

(It may be useful to also include the stack sizes in the per zone
information displayed when an OOM occurs).

2009-07-07 16:47:21

by Christoph Lameter

Subject: Re: [PATCH 4/5] add isolate pages vmstat

On Sun, 5 Jul 2009, KOSAKI Motohiro wrote:

> mm/vmstat.c | 2 +-
> 6 files changed, 14 insertions(+), 3 deletions(-)
>
> Index: b/fs/proc/meminfo.c
> ===================================================================
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -65,6 +65,7 @@ static int meminfo_proc_show(struct seq_
> "Active(file): %8lu kB\n"
> "Inactive(file): %8lu kB\n"
> "Unevictable: %8lu kB\n"
> + "IsolatedPages: %8lu kB\n"

Why is it called isolatedpages when we display the amount of memory in
kilobytes?

2009-07-07 16:50:20

by Christoph Lameter

Subject: Re: [PATCH 4/5] add isolate pages vmstat

On Tue, 7 Jul 2009, KOSAKI Motohiro wrote:

> +++ b/include/linux/mmzone.h
> @@ -100,6 +100,8 @@ enum zone_stat_item {
> NR_BOUNCE,
> NR_VMSCAN_WRITE,
> NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */
> + NR_ISOLATED_ANON, /* Temporary isolated pages from anon lru */
> + NR_ISOLATED_FILE, /* Temporary isolated pages from file lru */

LRU counters are used less often than the counters for dirty pages etc.

Could you move the counters for reclaim into a separate cacheline?

2009-07-07 16:53:35

by Christoph Lameter

Subject: Re: [PATCH 5/5] add NR_ANON_PAGES to OOM log

On Mon, 6 Jul 2009, Minchan Kim wrote:

> Anyway, I think it's not a big cost in normal system.
> So If you want to add new accounting, I don't have any objection. :)

Let's keep the counters to a minimum. If we can calculate the values from
something else then there is no justification for a new counter.

A new counter increases the size of the per cpu structures that exist for
each zone and each cpu. 1 byte gets multiplied by the number of cpus and
that gets multiplied by the number of zones.
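
As a rough worked example of that multiplication, here is a small C sketch.
counter_percpu_cost() is a hypothetical helper for illustration only, and
the CPU and zone counts used below are assumed values, not measurements.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Extra per-cpu memory for one additional counter:
 * bytes per counter slot * number of cpus * number of zones.
 */
size_t counter_percpu_cost(size_t bytes_per_counter, size_t nr_cpus,
			   size_t nr_zones)
{
	assert(bytes_per_counter > 0);
	return bytes_per_counter * nr_cpus * nr_zones;
}
```

For an assumed machine with 64 cpus and 8 populated zones, one new 1-byte
counter costs 1 * 64 * 8 = 512 bytes of per-cpu data.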

2009-07-07 17:24:16

by Rik van Riel

Subject: Re: [PATCH 4/5] add isolate pages vmstat

Christoph Lameter wrote:
> On Tue, 7 Jul 2009, KOSAKI Motohiro wrote:
>
>> +++ b/include/linux/mmzone.h
>> @@ -100,6 +100,8 @@ enum zone_stat_item {
>> NR_BOUNCE,
>> NR_VMSCAN_WRITE,
>> NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */
>> + NR_ISOLATED_ANON, /* Temporary isolated pages from anon lru */
>> + NR_ISOLATED_FILE, /* Temporary isolated pages from file lru */
>
> LRU counters are rarer in use then the counters used for dirty pages etc.
>
> Could you move the counters for reclaim into a separate cacheline?

I don't get the point of that - these counters are
per-cpu anyway, so why would they need to be in a
separate cacheline?

2009-07-07 23:33:17

by Christoph Lameter

Subject: Re: [PATCH 4/5] add isolate pages vmstat

On Tue, 7 Jul 2009, Rik van Riel wrote:

> Christoph Lameter wrote:
> > On Tue, 7 Jul 2009, KOSAKI Motohiro wrote:
> >
> > > +++ b/include/linux/mmzone.h
> > > @@ -100,6 +100,8 @@ enum zone_stat_item {
> > > NR_BOUNCE,
> > > NR_VMSCAN_WRITE,
> > > NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */
> > > + NR_ISOLATED_ANON, /* Temporary isolated pages from anon lru */
> > > + NR_ISOLATED_FILE, /* Temporary isolated pages from file lru */
> >
> > LRU counters are rarer in use then the counters used for dirty pages etc.
> >
> > Could you move the counters for reclaim into a separate cacheline?
>
> I don't get the point of that - these counters are
> per-cpu anyway, so why would they need to be in a
> separate cacheline?

Because there are so many counters now that they span multiple
cachelines. PCP data is very performance sensitive. Putting them in a
separate cacheline so that the most important counters are in the
first one will reduce the cache footprint of many core VM functions.

2009-07-08 01:44:46

by Fengguang Wu

Subject: Re: [PATCH 4/5] add isolate pages vmstat

On Wed, Jul 08, 2009 at 12:46:54AM +0800, Christoph Lameter wrote:
> On Sun, 5 Jul 2009, KOSAKI Motohiro wrote:
>
> > mm/vmstat.c | 2 +-
> > 6 files changed, 14 insertions(+), 3 deletions(-)
> >
> > Index: b/fs/proc/meminfo.c
> > ===================================================================
> > --- a/fs/proc/meminfo.c
> > +++ b/fs/proc/meminfo.c
> > @@ -65,6 +65,7 @@ static int meminfo_proc_show(struct seq_
> > "Active(file): %8lu kB\n"
> > "Inactive(file): %8lu kB\n"
> > "Unevictable: %8lu kB\n"
> > + "IsolatedPages: %8lu kB\n"
>
> Why is it called isolatedpages when we display the amount of memory in
> kilobytes?

See the following emails. This has been changed to "IsolatedLRU" and then
"Isolated(file)/Isolated(anon)".

Thanks,
Fengguang

2009-07-09 02:13:39

by KOSAKI Motohiro

Subject: Re: [PATCH 3/5] Show kernel stack usage to /proc/meminfo and OOM log

> On Sun, 5 Jul 2009, KOSAKI Motohiro wrote:
>
> > Subject: [PATCH] Show kernel stack usage to /proc/meminfo and OOM log
> >
> > if the system have a lot of thread, kernel stack consume unignorable large size
> > memory. IOW, it make a lot of unaccountable memory.
> > Tons unaccountable memory bring to harder analyse memory related trouble.
> >
> > Then, kernel stack account is useful.
>
> The amount of memory allocated to kernel stacks can become significant and
> cause OOM conditions. However, we do not display the amount of memory
> consumed by stacks.'
>
> Add code to display the amount of memory used for stacks in /proc/meminfo.
>
> Reviewed-by: <[email protected]>

Thanks.
I'll fix the description.


> (It may be useful to also include the stack sizes in the per zone
> information displayed when an OOM occurs).

Doesn't the following code in this patch display the per-zone stack size?



> Index: b/mm/page_alloc.c
> ===================================================================
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2158,6 +2158,7 @@ void show_free_areas(void)
> " mapped:%lukB"
> " slab_reclaimable:%lukB"
> " slab_unreclaimable:%lukB"
> + " kernel_stack:%lukB"
> " pagetables:%lukB"
> " unstable:%lukB"
> " bounce:%lukB"
> @@ -2182,6 +2183,8 @@ void show_free_areas(void)
> K(zone_page_state(zone, NR_FILE_MAPPED)),
> K(zone_page_state(zone, NR_SLAB_RECLAIMABLE)),
> K(zone_page_state(zone, NR_SLAB_UNRECLAIMABLE)),
> + zone_page_state(zone, NR_KERNEL_STACK) *
> + THREAD_SIZE / 1024,
> K(zone_page_state(zone, NR_PAGETABLE)),
> K(zone_page_state(zone, NR_UNSTABLE_NFS)),
> K(zone_page_state(zone, NR_BOUNCE)),
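
The two kB conversions in the hunk above differ because most counters are
kept in pages while NR_KERNEL_STACK is scaled by THREAD_SIZE. A minimal
sketch of both, assuming 4 KiB pages and 8 KiB kernel stacks: the EX_*
constants and function names are hypothetical stand-ins for PAGE_SHIFT,
THREAD_SIZE and the K() helper, and the real values depend on architecture
and configuration.

```c
#include <assert.h>

#define EX_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define EX_THREAD_SIZE	8192UL	/* assumption: 8 KiB kernel stacks */

/* Page count to kB, in the style of a K() shift helper. */
unsigned long pages_to_kb(unsigned long nr_pages)
{
	assert(EX_PAGE_SHIFT >= 10);
	return nr_pages << (EX_PAGE_SHIFT - 10);
}

/* Kernel stack count to kB: counter * THREAD_SIZE / 1024. */
unsigned long kernel_stacks_to_kb(unsigned long nr_stacks)
{
	return nr_stacks * EX_THREAD_SIZE / 1024;
}
```

Under these assumptions 256 pages print as 1024 kB, and 1000 kernel stacks
print as 8000 kB.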

2009-07-09 02:19:29

by KOSAKI Motohiro

Subject: Re: [PATCH 4/5] add isolate pages vmstat

> On Tue, 7 Jul 2009, KOSAKI Motohiro wrote:
>
> > +++ b/include/linux/mmzone.h
> > @@ -100,6 +100,8 @@ enum zone_stat_item {
> > NR_BOUNCE,
> > NR_VMSCAN_WRITE,
> > NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */
> > + NR_ISOLATED_ANON, /* Temporary isolated pages from anon lru */
> > + NR_ISOLATED_FILE, /* Temporary isolated pages from file lru */
>
> LRU counters are rarer in use then the counters used for dirty pages etc.
>
> Could you move the counters for reclaim into a separate cacheline?
>

The current definition is below.

Dirty pages and other frequently used counters stay in the first cache line.
NR_ISOLATED_(ANON|FILE) and other infrequently used counters stay in the
second cache line.

Do you mean we shouldn't use zone_stat_item for it?


---------------------------------------------------------
enum zone_stat_item {
	/* First 128 byte cacheline (assuming 64 bit words) */
	NR_FREE_PAGES,
	NR_LRU_BASE,
	NR_INACTIVE_ANON = NR_LRU_BASE, /* must match order of LRU_[IN]ACTIVE */
	NR_ACTIVE_ANON,		/* " " " " " */
	NR_INACTIVE_FILE,	/* " " " " " */
	NR_ACTIVE_FILE,		/* " " " " " */
	NR_UNEVICTABLE,		/* " " " " " */
	NR_MLOCK,		/* mlock()ed pages found and moved off LRU */
	NR_ANON_PAGES,		/* Mapped anonymous pages */
	NR_FILE_MAPPED,		/* pagecache pages mapped into pagetables.
				   only modified from process context */
	NR_FILE_PAGES,
	NR_FILE_DIRTY,
	NR_WRITEBACK,
	NR_SLAB_RECLAIMABLE,
	NR_SLAB_UNRECLAIMABLE,
	NR_PAGETABLE,		/* used for pagetables */
	NR_KERNEL_STACK,
	/* Second 128 byte cacheline */
	NR_UNSTABLE_NFS,	/* NFS unstable pages */
	NR_BOUNCE,
	NR_VMSCAN_WRITE,
	NR_WRITEBACK_TEMP,	/* Writeback using temporary buffers */
	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
	NR_SHMEM,		/* shmem pages (included tmpfs/GEM pages) */
#ifdef CONFIG_NUMA
	NUMA_HIT,		/* allocated in intended node */
	NUMA_MISS,		/* allocated in non intended node */
	NUMA_FOREIGN,		/* was intended here, hit elsewhere */
	NUMA_INTERLEAVE_HIT,	/* interleaver preferred this zone */
	NUMA_LOCAL,		/* allocation from local node */
	NUMA_OTHER,		/* allocation from other node */
#endif
	NR_VM_ZONE_STAT_ITEMS };
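
The cacheline comments in this listing can be checked with a little
arithmetic. The sketch below is a hypothetical helper, assuming 8-byte
counters, 128-byte cachelines (as the enum comment states) and a counter
array that starts on a cacheline boundary; none of these are guaranteed on
every configuration.

```c
#include <assert.h>
#include <stddef.h>

#define EX_CACHELINE_BYTES	128	/* assumption taken from the enum comment */
#define EX_WORD_BYTES		8	/* assumption: 64 bit words */

/* Which cacheline (0 = first) holds the counter at this enum position? */
size_t counter_cacheline(size_t item_index)
{
	assert(EX_CACHELINE_BYTES % EX_WORD_BYTES == 0);
	return item_index * EX_WORD_BYTES / EX_CACHELINE_BYTES;
}
```

Counting positions in the listing, NR_KERNEL_STACK is item 15 and still
lands in cacheline 0, NR_UNSTABLE_NFS is item 16 and starts cacheline 1,
and NR_ISOLATED_ANON (20), NR_ISOLATED_FILE (21) and NR_SHMEM (22) all fall
in cacheline 1 as well.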




2009-07-09 05:14:03

by KOSAKI Motohiro

Subject: Re: [PATCH 5/5] add NR_ANON_PAGES to OOM log

> On Tue, Jul 07, 2009 at 09:22:48AM +0800, KOSAKI Motohiro wrote:
> > > On Sun, Jul 05, 2009 at 08:21:20PM +0800, KOSAKI Motohiro wrote:
> > > > > On Sun, Jul 05, 2009 at 05:26:18PM +0800, KOSAKI Motohiro wrote:
>
> > @@ -2118,9 +2118,9 @@ void show_free_areas(void)
> > printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
> > " inactive_file:%lu unevictable:%lu\n"
> > " isolated_anon:%lu isolated_file:%lu\n"
> > - " dirty:%lu writeback:%lu buffer:%lu unstable:%lu\n"
> > + " dirty:%lu writeback:%lu buffer:%lu shmem:%lu\n"
>
> btw, nfs unstable pages are related to writeback pages, so it may be
> better to put "unstable" right after "writeback" (as it was)?

OK, will fix.


2009-07-09 05:50:25

by KOSAKI Motohiro

Subject: Re: [PATCH 5/5] add NR_ANON_PAGES to OOM log

> On Mon, 6 Jul 2009, Minchan Kim wrote:
>
> > Anyway, I think it's not a big cost in normal system.
> > So If you want to add new accounting, I don't have any objection. :)
>
> Lets keep the counters to a mininum. If we can calculate the values from
> something else then there is no justification for a new counter.
>
> A new counter increases the size of the per cpu structures that exist for
> each zone and each cpu. 1 byte gets multiplies by the number of cpus and
> that gets multiplied by the number of zones.

OK. I'll implement this idea.


2009-07-09 07:20:42

by KOSAKI Motohiro

Subject: Re: [PATCH 5/5] add NR_ANON_PAGES to OOM log

> > On Mon, 6 Jul 2009, Minchan Kim wrote:
> >
> > > Anyway, I think it's not a big cost in normal system.
> > > So If you want to add new accounting, I don't have any objection. :)
> >
> > Lets keep the counters to a mininum. If we can calculate the values from
> > something else then there is no justification for a new counter.
> >
> > A new counter increases the size of the per cpu structures that exist for
> > each zone and each cpu. 1 byte gets multiplies by the number of cpus and
> > that gets multiplied by the number of zones.
>
> OK. I'll implement this idea.

Grr, sorry, I take that back. Shmem pages can't be calculated
by Minchan's formula.

If those pages are mlocked, they move to the unevictable LRU, so
the calculation doesn't account for mlocked pages. However, mlocked
tmpfs pages can also cause OOM problems.
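
A small C sketch of why the formula undercounts. It is a simplified model
with hypothetical names: it assumes no anonymous pages are mlocked, so the
anon LRU holds exactly the anonymous pages plus the evictable shmem pages,
and mlocked shmem pages sit only on the unevictable LRU.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model, all values in pages. */
struct ex_counts {
	unsigned long anon_lru;		/* active + inactive + isolated anon LRU */
	unsigned long anon_pages;	/* stands in for NR_ANON_PAGES */
	unsigned long mlocked_shmem;	/* shmem pages on the unevictable LRU */
};

/* What the formula reports: anon LRU pages minus anonymous pages. */
unsigned long shmem_by_formula(const struct ex_counts *c)
{
	assert(c != NULL);
	assert(c->anon_lru >= c->anon_pages);
	return c->anon_lru - c->anon_pages;
}

/* What a dedicated counter would report in this model. */
unsigned long shmem_actual(const struct ex_counts *c)
{
	return shmem_by_formula(c) + c->mlocked_shmem;
}
```

With 1500 anon-LRU pages, 1200 anonymous pages and 100 mlocked shmem pages,
the formula reports 300 pages while 400 shmem pages actually exist; the
difference is exactly the mlocked part.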


2009-07-09 10:28:03

by Minchan Kim

Subject: Re: [PATCH 5/5] add NR_ANON_PAGES to OOM log

On Thu, Jul 9, 2009 at 4:20 PM, KOSAKI Motohiro <[email protected]> wrote:
>> > On Mon, 6 Jul 2009, Minchan Kim wrote:
>> >
>> > > Anyway, I think it's not a big cost in normal system.
>> > > So If you want to add new accounting, I don't have any objection. :)
>> >
>> > Lets keep the counters to a mininum. If we can calculate the values from
>> > something else then there is no justification for a new counter.
>> >
>> > A new counter increases the size of the per cpu structures that exist for
>> > each zone and each cpu. 1 byte gets multiplies by the number of cpus and
>> > that gets multiplied by the number of zones.
>>
>> OK. I'll implement this idea.
>
> Grr, sorry I cancel this opinion. Shem pages can't be calculated
> by minchan's formula.
>
> if those page are mlocked, the page move to unevictable lru. then
> this calculation don't account mlocked page. However mlocked tmpfs pages
> also make OOM issue.

Absolutely. You're right.
But are mlocked shmem pages really that important?
Right now we only care about the number of unevictable pages, not about
what kinds of pages are on the unevictable list.

What we need is to decode an OOM more easily.
I don't think it matters what kinds of pages are on the unevictable LRU list.

--
Kind regards,
Minchan Kim

2009-07-09 21:00:24

by Christoph Lameter

Subject: Re: [PATCH 4/5] add isolate pages vmstat

On Thu, 9 Jul 2009, KOSAKI Motohiro wrote:

> >
> > Could you move the counters for reclaim into a separate cacheline?
> >
>
> Current definition is here.
>
> dirty pages and other frequently used counter stay in first cache line.
> NR_ISOLATED_(ANON|FILE) and other unfrequently used counter stay in second
> cache line.
>
> Do you mean we shouldn't use zone_stat_item for it?

No, there is really no alternative to it.

Just be aware that what you are adding may increase the cache footprint
of key functions in the VM. Some regression tests would be useful (do a
page fault test etc).

2009-07-09 21:00:56

by Christoph Lameter

Subject: Re: [PATCH 3/5] Show kernel stack usage to /proc/meminfo and OOM log

On Thu, 9 Jul 2009, KOSAKI Motohiro wrote:

> following code in this patch mean display per-zone stack size, no?

Right.