2013-10-16 10:42:33

by Mel Gorman

Subject: [PATCH] mm: Do not walk all of system memory during show_mem

It has been reported on very large machines that show_mem is taking almost
5 minutes to display information. This is a serious problem if there is
an OOM storm. The bulk of the cost is in show_mem doing a very expensive
PFN walk to give us the following information:

Total RAM: Also available as totalram_pages
Highmem pages: Also available as totalhigh_pages
Reserved pages: Can be inferred from the zone structure
Shared pages: PFN walk required
Unshared pages: PFN walk required
Quick pages: Per-cpu walk required

Only the shared/unshared page counts require a full PFN walk, but that
information is useless. It is also inaccurate, as page pins of unshared
pages would be accounted for as shared. Even if the information were
accurate, I'm struggling to think how the shared/unshared information could
be useful for debugging OOM conditions. Maybe it was useful before rmap
existed, when reclaiming shared pages was costly, but it is less relevant
today.
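
To illustrate the inaccuracy, here is a minimal user-space sketch of the
classification rule the removed walker applied to each non-reserved page.
The names old_classify and struct counts are hypothetical and exist only
for this example; the rule itself mirrors the page_count() checks deleted
by the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tally, standing in for the walker's local counters. */
struct counts {
	unsigned long shared;
	unsigned long nonshared;
};

/*
 * Classify one non-reserved page from its reference count alone,
 * as the removed PFN walker did. A reference count cannot tell a
 * second user apart from a temporary pin on the page.
 */
void old_classify(int page_count, struct counts *c)
{
	assert(c != NULL);

	if (page_count == 1)
		c->nonshared++;
	else if (page_count > 1)
		c->shared += page_count - 1;
}
```

Under this rule a page with a single user plus one extra pin has a
reference count of 2, so it adds 1 to the shared total even though nothing
is actually sharing it.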

The PFN walk could be optimised a bit, but why bother when the information
is useless? This patch deletes the PFN walker and infers the total RAM,
highmem and reserved page counts from struct zone. It omits the
shared/unshared page usage on the grounds that it is useless. It also
relabels the HighMem line as HighMem/MovableOnly, as ZONE_MOVABLE has
similar problems to HighMem with respect to lowmem/highmem exhaustion.
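
As a rough user-space model of the zone-based accounting described above,
the sketch below sums the three counters over an array of zones. struct
zone_model, struct mem_summary and summarise are hypothetical names for
this example only; the present_pages and managed_pages fields are modelled
on struct zone, and the is_highmem flag stands in for is_highmem_idx().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of struct zone used here. */
struct zone_model {
	const char *name;
	unsigned long present_pages;	/* pages physically present */
	unsigned long managed_pages;	/* pages given to the allocator */
	int is_highmem;			/* HighMem or Movable in this model */
};

struct mem_summary {
	unsigned long total;
	unsigned long reserved;
	unsigned long highmem;
};

/* Accumulate the counters across all populated zones, with no PFN walk. */
struct mem_summary summarise(const struct zone_model *zones, int nr)
{
	struct mem_summary s = {0, 0, 0};
	int i;

	assert(zones != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		const struct zone_model *z = &zones[i];

		/* Mirrors the populated_zone() check: skip empty zones. */
		if (z->present_pages == 0)
			continue;

		s.total += z->present_pages;
		/* Reserved pages are present but not managed. */
		s.reserved += z->present_pages - z->managed_pages;
		if (z->is_highmem)
			s.highmem += z->present_pages;
	}
	return s;
}
```

The cost is proportional to the number of zones rather than the number of
page frames, which is the point of dropping the walk.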

Signed-off-by: Mel Gorman <[email protected]>
---
lib/show_mem.c | 39 +++++++++++----------------------------
1 file changed, 11 insertions(+), 28 deletions(-)

diff --git a/lib/show_mem.c b/lib/show_mem.c
index b7c7231..5847a49 100644
--- a/lib/show_mem.c
+++ b/lib/show_mem.c
@@ -12,8 +12,7 @@
void show_mem(unsigned int filter)
{
pg_data_t *pgdat;
- unsigned long total = 0, reserved = 0, shared = 0,
- nonshared = 0, highmem = 0;
+ unsigned long total = 0, reserved = 0, highmem = 0;

printk("Mem-Info:\n");
show_free_areas(filter);
@@ -22,43 +21,27 @@ void show_mem(unsigned int filter)
return;

for_each_online_pgdat(pgdat) {
- unsigned long i, flags;
+ unsigned long flags;
+ int zoneid;

pgdat_resize_lock(pgdat, &flags);
- for (i = 0; i < pgdat->node_spanned_pages; i++) {
- struct page *page;
- unsigned long pfn = pgdat->node_start_pfn + i;
-
- if (unlikely(!(i % MAX_ORDER_NR_PAGES)))
- touch_nmi_watchdog();
-
- if (!pfn_valid(pfn))
+ for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
+ struct zone *zone = &pgdat->node_zones[zoneid];
+ if (!populated_zone(zone))
continue;

- page = pfn_to_page(pfn);
-
- if (PageHighMem(page))
- highmem++;
+ total += zone->present_pages;
+ reserved += zone->present_pages - zone->managed_pages;

- if (PageReserved(page))
- reserved++;
- else if (page_count(page) == 1)
- nonshared++;
- else if (page_count(page) > 1)
- shared += page_count(page) - 1;
-
- total++;
+ if (is_highmem_idx(zoneid))
+ highmem += zone->present_pages;
}
pgdat_resize_unlock(pgdat, &flags);
}

printk("%lu pages RAM\n", total);
-#ifdef CONFIG_HIGHMEM
- printk("%lu pages HighMem\n", highmem);
-#endif
+ printk("%lu pages HighMem/MovableOnly\n", highmem);
printk("%lu pages reserved\n", reserved);
- printk("%lu pages shared\n", shared);
- printk("%lu pages non-shared\n", nonshared);
#ifdef CONFIG_QUICKLIST
printk("%lu pages in pagetable cache\n",
quicklist_total_size());


2013-10-17 01:12:00

by David Rientjes

Subject: Re: [PATCH] mm: Do not walk all of system memory during show_mem

On Wed, 16 Oct 2013, Mel Gorman wrote:

> It has been reported on very large machines that show_mem is taking almost
> 5 minutes to display information. This is a serious problem if there is
> an OOM storm. The bulk of the cost is in show_mem doing a very expensive
> PFN walk to give us the following information
>
> Total RAM: Also available as totalram_pages
> Highmem pages: Also available as totalhigh_pages
> Reserved pages: Can be inferred from the zone structure
> Shared pages: PFN walk required
> Unshared pages: PFN walk required
> Quick pages: Per-cpu walk required
>
> Only the shared/unshared pages requires a full PFN walk but that information
> is useless. It is also inaccurate as page pins of unshared pages would
> be accounted for as shared. Even if the information was accurate, I'm
> struggling to think how the shared/unshared information could be useful
> for debugging OOM conditions. Maybe it was useful before rmap existed when
> reclaiming shared pages was costly but it is less relevant today.
>
> The PFN walk could be optimised a bit but why bother as the information is
> useless. This patch deletes the PFN walker and infers the total RAM, highmem
> and reserved pages count from struct zone. It omits the shared/unshared page
> usage on the grounds that it is useless. It also corrects the reporting
> of HighMem as HighMem/MovableOnly as ZONE_MOVABLE has similar problems to
> HighMem with respect to lowmem/highmem exhaustion.
>

We haven't been hit by this in the oom killer, but we did get hit by it
for page allocation failure warnings, as a result of having irqs disabled
and passing GFP_ATOMIC to the page allocator without __GFP_NOWARN.
Avoiding that was the intention of passing SHOW_MEM_FILTER_PAGE_COUNT into
show_mem() in 4b59e6c47309 ("mm, show_mem: suppress page counts in
non-blockable contexts").

With this, I assume we can just remove SHOW_MEM_FILTER_PAGE_COUNT
entirely?

2013-10-31 04:55:46

by KOSAKI Motohiro

Subject: Re: [PATCH] mm: Do not walk all of system memory during show_mem

(10/16/13 6:42 AM), Mel Gorman wrote:
> It has been reported on very large machines that show_mem is taking almost
> 5 minutes to display information. This is a serious problem if there is
> an OOM storm. The bulk of the cost is in show_mem doing a very expensive
> PFN walk to give us the following information
>
> Total RAM: Also available as totalram_pages
> Highmem pages: Also available as totalhigh_pages
> Reserved pages: Can be inferred from the zone structure
> Shared pages: PFN walk required
> Unshared pages: PFN walk required
> Quick pages: Per-cpu walk required
>
> Only the shared/unshared pages requires a full PFN walk but that information
> is useless. It is also inaccurate as page pins of unshared pages would
> be accounted for as shared. Even if the information was accurate, I'm
> struggling to think how the shared/unshared information could be useful
> for debugging OOM conditions. Maybe it was useful before rmap existed when
> reclaiming shared pages was costly but it is less relevant today.
>
> The PFN walk could be optimised a bit but why bother as the information is
> useless. This patch deletes the PFN walker and infers the total RAM, highmem
> and reserved pages count from struct zone. It omits the shared/unshared page
> usage on the grounds that it is useless. It also corrects the reporting
> of HighMem as HighMem/MovableOnly as ZONE_MOVABLE has similar problems to
> HighMem with respect to lowmem/highmem exhaustion.
>
> Signed-off-by: Mel Gorman <[email protected]>

That's ok. I have never used this information in my long history of OOM
debugging.

Acked-by: KOSAKI Motohiro <[email protected]>