2005-12-15 00:14:33

by Christoph Lameter

Subject: [RFC3 00/14] Zoned VM stats

Zone based VM statistics are necessary to determine the state of memory in a
single zone. In a NUMA system this can help with local reclaim and other memory
optimizations that shift VM load to optimize page allocation. It is also
helpful to know how the computing load affects the memory allocations in the
various zones.

The patchset introduces a framework for counters that is a cross between the
existing page_stats --which are simply global counters split per cpu-- and the
approach of deferred incremental updates implemented for nr_pagecache.

Small per cpu 8 bit counters are introduced in struct zone. If counting
exceeds a certain threshold then the counters are accumulated in an array in
the zone of the page and in a global array. This means that access to
VM counter information for a zone and for the whole machine is possible
by simply indexing an array. [Thanks to Nick Piggin for pointing me
at that approach].
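
As a rough illustration of the scheme (a minimal userspace sketch with invented
names, a single zone and a single cpu, no locking or per-cpu machinery): small
signed differentials absorb most updates and are only folded into the zone and
global arrays once they cross a threshold.

#include <stdio.h>

#define STAT_THRESHOLD 32

enum stat_item { NR_EXAMPLE, NR_STAT_ITEMS };

static long vm_stat[NR_STAT_ITEMS];		/* global totals */
static long zone_stat[NR_STAT_ITEMS];		/* per zone totals */
static signed char stat_diff[NR_STAT_ITEMS];	/* per cpu differential */

static void mod_state(enum stat_item item, int delta)
{
	long x = stat_diff[item] + delta;

	if (x > STAT_THRESHOLD || x < -STAT_THRESHOLD) {
		zone_stat[item] += x;	/* fold into the zone array ... */
		vm_stat[item] += x;	/* ... and into the global array */
		x = 0;
	}
	stat_diff[item] = x;
}

int main(void)
{
	int i;

	for (i = 0; i < 100; i++)
		mod_state(NR_EXAMPLE, 1);

	printf("global=%ld zone=%ld pending=%d\n",
		vm_stat[NR_EXAMPLE], zone_stat[NR_EXAMPLE],
		stat_diff[NR_EXAMPLE]);
	return 0;
}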

The new statistics are then used to realize zone reclaim.

Patchset is against 2.6.15-rc5-mm2. The patches after zone reclaim are optional.

This is expanding and I hope it's complete, but I have not tested it on UP and
SMP yet. There may be as yet unforeseen consequences of the changes to the
various counters.


1 Add some consts for inlines in mm.h
2 Basic counter functionality
3 Convert nr_mapped
4 Convert nr_pagecache
5 Resurrect scan_control.may_swap
6 Zone Reclaim
7 Expanded node and zone statistics
8 Convert nr_slab
9 Convert nr_page_table
10 Convert nr_dirty
11 Convert nr_writeback
12 Convert nr_unstable
13 Remove get_page_state functions
14 Remove wbs


2005-12-15 00:14:37

by Christoph Lameter

Subject: [RFC3 02/14] Basic counter functionality

Currently we have various vm counters for the pages in a zone that are split
per cpu. This arrangement does not allow access to the per zone statistics that
are important for optimizing VM behavior on NUMA architectures. All one can tell
from the per cpu differential variables is how much a certain counter was
changed by this cpu; it is not possible to deduce how many pages in each zone
are of a certain type.

This framework implements differential counters for each processor
in struct zone. The differential counters are consolidated when a threshold
is exceeded (as is done in the current implementation for nr_pagecache), when
slab reaping occurs or when a consolidation function is called.
Consolidation uses atomic operations and accumulates counters per zone in
the zone structure and also globally in the vm_stat array. VM functions can
access the counts by simply indexing a global or zone specific array.

The arrangement of counters in an array simplifies processing when output
has to be generated for /proc/*.

Counter updates can be triggered by calling *_zone_page_state or
__*_zone_page_state. The __* variants may be used if it is known that
interrupts are disabled.

Specially optimized increment and decrement functions are provided. These
can avoid certain checks and use increment or decrement instructions that
an architecture may provide.
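
For illustration only, two hypothetical call sites (not part of this patch; a
stat item such as NR_MAPPED is only introduced by a later patch in this
series):

	/* interrupt state unknown: the wrapper saves/restores flags itself */
	inc_zone_page_state(page, NR_MAPPED);

	/* caller already runs with interrupts disabled: the cheaper __ variant */
	__dec_zone_page_state(page, NR_MAPPED);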

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.15-rc5-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/page_alloc.c 2005-12-12 15:07:45.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/page_alloc.c 2005-12-14 14:57:22.000000000 -0800
@@ -596,7 +596,281 @@ static int rmqueue_bulk(struct zone *zon
return i;
}

+/*
+ * Manage combined zone based / global counters
+ */
+#define STAT_THRESHOLD 32
+
+atomic_long_t vm_stat[NR_STAT_ITEMS];
+
+static inline void zone_page_state_consolidate(long x, struct zone *zone, enum zone_stat_item item)
+{
+ atomic_long_add(x, &zone->vm_stat[item]);
+ atomic_long_add(x, &vm_stat[item]);
+}
+
+#ifdef CONFIG_SMP
+/*
+ * Determine pointer to currently valid differential byte given a zone and
+ * the item number.
+ *
+ * Preemption must be off
+ */
+static inline s8 *diff_pointer(struct zone *zone, enum zone_stat_item item)
+{
+ return &zone_pcp(zone, raw_smp_processor_id())->vm_stat_diff[item];
+}
+
+/*
+ * For use when we know that interrupts are disabled.
+ */
+void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item, int delta)
+{
+ s8 *p;
+ long x;
+
+ p = diff_pointer(zone, item);
+ x = delta + *p;
+
+ if (unlikely(x > STAT_THRESHOLD || x < -STAT_THRESHOLD)) {
+ zone_page_state_consolidate(x, zone, item);
+ x = 0;
+ }
+
+ *p = x;
+}
+EXPORT_SYMBOL(__mod_zone_page_state);
+
+/*
+ * For an unknown interrupt state
+ */
+void mod_zone_page_state(struct zone *zone, enum zone_stat_item item, int delta)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ __mod_zone_page_state(zone, item, delta);
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL(mod_zone_page_state);
+
+/*
+ * Optimized increment and decrement functions.
+ *
+ * These are only for a single page and therefore can take a struct page *
+ * argument instead of struct zone *. This allows the inclusion of the code
+ * generated for page_zone(page) into the optimized functions.
+ *
+ * No overflow check is necessary and therefore the differential can be
+ * incremented or decremented in place which may allow the compilers to
+ * generate better code.
+ *
+ * The increment or decrement is known and therefore one boundary check can
+ * be omitted.
+ *
+ * Some processors have inc/dec instructions that are atomic vs an interrupt.
+ * However, the code must first determine the differential location in a zone
+ * based on the processor number and then inc/dec the counter. There is no
+ * guarantee without disabling preemption that the processor will not change
+ * in between and therefore the atomicity vs. interrupt cannot be exploited
+ * in a useful way here.
+ */
+void __inc_zone_page_state(const struct page *page, enum zone_stat_item item)
+{
+ struct zone *zone = page_zone(page);
+ s8 *p = diff_pointer(zone, item);
+
+ (*p)++;
+
+ if (unlikely(*p > STAT_THRESHOLD)) {
+ zone_page_state_consolidate(*p, zone, item);
+ *p = 0;
+ }
+}
+EXPORT_SYMBOL(__inc_zone_page_state);
+
+void __dec_zone_page_state(const struct page *page, enum zone_stat_item item)
+{
+ struct zone *zone = page_zone(page);
+ s8 *p = diff_pointer(zone, item);
+
+ (*p)--;
+
+ if (unlikely(*p < -STAT_THRESHOLD)) {
+ zone_page_state_consolidate(*p, zone, item);
+ *p = 0;
+ }
+}
+EXPORT_SYMBOL(__dec_zone_page_state);
+
+void inc_zone_page_state(const struct page *page, enum zone_stat_item item)
+{
+ unsigned long flags;
+ struct zone *zone;
+ s8 *p;
+
+ local_irq_save(flags);
+ zone = page_zone(page);
+ p = diff_pointer(zone, item);
+
+ (*p)++;
+
+ if (unlikely(*p > STAT_THRESHOLD)) {
+ zone_page_state_consolidate(*p, zone, item);
+ *p = 0;
+ }
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL(inc_zone_page_state);
+
+void dec_zone_page_state(const struct page *page, enum zone_stat_item item)
+{
+ unsigned long flags;
+ struct zone *zone;
+ s8 *p;
+
+ local_irq_save(flags);
+ zone = page_zone(page);
+ p = diff_pointer(zone, item);
+
+ (*p)--;
+
+ if (unlikely(*p < -STAT_THRESHOLD)) {
+ zone_page_state_consolidate(*p, zone, item);
+ *p = 0;
+ }
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL(dec_zone_page_state);
+
+/*
+ * Update the zone counters for one cpu.
+ */
+void refresh_cpu_vm_stats(void)
+{
+ struct zone *zone;
+ int i;
+ unsigned long flags;
+
+ local_irq_save(flags);
+ for_each_zone(zone) {
+ struct per_cpu_pageset *pcp = zone_pcp(zone, raw_smp_processor_id());
+
+ for(i = 0; i < NR_STAT_ITEMS; i++) {
+ int v;
+
+ v = pcp->vm_stat_diff[i];
+ if (v) {
+ pcp->vm_stat_diff[i] = 0;
+ zone_page_state_consolidate(v, zone, i);
+ }
+ }
+ }
+ local_irq_restore(flags);
+}
+
+static void __refresh_cpu_vm_stats(void *dummy)
+{
+ refresh_cpu_vm_stats();
+}
+
+/*
+ * Consolidate all counters.
+ *
+ * Note that the result is more accurate but still not exact, since
+ * concurrent processes can increment/decrement counters
+ * while this function runs.
+ */
+void refresh_vm_stats(void)
+{
+ schedule_on_each_cpu(__refresh_cpu_vm_stats, NULL);
+}
+EXPORT_SYMBOL(refresh_vm_stats);
+
+#else /* CONFIG_SMP */
+
+/*
+ * For use when we know that interrupts are disabled.
+ */
+void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item, int delta)
+{
+ zone_page_state_consolidate(delta, zone, item);
+}
+EXPORT_SYMBOL(__mod_zone_page_state);
+
+/*
+ * For an unknown interrupt state
+ */
+void mod_zone_page_state(struct zone *zone, enum zone_stat_item item, int delta)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ zone_page_state_consolidate(delta, zone, item);
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL(mod_zone_page_state);
+
+void __inc_zone_page_state(const struct page *page, enum zone_stat_item item)
+{
+ struct zone *zone = page_zone(page);
+
+ zone_page_state_consolidate(1, zone, item);
+}
+EXPORT_SYMBOL(__inc_zone_page_state);
+
+void __dec_zone_page_state(const struct page *page, enum zone_stat_item item)
+{
+ struct zone *zone = page_zone(page);
+
+ zone_page_state_consolidate(-1, zone, item);
+}
+EXPORT_SYMBOL(__dec_zone_page_state);
+
+void inc_zone_page_state(const struct page *page, enum zone_stat_item item)
+{
+ unsigned long flags;
+ struct zone *zone;
+
+ local_irq_save(flags);
+ zone = page_zone(page);
+ zone_page_state_consolidate(1, zone, item);
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL(inc_zone_page_state);
+
+void dec_zone_page_state(const struct page *page, enum zone_stat_item item)
+{
+ unsigned long flags;
+ struct zone *zone;
+
+ local_irq_save(flags);
+ zone = page_zone(page);
+ zone_page_state_consolidate(-1, zone, item);
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL(dec_zone_page_state);
+#endif
+
#ifdef CONFIG_NUMA
+/*
+ * Determine the per node value of a stat item. This is done by cycling
+ * through all the zones of a node.
+ */
+unsigned long node_page_state(int node, enum zone_stat_item item)
+{
+ struct zone *zones = NODE_DATA(node)->node_zones;
+ int i;
+ long v = 0;
+
+ for (i = 0; i < MAX_NR_ZONES; i++)
+ v += atomic_long_read(&zones[i].vm_stat[item]);
+ if (v < 0)
+ v = 0;
+ return v;
+}
+EXPORT_SYMBOL(node_page_state);
+
/* Called from the slab reaper to drain remote pagesets */
void drain_remote_pages(void)
{
Index: linux-2.6.15-rc5-mm2/include/linux/page-flags.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/page-flags.h 2005-12-12 09:10:34.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/page-flags.h 2005-12-14 14:45:40.000000000 -0800
@@ -174,6 +174,49 @@ extern void __mod_page_state(unsigned lo
} while (0)

/*
+ * Zone based accounting with per cpu differentials.
+ */
+extern atomic_long_t vm_stat[NR_STAT_ITEMS];
+
+static inline unsigned long global_page_state(enum zone_stat_item item)
+{
+ long x = atomic_long_read(&vm_stat[item]);
+
+ if (x < 0)
+ x = 0;
+ return x;
+}
+
+static inline unsigned long zone_page_state(struct zone *zone, enum zone_stat_item item)
+{
+ long x = atomic_long_read(&zone->vm_stat[item]);
+
+ if (x < 0)
+ x = 0;
+ return x;
+}
+
+#ifdef CONFIG_NUMA
+unsigned long node_page_state(int node, enum zone_stat_item);
+#else
+#define node_page_state(node, item) global_page_state(item)
+#endif
+
+void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item, int delta);
+void __inc_zone_page_state(const struct page *page, enum zone_stat_item item);
+void __dec_zone_page_state(const struct page *page, enum zone_stat_item item);
+
+#define __add_zone_page_state(zone, item, delta) __mod_zone_page_state(zone, item, delta)
+#define __sub_zone_page_state(zone, item, delta) __mod_zone_page_state(zone, item, -(delta))
+
+void mod_zone_page_state(struct zone *zone, enum zone_stat_item item, int delta);
+void inc_zone_page_state(const struct page *page, enum zone_stat_item item);
+void dec_zone_page_state(const struct page *page, enum zone_stat_item item);
+
+#define add_zone_page_state(zone, item, delta) mod_zone_page_state(zone, item, delta)
+#define sub_zone_page_state(zone, item, delta) mod_zone_page_state(zone, item, -(delta))
+
+/*
* Manipulation of page state flags
*/
#define PageLocked(page) \
Index: linux-2.6.15-rc5-mm2/include/linux/gfp.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/gfp.h 2005-12-12 09:10:34.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/gfp.h 2005-12-14 14:49:19.000000000 -0800
@@ -160,4 +160,12 @@ void drain_remote_pages(void);
static inline void drain_remote_pages(void) { };
#endif

+#ifdef CONFIG_SMP
+void refresh_cpu_vm_stats(void);
+void refresh_vm_stats(void);
+#else
+static inline void refresh_cpu_vm_stats(void) { };
+static inline void refresh_vm_stats(void) { };
+#endif
+
#endif /* __LINUX_GFP_H */
Index: linux-2.6.15-rc5-mm2/mm/slab.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/slab.c 2005-12-12 09:10:34.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/slab.c 2005-12-14 14:45:40.000000000 -0800
@@ -3423,6 +3423,7 @@ static void cache_reap(void *unused)
check_irq_on();
up(&cache_chain_sem);
drain_remote_pages();
+ refresh_cpu_vm_stats();
/* Setup the next iteration */
schedule_delayed_work(&__get_cpu_var(reap_work), REAPTIMEOUT_CPUC);
}
Index: linux-2.6.15-rc5-mm2/include/linux/mmzone.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/mmzone.h 2005-12-12 09:10:34.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/mmzone.h 2005-12-14 14:46:34.000000000 -0800
@@ -44,6 +44,9 @@ struct zone_padding {
#define ZONE_PADDING(name)
#endif

+enum zone_stat_item { };
+#define NR_STAT_ITEMS 0
+
struct per_cpu_pages {
int count; /* number of pages in the list */
int high; /* high watermark, emptying needed */
@@ -53,6 +56,10 @@ struct per_cpu_pages {

struct per_cpu_pageset {
struct per_cpu_pages pcp[2]; /* 0: hot. 1: cold */
+#ifdef CONFIG_SMP
+ s8 vm_stat_diff[NR_STAT_ITEMS];
+#endif
+
#ifdef CONFIG_NUMA
unsigned long numa_hit; /* allocated in intended node */
unsigned long numa_miss; /* allocated in non intended node */
@@ -149,6 +156,8 @@ struct zone {
unsigned long pages_scanned; /* since last reclaim */
int all_unreclaimable; /* All pages pinned */

+ /* Zone statistics */
+ atomic_long_t vm_stat[NR_STAT_ITEMS];
/*
* Does the allocator try to reclaim pages from the zone as soon
* as it fails a watermark_ok() in __alloc_pages?

2005-12-15 00:14:57

by Christoph Lameter

Subject: [RFC3 05/14] Resurrect scan_control.may_swap

Resurrect may_swap in struct scan_control

Undo the patch that removed may_swap from mm.

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.15-rc5-mm2/mm/vmscan.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/vmscan.c 2005-12-14 14:57:29.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/vmscan.c 2005-12-14 15:24:19.000000000 -0800
@@ -71,6 +71,9 @@ struct scan_control {

int may_writepage;

+ /* Can pages be swapped as part of reclaim? */
+ int may_swap;
+
/* This context's SWAP_CLUSTER_MAX. If freeing memory for
* suspend, we effectively ignore SWAP_CLUSTER_MAX.
* In this context, it doesn't matter that we scan the
@@ -458,6 +461,8 @@ static int shrink_list(struct list_head
* Try to allocate it some swap space here.
*/
if (PageAnon(page) && !PageSwapCache(page)) {
+ if (!sc->may_swap)
+ goto keep_locked;
if (!add_to_swap(page, GFP_ATOMIC))
goto activate_locked;
}
@@ -1415,6 +1420,7 @@ int try_to_free_pages(struct zone **zone

sc.gfp_mask = gfp_mask;
sc.may_writepage = 0;
+ sc.may_swap = 1;

inc_page_state(allocstall);

@@ -1517,6 +1523,7 @@ loop_again:
total_reclaimed = 0;
sc.gfp_mask = GFP_KERNEL;
sc.may_writepage = 0;
+ sc.may_swap = 1;
sc.nr_mapped = global_page_state(NR_MAPPED);

inc_page_state(pageoutrun);

2005-12-15 00:15:27

by Christoph Lameter

Subject: [RFC3 03/14] Convert nr_mapped

Make nr_mapped a per zone counter

nr_mapped is important because it allows a determination of how many pages in a
zone are not mapped, which allows a more efficient means of determining
when we need to reclaim memory in a zone.
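
A hedged sketch of the kind of per zone decision this enables (hypothetical
helper; NR_PAGECACHE comes from the nr_pagecache conversion and the real check
lives in the zone reclaim patch): local reclaim is only promising if a zone
has pagecache beyond what is mapped.

static int zone_has_unmapped_pagecache(struct zone *zone, unsigned long nr_pages)
{
	/* pagecache minus mapped approximates what reclaim could free locally */
	return zone_page_state(zone, NR_PAGECACHE) >
		zone_page_state(zone, NR_MAPPED) + nr_pages;
}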

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.15-rc5-mm2/include/linux/page-flags.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/page-flags.h 2005-12-14 14:45:40.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/page-flags.h 2005-12-14 14:57:29.000000000 -0800
@@ -86,7 +86,6 @@ struct page_state {
unsigned long nr_writeback; /* Pages under writeback */
unsigned long nr_unstable; /* NFS unstable pages */
unsigned long nr_page_table_pages;/* Pages used for pagetables */
- unsigned long nr_mapped; /* mapped into pagetables */
unsigned long nr_slab; /* In slab */
#define GET_PAGE_STATE_LAST nr_slab

Index: linux-2.6.15-rc5-mm2/drivers/base/node.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/drivers/base/node.c 2005-12-03 21:10:42.000000000 -0800
+++ linux-2.6.15-rc5-mm2/drivers/base/node.c 2005-12-14 14:57:29.000000000 -0800
@@ -43,18 +43,18 @@ static ssize_t node_read_meminfo(struct
unsigned long inactive;
unsigned long active;
unsigned long free;
+ unsigned long nr_mapped;

si_meminfo_node(&i, nid);
get_page_state_node(&ps, nid);
__get_zone_counts(&active, &inactive, &free, NODE_DATA(nid));
+ nr_mapped = node_page_state(nid, NR_MAPPED);

/* Check for negative values in these approximate counters */
if ((long)ps.nr_dirty < 0)
ps.nr_dirty = 0;
if ((long)ps.nr_writeback < 0)
ps.nr_writeback = 0;
- if ((long)ps.nr_mapped < 0)
- ps.nr_mapped = 0;
if ((long)ps.nr_slab < 0)
ps.nr_slab = 0;

@@ -83,7 +83,7 @@ static ssize_t node_read_meminfo(struct
nid, K(i.freeram - i.freehigh),
nid, K(ps.nr_dirty),
nid, K(ps.nr_writeback),
- nid, K(ps.nr_mapped),
+ nid, K(nr_mapped),
nid, K(ps.nr_slab));
n += hugetlb_report_node_meminfo(nid, buf + n);
return n;
Index: linux-2.6.15-rc5-mm2/fs/proc/proc_misc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/fs/proc/proc_misc.c 2005-12-12 09:10:33.000000000 -0800
+++ linux-2.6.15-rc5-mm2/fs/proc/proc_misc.c 2005-12-14 14:57:29.000000000 -0800
@@ -190,7 +190,7 @@ static int meminfo_read_proc(char *page,
K(i.freeswap),
K(ps.nr_dirty),
K(ps.nr_writeback),
- K(ps.nr_mapped),
+ K(global_page_state(NR_MAPPED)),
K(ps.nr_slab),
K(allowed),
K(committed),
Index: linux-2.6.15-rc5-mm2/mm/vmscan.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/vmscan.c 2005-12-13 20:41:05.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/vmscan.c 2005-12-14 14:57:29.000000000 -0800
@@ -1429,7 +1429,7 @@ int try_to_free_pages(struct zone **zone
}

for (priority = DEF_PRIORITY; priority >= 0; priority--) {
- sc.nr_mapped = read_page_state(nr_mapped);
+ sc.nr_mapped = global_page_state(NR_MAPPED);
sc.nr_scanned = 0;
sc.nr_reclaimed = 0;
sc.priority = priority;
@@ -1517,7 +1517,7 @@ loop_again:
total_reclaimed = 0;
sc.gfp_mask = GFP_KERNEL;
sc.may_writepage = 0;
- sc.nr_mapped = read_page_state(nr_mapped);
+ sc.nr_mapped = global_page_state(NR_MAPPED);

inc_page_state(pageoutrun);

Index: linux-2.6.15-rc5-mm2/mm/page-writeback.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/page-writeback.c 2005-12-12 09:10:34.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/page-writeback.c 2005-12-14 14:57:29.000000000 -0800
@@ -111,7 +111,7 @@ static void get_writeback_state(struct w
{
wbs->nr_dirty = read_page_state(nr_dirty);
wbs->nr_unstable = read_page_state(nr_unstable);
- wbs->nr_mapped = read_page_state(nr_mapped);
+ wbs->nr_mapped = global_page_state(NR_MAPPED);
wbs->nr_writeback = read_page_state(nr_writeback);
}

Index: linux-2.6.15-rc5-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/page_alloc.c 2005-12-14 14:57:22.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/page_alloc.c 2005-12-14 14:57:29.000000000 -0800
@@ -1784,7 +1784,7 @@ void show_free_areas(void)
ps.nr_unstable,
nr_free_pages(),
ps.nr_slab,
- ps.nr_mapped,
+ global_page_state(NR_MAPPED),
ps.nr_page_table_pages);

for_each_zone(zone) {
Index: linux-2.6.15-rc5-mm2/mm/rmap.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/rmap.c 2005-12-14 10:54:05.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/rmap.c 2005-12-14 14:57:29.000000000 -0800
@@ -473,7 +473,7 @@ static void __page_set_anon_rmap(struct

page->index = linear_page_index(vma, address);

- inc_page_state(nr_mapped);
+ inc_zone_page_state(page, NR_MAPPED);
}

/**
@@ -520,7 +520,7 @@ void page_add_file_rmap(struct page *pag
BUG_ON(!pfn_valid(page_to_pfn(page)));

if (atomic_inc_and_test(&page->_mapcount))
- inc_page_state(nr_mapped);
+ inc_zone_page_state(page, NR_MAPPED);
}

/**
@@ -544,7 +544,7 @@ void page_remove_rmap(struct page *page)
*/
if (page_test_and_clear_dirty(page))
set_page_dirty(page);
- dec_page_state(nr_mapped);
+ dec_zone_page_state(page, NR_MAPPED);
}
}

Index: linux-2.6.15-rc5-mm2/mm/swap_prefetch.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/swap_prefetch.c 2005-12-12 09:10:34.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/swap_prefetch.c 2005-12-14 14:57:29.000000000 -0800
@@ -327,7 +327,7 @@ static int prefetch_suitable(void)
* >2/3 of the ram is mapped or swapcache, we need some free for
* pagecache
*/
- limit = ps.nr_mapped + ps.nr_slab + pending_writes +
+ limit = global_page_state(NR_MAPPED) + ps.nr_slab + pending_writes +
total_swapcache_pages;
if (limit > mapped_limit)
goto out;
Index: linux-2.6.15-rc5-mm2/include/linux/mmzone.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/mmzone.h 2005-12-14 14:46:34.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/mmzone.h 2005-12-14 14:57:29.000000000 -0800
@@ -44,8 +44,8 @@ struct zone_padding {
#define ZONE_PADDING(name)
#endif

-enum zone_stat_item { };
-#define NR_STAT_ITEMS 0
+enum zone_stat_item { NR_MAPPED };
+#define NR_STAT_ITEMS 1

struct per_cpu_pages {
int count; /* number of pages in the list */

2005-12-15 00:15:51

by Christoph Lameter

Subject: [RFC3 06/14] Zone Reclaim

Zone reclaim allows the reclaiming of pages from a zone if the number of free
pages falls below the watermark even if other zones still have enough pages
available. Zone reclaim is of particular importance for NUMA machines. It can
be more beneficial to reclaim a page than to take the performance penalties
that come with allocating a page from a remote zone.

Zone reclaim is enabled if the maximum distance to another node is higher
than RECLAIM_DISTANCE, which may be defined by an arch. By default
RECLAIM_DISTANCE is 20, i.e. the distance to another node within the
same component (enclosure or motherboard).
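
An arch can raise the threshold by defining RECLAIM_DISTANCE in its
asm/topology.h before the generic fallback below is pulled in; a hypothetical
override (the value 30 is invented for illustration):

/* hypothetical arch override; the generic default added below stays at 20 */
#define RECLAIM_DISTANCE 30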

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.15-rc5-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/page_alloc.c 2005-12-14 14:57:33.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/page_alloc.c 2005-12-14 15:24:22.000000000 -0800
@@ -1186,7 +1186,9 @@ get_page_from_freelist(gfp_t gfp_mask, u
mark = (*z)->pages_high;
if (!zone_watermark_ok(*z, order, mark,
classzone_idx, alloc_flags))
- continue;
+ if (!zone_reclaim_mode ||
+ !zone_reclaim(*z, gfp_mask, order))
+ continue;
}

page = buffered_rmqueue(*z, order, gfp_mask);
@@ -1957,13 +1959,22 @@ static void __init build_zonelists(pg_da
prev_node = local_node;
nodes_clear(used_mask);
while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
+ int distance = node_distance(local_node, node);
+
+ /*
+ * If another node is sufficiently far away then it is better
+ * to reclaim pages in a zone before going off node.
+ */
+ if (distance > RECLAIM_DISTANCE)
+ zone_reclaim_mode = 1;
+
/*
* We don't want to pressure a particular node.
* So adding penalty to the first node in same
* distance group to make it round-robin.
*/
- if (node_distance(local_node, node) !=
- node_distance(local_node, prev_node))
+
+ if (distance != node_distance(local_node, prev_node))
node_load[node] += load;
prev_node = node;
load--;
Index: linux-2.6.15-rc5-mm2/include/linux/swap.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/swap.h 2005-12-13 20:41:05.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/swap.h 2005-12-14 15:24:22.000000000 -0800
@@ -172,6 +172,17 @@ extern void swap_setup(void);

/* linux/mm/vmscan.c */
extern int try_to_free_pages(struct zone **, gfp_t);
+#ifdef CONFIG_NUMA
+extern int zone_reclaim_mode;
+extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
+#else
+#define zone_reclaim_mode 0
+static inline int zone_reclaim(struct zone *z, gfp_t mask,
+ unsigned int order)
+{
+ return 0;
+}
+#endif
extern int shrink_all_memory(int);
extern int vm_swappiness;

Index: linux-2.6.15-rc5-mm2/include/linux/topology.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/topology.h 2005-12-12 09:10:34.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/topology.h 2005-12-14 15:24:22.000000000 -0800
@@ -56,6 +56,9 @@
#define REMOTE_DISTANCE 20
#define node_distance(from,to) ((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
#endif
+#ifndef RECLAIM_DISTANCE
+#define RECLAIM_DISTANCE 20
+#endif
#ifndef PENALTY_FOR_NODE_WITH_CPUS
#define PENALTY_FOR_NODE_WITH_CPUS (1)
#endif
Index: linux-2.6.15-rc5-mm2/mm/vmscan.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/vmscan.c 2005-12-14 15:24:19.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/vmscan.c 2005-12-14 15:24:43.000000000 -0800
@@ -1823,3 +1823,60 @@ static int __init kswapd_init(void)
}

module_init(kswapd_init)
+
+#ifdef CONFIG_NUMA
+/*
+ * Zone reclaim mode
+ *
+ * If non-zero call zone_reclaim when the number of free pages falls below
+ * the watermarks.
+ */
+int zone_reclaim_mode __read_mostly;
+
+/*
+ * Try to free up some pages from this zone through reclaim.
+ */
+int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
+{
+ struct scan_control sc;
+ int nr_pages = 1 << order;
+ struct task_struct *p = current;
+ struct reclaim_state reclaim_state;
+
+ if (!(gfp_mask & __GFP_WAIT) ||
+ zone->zone_pgdat->node_id != numa_node_id() ||
+ zone->all_unreclaimable ||
+ atomic_read(&zone->reclaim_in_progress) > 0)
+ return 0;
+
+ /*
+ * Check if there is a reasonable amount of recoverable memory before
+ * doing the scan.
+ */
+ if (zone_page_state(zone, NR_PAGECACHE) <=
+ zone_page_state(zone, NR_MAPPED) + nr_pages)
+ return 0;
+
+ sc.gfp_mask = gfp_mask;
+ sc.may_writepage = 0;
+ sc.may_swap = 0;
+ sc.nr_mapped = global_page_state(NR_MAPPED);
+ sc.nr_scanned = 0;
+ sc.nr_reclaimed = 0;
+ sc.priority = 0;
+ disable_swap_token();
+
+ sc.swap_cluster_max = max(nr_pages, SWAP_CLUSTER_MAX);
+
+ cond_resched();
+ p->flags |= PF_MEMALLOC;
+ reclaim_state.reclaimed_slab = 0;
+ p->reclaim_state = &reclaim_state;
+ shrink_zone(zone, &sc);
+ p->reclaim_state = NULL;
+ current->flags &= ~PF_MEMALLOC;
+ cond_resched();
+ return sc.nr_reclaimed >= (1 << order);
+}
+#endif
+

2005-12-15 00:15:52

by Christoph Lameter

Subject: [RFC3 12/14] Convert nr_unstable

Per zone unstable pages

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.15-rc5-mm2/mm/swap_prefetch.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/swap_prefetch.c 2005-12-14 15:35:38.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/swap_prefetch.c 2005-12-14 15:37:43.000000000 -0800
@@ -319,7 +319,7 @@ static int prefetch_suitable(void)
goto out;

/* Delay prefetching if we have significant amounts of dirty data */
- pending_writes = global_page_state(NR_DIRTY) + ps.nr_unstable;
+ pending_writes = global_page_state(NR_DIRTY) + global_page_state(NR_UNSTABLE);
if (pending_writes > SWAP_CLUSTER_MAX)
goto out;

Index: linux-2.6.15-rc5-mm2/fs/fs-writeback.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/fs/fs-writeback.c 2005-12-14 15:34:57.000000000 -0800
+++ linux-2.6.15-rc5-mm2/fs/fs-writeback.c 2005-12-14 15:37:43.000000000 -0800
@@ -471,7 +471,7 @@ void sync_inodes_sb(struct super_block *
.sync_mode = wait ? WB_SYNC_ALL : WB_SYNC_HOLD,
};
unsigned long nr_dirty = global_page_state(NR_DIRTY);
- unsigned long nr_unstable = read_page_state(nr_unstable);
+ unsigned long nr_unstable = global_page_state(NR_UNSTABLE);

wbc.nr_to_write = nr_dirty + nr_unstable +
(inodes_stat.nr_inodes - inodes_stat.nr_unused) +
Index: linux-2.6.15-rc5-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/page_alloc.c 2005-12-14 15:37:34.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/page_alloc.c 2005-12-14 15:37:54.000000000 -0800
@@ -597,7 +597,8 @@ static int rmqueue_bulk(struct zone *zon
}

char *stat_item_descr[NR_STAT_ITEMS] = {
- "mapped","pagecache", "slab", "pagetable", "dirty", "writeback"
+ "mapped","pagecache", "slab", "pagetable", "dirty", "writeback",
+ "unstable"
};

/*
@@ -1781,7 +1782,7 @@ void show_free_areas(void)
inactive,
global_page_state(NR_DIRTY),
global_page_state(NR_WRITEBACK),
- ps.nr_unstable,
+ global_page_state(NR_UNSTABLE),
nr_free_pages(),
global_page_state(NR_SLAB),
global_page_state(NR_MAPPED),
Index: linux-2.6.15-rc5-mm2/fs/nfs/write.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/fs/nfs/write.c 2005-12-14 15:34:57.000000000 -0800
+++ linux-2.6.15-rc5-mm2/fs/nfs/write.c 2005-12-14 15:37:43.000000000 -0800
@@ -474,7 +474,7 @@ nfs_mark_request_commit(struct nfs_page
nfs_list_add_request(req, &nfsi->commit);
nfsi->ncommit++;
spin_unlock(&nfsi->req_lock);
- inc_page_state(nr_unstable);
+ inc_zone_page_state(req->wb_page, NR_UNSTABLE);
mark_inode_dirty(inode);
}
#endif
@@ -1272,7 +1272,6 @@ void nfs_commit_done(struct rpc_task *ta
{
struct nfs_write_data *data = calldata;
struct nfs_page *req;
- int res = 0;

dprintk("NFS: %4d nfs_commit_done (status %d)\n",
task->tk_pid, task->tk_status);
@@ -1306,9 +1305,8 @@ void nfs_commit_done(struct rpc_task *ta
nfs_mark_request_dirty(req);
next:
nfs_clear_page_writeback(req);
- res++;
+ dec_zone_page_state(req->wb_page, NR_UNSTABLE);
}
- sub_page_state(nr_unstable,res);
}
#endif

Index: linux-2.6.15-rc5-mm2/include/linux/page-flags.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/page-flags.h 2005-12-14 15:35:38.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/page-flags.h 2005-12-14 15:37:43.000000000 -0800
@@ -82,8 +82,7 @@
* allowed.
*/
struct page_state {
- unsigned long nr_unstable; /* NFS unstable pages */
-#define GET_PAGE_STATE_LAST nr_unstable
+#define GET_PAGE_STATE_LAST xxx

/*
* The below are zeroed by get_page_state(). Use get_full_page_state()
Index: linux-2.6.15-rc5-mm2/mm/page-writeback.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/page-writeback.c 2005-12-14 15:35:38.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/page-writeback.c 2005-12-14 15:37:43.000000000 -0800
@@ -110,7 +110,7 @@ struct writeback_state
static void get_writeback_state(struct writeback_state *wbs)
{
wbs->nr_dirty = global_page_state(NR_DIRTY);
- wbs->nr_unstable = read_page_state(nr_unstable);
+ wbs->nr_unstable = global_page_state(NR_UNSTABLE);
wbs->nr_mapped = global_page_state(NR_MAPPED);
wbs->nr_writeback = global_page_state(NR_WRITEBACK);
}
Index: linux-2.6.15-rc5-mm2/include/linux/mmzone.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/mmzone.h 2005-12-14 15:35:38.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/mmzone.h 2005-12-14 15:37:43.000000000 -0800
@@ -44,8 +44,8 @@ struct zone_padding {
#define ZONE_PADDING(name)
#endif

-enum zone_stat_item { NR_MAPPED, NR_PAGECACHE, NR_SLAB, NR_PAGETABLE, NR_DIRTY, NR_WRITEBACK };
-#define NR_STAT_ITEMS 6
+enum zone_stat_item { NR_MAPPED, NR_PAGECACHE, NR_SLAB, NR_PAGETABLE, NR_DIRTY, NR_WRITEBACK, NR_UNSTABLE };
+#define NR_STAT_ITEMS 7

struct per_cpu_pages {
int count; /* number of pages in the list */

2005-12-15 00:16:36

by Christoph Lameter

Subject: [RFC3 07/14] Expanded node and zone statistics

Extend zone, node and global statistics by printing all counters from the stats
array.
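
With this applied, /proc/zoneinfo gains one line per counter after the
existing per zone summary; an illustrative excerpt (numbers invented, names
taken from stat_item_descr):

 mapped   10032
 pagecache 48211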

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.15-rc5-mm2/drivers/base/node.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/drivers/base/node.c 2005-12-14 14:57:29.000000000 -0800
+++ linux-2.6.15-rc5-mm2/drivers/base/node.c 2005-12-14 15:28:34.000000000 -0800
@@ -43,12 +43,14 @@ static ssize_t node_read_meminfo(struct
unsigned long inactive;
unsigned long active;
unsigned long free;
- unsigned long nr_mapped;
+ int j;
+ unsigned long nr[NR_STAT_ITEMS];

si_meminfo_node(&i, nid);
get_page_state_node(&ps, nid);
__get_zone_counts(&active, &inactive, &free, NODE_DATA(nid));
- nr_mapped = node_page_state(nid, NR_MAPPED);
+ for (j = 0; j < NR_STAT_ITEMS; j++)
+ nr[j] = node_page_state(nid, j);

/* Check for negative values in these approximate counters */
if ((long)ps.nr_dirty < 0)
@@ -71,6 +73,7 @@ static ssize_t node_read_meminfo(struct
"Node %d Dirty: %8lu kB\n"
"Node %d Writeback: %8lu kB\n"
"Node %d Mapped: %8lu kB\n"
+ "Node %d Pagecache: %8lu kB\n"
"Node %d Slab: %8lu kB\n",
nid, K(i.totalram),
nid, K(i.freeram),
@@ -83,7 +86,8 @@ static ssize_t node_read_meminfo(struct
nid, K(i.freeram - i.freehigh),
nid, K(ps.nr_dirty),
nid, K(ps.nr_writeback),
- nid, K(nr_mapped),
+ nid, K(nr[NR_MAPPED]),
+ nid, K(nr[NR_PAGECACHE]),
nid, K(ps.nr_slab));
n += hugetlb_report_node_meminfo(nid, buf + n);
return n;
Index: linux-2.6.15-rc5-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/page_alloc.c 2005-12-14 15:27:43.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/page_alloc.c 2005-12-14 15:28:34.000000000 -0800
@@ -596,6 +596,8 @@ static int rmqueue_bulk(struct zone *zon
return i;
}

+char *stat_item_descr[NR_STAT_ITEMS] = { "mapped","pagecache" };
+
/*
* Manage combined zone based / global counters
*/
@@ -2602,6 +2604,11 @@ static int zoneinfo_show(struct seq_file
zone->nr_scan_active, zone->nr_scan_inactive,
zone->spanned_pages,
zone->present_pages);
+ for(i = 0; i < NR_STAT_ITEMS; i++)
+ seq_printf(m, "\n %-8s %lu",
+ stat_item_descr[i],
+ zone_page_state(zone, i));
+
seq_printf(m,
"\n protection: (%lu",
zone->lowmem_reserve[0]);

2005-12-15 00:16:47

by Christoph Lameter

Subject: [RFC3 14/14] Remove wbs

Remove writeback state

We can now remove some functions that were only needed to calculate the
writeback state.

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.15-rc5-mm2/mm/page-writeback.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/page-writeback.c 2005-12-14 15:37:43.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/page-writeback.c 2005-12-14 15:40:58.000000000 -0800
@@ -99,22 +99,6 @@ EXPORT_SYMBOL(laptop_mode);

static void background_writeout(unsigned long _min_pages);

-struct writeback_state
-{
- unsigned long nr_dirty;
- unsigned long nr_unstable;
- unsigned long nr_mapped;
- unsigned long nr_writeback;
-};
-
-static void get_writeback_state(struct writeback_state *wbs)
-{
- wbs->nr_dirty = global_page_state(NR_DIRTY);
- wbs->nr_unstable = global_page_state(NR_UNSTABLE);
- wbs->nr_mapped = global_page_state(NR_MAPPED);
- wbs->nr_writeback = global_page_state(NR_WRITEBACK);
-}
-
/*
* Work out the current dirty-memory clamping and background writeout
* thresholds.
@@ -133,8 +117,7 @@ static void get_writeback_state(struct w
* clamping level.
*/
static void
-get_dirty_limits(struct writeback_state *wbs, long *pbackground, long *pdirty,
- struct address_space *mapping)
+get_dirty_limits(long *pbackground, long *pdirty, struct address_space *mapping)
{
int background_ratio; /* Percentages */
int dirty_ratio;
@@ -144,8 +127,6 @@ get_dirty_limits(struct writeback_state
unsigned long available_memory = total_pages;
struct task_struct *tsk;

- get_writeback_state(wbs);
-
#ifdef CONFIG_HIGHMEM
/*
* If this mapping can only allocate from low memory,
@@ -156,7 +137,7 @@ get_dirty_limits(struct writeback_state
#endif


- unmapped_ratio = 100 - (wbs->nr_mapped * 100) / total_pages;
+ unmapped_ratio = 100 - (global_page_state(NR_MAPPED) * 100) / total_pages;

dirty_ratio = vm_dirty_ratio;
if (dirty_ratio > unmapped_ratio / 2)
@@ -189,7 +170,6 @@ get_dirty_limits(struct writeback_state
*/
static void balance_dirty_pages(struct address_space *mapping)
{
- struct writeback_state wbs;
long nr_reclaimable;
long background_thresh;
long dirty_thresh;
@@ -206,10 +186,9 @@ static void balance_dirty_pages(struct a
.nr_to_write = write_chunk,
};

- get_dirty_limits(&wbs, &background_thresh,
- &dirty_thresh, mapping);
- nr_reclaimable = wbs.nr_dirty + wbs.nr_unstable;
- if (nr_reclaimable + wbs.nr_writeback <= dirty_thresh)
+ get_dirty_limits(&background_thresh, &dirty_thresh, mapping);
+ nr_reclaimable = global_page_state(NR_DIRTY) + global_page_state(NR_UNSTABLE);
+ if (nr_reclaimable + global_page_state(NR_WRITEBACK) <= dirty_thresh)
break;

dirty_exceeded = 1;
@@ -222,10 +201,9 @@ static void balance_dirty_pages(struct a
*/
if (nr_reclaimable) {
writeback_inodes(&wbc);
- get_dirty_limits(&wbs, &background_thresh,
- &dirty_thresh, mapping);
- nr_reclaimable = wbs.nr_dirty + wbs.nr_unstable;
- if (nr_reclaimable + wbs.nr_writeback <= dirty_thresh)
+ get_dirty_limits(&background_thresh, &dirty_thresh, mapping);
+ nr_reclaimable = global_page_state(NR_DIRTY) + global_page_state(NR_UNSTABLE);
+ if (nr_reclaimable + global_page_state(NR_WRITEBACK) <= dirty_thresh)
break;
pages_written += write_chunk - wbc.nr_to_write;
if (pages_written >= write_chunk)
@@ -234,7 +212,7 @@ static void balance_dirty_pages(struct a
blk_congestion_wait(WRITE, HZ/10);
}

- if (nr_reclaimable + wbs.nr_writeback <= dirty_thresh)
+ if (nr_reclaimable + global_page_state(NR_WRITEBACK) <= dirty_thresh)
dirty_exceeded = 0;

if (writeback_in_progress(bdi))
@@ -291,12 +269,11 @@ EXPORT_SYMBOL(balance_dirty_pages_rateli

void throttle_vm_writeout(void)
{
- struct writeback_state wbs;
long background_thresh;
long dirty_thresh;

for ( ; ; ) {
- get_dirty_limits(&wbs, &background_thresh, &dirty_thresh, NULL);
+ get_dirty_limits(&background_thresh, &dirty_thresh, NULL);

/*
* Boost the allowable dirty threshold a bit for page
@@ -304,7 +281,7 @@ void throttle_vm_writeout(void)
*/
dirty_thresh += dirty_thresh / 10; /* wheeee... */

- if (wbs.nr_unstable + wbs.nr_writeback <= dirty_thresh)
+ if (global_page_state(NR_UNSTABLE) + global_page_state(NR_WRITEBACK) <= dirty_thresh)
break;
blk_congestion_wait(WRITE, HZ/10);
}
@@ -327,12 +304,11 @@ static void background_writeout(unsigned
};

for ( ; ; ) {
- struct writeback_state wbs;
long background_thresh;
long dirty_thresh;

- get_dirty_limits(&wbs, &background_thresh, &dirty_thresh, NULL);
- if (wbs.nr_dirty + wbs.nr_unstable < background_thresh
+ get_dirty_limits(&background_thresh, &dirty_thresh, NULL);
+ if (global_page_state(NR_DIRTY) + global_page_state(NR_UNSTABLE) < background_thresh
&& min_pages <= 0)
break;
wbc.encountered_congestion = 0;
@@ -356,12 +332,8 @@ static void background_writeout(unsigned
*/
int wakeup_pdflush(long nr_pages)
{
- if (nr_pages == 0) {
- struct writeback_state wbs;
-
- get_writeback_state(&wbs);
- nr_pages = wbs.nr_dirty + wbs.nr_unstable;
- }
+ if (nr_pages == 0)
+ nr_pages = global_page_state(NR_DIRTY) + global_page_state(NR_UNSTABLE);
return pdflush_operation(background_writeout, nr_pages);
}

@@ -392,7 +364,6 @@ static void wb_kupdate(unsigned long arg
unsigned long start_jif;
unsigned long next_jif;
long nr_to_write;
- struct writeback_state wbs;
struct writeback_control wbc = {
.bdi = NULL,
.sync_mode = WB_SYNC_NONE,
@@ -404,11 +375,10 @@ static void wb_kupdate(unsigned long arg

sync_supers();

- get_writeback_state(&wbs);
oldest_jif = jiffies - (dirty_expire_centisecs * HZ) / 100;
start_jif = jiffies;
next_jif = start_jif + (dirty_writeback_centisecs * HZ) / 100;
- nr_to_write = wbs.nr_dirty + wbs.nr_unstable +
+ nr_to_write = global_page_state(NR_DIRTY) + global_page_state(NR_UNSTABLE) +
(inodes_stat.nr_inodes - inodes_stat.nr_unused);
while (nr_to_write > 0) {
wbc.encountered_congestion = 0;

2005-12-15 00:17:38

by Christoph Lameter

Subject: [RFC3 13/14] Remove get_page_state functions

Remove obsolete page_state functions

We can remove all the get_page_state related functions after all the basic
page state variables have been moved to the zone based scheme.

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.15-rc5-mm2/include/linux/page-flags.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/page-flags.h 2005-12-14 15:37:43.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/page-flags.h 2005-12-14 15:39:22.000000000 -0800
@@ -82,8 +82,6 @@
* allowed.
*/
struct page_state {
-#define GET_PAGE_STATE_LAST xxx
-
/*
* The below are zeroed by get_page_state(). Use get_full_page_state()
* to add up all these.
@@ -136,8 +134,6 @@ struct page_state {
unsigned long nr_bounce; /* pages for bounce buffers */
};

-extern void get_page_state(struct page_state *ret);
-extern void get_page_state_node(struct page_state *ret, int node);
extern void get_full_page_state(struct page_state *ret);
extern unsigned long __read_page_state(unsigned long offset);
extern void __mod_page_state(unsigned long offset, unsigned long delta);
Index: linux-2.6.15-rc5-mm2/drivers/base/node.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/drivers/base/node.c 2005-12-14 15:35:38.000000000 -0800
+++ linux-2.6.15-rc5-mm2/drivers/base/node.c 2005-12-14 15:39:22.000000000 -0800
@@ -39,7 +39,6 @@ static ssize_t node_read_meminfo(struct
int n;
int nid = dev->id;
struct sysinfo i;
- struct page_state ps;
unsigned long inactive;
unsigned long active;
unsigned long free;
@@ -47,7 +46,6 @@ static ssize_t node_read_meminfo(struct
unsigned long nr[NR_STAT_ITEMS];

si_meminfo_node(&i, nid);
- get_page_state_node(&ps, nid);
__get_zone_counts(&active, &inactive, &free, NODE_DATA(nid));
for (j = 0; j < NR_STAT_ITEMS; j++)
nr[j] = node_page_state(nid, j);
Index: linux-2.6.15-rc5-mm2/arch/i386/mm/pgtable.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/arch/i386/mm/pgtable.c 2005-12-14 15:35:38.000000000 -0800
+++ linux-2.6.15-rc5-mm2/arch/i386/mm/pgtable.c 2005-12-14 15:39:22.000000000 -0800
@@ -30,7 +30,6 @@ void show_mem(void)
struct page *page;
pg_data_t *pgdat;
unsigned long i;
- struct page_state ps;
unsigned long flags;

printk(KERN_INFO "Mem-info:\n");
@@ -58,7 +57,6 @@ void show_mem(void)
printk(KERN_INFO "%d pages shared\n", shared);
printk(KERN_INFO "%d pages swap cached\n", cached);

- get_page_state(&ps);
printk(KERN_INFO "%lu pages dirty\n", global_page_state(NR_DIRTY));
printk(KERN_INFO "%lu pages writeback\n", global_page_state(NR_WRITEBACK));
printk(KERN_INFO "%lu pages mapped\n", ps.nr_mapped);
Index: linux-2.6.15-rc5-mm2/mm/swap_prefetch.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/swap_prefetch.c 2005-12-14 15:37:43.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/swap_prefetch.c 2005-12-14 15:39:22.000000000 -0800
@@ -274,7 +274,6 @@ static inline unsigned long prefetch_pag
*/
static int prefetch_suitable(void)
{
- struct page_state ps;
unsigned long pending_writes, limit;
struct zone *z;
int ret = 0;
@@ -312,8 +311,6 @@ static int prefetch_suitable(void)
} else
last_free = temp_free;

- get_page_state(&ps);
-
/* We shouldn't prefetch when we are doing writeback */
if (global_page_state(NR_WRITEBACK))
goto out;
Index: linux-2.6.15-rc5-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/page_alloc.c 2005-12-14 15:37:54.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/page_alloc.c 2005-12-14 15:39:22.000000000 -0800
@@ -1608,28 +1608,6 @@ static void __get_page_state(struct page
}
}

-void get_page_state_node(struct page_state *ret, int node)
-{
- int nr;
- cpumask_t mask = node_to_cpumask(node);
-
- nr = offsetof(struct page_state, GET_PAGE_STATE_LAST);
- nr /= sizeof(unsigned long);
-
- __get_page_state(ret, nr+1, &mask);
-}
-
-void get_page_state(struct page_state *ret)
-{
- int nr;
- cpumask_t mask = CPU_MASK_ALL;
-
- nr = offsetof(struct page_state, GET_PAGE_STATE_LAST);
- nr /= sizeof(unsigned long);
-
- __get_page_state(ret, nr + 1, &mask);
-}
-
void get_full_page_state(struct page_state *ret)
{
cpumask_t mask = CPU_MASK_ALL;
@@ -1737,7 +1715,6 @@ void si_meminfo_node(struct sysinfo *val
*/
void show_free_areas(void)
{
- struct page_state ps;
int cpu, temperature;
unsigned long active;
unsigned long inactive;
@@ -1769,7 +1746,6 @@ void show_free_areas(void)
}
}

- get_page_state(&ps);
get_zone_counts(&active, &inactive, &free);

printk("Free pages: %11ukB (%ukB HighMem)\n",
Index: linux-2.6.15-rc5-mm2/fs/proc/proc_misc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/fs/proc/proc_misc.c 2005-12-14 15:35:38.000000000 -0800
+++ linux-2.6.15-rc5-mm2/fs/proc/proc_misc.c 2005-12-14 15:39:22.000000000 -0800
@@ -120,7 +120,6 @@ static int meminfo_read_proc(char *page,
{
struct sysinfo i;
int len;
- struct page_state ps;
unsigned long inactive;
unsigned long active;
unsigned long free;
@@ -129,7 +128,6 @@ static int meminfo_read_proc(char *page,
struct vmalloc_info vmi;
long cached;

- get_page_state(&ps);
get_zone_counts(&active, &inactive, &free);

/*

2005-12-15 00:18:10

by Christoph Lameter

Subject: [RFC3 10/14] Convert nr_dirty

Convert nr_dirty to a zoned counter

This makes nr_dirty a per zone counter, so that we can determine the number of
dirty pages per node etc.

The counter aggregation for nr_dirty had to be undone in the NFS layer since
it summed up the pages from multiple zones.
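
A compressed sketch of the shape of that conversion (hypothetical loop; the
real hunks in fs/nfs follow below): the old code counted processed requests
and subtracted the sum from the global counter once, whereas per zone counters
require accounting each page against its own zone.

	/* before: one batched update of a global counter */
	res = 0;
	list_for_each_entry(req, &dirty_list, wb_list)
		res++;
	sub_page_state(nr_dirty, res);

	/* after: per page updates so each zone's counter stays correct */
	list_for_each_entry(req, &dirty_list, wb_list)
		dec_zone_page_state(req->wb_page, NR_DIRTY);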

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.15-rc5-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/page_alloc.c 2005-12-14 15:29:46.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/page_alloc.c 2005-12-14 15:35:34.000000000 -0800
@@ -596,7 +596,9 @@ static int rmqueue_bulk(struct zone *zon
return i;
}

-char *stat_item_descr[NR_STAT_ITEMS] = { "mapped","pagecache", "slab", "pagetable" };
+char *stat_item_descr[NR_STAT_ITEMS] = {
+ "mapped","pagecache", "slab", "pagetable", "dirty"
+};

/*
* Manage combined zone based / global counters
@@ -1777,7 +1779,7 @@ void show_free_areas(void)
"unstable:%lu free:%u slab:%lu mapped:%lu pagetables:%lu\n",
active,
inactive,
- ps.nr_dirty,
+ global_page_state(NR_DIRTY),
ps.nr_writeback,
ps.nr_unstable,
nr_free_pages(),
Index: linux-2.6.15-rc5-mm2/include/linux/page-flags.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/page-flags.h 2005-12-14 15:29:46.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/page-flags.h 2005-12-14 15:34:57.000000000 -0800
@@ -82,7 +82,6 @@
* allowed.
*/
struct page_state {
- unsigned long nr_dirty; /* Dirty writeable pages */
unsigned long nr_writeback; /* Pages under writeback */
unsigned long nr_unstable; /* NFS unstable pages */
#define GET_PAGE_STATE_LAST nr_unstable
Index: linux-2.6.15-rc5-mm2/mm/page-writeback.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/page-writeback.c 2005-12-14 14:57:29.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/page-writeback.c 2005-12-14 15:34:57.000000000 -0800
@@ -109,7 +109,7 @@ struct writeback_state

static void get_writeback_state(struct writeback_state *wbs)
{
- wbs->nr_dirty = read_page_state(nr_dirty);
+ wbs->nr_dirty = global_page_state(NR_DIRTY);
wbs->nr_unstable = read_page_state(nr_unstable);
wbs->nr_mapped = global_page_state(NR_MAPPED);
wbs->nr_writeback = read_page_state(nr_writeback);
@@ -632,7 +632,7 @@ int __set_page_dirty_nobuffers(struct pa
if (mapping2) { /* Race with truncate? */
BUG_ON(mapping2 != mapping);
if (mapping_cap_account_dirty(mapping))
- inc_page_state(nr_dirty);
+ __inc_zone_page_state(page, NR_DIRTY);
radix_tree_tag_set(&mapping->page_tree,
page_index(page), PAGECACHE_TAG_DIRTY);
}
@@ -716,9 +716,9 @@ int test_clear_page_dirty(struct page *p
radix_tree_tag_clear(&mapping->page_tree,
page_index(page),
PAGECACHE_TAG_DIRTY);
- write_unlock_irqrestore(&mapping->tree_lock, flags);
if (mapping_cap_account_dirty(mapping))
- dec_page_state(nr_dirty);
+ __dec_zone_page_state(page, NR_DIRTY);
+ write_unlock_irqrestore(&mapping->tree_lock, flags);
return 1;
}
write_unlock_irqrestore(&mapping->tree_lock, flags);
@@ -749,7 +749,7 @@ int clear_page_dirty_for_io(struct page
if (mapping) {
if (TestClearPageDirty(page)) {
if (mapping_cap_account_dirty(mapping))
- dec_page_state(nr_dirty);
+ dec_zone_page_state(page, NR_DIRTY);
return 1;
}
return 0;
Index: linux-2.6.15-rc5-mm2/include/linux/mmzone.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/mmzone.h 2005-12-14 15:29:46.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/mmzone.h 2005-12-14 15:34:57.000000000 -0800
@@ -44,8 +44,8 @@ struct zone_padding {
#define ZONE_PADDING(name)
#endif

-enum zone_stat_item { NR_MAPPED, NR_PAGECACHE, NR_SLAB, NR_PAGETABLE };
-#define NR_STAT_ITEMS 4
+enum zone_stat_item { NR_MAPPED, NR_PAGECACHE, NR_SLAB, NR_PAGETABLE, NR_DIRTY };
+#define NR_STAT_ITEMS 5

struct per_cpu_pages {
int count; /* number of pages in the list */
Index: linux-2.6.15-rc5-mm2/drivers/base/node.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/drivers/base/node.c 2005-12-14 15:29:46.000000000 -0800
+++ linux-2.6.15-rc5-mm2/drivers/base/node.c 2005-12-14 15:34:57.000000000 -0800
@@ -53,8 +53,6 @@ static ssize_t node_read_meminfo(struct
nr[j] = node_page_state(nid, j);

/* Check for negative values in these approximate counters */
- if ((long)ps.nr_dirty < 0)
- ps.nr_dirty = 0;
if ((long)ps.nr_writeback < 0)
ps.nr_writeback = 0;

@@ -82,7 +80,7 @@ static ssize_t node_read_meminfo(struct
nid, K(i.freehigh),
nid, K(i.totalram - i.totalhigh),
nid, K(i.freeram - i.freehigh),
- nid, K(ps.nr_dirty),
+ nid, K(nr[NR_DIRTY]),
nid, K(ps.nr_writeback),
nid, K(nr[NR_MAPPED]),
nid, K(nr[NR_PAGECACHE]),
Index: linux-2.6.15-rc5-mm2/fs/fs-writeback.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/fs/fs-writeback.c 2005-12-12 09:10:33.000000000 -0800
+++ linux-2.6.15-rc5-mm2/fs/fs-writeback.c 2005-12-14 15:34:57.000000000 -0800
@@ -470,7 +470,7 @@ void sync_inodes_sb(struct super_block *
struct writeback_control wbc = {
.sync_mode = wait ? WB_SYNC_ALL : WB_SYNC_HOLD,
};
- unsigned long nr_dirty = read_page_state(nr_dirty);
+ unsigned long nr_dirty = global_page_state(NR_DIRTY);
unsigned long nr_unstable = read_page_state(nr_unstable);

wbc.nr_to_write = nr_dirty + nr_unstable +
Index: linux-2.6.15-rc5-mm2/fs/buffer.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/fs/buffer.c 2005-12-13 20:41:05.000000000 -0800
+++ linux-2.6.15-rc5-mm2/fs/buffer.c 2005-12-14 15:34:57.000000000 -0800
@@ -857,7 +857,7 @@ int __set_page_dirty_buffers(struct page
write_lock_irq(&mapping->tree_lock);
if (page->mapping) { /* Race with truncate? */
if (mapping_cap_account_dirty(mapping))
- inc_page_state(nr_dirty);
+ __inc_zone_page_state(page, NR_DIRTY);
radix_tree_tag_set(&mapping->page_tree,
page_index(page),
PAGECACHE_TAG_DIRTY);
Index: linux-2.6.15-rc5-mm2/fs/proc/proc_misc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/fs/proc/proc_misc.c 2005-12-14 15:29:46.000000000 -0800
+++ linux-2.6.15-rc5-mm2/fs/proc/proc_misc.c 2005-12-14 15:34:57.000000000 -0800
@@ -188,7 +188,7 @@ static int meminfo_read_proc(char *page,
K(i.freeram-i.freehigh),
K(i.totalswap),
K(i.freeswap),
- K(ps.nr_dirty),
+ K(global_page_state(NR_DIRTY)),
K(ps.nr_writeback),
K(global_page_state(NR_MAPPED)),
K(global_page_state(NR_SLAB)),
Index: linux-2.6.15-rc5-mm2/arch/i386/mm/pgtable.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/arch/i386/mm/pgtable.c 2005-12-03 21:10:42.000000000 -0800
+++ linux-2.6.15-rc5-mm2/arch/i386/mm/pgtable.c 2005-12-14 15:34:57.000000000 -0800
@@ -59,7 +59,7 @@ void show_mem(void)
printk(KERN_INFO "%d pages swap cached\n", cached);

get_page_state(&ps);
- printk(KERN_INFO "%lu pages dirty\n", ps.nr_dirty);
+ printk(KERN_INFO "%lu pages dirty\n", global_page_state(NR_DIRTY));
printk(KERN_INFO "%lu pages writeback\n", ps.nr_writeback);
printk(KERN_INFO "%lu pages mapped\n", ps.nr_mapped);
printk(KERN_INFO "%lu pages slab\n", ps.nr_slab);
Index: linux-2.6.15-rc5-mm2/fs/reiser4/page_cache.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/fs/reiser4/page_cache.c 2005-12-12 09:10:33.000000000 -0800
+++ linux-2.6.15-rc5-mm2/fs/reiser4/page_cache.c 2005-12-14 15:34:57.000000000 -0800
@@ -470,7 +470,7 @@ int set_page_dirty_internal(struct page

if (!TestSetPageDirty(page)) {
if (mapping_cap_account_dirty(mapping))
- inc_page_state(nr_dirty);
+ inc_zone_page_state(page, NR_DIRTY);

__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
}
Index: linux-2.6.15-rc5-mm2/fs/nfs/write.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/fs/nfs/write.c 2005-12-12 09:10:33.000000000 -0800
+++ linux-2.6.15-rc5-mm2/fs/nfs/write.c 2005-12-14 15:34:57.000000000 -0800
@@ -446,7 +446,7 @@ nfs_mark_request_dirty(struct nfs_page *
nfs_list_add_request(req, &nfsi->dirty);
nfsi->ndirty++;
spin_unlock(&nfsi->req_lock);
- inc_page_state(nr_dirty);
+ inc_zone_page_state(req->wb_page, NR_DIRTY);
mark_inode_dirty(inode);
}

@@ -539,7 +539,6 @@ nfs_scan_dirty(struct inode *inode, stru
if (nfsi->ndirty != 0) {
res = nfs_scan_lock_dirty(nfsi, dst, idx_start, npages);
nfsi->ndirty -= res;
- sub_page_state(nr_dirty,res);
if ((nfsi->ndirty == 0) != list_empty(&nfsi->dirty))
printk(KERN_ERR "NFS: desynchronized value of nfs_i.ndirty.\n");
}
Index: linux-2.6.15-rc5-mm2/fs/reiser4/emergency_flush.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/fs/reiser4/emergency_flush.c 2005-12-12 09:10:33.000000000 -0800
+++ linux-2.6.15-rc5-mm2/fs/reiser4/emergency_flush.c 2005-12-14 15:34:57.000000000 -0800
@@ -740,7 +740,7 @@ void eflush_del(jnode * node, int page_l
if (!TestSetPageDirty(page)) {
BUG_ON(jnode_get_mapping(node) != page->mapping);
if (mapping_cap_account_dirty(page->mapping))
- inc_page_state(nr_dirty);
+ inc_zone_page_state(page, NR_DIRTY);
}

assert("nikita-2766", atomic_read(&node->x_count) > 1);
Index: linux-2.6.15-rc5-mm2/fs/reiser4/as_ops.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/fs/reiser4/as_ops.c 2005-12-12 09:10:33.000000000 -0800
+++ linux-2.6.15-rc5-mm2/fs/reiser4/as_ops.c 2005-12-14 15:34:57.000000000 -0800
@@ -84,7 +84,7 @@ int reiser4_set_page_dirty(struct page *
if (page->mapping) {
assert("vs-1652", page->mapping == mapping);
if (mapping_cap_account_dirty(mapping))
- inc_page_state(nr_dirty);
+ __inc_zone_page_state(page, NR_DIRTY);
radix_tree_tag_set(&mapping->page_tree,
page->index,
PAGECACHE_TAG_REISER4_MOVED);
Index: linux-2.6.15-rc5-mm2/mm/swap_prefetch.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/swap_prefetch.c 2005-12-14 15:29:13.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/swap_prefetch.c 2005-12-14 15:34:57.000000000 -0800
@@ -319,7 +319,7 @@ static int prefetch_suitable(void)
goto out;

/* Delay prefetching if we have significant amounts of dirty data */
- pending_writes = ps.nr_dirty + ps.nr_unstable;
+ pending_writes = global_page_state(NR_DIRTY) + ps.nr_unstable;
if (pending_writes > SWAP_CLUSTER_MAX)
goto out;

Index: linux-2.6.15-rc5-mm2/fs/nfs/pagelist.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/fs/nfs/pagelist.c 2005-12-03 21:10:42.000000000 -0800
+++ linux-2.6.15-rc5-mm2/fs/nfs/pagelist.c 2005-12-14 15:34:57.000000000 -0800
@@ -309,6 +309,7 @@ nfs_scan_lock_dirty(struct nfs_inode *nf
req->wb_index, NFS_PAGE_TAG_DIRTY);
nfs_list_remove_request(req);
nfs_list_add_request(req, dst);
+ inc_zone_page_state(req->wb_page, NR_DIRTY);
res++;
}
}

2005-12-15 00:17:10

by Christoph Lameter

[permalink] [raw]
Subject: [RFC3 11/14] Convert nr_writeback

Per zone page writeback counts

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.15-rc5-mm2/drivers/base/node.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/drivers/base/node.c 2005-12-14 15:34:57.000000000 -0800
+++ linux-2.6.15-rc5-mm2/drivers/base/node.c 2005-12-14 15:35:38.000000000 -0800
@@ -52,9 +52,6 @@ static ssize_t node_read_meminfo(struct
for (j = 0; j < NR_STAT_ITEMS; j++)
nr[j] = node_page_state(nid, j);

- /* Check for negative values in these approximate counters */
- if ((long)ps.nr_writeback < 0)
- ps.nr_writeback = 0;

n = sprintf(buf, "\n"
"Node %d MemTotal: %8lu kB\n"
@@ -81,7 +78,7 @@ static ssize_t node_read_meminfo(struct
nid, K(i.totalram - i.totalhigh),
nid, K(i.freeram - i.freehigh),
nid, K(nr[NR_DIRTY]),
- nid, K(ps.nr_writeback),
+ nid, K(nr[NR_WRITEBACK]),
nid, K(nr[NR_MAPPED]),
nid, K(nr[NR_PAGECACHE]),
nid, K(nr[NR_SLAB]));
Index: linux-2.6.15-rc5-mm2/fs/proc/proc_misc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/fs/proc/proc_misc.c 2005-12-14 15:34:57.000000000 -0800
+++ linux-2.6.15-rc5-mm2/fs/proc/proc_misc.c 2005-12-14 15:35:38.000000000 -0800
@@ -189,7 +189,7 @@ static int meminfo_read_proc(char *page,
K(i.totalswap),
K(i.freeswap),
K(global_page_state(NR_DIRTY)),
- K(ps.nr_writeback),
+ K(global_page_state(NR_WRITEBACK)),
K(global_page_state(NR_MAPPED)),
K(global_page_state(NR_SLAB)),
K(allowed),
Index: linux-2.6.15-rc5-mm2/arch/i386/mm/pgtable.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/arch/i386/mm/pgtable.c 2005-12-14 15:34:57.000000000 -0800
+++ linux-2.6.15-rc5-mm2/arch/i386/mm/pgtable.c 2005-12-14 15:35:38.000000000 -0800
@@ -60,7 +60,7 @@ void show_mem(void)

get_page_state(&ps);
printk(KERN_INFO "%lu pages dirty\n", global_page_state(NR_DIRTY));
- printk(KERN_INFO "%lu pages writeback\n", ps.nr_writeback);
+ printk(KERN_INFO "%lu pages writeback\n", global_page_state(NR_WRITEBACK));
printk(KERN_INFO "%lu pages mapped\n", ps.nr_mapped);
printk(KERN_INFO "%lu pages slab\n", ps.nr_slab);
printk(KERN_INFO "%lu pages pagetables\n", ps.nr_page_table_pages);
Index: linux-2.6.15-rc5-mm2/mm/swap_prefetch.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/swap_prefetch.c 2005-12-14 15:34:57.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/swap_prefetch.c 2005-12-14 15:35:38.000000000 -0800
@@ -315,7 +315,7 @@ static int prefetch_suitable(void)
get_page_state(&ps);

/* We shouldn't prefetch when we are doing writeback */
- if (ps.nr_writeback)
+ if (global_page_state(NR_WRITEBACK))
goto out;

/* Delay prefetching if we have significant amounts of dirty data */
Index: linux-2.6.15-rc5-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/page_alloc.c 2005-12-14 15:35:34.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/page_alloc.c 2005-12-14 15:37:34.000000000 -0800
@@ -597,7 +597,7 @@ static int rmqueue_bulk(struct zone *zon
}

char *stat_item_descr[NR_STAT_ITEMS] = {
- "mapped","pagecache", "slab", "pagetable", "dirty"
+ "mapped","pagecache", "slab", "pagetable", "dirty", "writeback"
};

/*
@@ -1780,7 +1780,7 @@ void show_free_areas(void)
active,
inactive,
global_page_state(NR_DIRTY),
- ps.nr_writeback,
+ global_page_state(NR_WRITEBACK),
ps.nr_unstable,
nr_free_pages(),
global_page_state(NR_SLAB),
Index: linux-2.6.15-rc5-mm2/include/linux/page-flags.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/page-flags.h 2005-12-14 15:34:57.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/page-flags.h 2005-12-14 15:35:38.000000000 -0800
@@ -82,7 +82,6 @@
* allowed.
*/
struct page_state {
- unsigned long nr_writeback; /* Pages under writeback */
unsigned long nr_unstable; /* NFS unstable pages */
#define GET_PAGE_STATE_LAST nr_unstable

@@ -291,7 +290,7 @@ void dec_zone_page_state(const struct pa
do { \
if (!test_and_set_bit(PG_writeback, \
&(page)->flags)) \
- inc_page_state(nr_writeback); \
+ inc_zone_page_state(page, NR_WRITEBACK); \
} while (0)
#define TestSetPageWriteback(page) \
({ \
@@ -299,14 +298,14 @@ void dec_zone_page_state(const struct pa
ret = test_and_set_bit(PG_writeback, \
&(page)->flags); \
if (!ret) \
- inc_page_state(nr_writeback); \
+ inc_zone_page_state(page, NR_WRITEBACK); \
ret; \
})
#define ClearPageWriteback(page) \
do { \
if (test_and_clear_bit(PG_writeback, \
&(page)->flags)) \
- dec_page_state(nr_writeback); \
+ dec_zone_page_state(page, NR_WRITEBACK); \
} while (0)
#define TestClearPageWriteback(page) \
({ \
@@ -314,7 +313,7 @@ void dec_zone_page_state(const struct pa
ret = test_and_clear_bit(PG_writeback, \
&(page)->flags); \
if (ret) \
- dec_page_state(nr_writeback); \
+ dec_zone_page_state(page, NR_WRITEBACK); \
ret; \
})

Index: linux-2.6.15-rc5-mm2/mm/page-writeback.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/page-writeback.c 2005-12-14 15:34:57.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/page-writeback.c 2005-12-14 15:35:38.000000000 -0800
@@ -112,7 +112,7 @@ static void get_writeback_state(struct w
wbs->nr_dirty = global_page_state(NR_DIRTY);
wbs->nr_unstable = read_page_state(nr_unstable);
wbs->nr_mapped = global_page_state(NR_MAPPED);
- wbs->nr_writeback = read_page_state(nr_writeback);
+ wbs->nr_writeback = global_page_state(NR_WRITEBACK);
}

/*
Index: linux-2.6.15-rc5-mm2/include/linux/mmzone.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/mmzone.h 2005-12-14 15:34:57.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/mmzone.h 2005-12-14 15:35:38.000000000 -0800
@@ -44,8 +44,8 @@ struct zone_padding {
#define ZONE_PADDING(name)
#endif

-enum zone_stat_item { NR_MAPPED, NR_PAGECACHE, NR_SLAB, NR_PAGETABLE, NR_DIRTY };
-#define NR_STAT_ITEMS 5
+enum zone_stat_item { NR_MAPPED, NR_PAGECACHE, NR_SLAB, NR_PAGETABLE, NR_DIRTY, NR_WRITEBACK };
+#define NR_STAT_ITEMS 6

struct per_cpu_pages {
int count; /* number of pages in the list */

2005-12-15 00:17:11

by Christoph Lameter

[permalink] [raw]
Subject: [RFC3 08/14] Convert nr_slab

The number of slab pages in use is currently a counter split per cpu.
Make the number of slab pages a per zone counter so that we can see how
many slab pages have been allocated in each zone.
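
Once NR_SLAB is a zone counter, the per-zone value can be read straight
out of the vm_stat array that the framework patch puts in struct zone.
A minimal sketch of a debug dump, assuming an atomic_long_read()
accessor and the usual for_each_zone() iterator (not part of this
patch):

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/mmzone.h>

/* Sketch only: print how many slab pages each zone currently holds. */
static void dump_slab_pages_per_zone(void)
{
	struct zone *zone;

	for_each_zone(zone)
		printk(KERN_DEBUG "%s: %ld pages in slab\n", zone->name,
		       atomic_long_read(&zone->vm_stat[NR_SLAB]));
}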

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.15-rc5-mm2/drivers/base/node.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/drivers/base/node.c 2005-12-14 15:28:34.000000000 -0800
+++ linux-2.6.15-rc5-mm2/drivers/base/node.c 2005-12-14 15:29:13.000000000 -0800
@@ -88,7 +88,7 @@ static ssize_t node_read_meminfo(struct
nid, K(ps.nr_writeback),
nid, K(nr[NR_MAPPED]),
nid, K(nr[NR_PAGECACHE]),
- nid, K(ps.nr_slab));
+ nid, K(nr[NR_SLAB]));
n += hugetlb_report_node_meminfo(nid, buf + n);
return n;
}
Index: linux-2.6.15-rc5-mm2/fs/proc/proc_misc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/fs/proc/proc_misc.c 2005-12-14 14:57:33.000000000 -0800
+++ linux-2.6.15-rc5-mm2/fs/proc/proc_misc.c 2005-12-14 15:29:13.000000000 -0800
@@ -191,7 +191,7 @@ static int meminfo_read_proc(char *page,
K(ps.nr_dirty),
K(ps.nr_writeback),
K(global_page_state(NR_MAPPED)),
- K(ps.nr_slab),
+ K(global_page_state(NR_SLAB)),
K(allowed),
K(committed),
K(ps.nr_page_table_pages),
Index: linux-2.6.15-rc5-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/page_alloc.c 2005-12-14 15:28:34.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/page_alloc.c 2005-12-14 15:29:13.000000000 -0800
@@ -596,7 +596,7 @@ static int rmqueue_bulk(struct zone *zon
return i;
}

-char *stat_item_descr[NR_STAT_ITEMS] = { "mapped","pagecache" };
+char *stat_item_descr[NR_STAT_ITEMS] = { "mapped","pagecache", "slab" };

/*
* Manage combined zone based / global counters
@@ -1781,7 +1781,7 @@ void show_free_areas(void)
ps.nr_writeback,
ps.nr_unstable,
nr_free_pages(),
- ps.nr_slab,
+ global_page_state(NR_SLAB),
global_page_state(NR_MAPPED),
ps.nr_page_table_pages);

Index: linux-2.6.15-rc5-mm2/mm/slab.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/slab.c 2005-12-14 14:45:40.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/slab.c 2005-12-14 15:29:13.000000000 -0800
@@ -1236,7 +1236,7 @@ static void *kmem_getpages(kmem_cache_t
i = (1 << cachep->gfporder);
if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
atomic_add(i, &slab_reclaim_pages);
- add_page_state(nr_slab, i);
+ add_zone_page_state(page_zone(page), NR_SLAB, i);
while (i--) {
SetPageSlab(page);
page++;
@@ -1258,7 +1258,7 @@ static void kmem_freepages(kmem_cache_t
BUG();
page++;
}
- sub_page_state(nr_slab, nr_freed);
+ sub_zone_page_state(page_zone(page), NR_SLAB, nr_freed);
if (current->reclaim_state)
current->reclaim_state->reclaimed_slab += nr_freed;
free_pages((unsigned long)addr, cachep->gfporder);
Index: linux-2.6.15-rc5-mm2/include/linux/page-flags.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/page-flags.h 2005-12-14 14:57:29.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/page-flags.h 2005-12-14 15:29:13.000000000 -0800
@@ -86,8 +86,7 @@ struct page_state {
unsigned long nr_writeback; /* Pages under writeback */
unsigned long nr_unstable; /* NFS unstable pages */
unsigned long nr_page_table_pages;/* Pages used for pagetables */
- unsigned long nr_slab; /* In slab */
-#define GET_PAGE_STATE_LAST nr_slab
+#define GET_PAGE_STATE_LAST nr_page_table_pages

/*
* The below are zeroed by get_page_state(). Use get_full_page_state()
Index: linux-2.6.15-rc5-mm2/include/linux/mmzone.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/mmzone.h 2005-12-14 14:57:33.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/mmzone.h 2005-12-14 15:29:13.000000000 -0800
@@ -44,8 +44,8 @@ struct zone_padding {
#define ZONE_PADDING(name)
#endif

-enum zone_stat_item { NR_MAPPED, NR_PAGECACHE };
-#define NR_STAT_ITEMS 2
+enum zone_stat_item { NR_MAPPED, NR_PAGECACHE, NR_SLAB };
+#define NR_STAT_ITEMS 3

struct per_cpu_pages {
int count; /* number of pages in the list */
Index: linux-2.6.15-rc5-mm2/mm/swap_prefetch.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/swap_prefetch.c 2005-12-14 14:57:29.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/swap_prefetch.c 2005-12-14 15:29:13.000000000 -0800
@@ -327,7 +327,7 @@ static int prefetch_suitable(void)
* >2/3 of the ram is mapped or swapcache, we need some free for
* pagecache
*/
- limit = global_page_state(NR_MAPPED) + ps.nr_slab + pending_writes +
+ limit = global_page_state(NR_MAPPED) + global_page_state(NR_SLAB) + pending_writes +
total_swapcache_pages;
if (limit > mapped_limit)
goto out;

2005-12-15 00:15:27

by Christoph Lameter

[permalink] [raw]
Subject: [RFC3 09/14] Convert nr_page_table

The nr_page_table_pages counter is currently implemented as a counter
split per cpu. It therefore only has meaning as a count of the page
table pages in the system as a whole.

This patch switches it to a zone based counter. It is then possible to
determine how many pages in a zone are used for page tables.
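
The call sites below pass a struct page rather than a zone, so the
page-flavored helpers presumably just look up the zone and forward to
the modifier from the framework patch. A rough sketch of that mapping
(the actual helpers are defined in patch 02 and are more optimized, so
the bodies here are illustrative only):

/* Sketch only: plausible page-based wrappers over the zone modifier. */
static inline void inc_zone_page_state(const struct page *page,
				       enum zone_stat_item item)
{
	mod_zone_page_state(page_zone(page), item, 1);
}

static inline void dec_zone_page_state(const struct page *page,
				       enum zone_stat_item item)
{
	mod_zone_page_state(page_zone(page), item, -1);
}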

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.15-rc5-mm2/mm/memory.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/memory.c 2005-12-13 15:46:52.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/memory.c 2005-12-14 15:29:46.000000000 -0800
@@ -116,7 +116,7 @@ static void free_pte_range(struct mmu_ga
pmd_clear(pmd);
pte_lock_deinit(page);
pte_free_tlb(tlb, page);
- dec_page_state(nr_page_table_pages);
+ dec_zone_page_state(page, NR_PAGETABLE);
tlb->mm->nr_ptes--;
}

@@ -302,7 +302,7 @@ int __pte_alloc(struct mm_struct *mm, pm
pte_free(new);
} else {
mm->nr_ptes++;
- inc_page_state(nr_page_table_pages);
+ inc_zone_page_state(new, NR_PAGETABLE);
pmd_populate(mm, pmd, new);
}
spin_unlock(&mm->page_table_lock);
Index: linux-2.6.15-rc5-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/page_alloc.c 2005-12-14 15:29:13.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/page_alloc.c 2005-12-14 15:29:46.000000000 -0800
@@ -596,7 +596,7 @@ static int rmqueue_bulk(struct zone *zon
return i;
}

-char *stat_item_descr[NR_STAT_ITEMS] = { "mapped","pagecache", "slab" };
+char *stat_item_descr[NR_STAT_ITEMS] = { "mapped","pagecache", "slab", "pagetable" };

/*
* Manage combined zone based / global counters
@@ -1783,7 +1783,7 @@ void show_free_areas(void)
nr_free_pages(),
global_page_state(NR_SLAB),
global_page_state(NR_MAPPED),
- ps.nr_page_table_pages);
+ global_page_state(NR_PAGETABLE));

for_each_zone(zone) {
int i;
Index: linux-2.6.15-rc5-mm2/include/linux/page-flags.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/page-flags.h 2005-12-14 15:29:13.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/page-flags.h 2005-12-14 15:29:46.000000000 -0800
@@ -85,8 +85,7 @@ struct page_state {
unsigned long nr_dirty; /* Dirty writeable pages */
unsigned long nr_writeback; /* Pages under writeback */
unsigned long nr_unstable; /* NFS unstable pages */
- unsigned long nr_page_table_pages;/* Pages used for pagetables */
-#define GET_PAGE_STATE_LAST nr_page_table_pages
+#define GET_PAGE_STATE_LAST nr_unstable

/*
* The below are zeroed by get_page_state(). Use get_full_page_state()
Index: linux-2.6.15-rc5-mm2/include/linux/mmzone.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/mmzone.h 2005-12-14 15:29:13.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/mmzone.h 2005-12-14 15:29:46.000000000 -0800
@@ -44,8 +44,8 @@ struct zone_padding {
#define ZONE_PADDING(name)
#endif

-enum zone_stat_item { NR_MAPPED, NR_PAGECACHE, NR_SLAB };
-#define NR_STAT_ITEMS 3
+enum zone_stat_item { NR_MAPPED, NR_PAGECACHE, NR_SLAB, NR_PAGETABLE };
+#define NR_STAT_ITEMS 4

struct per_cpu_pages {
int count; /* number of pages in the list */
Index: linux-2.6.15-rc5-mm2/fs/proc/proc_misc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/fs/proc/proc_misc.c 2005-12-14 15:29:13.000000000 -0800
+++ linux-2.6.15-rc5-mm2/fs/proc/proc_misc.c 2005-12-14 15:29:46.000000000 -0800
@@ -194,7 +194,7 @@ static int meminfo_read_proc(char *page,
K(global_page_state(NR_SLAB)),
K(allowed),
K(committed),
- K(ps.nr_page_table_pages),
+ K(global_page_state(NR_PAGETABLE)),
(unsigned long)VMALLOC_TOTAL >> 10,
vmi.used >> 10,
vmi.largest_chunk >> 10
Index: linux-2.6.15-rc5-mm2/drivers/base/node.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/drivers/base/node.c 2005-12-14 15:29:13.000000000 -0800
+++ linux-2.6.15-rc5-mm2/drivers/base/node.c 2005-12-14 15:29:46.000000000 -0800
@@ -57,8 +57,6 @@ static ssize_t node_read_meminfo(struct
ps.nr_dirty = 0;
if ((long)ps.nr_writeback < 0)
ps.nr_writeback = 0;
- if ((long)ps.nr_slab < 0)
- ps.nr_slab = 0;

n = sprintf(buf, "\n"
"Node %d MemTotal: %8lu kB\n"

2005-12-15 00:19:11

by Christoph Lameter

[permalink] [raw]
Subject: [RFC3 04/14] Convert nr_pagecache

Convert nr_pagecache to a zoned counter

Currently a single atomic variable is used to track the size of the page cache
in the whole machine. The zoned VM counters are implemented in the same way as
the existing nr_pagecache code. Remove the special implementation for
nr_pagecache and make it a zoned counter. We will then be able to figure out
how much of the memory in a zone is used by the page cache.

Updates of the page cache counters are always performed with interrupts off.
We can therefore use the __ variant here.
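
For context, the __ variants differ from the plain ones only in skipping
the interrupt disabling around the per-cpu update; since these call
sites already hold the mapping tree lock with interrupts off, the
cheaper form is safe. A sketch of the assumed relationship (the
__mod_zone_page_state() body is in patch 02; the wrapper shown here is
an assumption, not the literal patch code):

/* Sketch only: interrupt-safe modifier assumed to wrap the __ variant. */
void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
			 int delta)
{
	unsigned long flags;

	local_irq_save(flags);
	__mod_zone_page_state(zone, item, delta);
	local_irq_restore(flags);
}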

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.15-rc5-mm2/include/linux/pagemap.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/pagemap.h 2005-12-12 09:10:34.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/pagemap.h 2005-12-14 14:57:33.000000000 -0800
@@ -99,51 +99,6 @@ int add_to_page_cache_lru(struct page *p
extern void remove_from_page_cache(struct page *page);
extern void __remove_from_page_cache(struct page *page);

-extern atomic_t nr_pagecache;
-
-#ifdef CONFIG_SMP
-
-#define PAGECACHE_ACCT_THRESHOLD max(16, NR_CPUS * 2)
-DECLARE_PER_CPU(long, nr_pagecache_local);
-
-/*
- * pagecache_acct implements approximate accounting for pagecache.
- * vm_enough_memory() do not need high accuracy. Writers will keep
- * an offset in their per-cpu arena and will spill that into the
- * global count whenever the absolute value of the local count
- * exceeds the counter's threshold.
- *
- * MUST be protected from preemption.
- * current protection is mapping->page_lock.
- */
-static inline void pagecache_acct(int count)
-{
- long *local;
-
- local = &__get_cpu_var(nr_pagecache_local);
- *local += count;
- if (*local > PAGECACHE_ACCT_THRESHOLD || *local < -PAGECACHE_ACCT_THRESHOLD) {
- atomic_add(*local, &nr_pagecache);
- *local = 0;
- }
-}
-
-#else
-
-static inline void pagecache_acct(int count)
-{
- atomic_add(count, &nr_pagecache);
-}
-#endif
-
-static inline unsigned long get_page_cache_size(void)
-{
- int ret = atomic_read(&nr_pagecache);
- if (unlikely(ret < 0))
- ret = 0;
- return ret;
-}
-
/*
* Return byte-offset into filesystem object for page.
*/
Index: linux-2.6.15-rc5-mm2/mm/swap_state.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/swap_state.c 2005-12-13 20:41:05.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/swap_state.c 2005-12-14 14:57:33.000000000 -0800
@@ -87,7 +87,7 @@ static int __add_to_swap_cache(struct pa
SetPageSwapCache(page);
set_page_private(page, entry.val);
total_swapcache_pages++;
- pagecache_acct(1);
+ __inc_zone_page_state(page, NR_PAGECACHE);
}
write_unlock_irq(&swapper_space.tree_lock);
radix_tree_preload_end();
@@ -133,7 +133,7 @@ void __delete_from_swap_cache(struct pag
set_page_private(page, 0);
ClearPageSwapCache(page);
total_swapcache_pages--;
- pagecache_acct(-1);
+ __dec_zone_page_state(page, NR_PAGECACHE);
INC_CACHE_INFO(del_total);
}

Index: linux-2.6.15-rc5-mm2/mm/filemap.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/filemap.c 2005-12-12 09:10:34.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/filemap.c 2005-12-14 14:57:33.000000000 -0800
@@ -115,7 +115,7 @@ void __remove_from_page_cache(struct pag
radix_tree_delete(&mapping->page_tree, page->index);
page->mapping = NULL;
mapping->nrpages--;
- pagecache_acct(-1);
+ __dec_zone_page_state(page, NR_PAGECACHE);
}
EXPORT_SYMBOL(__remove_from_page_cache);

@@ -406,7 +406,7 @@ int add_to_page_cache(struct page *page,
page->mapping = mapping;
page->index = offset;
mapping->nrpages++;
- pagecache_acct(1);
+ __inc_zone_page_state(page, NR_PAGECACHE);
}
write_unlock_irq(&mapping->tree_lock);
radix_tree_preload_end();
Index: linux-2.6.15-rc5-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/page_alloc.c 2005-12-14 14:57:29.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/page_alloc.c 2005-12-14 14:57:33.000000000 -0800
@@ -1578,12 +1578,6 @@ static void show_node(struct zone *zone)
*/
static DEFINE_PER_CPU(struct page_state, page_states) = {0};

-atomic_t nr_pagecache = ATOMIC_INIT(0);
-EXPORT_SYMBOL(nr_pagecache);
-#ifdef CONFIG_SMP
-DEFINE_PER_CPU(long, nr_pagecache_local) = 0;
-#endif
-
static void __get_page_state(struct page_state *ret, int nr, cpumask_t *cpumask)
{
int cpu = 0;
Index: linux-2.6.15-rc5-mm2/mm/mmap.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/mmap.c 2005-12-03 21:10:42.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/mmap.c 2005-12-14 14:57:33.000000000 -0800
@@ -95,7 +95,7 @@ int __vm_enough_memory(long pages, int c
if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
unsigned long n;

- free = get_page_cache_size();
+ free = global_page_state(NR_PAGECACHE);
free += nr_swap_pages;

/*
Index: linux-2.6.15-rc5-mm2/mm/nommu.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/nommu.c 2005-12-12 09:10:34.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/nommu.c 2005-12-14 14:57:33.000000000 -0800
@@ -1114,7 +1114,7 @@ int __vm_enough_memory(long pages, int c
if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
unsigned long n;

- free = get_page_cache_size();
+ free = global_page_state(NR_PAGECACHE);
free += nr_swap_pages;

/*
Index: linux-2.6.15-rc5-mm2/arch/sparc64/kernel/sys_sunos32.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/arch/sparc64/kernel/sys_sunos32.c 2005-12-03 21:10:42.000000000 -0800
+++ linux-2.6.15-rc5-mm2/arch/sparc64/kernel/sys_sunos32.c 2005-12-14 14:57:33.000000000 -0800
@@ -154,7 +154,7 @@ asmlinkage int sunos_brk(u32 baddr)
* simple, it hopefully works in most obvious cases.. Easy to
* fool it, but this should catch most mistakes.
*/
- freepages = get_page_cache_size();
+ freepages = global_page_state(NR_PAGECACHE);
freepages >>= 1;
freepages += nr_free_pages();
freepages += nr_swap_pages;
Index: linux-2.6.15-rc5-mm2/arch/sparc/kernel/sys_sunos.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/arch/sparc/kernel/sys_sunos.c 2005-12-03 21:10:42.000000000 -0800
+++ linux-2.6.15-rc5-mm2/arch/sparc/kernel/sys_sunos.c 2005-12-14 14:57:33.000000000 -0800
@@ -195,7 +195,7 @@ asmlinkage int sunos_brk(unsigned long b
* simple, it hopefully works in most obvious cases.. Easy to
* fool it, but this should catch most mistakes.
*/
- freepages = get_page_cache_size();
+ freepages = global_page_state(NR_PAGECACHE);
freepages >>= 1;
freepages += nr_free_pages();
freepages += nr_swap_pages;
Index: linux-2.6.15-rc5-mm2/fs/proc/proc_misc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/fs/proc/proc_misc.c 2005-12-14 14:57:29.000000000 -0800
+++ linux-2.6.15-rc5-mm2/fs/proc/proc_misc.c 2005-12-14 14:57:33.000000000 -0800
@@ -142,7 +142,7 @@ static int meminfo_read_proc(char *page,
allowed = ((totalram_pages - hugetlb_total_pages())
* sysctl_overcommit_ratio / 100) + total_swap_pages;

- cached = get_page_cache_size() - total_swapcache_pages - i.bufferram;
+ cached = global_page_state(NR_PAGECACHE) - total_swapcache_pages - i.bufferram;
if (cached < 0)
cached = 0;

Index: linux-2.6.15-rc5-mm2/include/linux/mmzone.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/mmzone.h 2005-12-14 14:57:29.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/mmzone.h 2005-12-14 14:57:33.000000000 -0800
@@ -44,8 +44,8 @@ struct zone_padding {
#define ZONE_PADDING(name)
#endif

-enum zone_stat_item { NR_MAPPED };
-#define NR_STAT_ITEMS 1
+enum zone_stat_item { NR_MAPPED, NR_PAGECACHE };
+#define NR_STAT_ITEMS 2

struct per_cpu_pages {
int count; /* number of pages in the list */

2005-12-15 00:15:26

by Christoph Lameter

[permalink] [raw]
Subject: [RFC3 01/14] Add some consts for inlines in mm.h

[PATCH] const attributes for some inlines in mm.h

Const attributes allow the compiler to generate more efficient code by
allowing callers to keep arguments of struct page in registers.

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.15-rc5-mm2/include/linux/mm.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/mm.h 2005-12-12 09:10:34.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/mm.h 2005-12-14 14:39:50.000000000 -0800
@@ -456,7 +456,7 @@ void put_page(struct page *page);
#define SECTIONS_MASK ((1UL << SECTIONS_WIDTH) - 1)
#define ZONETABLE_MASK ((1UL << ZONETABLE_SHIFT) - 1)

-static inline unsigned long page_zonenum(struct page *page)
+static inline unsigned long page_zonenum(const struct page *page)
{
return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
}
@@ -464,20 +464,20 @@ static inline unsigned long page_zonenum
struct zone;
extern struct zone *zone_table[];

-static inline struct zone *page_zone(struct page *page)
+static inline struct zone *page_zone(const struct page *page)
{
return zone_table[(page->flags >> ZONETABLE_PGSHIFT) &
ZONETABLE_MASK];
}

-static inline unsigned long page_to_nid(struct page *page)
+static inline unsigned long page_to_nid(const struct page *page)
{
if (FLAGS_HAS_NODE)
return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
else
return page_zone(page)->zone_pgdat->node_id;
}
-static inline unsigned long page_to_section(struct page *page)
+static inline unsigned long page_to_section(const struct page *page)
{
return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK;
}
@@ -511,7 +511,7 @@ static inline void set_page_links(struct
extern struct page *mem_map;
#endif

-static inline void *lowmem_page_address(struct page *page)
+static inline void *lowmem_page_address(const struct page *page)
{
return __va(page_to_pfn(page) << PAGE_SHIFT);
}
@@ -553,7 +553,7 @@ void page_address_init(void);
#define PAGE_MAPPING_ANON 1

extern struct address_space swapper_space;
-static inline struct address_space *page_mapping(struct page *page)
+static inline struct address_space *page_mapping(const struct page *page)
{
struct address_space *mapping = page->mapping;

@@ -564,7 +564,7 @@ static inline struct address_space *page
return mapping;
}

-static inline int PageAnon(struct page *page)
+static inline int PageAnon(const struct page *page)
{
return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
}
@@ -573,7 +573,7 @@ static inline int PageAnon(struct page *
* Return the pagecache index of the passed page. Regular pagecache pages
* use ->index whereas swapcache pages use ->private
*/
-static inline pgoff_t page_index(struct page *page)
+static inline pgoff_t page_index(const struct page *page)
{
if (unlikely(PageSwapCache(page)))
return page_private(page);
@@ -590,7 +590,7 @@ static inline void reset_page_mapcount(s
atomic_set(&(page)->_mapcount, -1);
}

-static inline int page_mapcount(struct page *page)
+static inline int page_mapcount(const struct page *page)
{
return atomic_read(&(page)->_mapcount) + 1;
}
@@ -598,7 +598,7 @@ static inline int page_mapcount(struct p
/*
* Return true if this page is mapped into pagetables.
*/
-static inline int page_mapped(struct page *page)
+static inline int page_mapped(const struct page *page)
{
return atomic_read(&(page)->_mapcount) >= 0;
}

2005-12-15 00:59:28

by J.A. Magallon

[permalink] [raw]
Subject: Re: [RFC3 01/14] Add some consts for inlines in mm.h

On Wed, 14 Dec 2005 16:14:20 -0800 (PST), Christoph Lameter <[email protected]> wrote:

> [PATCH] const attributes for some inlines in mm.h
>
> Const attributes allow the compiler to generate more efficient code by
> allowing callers to keep arguments of struct page in registers.
>

Even if it does not keep them in registers, at least it doesn't duplicate
them...

--
J.A. Magallon <jamagallon()able!es> \ Software is like sex:
werewolf!able!es \ It's better when it's free
Mandriva Linux release 2006.1 (Cooker) for i586
Linux 2.6.14-jam4 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))



2005-12-17 04:20:34

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [RFC3 02/14] Basic counter functionality

Hi Christoph,

On Wed, Dec 14, 2005 at 04:14:25PM -0800, Christoph Lameter wrote:
> Currently we have various vm counters for the pages in a zone that are split
> per cpu. This arrangement does not allow access to per zone statistics that
> are important to optimize VM behavior for NUMA architectures. All one can say
> from the per cpu differential variables is how much a certain variable was
> changed by this cpu without being able to deduce how many pages in each zone
> are of a certain type.
>
> This framework here implements differential counters for each processor
> in struct zone. The differential counters are consolidated when a threshold
> is exceeded (like done in the current implementation for nr_pageache), when
> slab reaping occurs or when a consolidation function is called.
> Consolidation uses atomic operations and accumulates counters per zone in
> the zone structure and also globally in the vm_stat array. VM function can
> access the counts by simply indexing a global or zone specific array.
>
> The arrangement of counters in an array simplifies processing when output
> has to be generated for /proc/*.
>
> Counter updates can be triggered by calling *_zone_page_state or
> __*_zone_page_state. The second function can be called if it is known that
> interrupts are disabled.
>
> Specially optimized increment and decrement functions are provided. These
> can avoid certain checks and use increment or decrement instructions that
> an architecture may provide.
>
> Signed-off-by: Christoph Lameter <[email protected]>
>
> Index: linux-2.6.15-rc5-mm2/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.15-rc5-mm2.orig/mm/page_alloc.c 2005-12-12 15:07:45.000000000 -0800
> +++ linux-2.6.15-rc5-mm2/mm/page_alloc.c 2005-12-14 14:57:22.000000000 -0800
> @@ -596,7 +596,281 @@ static int rmqueue_bulk(struct zone *zon
> return i;
> }
>
> +/*
> + * Manage combined zone based / global counters
> + */
> +#define STAT_THRESHOLD 32
> +
> +atomic_long_t vm_stat[NR_STAT_ITEMS];
> +
> +static inline void zone_page_state_consolidate(long x, struct zone *zone, enum zone_stat_item item)
> +{
> + atomic_long_add(x, &zone->vm_stat[item]);
> + atomic_long_add(x, &vm_stat[item]);
> +}
> +
> +#ifdef CONFIG_SMP
> +/*
> + * Determine pointer to currently valid differential byte given a zone and
> + * the item number.
> + *
> + * Preemption must be off
> + */
> +static inline s8 *diff_pointer(struct zone *zone, enum zone_stat_item item)
> +{
> + return &zone_pcp(zone, raw_smp_processor_id())->vm_stat_diff[item];
> +}
> +
> +/*
> + * For use when we know that interrupts are disabled.
> + */
> +void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item, int delta)
> +{
> + s8 *p;
> + long x;
> +
> + p = diff_pointer(zone, item);
> + x = delta + *p;
> +
> + if (unlikely(x > STAT_THRESHOLD || x < -STAT_THRESHOLD)) {
> + zone_page_state_consolidate(x, zone, item);
> + x = 0;
> + }
> +
> + *p = x;
> +}

There is no need to disable interrupts AFAICS, only preemption (which,
as your comment above notes, could otherwise cause problems). I suppose
that these counters are not accessed at interrupt time and are not meant
to be, right?

Which means that if an interrupt happens at any point in this code, the
state will still be consistent after the interrupt handlers finish and
execution resumes where it was interrupted.

Why not use preempt_disable/preempt_enable? Those would disappear if
!CONFIG_PREEMPT, and could be faster than disabling/enabling interrupts
(no need to save "flags" on the stack, just an increment of the preempt
count, which has a good chance of being in cache, I guess).
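
Concretely, the preempt-based variant being suggested would look roughly
like the sketch below, reusing diff_pointer() and
zone_page_state_consolidate() from the quoted patch; the function name
is made up, and as the follow-up notes, such a variant would not be safe
for counters that are also touched from interrupt context:

/* Sketch only: preemption-protected (not interrupt-safe) counter update. */
static void mod_zone_page_state_preempt(struct zone *zone,
					enum zone_stat_item item, int delta)
{
	s8 *p;
	long x;

	preempt_disable();
	p = diff_pointer(zone, item);
	x = delta + *p;
	if (unlikely(x > STAT_THRESHOLD || x < -STAT_THRESHOLD)) {
		zone_page_state_consolidate(x, zone, item);
		x = 0;
	}
	*p = x;
	preempt_enable();
}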

It would also be nice to have all code related to the debugging-only
counters selectable at compile time, since the data might not be
interesting for some scenarios and would then just be bloat - that seems
to have been Andrew's original intent, as you noted.

2005-12-19 17:58:34

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC3 02/14] Basic counter functionality

On Sat, 17 Dec 2005, Marcelo Tosatti wrote:

> > +static inline s8 *diff_pointer(struct zone *zone, enum zone_stat_item item)
> > +{
> > + return &zone_pcp(zone, raw_smp_processor_id())->vm_stat_diff[item];
> > +}
> > +
> > +/*
> > + * For use when we know that interrupts are disabled.
> > + */
> > +void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item, int delta)
> > +{
> > + s8 *p;
> > + long x;
> > +
> > + p = diff_pointer(zone, item);
> > + x = delta + *p;
> > +
> > + if (unlikely(x > STAT_THRESHOLD || x < -STAT_THRESHOLD)) {
> > + zone_page_state_consolidate(x, zone, item);
> > + x = 0;
> > + }
> > +
> > + *p = x;
> > +}
>
> There is no need to disable interrupts AFAICS, but only preemption
> (which could cause problems as your comment above describes). I suppose
> that these counters are not accessed at interrupt time and are not meant
> to be, right?

Some of the counters can be accessed at interrupt time, and those are
meant to be right. The next rev adds another, racy version of the
counters that will be used for the optional VM counters. Those counters
will benefit from inc/dec instructions if the compiler generates them.

> Why not use preempt_disable/preempt_enable? Those would disappear
> if !CONFIG_PREEMPT, and could be faster than the interrupt
> disabling/enabling (no need to save "flags" on stack, but increment
> preempt count, which has a chance to be on cache, I guess).

On a counter-by-counter basis one could use the __ functions, which do
not disable interrupts.

> It would also be nice to have all code related to debugging only
> counters selectable at compile time, since it might not be interesting
> data for some scenarios (but unnecessary bloat) - seems that was the
> original intent by Andrew as you noted.

That will be part of the next rev that I am currently testing.