2014-04-28 20:23:20

by Fabian Frédérick

[permalink] [raw]
Subject: [PATCH 1/1] MM: make vmpressure_win dynamic

Initialize vmpressure_win in vmstat using
calculate_normal_threshold() based on each zone/cpu * SWAP_CLUSTER_SIZE

Value refreshed through cpu notifier

Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Fabian Frederick <[email protected]>
---
mm/vmpressure.c | 16 +---------------
mm/vmstat.c | 24 ++++++++++++++++++++++++
2 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index d4042e7..909169d 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -24,21 +24,7 @@
#include <linux/printk.h>
#include <linux/vmpressure.h>

-/*
- * The window size (vmpressure_win) is the number of scanned pages before
- * we try to analyze scanned/reclaimed ratio. So the window is used as a
- * rate-limit tunable for the "low" level notification, and also for
- * averaging the ratio for medium/critical levels. Using small window
- * sizes can cause lot of false positives, but too big window size will
- * delay the notifications.
- *
- * As the vmscan reclaimer logic works with chunks which are multiple of
- * SWAP_CLUSTER_MAX, it makes sense to use it for the window size as well.
- *
- * TODO: Make the window size depend on machine size, as we do for vmstat
- * thresholds. Currently we set it to 512 pages (2MB for 4KB pages).
- */
-static const unsigned long vmpressure_win = SWAP_CLUSTER_MAX * 16;
+extern unsigned long vmpressure_win;

/*
* These thresholds are used when we account memory pressure through
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 302dd07..b8f8ff2 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -23,6 +23,23 @@

#include "internal.h"

+/*
+ * The window size (vmpressure_win) is the number of scanned pages before
+ * we try to analyze scanned/reclaimed ratio. So the window is used as a
+ * rate-limit tunable for the "low" level notification, and also for
+ * averaging the ratio for medium/critical levels. Using small window
+ * sizes can cause lot of false positives, but too big window size will
+ * delay the notifications.
+ *
+ * As the vmscan reclaimer logic works with chunks which are multiple of
+ * SWAP_CLUSTER_MAX, it makes sense to use it for the window size as well.
+ *
+ */
+#ifdef CONFIG_MEMCG
+static DEFINE_SPINLOCK(vmpressure_win_lock);
+unsigned long vmpressure_win __read_mostly = SWAP_CLUSTER_MAX * 16;
+#endif
+
#ifdef CONFIG_VM_EVENT_COUNTERS
DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
EXPORT_PER_CPU_SYMBOL(vm_event_states);
@@ -163,11 +180,13 @@ void refresh_zone_stat_thresholds(void)
struct zone *zone;
int cpu;
int threshold;
+ unsigned long new_vmpressure_win = 0;

for_each_populated_zone(zone) {
unsigned long max_drift, tolerate_drift;

threshold = calculate_normal_threshold(zone);
+ new_vmpressure_win += threshold;

for_each_online_cpu(cpu)
per_cpu_ptr(zone->pageset, cpu)->stat_threshold
@@ -184,6 +203,11 @@ void refresh_zone_stat_thresholds(void)
zone->percpu_drift_mark = high_wmark_pages(zone) +
max_drift;
}
+#ifdef CONFIG_MEMCG
+ spin_lock(&vmpressure_win_lock);
+ vmpressure_win = new_vmpressure_win * SWAP_CLUSTER_MAX;
+ spin_unlock(&vmpressure_win_lock);
+#endif
}

void set_pgdat_percpu_threshold(pg_data_t *pgdat,
--
1.8.4.5


2014-04-28 21:10:29

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/1] MM: make vmpressure_win dynamic

On Mon, 28 Apr 2014 22:25:21 +0200 Fabian Frederick <[email protected]> wrote:

> Initialize vmpressure_win in vmstat using
> calculate_normal_threshold() based on each zone/cpu * SWAP_CLUSTER_SIZE
>
> Value refreshed through cpu notifier

Anton wrote vmpressure so please let's cc him. `git blame' is useful
for working out who to cc.

What were the reasons for this change? Any testing results?

> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -23,6 +23,23 @@
>
> #include "internal.h"
>
> +/*
> + * The window size (vmpressure_win) is the number of scanned pages before
> + * we try to analyze scanned/reclaimed ratio. So the window is used as a
> + * rate-limit tunable for the "low" level notification, and also for
> + * averaging the ratio for medium/critical levels. Using small window
> + * sizes can cause lot of false positives, but too big window size will
> + * delay the notifications.
> + *
> + * As the vmscan reclaimer logic works with chunks which are multiple of
> + * SWAP_CLUSTER_MAX, it makes sense to use it for the window size as well.
> + *
> + */
> +#ifdef CONFIG_MEMCG
> +static DEFINE_SPINLOCK(vmpressure_win_lock);
> +unsigned long vmpressure_win __read_mostly = SWAP_CLUSTER_MAX * 16;
> +#endif
> +
> #ifdef CONFIG_VM_EVENT_COUNTERS
> DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
> EXPORT_PER_CPU_SYMBOL(vm_event_states);
> @@ -163,11 +180,13 @@ void refresh_zone_stat_thresholds(void)
> struct zone *zone;
> int cpu;
> int threshold;
> + unsigned long new_vmpressure_win = 0;
>
> for_each_populated_zone(zone) {
> unsigned long max_drift, tolerate_drift;
>
> threshold = calculate_normal_threshold(zone);
> + new_vmpressure_win += threshold;
>
> for_each_online_cpu(cpu)
> per_cpu_ptr(zone->pageset, cpu)->stat_threshold
> @@ -184,6 +203,11 @@ void refresh_zone_stat_thresholds(void)
> zone->percpu_drift_mark = high_wmark_pages(zone) +
> max_drift;
> }
> +#ifdef CONFIG_MEMCG
> + spin_lock(&vmpressure_win_lock);
> + vmpressure_win = new_vmpressure_win * SWAP_CLUSTER_MAX;
> + spin_unlock(&vmpressure_win_lock);
> +#endif
> }

I don't think we need the spinlock really. It's a ulong - concurrent
readers will see either the old value or the new one whether or not the
writer took that lock.

If we're going to modify this thing on the fly then we probably should
expose the current value to userspace in some fashion.

2014-04-28 21:19:54

by Fabian Frédérick

[permalink] [raw]
Subject: Re: [PATCH 1/1] MM: make vmpressure_win dynamic

On Mon, 28 Apr 2014 14:10:25 -0700
Andrew Morton <[email protected]> wrote:

> On Mon, 28 Apr 2014 22:25:21 +0200 Fabian Frederick <[email protected]> wrote:
>
> > Initialize vmpressure_win in vmstat using
> > calculate_normal_threshold() based on each zone/cpu * SWAP_CLUSTER_SIZE
> >
> > Value refreshed through cpu notifier
>
> Anton wrote vmpressure so please let's cc him. `git blame' is useful
> for working out who to cc.
>
> What were the reasons for this change? Any testing results?
This patch was mainly addressing the TODO above const definition
("Make the window size depend on machine size" ...)
I can do another version removing the mutex stuff and expose to /sys ...

>
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -23,6 +23,23 @@
> >
> > #include "internal.h"
> >
> > +/*
> > + * The window size (vmpressure_win) is the number of scanned pages before
> > + * we try to analyze scanned/reclaimed ratio. So the window is used as a
> > + * rate-limit tunable for the "low" level notification, and also for
> > + * averaging the ratio for medium/critical levels. Using small window
> > + * sizes can cause lot of false positives, but too big window size will
> > + * delay the notifications.
> > + *
> > + * As the vmscan reclaimer logic works with chunks which are multiple of
> > + * SWAP_CLUSTER_MAX, it makes sense to use it for the window size as well.
> > + *
> > + */
> > +#ifdef CONFIG_MEMCG
> > +static DEFINE_SPINLOCK(vmpressure_win_lock);
> > +unsigned long vmpressure_win __read_mostly = SWAP_CLUSTER_MAX * 16;
> > +#endif
> > +
> > #ifdef CONFIG_VM_EVENT_COUNTERS
> > DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
> > EXPORT_PER_CPU_SYMBOL(vm_event_states);
> > @@ -163,11 +180,13 @@ void refresh_zone_stat_thresholds(void)
> > struct zone *zone;
> > int cpu;
> > int threshold;
> > + unsigned long new_vmpressure_win = 0;
> >
> > for_each_populated_zone(zone) {
> > unsigned long max_drift, tolerate_drift;
> >
> > threshold = calculate_normal_threshold(zone);
> > + new_vmpressure_win += threshold;
> >
> > for_each_online_cpu(cpu)
> > per_cpu_ptr(zone->pageset, cpu)->stat_threshold
> > @@ -184,6 +203,11 @@ void refresh_zone_stat_thresholds(void)
> > zone->percpu_drift_mark = high_wmark_pages(zone) +
> > max_drift;
> > }
> > +#ifdef CONFIG_MEMCG
> > + spin_lock(&vmpressure_win_lock);
> > + vmpressure_win = new_vmpressure_win * SWAP_CLUSTER_MAX;
> > + spin_unlock(&vmpressure_win_lock);
> > +#endif
> > }
>
> I don't think we need the spinlock really. It's a ulong - concurrent
> readers will see either the old value or the new one whether or not the
> writer took that lock.
>
> If we're going to modify this thing on the fly then we probably should
> expose the current value to userspace in some fashion.
>