2010-01-13 08:19:49

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH 1/3] vmscan: get_scan_ratio cleanup

The get_scan_ratio() should have all scan-ratio related calculations.
Thus, this patch move some calculation into get_scan_ratio.

Signed-off-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
---
mm/vmscan.c | 23 ++++++++++++++---------
1 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2bbee91..640486b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1501,6 +1501,13 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
unsigned long ap, fp;
struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);

+ /* If we have no swap space, do not bother scanning anon pages. */
+ if (!sc->may_swap || (nr_swap_pages <= 0)) {
+ percent[0] = 0;
+ percent[1] = 100;
+ return;
+ }
+
anon = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) +
zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON);
file = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_FILE) +
@@ -1598,22 +1605,20 @@ static void shrink_zone(int priority, struct zone *zone,
unsigned long nr_reclaimed = sc->nr_reclaimed;
unsigned long nr_to_reclaim = sc->nr_to_reclaim;
struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
- int noswap = 0;

- /* If we have no swap space, do not bother scanning anon pages. */
- if (!sc->may_swap || (nr_swap_pages <= 0)) {
- noswap = 1;
- percent[0] = 0;
- percent[1] = 100;
- } else
- get_scan_ratio(zone, sc, percent);
+ get_scan_ratio(zone, sc, percent);

for_each_evictable_lru(l) {
int file = is_file_lru(l);
unsigned long scan;

+ if (percent[file] == 0) {
+ nr[l] = 0;
+ continue;
+ }
+
scan = zone_nr_lru_pages(zone, sc, l);
- if (priority || noswap) {
+ if (priority) {
scan >>= priority;
scan = (scan * percent[file]) / 100;
}
--
1.6.5.2



2010-01-13 08:21:46

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH 2/3][v2] vmstat: add anon_scan_ratio field to zoneinfo

Changelog
from v1
- get_anon_scan_ratio don't tak zone->lru_lock anymore
because zoneinfo_show_print takes zone->lock.


======================================
Vmscan folks was asked "why does my system makes so much swap-out?"
in lkml at several times.
At that time, I made the debug patch to show recent_anon_{scanned/rorated}
parameter at least three times.

Thus, its parameter should be showed on /proc/zoneinfo. It help
vmscan folks debugging.

Signed-off-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
---
include/linux/swap.h | 2 ++
mm/vmscan.c | 50 ++++++++++++++++++++++++++++++++++++--------------
mm/vmstat.c | 7 +++++--
3 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index a2602a8..e95d7ed 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -280,6 +280,8 @@ extern void scan_unevictable_unregister_node(struct node *node);
extern int kswapd_run(int nid);
extern void kswapd_stop(int nid);

+unsigned long get_anon_scan_ratio(struct zone *zone, struct mem_cgroup *memcg, int swappiness);
+
#ifdef CONFIG_MMU
/* linux/mm/shmem.c */
extern int shmem_unuse(swp_entry_t entry, struct page *page);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 640486b..0900931 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1493,8 +1493,8 @@ static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
* percent[0] specifies how much pressure to put on ram/swap backed
* memory, while percent[1] determines pressure on the file LRUs.
*/
-static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
- unsigned long *percent)
+static void __get_scan_ratio(struct zone *zone, struct scan_control *sc,
+ int need_update, unsigned long *percent)
{
unsigned long anon, file, free;
unsigned long anon_prio, file_prio;
@@ -1535,18 +1535,19 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
*
* anon in [0], file in [1]
*/
- if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
- spin_lock_irq(&zone->lru_lock);
- reclaim_stat->recent_scanned[0] /= 2;
- reclaim_stat->recent_rotated[0] /= 2;
- spin_unlock_irq(&zone->lru_lock);
- }
-
- if (unlikely(reclaim_stat->recent_scanned[1] > file / 4)) {
- spin_lock_irq(&zone->lru_lock);
- reclaim_stat->recent_scanned[1] /= 2;
- reclaim_stat->recent_rotated[1] /= 2;
- spin_unlock_irq(&zone->lru_lock);
+ if (need_update) {
+ if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
+ spin_lock_irq(&zone->lru_lock);
+ reclaim_stat->recent_scanned[0] /= 2;
+ reclaim_stat->recent_rotated[0] /= 2;
+ spin_unlock_irq(&zone->lru_lock);
+ }
+ if (unlikely(reclaim_stat->recent_scanned[1] > file / 4)) {
+ spin_lock_irq(&zone->lru_lock);
+ reclaim_stat->recent_scanned[1] /= 2;
+ reclaim_stat->recent_rotated[1] /= 2;
+ spin_unlock_irq(&zone->lru_lock);
+ }
}

/*
@@ -1572,6 +1573,27 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
percent[1] = 100 - percent[0];
}

+static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
+ unsigned long *percent)
+{
+ __get_scan_ratio(zone, sc, 1, percent);
+}
+
+unsigned long get_anon_scan_ratio(struct zone *zone, struct mem_cgroup *memcg, int swappiness)
+{
+ unsigned long percent[2];
+ struct scan_control sc = {
+ .may_swap = 1,
+ .swappiness = swappiness,
+ .mem_cgroup = memcg,
+ };
+
+ __get_scan_ratio(zone, &sc, 0, percent);
+
+ return percent[0];
+}
+
+
/*
* Smallish @nr_to_scan's are deposited in @nr_saved_scan,
* until we collected @swap_cluster_max pages to scan.
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 6051fba..f690117 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -15,6 +15,7 @@
#include <linux/cpu.h>
#include <linux/vmstat.h>
#include <linux/sched.h>
+#include <linux/swap.h>

#ifdef CONFIG_VM_EVENT_COUNTERS
DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
@@ -760,11 +761,13 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
"\n all_unreclaimable: %u"
"\n prev_priority: %i"
"\n start_pfn: %lu"
- "\n inactive_ratio: %u",
+ "\n inactive_ratio: %u"
+ "\n anon_scan_ratio: %lu",
zone_is_all_unreclaimable(zone),
zone->prev_priority,
zone->zone_start_pfn,
- zone->inactive_ratio);
+ zone->inactive_ratio,
+ get_anon_scan_ratio(zone, NULL, vm_swappiness));
seq_putc(m, '\n');
}

--
1.6.5.2


2010-01-13 08:22:42

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH 3/3] [v2] memcg: add anon_scan_ratio to memory.stat file

Changelog
since v1: cancel to remove "recent_xxx" debug statistics as bilbir's
mention

===========================================

anon_scan_ratio feature doesn't only useful for global VM pressure
analysis, but it also useful for memcg memroy pressure analysis.

Then, this patch add anon_scan_ratio field to memory.stat file too.

Cc: Balbir Singh <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/memcontrol.c | 65 +++++++++++++++++++++++++++++++++++-------------------
1 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 325df12..7348edc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2950,6 +2950,11 @@ static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
{
struct mem_cgroup *mem_cont = mem_cgroup_from_cont(cont);
struct mcs_total_stat mystat;
+ struct zone *zone;
+ unsigned long total_anon = 0;
+ unsigned long total_scan_anon = 0;
+ unsigned long recent_rotated[2] = {0};
+ unsigned long recent_scanned[2] = {0};
int i;

memset(&mystat, 0, sizeof(mystat));
@@ -2978,33 +2983,47 @@ static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
cb->fill(cb, memcg_stat_strings[i].total_name, mystat.stat[i]);
}

-#ifdef CONFIG_DEBUG_VM
cb->fill(cb, "inactive_ratio", calc_inactive_ratio(mem_cont, NULL));

- {
- int nid, zid;
+ for_each_populated_zone(zone) {
+ int nid = zone->zone_pgdat->node_id;
+ int zid = zone_idx(zone);
struct mem_cgroup_per_zone *mz;
- unsigned long recent_rotated[2] = {0, 0};
- unsigned long recent_scanned[2] = {0, 0};
-
- for_each_online_node(nid)
- for (zid = 0; zid < MAX_NR_ZONES; zid++) {
- mz = mem_cgroup_zoneinfo(mem_cont, nid, zid);
-
- recent_rotated[0] +=
- mz->reclaim_stat.recent_rotated[0];
- recent_rotated[1] +=
- mz->reclaim_stat.recent_rotated[1];
- recent_scanned[0] +=
- mz->reclaim_stat.recent_scanned[0];
- recent_scanned[1] +=
- mz->reclaim_stat.recent_scanned[1];
- }
- cb->fill(cb, "recent_rotated_anon", recent_rotated[0]);
- cb->fill(cb, "recent_rotated_file", recent_rotated[1]);
- cb->fill(cb, "recent_scanned_anon", recent_scanned[0]);
- cb->fill(cb, "recent_scanned_file", recent_scanned[1]);
+ unsigned long anon;
+ unsigned long ratio;
+
+ mz = mem_cgroup_zoneinfo(mem_cont, nid, zid);
+
+ anon = MEM_CGROUP_ZSTAT(mz, LRU_INACTIVE_ANON);
+ anon += MEM_CGROUP_ZSTAT(mz, LRU_ACTIVE_ANON);
+
+ ratio = get_anon_scan_ratio(zone, mem_cont, mem_cont->swappiness);
+
+ /*
+ * We have per-zone anon-scan-ratio. but we don't hope display such
+ * value directly. Instead, we display following fomula.
+ *
+ * sum(anon * ratio/100)
+ * --------------------- * 100
+ * sum(anon)
+ */
+ total_anon += anon;
+ total_scan_anon += anon * ratio;
+
+#ifdef CONFIG_DEBUG_VM
+ recent_rotated[0] += mz->reclaim_stat.recent_rotated[0];
+ recent_rotated[1] += mz->reclaim_stat.recent_rotated[1];
+ recent_scanned[0] += mz->reclaim_stat.recent_scanned[0];
+ recent_scanned[1] += mz->reclaim_stat.recent_scanned[1];
+#endif
}
+ cb->fill(cb, "anon_scan_ratio", total_scan_anon / total_anon);
+
+#ifdef CONFIG_DEBUG_VM
+ cb->fill(cb, "recent_rotated_anon", recent_rotated[0]);
+ cb->fill(cb, "recent_rotated_file", recent_rotated[1]);
+ cb->fill(cb, "recent_scanned_anon", recent_scanned[0]);
+ cb->fill(cb, "recent_scanned_file", recent_scanned[1]);
#endif

return 0;
--
1.6.5.2


2010-01-13 08:27:25

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH 3/3] [v2] memcg: add anon_scan_ratio to memory.stat file

On Wed, 13 Jan 2010 17:22:37 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> Changelog
> since v1: cancel to remove "recent_xxx" debug statistics as bilbir's
> mention
>
> ===========================================
>
> anon_scan_ratio feature doesn't only useful for global VM pressure
> analysis, but it also useful for memcg memroy pressure analysis.
>
> Then, this patch add anon_scan_ratio field to memory.stat file too.
>
> Cc: Balbir Singh <[email protected]>
> Cc: KAMEZAWA Hiroyuki <[email protected]>
> Signed-off-by: KOSAKI Motohiro <[email protected]>

Acked-by: KAMEZAWA Hiroyuki <[email protected]>

Thanks,
-Kame

2010-01-13 09:08:59

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH 3/3] [v2] memcg: add anon_scan_ratio to memory.stat file

On Wed, Jan 13, 2010 at 1:52 PM, KOSAKI Motohiro
<[email protected]> wrote:
> Changelog
> ?since v1: cancel to remove "recent_xxx" debug statistics as bilbir's
> ?mention
>
> ===========================================
>
> anon_scan_ratio feature doesn't only useful for global VM pressure
> analysis, but it also useful for memcg memroy pressure analysis.
>
> Then, this patch add anon_scan_ratio field to memory.stat file too.
>
> Cc: Balbir Singh <[email protected]>
> Cc: KAMEZAWA Hiroyuki <[email protected]>
> Signed-off-by: KOSAKI Motohiro <[email protected]>

[snip]

Looks good to me

Acked-by: Balbir Singh <[email protected]>

2010-01-13 09:54:22

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 1/3] vmscan: get_scan_ratio cleanup

On Wed, Jan 13, 2010 at 5:19 PM, KOSAKI Motohiro
<[email protected]> wrote:
> The get_scan_ratio() should have all scan-ratio related calculations.
> Thus, this patch move some calculation into get_scan_ratio.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> Reviewed-by: Rik van Riel <[email protected]>
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>


--
Kind regards,
Minchan Kim

2010-01-13 10:31:56

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 2/3][v2] vmstat: add anon_scan_ratio field to zoneinfo

Hi, Kosaki.

On Wed, Jan 13, 2010 at 5:21 PM, KOSAKI Motohiro
<[email protected]> wrote:
> Changelog
>  from v1
>  - get_anon_scan_ratio don't tak zone->lru_lock anymore
>   because zoneinfo_show_print takes zone->lock.

When I saw this changelog first, I got confused.
That's because there is no relation between lru_lock and lock in zone.
You mean zoneinfo is allowed to have a stale data?
Tend to agree with it.

>
>
> ======================================
> Vmscan folks was asked "why does my system makes so much swap-out?"
> in lkml at several times.
> At that time, I made the debug patch to show recent_anon_{scanned/rorated}
> parameter at least three times.
>
> Thus, its parameter should be showed on /proc/zoneinfo. It help
> vmscan folks debugging.

I support this suggestion.

>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> Reviewed-by: Rik van Riel <[email protected]>
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> ---
>  include/linux/swap.h |    2 ++
>  mm/vmscan.c          |   50 ++++++++++++++++++++++++++++++++++++--------------
>  mm/vmstat.c          |    7 +++++--
>  3 files changed, 43 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index a2602a8..e95d7ed 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -280,6 +280,8 @@ extern void scan_unevictable_unregister_node(struct node *node);
>  extern int kswapd_run(int nid);
>  extern void kswapd_stop(int nid);
>
> +unsigned long get_anon_scan_ratio(struct zone *zone, struct mem_cgroup *memcg, int swappiness);

Today Andrew said to me. :)
"The vmscan.c code makes an effort to look nice in an 80-col display."

> +
>  #ifdef CONFIG_MMU
>  /* linux/mm/shmem.c */
>  extern int shmem_unuse(swp_entry_t entry, struct page *page);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 640486b..0900931 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1493,8 +1493,8 @@ static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
>  * percent[0] specifies how much pressure to put on ram/swap backed
>  * memory, while percent[1] determines pressure on the file LRUs.
>  */
> -static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
> -                                       unsigned long *percent)
> +static void __get_scan_ratio(struct zone *zone, struct scan_control *sc,
> +                            int need_update, unsigned long *percent)
>  {
>        unsigned long anon, file, free;
>        unsigned long anon_prio, file_prio;
> @@ -1535,18 +1535,19 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
>         *
>         * anon in [0], file in [1]
>         */
> -       if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
> -               spin_lock_irq(&zone->lru_lock);
> -               reclaim_stat->recent_scanned[0] /= 2;
> -               reclaim_stat->recent_rotated[0] /= 2;
> -               spin_unlock_irq(&zone->lru_lock);
> -       }
> -
> -       if (unlikely(reclaim_stat->recent_scanned[1] > file / 4)) {
> -               spin_lock_irq(&zone->lru_lock);
> -               reclaim_stat->recent_scanned[1] /= 2;
> -               reclaim_stat->recent_rotated[1] /= 2;
> -               spin_unlock_irq(&zone->lru_lock);

Why do you add new parameter 'need_update'?
Do you see any lru_lock heavy contention? (reclaim VS cat /proc/zoneinfo)
I think maybe not.
I am not sure no locking version is needed.

> +       if (need_update) {
> +               if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
> +                       spin_lock_irq(&zone->lru_lock);
> +                       reclaim_stat->recent_scanned[0] /= 2;
> +                       reclaim_stat->recent_rotated[0] /= 2;
> +                       spin_unlock_irq(&zone->lru_lock);
> +               }
> +               if (unlikely(reclaim_stat->recent_scanned[1] > file / 4)) {
> +                       spin_lock_irq(&zone->lru_lock);
> +                       reclaim_stat->recent_scanned[1] /= 2;
> +                       reclaim_stat->recent_rotated[1] /= 2;
> +                       spin_unlock_irq(&zone->lru_lock);
> +               }
>        }
>
>        /*
> @@ -1572,6 +1573,27 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
>        percent[1] = 100 - percent[0];
>  }
>
> +static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
> +                          unsigned long *percent)
> +{
> +       __get_scan_ratio(zone, sc, 1, percent);
> +}
> +

If we really need this version and your changelog is right,
Let's add 'note'. ;-)

/* Caller have to hold zone->lock */
> +unsigned long get_anon_scan_ratio(struct zone *zone, struct mem_cgroup *memcg, int swappiness)
> +{
> +       unsigned long percent[2];
> +       struct scan_control sc = {
> +               .may_swap = 1,
> +               .swappiness = swappiness,
> +               .mem_cgroup = memcg,
> +       };
> +
> +       __get_scan_ratio(zone, &sc, 0, percent);
> +
> +       return percent[0];
> +}
> +
> +
>  /*
>  * Smallish @nr_to_scan's are deposited in @nr_saved_scan,
>  * until we collected @swap_cluster_max pages to scan.
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 6051fba..f690117 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -15,6 +15,7 @@
>  #include <linux/cpu.h>
>  #include <linux/vmstat.h>
>  #include <linux/sched.h>
> +#include <linux/swap.h>
>
>  #ifdef CONFIG_VM_EVENT_COUNTERS
>  DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
> @@ -760,11 +761,13 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>                   "\n  all_unreclaimable: %u"
>                   "\n  prev_priority:     %i"
>                   "\n  start_pfn:         %lu"
> -                  "\n  inactive_ratio:    %u",
> +                  "\n  inactive_ratio:    %u"
> +                  "\n  anon_scan_ratio:   %lu",
>                           zone_is_all_unreclaimable(zone),
>                   zone->prev_priority,
>                   zone->zone_start_pfn,
> -                  zone->inactive_ratio);
> +                  zone->inactive_ratio,
> +                  get_anon_scan_ratio(zone, NULL, vm_swappiness));
>        seq_putc(m, '\n');
>  }
>
> --
> 1.6.5.2
>
>
>
>



--
Kind regards,
Minchan Kim

2010-01-13 23:50:14

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 2/3][v2] vmstat: add anon_scan_ratio field to zoneinfo

Hi

> Hi, Kosaki.
>
> On Wed, Jan 13, 2010 at 5:21 PM, KOSAKI Motohiro
> <[email protected]> wrote:
> > Changelog
> >  from v1
> >  - get_anon_scan_ratio don't tak zone->lru_lock anymore
> >   because zoneinfo_show_print takes zone->lock.
>
> When I saw this changelog first, I got confused.
> That's because there is no relation between lru_lock and lock in zone.
> You mean zoneinfo is allowed to have a stale data?
> Tend to agree with it.

Well. zone->lock and zone->lru_lock should be not taked at the same time.
[1/4] of my last version removed zone->lock, then get_anon_scan_ratioo()
can take zone->lru_lock. but I dropped it. thus get_anon_scan_ration() can't
take zone->lru_lock.

Thus, I added need_update parameter.

2010-01-14 05:12:32

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 2/3][v2] vmstat: add anon_scan_ratio field to zoneinfo

On Thu, Jan 14, 2010 at 8:50 AM, KOSAKI Motohiro
<[email protected]> wrote:
> Hi
>
>> Hi, Kosaki.
>>
>> On Wed, Jan 13, 2010 at 5:21 PM, KOSAKI Motohiro
>> <[email protected]> wrote:
>> > Changelog
>> >  from v1
>> >  - get_anon_scan_ratio don't tak zone->lru_lock anymore
>> >   because zoneinfo_show_print takes zone->lock.
>>
>> When I saw this changelog first, I got confused.
>> That's because there is no relation between lru_lock and lock in zone.
>> You mean zoneinfo is allowed to have a stale data?
>> Tend to agree with it.
>
> Well. zone->lock and zone->lru_lock should be not taked at the same time.

I looked over the code since I am out of office.
I can't find any locking problem zone->lock and zone->lru_lock.
Do you know any locking order problem?
Could you explain it with call graph if you don't mind?

I am out of office by tomorrow so I can't reply quickly.
Sorry for late reponse.


--
Kind regards,
Minchan Kim

2010-01-14 05:18:56

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 2/3][v2] vmstat: add anon_scan_ratio field to zoneinfo

> > Well. zone->lock and zone->lru_lock should be not taked at the same time.
>
> I looked over the code since I am out of office.
> I can't find any locking problem zone->lock and zone->lru_lock.
> Do you know any locking order problem?
> Could you explain it with call graph if you don't mind?
>
> I am out of office by tomorrow so I can't reply quickly.
> Sorry for late reponse.

This is not lock order issue. both zone->lock and zone->lru_lock are
hotpath lock. then, same tame grabbing might cause performance impact.

thanks.


2010-01-18 01:04:24

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 2/3][v2] vmstat: add anon_scan_ratio field to zoneinfo

> Hi, KOSAKI.
>
> On Thu, Jan 14, 2010 at 2:18 PM, KOSAKI Motohiro
> <[email protected]> wrote:
> >> > Well. zone->lock and zone->lru_lock should be not taked at the same time.
> >>
> >> I looked over the code since I am out of office.
> >> I can't find any locking problem zone->lock and zone->lru_lock.
> >> Do you know any locking order problem?
> >> Could you explain it with call graph if you don't mind?
> >>
> >> I am out of office by tomorrow so I can't reply quickly.
> >> Sorry for late reponse.
> >
> > This is not lock order issue. both zone->lock and zone->lru_lock are
> > hotpath lock. then, same tame grabbing might cause performance impact.
>
> Sorry for late response.
>
> Your patch makes get_anon_scan_ratio of zoneinfo stale.
> What you said about performance impact is effective when VM pressure high.
> I think stale data is all right normally.
> But when VM pressure is high and we want to see the information in zoneinfo(
> this case is what you said), stale data is not a good, I think.
>
> If it's not a strong argue, I want to use old get_scan_ratio
> in get_anon_scan_ratio.

please looks such function again.

usally we use recent_rotated/recent_scanned ratio. then following
decreasing doesn't change any scan-ratio meaning. it only prevent
stat overflow.

if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
spin_lock_irq(&zone->lru_lock);
reclaim_stat->recent_scanned[0] /= 2;
reclaim_stat->recent_rotated[0] /= 2;
spin_unlock_irq(&zone->lru_lock);
}


So, I don't think current implementation can show stale data.

Thanks.




2010-01-18 01:54:25

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 2/3][v2] vmstat: add anon_scan_ratio field to zoneinfo

> Hi, KOSAKI.
>
> On Mon, Jan 18, 2010 at 10:04 AM, KOSAKI Motohiro
> <[email protected]> wrote:
> >> Hi, KOSAKI.
> >>
> >> On Thu, Jan 14, 2010 at 2:18 PM, KOSAKI Motohiro
> >> <[email protected]> wrote:
> >> >> > Well. zone->lock and zone->lru_lock should be not taked at the same time.
> >> >>
> >> >> I looked over the code since I am out of office.
> >> >> I can't find any locking problem zone->lock and zone->lru_lock.
> >> >> Do you know any locking order problem?
> >> >> Could you explain it with call graph if you don't mind?
> >> >>
> >> >> I am out of office by tomorrow so I can't reply quickly.
> >> >> Sorry for late reponse.
> >> >
> >> > This is not lock order issue. both zone->lock and zone->lru_lock are
> >> > hotpath lock. then, same tame grabbing might cause performance impact.
> >>
> >> Sorry for late response.
> >>
> >> Your patch makes get_anon_scan_ratio of zoneinfo stale.
> >> What you said about performance impact is effective when VM pressure high.
> >> I think stale data is all right normally.
> >> But when VM pressure is high and we want to see the information in zoneinfo(
> >> this case is what you said), stale data is not a good, I think.
> >>
> >> If it's not a strong argue, I want to use old get_scan_ratio
> >> in get_anon_scan_ratio.
> >
> > please looks such function again.
> >
> > usally we use recent_rotated/recent_scanned ratio. then following
> > decreasing doesn't change any scan-ratio meaning. it only prevent
> > stat overflow.
>
> It has a primary role that floating average as well as prevenitng overflow. :)
> So, It's important.
>
> >
> >        if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
> >                spin_lock_irq(&zone->lru_lock);
> >                reclaim_stat->recent_scanned[0] /= 2;
> >                reclaim_stat->recent_rotated[0] /= 2;
> >                spin_unlock_irq(&zone->lru_lock);
> >        }
> >
> >
> > So, I don't think current implementation can show stale data.
>
> It can make stale data when high memory pressure happens.

?? why? and when?
I think it depend on what's stale mean.

Currently(i.e. before the patch), get_scan_ratio have following fomula.
in such region, recent_scanned is not protected by zone->lru_lock.

ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
ap /= reclaim_stat->recent_rotated[0] + 1;
fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
fp /= reclaim_stat->recent_rotated[1] + 1;
percent[0] = 100 * ap / (ap + fp + 1);
percent[1] = 100 - percent[0];

It mean, shrink_zone() doesn't use exactly recent_scanned value. then
zoneinfo can use the same unexactly value.


> Moreever, I don't want to make complicate thing(ie, need_update)
> than old if it doesn't have some benefit.(I think lru_lock isn't big overhead)

Hmm..
I think lru_lock can makes big overhead.


2010-01-18 01:55:32

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 2/3][v2] vmstat: add anon_scan_ratio field to zoneinfo

Hi, KOSAKI.

On Mon, Jan 18, 2010 at 10:04 AM, KOSAKI Motohiro
<[email protected]> wrote:
>> Hi, KOSAKI.
>>
>> On Thu, Jan 14, 2010 at 2:18 PM, KOSAKI Motohiro
>> <[email protected]> wrote:
>> >> > Well. zone->lock and zone->lru_lock should be not taked at the same time.
>> >>
>> >> I looked over the code since I am out of office.
>> >> I can't find any locking problem zone->lock and zone->lru_lock.
>> >> Do you know any locking order problem?
>> >> Could you explain it with call graph if you don't mind?
>> >>
>> >> I am out of office by tomorrow so I can't reply quickly.
>> >> Sorry for late reponse.
>> >
>> > This is not lock order issue. both zone->lock and zone->lru_lock are
>> > hotpath lock. then, same tame grabbing might cause performance impact.
>>
>> Sorry for late response.
>>
>> Your patch makes get_anon_scan_ratio of zoneinfo stale.
>> What you said about performance impact is effective when VM pressure high.
>> I think stale data is all right normally.
>> But when VM pressure is high and we want to see the information in zoneinfo(
>> this case is what you said), stale data is not a good, I think.
>>
>> If it's not a strong argue, I want to use old get_scan_ratio
>> in get_anon_scan_ratio.
>
> please looks such function again.
>
> usally we use recent_rotated/recent_scanned ratio. then following
> decreasing doesn't change any scan-ratio meaning. it only prevent
> stat overflow.

It has a primary role that floating average as well as prevenitng overflow. :)
So, It's important.

>
>        if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
>                spin_lock_irq(&zone->lru_lock);
>                reclaim_stat->recent_scanned[0] /= 2;
>                reclaim_stat->recent_rotated[0] /= 2;
>                spin_unlock_irq(&zone->lru_lock);
>        }
>
>
> So, I don't think current implementation can show stale data.

It can make stale data when high memory pressure happens.

>
> Thanks.
>

Moreever, I don't want to make complicate thing(ie, need_update)
than old if it doesn't have some benefit.(I think lru_lock isn't big overhead)

--
Kind regards,
Minchan Kim

2010-01-18 02:10:59

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 2/3][v2] vmstat: add anon_scan_ratio field to zoneinfo

I missed Cc.

On Mon, Jan 18, 2010 at 10:54 AM, KOSAKI Motohiro
<[email protected]> wrote:
>> Hi, KOSAKI.
>>
>> On Mon, Jan 18, 2010 at 10:04 AM, KOSAKI Motohiro
>> <[email protected]> wrote:
>> >> Hi, KOSAKI.
>> >>
>> >> On Thu, Jan 14, 2010 at 2:18 PM, KOSAKI Motohiro
>> >> <[email protected]> wrote:
>> >> >> > Well. zone->lock and zone->lru_lock should be not taked at the same time.
>> >> >>
>> >> >> I looked over the code since I am out of office.
>> >> >> I can't find any locking problem zone->lock and zone->lru_lock.
>> >> >> Do you know any locking order problem?
>> >> >> Could you explain it with call graph if you don't mind?
>> >> >>
>> >> >> I am out of office by tomorrow so I can't reply quickly.
>> >> >> Sorry for late reponse.
>> >> >
>> >> > This is not lock order issue. both zone->lock and zone->lru_lock are
>> >> > hotpath lock. then, same tame grabbing might cause performance impact.
>> >>
>> >> Sorry for late response.
>> >>
>> >> Your patch makes get_anon_scan_ratio of zoneinfo stale.
>> >> What you said about performance impact is effective when VM pressure high.
>> >> I think stale data is all right normally.
>> >> But when VM pressure is high and we want to see the information in zoneinfo(
>> >> this case is what you said), stale data is not a good, I think.
>> >>
>> >> If it's not a strong argue, I want to use old get_scan_ratio
>> >> in get_anon_scan_ratio.
>> >
>> > please looks such function again.
>> >
>> > usally we use recent_rotated/recent_scanned ratio. then following
>> > decreasing doesn't change any scan-ratio meaning. it only prevent
>> > stat overflow.
>>
>> It has a primary role that floating average as well as prevenitng overflow. :)
>> So, It's important.
>>
>> >
>> >        if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
>> >                spin_lock_irq(&zone->lru_lock);
>> >                reclaim_stat->recent_scanned[0] /= 2;
>> >                reclaim_stat->recent_rotated[0] /= 2;
>> >                spin_unlock_irq(&zone->lru_lock);
>> >        }
>> >
>> >
>> > So, I don't think current implementation can show stale data.
>>
>> It can make stale data when high memory pressure happens.
>
> ?? why? and when?
> I think it depend on what's stale mean.
>
> Currently(i.e. before the patch), get_scan_ratio have following fomula.
> in such region, recent_scanned is not protected by zone->lru_lock.
>
>        ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
>        ap /= reclaim_stat->recent_rotated[0] + 1;
>        fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
>        fp /= reclaim_stat->recent_rotated[1] + 1;
>        percent[0] = 100 * ap / (ap + fp + 1);
>        percent[1] = 100 - percent[0];
>
> It mean, shrink_zone() doesn't use exactly recent_scanned value. then
> zoneinfo can use the same unexactly value.

Absoultely right. I missed that. Thanks.
get_scan_ratio used lru_lock to get reclaim_stat->recent_xxxx.
But, it doesn't used lru_lock to get ap/fp.

Is it intentional? I think you or Rik know it. :)
I think if we want to get exact value, we have to use lru_lock until
getting ap/fp.
If it isn't, we don't need lru_lock when we get the reclaim_stat->recent_xxxx.

What do you think about it?

>
>
>> Moreever, I don't want to make complicate thing(ie, need_update)
>> than old if it doesn't have some benefit.(I think lru_lock isn't big overhead)
>
> Hmm..
> I think lru_lock can makes big overhead.

I don't want to argue strongly about this.
That's because i don't have seen that.
If you have a conern about lru_lock, I don't opposed your patch.

--
Kind regards,
Minchan Kim

2010-01-18 02:14:57

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 2/3][v2] vmstat: add anon_scan_ratio field to zoneinfo

> >> It can make stale data when high memory pressure happens.
> >
> > ?? why? and when?
> > I think it depend on what's stale mean.
> >
> > Currently(i.e. before the patch), get_scan_ratio have following fomula.
> > in such region, recent_scanned is not protected by zone->lru_lock.
> >
> >        ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
> >        ap /= reclaim_stat->recent_rotated[0] + 1;
> >        fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
> >        fp /= reclaim_stat->recent_rotated[1] + 1;
> >        percent[0] = 100 * ap / (ap + fp + 1);
> >        percent[1] = 100 - percent[0];
> >
> > It mean, shrink_zone() doesn't use exactly recent_scanned value. then
> > zoneinfo can use the same unexactly value.
>
> Absoultely right. I missed that. Thanks.
> get_scan_ratio used lru_lock to get reclaim_stat->recent_xxxx.
> But, it doesn't used lru_lock to get ap/fp.
>
> Is it intentional? I think you or Rik know it. :)
> I think if we want to get exact value, we have to use lru_lock until
> getting ap/fp.
> If it isn't, we don't need lru_lock when we get the reclaim_stat->recent_xxxx.
>
> What do you think about it?

I believe the current code is intentional.


2010-01-18 02:16:28

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH 2/3][v2] vmstat: add anon_scan_ratio field to zoneinfo

On 01/17/2010 09:10 PM, Minchan Kim wrote:

> Absoultely right. I missed that. Thanks.
> get_scan_ratio used lru_lock to get reclaim_stat->recent_xxxx.
> But, it doesn't used lru_lock to get ap/fp.
>
> Is it intentional? I think you or Rik know it. :)
> I think if we want to get exact value, we have to use lru_lock until
> getting ap/fp.
> If it isn't, we don't need lru_lock when we get the reclaim_stat->recent_xxxx.
>
> What do you think about it?

This is definately not intentional.

Getting race conditions in this code could throw off the
statistics by a factor 2. I do not know how serious that
would be for the VM or whether (and how quickly) it would
self correct.

--
All rights reversed.

2010-01-18 02:25:01

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 2/3][v2] vmstat: add anon_scan_ratio field to zoneinfo

> On 01/17/2010 09:10 PM, Minchan Kim wrote:
>
> > Absoultely right. I missed that. Thanks.
> > get_scan_ratio used lru_lock to get reclaim_stat->recent_xxxx.
> > But, it doesn't used lru_lock to get ap/fp.
> >
> > Is it intentional? I think you or Rik know it. :)
> > I think if we want to get exact value, we have to use lru_lock until
> > getting ap/fp.
> > If it isn't, we don't need lru_lock when we get the reclaim_stat->recent_xxxx.
> >
> > What do you think about it?
>
> This is definately not intentional.

Really?
So, I'll post next patch.

Thanks.



> Getting race conditions in this code could throw off the
> statistics by a factor 2. I do not know how serious that
> would be for the VM or whether (and how quickly) it would
> self correct.
>
> --
> All rights reversed.


2010-01-18 02:25:16

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 2/3][v2] vmstat: add anon_scan_ratio field to zoneinfo

On Mon, Jan 18, 2010 at 11:16 AM, Rik van Riel <[email protected]> wrote:
> On 01/17/2010 09:10 PM, Minchan Kim wrote:
>
>> Absoultely right. I missed that. Thanks.
>> get_scan_ratio used lru_lock to get reclaim_stat->recent_xxxx.
>> But, it doesn't used lru_lock to get ap/fp.
>>
>> Is it intentional? I think you or Rik know it. :)
>> I think if we want to get exact value, we have to use lru_lock until
>> getting ap/fp.
>> If it isn't, we don't need lru_lock when we get the
>> reclaim_stat->recent_xxxx.
>>
>> What do you think about it?
>
> This is definately not intentional.
>
> Getting race conditions in this code could throw off the
> statistics by a factor 2.  I do not know how serious that
> would be for the VM or whether (and how quickly) it would
> self correct.

Okay. How about making patch to get exact ap/fp?
Although it were not serious or fast recoverable, I think it would be better
to protect lru_lock for consistency if lru_lock isn't big contention lock.

>
> --
> All rights reversed.
>



--
Kind regards,
Minchan Kim

2010-01-15 17:23:18

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 2/3][v2] vmstat: add anon_scan_ratio field to zoneinfo

Hi, KOSAKI.

On Thu, Jan 14, 2010 at 2:18 PM, KOSAKI Motohiro
<[email protected]> wrote:
>> > Well. zone->lock and zone->lru_lock should be not taked at the same time.
>>
>> I looked over the code since I am out of office.
>> I can't find any locking problem zone->lock and zone->lru_lock.
>> Do you know any locking order problem?
>> Could you explain it with call graph if you don't mind?
>>
>> I am out of office by tomorrow so I can't reply quickly.
>> Sorry for late reponse.
>
> This is not lock order issue. both zone->lock and zone->lru_lock are
> hotpath lock. then, same tame grabbing might cause performance impact.

Sorry for late response.

Your patch makes get_anon_scan_ratio of zoneinfo stale.
What you said about performance impact is effective when VM pressure high.
I think stale data is all right normally.
But when VM pressure is high and we want to see the information in zoneinfo(
this case is what you said), stale data is not a good, I think.

If it's not a strong argue, I want to use old get_scan_ratio
in get_anon_scan_ratio.

--
Kind regards,
Minchan Kim