2011-04-11 07:59:39

by KOSAKI Motohiro

[permalink] [raw]
Subject: [patch 0/3 resend] mm, mem-hotplug: proper maintainance zone attribute when memory hotplug occur

Hi,

This is resending very old patch. The code has no change.


KOSAKI Motohiro (3):
mm, mem-hotplug: fix section mismatch.
setup_per_zone_inactive_ratio() should be __meminit.
mm, mem-hotplug: recalculate lowmem_reserve when memory hotplug occur
mm, mem-hotplug: update pcp->stat_threshold when memory hotplug occur

include/linux/mm.h | 2 +-
include/linux/vmstat.h | 1 +
mm/memory_hotplug.c | 13 +++++++------
mm/page_alloc.c | 8 +++++---
mm/vmstat.c | 3 +--
5 files changed, 15 insertions(+), 12 deletions(-)

--
1.7.3.1



2011-04-11 08:00:15

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH 1/3] mm, mem-hotplug: fix section mismatch. setup_per_zone_inactive_ratio() should be __meminit.

Commit bce7394a3e (page-allocator: reset wmark_min and inactive ratio of
zone when hotplug happens) introduced invalid section references.
Now, setup_per_zone_inactive_ratio() is marked __init and then it can't
be referenced from memory hotplug code.

Then, this patch marks it as __meminit and also marks caller as __ref.

Signed-off-by: KOSAKI Motohiro <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Yasunori Goto <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Johannes Weiner <[email protected]>
---
mm/memory_hotplug.c | 4 ++--
mm/page_alloc.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f0651ae..0419c3d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -400,7 +400,7 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
}


-int online_pages(unsigned long pfn, unsigned long nr_pages)
+int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
{
unsigned long onlined_pages = 0;
struct zone *zone;
@@ -795,7 +795,7 @@ check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
return offlined;
}

-static int offline_pages(unsigned long start_pfn,
+static int __ref offline_pages(unsigned long start_pfn,
unsigned long end_pfn, unsigned long timeout)
{
unsigned long pfn, nr_pages, expire;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e400779..44fae85 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5072,7 +5072,7 @@ void setup_per_zone_wmarks(void)
* 1TB 101 10GB
* 10TB 320 32GB
*/
-void calculate_zone_inactive_ratio(struct zone *zone)
+void __meminit calculate_zone_inactive_ratio(struct zone *zone)
{
unsigned int gb, ratio;

@@ -5086,7 +5086,7 @@ void calculate_zone_inactive_ratio(struct zone *zone)
zone->inactive_ratio = ratio;
}

-static void __init setup_per_zone_inactive_ratio(void)
+static void __meminit setup_per_zone_inactive_ratio(void)
{
struct zone *zone;

--
1.7.3.1


2011-04-11 08:00:41

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH 2/3] mm, mem-hotplug: recalculate lowmem_reserve when memory hotplug occur

Currently, memory hotplug call setup_per_zone_wmarks() and
calculate_zone_inactive_ratio(), but don't call setup_per_zone_lowmem_reserve().

It mean number of reserved pages aren't updated even if memory hot plug
occur. This patch fixes it.

Signed-off-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Mel Gorman <[email protected]>
---
include/linux/mm.h | 2 +-
mm/memory_hotplug.c | 9 +++++----
mm/page_alloc.c | 4 ++--
3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 628b31c..84f1e31 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1357,7 +1357,7 @@ extern void set_dma_reserve(unsigned long new_dma_reserve);
extern void memmap_init_zone(unsigned long, int, unsigned long,
unsigned long, enum memmap_context);
extern void setup_per_zone_wmarks(void);
-extern void calculate_zone_inactive_ratio(struct zone *zone);
+extern int __meminit init_per_zone_wmark_min(void);
extern void mem_init(void);
extern void __init mmap_init(void);
extern void show_mem(unsigned int flags);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0419c3d..ba7019e9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -459,8 +459,9 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
zone_pcp_update(zone);

mutex_unlock(&zonelists_mutex);
- setup_per_zone_wmarks();
- calculate_zone_inactive_ratio(zone);
+
+ init_per_zone_wmark_min();
+
if (onlined_pages) {
kswapd_run(zone_to_nid(zone));
node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
@@ -893,8 +894,8 @@ repeat:
zone->zone_pgdat->node_present_pages -= offlined_pages;
totalram_pages -= offlined_pages;

- setup_per_zone_wmarks();
- calculate_zone_inactive_ratio(zone);
+ init_per_zone_wmark_min();
+
if (!node_present_pages(node)) {
node_clear_state(node, N_HIGH_MEMORY);
kswapd_stop(node);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 44fae85..ef90a87 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5072,7 +5072,7 @@ void setup_per_zone_wmarks(void)
* 1TB 101 10GB
* 10TB 320 32GB
*/
-void __meminit calculate_zone_inactive_ratio(struct zone *zone)
+static void __meminit calculate_zone_inactive_ratio(struct zone *zone)
{
unsigned int gb, ratio;

@@ -5118,7 +5118,7 @@ static void __meminit setup_per_zone_inactive_ratio(void)
* 8192MB: 11584k
* 16384MB: 16384k
*/
-static int __init init_per_zone_wmark_min(void)
+int __meminit init_per_zone_wmark_min(void)
{
unsigned long lowmem_kbytes;

--
1.7.3.1


2011-04-11 08:01:14

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH 3/3] mm, mem-hotplug: update pcp->stat_threshold when memory hotplug occur

Currently, cpu hotplug updates pcp->stat_threashold, but memory
hotplug doesn't. there is no reason.

Signed-off-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
---
include/linux/vmstat.h | 1 +
mm/page_alloc.c | 2 ++
mm/vmstat.c | 3 +--
3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 04fa5df..5f72954 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -201,6 +201,7 @@ extern void dec_zone_state(struct zone *, enum zone_stat_item);
extern void __dec_zone_state(struct zone *, enum zone_stat_item);

void refresh_cpu_vm_stats(int);
+void refresh_zone_stat_thresholds(void);

int calculate_pressure_threshold(struct zone *zone);
int calculate_normal_threshold(struct zone *zone);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ef90a87..10b5816 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -54,6 +54,7 @@
#include <trace/events/kmem.h>
#include <linux/ftrace_event.h>
#include <linux/memcontrol.h>
+#include <linux/vmstat.h>

#include <asm/tlbflush.h>
#include <asm/div64.h>
@@ -5130,6 +5131,7 @@ int __meminit init_per_zone_wmark_min(void)
if (min_free_kbytes > 65536)
min_free_kbytes = 65536;
setup_per_zone_wmarks();
+ refresh_zone_stat_thresholds();
setup_per_zone_lowmem_reserve();
setup_per_zone_inactive_ratio();
return 0;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 897ea9e..b23b1f5 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -157,7 +157,7 @@ int calculate_normal_threshold(struct zone *zone)
/*
* Refresh the thresholds for each zone.
*/
-static void refresh_zone_stat_thresholds(void)
+void refresh_zone_stat_thresholds(void)
{
struct zone *zone;
int cpu;
@@ -1198,7 +1198,6 @@ static int __init setup_vmstat(void)
#ifdef CONFIG_SMP
int cpu;

- refresh_zone_stat_thresholds();
register_cpu_notifier(&vmstat_notifier);

for_each_online_cpu(cpu)
--
1.7.3.1


2011-04-12 06:18:37

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm, mem-hotplug: fix section mismatch. setup_per_zone_inactive_ratio() should be __meminit.

On Mon, Apr 11, 2011 at 5:00 PM, KOSAKI Motohiro
<[email protected]> wrote:
> Commit bce7394a3e (page-allocator: reset wmark_min and inactive ratio of
> zone when hotplug happens) introduced invalid section references.
> Now, setup_per_zone_inactive_ratio() is marked __init and then it can't
> be referenced from memory hotplug code.
>
> Then, this patch marks it as __meminit and also marks caller as __ref.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>

Just a nitpick.

As below comment of __ref said, It would be better to add _why_ such
as memory_hotplug.c.

"so optimally document why the __ref is needed and why it's OK"

Anyway,
Thanks, KOSAKI.

--
Kind regards,
Minchan Kim

2011-04-12 06:45:18

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm, mem-hotplug: fix section mismatch. setup_per_zone_inactive_ratio() should be __meminit.

On Mon, 11 Apr 2011 17:00:08 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> Commit bce7394a3e (page-allocator: reset wmark_min and inactive ratio of
> zone when hotplug happens) introduced invalid section references.
> Now, setup_per_zone_inactive_ratio() is marked __init and then it can't
> be referenced from memory hotplug code.
>
> Then, this patch marks it as __meminit and also marks caller as __ref.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: Yasunori Goto <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Johannes Weiner <[email protected]>

Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>

2011-04-12 07:04:01

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm, mem-hotplug: recalculate lowmem_reserve when memory hotplug occur

On Mon, Apr 11, 2011 at 5:00 PM, KOSAKI Motohiro
<[email protected]> wrote:
> Currently, memory hotplug call setup_per_zone_wmarks() and
> calculate_zone_inactive_ratio(), but don't call setup_per_zone_lowmem_reserve().
>
> It mean number of reserved pages aren't updated even if memory hot plug
> occur. This patch fixes it.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> Acked-by: Mel Gorman <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>

--
Kind regards,
Minchan Kim

2011-04-12 09:24:11

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 3/3] mm, mem-hotplug: update pcp->stat_threshold when memory hotplug occur

Hi, KOSAKI

On Mon, Apr 11, 2011 at 5:01 PM, KOSAKI Motohiro
<[email protected]> wrote:
> Currently, cpu hotplug updates pcp->stat_threashold, but memory
> hotplug doesn't. there is no reason.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> Acked-by: Mel Gorman <[email protected]>
> Acked-by: Christoph Lameter <[email protected]>

I can think it makes sense so I don't oppose the patch merging.
But as you know I am very keen on the description.

What is the problem if hotplug doesn't do it?
I means the patch solves what's problem?

Please write down fully for better description.
Thanks.

--
Kind regards,
Minchan Kim

2011-04-12 09:29:29

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 3/3] mm, mem-hotplug: update pcp->stat_threshold when memory hotplug occur

Hi

> Hi, KOSAKI
>
> On Mon, Apr 11, 2011 at 5:01 PM, KOSAKI Motohiro
> <[email protected]> wrote:
> > Currently, cpu hotplug updates pcp->stat_threashold, but memory
> > hotplug doesn't. there is no reason.
> >
> > Signed-off-by: KOSAKI Motohiro <[email protected]>
> > Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> > Acked-by: Mel Gorman <[email protected]>
> > Acked-by: Christoph Lameter <[email protected]>
>
> I can think it makes sense so I don't oppose the patch merging.
> But as you know I am very keen on the description.
>
> What is the problem if hotplug doesn't do it?
> I means the patch solves what's problem?
>
> Please write down fully for better description.
> Thanks.

No real world issue. I found the fault by code review.
No good stat_threshold might makes performance hurt.



2011-04-12 10:12:06

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 3/3] mm, mem-hotplug: update pcp->stat_threshold when memory hotplug occur

On Tue, Apr 12, 2011 at 6:29 PM, KOSAKI Motohiro
<[email protected]> wrote:
> Hi
>
>> Hi, KOSAKI
>>
>> On Mon, Apr 11, 2011 at 5:01 PM, KOSAKI Motohiro
>> <[email protected]> wrote:
>> > Currently, cpu hotplug updates pcp->stat_threashold, but memory
>> > hotplug doesn't. there is no reason.
>> >
>> > Signed-off-by: KOSAKI Motohiro <[email protected]>
>> > Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
>> > Acked-by: Mel Gorman <[email protected]>
>> > Acked-by: Christoph Lameter <[email protected]>
>>
>> I can think it makes sense so I don't oppose the patch merging.
>> But as you know I am very keen on the description.
>>
>> What is the problem if hotplug doesn't do it?
>> I means the patch solves what's problem?
>>
>> Please write down fully for better description.
>> Thanks.
>
> No real world issue. I found the fault by code review.

I don't mean we should solve only real world issue.
Just finding out code review is much valuable. :)

> No good stat_threshold might makes performance hurt.

Yes. That's I want it.
My intention is that if you write down log fully, it can help much
newbies to understand the patch in future and it would be very clear
Andrew to merge it.

What I want is following as.

==

Currently, memory hotplug doesn't updates pcp->stat_threashold.
Then, It ends up making the wrong stat_threshold and percpu_driftmark.

It could make confusing zoneinfo or overhead by frequent draining.
Even when memory is low and kswapd is awake, it can mismatch between
the number of real free pages and vmstat NR_FREE_PAGES so that it can
result in the livelock. Please look at aa4548403 for more.

This patch solves the issue.
==


--
Kind regards,
Minchan Kim

2011-04-12 10:28:05

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm, mem-hotplug: fix section mismatch. setup_per_zone_inactive_ratio() should be __meminit.

Hi

> On Mon, Apr 11, 2011 at 5:00 PM, KOSAKI Motohiro
> <[email protected]> wrote:
> > Commit bce7394a3e (page-allocator: reset wmark_min and inactive ratio of
> > zone when hotplug happens) introduced invalid section references.
> > Now, setup_per_zone_inactive_ratio() is marked __init and then it can't
> > be referenced from memory hotplug code.
> >
> > Then, this patch marks it as __meminit and also marks caller as __ref.
> >
> > Signed-off-by: KOSAKI Motohiro <[email protected]>
> Reviewed-by: Minchan Kim <[email protected]>
>
> Just a nitpick.
>
> As below comment of __ref said, It would be better to add _why_ such
> as memory_hotplug.c.
>
> "so optimally document why the __ref is needed and why it's OK"

Hmm...
All of memory_hotplug.c function can call __meminit function. It's
definition of __meminit.

We can put the same comment to every function in memory_hotplug.c.
like hotadd_newpgdat().

/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
{
(snip)
}

But it has zero information. ;)


2011-04-12 10:32:51

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm, mem-hotplug: fix section mismatch. setup_per_zone_inactive_ratio() should be __meminit.

On Tue, Apr 12, 2011 at 7:28 PM, KOSAKI Motohiro
<[email protected]> wrote:
> Hi
>
>> On Mon, Apr 11, 2011 at 5:00 PM, KOSAKI Motohiro
>> <[email protected]> wrote:
>> > Commit bce7394a3e (page-allocator: reset wmark_min and inactive ratio of
>> > zone when hotplug happens) introduced invalid section references.
>> > Now, setup_per_zone_inactive_ratio() is marked __init and then it can't
>> > be referenced from memory hotplug code.
>> >
>> > Then, this patch marks it as __meminit and also marks caller as __ref.
>> >
>> > Signed-off-by: KOSAKI Motohiro <[email protected]>
>> Reviewed-by: Minchan Kim <[email protected]>
>>
>> Just a nitpick.
>>
>> As below comment of __ref said, It would be better to add _why_ such
>> as memory_hotplug.c.
>>
>> "so optimally document why the __ref is needed and why it's OK"
>
> Hmm...
> All of memory_hotplug.c function can call __meminit function. It's
> definition of __meminit.
>
> We can put the same comment to every function in memory_hotplug.c.
> like hotadd_newpgdat().
>
> /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
> static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
> {
> (snip)
> }
>
> But it has zero information. ;)

It does make sense. Never mind. :)



--
Kind regards,
Minchan Kim

2011-04-12 10:34:18

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 3/3] mm, mem-hotplug: update pcp->stat_threshold when memory hotplug occur

> > No good stat_threshold might makes performance hurt.
>
> Yes. That's I want it.
> My intention is that if you write down log fully, it can help much
> newbies to understand the patch in future and it would be very clear
> Andrew to merge it.
>
> What I want is following as.
> ==
>
> Currently, memory hotplug doesn't updates pcp->stat_threashold.
> Then, It ends up making the wrong stat_threshold and percpu_driftmark.
>
> It could make confusing zoneinfo or overhead by frequent draining.
> Even when memory is low and kswapd is awake, it can mismatch between
> the number of real free pages and vmstat NR_FREE_PAGES so that it can
> result in the livelock. Please look at aa4548403 for more.
>
> This patch solves the issue.
> ==

Now, wakeup_kswapd() are using zone_watermark_ok_safe(). (ie avoid to use
per-cpu stat jiffies). Then, I don't think we have livelock chance.
Am I missing something?


2011-04-12 10:42:09

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 3/3] mm, mem-hotplug: update pcp->stat_threshold when memory hotplug occur

On Tue, Apr 12, 2011 at 7:34 PM, KOSAKI Motohiro
<[email protected]> wrote:
>> > No good stat_threshold might makes performance hurt.
>>
>> Yes. That's I want it.
>> My intention is that if you write down log fully, it can help much
>> newbies to understand the patch in future and it would be very clear
>> Andrew to merge it.
>>
>> What I want is following as.
>> ==
>>
>> Currently, memory hotplug doesn't updates pcp->stat_threashold.
>> Then, It ends up making the wrong stat_threshold and percpu_driftmark.
>>
>> It could make confusing zoneinfo or overhead by frequent draining.
>> Even when memory is low and kswapd is awake, it can mismatch between
>> the number of real free pages and vmstat NR_FREE_PAGES so that it can
>> result in the livelock. Please look at aa4548403 for more.
>>
>> This patch solves the issue.
>> ==
>
> Now, wakeup_kswapd() are using zone_watermark_ok_safe(). (ie avoid to use
> per-cpu stat jiffies). Then, I don't think we have livelock chance.
> Am I missing something?
>

I have no idea. I just referenced the description in aa4548403.
As I look code, zone_watermark_ok_safe works well if percpu_drift_mark
is set rightly. but if memory hotplug happens, zone->present_pages
would be changed so that it can affect wmarks. It means it can affect
percpu_drift_mark, I think.

My point is to write down the description clear.

--
Kind regards,
Minchan Kim