2012-05-09 06:13:11

by Rusty Russell

[permalink] [raw]
Subject: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.

Hi Ingo,

I finally rebased this on top of your tip tree, and tested it
locally. Some more old-style cpumask usages have crept in, but it's a
fairly simple series.

The final result is that if you enable CONFIG_CPUMASK_OFFSTACK, then
'struct cpumask' becomes an undefined type. You can't accidentally take
the size of it, assign it, or pass it by value. And thus it's safe for
us to make it smaller if nr_cpu_ids < NR_CPUS, as the final patch does.

It unfortunately requires the lglock cleanup patch, which Al already has
queued, so I've included it here.

Cheers,
Rusty.

The following changes since commit 76b12156b42f876ae399dd9db1cbef27c24a4899:

Merge branch 'x86/apic' (2012-05-07 19:21:48 +0200)

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/rusty/cpumask.git master

for you to fetch changes up to 45e788415aced8d518817d6be968ab4a132724ed:

cpumask: reduce cpumask_size (2012-05-09 15:02:15 +0930)

----------------------------------------------------------------
Rusty Russell (9):
lglock: remove online variants of lock
page_alloc: use cpumask_var_t.
cpumask: prepare for reduced cpumask_allocation.
cpumask: make task_struct.cpus_allowed a cpumask_var_t
cpumask: select disabling obsolete cpumask functions on x86
cpumask: get rid of cpumask_t.
irq: remove sizeof(struct cpumask)
cpumask: remove struct cpumask definition when CONFIG_CPUMASK_OFFSTACK=y
cpumask: reduce cpumask_size

arch/arm/mach-integrator/cpu.c | 4 +-
arch/ia64/kernel/cpufreq/acpi-cpufreq.c | 4 +-
arch/ia64/kernel/mca.c | 2 +-
arch/ia64/kernel/salinfo.c | 2 +-
arch/ia64/kernel/topology.c | 2 +-
arch/ia64/sn/kernel/sn2/sn_hwperf.c | 2 +-
arch/mips/kernel/cpufreq/loongson2_cpufreq.c | 2 +-
arch/mips/kernel/traps.c | 8 ++--
arch/sh/kernel/cpufreq.c | 2 +-
arch/x86/kernel/cpu/mcheck/mce_intel.c | 2 +-
drivers/acpi/processor_throttling.c | 4 +-
drivers/firmware/dcdbas.c | 2 +-
fs/proc/array.c | 4 +-
include/linux/cpumask.h | 25 ++++++++---
include/linux/init_task.h | 9 +++-
include/linux/interrupt.h | 2 +-
include/linux/lglock.h | 58 +-------------------------
include/linux/mm_types.h | 25 +++++++++--
include/linux/sched.h | 6 +--
kernel/cpuset.c | 2 +-
kernel/fork.c | 31 +++++++++-----
kernel/irq/irqdesc.c | 2 +-
kernel/rcutree_plugin.h | 4 +-
kernel/sched/core.c | 6 +--
kernel/sched/cpupri.c | 4 +-
kernel/trace/trace_workqueue.c | 6 +--
kernel/workqueue.c | 2 +-
lib/Kconfig | 8 +++-
lib/cpu_rmap.c | 2 +-
lib/cpumask.c | 2 +
mm/page_alloc.c | 17 ++++----
31 files changed, 124 insertions(+), 127 deletions(-)


commit a9e14f56ae7519d3c98651c927be53d094a3841a
Author: Rusty Russell <[email protected]>
Date: Wed May 9 14:55:15 2012 +0930

lglock: remove online variants of lock

Optimizing the slow paths adds a lot of complexity. If you need to
grab every lock often, you have other problems.

Signed-off-by: Rusty Russell <[email protected]>
Acked-by: Nick Piggin <[email protected]>

commit fab26054c0500d426cf1bc2ce227a6a428dc815e
Author: Rusty Russell <[email protected]>
Date: Wed May 9 14:55:15 2012 +0930

page_alloc: use cpumask_var_t.

The BSS trick works, but it still wastes space. Especially since there's
a nice fallback in the case where we fail to allocate a temporary cpumask.

Signed-off-by: Rusty Russell <[email protected]>

commit 0196da928b332fba819877f5ab7aa02d8cd78e9b
Author: Rusty Russell <[email protected]>
Date: Wed May 9 14:56:15 2012 +0930

cpumask: prepare for reduced cpumask_allocation.

Thomas and Linus already made CONFIG_CPUMASK_OFFSTACK use a cpumask
at the end of struct mm_struct, this just changes it into a bitmap
and allocates it using cpumask_size().

This means it will shrink when cpumask_size() is changed to reflect
nr_cpu_ids not NR_CPUS.

Signed-off-by: Rusty Russell <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: [email protected]
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mike Travis <[email protected]>

commit 5583b004a294063cbb03e586680de91f316955b8
Author: Rusty Russell <[email protected]>
Date: Wed May 9 14:57:15 2012 +0930

cpumask: make task_struct.cpus_allowed a cpumask_var_t

This turns it into a pointer for everyone. No change for those
already using the tsk_cpus_allowed() accessor; I've enhanced some of
the sched/ code to use that. For others, I just changed them
directly.

For CONFIG_CPUMASK_OFFSTACK=y, we now allocate it off the end; it
would be better to avoid the indirection and use a dangling bitmap,
but I didn't want to alter the layout of task_struct and risk breaking
carefully balanced caches.

Even better would be to point to the fixed "one cpu" and "all cpus"
masks where possible, and make a copy when setting it to something
else. But you'd have to track down those naughty places which frob it
directly...

Signed-off-by: Rusty Russell <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: [email protected]
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mike Travis <[email protected]>

commit c04a0a0d6ec248a533401f284ef2173c706a8e94
Author: Rusty Russell <[email protected]>
Date: Wed May 9 14:58:15 2012 +0930

cpumask: select disabling obsolete cpumask functions on x86

It currently depends on CONFIG_BROKEN; change to be set by
CONFIG_CPUMASK_OFFSTACK now it all compiles.

We also complete it: the header shouldn't refer to the deprected
CPU_MASK_LAST_WORD, and the deprecated implementations are removed.

Signed-off-by: Rusty Russell <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: [email protected]
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mike Travis <[email protected]>

commit d690ee2ccc74e2cb936ddb47322617a544417d6a
Author: Rusty Russell <[email protected]>
Date: Wed May 9 14:59:15 2012 +0930

cpumask: get rid of cpumask_t.

Very shortly, struct cpumask will be declared but undefined for
CONFIG_CPUMASK_OFFSTACK, so use 'struct cpumask' rather than
the obsolescent 'cpumask_t'.

Signed-off-by: Rusty Russell <[email protected]>

commit 124e4f2c2b72117c303a86691881f3abd952b31c
Author: Rusty Russell <[email protected]>
Date: Wed May 9 15:00:15 2012 +0930

irq: remove sizeof(struct cpumask)

Very shortly, struct cpumask will be declared but undefined for
CONFIG_CPUMASK_OFFSTACK, so sizeof() won't compile. This is a Good
Thing, since we want to use cpumask_size() here anyway.

Signed-off-by: Rusty Russell <[email protected]>

commit 898eb73305e2277be91b931c5a75484f8c87ae36
Author: Rusty Russell <[email protected]>
Date: Wed May 9 15:01:15 2012 +0930

cpumask: remove struct cpumask definition when CONFIG_CPUMASK_OFFSTACK=y

We're about to change CONFIG_CPUMASK_OFFSTACK so it only allocate
nr_cpu_ids bits for all cpumasks. We need to make sure that when
CONFIG_CPUMASK_OFFSTACK is set:

1) Noone uses the old bitmap ops, which use NR_CPUS bits (use cpumask_*)
2) Noone uses assignment of struct cpumask (use cpumask_copy)
3) Noone passes a struct cpumask (pass a pointer)
4) Noone declares them on the stack (use cpumask_var_t)

So we finally remove the definition of struct cpumask when
CONFIG_CPUMASK_OFFSTACK=y. This means that these usages will hit a compile
error the moment that config option is turned on.

Note that it also means you can't declare a static cpumask. You
should avoid this anyway (use cpumask_var_t), but there's a
deliberately-ugly workaround for special cases, using DECLARE_BITMAP()
and to_cpumask().

Signed-off-by: Rusty Russell <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: [email protected]
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mike Travis <[email protected]>

commit 45e788415aced8d518817d6be968ab4a132724ed
Author: Rusty Russell <[email protected]>
Date: Wed May 9 15:02:15 2012 +0930

cpumask: reduce cpumask_size

Now we're sure noone is using old cpumask operators, nor *cpumask, we can
allocate less bits safely. This reduces the memory usage of off-stack
cpumasks when CONFIG_CPUMASK_OFFSTACK=y but we don't have NR_CPUS actual
cpus.

Signed-off-by: Rusty Russell <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: [email protected]
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mike Travis <[email protected]>


2012-05-09 08:45:10

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.


* Rusty Russell <[email protected]> wrote:

> Hi Ingo,
>
> I finally rebased this on top of your tip tree, and tested it
> locally. Some more old-style cpumask usages have crept in, but it's a
> fairly simple series.

Cool! Most of it looks pretty sane. I have a question about the
gist of the series:

> commit 898eb73305e2277be91b931c5a75484f8c87ae36
> Author: Rusty Russell <[email protected]>
> Date: Wed May 9 15:01:15 2012 +0930
>
> cpumask: remove struct cpumask definition when CONFIG_CPUMASK_OFFSTACK=y
>
> We're about to change CONFIG_CPUMASK_OFFSTACK so it only allocate
> nr_cpu_ids bits for all cpumasks. We need to make sure that when
> CONFIG_CPUMASK_OFFSTACK is set:
>
> 1) Noone uses the old bitmap ops, which use NR_CPUS bits (use cpumask_*)
> 2) Noone uses assignment of struct cpumask (use cpumask_copy)
> 3) Noone passes a struct cpumask (pass a pointer)
> 4) Noone declares them on the stack (use cpumask_var_t)
>
> So we finally remove the definition of struct cpumask when
> CONFIG_CPUMASK_OFFSTACK=y. This means that these usages will hit a compile
> error the moment that config option is turned on.
>
> Note that it also means you can't declare a static cpumask. You
> should avoid this anyway (use cpumask_var_t), but there's a
> deliberately-ugly workaround for special cases, using DECLARE_BITMAP()
> and to_cpumask().
>
> Signed-off-by: Rusty Russell <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: [email protected]
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mike Travis <[email protected]>

Is there any good reason to not remove it altogether, regardless
of whether the OFFSTACK config is set? I mean, triggering build
failures for a relatively rarely turned on config option is
asking for constant maintenance trouble.

Thanks,

Ingo

2012-05-10 01:10:04

by Rusty Russell

[permalink] [raw]
Subject: Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.

On Wed, 9 May 2012 10:44:53 +0200, Ingo Molnar <[email protected]> wrote:
>
> * Rusty Russell <[email protected]> wrote:
>
> > Hi Ingo,
> >
> > I finally rebased this on top of your tip tree, and tested it
> > locally. Some more old-style cpumask usages have crept in, but it's a
> > fairly simple series.
>
> Cool! Most of it looks pretty sane. I have a question about the
> gist of the series:
>
> > commit 898eb73305e2277be91b931c5a75484f8c87ae36
> > Author: Rusty Russell <[email protected]>
> > Date: Wed May 9 15:01:15 2012 +0930
> >
> > cpumask: remove struct cpumask definition when CONFIG_CPUMASK_OFFSTACK=y
> >
> > We're about to change CONFIG_CPUMASK_OFFSTACK so it only allocate
> > nr_cpu_ids bits for all cpumasks. We need to make sure that when
> > CONFIG_CPUMASK_OFFSTACK is set:
> >
> > 1) Noone uses the old bitmap ops, which use NR_CPUS bits (use cpumask_*)
> > 2) Noone uses assignment of struct cpumask (use cpumask_copy)
> > 3) Noone passes a struct cpumask (pass a pointer)
> > 4) Noone declares them on the stack (use cpumask_var_t)
> >
> > So we finally remove the definition of struct cpumask when
> > CONFIG_CPUMASK_OFFSTACK=y. This means that these usages will hit a compile
> > error the moment that config option is turned on.
> >
> > Note that it also means you can't declare a static cpumask. You
> > should avoid this anyway (use cpumask_var_t), but there's a
> > deliberately-ugly workaround for special cases, using DECLARE_BITMAP()
> > and to_cpumask().
> >
> > Signed-off-by: Rusty Russell <[email protected]>
> > Cc: Arnd Bergmann <[email protected]>
> > Cc: [email protected]
> > Cc: KOSAKI Motohiro <[email protected]>
> > Cc: Mike Travis <[email protected]>
>
> Is there any good reason to not remove it altogether, regardless
> of whether the OFFSTACK config is set? I mean, triggering build
> failures for a relatively rarely turned on config option is
> asking for constant maintenance trouble.

Mainly because I didn't want to disturb the archs which don't care at
all about large cpumasks. After all, putting a struct cpumask on the
stack is pretty convenient.

But we could add a new arch config which removes it, and set it from
x86.

Cheers,
Rusty.

2012-05-10 01:33:07

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.

(5/9/12 2:10 AM), Rusty Russell wrote:
> Hi Ingo,
>
> I finally rebased this on top of your tip tree, and tested it
> locally. Some more old-style cpumask usages have crept in, but it's a
> fairly simple series.
>
> The final result is that if you enable CONFIG_CPUMASK_OFFSTACK, then
> 'struct cpumask' becomes an undefined type. You can't accidentally take
> the size of it, assign it, or pass it by value. And thus it's safe for
> us to make it smaller if nr_cpu_ids< NR_CPUS, as the final patch does.
>
> It unfortunately requires the lglock cleanup patch, which Al already has
> queued, so I've included it here.

Hi

Thanks this effort. This is very cleaner than I expected.
However I should NAK following one patch. sorry. because of, lru-drain is
called from memory reclaim context. It mean, additional allocation may not
work. Please just use bare NR_CPUS bitmap instead. space wasting is minor
issue than that.

> commit fab26054c0500d426cf1bc2ce227a6a428dc815e
> Author: Rusty Russell<[email protected]>
> Date: Wed May 9 14:55:15 2012 +0930
>
> page_alloc: use cpumask_var_t.
>
> The BSS trick works, but it still wastes space. Especially since there's
> a nice fallback in the case where we fail to allocate a temporary cpumask.
>
> Signed-off-by: Rusty Russell<[email protected]>



Other patches looks very goold.
Acked-by: KOSAKI Motohiro <[email protected]>

Thanks.

2012-05-10 02:17:08

by Rusty Russell

[permalink] [raw]
Subject: Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.

On Wed, 09 May 2012 21:32:57 -0400, KOSAKI Motohiro <[email protected]> wrote:
> (5/9/12 2:10 AM), Rusty Russell wrote:
> > Hi Ingo,
> >
> > I finally rebased this on top of your tip tree, and tested it
> > locally. Some more old-style cpumask usages have crept in, but it's a
> > fairly simple series.
> >
> > The final result is that if you enable CONFIG_CPUMASK_OFFSTACK, then
> > 'struct cpumask' becomes an undefined type. You can't accidentally take
> > the size of it, assign it, or pass it by value. And thus it's safe for
> > us to make it smaller if nr_cpu_ids< NR_CPUS, as the final patch does.
> >
> > It unfortunately requires the lglock cleanup patch, which Al already has
> > queued, so I've included it here.
>
> Hi
>
> Thanks this effort. This is very cleaner than I expected.
> However I should NAK following one patch. sorry. because of, lru-drain is
> called from memory reclaim context. It mean, additional allocation may not
> work. Please just use bare NR_CPUS bitmap instead. space wasting is minor
> issue than that.

But if it fails the allocation, that's fine: we just send a few more
IPIs to every CPU:

+ if (!zalloc_cpumask_var(&cpus_with_pcps, GFP_KERNEL)) {
+ on_each_cpu(drain_local_pages, NULL, 1);
+ return;
+ }

We can do it the other way, but it sets a bad example, and after we get
rid of cpumask, it becomes:

static DECLARE_BITMAP(cpus_with_pcps, NR_CPUS);

......

if (has_pcps)
cpumask_set_cpu(cpu, to_cpumask(cpus_with_pcps));
else
cpumask_clear_cpu(cpu, to_cpumask(cpus_with_pcps));
}
on_each_cpu_mask(to_cpumask(cpus_with_pcps), drain_local_pages, NULL, 1);

Or is there a reason we shouldn't even try to allocate here?

Thanks,
Rusty.

2012-05-10 02:43:45

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.

(5/9/12 10:16 PM), Rusty Russell wrote:
> On Wed, 09 May 2012 21:32:57 -0400, KOSAKI Motohiro<[email protected]> wrote:
>> (5/9/12 2:10 AM), Rusty Russell wrote:
>>> Hi Ingo,
>>>
>>> I finally rebased this on top of your tip tree, and tested it
>>> locally. Some more old-style cpumask usages have crept in, but it's a
>>> fairly simple series.
>>>
>>> The final result is that if you enable CONFIG_CPUMASK_OFFSTACK, then
>>> 'struct cpumask' becomes an undefined type. You can't accidentally take
>>> the size of it, assign it, or pass it by value. And thus it's safe for
>>> us to make it smaller if nr_cpu_ids< NR_CPUS, as the final patch does.
>>>
>>> It unfortunately requires the lglock cleanup patch, which Al already has
>>> queued, so I've included it here.
>>
>> Hi
>>
>> Thanks this effort. This is very cleaner than I expected.
>> However I should NAK following one patch. sorry. because of, lru-drain is
>> called from memory reclaim context. It mean, additional allocation may not
>> work. Please just use bare NR_CPUS bitmap instead. space wasting is minor
>> issue than that.
>
> But if it fails the allocation, that's fine: we just send a few more
> IPIs to every CPU:
>
> + if (!zalloc_cpumask_var(&cpus_with_pcps, GFP_KERNEL)) {
> + on_each_cpu(drain_local_pages, NULL, 1);
> + return;
> + }
>
> We can do it the other way, but it sets a bad example, and after we get
> rid of cpumask, it becomes:
>
> static DECLARE_BITMAP(cpus_with_pcps, NR_CPUS);
>
> ......
>
> if (has_pcps)
> cpumask_set_cpu(cpu, to_cpumask(cpus_with_pcps));
> else
> cpumask_clear_cpu(cpu, to_cpumask(cpus_with_pcps));
> }
> on_each_cpu_mask(to_cpumask(cpus_with_pcps), drain_local_pages, NULL, 1);
>
> Or is there a reason we shouldn't even try to allocate here?

1) your code always use GFP_KERNEL. it is trouble maker when alloc_pages w/ GFP_ATOMIC.
2) When CONFIG_CPUMASK_OFFSTACK=n and NR_CPUS is relatively large, cpumask on stack may
cause stack overflow. because of, alloc_pages() can be called from very deep call stack.

Thought?

2012-05-10 05:38:49

by Rusty Russell

[permalink] [raw]
Subject: Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.

On Wed, 09 May 2012 22:43:39 -0400, KOSAKI Motohiro <[email protected]> wrote:
> > Or is there a reason we shouldn't even try to allocate here?
>
> 1) your code always use GFP_KERNEL. it is trouble maker when alloc_pages w/ GFP_ATOMIC.

Oh :(

How about the below instead?

> 2) When CONFIG_CPUMASK_OFFSTACK=n and NR_CPUS is relatively large, cpumask on stack may
> cause stack overflow. because of, alloc_pages() can be called from
> very deep call stack.

You can't have large NR_CPUS without CONFIG_CPUMASK_OFFSTACK=y,
otherwise you'll get many other stack overflows, too.

Thanks,
Rusty.

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 581e74b..7c1db9c 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -367,7 +367,7 @@ extern void free_hot_cold_page_list(struct list_head *list, int cold);

void page_alloc_init(void);
void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
-void drain_all_pages(void);
+void drain_all_pages(gfp_t gfp_flags);
void drain_local_pages(void *dummy);

/*
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 97cc273..daf0d7b 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -237,7 +237,7 @@ void shake_page(struct page *p, int access)
lru_add_drain_all();
if (PageLRU(p))
return;
- drain_all_pages();
+ drain_all_pages(GFP_KERNEL);
if (PageLRU(p) || is_free_buddy_page(p))
return;
}
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6629faf..1372a9b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -922,7 +922,7 @@ repeat:
if (drain) {
lru_add_drain_all();
cond_resched();
- drain_all_pages();
+ drain_all_pages(GFP_KERNEL);
}

pfn = scan_lru_pages(start_pfn, end_pfn);
@@ -944,7 +944,7 @@ repeat:
lru_add_drain_all();
yield();
/* drain pcp pages , this is synchrouns. */
- drain_all_pages();
+ drain_all_pages(GFP_KERNEL);
/* check again */
offlined_pages = check_pages_isolated(start_pfn, end_pfn);
if (offlined_pages < 0) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a712fb9..aaac25c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1169,17 +1169,17 @@ void drain_local_pages(void *arg)
* nothing keeps CPUs from showing up after we populated the cpumask and
* before the call to on_each_cpu_mask().
*/
-void drain_all_pages(void)
+void drain_all_pages(gfp_t gfp_flags)
{
int cpu;
struct per_cpu_pageset *pcp;
struct zone *zone;
+ cpumask_var_t cpus_with_pcps;

- /*
- * Allocate in the BSS so we wont require allocation in
- * direct reclaim path for CONFIG_CPUMASK_OFFSTACK=y
- */
- static cpumask_t cpus_with_pcps;
+ if (!zalloc_cpumask_var(&cpus_with_pcps, gfp_flags)) {
+ on_each_cpu(drain_local_pages, NULL, 1);
+ return;
+ }

/*
* We don't care about racing with CPU hotplug event
@@ -1197,11 +1197,10 @@ void drain_all_pages(void)
}
}
if (has_pcps)
- cpumask_set_cpu(cpu, &cpus_with_pcps);
- else
- cpumask_clear_cpu(cpu, &cpus_with_pcps);
+ cpumask_set_cpu(cpu, cpus_with_pcps);
}
- on_each_cpu_mask(&cpus_with_pcps, drain_local_pages, NULL, 1);
+ on_each_cpu_mask(cpus_with_pcps, drain_local_pages, NULL, 1);
+ free_cpumask_var(cpus_with_pcps);
}

#ifdef CONFIG_HIBERNATION
@@ -2132,7 +2131,7 @@ retry:
* pages are pinned on the per-cpu lists. Drain them and try again
*/
if (!page && !drained) {
- drain_all_pages();
+ drain_all_pages(GFP_ATOMIC);
drained = true;
goto retry;
}
@@ -5532,7 +5531,7 @@ out:

spin_unlock_irqrestore(&zone->lock, flags);
if (!ret)
- drain_all_pages();
+ drain_all_pages(GFP_KERNEL);
return ret;
}

2012-05-10 06:42:41

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.

(5/10/12 12:54 AM), Rusty Russell wrote:
> On Wed, 09 May 2012 22:43:39 -0400, KOSAKI Motohiro<[email protected]> wrote:
>>> Or is there a reason we shouldn't even try to allocate here?
>>
>> 1) your code always use GFP_KERNEL. it is trouble maker when alloc_pages w/ GFP_ATOMIC.
>
> Oh :(
>
> How about the below instead?

This code still slow than original. when calling reclaim path, new allocation is almost always
fail. then, your code almost always invoke all cpu batch invalidation. i.e. many ipi.


>> 2) When CONFIG_CPUMASK_OFFSTACK=n and NR_CPUS is relatively large, cpumask on stack may
>> cause stack overflow. because of, alloc_pages() can be called from
>> very deep call stack.
>
> You can't have large NR_CPUS without CONFIG_CPUMASK_OFFSTACK=y,
> otherwise you'll get many other stack overflows, too.

Original code put cpumask bss instead stack then. :-)

2012-05-10 07:42:23

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.


* Rusty Russell <[email protected]> wrote:

> Mainly because I didn't want to disturb the archs which don't
> care at all about large cpumasks. After all, putting a struct
> cpumask on the stack is pretty convenient.

Yes.

> But we could add a new arch config which removes it, and set
> it from x86.

Could we just use a single cpumask type, cpumask_t or so, which
would be the *only* generic method to use cpumasks?

(Current cpumask_t would move to cpumask_full_t.)

This would be the 'final' destiation for the cpumask code: the
natural type to use in new code is cpumask_t, while in special
cases we could use cpumask_full_t - but the name signals that
it's a potentially large structure.

On architectures that don't worry about large cpumasks (yet ...)
cpumask_t and cpumask_full_t maps to the same thing, so there's
no difference.

This would make things more natural IMO.

There would be no 'struct cpumask'. (and 'cpumask_var_t' would
disappear too due to the rename.)

Thoughts?

Ingo

2012-05-15 01:27:48

by Rusty Russell

[permalink] [raw]
Subject: Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.

On Thu, 10 May 2012 09:42:15 +0200, Ingo Molnar <[email protected]> wrote:
>
> * Rusty Russell <[email protected]> wrote:
>
> > Mainly because I didn't want to disturb the archs which don't
> > care at all about large cpumasks. After all, putting a struct
> > cpumask on the stack is pretty convenient.
>
> Yes.
>
> > But we could add a new arch config which removes it, and set
> > it from x86.
>
> Could we just use a single cpumask type, cpumask_t or so, which
> would be the *only* generic method to use cpumasks?
>
> (Current cpumask_t would move to cpumask_full_t.)
>
> This would be the 'final' destiation for the cpumask code: the
> natural type to use in new code is cpumask_t, while in special
> cases we could use cpumask_full_t - but the name signals that
> it's a potentially large structure.
>
> On architectures that don't worry about large cpumasks (yet ...)
> cpumask_t and cpumask_full_t maps to the same thing, so there's
> no difference.
>
> This would make things more natural IMO.
>
> There would be no 'struct cpumask'. (and 'cpumask_var_t' would
> disappear too due to the rename.)
>
> Thoughts?

I don't understand, sorry. I think I'd need some code to understand.

Unfortunately I was wrong about being able to remove struct cpumask's
definition when CONFIG_CPUMASK_OFFSTACK=n: we need it for cpumask_var_t
in that case :(

Cheers,
Rusty.

2012-05-15 01:27:47

by Rusty Russell

[permalink] [raw]
Subject: Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.

On Thu, 10 May 2012 02:42:43 -0400, KOSAKI Motohiro <[email protected]> wrote:
> (5/10/12 12:54 AM), Rusty Russell wrote:
> > On Wed, 09 May 2012 22:43:39 -0400, KOSAKI Motohiro<[email protected]> wrote:
> >>> Or is there a reason we shouldn't even try to allocate here?
> >>
> >> 1) your code always use GFP_KERNEL. it is trouble maker when alloc_pages w/ GFP_ATOMIC.
> >
> > Oh :(
> >
> > How about the below instead?
>
> This code still slow than original. when calling reclaim path, new allocation is almost always
> fail. then, your code almost always invoke all cpu batch invalidation. i.e. many ipi.

I don't know this code. Does that happen often? Do we really need to
optimize the out-of-memory path?

But I should have used on_each_cpu_cond() helper which does this for us
(except it falls back to individial IPIs) which would make this code
neater.

> >> 2) When CONFIG_CPUMASK_OFFSTACK=n and NR_CPUS is relatively large, cpumask on stack may
> >> cause stack overflow. because of, alloc_pages() can be called from
> >> very deep call stack.
> >
> > You can't have large NR_CPUS without CONFIG_CPUMASK_OFFSTACK=y,
> > otherwise you'll get many other stack overflows, too.
>
> Original code put cpumask bss instead stack then. :-)

Yes, and this is what it looks like if we convert it directly, but I
still don't want to encourage people to do this :(

Cheers,
Rusty.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1179,7 +1179,7 @@ void drain_all_pages(void)
* Allocate in the BSS so we wont require allocation in
* direct reclaim path for CONFIG_CPUMASK_OFFSTACK=y
*/
- static cpumask_t cpus_with_pcps;
+ static DECLARE_BITMAP(cpus_with_pcps, NR_CPUS);

/*
* We don't care about racing with CPU hotplug event
@@ -1197,11 +1197,12 @@ void drain_all_pages(void)
}
}
if (has_pcps)
- cpumask_set_cpu(cpu, &cpus_with_pcps);
+ cpumask_set_cpu(cpu, to_cpumask(cpus_with_pcps));
else
- cpumask_clear_cpu(cpu, &cpus_with_pcps);
+ cpumask_clear_cpu(cpu, to_cpumask(cpus_with_pcps));
}
- on_each_cpu_mask(&cpus_with_pcps, drain_local_pages, NULL, 1);
+ on_each_cpu_mask(to_cpumask(cpus_with_pcps),
+ drain_local_pages, NULL, 1);
}

#ifdef CONFIG_HIBERNATION

2012-05-15 01:38:07

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.

>> This code still slow than original. when calling reclaim path, new allocation is almost always
>> fail. then, your code almost always invoke all cpu batch invalidation. i.e. many ipi.
>
> I don't know this code. Does that happen often?Do we really need to
> optimize the out-of-memory path?

we don't need optimize out-of-memory path. but it's not out-of-memory path. our reclaim code
has two steps 1) purge small file cache (try_to_free_pages) 2) get new page (get_page_from_freelist).
but if you have smp box, it's racy. To success 1) doesn't guarantee to success 2). then, drain_all_pages()
is called frequently than you expected.



> But I should have used on_each_cpu_cond() helper which does this for us
> (except it falls back to individial IPIs) which would make this code
> neater.

Ah, yes. that definitely makes sense.


>>>> 2) When CONFIG_CPUMASK_OFFSTACK=n and NR_CPUS is relatively large, cpumask on stack may
>>>> cause stack overflow. because of, alloc_pages() can be called from
>>>> very deep call stack.
>>>
>>> You can't have large NR_CPUS without CONFIG_CPUMASK_OFFSTACK=y,
>>> otherwise you'll get many other stack overflows, too.
>>
>> Original code put cpumask bss instead stack then. :-)
>
> Yes, and this is what it looks like if we convert it directly, but I
> still don't want to encourage people to do this :(
>
> Cheers,
> Rusty.
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1179,7 +1179,7 @@ void drain_all_pages(void)
> * Allocate in the BSS so we wont require allocation in
> * direct reclaim path for CONFIG_CPUMASK_OFFSTACK=y
> */
> - static cpumask_t cpus_with_pcps;
> + static DECLARE_BITMAP(cpus_with_pcps, NR_CPUS);
>
> /*
> * We don't care about racing with CPU hotplug event
> @@ -1197,11 +1197,12 @@ void drain_all_pages(void)
> }
> }
> if (has_pcps)
> - cpumask_set_cpu(cpu,&cpus_with_pcps);
> + cpumask_set_cpu(cpu, to_cpumask(cpus_with_pcps));
> else
> - cpumask_clear_cpu(cpu,&cpus_with_pcps);
> + cpumask_clear_cpu(cpu, to_cpumask(cpus_with_pcps));
> }
> - on_each_cpu_mask(&cpus_with_pcps, drain_local_pages, NULL, 1);
> + on_each_cpu_mask(to_cpumask(cpus_with_pcps),
> + drain_local_pages, NULL, 1);
> }

Looks good to me. thanks.

Acked-by: KOSAKI Motohiro <[email protected]>