2007-11-27 21:55:59

by Mike Travis

[permalink] [raw]
Subject: [PATCH 1/1] mm: Prevent dereferencing non-allocated per_cpu variables

Change loops controlled by 'for (i = 0; i < NR_CPUS; i++)' to use
'for_each_possible_cpu(i)' when there's a _remote possibility_ of
dereferencing a non-allocated per_cpu variable involved.

All files except mm/vmstat.c are x86 arch.

Based on 2.6.24-rc3-mm1 .

Thanks to [email protected] for pointing this out.

Signed-off-by: Mike Travis <[email protected]>
---
arch/x86/kernel/smp_32.c | 4 ++--
arch/x86/kernel/smpboot_32.c | 4 ++--
arch/x86/xen/smp.c | 4 ++--
mm/vmstat.c | 4 ++--
4 files changed, 8 insertions(+), 8 deletions(-)

--- a/arch/x86/kernel/smp_32.c
+++ b/arch/x86/kernel/smp_32.c
@@ -223,7 +223,7 @@ void send_IPI_mask_sequence(cpumask_t ma
*/

local_irq_save(flags);
- for (query_cpu = 0; query_cpu < NR_CPUS; ++query_cpu) {
+ for_each_possible_cpu(query_cpu) {
if (cpu_isset(query_cpu, mask)) {
__send_IPI_dest_field(cpu_to_logical_apicid(query_cpu),
vector);
@@ -675,7 +675,7 @@ static int convert_apicid_to_cpu(int api
{
int i;

- for (i = 0; i < NR_CPUS; i++) {
+ for_each_possible_cpu(i) {
if (per_cpu(x86_cpu_to_apicid, i) == apic_id)
return i;
}
--- a/arch/x86/kernel/smpboot_32.c
+++ b/arch/x86/kernel/smpboot_32.c
@@ -1091,7 +1091,7 @@ static void __init smp_boot_cpus(unsigne
* Allow the user to impress friends.
*/
Dprintk("Before bogomips.\n");
- for (cpu = 0; cpu < NR_CPUS; cpu++)
+ for_each_possible_cpu(cpu)
if (cpu_isset(cpu, cpu_callout_map))
bogosum += cpu_data(cpu).loops_per_jiffy;
printk(KERN_INFO
@@ -1122,7 +1122,7 @@ static void __init smp_boot_cpus(unsigne
* construct cpu_sibling_map, so that we can tell sibling CPUs
* efficiently.
*/
- for (cpu = 0; cpu < NR_CPUS; cpu++) {
+ for_each_possible_cpu(cpu) {
cpus_clear(per_cpu(cpu_sibling_map, cpu));
cpus_clear(per_cpu(cpu_core_map, cpu));
}
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -146,7 +146,7 @@ void __init xen_smp_prepare_boot_cpu(voi
old memory can be recycled */
make_lowmem_page_readwrite(&per_cpu__gdt_page);

- for (cpu = 0; cpu < NR_CPUS; cpu++) {
+ for_each_possible_cpu(cpu) {
cpus_clear(per_cpu(cpu_sibling_map, cpu));
/*
* cpu_core_map lives in a per cpu area that is cleared
@@ -163,7 +163,7 @@ void __init xen_smp_prepare_cpus(unsigne
{
unsigned cpu;

- for (cpu = 0; cpu < NR_CPUS; cpu++) {
+ for_each_possible_cpu(cpu) {
cpus_clear(per_cpu(cpu_sibling_map, cpu));
/*
* cpu_core_ map will be zeroed when the per
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -27,12 +27,12 @@ static void sum_vm_events(unsigned long
memset(ret, 0, NR_VM_EVENT_ITEMS * sizeof(unsigned long));

cpu = first_cpu(*cpumask);
- while (cpu < NR_CPUS) {
+ while (cpu < NR_CPUS && cpu_possible(cpu)) {
struct vm_event_state *this = &per_cpu(vm_event_states, cpu);

cpu = next_cpu(cpu, *cpumask);

- if (cpu < NR_CPUS)
+ if (cpu < NR_CPUS && cpu_possible(cpu))
prefetch(&per_cpu(vm_event_states, cpu));



--


2007-11-27 22:16:40

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: Prevent dereferencing non-allocated per_cpu variables

On Tue, Nov 27, 2007 at 01:50:53PM -0800, [email protected] wrote:
> Change loops controlled by 'for (i = 0; i < NR_CPUS; i++)' to use
> 'for_each_possible_cpu(i)' when there's a _remote possibility_ of
> dereferencing a non-allocated per_cpu variable involved.
>
> All files except mm/vmstat.c are x86 arch.
>
> Based on 2.6.24-rc3-mm1 .
>
> Thanks to [email protected] for pointing this out.

Looks good to me. 2.6.24 candidate.

-Andi

2007-11-27 23:02:53

by PaX Team

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: Prevent dereferencing non-allocated per_cpu variables

On 27 Nov 2007 at 13:50, [email protected] wrote:

> Change loops controlled by 'for (i = 0; i < NR_CPUS; i++)' to use
> 'for_each_possible_cpu(i)' when there's a _remote possibility_ of
> dereferencing a non-allocated per_cpu variable involved.

actually, it's not that remote, it happens every time
NR_CPUS > num_possible_cpus(). i ran into this myself
on a dual core box with NR_CPUS=4. due to my rewrite
of the i386 per-cpu segment handling, i actually got
a NULL deref where the vanilla kernel would be accessing
the area of [__per_cpu_start, __per_cpu_end] for each
non-possible CPU (which doesn't crash per se but is
still not correct somehow i think).

2007-11-27 23:13:15

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: Prevent dereferencing non-allocated per_cpu variables

On Tue, 27 Nov 2007 23:16:28 +0100
Andi Kleen <[email protected]> wrote:

> On Tue, Nov 27, 2007 at 01:50:53PM -0800, [email protected] wrote:
> > Change loops controlled by 'for (i = 0; i < NR_CPUS; i++)' to use
> > 'for_each_possible_cpu(i)' when there's a _remote possibility_ of
> > dereferencing a non-allocated per_cpu variable involved.
> >
> > All files except mm/vmstat.c are x86 arch.
> >
> > Based on 2.6.24-rc3-mm1 .
> >
> > Thanks to [email protected] for pointing this out.
>
> Looks good to me. 2.6.24 candidate.

hm. Has anyone any evidence that we're actually touching
not-possible-cpu's memory here?

Also, the sum_vm_events() change looks buggy - it assumes that
cpu_possible_map has no gaps in it. But that change is unneeded because
sum_vm_events() is only ever passed cpu_online_map and I'm hoping that we
don't usually online not-possible CPUs.

--- a/mm/vmstat.c~mm-prevent-dereferencing-non-allocated-per_cpu-variables-fix
+++ a/mm/vmstat.c
@@ -27,12 +27,12 @@ static void sum_vm_events(unsigned long
memset(ret, 0, NR_VM_EVENT_ITEMS * sizeof(unsigned long));

cpu = first_cpu(*cpumask);
- while (cpu < NR_CPUS && cpu_possible(cpu)) {
+ while (cpu < NR_CPUS) {
struct vm_event_state *this = &per_cpu(vm_event_states, cpu);

cpu = next_cpu(cpu, *cpumask);

- if (cpu < NR_CPUS && cpu_possible(cpu))
+ if (cpu < NR_CPUS)
prefetch(&per_cpu(vm_event_states, cpu));


_

2007-11-27 23:15:16

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: Prevent dereferencing non-allocated per_cpu variables

On Tue, 27 Nov 2007, Andrew Morton wrote:

> hm. Has anyone any evidence that we're actually touching
> not-possible-cpu's memory here?

I saw it in acpi when the __cpu_offset() pointers become zero. I have
never seen it in vmstat.c. We do not need the vmstat.c fix.

2007-11-27 23:21:52

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: Prevent dereferencing non-allocated per_cpu variables

On Tue, 27 Nov 2007 15:12:41 -0800
Andrew Morton <[email protected]> wrote:

> On Tue, 27 Nov 2007 23:16:28 +0100
> Andi Kleen <[email protected]> wrote:
>
> > On Tue, Nov 27, 2007 at 01:50:53PM -0800, [email protected] wrote:
> > > Change loops controlled by 'for (i = 0; i < NR_CPUS; i++)' to use
> > > 'for_each_possible_cpu(i)' when there's a _remote possibility_ of
> > > dereferencing a non-allocated per_cpu variable involved.
> > >
> > > All files except mm/vmstat.c are x86 arch.
> > >
> > > Based on 2.6.24-rc3-mm1 .
> > >
> > > Thanks to [email protected] for pointing this out.
> >
> > Looks good to me. 2.6.24 candidate.
>
> hm. Has anyone any evidence that we're actually touching
> not-possible-cpu's memory here?
>
> Also, the sum_vm_events() change looks buggy - it assumes that
> cpu_possible_map has no gaps in it. But that change is unneeded because
> sum_vm_events() is only ever passed cpu_online_map and I'm hoping that we
> don't usually online not-possible CPUs.
>
> --- a/mm/vmstat.c~mm-prevent-dereferencing-non-allocated-per_cpu-variables-fix
> +++ a/mm/vmstat.c
> @@ -27,12 +27,12 @@ static void sum_vm_events(unsigned long
> memset(ret, 0, NR_VM_EVENT_ITEMS * sizeof(unsigned long));
>
> cpu = first_cpu(*cpumask);
> - while (cpu < NR_CPUS && cpu_possible(cpu)) {
> + while (cpu < NR_CPUS) {
> struct vm_event_state *this = &per_cpu(vm_event_states, cpu);
>
> cpu = next_cpu(cpu, *cpumask);
>
> - if (cpu < NR_CPUS && cpu_possible(cpu))
> + if (cpu < NR_CPUS)
> prefetch(&per_cpu(vm_event_states, cpu));

The prefetch however might still need some work - we can indeed do
prefetch() against a not-possible CPU's memory here. And I do recall that
4-5 years ago we did have a CPU (one of mine, iirc) which would oops when
prefetching from a bad address. I forget what the conclusion was on that
matter.

If we do want to fix the prefetch-from-outer-space then we should be using
cpu_isset(cpu, *cpumask) here rather than cpu_possible().

2007-11-27 23:23:14

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: Prevent dereferencing non-allocated per_cpu variables

On Tue, 27 Nov 2007, Andrew Morton wrote:

> The prefetch however might still need some work - we can indeed do
> prefetch() against a not-possible CPU's memory here. And I do recall that
> 4-5 years ago we did have a CPU (one of mine, iirc) which would oops when
> prefetching from a bad address. I forget what the conclusion was on that
> matter.
>
> If we do want to fix the prefetch-from-outer-space then we should be using
> cpu_isset(cpu, *cpumask) here rather than cpu_possible().

Generally the prefetch things have turned out to be not that useful. How
about dropping the prefetch? I kept it because it was there.

2007-11-27 23:42:46

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: Prevent dereferencing non-allocated per_cpu variables

On Tue, 27 Nov 2007 15:22:56 -0800 (PST)
Christoph Lameter <[email protected]> wrote:

> On Tue, 27 Nov 2007, Andrew Morton wrote:
>
> > The prefetch however might still need some work - we can indeed do
> > prefetch() against a not-possible CPU's memory here. And I do recall that
> > 4-5 years ago we did have a CPU (one of mine, iirc) which would oops when
> > prefetching from a bad address. I forget what the conclusion was on that
> > matter.
> >
> > If we do want to fix the prefetch-from-outer-space then we should be using
> > cpu_isset(cpu, *cpumask) here rather than cpu_possible().
>
> Generally the prefetch things have turned out to be not that useful. How
> about dropping the prefetch? I kept it because it was there.

I don't recall anyone ever demonstrating that prefetch is useful in-kernel.

I think I've heard of situations where benefits have been seen in userspace
- if a loop does a lot of calculation on each datum which it fetches then
there's a good opportunity to pipeline the fetch with the on-core
crunching. But kernel doesn't do that sort of thing..

2007-11-27 23:48:57

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: Prevent dereferencing non-allocated per_cpu variables

On Tue, Nov 27, 2007 at 03:42:13PM -0800, Andrew Morton wrote:
> On Tue, 27 Nov 2007 15:22:56 -0800 (PST)
> Christoph Lameter <[email protected]> wrote:
>
> > On Tue, 27 Nov 2007, Andrew Morton wrote:
> >
> > > The prefetch however might still need some work - we can indeed do
> > > prefetch() against a not-possible CPU's memory here. And I do recall that
> > > 4-5 years ago we did have a CPU (one of mine, iirc) which would oops when
> > > prefetching from a bad address. I forget what the conclusion was on that
> > > matter.
> > >
> > > If we do want to fix the prefetch-from-outer-space then we should be using
> > > cpu_isset(cpu, *cpumask) here rather than cpu_possible().
> >
> > Generally the prefetch things have turned out to be not that useful. How
> > about dropping the prefetch? I kept it because it was there.
>
> I don't recall anyone ever demonstrating that prefetch is useful in-kernel.

It was demonstrated useful for some specific cases, like context switch early
fetch on IA64. But I agree the prefetch on each list_for_each() is probably
a bad idea and should be removed. Will also help code size.

The best strategy is probably to figure out which oprofile counters
to use and then do some profiling and only insert prefetches where
the profiler actually finds significant cache misses.

-Andi

2007-11-28 00:09:51

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: Prevent dereferencing non-allocated per_cpu variables

On Tue, 27 Nov 2007, Andrew Morton wrote:

> I don't recall anyone ever demonstrating that prefetch is useful in-kernel.

vmstat: remove prefetch

Remove the prefetch logic in order to avoid touching impossible per cpu
areas.

Signed-off-by: Christoph Lameter <[email protected]>

---
mm/vmstat.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)

Index: linux-2.6/mm/vmstat.c
===================================================================
--- linux-2.6.orig/mm/vmstat.c 2007-11-27 16:04:15.345713812 -0800
+++ linux-2.6/mm/vmstat.c 2007-11-27 16:07:00.552713192 -0800
@@ -21,21 +21,14 @@ EXPORT_PER_CPU_SYMBOL(vm_event_states);

static void sum_vm_events(unsigned long *ret, cpumask_t *cpumask)
{
- int cpu = 0;
+ int cpu;
int i;

memset(ret, 0, NR_VM_EVENT_ITEMS * sizeof(unsigned long));

- cpu = first_cpu(*cpumask);
- while (cpu < NR_CPUS) {
+ for_each_cpu_mask(cpu, *cpumask) {
struct vm_event_state *this = &per_cpu(vm_event_states, cpu);

- cpu = next_cpu(cpu, *cpumask);
-
- if (cpu < NR_CPUS)
- prefetch(&per_cpu(vm_event_states, cpu));
-
-
for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
ret[i] += this->event[i];
}

2007-11-28 00:15:54

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: Prevent dereferencing non-allocated per_cpu variables

On Wed, 28 Nov 2007, Andi Kleen wrote:

> It was demonstrated useful for some specific cases, like context switch early
> fetch on IA64. But I agree the prefetch on each list_for_each() is probably
> a bad idea and should be removed. Will also help code size.

Looks like sum_vm_events() is only ever called from all_vm_events().
Callers of all_vm_events():

App monitoring?
arch/s390/appldata/appldata_mem.c: all_vm_events(ev);

Leds:
drivers/parisc/led.c: all_vm_events(events);

proc out put for /proc/vmstat:
mm/vmstat.c: all_vm_events(e);

All of that does not seem to be performance critical