2006-02-09 08:41:45

by Chuck Ebbert

[permalink] [raw]
Subject: Re: [PATCH] percpu data: only iterate over possible CPUs

In-Reply-To: <[email protected]>

On Wed, 8 Feb 2006 at 20:45:02 -0800, Andrew Morton wrote:

> > Its even documented in line 332 of include/linux/cpumask.h
> >
> > * #ifdef CONFIG_HOTPLUG_CPU
> > * cpu_possible_map - all NR_CPUS bits set
>
> That seems a quite bad idea. If we know which CPUs are possible we should
> populate cpu_possible_map correctly, whether or not CONFIG_HOTPLUG_CPU is
> set.

I don't think that's, um, "possible." Even if you could discover how many
empty sockets there were in a system, someone might be able to hotplug
a board with more of them on it. And there's no way to tell how many CPUs
to reserve for each socket anyway, e.g. AMD has already announced quad-core
processors.

But what really surprised me is that for_each_cpu() actually walks
cpu_possible_map and not cpu_present_map as I had assumed. This violates
the Principle Of Least Surprise. Maybe renaming for_each_cpu to
for_each_possible_cpu might be a good idea?

--
Chuck
"Equations are the Devil's sentences." --Stephen Colbert


2006-02-09 08:56:59

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH] percpu data: only iterate over possible CPUs

Chuck wrote:
> I don't think that's, um, "possible." Even if you could discover how many
> empty sockets there were in a system, someone might be able to hotplug
> a board with more of them on it.

That's going to depend on your system hardware configuration.

cpu_possible_map should be whatever is the largest set of
cpus you might possibly want to deal with plugging in.

Some hardware configurations will support more addition
of hardware cpus than others.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-02-09 09:08:21

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] percpu data: only iterate over possible CPUs

Chuck Ebbert <[email protected]> wrote:
>
> In-Reply-To: <[email protected]>
>
> On Wed, 8 Feb 2006 at 20:45:02 -0800, Andrew Morton wrote:
>
> > > Its even documented in line 332 of include/linux/cpumask.h
> > >
> > > * #ifdef CONFIG_HOTPLUG_CPU
> > > * cpu_possible_map - all NR_CPUS bits set
> >
> > That seems a quite bad idea. If we know which CPUs are possible we should
> > populate cpu_possible_map correctly, whether or not CONFIG_HOTPLUG_CPU is
> > set.
>
> I don't think that's, um, "possible." Even if you could discover how many
> empty sockets there were in a system, someone might be able to hotplug
> a board with more of them on it. And there's no way to tell how many CPUs
> to reserve for each socket anyway, e.g. AMD has already announced quad-core
> processors.

Well maybe. But it's awfully presumptuous for us to say that no platform
will be capable of telling us what its maximum number of CPUs is, or even
whether certain CPUs within that range aren't possible.

<checks>

Yup, on my 2-way x86 test box, with NR_CPUS=16 we have
cpu_possible_map=0xffff.

That's just insane - the default setting for a distro kernel should be (or
will become) NR_CPUS=lots, HOTPLUG_CPU=y. All those for_each_cpu() loops
are iterating across 16 CPUs.

aargh.

> But what really surprised me is that for_each_cpu() actually walks
> cpu_possible_map and not cpu_present_map as I had assumed. This violates
> the Principle Of Least Surprise. Maybe renaming for_each_cpu to
> for_each_possible_cpu might be a good idea?

That would make sense.

2006-02-09 09:12:23

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] percpu data: only iterate over possible CPUs

Andrew Morton <[email protected]> wrote:
>
> aargh.
>

Actually, x86 appears to be the only arch which suffers this braindamage.
The rest use CPU_MASK_NONE (or just forget to initialise it and hope that
CPU_MASK_NONE equals all-zeroes).

2006-02-09 10:08:45

by Heiko Carstens

[permalink] [raw]
Subject: Re: [PATCH] percpu data: only iterate over possible CPUs

> > aargh.
> Actually, x86 appears to be the only arch which suffers this braindamage.
> The rest use CPU_MASK_NONE (or just forget to initialise it and hope that
> CPU_MASK_NONE equals all-zeroes).

s390 will join, as soon as the cpu_possible_map fix is merged...

Heiko

2006-02-09 10:20:28

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] percpu data: only iterate over possible CPUs

Heiko Carstens <[email protected]> wrote:
>
> > > aargh.
> > Actually, x86 appears to be the only arch which suffers this braindamage.
> > The rest use CPU_MASK_NONE (or just forget to initialise it and hope that
> > CPU_MASK_NONE equals all-zeroes).
>
> s390 will join, as soon as the cpu_possible_map fix is merged...

What cpu_possible_map fix?

2006-02-09 10:23:28

by Heiko Carstens

[permalink] [raw]
Subject: Re: [PATCH] percpu data: only iterate over possible CPUs

> > > > aargh.
> > > Actually, x86 appears to be the only arch which suffers this braindamage.
> > > The rest use CPU_MASK_NONE (or just forget to initialise it and hope that
> > > CPU_MASK_NONE equals all-zeroes).
> >
> > s390 will join, as soon as the cpu_possible_map fix is merged...
>
> What cpu_possible_map fix?

This one:

http://lkml.org/lkml/2006/2/8/162

Heiko

2006-02-09 10:32:19

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] percpu data: only iterate over possible CPUs

Heiko Carstens <[email protected]> wrote:
>
> > > > > aargh.
> > > > Actually, x86 appears to be the only arch which suffers this braindamage.
> > > > The rest use CPU_MASK_NONE (or just forget to initialise it and hope that
> > > > CPU_MASK_NONE equals all-zeroes).
> > >
> > > s390 will join, as soon as the cpu_possible_map fix is merged...
> >
> > What cpu_possible_map fix?
>
> This one:
>
> http://lkml.org/lkml/2006/2/8/162
>

Oh, OK. Ow, I don't think you want to do that. It means that all those
for_each_cpu() loops will now be iterating over all NR_CPUS cpus, whether
or not they're even possible.

2006-02-09 11:47:09

by Heiko Carstens

[permalink] [raw]
Subject: Re: [PATCH] percpu data: only iterate over possible CPUs

> > > > > Actually, x86 appears to be the only arch which suffers this braindamage.
> > > > > The rest use CPU_MASK_NONE (or just forget to initialise it and hope that
> > > > > CPU_MASK_NONE equals all-zeroes).
> > > >
> > > > s390 will join, as soon as the cpu_possible_map fix is merged...
> > >
> > > What cpu_possible_map fix?
> >
> > This one:
> >
> > http://lkml.org/lkml/2006/2/8/162
> >
>
> Oh, OK. Ow, I don't think you want to do that. It means that all those
> for_each_cpu() loops will now be iterating over all NR_CPUS cpus, whether
> or not they're even possible.

That's ok. We're mainly running under z/VM where you can attach new virtual
cpus on the fly to the virtual machine (up to 64 cpus).
The only difference to before is that it was possible to limit the waste of
resources by passing a number with 'maxcpus'. This value was used to generate
the cpu_possible_map.
But since the map needs to be ready when we return from setup_arch, we don't
have access to max_cpus, unless we parse commandline on our own...

Heiko

2006-02-09 12:46:40

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] percpu data: only iterate over possible CPUs

Heiko Carstens a ?crit :
>>>>> > Actually, x86 appears to be the only arch which suffers this braindamage.
>>>>> > The rest use CPU_MASK_NONE (or just forget to initialise it and hope that
>>>>> > CPU_MASK_NONE equals all-zeroes).
>>>>>
>>>>> s390 will join, as soon as the cpu_possible_map fix is merged...
>>>> What cpu_possible_map fix?
>>> This one:
>>>
>>> http://lkml.org/lkml/2006/2/8/162
>>>
>> Oh, OK. Ow, I don't think you want to do that. It means that all those
>> for_each_cpu() loops will now be iterating over all NR_CPUS cpus, whether
>> or not they're even possible.
>
> That's ok. We're mainly running under z/VM where you can attach new virtual
> cpus on the fly to the virtual machine (up to 64 cpus).
> The only difference to before is that it was possible to limit the waste of
> resources by passing a number with 'maxcpus'. This value was used to generate
> the cpu_possible_map.
> But since the map needs to be ready when we return from setup_arch, we don't
> have access to max_cpus, unless we parse commandline on our own...
>

Then it's OK to clear bits from cpu_possible_map once you have max_cpus value

for (cpu = max_cpus ; cpu < NR_CPUS ; cpu++)
cpu_clear(cpu, cpu_possible_map);

2006-02-09 13:12:30

by Heiko Carstens

[permalink] [raw]
Subject: Re: [PATCH] percpu data: only iterate over possible CPUs

> >>Oh, OK. Ow, I don't think you want to do that. It means that all those
> >>for_each_cpu() loops will now be iterating over all NR_CPUS cpus, whether
> >>or not they're even possible.
> >That's ok. We're mainly running under z/VM where you can attach new virtual
> >cpus on the fly to the virtual machine (up to 64 cpus).
> >The only difference to before is that it was possible to limit the waste of
> >resources by passing a number with 'maxcpus'. This value was used to generate
> >the cpu_possible_map.
> >But since the map needs to be ready when we return from setup_arch, we don't
> >have access to max_cpus, unless we parse commandline on our own...
>
> Then it's OK to clear bits from cpu_possible_map once you have max_cpus value
>
> for (cpu = max_cpus ; cpu < NR_CPUS ; cpu++)
> cpu_clear(cpu, cpu_possible_map);

Hmm... I don't think the semantics of cpu_possible_map allow to change it.
Also any code that uses for_each_cpu() may allocate memory _before_
cpu_possible_map is changed back to reflect a smaller number of cpus.
Doesn't look like the correct way to fix this.
Thinking a bit longer this was probably a reason why initialization of
this map was done in smp_prepare_cpus() before it silently moved to
setup_arch().

Heiko

2006-02-09 13:57:04

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] percpu data: only iterate over possible CPUs

Heiko Carstens a ?crit :
>> for (cpu = max_cpus ; cpu < NR_CPUS ; cpu++)
>> cpu_clear(cpu, cpu_possible_map);
>
> Hmm... I don't think the semantics of cpu_possible_map allow to change it.
> Also any code that uses for_each_cpu() may allocate memory _before_
> cpu_possible_map is changed back to reflect a smaller number of cpus.
> Doesn't look like the correct way to fix this.
> Thinking a bit longer this was probably a reason why initialization of
> this map was done in smp_prepare_cpus() before it silently moved to
> setup_arch().

Hum... of course you may loose some bits of memory (percpu data for example)
but clearing a cpu in cpu_possible_map is allowed.

See arch/alpha/kernel/process.c and arch/x86_64/kernel/smpboot.c for uses of
cpu_clear()