2005-12-08 05:07:45

by Dave Jones

[permalink] [raw]
Subject: for_each_online_cpu broken ?

Whilst debugging a memory leak, I hit sysrq meminfo,
and got hot/cold info for CONFIG_NR_CPUS rather than 4 cpus
I fixed a bug recently in mm/page_alloc.c to change this from
a for_each_cpu to a for_each_online_cpu and I'm pretty certain
I tested that it worked, but for reasons unknown, it's now
misbehaving again.

I've only tried reproducing this on x86-64 so far.

Dave


2005-12-08 05:26:39

by Andi Kleen

[permalink] [raw]
Subject: Re: for_each_online_cpu broken ?

On Thu, Dec 08, 2005 at 12:07:39AM -0500, Dave Jones wrote:
> Whilst debugging a memory leak, I hit sysrq meminfo,
> and got hot/cold info for CONFIG_NR_CPUS rather than 4 cpus
> I fixed a bug recently in mm/page_alloc.c to change this from
> a for_each_cpu to a for_each_online_cpu and I'm pretty certain
> I tested that it worked, but for reasons unknown, it's now
> misbehaving again.
>
> I've only tried reproducing this on x86-64 so far.

If the online map is wrong all kinds of things would go wrong.

Most likely your kernel doesn't have the fix.

The possible map is fixed kind of BTW in 2.6.15rc*. It was a side effect
of CPU hotplug, which now uses a better algorithm to guess the
number of possible CPUs. In 2.6.15 you will just get half the number
of available CPUs in addition by default

-Andi

2005-12-08 05:33:09

by Dave Jones

[permalink] [raw]
Subject: Re: for_each_online_cpu broken ?

On Thu, Dec 08, 2005 at 06:26:32AM +0100, Andi Kleen wrote:

Hi Andi,

> > Whilst debugging a memory leak, I hit sysrq meminfo,
> > and got hot/cold info for CONFIG_NR_CPUS rather than 4 cpus
> >
> > I've only tried reproducing this on x86-64 so far.
>
> If the online map is wrong all kinds of things would go wrong.
>
> Most likely your kernel doesn't have the fix.

This was seen with a .15rc5-git1 kernel.
Is this something still living in your x86-64 patchset or -mm ?

> The possible map is fixed kind of BTW in 2.6.15rc*. It was a side effect
> of CPU hotplug, which now uses a better algorithm to guess the
> number of possible CPUs. In 2.6.15 you will just get half the number
> of available CPUs in addition by default

Yep, I noticed it offers a maximum of 6 cpus on my way.
As a sidenote, seems kinda funny (and wasteful maybe?), doing this
on a lot of hardware that isn't hotplug capable. (Whilst I could
disable cpu hotplug in my local build, this isn't an answer for
a generic distro kernel).

Dave

2005-12-08 05:38:32

by David Miller

[permalink] [raw]
Subject: Re: for_each_online_cpu broken ?

From: Dave Jones <[email protected]>
Date: Thu, 8 Dec 2005 00:33:02 -0500

> On Thu, Dec 08, 2005 at 06:26:32AM +0100, Andi Kleen wrote:
>
> > The possible map is fixed kind of BTW in 2.6.15rc*. It was a side effect
> > of CPU hotplug, which now uses a better algorithm to guess the
> > number of possible CPUs. In 2.6.15 you will just get half the number
> > of available CPUs in addition by default
>
> Yep, I noticed it offers a maximum of 6 cpus on my way.
> As a sidenote, seems kinda funny (and wasteful maybe?), doing this
> on a lot of hardware that isn't hotplug capable. (Whilst I could
> disable cpu hotplug in my local build, this isn't an answer for
> a generic distro kernel).

This can be dangerous btw, as some subsystem such as netfilter
allocate enormous datastructures based upon the largest possible
cpu number in the system.

In 2.6.16 it will use something a bit more intelligent, but
overestimating the possible cpu set can be quite a waste.

2005-12-08 06:12:22

by Andi Kleen

[permalink] [raw]
Subject: Re: for_each_online_cpu broken ?

On Wed, Dec 07, 2005 at 09:38:25PM -0800, David S. Miller wrote:
> From: Dave Jones <[email protected]>
> Date: Thu, 8 Dec 2005 00:33:02 -0500
>
> > On Thu, Dec 08, 2005 at 06:26:32AM +0100, Andi Kleen wrote:
> >
> > > The possible map is fixed kind of BTW in 2.6.15rc*. It was a side effect
> > > of CPU hotplug, which now uses a better algorithm to guess the
> > > number of possible CPUs. In 2.6.15 you will just get half the number
> > > of available CPUs in addition by default
> >
> > Yep, I noticed it offers a maximum of 6 cpus on my way.
> > As a sidenote, seems kinda funny (and wasteful maybe?), doing this
> > on a lot of hardware that isn't hotplug capable. (Whilst I could
> > disable cpu hotplug in my local build, this isn't an answer for
> > a generic distro kernel).

If you can figure out a way to detect this please share.
The ACPI designers unfortunately didn't think that far
(they did it right for memory hotplug, but not for CPU)

I invented an ACPI extensin for it, but it's non standard
so the half of CPUs is used as a default unless overwritten
(additional_cpus=NUM)

Anyways I changed it earlier to 1 additional CPU by default.

>
> This can be dangerous btw, as some subsystem such as netfilter
> allocate enormous datastructures based upon the largest possible
> cpu number in the system.

It is a big improvement in fact. Previously with NR_CPUS==128
(default on some distros) and CPU_HOTPLUG enabled it would
always allocate several hundred KB of just standard
per cpu data unnecessarily.

-Andi

2005-12-08 06:27:54

by Dave Jones

[permalink] [raw]
Subject: Re: for_each_online_cpu broken ?

On Thu, Dec 08, 2005 at 07:12:12AM +0100, Andi Kleen wrote:
> On Wed, Dec 07, 2005 at 09:38:25PM -0800, David S. Miller wrote:
> > From: Dave Jones <[email protected]>
> > Date: Thu, 8 Dec 2005 00:33:02 -0500
> >
> > > On Thu, Dec 08, 2005 at 06:26:32AM +0100, Andi Kleen wrote:
> > >
> > > > The possible map is fixed kind of BTW in 2.6.15rc*. It was a side effect
> > > > of CPU hotplug, which now uses a better algorithm to guess the
> > > > number of possible CPUs. In 2.6.15 you will just get half the number
> > > > of available CPUs in addition by default
> > >
> > > Yep, I noticed it offers a maximum of 6 cpus on my way.
> > > As a sidenote, seems kinda funny (and wasteful maybe?), doing this
> > > on a lot of hardware that isn't hotplug capable. (Whilst I could
> > > disable cpu hotplug in my local build, this isn't an answer for
> > > a generic distro kernel).
>
> If you can figure out a way to detect this please share.
> The ACPI designers unfortunately didn't think that far
> (they did it right for memory hotplug, but not for CPU)
>
> I invented an ACPI extensin for it, but it's non standard
> so the half of CPUs is used as a default unless overwritten
> (additional_cpus=NUM)
>
> Anyways I changed it earlier to 1 additional CPU by default.

Just guessing seems to be pretty guaranteed to give the wrong answer.
I think it makes more sense to say "if your BIOS doesn't give
the relevant info (as is usually the case), boot with additional_cpus)

Penalising the many for the needs of the few just seems wrong.

Dave

2005-12-08 06:25:39

by Nigel Cunningham

[permalink] [raw]
Subject: Re: for_each_online_cpu broken ?

Hi.

On Thu, 2005-12-08 at 15:33, Dave Jones wrote:
> On Thu, Dec 08, 2005 at 06:26:32AM +0100, Andi Kleen wrote:
>
> Hi Andi,
>
> > > Whilst debugging a memory leak, I hit sysrq meminfo,
> > > and got hot/cold info for CONFIG_NR_CPUS rather than 4 cpus
> > >
> > > I've only tried reproducing this on x86-64 so far.
> >
> > If the online map is wrong all kinds of things would go wrong.
> >
> > Most likely your kernel doesn't have the fix.
>
> This was seen with a .15rc5-git1 kernel.
> Is this something still living in your x86-64 patchset or -mm ?
>
> > The possible map is fixed kind of BTW in 2.6.15rc*. It was a side effect
> > of CPU hotplug, which now uses a better algorithm to guess the
> > number of possible CPUs. In 2.6.15 you will just get half the number
> > of available CPUs in addition by default
>
> Yep, I noticed it offers a maximum of 6 cpus on my way.
> As a sidenote, seems kinda funny (and wasteful maybe?), doing this
> on a lot of hardware that isn't hotplug capable. (Whilst I could
> disable cpu hotplug in my local build, this isn't an answer for
> a generic distro kernel).

Both suspend to disk (and suspend to ram?) implementations now depend on
hotplug_cpu to enable extra cpus, so there is at least one reason for
them to want hotplug support in a generic kernel.

Regards,

Nigel

2005-12-08 06:28:53

by Dave Jones

[permalink] [raw]
Subject: Re: for_each_online_cpu broken ?

On Thu, Dec 08, 2005 at 04:22:05PM +1000, Nigel Cunningham wrote:

> > Yep, I noticed it offers a maximum of 6 cpus on my way.
> > As a sidenote, seems kinda funny (and wasteful maybe?), doing this
> > on a lot of hardware that isn't hotplug capable. (Whilst I could
> > disable cpu hotplug in my local build, this isn't an answer for
> > a generic distro kernel).
>
> Both suspend to disk (and suspend to ram?) implementations now depend on
> hotplug_cpu to enable extra cpus, so there is at least one reason for
> them to want hotplug support in a generic kernel.

You mean suspend -> plug in a new cpu -> resume transitions ?
That sounds *terrifying*.

Dave

2005-12-08 06:30:22

by Andi Kleen

[permalink] [raw]
Subject: Re: for_each_online_cpu broken ?

On Thu, Dec 08, 2005 at 01:28:44AM -0500, Dave Jones wrote:
> On Thu, Dec 08, 2005 at 04:22:05PM +1000, Nigel Cunningham wrote:
>
> > > Yep, I noticed it offers a maximum of 6 cpus on my way.
> > > As a sidenote, seems kinda funny (and wasteful maybe?), doing this
> > > on a lot of hardware that isn't hotplug capable. (Whilst I could
> > > disable cpu hotplug in my local build, this isn't an answer for
> > > a generic distro kernel).
> >
> > Both suspend to disk (and suspend to ram?) implementations now depend on
> > hotplug_cpu to enable extra cpus, so there is at least one reason for
> > them to want hotplug support in a generic kernel.
>
> You mean suspend -> plug in a new cpu -> resume transitions ?
> That sounds *terrifying*.

No, they unplug all non BP CPUs before suspending.
It obviously doesn't require additional CPUs though.


-Andi

2005-12-08 06:30:52

by Andi Kleen

[permalink] [raw]
Subject: Re: for_each_online_cpu broken ?

> This was seen with a .15rc5-git1 kernel.
> Is this something still living in your x86-64 patchset or -mm ?

Only the change for less additional CPUs.

-Andi

2005-12-08 06:43:30

by Dave Jones

[permalink] [raw]
Subject: Re: for_each_online_cpu broken ?

On Thu, Dec 08, 2005 at 07:30:50AM +0100, Andi Kleen wrote:
> > This was seen with a .15rc5-git1 kernel.
> > Is this something still living in your x86-64 patchset or -mm ?
>
> Only the change for less additional CPUs.

Ok, then I'm at a loss why it's dumping state for NR_CPUS cpus.
I'm away tomorrow, but I'll poke at it some more on Friday.

Dave

2005-12-09 00:06:58

by Nigel Cunningham

[permalink] [raw]
Subject: Re: for_each_online_cpu broken ?

Hi.

On Thu, 2005-12-08 at 16:28, Dave Jones wrote:
> On Thu, Dec 08, 2005 at 04:22:05PM +1000, Nigel Cunningham wrote:
>
> > > Yep, I noticed it offers a maximum of 6 cpus on my way.
> > > As a sidenote, seems kinda funny (and wasteful maybe?), doing this
> > > on a lot of hardware that isn't hotplug capable. (Whilst I could
> > > disable cpu hotplug in my local build, this isn't an answer for
> > > a generic distro kernel).
> >
> > Both suspend to disk (and suspend to ram?) implementations now depend on
> > hotplug_cpu to enable extra cpus, so there is at least one reason for
> > them to want hotplug support in a generic kernel.
>
> You mean suspend -> plug in a new cpu -> resume transitions ?
> That sounds *terrifying*.

Andi is right, it's just a logical unplug. But having said that, I
suppose extra cpus could be plugged/unplugged during a suspend to disk.
Not that I've ever tried it. I have a real SMP mobo, but haven't had the
opportunity to fire it up.

Regards,

Nigel