2006-11-30 09:03:29

by Jens Axboe

Subject: CPU hotplug broken with 2GB VMSPLIT

Hi,

Just got a new notebook (Lenovo X60), set up a custom kernel and then I
noticed that suspend to ram doesn't work anymore. The machine suspends
just fine, on resume it brings back the text display but reboots after
it has stalled for a few seconds. On the suggestion of Pavel, I tried
testing CPU hotplug, and indeed he was right: I can offline 1 of the
cores fine, bringing it back online freezes the machine for 3-4 seconds
and then reboots.

carl:/sys/devices/system/cpu/cpu1 # echo 0 > online
carl:/sys/devices/system/cpu/cpu1 # dmesg
Breaking affinity for irq 219
CPU 1 is now offline
SMP alternatives: switching to UP code
carl:/sys/devices/system/cpu/cpu1 # echo 1 > online
Read from remote host carl: Connection reset by peer

Booting with maxcpus=1 and resume works fine. Does this ring a bell with
anyone? With highmem enabled and the standard vmsplit, cpu hotplug works
fine for me.

--
Jens Axboe


2006-11-30 09:13:13

by Jens Axboe

Subject: Re: CPU hotplug broken with 2GB VMSPLIT

On Thu, Nov 30 2006, Jens Axboe wrote:
> Hi,
>
> Just got a new notebook (Lenovo X60), setup a custom kernel and then I
> noticed that suspend to ram doesn't work anymore. The machine suspends
> just fine, on resume it brings back the text display but reboots after
> it has stalled for a few seconds. On the suggestion of Pavel, I tried
> testing CPU hotplug, and indeed he was right: I can offline 1 of the
> cores fine, bringing it back online freezes the machine for 3-4 seconds
> and then reboots.
>
> carl:/sys/devices/system/cpu/cpu1 # echo 0 > online
> carl:/sys/devices/system/cpu/cpu1 # dmesg
> Breaking affinity for irq 219
> CPU 1 is now offline
> SMP alternatives: switching to UP code
> carl:/sys/devices/system/cpu/cpu1 # echo 1 > online
> Read from remote host carl: Connection reset by peer
>
> Booting with maxcpus=1 and resume works fine. Does this ring a bell with
> anyone? With highmem enabled and the standard vmsplit, cpu hotplug works
> fine for me.

Some more clues - booting with noreplacement doesn't fix it, so I think
the alternatives code is off the hook.

--
Jens Axboe

2006-11-30 16:44:08

by Nathan Lynch

Subject: Re: CPU hotplug broken with 2GB VMSPLIT

Jens Axboe wrote:
> On Thu, Nov 30 2006, Jens Axboe wrote:
> > Hi,
> >
> > Just got a new notebook (Lenovo X60), setup a custom kernel and then I
> > noticed that suspend to ram doesn't work anymore. The machine suspends
> > just fine, on resume it brings back the text display but reboots after
> > it has stalled for a few seconds. On the suggestion of Pavel, I tried
> > testing CPU hotplug, and indeed he was right: I can offline 1 of the
> > cores fine, bringing it back online freezes the machine for 3-4 seconds
> > and then reboots.
> >
> > carl:/sys/devices/system/cpu/cpu1 # echo 0 > online
> > carl:/sys/devices/system/cpu/cpu1 # dmesg
> > Breaking affinity for irq 219
> > CPU 1 is now offline
> > SMP alternatives: switching to UP code
> > carl:/sys/devices/system/cpu/cpu1 # echo 1 > online
> > Read from remote host carl: Connection reset by peer
> >
> > Booting with maxcpus=1 and resume works fine. Does this ring a bell with
> > anyone? With highmem enabled and the standard vmsplit, cpu hotplug works
> > fine for me.
>
> Some more clues - booting with noreplacement doesn't fix it, so I think
> the alternatives code is off the hook.

I don't think this adds any new information, but it has been open
awhile:

http://bugme.osdl.org/show_bug.cgi?id=6542

I was able to narrow it down to the vmsplit setting but I wasn't able
to debug it further.

2006-11-30 16:48:11

by Pavel Machek

Subject: Re: CPU hotplug broken with 2GB VMSPLIT

Hi!

> > Some more clues - booting with noreplacement doesn't fix it, so I think
> > the alternatives code is off the hook.
>
> I don't think this adds any new information, but it has been open
> awhile:
>
> http://bugme.osdl.org/show_bug.cgi?id=6542
>
> I was able to narrow it down to the vmsplit setting but I wasn't able
> to debug it further.

Can we at least add an error return or something? Or make CPU_HOTPLUG
depend on the "normal" VM split, or...? Having s2ram break because of a
known problem is annoying...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-11-30 17:21:09

by Nathan Lynch

Subject: Re: CPU hotplug broken with 2GB VMSPLIT

Nathan Lynch wrote:
> Jens Axboe wrote:
> > On Thu, Nov 30 2006, Jens Axboe wrote:
> > > Hi,
> > >
> > > Just got a new notebook (Lenovo X60), setup a custom kernel and then I
> > > noticed that suspend to ram doesn't work anymore. The machine suspends
> > > just fine, on resume it brings back the text display but reboots after
> > > it has stalled for a few seconds. On the suggestion of Pavel, I tried
> > > testing CPU hotplug, and indeed he was right: I can offline 1 of the
> > > cores fine, bringing it back online freezes the machine for 3-4 seconds
> > > and then reboots.
> > >
> > > carl:/sys/devices/system/cpu/cpu1 # echo 0 > online
> > > carl:/sys/devices/system/cpu/cpu1 # dmesg
> > > Breaking affinity for irq 219
> > > CPU 1 is now offline
> > > SMP alternatives: switching to UP code
> > > carl:/sys/devices/system/cpu/cpu1 # echo 1 > online
> > > Read from remote host carl: Connection reset by peer
> > >
> > > Booting with maxcpus=1 and resume works fine. Does this ring a bell with
> > > anyone? With highmem enabled and the standard vmsplit, cpu hotplug works
> > > fine for me.
> >
> > Some more clues - booting with noreplacement doesn't fix it, so I think
> > the alternatives code is off the hook.
>
> I don't think this adds any new information, but it has been open
> awhile:
>
> http://bugme.osdl.org/show_bug.cgi?id=6542
>
> I was able to narrow it down to the vmsplit setting but I wasn't able
> to debug it further.

Hmm, I'm pretty sure this is the same problem I reported in March;
there might be some more information in that thread:

http://marc.theaimsgroup.com/?t=114039363100002&r=1&w=1

but I didn't realize it was vmsplit-related at that time.

2006-12-01 02:49:13

by Shaohua Li

Subject: Re: CPU hotplug broken with 2GB VMSPLIT

On Fri, 2006-12-01 at 01:20 +0800, Nathan Lynch wrote:
> Nathan Lynch wrote:
> > Jens Axboe wrote:
> > > On Thu, Nov 30 2006, Jens Axboe wrote:
> > > > Hi,
> > > >
> > > > Just got a new notebook (Lenovo X60), setup a custom kernel and then I
> > > > noticed that suspend to ram doesn't work anymore. The machine suspends
> > > > just fine, on resume it brings back the text display but reboots after
> > > > it has stalled for a few seconds. On the suggestion of Pavel, I tried
> > > > testing CPU hotplug, and indeed he was right: I can offline 1 of the
> > > > cores fine, bringing it back online freezes the machine for 3-4 seconds
> > > > and then reboots.
> > > >
> > > > carl:/sys/devices/system/cpu/cpu1 # echo 0 > online
> > > > carl:/sys/devices/system/cpu/cpu1 # dmesg
> > > > Breaking affinity for irq 219
> > > > CPU 1 is now offline
> > > > SMP alternatives: switching to UP code
> > > > carl:/sys/devices/system/cpu/cpu1 # echo 1 > online
> > > > Read from remote host carl: Connection reset by peer
> > > >
> > > > Booting with maxcpus=1 and resume works fine. Does this ring a bell with
> > > > anyone? With highmem enabled and the standard vmsplit, cpu hotplug works
> > > > fine for me.
> > >
> > > Some more clues - booting with noreplacement doesn't fix it, so I think
> > > the alternatives code is off the hook.
> >
> > I don't think this adds any new information, but it has been open
> > awhile:
> >
> > http://bugme.osdl.org/show_bug.cgi?id=6542
> >
> > I was able to narrow it down to the vmsplit setting but I wasn't able
> > to debug it further.
>
> Hmm, I'm pretty sure this is the same problem I reported in March,
> there might be some more information in that thread:
>
> http://marc.theaimsgroup.com/?t=114039363100002&r=1&w=1
>
> but I didn't realize it was vmsplit-related at that time.
Does this patch help?

Thanks,
Shaohua

Signed-off-by: Shaohua Li <[email protected]>

diff --git a/arch/i386/kernel/smpboot.c b/arch/i386/kernel/smpboot.c
index 4bb8b77..b5f562e 100644
--- a/arch/i386/kernel/smpboot.c
+++ b/arch/i386/kernel/smpboot.c
@@ -1095,7 +1095,7 @@ static int __cpuinit __smp_prepare_cpu(i

 	/* init low mem mapping */
 	clone_pgd_range(swapper_pg_dir, swapper_pg_dir + USER_PGD_PTRS,
-			KERNEL_PGD_PTRS);
+			min_t(unsigned long, KERNEL_PGD_PTRS, USER_PGD_PTRS));
 	flush_tlb_all();
 	schedule_work(&task);
 	wait_for_completion(&done);

2006-12-01 07:12:15

by Jens Axboe

Subject: Re: CPU hotplug broken with 2GB VMSPLIT

On Fri, Dec 01 2006, Shaohua Li wrote:
> On Fri, 2006-12-01 at 01:20 +0800, Nathan Lynch wrote:
> > Nathan Lynch wrote:
> > > Jens Axboe wrote:
> > > > On Thu, Nov 30 2006, Jens Axboe wrote:
> > > > > Hi,
> > > > >
> > > > > > Just got a new notebook (Lenovo X60), setup a custom kernel and then I
> > > > > > noticed that suspend to ram doesn't work anymore. The machine suspends
> > > > > > just fine, on resume it brings back the text display but reboots after
> > > > > > it has stalled for a few seconds. On the suggestion of Pavel, I tried
> > > > > > testing CPU hotplug, and indeed he was right: I can offline 1 of the
> > > > > > cores fine, bringing it back online freezes the machine for 3-4 seconds
> > > > > > and then reboots.
> > > > >
> > > > > carl:/sys/devices/system/cpu/cpu1 # echo 0 > online
> > > > > carl:/sys/devices/system/cpu/cpu1 # dmesg
> > > > > Breaking affinity for irq 219
> > > > > CPU 1 is now offline
> > > > > SMP alternatives: switching to UP code
> > > > > carl:/sys/devices/system/cpu/cpu1 # echo 1 > online
> > > > > Read from remote host carl: Connection reset by peer
> > > > >
> > > > > > Booting with maxcpus=1 and resume works fine. Does this ring a bell with
> > > > > > anyone? With highmem enabled and the standard vmsplit, cpu hotplug works
> > > > > > fine for me.
> > > >
> > > > > Some more clues - booting with noreplacement doesn't fix it, so I think
> > > > > the alternatives code is off the hook.
> > >
> > > I don't think this adds any new information, but it has been open
> > > awhile:
> > >
> > > http://bugme.osdl.org/show_bug.cgi?id=6542
> > >
> > > > I was able to narrow it down to the vmsplit setting but I wasn't able
> > > > to debug it further.
> >
> > Hmm, I'm pretty sure this is the same problem I reported in March,
> > there might be some more information in that thread:
> >
> > http://marc.theaimsgroup.com/?t=114039363100002&r=1&w=1
> >
> > but I didn't realize it was vmsplit-related at that time.
> Does this patch help?

It does! I booted a vmsplit 2G kernel again and started by offlining
and then onlining CPU1. That worked fine, where it would previously
reboot the notebook. Then I did a suspend-to-ram -> resume cycle, which
worked fine as well. Thanks!

--
Jens Axboe