2008-08-06 20:11:52

by Max.Krasnyansky

Subject: [PATCH] Resurrect proper handling of maxcpus= kernel option

From: Max Krasnyansky <[email protected]>

For some reason we had redundant parsers registered for maxcpus=: one in
init/main.c and another in arch/x86/kernel/smpboot.c. So I nuked the one
in arch/x86.

Also, 64-bit kernels used to handle maxcpus= as documented in
Documentation/cpu-hotplug.txt: CPUs with 'id > maxcpus' are initialized
but not booted. The 32-bit version, for some reason, ignored them even
though all the infrastructure for booting them later is there.
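To make the documented semantics concrete, here is a minimal user-space sketch. It is not kernel code: `model_boot` and `cpumask_model_t` are hypothetical names, and CPU ids are assumed to be contiguous from 0. Every enumerated CPU ends up present (initialized), but only the first maxcpus of them are booted.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical user-space model of the maxcpus= semantics documented in
 * Documentation/cpu-hotplug.txt; not kernel code. Bit n of each mask
 * stands for logical CPU n (ids assumed contiguous from 0). */
typedef unsigned long cpumask_model_t;

/* Every enumerated CPU is marked present (initialized), but only the
 * first maxcpus of them are booted (marked online). The rest stay
 * present-but-offline and can be onlined later via CPU hotplug. */
static void model_boot(unsigned int ncpus, unsigned int maxcpus,
                       cpumask_model_t *present, cpumask_model_t *online)
{
    unsigned int cpu;
    unsigned int booted = 0;

    /* The masks are single words, so the model only covers up to
     * BITS_PER_LONG CPUs. */
    assert(ncpus <= sizeof(cpumask_model_t) * CHAR_BIT);
    *present = 0;
    *online = 0;
    for (cpu = 0; cpu < ncpus; cpu++) {
        *present |= 1UL << cpu;
        if (booted < maxcpus) {
            *online |= 1UL << cpu;
            booted++;
        }
    }
}
```

With 4 CPUs and maxcpus=1 the present mask comes out as 0xf and the online mask as 0x1, so cpu1..cpu3 remain available for onlining later.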

In current mainline both the 64-bit and 32-bit versions are broken. I'm
too lazy to look through the git history, but I'm guessing it happened as
part of the i386 and x86_64 unification.

This patch restores the correct behaviour. I've tested the x86_64 version
on 4- and 8-way Core2 and 2-way Opteron based machines, with various config
combinations (SMP, !SMP, CPU_HOTPLUG, !CPU_HOTPLUG), booting with
maxcpus=1, maxcpus=4, etc. Everything works as expected.

I cannot test the 32-bit version (no 32-bit machines here).

Signed-off-by: Max Krasnyansky <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
arch/x86/kernel/apic_32.c | 8 --------
arch/x86/kernel/apic_64.c | 7 -------
arch/x86/kernel/smpboot.c | 14 --------------
3 files changed, 0 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/apic_32.c b/arch/x86/kernel/apic_32.c
index d6c8983..71d49cc 100644
--- a/arch/x86/kernel/apic_32.c
+++ b/arch/x86/kernel/apic_32.c
@@ -1454,8 +1454,6 @@ void disconnect_bsp_APIC(int virt_wire_setup)
}
}

-unsigned int __cpuinitdata maxcpus = NR_CPUS;
-
void __cpuinit generic_processor_info(int apicid, int version)
{
int cpu;
@@ -1482,12 +1480,6 @@ void __cpuinit generic_processor_info(int apicid, int version)
return;
}

- if (num_processors >= maxcpus) {
- printk(KERN_WARNING "WARNING: maxcpus limit of %i reached."
- " Processor ignored.\n", maxcpus);
- return;
- }
-
num_processors++;
cpus_complement(tmp_map, cpu_present_map);
cpu = first_cpu(tmp_map);
diff --git a/arch/x86/kernel/apic_64.c b/arch/x86/kernel/apic_64.c
index 7f1f030..446c062 100644
--- a/arch/x86/kernel/apic_64.c
+++ b/arch/x86/kernel/apic_64.c
@@ -90,7 +90,6 @@ static unsigned long apic_phys;

unsigned long mp_lapic_addr;

-unsigned int __cpuinitdata maxcpus = NR_CPUS;
/*
* Get the LAPIC version
*/
@@ -1062,12 +1061,6 @@ void __cpuinit generic_processor_info(int apicid, int version)
return;
}

- if (num_processors >= maxcpus) {
- printk(KERN_WARNING "WARNING: maxcpus limit of %i reached."
- " Processor ignored.\n", maxcpus);
- return;
- }
-
num_processors++;
cpus_complement(tmp_map, cpu_present_map);
cpu = first_cpu(tmp_map);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3325127..4fcce79 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1370,17 +1370,3 @@ void __cpu_die(unsigned int cpu)
BUG();
}
#endif
-
-/*
- * If the BIOS enumerates physical processors before logical,
- * maxcpus=N at enumeration-time can be used to disable HT.
- */
-static int __init parse_maxcpus(char *arg)
-{
- extern unsigned int maxcpus;
-
- if (arg)
- maxcpus = simple_strtoul(arg, NULL, 0);
- return 0;
-}
-early_param("maxcpus", parse_maxcpus);
--
1.5.5.1


2008-08-06 20:23:44

by Max Krasnyansky

Subject: Re: [PATCH] Resurrect proper handling of maxcpus= kernel option

> From: Max Krasnyansky <[email protected]>

I just realized that I managed to screw up my From: email address on the
git-send-email command line. Please reply to [email protected].

Max

2008-08-07 04:02:13

by Li Zefan

Subject: Re: [PATCH] Resurrect proper handling of maxcpus= kernel option

[email protected] wrote:
> I cannot test 32-bit version (no 32-bit machines here).
>

I booted my 2-core x86_32 box with maxcpus=1 and saw that cpu1 was offline,
and then I got a soft lockup BUG immediately when I onlined cpu1:

SMP alternatives: switching to SMP code
CPU 1 irqstacks, hard=c078c000 soft=c076c000
Booting processor 1/1 ip 6000
Initializing CPU#1
Calibrating delay using timer specific routine.. 5600.37 BogoMIPS (lpj=2800188)
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel P4/Xeon Extended MCE MSRs (24) available
CPU1: Thermal monitoring enabled
CPU1: Intel(R) Pentium(R) D CPU 2.80GHz stepping 04
checking TSC synchronization [CPU#0 -> CPU#1]: passed.
Switched to high resolution mode on CPU 1
BUG: soft lockup - CPU#1 stuck for 216s! [events/0:0]
Modules linked in: bridge stp llc autofs4 dm_mirror dm_log dm_mod snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_pcm snd_timer snd soundcore r8169 snd_page_alloc sg button sata_sis pata_sis ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
irq event stamp: 156
hardirqs last enabled at (155): [<c044407f>] trace_hardirqs_on+0xb/0xd
hardirqs last disabled at (156): [<c04eee88>] trace_hardirqs_off_thunk+0xc/0x10
softirqs last enabled at (152): [<c042c2f3>] __do_softirq+0xe3/0xe9
softirqs last disabled at (95): [<c04058eb>] do_softirq+0x65/0xb4

Pid: 0, comm: events/0 Not tainted (2.6.27-rc1 #224)
EIP: 0060:[<c04088ba>] EFLAGS: 00000246 CPU: 1
EIP is at mwait_idle+0x3c/0x4a
EAX: 00000000 EBX: e3e48008 ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: e3e48f9c ESP: e3e48f98
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: 00000000 CR3: 00768000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
[<c0402591>] cpu_idle+0xbf/0xdf
[<c05fb737>] start_secondary+0x16b/0x170
=======================


216s should be the time since the machine booted up.


(Maybe off-topic.)
I have never succeeded in offlining cpu1; the kernel hangs
whenever I offline it.

2008-08-07 05:48:51

by Jeff Chua

Subject: Re: [PATCH] Resurrect proper handling of maxcpus= kernel option

On Thu, Aug 7, 2008 at 12:00 PM, Li Zefan <[email protected]> wrote:

> I booted my 2-core x86_32 box with maxcpus=1, and saw cpu1 was offline,

Tested on a Lenovo X60s (Core Duo, 2 CPUs) with maxcpus=1, and a Dell R900
(4x quad-core Xeon, 16 CPUs) with maxcpus=8.

All working. I can switch CPUs online and offline, up to 2 CPUs on the X60s
and 16 CPUs on the R900.

> and then I got softlockup BUG immediately when I onlined cpu1:

I'm using the latest Linux kernel.


Thanks,
Jeff.

2008-08-07 17:21:26

by Max Krasnyansky

Subject: Re: [PATCH] Resurrect proper handling of maxcpus= kernel option

Li Zefan wrote:
> I booted my 2-core x86_32 box with maxcpus=1, and saw cpu1 was offline,
> and then I got softlockup BUG immediately when I onlined cpu1:
> [...]

This is unrelated to the patch that I sent. In fact, it looks like the patch
actually worked for you, in the sense that it did the right thing: it
initialized the cpus but did not boot them.

As far as the soft lockup goes, you might want to try different configs,
i.e. disable features you do not need. For example, the cpusets hotplug path
in current mainline is unsafe (the patch is in review). Also, for me,
onlining a cpu causes an immediate reboot if ftrace is enabled. So I'd say
start disabling features and see which one causes the problem.

Max

2008-08-07 17:23:20

by Max Krasnyansky

Subject: Re: [PATCH] Resurrect proper handling of maxcpus= kernel option

Jeff Chua wrote:
> On Thu, Aug 7, 2008 at 12:00 PM, Li Zefan <[email protected]> wrote:
>
>> I booted my 2-core x86_32 box with maxcpus=1, and saw cpu1 was offline,
>
> Test on Lenovo X60s (CoreDuo) 2 CPUs with maxcpus=1, and Dell R900
> (Xeon QuadCore x4) 16 CPUs with maxcpus=8.
>
> All working. I can switch online and offline up to 2CPUs on X60s, and
> 16CPUs on R900.
>
>> and then I got softlockup BUG immediately when I onlined cpu1:
>
> I'm using latest linux kernel.

Excellent.

Andrew, Ingo, can one of you guys please push this patch to mainline?

Thanx
Max

2008-08-08 02:15:23

by Li Zefan

Subject: Re: [PATCH] Resurrect proper handling of maxcpus= kernel option

Max Krasnyansky wrote:
> This is unrelated to the patch that I sent. In fact looks like the patch
> actually worked for you. In the sense that it did the right thing,
> initialized cpus but did not boot them.
>
> As far as the soft-lockup goes you might want to try different configs.
> ie Disable features you do not need. For example cpusets hotplug path in
> the current mainline is unsafe (the patch is in review). Also for me if
> ftrace is enabled onlining a cpu causes immediate reboot. So I'd say
> start disabling features and see which one cases the problem.
>

Yes, the patch works for me, and the soft lockup is a separate issue.
Thx for the explanation. :)

2008-08-11 18:17:18

by Ingo Molnar

Subject: Re: [PATCH] Resurrect proper handling of maxcpus= kernel option


* [email protected] <[email protected]> wrote:

> From: Max Krasnyansky <[email protected]>
>
> For some reason we had redundant parsers registered for maxcpus=. One
> in init/main.c and another in arch/x86/smpboot.c So I nuked the one in
> arch/x86.
>
> Also 64-bit kernels used to handle maxcpus= as documented in
> Documentation/cpu-hotplug.txt. CPUs with 'id > maxcpus' are
> initialized but not booted. 32-bit version for some reason ignored
> them even though all the infrastructure for booting them later is
> there.
>
> In the current mainline both 64 and 32 bit versions are broken. I'm
> too lazy to look through git history but I'm guessing it happened as
> part of the i386 and x86_64 unification.

Yes, in essence. 32-bit always had maxcpus as a hard restriction on the
number of CPUs. This got extended to 64-bit as well, via commit
89b08200ad:

x86: make x86_64 accept the max_cpus parameter

in v2.6.25. Two major kernel releases and nobody noticed; it's a rarely
used option.
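The hard restriction described here corresponds to the check this patch removes from generic_processor_info(). Below is a hypothetical user-space model of it, not kernel code: `struct enum_state` and `old_processor_info` are made-up names, and CPU ids are assumed to be handed out in enumeration order. Once num_processors reaches maxcpus, later processors are dropped and never become present, which is why they cannot be onlined afterwards.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical user-space model of the removed maxcpus check in
 * generic_processor_info(); not kernel code. */
struct enum_state {
    unsigned int num_processors;
    unsigned long present; /* bit n set => logical CPU n is present */
};

/* Called once per processor reported by the firmware. Returns 1 if the
 * processor was accepted (marked present), 0 if it was ignored. */
static int old_processor_info(struct enum_state *s, unsigned int maxcpus)
{
    unsigned int cpu;

    if (s->num_processors >= maxcpus) {
        /* Same message as the removed printk(KERN_WARNING ...). */
        fprintf(stderr, "WARNING: maxcpus limit of %u reached."
                " Processor ignored.\n", maxcpus);
        return 0;
    }
    /* The kernel takes the first id not yet in cpu_present_map; with ids
     * handed out in order, that is simply the current count. */
    cpu = s->num_processors++;
    assert(cpu < sizeof(s->present) * CHAR_BIT);
    s->present |= 1UL << cpu;
    return 1;
}
```

Feeding four processors through it with maxcpus=1 accepts only the first one; the other three just produce the warning and never show up in the present mask.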

> This patch restores the correct behaviour. I've tested x86_64 version
> on 4- and 8- way Core2 and 2-way Opteron based machines. Various
> config combinations SMP, !SMP, CPU_HOTPLUG, !CPU_HOTPLUG. Booted with
> maxcpus=1 and maxcpus=4, etc. Everything is working as expected.
>
> I cannot test 32-bit version (no 32-bit machines here).

This will need some test time on 32-bit, as that is where this represents
a material change. (Albeit what matters most is the maxcpus=1
distinction, and for that nosmp can be used as well to turn off
multi-CPU support altogether. So we could do this in v2.6.27 as well.)

Also, a quick grep shows that your patch will very likely break the
visws build:

arch/x86/kernel/visws_quirks.c:extern unsigned int __cpuinitdata maxcpus;
arch/x86/kernel/visws_quirks.c: if (ncpus > maxcpus)
arch/x86/kernel/visws_quirks.c: ncpus = maxcpus;

Could you please fix that?

Ingo

2008-08-11 18:28:51

by Max Krasnyansky

Subject: Re: [PATCH] Resurrect proper handling of maxcpus= kernel option



Ingo Molnar wrote:
> This will need some test time on 32-bit as that is where this represents
> a material change. ( albeit what matters most is the maxcpus=1
> distinction - and for that nosmp can be used as well to turn off
> multi-cpu support altogether. So we could do this in v2.6.27 as well. )

So far we've had a couple of reports that it works as expected on 32-bit
(both laptop and server/desktop).

> Also, a quick grep shows that your patch will very likely break the
> visws build:
>
> arch/x86/kernel/visws_quirks.c:extern unsigned int __cpuinitdata maxcpus;
> arch/x86/kernel/visws_quirks.c: if (ncpus > maxcpus)
> arch/x86/kernel/visws_quirks.c: ncpus = maxcpus;
>
> could you please fix that?

Oh. I missed that one. Will fix.

Max


2008-08-11 18:39:16

by Ingo Molnar

Subject: Re: [PATCH] Resurrect proper handling of maxcpus= kernel option


* Max Krasnyansky <[email protected]> wrote:

> > This will need some test time on 32-bit as that is where this
> > represents a material change. ( albeit what matters most is the
> > maxcpus=1 distinction - and for that nosmp can be used as well to
> > turn off multi-cpu support altogether. So we could do this in
> > v2.6.27 as well. )
>
> So far we got a couple of reports that it works as expected on 32
> (both laptop and server/desktop).

Yes, but the use case I'm worried about is when, say, maxcpus=1 was used to
_prevent_ an SMP bootup, because the system would not work otherwise.

I guess we want to tickle those systems anyway, as that case is not
supposed to happen (and SMP can always be totally disabled via nosmp or
noapic).

So I'm not against your fix/change per se; I just wanted to highlight
that it has some impact on existing uses of maxcpus that are outside of
your cpu-hotplug use case.

Ingo

2008-08-11 18:40:17

by Max Krasnyansky

Subject: Re: [PATCH] Resurrect proper handling of maxcpus= kernel option

Ingo Molnar wrote:
> yes in essence. 32-bit always had maxcpus as a hard restriction in the
> number of CPUs. This got extended to 64-bit as well, via commit
> 89b08200ad:
>
> x86: make x86_64 accept the max_cpus parameter
>
> in v2.6.25. Two major kernel releases and nobody noticed - it's a rarely
> used option.

BTW, I think it's rarely used because many people do not realize it's there.
At least a couple of use cases came up recently:
- A busted CPU. You can boot the machine with maxcpus=1 and then bring up
  the CPUs one by one to see which one is busted.
- A recently reported regression where a 16-CPU box booted fine with
  NR_CPUS=8 but failed with NR_CPUS=16. Again, we can boot with maxcpus=8
  and bring up the other CPUs later to see when/where we fail.

Things like that.
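Both use cases boil down to onlining CPUs one at a time after boot. A minimal sketch, assuming the standard CPU hotplug sysfs files (/sys/devices/system/cpu/cpuN/online, as described in Documentation/cpu-hotplug.txt); `set_cpu_online()` and `online_one_by_one()` are hypothetical helper names, and the root directory is a parameter only so the code can be pointed somewhere else for testing.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write 1 (online) or 0 (offline) to
 * <sysfs_root>/cpu<N>/online. sysfs_root is normally
 * "/sys/devices/system/cpu"; writing there requires root.
 * Returns 0 on success, -1 on failure. */
static int set_cpu_online(const char *sysfs_root, int cpu, int online)
{
    char path[256];
    FILE *f;
    int ok;

    snprintf(path, sizeof(path), "%s/cpu%d/online", sysfs_root, cpu);
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fprintf(f, "%d\n", online ? 1 : 0) > 0;
    /* stdio buffers the write, so an error returned by the kernel
     * (e.g. a failed online attempt) usually surfaces at fclose(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Online CPUs first..last in order. Returns the id of the first CPU
 * that fails to come up, or -1 if all of them succeeded. */
static int online_one_by_one(const char *sysfs_root, int first, int last)
{
    int cpu;

    for (cpu = first; cpu <= last; cpu++) {
        if (set_cpu_online(sysfs_root, cpu, 1) != 0)
            return cpu;
    }
    return -1;
}
```

For the 16-CPU case, booting with maxcpus=8 and then calling online_one_by_one("/sys/devices/system/cpu", 8, 15) as root would return the first CPU id that refuses to come up. A hang or an immediate reboot, like the ones reported earlier in this thread, obviously cannot be reported this way.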

Max

2008-08-11 18:47:10

by Max Krasnyansky

Subject: Re: [PATCH] Resurrect proper handling of maxcpus= kernel option



Ingo Molnar wrote:
> * Max Krasnyansky <[email protected]> wrote:
>
>>> This will need some test time on 32-bit as that is where this
>>> represents a material change. ( albeit what matters most is the
>>> maxcpus=1 distinction - and for that nosmp can be used as well to
>>> turn off multi-cpu support altogether. So we could do this in
>>> v2.6.27 as well. )
>> So far we got a couple of reports that it works as expected on 32
>> (both laptop and server/desktop).
>
> Yes, but the usecase i'm worried about is when say maxcpus=1 was used to
> _prevent_ an SMP bootup - because the system would not work otherwise.
>
> i guess we want to tickle those systems anyway as that case is not
> supposed to happen (and it can always be totally disabled via nosmp or
> noapic).
>
> So i'm not against your fix/change per se, i just wanted to highlight
> that it has some impact on existing uses of maxcpus that is outside of
> your cpu-hotplug usecase.

I see what you mean. I think it's fairly safe, though, since we do not
actually do much for the CPUs that are not going to be brought online:
mainly just setting cpu_*_map and initializing the per-CPU areas. If
something is broken in there, we'd probably want to fix that ASAP anyway.
And, like you said, nosmp does the job too.

Max