2003-09-05 01:33:20

by john stultz

Subject: [RFC] NR_CPUS=8 on a 32 cpu box

Andrew, All,

So, I found the cause of that memory corruption I mentioned last night.
I was booting on the 16x w/ HT and I hadn't taken note that NR_CPUS now
defaults to 8. Whoops. Turns out there isn't any bounds checking when
bringing up the CPUs, so we overflow bios_cpu_apicid[].
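A minimal user-space sketch of the failure mode and the guard, assuming a hypothetical SKETCH_NR_CPUS of 8 and a hypothetical cpu_apicid[] array standing in for bios_cpu_apicid[] (this is not the kernel code). The limit is tested before the store, so a 32-way table only fills the 8 slots that exist:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins: SKETCH_NR_CPUS plays the role of NR_CPUS,
 * cpu_apicid[] the role of bios_cpu_apicid[]. Not kernel code. */
#define SKETCH_NR_CPUS 8

static int cpu_apicid[SKETCH_NR_CPUS];
static int num_processors;

/* Record one processor reported by the firmware table.
 * Returns 1 if it was recorded, 0 if the array is already full. */
static int record_processor(int apicid)
{
	if (num_processors >= SKETCH_NR_CPUS) {
		printf("limit of %d reached, ignoring apicid 0x%x\n",
		       SKETCH_NR_CPUS, apicid);
		return 0;
	}
	cpu_apicid[num_processors++] = apicid;
	assert(num_processors <= SKETCH_NR_CPUS);
	return 1;
}

/* Feed n processors (APIC IDs 0..n-1) through record_processor(),
 * as a 32-way box would, and return how many were recorded. */
static int boot_box(int n)
{
	int recorded = 0;

	num_processors = 0;
	for (int id = 0; id < n; id++)
		recorded += record_processor(id);
	return recorded;
}
```

With the if-block removed, the ninth call would write cpu_apicid[8], one past the end of the array, which is the kind of overflow described above.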

Here's a quick patch that checks if num_processors has hit NR_CPUS.
Seems to fix it for me.

Let me know if you have any comments or suggestions.

thanks
-john

===== arch/i386/kernel/mpparse.c 1.49 vs edited =====
--- 1.49/arch/i386/kernel/mpparse.c Sun Aug 31 16:14:25 2003
+++ edited/arch/i386/kernel/mpparse.c Thu Sep 4 18:07:15 2003
@@ -167,6 +167,10 @@
boot_cpu_logical_apicid = apicid;
}

+ if (num_processors > NR_CPUS){
+ printk(KERN_WARNING "NR_CPUS limit of %i reached. Cannot boot CPU(apicid 0x%d).\n", NR_CPUS, m->mpc_apicid);
+ return;
+ }
num_processors++;

if (MAX_APICS - m->mpc_apicid <= 0) {




2003-09-05 01:48:53

by Dave Hansen

Subject: Re: [RFC] NR_CPUS=8 on a 32 cpu box

On Thu, 2003-09-04 at 18:27, john stultz wrote:
> Let me know if you have any comments or suggestions.

While you're at it, can we do this as well? 10 bucks says we'll keep
hitting this otherwise. I think Bill can manage to remember to change
it if he tries for a 64x NUMA-Q. The rest of us are too stupid most of
the time.

--
Dave Hansen


Attachments:
nr_cpus_kconfig-2.6.0-test4+bk-0.patch (371.00 B)

2003-09-05 02:30:35

by Martin J. Bligh

Subject: Re: [RFC] NR_CPUS=8 on a 32 cpu box

> On Thu, 2003-09-04 at 18:27, john stultz wrote:
>> Let me know if you have any comments or suggestions.
>
> While you're at it, can we do this as well? 10 bucks says we'll keep
> hitting this otherwise. I think Bill can manage to remember to change
> it if he tries for a 64x NUMA-Q. The rest of us are too stupid most of
> the time.

Yeah, good plan.

Thanks for that,

M.

2003-09-05 07:52:31

by Tim Schmielau

Subject: Re: [RFC] NR_CPUS=8 on a 32 cpu box

> + if (num_processors > NR_CPUS){
> + printk(KERN_WARNING "NR_CPUS limit of %i reached. Cannot boot CPU(apicid 0x%d).\n", NR_CPUS, m->mpc_apicid);

I'm no expert in this field at all, but doesn't this need to check for '>='?
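A worked version of the off-by-one, assuming a hypothetical SKETCH_NR_CPUS of 8 and a loop standing in for one call per processor entry (not the kernel code). With '>' the check still passes while num_processors is 8, so nine processors get counted; with '>=' it stops at 8:

```c
#include <assert.h>

/* Hypothetical stand-in for NR_CPUS; not the kernel's definition. */
#define SKETCH_NR_CPUS 8

/*
 * Count how many of `reported` processors get past the limit check.
 * use_ge == 0 models the RFC patch's `num_processors > NR_CPUS` test,
 * use_ge != 0 models the suggested `num_processors >= NR_CPUS` test.
 * In both cases the counter is incremented only after the check passes.
 */
static int count_admitted(int reported, int use_ge)
{
	int num_processors = 0;

	for (int i = 0; i < reported; i++) {
		int full = use_ge ? (num_processors >= SKETCH_NR_CPUS)
				  : (num_processors > SKETCH_NR_CPUS);
		if (full)
			continue;	/* processor rejected, counter untouched */
		num_processors++;
	}

	/* Only the >= form guarantees the counter never exceeds the limit. */
	assert(!use_ge || num_processors <= SKETCH_NR_CPUS);
	return num_processors;
}
```

For a 32-entry table, count_admitted(32, 0) comes out at 9 and count_admitted(32, 1) at 8, i.e. the '>' form lets one processor more through than an NR_CPUS-sized array can hold.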

Also, the code following the check could get some reordering for
readability. How about the following:

--- linux-2.6.0-test4/arch/i386/kernel/mpparse.c.orig Fri Sep 5 09:40:07 2003
+++ linux-2.6.0-test4/arch/i386/kernel/mpparse.c Fri Sep 5 09:50:11 2003
@@ -167,15 +167,18 @@
boot_cpu_logical_apicid = apicid;
}

- num_processors++;
-
if (MAX_APICS - m->mpc_apicid <= 0) {
printk(KERN_WARNING "Processor #%d INVALID. (Max ID: %d).\n",
m->mpc_apicid, MAX_APICS);
- --num_processors;
return;
}
ver = m->mpc_apicver;
+
+ if (num_processors >= NR_CPUS){
+ printk(KERN_WARNING "NR_CPUS limit of %i reached. Cannot boot CPU(apicid 0x%x).\n", NR_CPUS, m->mpc_apicid);
+ return;
+ }
+ num_processors++;

tmp = apicid_to_cpu_present(apicid);
physids_or(phys_cpu_present_map, phys_cpu_present_map, tmp);
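The effect of the reordering can be sketched in user space as follows, assuming hypothetical SKETCH_MAX_APICS and SKETCH_NR_CPUS constants and a trimmed-down process_cpu() that keeps only the two checks (printk swapped for printf; it is not the real mpparse.c function). Every rejection now happens before the counter is touched, so the "--num_processors" undo goes away:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's MAX_APICS and NR_CPUS. */
#define SKETCH_MAX_APICS 16
#define SKETCH_NR_CPUS   8

static int num_processors;

/* Reordered flow: validate the APIC ID, then check the CPU limit,
 * and only then bump the counter. */
static void process_cpu(int apicid)
{
	if (SKETCH_MAX_APICS - apicid <= 0) {
		printf("Processor #%d INVALID. (Max ID: %d).\n",
		       apicid, SKETCH_MAX_APICS);
		return;		/* counter never incremented, nothing to undo */
	}
	if (num_processors >= SKETCH_NR_CPUS) {
		printf("NR_CPUS limit of %d reached. Cannot boot CPU (apicid 0x%x).\n",
		       SKETCH_NR_CPUS, apicid);
		return;
	}
	num_processors++;
	assert(num_processors <= SKETCH_NR_CPUS);
}
```

Feeding it an out-of-range APIC ID leaves the count alone, and feeding it twelve valid entries stops the count at 8.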

Tim