Andrew, All,
So, I found the cause of that memory corruption I mentioned last night.
I was booting on the 16x w/ HT and I hadn't taken note that NR_CPUS now
defaults to 8. Whoops. It turns out there isn't any bounds checking when
bringing up the CPUs, so we overflow bios_cpu_apicid[].
Here's a quick patch that checks if num_processors has hit NR_CPUS.
Seems to fix it for me.
Let me know if you have any comments or suggestions.
thanks
-john
===== arch/i386/kernel/mpparse.c 1.49 vs edited =====
--- 1.49/arch/i386/kernel/mpparse.c Sun Aug 31 16:14:25 2003
+++ edited/arch/i386/kernel/mpparse.c Thu Sep 4 18:07:15 2003
@@ -167,6 +167,10 @@
boot_cpu_logical_apicid = apicid;
}
+ if (num_processors > NR_CPUS){
+ printk(KERN_WARNING "NR_CPUS limit of %i reached. Cannot boot CPU(apicid 0x%d).\n", NR_CPUS, m->mpc_apicid);
+ return;
+ }
num_processors++;
if (MAX_APICS - m->mpc_apicid <= 0) {
On Thu, 2003-09-04 at 18:27, john stultz wrote:
> Let me know if you have any comments or suggestions.
While you're at it, can we do this as well? 10 bucks says we'll keep
hitting this otherwise. I think Bill can manage to remember to change
it if he tries for a 64x NUMA-Q. The rest of us are too stupid most of
the time.
--
Dave Hansen
> On Thu, 2003-09-04 at 18:27, john stultz wrote:
>> Let me know if you have any comments or suggestions.
>
> While you're at it, can we do this as well? 10 bucks says we'll keep
> hitting this otherwise. I think Bill can manage to remember to change
> it if he tries for a 64x NUMA-Q. The rest of us are too stupid most of
> the time.
Yeah, good plan.
Thanks for that,
M.
> + if (num_processors > NR_CPUS){
> + printk(KERN_WARNING "NR_CPUS limit of %i reached. Cannot boot CPU(apicid 0x%d).\n", NR_CPUS, m->mpc_apicid);
I'm no expert in this field at all, but doesn't this need to check for '>='?
Also, the code following the check could get some reordering for
readability. How about the following:
--- linux-2.6.0-test4/arch/i386/kernel/mpparse.c.orig Fri Sep 5 09:40:07 2003
+++ linux-2.6.0-test4/arch/i386/kernel/mpparse.c Fri Sep 5 09:50:11 2003
@@ -167,15 +167,18 @@
boot_cpu_logical_apicid = apicid;
}
- num_processors++;
-
if (MAX_APICS - m->mpc_apicid <= 0) {
printk(KERN_WARNING "Processor #%d INVALID. (Max ID: %d).\n",
m->mpc_apicid, MAX_APICS);
- --num_processors;
return;
}
ver = m->mpc_apicver;
+
+ if (num_processors >= NR_CPUS){
+ printk(KERN_WARNING "NR_CPUS limit of %i reached. Cannot boot CPU(apicid 0x%d).\n", NR_CPUS, m->mpc_apicid);
+ return;
+ }
+ num_processors++;
tmp = apicid_to_cpu_present(apicid);
physids_or(phys_cpu_present_map, phys_cpu_present_map, tmp);
Tim
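
For anyone following along, here is a minimal userspace sketch of why the
comparison has to be '>=' rather than '>'. It is not the kernel code:
register_cpu(), register_all() and FIRMWARE_CPUS are hypothetical names made
up for illustration, and the array indexing is simplified rather than copied
from mpparse.c. The point is only that an array of NR_CPUS entries has valid
indices 0 .. NR_CPUS-1, so the check must reject the call once the count
already equals NR_CPUS.

```c
#include <stdio.h>

#define NR_CPUS 8            /* compile-time limit, as in the report */
#define FIRMWARE_CPUS 32     /* hypothetical: 16 processors with HT enabled */

/* Stand-ins for the kernel globals; not the real definitions. */
static int num_processors;
static int bios_cpu_apicid[NR_CPUS];

/* Returns 1 if the CPU was recorded, 0 if it was rejected. */
static int register_cpu(int apicid)
{
    /*
     * With '>' the call made when num_processors == NR_CPUS would pass
     * the check and store one element past the end of the array.
     */
    if (num_processors >= NR_CPUS) {
        /* %x is used here so the value really is hex after the "0x". */
        printf("NR_CPUS limit of %d reached. Cannot boot CPU (apicid 0x%x).\n",
               NR_CPUS, apicid);
        return 0;
    }
    bios_cpu_apicid[num_processors] = apicid;
    num_processors++;
    return 1;
}

/* Feeds every firmware-reported CPU through the check; returns how many were accepted. */
static int register_all(void)
{
    int accepted = 0;

    num_processors = 0;
    for (int apicid = 0; apicid < FIRMWARE_CPUS; apicid++)
        accepted += register_cpu(apicid);
    return accepted;
}
```

With these values, register_all() accepts the first 8 CPUs and rejects the
remaining 24, and num_processors never exceeds NR_CPUS.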