2002-03-14 17:25:18

by Mikael Pettersson

Subject: [PATCH] boot_cpu_data corruption on SMP x86

The patch below eliminates a case of boot_cpu_data corruption
on SMP x86 machines. This was first observed on SMP Athlons,
but it also affects SMP Intel boxes in a less serious way.

When the secondary processors boot and execute head.S:checkCPUtype,
the code performs a 32-bit write of a small constant to the
byte-sized variable boot_cpu_data.x86 (X86 in head.S). Since the
write is 32-bit, it also writes zeros to the following 3 bytes,
which clobbers the x86_vendor, x86_model, and x86_mask fields
previously set up by check_bugs()'s call to identify_cpu().
Thus, after smp_init(), boot_cpu_data will _always_ identify
the CPU as an Intel (X86_VENDOR_INTEL == 0 in processor.h) with
model 0 and stepping 0.

The effect in standard kernels is not catastrophic, since:
(a) most SMP x86 boxes are Intel
(b) most uses of x86_vendor occur before smp_init() or reference
the SMP cpu_data[] array
(c) most post-boot references to boot_cpu_data occur in the
cpu_has_XXX macros which only read the x86_capability[] array
However, third-party extensions (like my x86 performance-monitoring
counters driver) can get seriously confused by this mis-identification.

The patch is for 2.4.19-pre3, but it also applies to 2.5.6 and
2.2.21rc1. Please apply.

/Mikael

--- linux-2.4.19-pre3/arch/i386/kernel/head.S.~1~ Tue Feb 26 13:26:56 2002
+++ linux-2.4.19-pre3/arch/i386/kernel/head.S Thu Mar 14 16:20:57 2002
@@ -178,7 +178,7 @@
  * we don't need to preserve eflags.
  */
 
-	movl $3,X86		# at least 386
+	movb $3,X86		# at least 386
 	pushfl			# push EFLAGS
 	popl %eax		# get EFLAGS
 	movl %eax,%ecx		# save original EFLAGS
@@ -191,7 +191,7 @@
 	andl $0x40000,%eax	# check if AC bit changed
 	je is386
 
-	movl $4,X86		# at least 486
+	movb $4,X86		# at least 486
 	movl %ecx,%eax
 	xorl $0x200000,%eax	# check ID flag
 	pushl %eax


2002-03-21 20:05:35

by Maciej W. Rozycki

Subject: Re: [PATCH] boot_cpu_data corruption on SMP x86

On Thu, 14 Mar 2002, Mikael Pettersson wrote:

> --- linux-2.4.19-pre3/arch/i386/kernel/head.S.~1~ Tue Feb 26 13:26:56 2002
> +++ linux-2.4.19-pre3/arch/i386/kernel/head.S Thu Mar 14 16:20:57 2002
> @@ -178,7 +178,7 @@
> * we don't need to preserve eflags.
> */
>
> - movl $3,X86 # at least 386
> + movb $3,X86 # at least 386
> pushfl # push EFLAGS
> popl %eax # get EFLAGS
> movl %eax,%ecx # save original EFLAGS
> @@ -191,7 +191,7 @@
> andl $0x40000,%eax # check if AC bit changed
> je is386
>
> - movl $4,X86 # at least 486
> + movb $4,X86 # at least 486
> movl %ecx,%eax
> xorl $0x200000,%eax # check ID flag
> pushl %eax

This is broken -- these word stores assure a proper initialization on
pre-CPUID processors.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2002-03-21 21:17:56

by Mikael Pettersson

Subject: Re: [PATCH] boot_cpu_data corruption on SMP x86


On Thu, 21 Mar 2002 21:01:39 +0100 (MET), Maciej W. Rozycki wrote:
>On Thu, 14 Mar 2002, Mikael Pettersson wrote:
>
>> --- linux-2.4.19-pre3/arch/i386/kernel/head.S.~1~ Tue Feb 26 13:26:56 2002
>> +++ linux-2.4.19-pre3/arch/i386/kernel/head.S Thu Mar 14 16:20:57 2002
>> @@ -178,7 +178,7 @@
>> * we don't need to preserve eflags.
>> */
>>
>> - movl $3,X86 # at least 386
>> + movb $3,X86 # at least 386
>...
>
> This is broken -- these word stores assure a proper initialization on
>pre-CPUID processors.

boot_cpu_data is a static-extent object with an explicit initialiser
(i.e., ".data") in setup.c in 2.2.21rc2, 2.4.19-pre4, and 2.5.7.
Any further "initialisation" by APs is called "clobbering".

/Mikael

2002-03-21 23:39:36

by Maciej W. Rozycki

Subject: Re: [PATCH] boot_cpu_data corruption on SMP x86

On Thu, 21 Mar 2002, Mikael Pettersson wrote:

> > This is broken -- these word stores assure a proper initialization on
> >pre-CPUID processors.
>
> boot_cpu_data is a static-extent object with an explicit initialiser
> (i.e., ".data") in setup.c in 2.2.21rc2, 2.4.19-pre4, and 2.5.7.
> Any further "initialisation" by APs is called "clobbering".

boot_cpu_data is initialized and then copied to cpu_data for each CPU
booted. If say the BSP supports cpuid but an AP does not (possible for an
i486 setup), leftover values will be stored for the AP incorrectly.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2002-03-22 07:58:51

by Dave Jones

Subject: Re: [PATCH] boot_cpu_data corruption on SMP x86

On Fri, Mar 22, 2002 at 12:36:02AM +0100, Maciej W. Rozycki wrote:
> If say the BSP supports cpuid but an AP does not (possible for an
> i486 setup)

It's also possible on any SMP aware system, but with the warning
"you use asymmetric CPUs, you get to keep the pieces". I don't recall
486s being any exception to this rule.

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2002-03-22 14:27:45

by Maciej W. Rozycki

Subject: Re: [PATCH] boot_cpu_data corruption on SMP x86

On Fri, 22 Mar 2002, Dave Jones wrote:

> > If say the BSP supports cpuid but an AP does not (possible for an
> > i486 setup)
>
> It's also possible on any SMP aware system, but with the warning

Nope, anything that provides cpuid will update the model and the stepping
correctly.

> "you use asymmetric CPUs, you get to keep the pieces". I don't recall
> 486s being any exception to this rule.

Cpuid vs non-cpuid is a non-issue for the i486 -- the glue logic is
external as well as APICs and we don't care about the SMM, so no need to
unsupport it explicitly.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2002-03-22 14:46:26

by Richard B. Johnson

Subject: Re: [PATCH] boot_cpu_data corruption on SMP x86

On Fri, 22 Mar 2002, Maciej W. Rozycki wrote:

> On Fri, 22 Mar 2002, Dave Jones wrote:
>
> > > If say the BSP supports cpuid but an AP does not (possible for an
> > > i486 setup)
> >
> > It's also possible on any SMP aware system, but with the warning
>
> Nope, anything that provides cpuid will update the model and the stepping
> correctly.
>
> > "you use asymmetric CPUs, you get to keep the pieces". I don't recall
> > 486s being any exception to this rule.
>
> Cpuid vs non-cpuid is a non-issue for the i486 -- the glue logic is
> external as well as APICs and we don't care about the SMM, so no need to
> unsupport it explicitly.
>

FYI, the "fix" to make Windows/2000/Professional survive more than
a day before requiring a low-level format and re-install of everything
was to remove the second CPU from a 2-CPU system that had run for two
years without errors under Linux. Linux may have a race or two, but
it certainly does a very good job with SMP, something that M$ will
apparently never do.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

Windows-2000/Professional isn't.