The change to mm/slab.c between 2.6.6-rc2-bk4 and -bk5
broke x86-64 SMP. The symptoms are general protection
faults in __switch_to shortly after init starts, and
then the machine is dead. (Can't be more specific, my
box can't log early boot oopses.)
I'm only seeing this with x86-64 SMP; x86-64 UP and i386
SMP on the same machine (Athlon64 UP) have no problems.
Reverting 2.6.6-rc2-bk5's change to mm/slab.c eliminates
the problem.
/Mikael
> The change to mm/slab.c between 2.6.6-rc2-bk4 and -bk5
> broke x86-64 SMP. The symptoms are general protection
> faults in __switch_to shortly after init starts, and
> then the machine is dead. (Can't be more specific, my
> box can't log early boot oopses.)
>
> I'm only seeing this with x86-64 SMP; x86-64 UP and i386
> SMP on the same machine (Athlon64 UP) have no problems.
FWIW, this sure looks a lot like the boot-time crash I'm seeing; I get the
same __switch_to oopses once init starts. *But* I'm running a UP,
no-preempt kernel. And I get it with -rc1 as well. Might reverting the
later slab change be concealing a different problem?
jon
Mikael Pettersson <[email protected]> wrote:
>
> The change to mm/slab.c between 2.6.6-rc2-bk4 and -bk5
> broke x86-64 SMP. The symptoms are general protection
> faults in __switch_to shortly after init starts, and
> then the machine is dead. (Can't be more specific, my
> box can't log early boot oopses.)
>
> I'm only seeing this with x86-64 SMP; x86-64 UP and i386
> SMP on the same machine (Athlon64 UP) have no problems.
>
> Reverting 2.6.6-rc2-bk5's change to mm/slab.c eliminates
> the problem.
The "-bk5" terminology doesn't mean much to people who use bitkeeper or who
use http://www.kernel.org/pub/linux/kernel/v2.5/testing/cset/ - I assume
you refer to the alignment changes?
Does this fix?
diff -puN include/asm-x86_64/processor.h~a include/asm-x86_64/processor.h
--- 25/include/asm-x86_64/processor.h~a Fri Apr 30 11:24:58 2004
+++ 25-akpm/include/asm-x86_64/processor.h Fri Apr 30 11:25:28 2004
@@ -20,6 +20,8 @@
 #include <asm/mmsegment.h>
 #include <linux/personality.h>
 
+#define ARCH_MIN_TASKALIGN L1_CACHE_BYTES
+
 #define TF_MASK 0x00000100
 #define IF_MASK 0x00000200
 #define IOPL_MASK 0x00003000
_
On Friday 30 of April 2004 20:27, Andrew Morton wrote:
> Mikael Pettersson <[email protected]> wrote:
> > The change to mm/slab.c between 2.6.6-rc2-bk4 and -bk5
> > broke x86-64 SMP. The symptoms are general protection
> > faults in __switch_to shortly after init starts, and
> > then the machine is dead. (Can't be more specific, my
> > box can't log early boot oopses.)
> >
> > I'm only seeing this with x86-64 SMP; x86-64 UP and i386
> > SMP on the same machine (Athlon64 UP) have no problems.
> >
> > Reverting 2.6.6-rc2-bk5's change to mm/slab.c eliminates
> > the problem.
>
> The "-bk5" terminology doesn't mean much to people who use bitkeeper or who
> use http://www.kernel.org/pub/linux/kernel/v2.5/testing/cset/ - I assume
> you refer to the alignment changes?
>
> Does this fix?
>
> diff -puN include/asm-x86_64/processor.h~a include/asm-x86_64/processor.h
> --- 25/include/asm-x86_64/processor.h~a Fri Apr 30 11:24:58 2004
> +++ 25-akpm/include/asm-x86_64/processor.h Fri Apr 30 11:25:28 2004
> @@ -20,6 +20,8 @@
> #include <asm/mmsegment.h>
> #include <linux/personality.h>
>
> +#define ARCH_MIN_TASKALIGN L1_CACHE_BYTES
> +
> #define TF_MASK 0x00000100
> #define IF_MASK 0x00000200
> #define IOPL_MASK 0x00003000
>
AFAICS, yes, it does. :-)
I'm now (happily) running 2.6.6-rc3 on a dual-Opteron box.
RJW
> Does this fix?
>
> diff -puN include/asm-x86_64/processor.h~a include/asm-x86_64/processor.h
> --- 25/include/asm-x86_64/processor.h~a Fri Apr 30 11:24:58 2004
> +++ 25-akpm/include/asm-x86_64/processor.h Fri Apr 30 11:25:28 2004
> @@ -20,6 +20,8 @@
> #include <asm/mmsegment.h>
> #include <linux/personality.h>
>
> +#define ARCH_MIN_TASKALIGN L1_CACHE_BYTES
> +
That made my x86_64 boot problem go away; with that patch the system comes
up just fine.
Now I have weird display problems with my Radeon card instead. Ever seen X
running 100% in kernel space, unkillable?
jon
Jonathan Corbet
Executive editor, LWN.net
[email protected]
"R. J. Wysocki" <[email protected]> wrote:
>
> > Does this fix?
> >
> > diff -puN include/asm-x86_64/processor.h~a include/asm-x86_64/processor.h
> > --- 25/include/asm-x86_64/processor.h~a Fri Apr 30 11:24:58 2004
> > +++ 25-akpm/include/asm-x86_64/processor.h Fri Apr 30 11:25:28 2004
> > @@ -20,6 +20,8 @@
> > #include <asm/mmsegment.h>
> > #include <linux/personality.h>
> >
> > +#define ARCH_MIN_TASKALIGN L1_CACHE_BYTES
> > +
> > #define TF_MASK 0x00000100
> > #define IF_MASK 0x00000200
> > #define IOPL_MASK 0x00003000
> >
>
> AFAICS, yes, it does. :-)
> I'm now (happily) running 2.6.6-rc3 on a dual-Opteron box.
OK, thanks. I suspect that change has broken other architectures for the
same reason.
I think I'll just change the default:
diff -puN kernel/fork.c~task-struct-alignment-fix kernel/fork.c
--- 25/kernel/fork.c~task-struct-alignment-fix Fri Apr 30 13:22:24 2004
+++ 25-akpm/kernel/fork.c Fri Apr 30 13:22:36 2004
@@ -211,7 +211,7 @@ void __init fork_init(unsigned long memp
 {
 #ifndef __HAVE_ARCH_TASK_STRUCT_ALLOCATOR
 #ifndef ARCH_MIN_TASKALIGN
-#define ARCH_MIN_TASKALIGN 0
+#define ARCH_MIN_TASKALIGN L1_CACHE_BYTES
 #endif
 /* create a slab on which task_structs can be allocated */
 task_struct_cachep =
_
[email protected] (Jonathan Corbet) wrote:
>
> > Does this fix?
> >
> > diff -puN include/asm-x86_64/processor.h~a include/asm-x86_64/processor.h
> > --- 25/include/asm-x86_64/processor.h~a Fri Apr 30 11:24:58 2004
> > +++ 25-akpm/include/asm-x86_64/processor.h Fri Apr 30 11:25:28 2004
> > @@ -20,6 +20,8 @@
> > #include <asm/mmsegment.h>
> > #include <linux/personality.h>
> >
> > +#define ARCH_MIN_TASKALIGN L1_CACHE_BYTES
> > +
>
> That made my x86_64 boot problem go away; with that patch the system comes
> up just fine.
OK, thanks. It broke parisc too...
> Now I have weird display problems with my Radeon card instead. Ever seen X
> running 100% in kernel space, unkillable?
I did, about a year ago. It was spinning madly in some ioctl waiting for a
bit in a device register to change state. Are you able to generate a
kernel profile while it's being silly? That will tell us where it's stuck.
> Does this fix?
>
> diff -puN include/asm-x86_64/processor.h~a include/asm-x86_64/processor.h
> --- 25/include/asm-x86_64/processor.h~a Fri Apr 30 11:24:58 2004
> +++ 25-akpm/include/asm-x86_64/processor.h Fri Apr 30 11:25:28 2004
> @@ -20,6 +20,8 @@
> #include <asm/mmsegment.h>
> #include <linux/personality.h>
>
> +#define ARCH_MIN_TASKALIGN L1_CACHE_BYTES
16 should be enough actually. The problem is the FXSAVE instruction that
is used to switch the FPU state, and that only requires 16 byte alignment.
-Andi
Andi Kleen <[email protected]> wrote:
>
> > Does this fix?
> >
> > diff -puN include/asm-x86_64/processor.h~a include/asm-x86_64/processor.h
> > --- 25/include/asm-x86_64/processor.h~a Fri Apr 30 11:24:58 2004
> > +++ 25-akpm/include/asm-x86_64/processor.h Fri Apr 30 11:25:28 2004
> > @@ -20,6 +20,8 @@
> > #include <asm/mmsegment.h>
> > #include <linux/personality.h>
> >
> > +#define ARCH_MIN_TASKALIGN L1_CACHE_BYTES
>
> 16 should be enough actually. The problem is the FXSAVE instruction that
> is used to switch the FPU state, and that only requires 16 byte alignment.
>
yup. I sent Linus the patch which changes the default from 0 to
L1_CACHE_BYTES in kernel/fork.c. x86_64 can override that by setting
ARCH_MIN_TASKALIGN to 16 in asm/processor.h.
On Fri, Apr 30, 2004 at 07:01:02PM -0700, Andrew Morton wrote:
> Andi Kleen <[email protected]> wrote:
> >
> > > Does this fix?
> > >
> > > diff -puN include/asm-x86_64/processor.h~a include/asm-x86_64/processor.h
> > > --- 25/include/asm-x86_64/processor.h~a Fri Apr 30 11:24:58 2004
> > > +++ 25-akpm/include/asm-x86_64/processor.h Fri Apr 30 11:25:28 2004
> > > @@ -20,6 +20,8 @@
> > > #include <asm/mmsegment.h>
> > > #include <linux/personality.h>
> > >
> > > +#define ARCH_MIN_TASKALIGN L1_CACHE_BYTES
> >
> > 16 should be enough actually. The problem is the FXSAVE instruction that
> > is used to switch the FPU state, and that only requires 16 byte alignment.
> >
>
> yup. I sent Linus the patch which changes the default from 0 to
> L1_CACHE_BYTES in kernel/fork.c. x86_64 can override that by setting
> ARCH_MIN_TASKALIGN to 16 in asm/processor.h.
Ok, I will change it in my next patchkit.
For i386 it is the same - 16 should be enough.
-Andi