2003-09-30 07:38:54

by Jamie Lokier

Subject: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

This builds upon Andi Kleen's excellent patch for the AMD prefetch bug
workaround. It applies to 2.6.0-test6.

There are four changes on top of Andi's recent patch:

1. The workaround code is only included when compiling a kernel
optimised for AMD processors with the bug. For kernels optimised
for a different processor, there is no code size overhead.

This will make some people happier.

It will make some people unhappier, because it means a generic
kernel on an AMD will run fine, but won't fixup _userspace_
prefetch faults. More on this below.

2. Nevertheless, when a kernel _not_ optimised for those AMD processors
is run on one, the "alternative" mechanism now correctly replaces
prefetch instructions with nops.

3. The is_prefetch() instruction decoder now handles all cases of
funny GDT and LDT segments, checks limits and so on. As far as I
can see, there are no further bugs or flaws in that part.

4. A consequence of the is_prefetch() change is that, despite the
longer C code (rarely used code, for segments), is_prefetch()
should run marginally faster now because the access_ok() calls are
not required. They're subsumed into the segment limit
calculation.

Like Andi's patch, this removes 10k or so from current x86 kernels by
removing the silly conditional from the prefetch() function.

Controversial point coming up!

It's a reasonable argument that if a kernel version, say 2.6.0, promises
to hide the errata from _userspace_ programs, then the generic or
not-AMD-optimised kernels should do that too. Otherwise userspace may
as well provide the workaround itself, by catching SIGSEGV - because it
is perfectly normal and common to run a generic kernel on an AMD.

If userspace has to do that, then there's no point in the kernel having
the fixup at all - it may as well use __ex_table entries for the
kernel-space prefetch instructions, and give up on hiding the processor
bug from userspace. That would be much simpler than any of the patches
so far.

Arguably the bug has been around for years already (all K7's), so the
rare userspace programs which use prefetch should be aware of this already.

Something to think about.

Here is Andi's comment from the patch this is derived from:

Here is a new version of the Athlon/Opteron prefetch issue workaround
for 2.6.0-test6. The issue was hit regularly by the 2.6 kernel
while doing prefetches on NULL-terminated hlists.

These CPUs sometimes generate illegal exceptions for prefetch
instructions. The operating system can work around this by checking
whether the faulting instruction is a prefetch and ignoring it when
that is the case.

The code is structured carefully to ensure that the page fault
will never recurse more than once. Also unmapped EIPs are special
cased to give more friendly oopses when the kernel jumps to
unmapped addresses.

It removes the previous dumb in-kernel workaround for this and shrinks the
kernel by >10k.

A small behaviour change is that a SIGBUS fault for a *_user access will
cause an EFAULT now, not a SIGBUS.


I'm running this now, on my dual AMD Athlon, and it's working fine. (It
works when I comment out the fast path and force the full segment check,
before you ask ;)

Enjoy,
-- Jamie

diff -urN --exclude-from=dontdiff orig-2.6.0-test6/arch/i386/Kconfig dual-2.6.0-test6/arch/i386/Kconfig
--- orig-2.6.0-test6/arch/i386/Kconfig 2003-09-30 05:39:33.467103692 +0100
+++ dual-2.6.0-test6/arch/i386/Kconfig 2003-09-30 06:05:19.868623767 +0100
@@ -397,6 +397,12 @@
depends on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6
default y

+config X86_PREFETCH_FIXUP
+ bool
+ depends on MK7 || MK8
+ default y
+
+
config HPET_TIMER
bool "HPET Timer Support"
help
diff -urN --exclude-from=dontdiff orig-2.6.0-test6/arch/i386/kernel/cpu/common.c dual-2.6.0-test6/arch/i386/kernel/cpu/common.c
--- orig-2.6.0-test6/arch/i386/kernel/cpu/common.c 2003-09-30 05:39:33.871035696 +0100
+++ dual-2.6.0-test6/arch/i386/kernel/cpu/common.c 2003-09-30 06:03:46.222851345 +0100
@@ -327,6 +327,17 @@
* we do "generic changes."
*/

+ /* Prefetch works ok? */
+#ifndef CONFIG_X86_PREFETCH_FIXUP
+ if (c->x86_vendor != X86_VENDOR_AMD || c->x86 < 6)
+#endif
+ {
+ if (cpu_has(c, X86_FEATURE_XMM))
+ set_bit(X86_FEATURE_XMM_PREFETCH, c->x86_capability);
+ if (cpu_has(c, X86_FEATURE_3DNOW))
+ set_bit(X86_FEATURE_3DNOW_PREFETCH, c->x86_capability);
+ }
+
/* TSC disabled? */
if ( tsc_disable )
clear_bit(X86_FEATURE_TSC, c->x86_capability);
diff -urN --exclude-from=dontdiff orig-2.6.0-test6/arch/i386/mm/fault.c dual-2.6.0-test6/arch/i386/mm/fault.c
--- orig-2.6.0-test6/arch/i386/mm/fault.c 2003-07-08 21:31:09.000000000 +0100
+++ dual-2.6.0-test6/arch/i386/mm/fault.c 2003-09-30 06:03:46.284839268 +0100
@@ -55,6 +55,157 @@
console_loglevel = loglevel_save;
}

+#ifdef CONFIG_X86_PREFETCH_FIXUP
+/*
+ * Return EIP plus the CS segment base. The segment limit is also
+ * adjusted, clamped to the kernel/user address space (whichever is
+ * appropriate), and returned in *eip_limit.
+ *
+ * The segment is checked, because it might have been changed by another
+ * task between the original faulting instruction and here.
+ *
+ * If CS is no longer a valid code segment, or if EIP is beyond the
+ * limit, or if it is a kernel address when CS is not a kernel segment,
+ * then the returned value will be greater than *eip_limit.
+ */
+static inline unsigned long get_segment_eip(struct pt_regs *regs,
+ unsigned long *eip_limit)
+{
+ unsigned long eip = regs->eip;
+ unsigned seg = regs->xcs & 0xffff;
+ u32 seg_ar, seg_limit, base, *desc;
+
+ /* The standard kernel/user address space limit. */
+ *eip_limit = (seg & 3) ? USER_DS.seg : KERNEL_DS.seg;
+
+ /* Unlikely, but must come before segment checks. */
+ if (unlikely((regs->eflags & VM_MASK) != 0))
+ return eip + (seg << 4);
+
+ /* By far the commonest cases. */
+ if (likely(seg == __USER_CS || seg == __KERNEL_CS))
+ return eip;
+
+ /* Check the segment exists, is within the current LDT/GDT size,
+ that kernel/user (ring 0..3) has the appropriate privilege,
+ that it's a code segment, and get the limit. */
+ __asm__ ("larl %3,%0; lsll %3,%1"
+ : "=&r" (seg_ar), "=r" (seg_limit) : "0" (0), "rm" (seg));
+ if ((~seg_ar & 0x9800) || eip > seg_limit) {
+ *eip_limit = 0;
+ return 1; /* So that returned eip > *eip_limit. */
+ }
+
+ /* Get the GDT/LDT descriptor base. */
+ if (seg & (1<<2)) {
+ /* Must lock the LDT while reading it. */
+ down(&current->mm->context.sem);
+ desc = current->mm->context.ldt;
+ } else {
+ /* Must disable preemption while reading the GDT. */
+ desc = (u32 *)&cpu_gdt_table[get_cpu()];
+ }
+ desc = (void *)desc + (seg & ~7);
+ base = (desc[0] >> 16) |
+ ((desc[1] & 0xff) << 16) |
+ (desc[1] & 0xff000000);
+ if (seg & (1<<2))
+ up(&current->mm->context.sem);
+ else
+ put_cpu();
+
+ /* Adjust EIP and segment limit, and clamp at the kernel limit.
+ It's legitimate for segments to wrap at 0xffffffff. */
+ seg_limit += base;
+ if (seg_limit < *eip_limit && seg_limit >= base)
+ *eip_limit = seg_limit;
+ return eip + base;
+}
+
+/*
+ * Sometimes AMD Athlon/Opteron CPUs report invalid exceptions on prefetch.
+ * Check that here and ignore it.
+ */
+static int __is_prefetch(struct pt_regs *regs, unsigned long addr)
+{
+ unsigned long limit;
+ unsigned long instr = get_segment_eip (regs, &limit);
+ int scan_more = 1;
+ int prefetch = 0;
+ int i;
+
+ /*
+ * Avoid recursive faults. This catches the kernel jumping to nirvana.
+ * More complicated races with unmapped EIP are handled elsewhere for
+ * user space.
+ */
+ if (instr == addr)
+ return 0;
+
+ for (i = 0; scan_more && i < 15; i++) {
+ unsigned char opcode;
+ unsigned char instr_hi;
+ unsigned char instr_lo;
+
+ if (instr > limit)
+ break;
+ if (__get_user(opcode, (unsigned char *) instr))
+ break;
+
+ instr_hi = opcode & 0xf0;
+ instr_lo = opcode & 0x0f;
+ instr++;
+
+ switch (instr_hi) {
+ case 0x20:
+ case 0x30:
+ /* Values 0x26,0x2E,0x36,0x3E are valid x86 prefixes. */
+ scan_more = ((instr_lo & 7) == 0x6);
+ break;
+
+ case 0x60:
+ /* 0x64 thru 0x67 are valid prefixes in all modes. */
+ scan_more = (instr_lo & 0xC) == 0x4;
+ break;
+ case 0xF0:
+ /* 0xF0, 0xF2, and 0xF3 are valid prefixes in all modes. */
+ scan_more = !instr_lo || (instr_lo>>1) == 1;
+ break;
+ case 0x00:
+ /* Prefetch instruction is 0x0F0D or 0x0F18 */
+ scan_more = 0;
+ if (instr > limit)
+ break;
+ if (__get_user(opcode, (unsigned char *) instr))
+ break;
+ prefetch = (instr_lo == 0xF) &&
+ (opcode == 0x0D || opcode == 0x18);
+ break;
+ default:
+ scan_more = 0;
+ break;
+ }
+ }
+
+ return prefetch;
+}
+
+static inline int is_prefetch(struct pt_regs *regs, unsigned long addr)
+{
+ /* This code is included only when optimising for AMD, so we
+ may as well bias in favour of it. */
+ if (likely(boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+ boot_cpu_data.x86 >= 6))
+ return __is_prefetch(regs, addr);
+ return 0;
+}
+
+#else /* CONFIG_X86_PREFETCH_FIXUP */
+
+#define is_prefetch(regs, addr) (0)
+
+#endif
+
asmlinkage void do_invalid_op(struct pt_regs *, unsigned long);

/*
@@ -110,7 +261,7 @@
* atomic region then we must not take the fault..
*/
if (in_atomic() || !mm)
- goto no_context;
+ goto bad_area_nosemaphore;

down_read(&mm->mmap_sem);

@@ -198,8 +349,16 @@
bad_area:
up_read(&mm->mmap_sem);

+bad_area_nosemaphore:
/* User mode accesses just cause a SIGSEGV */
if (error_code & 4) {
+ /*
+ * Valid to do another page fault here because this one came
+ * from user space.
+ */
+ if (is_prefetch(regs, address))
+ return;
+
tsk->thread.cr2 = address;
tsk->thread.error_code = error_code;
tsk->thread.trap_no = 14;
@@ -232,6 +391,14 @@
if (fixup_exception(regs))
return;

+ /*
+ * Valid to do another page fault here, because if this fault
+ * had been triggered by is_prefetch fixup_exception would have
+ * handled it.
+ */
+ if (is_prefetch(regs, address))
+ return;
+
/*
* Oops. The kernel tried to access some bad page. We'll have to
* terminate things with extreme prejudice.
@@ -286,10 +453,14 @@
do_sigbus:
up_read(&mm->mmap_sem);

- /*
- * Send a sigbus, regardless of whether we were in kernel
- * or user mode.
- */
+ /* Kernel mode? Handle exceptions or die */
+ if (!(error_code & 4))
+ goto no_context;
+
+ /* User space => ok to do another page fault */
+ if (is_prefetch(regs, address))
+ return;
+
tsk->thread.cr2 = address;
tsk->thread.error_code = error_code;
tsk->thread.trap_no = 14;
@@ -298,10 +469,6 @@
info.si_code = BUS_ADRERR;
info.si_addr = (void *)address;
force_sig_info(SIGBUS, &info, tsk);
-
- /* Kernel mode? Handle exceptions or die */
- if (!(error_code & 4))
- goto no_context;
return;

vmalloc_fault:
diff -urN --exclude-from=dontdiff orig-2.6.0-test6/include/asm-i386/cpufeature.h dual-2.6.0-test6/include/asm-i386/cpufeature.h
--- orig-2.6.0-test6/include/asm-i386/cpufeature.h 2003-09-30 05:41:04.520720265 +0100
+++ dual-2.6.0-test6/include/asm-i386/cpufeature.h 2003-09-30 06:03:46.302835762 +0100
@@ -68,6 +68,9 @@
#define X86_FEATURE_K7 (3*32+ 5) /* Athlon */
#define X86_FEATURE_P3 (3*32+ 6) /* P3 */
#define X86_FEATURE_P4 (3*32+ 7) /* P4 */
+/* Prefetch insns without AMD spurious faults, or with fixup for them. */
+#define X86_FEATURE_XMM_PREFETCH (3*32+ 8) /* SSE */
+#define X86_FEATURE_3DNOW_PREFETCH (3*32+ 9) /* 3DNow! */

/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
#define X86_FEATURE_EST (4*32+ 7) /* Enhanced SpeedStep */
diff -urN --exclude-from=dontdiff orig-2.6.0-test6/include/asm-i386/processor.h dual-2.6.0-test6/include/asm-i386/processor.h
--- orig-2.6.0-test6/include/asm-i386/processor.h 2003-09-30 05:41:04.861655987 +0100
+++ dual-2.6.0-test6/include/asm-i386/processor.h 2003-09-30 06:03:46.334829529 +0100
@@ -589,11 +589,9 @@
#define ARCH_HAS_PREFETCH
extern inline void prefetch(const void *x)
{
- if (cpu_data[0].x86_vendor == X86_VENDOR_AMD)
- return; /* Some athlons fault if the address is bad */
alternative_input(ASM_NOP4,
"prefetchnta (%1)",
- X86_FEATURE_XMM,
+ X86_FEATURE_XMM_PREFETCH,
"r" (x));
}

@@ -607,7 +605,7 @@
{
alternative_input(ASM_NOP4,
"prefetchw (%1)",
- X86_FEATURE_3DNOW,
+ X86_FEATURE_3DNOW_PREFETCH,
"r" (x));
}
#define spin_lock_prefetch(x) prefetchw(x)


2003-09-30 08:01:38

by Nick Piggin

Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch



Jamie Lokier wrote:

>This builds upon Andi Kleen's excellent patch for the AMD prefetch bug
>workaround. It is applies to 2.6.0-test6.
>
>There are four changes on top of Andi's recent patch:
>
> 1. The workaround code is only included when compiling a kernel
> optimised for AMD processors with the bug. For kernels optimised
> for a different processor, there is no code size overhead.
>
> This will make some people happier.
>
> It will make some people unhappier, because it means a generic
> kernel on an AMD will run fine, but won't fixup _userspace_
> prefetch faults. More on this below.
>
> 2. Nevertheless, when a kernel _not_ optimised for those AMD processors
> is run on one, the "alternative" mechanism now correctly replaces
> prefetch instructions with nops.
>
> 3. The is_prefetch() instruction decoder now handles all cases of
> funny GDT and LDT segments, checks limits and so on. As far as I
> can see, there are no further bugs or flaws in that part.
>
> 4. A consequence of the is_prefetch() change is that, despite the
> longer C code (rarely used code, for segments), is_prefetch()
> should run marginally faster now because the access_ok() calls are
> not required. They're subsumed into the segment limit
> calculation.
>
>Like Andi's patch, this removes 10k or so from current x86 kernels by
>removing the silly conditional from the prefetch() function.
>
>Controversial point coming up!
>

It sounds good Jamie, but could you possibly compile it in
unconditionally? Otherwise it introduces yet another caveat to the CPU
selection / build system. This can be addressed separately.

I don't think anyone advocates attempting to handle this nicely with the
current CPU selection system, nor were they targeting this patch in
particular. It just kicked off a discussion about the shortfalls of the
selection system.


2003-09-30 13:39:54

by Jamie Lokier

Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

Dave Jones wrote:
> This looks to be completely gratuitous. Why disable it when we have the
> ability to work around it ?

Because some people expressed a wish to have kernels that don't
contain the workaround code, be they P4-optimised or 486-optimised
kernels. After all we have kernels that don't contain the F00F
workaround too. I'm not pushing this patch as is, it's for
considering the pros and cons.

CONFIG_X86_PREFETCH_WORKAROUND makes more sense with the
recently available "split every x86 CPU into individually selectable
options" patch, and, on reflection, that's probably where it belongs.

-- Jamie

2003-09-30 13:26:14

by Dave Jones

Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

On Tue, Sep 30, 2003 at 08:38:14AM +0100, Jamie Lokier wrote:
> diff -urN --exclude-from=dontdiff orig-2.6.0-test6/arch/i386/Kconfig dual-2.6.0-test6/arch/i386/Kconfig
> --- orig-2.6.0-test6/arch/i386/Kconfig 2003-09-30 05:39:33.467103692 +0100
> +++ dual-2.6.0-test6/arch/i386/Kconfig 2003-09-30 06:05:19.868623767 +0100
> @@ -397,6 +397,12 @@
> depends on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6
> default y
>
> +config X86_PREFETCH_FIXUP
> + bool
> + depends on MK7 || MK8
> + default y
> +
> +
> config HPET_TIMER
> bool "HPET Timer Support"
> help
> diff -urN --exclude-from=dontdiff orig-2.6.0-test6/arch/i386/kernel/cpu/common.c dual-2.6.0-test6/arch/i386/kernel/cpu/common.c
> --- orig-2.6.0-test6/arch/i386/kernel/cpu/common.c 2003-09-30 05:39:33.871035696 +0100
> +++ dual-2.6.0-test6/arch/i386/kernel/cpu/common.c 2003-09-30 06:03:46.222851345 +0100
> @@ -327,6 +327,17 @@
> * we do "generic changes."
> */
>
> + /* Prefetch works ok? */
> +#ifndef CONFIG_X86_PREFETCH_FIXUP
> + if (c->x86_vendor != X86_VENDOR_AMD || c->x86 < 6)
> +#endif
> + {
> + if (cpu_has(c, X86_FEATURE_XMM))
> + set_bit(X86_FEATURE_XMM_PREFETCH, c->x86_capability);
> + if (cpu_has(c, X86_FEATURE_3DNOW))
> + set_bit(X86_FEATURE_3DNOW_PREFETCH, c->x86_capability);
> + }
> +
> /* TSC disabled? */
> if ( tsc_disable )

This looks to be completely gratuitous. Why disable it when we have the
ability to work around it ?

Dave

--
Dave Jones http://www.codemonkey.org.uk

2003-09-30 13:53:51

by Dave Jones

Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

On Tue, Sep 30, 2003 at 02:39:36PM +0100, Jamie Lokier wrote:
> Dave Jones wrote:
> > This looks to be completely gratuitous. Why disable it when we have the
> > ability to work around it ?
>
> Because some people expressed a wish to have kernels that don't
> contain the workaround code, be they P4-optimised or 486-optimised
> kernels.

And those people are wrong. If they want to save bloat, instead of
'fixing' things by removing <1 page of .text, how about working on
some of the real problems like shrinking some of the growth of various
data structures that actually *matter*.

The "I don't want Athlon code in my kernel because I run a P4 and
it makes it slow/bigger" argument is totally bogus. It's akin to
the gentoo-esque "I compiled my distro with -march=p4 and now
my /bin/ls is faster than yours" argument.

> After all we have kernels that don't contain the F00F
> workaround too. I'm not pushing this patch as is, it's for
> considering the pros and cons.

F00F workaround was enabled on every kernel that is possible
to boot on affected hardware last time I looked.
This is what you seem to be missing: it's not optional.
If it's possible to boot that kernel on affected hardware,
it needs errata workarounds.

> CONFIG_X86_PREFETCH_WORKAROUND makes more sense with the
> recently available "split every x86 CPU into individually selectable
> options" patch, and, on reflection, that's probably where it belongs.

Said patch isn't included in mainline, so this argument is bogus.
Relative merits of that patch were already discussed in another thread.

Dave

--
Dave Jones http://www.codemonkey.org.uk

2003-09-30 14:45:47

by Jamie Lokier

Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

Dave Jones wrote:
> On Tue, Sep 30, 2003 at 02:39:36PM +0100, Jamie Lokier wrote:
> > Dave Jones wrote:
> > > This looks to be completely gratuitous. Why disable it when we have the
> > > ability to work around it ?
> >
> > Because some people expressed a wish to have kernels that don't
> > contain the workaround code, be they P4-optimised or 486-optimised
> > kernels.
>
> And those people are wrong. If they want to save bloat, instead of
> 'fixing' things by removing <1 page of .text, how about working on
> some of the real problems like shrinking some of the growth of various
> data structures that actually *matter*.

How about both?

> The "I don't want Athlon code in my kernel because I run a P4 and
> it makes it slow/bigger" argument is totally bogus. It's akin to
> the gentoo-esque "I compiled my distro with -march=p4 and now
> my /bin/ls is faster than yours" argument.

I'm talking about people with embedded 486s or donated old 486s. P4s
have abundant RAM, but 2MB is still not unheard of in the small
boxes, and in 2MB, 512 bytes of code (which is about the size of the
prefetch workaround) is more significant. Not by itself, but by
virtue of being yet another unnecessary, and very clearly removable,
chunk. Diligence and all that.

> > After all we have kernels that don't contain the F00F
> > workaround too. I'm not pushing this patch as is, it's for
> > considering the pros and cons.
>
> F00F workaround was enabled on every kernel that is possible
> to boot on affected hardware last time I looked.
> This is what you seem to be missing, it's not optional.
> If its possible to boot that kernel on affected hardware,
> it needs errata workarounds.

We have a few confusing issues here.

1. First, your point about affected hardware.

I see your point, and the new "alternative" macro is allowing things
to develop along those lines even better: using fancier features (like
prefetch) but not requiring them. Most of the x86 kernel is like that now.
All the stuff in arch/i386/kernel/cpu/* is mandatory. Yet:

- I don't see anything that prevents a PPro-compiled kernel from booting
on a P5MMX with the F00F erratum.

- Nor do I see anything that prevents a PII-compiled kernel from booting
on a PPro with the store ordering erratum (X86_PPRO_FENCE).

Those are both nasty bugs, and the latter can corrupt data on an SMP PPro.

Currently, it seems quite hypocritical to treat the AMD workaround
as, in effect, mandatory, while there are others which aren't.
Perhaps it's this apparent hypocrisy which needs healing.

2. I'm not sure if you're criticising the other chap who wants
rid of the AMD errata workaround, or my X86_PREFETCH_FIXUP code.

In case you hadn't fully grokked it, my code doesn't disable the
workaround! It simply substitutes it for a smaller, slightly
slower one, on kernels which are not optimised for AMD. Those
kernels continue to work just fine on AMD processors.

Given that, I'm not sure what the thrust of your argument is.

> > CONFIG_X86_PREFETCH_WORKAROUND makes more sense with the
> > recently available "split every x86 CPU into individually selectable
> > options" patch, and, on reflection, that's probably where it belongs.
>
> Said patch isn't included in mainline, so this argument is bogus.
> Relative merits of that patch were already discussed in another thread.

That wasn't an argument and the patch from me isn't in the mainline
either, but anyway I agree that topic has its own thread. Let's
please leave it out of this one and focus on the merits, or otherwise,
of my patch and Andi's.

-- Jamie

2003-09-30 15:08:52

by Dave Jones

Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

On Tue, Sep 30, 2003 at 03:45:26PM +0100, Jamie Lokier wrote:

> > And those people are wrong. If they want to save bloat, instead of
> > 'fixing' things by removing <1 page of .text, how about working on
> > some of the real problems like shrinking some of the growth of various
> > data structures that actually *matter*.
> How about both?

Sounds like wasted effort, in the same sense that rewriting a crap
algorithm in assembly won't be better than using a more efficient
algorithm in C.

> I'm talking about people with embedded 486s or old 486s donated. P4s
> are abundant in RAM

Mine has 256MB. Sure it's a huge amount in comparison to how we kitted
out 486s a few years back, but still hardly an abundance in today's bloatware..

> but 2MB is still not unheard of in the small
> boxes, and in 2MB, 512 bytes of code (which is about the size of the
> prefetch workaround) is more significant.

I'll be *amazed* if you manage to get a 2.6 kernel to boot and function
efficiently in 2MB without config'ing away a lot of the 'growth' like
sysfs. (Sidenote: Before some loon actually tries this, by function
efficiently, I mean is actually usable for some purpose, not "it booted
to a bash prompt")

> > F00F workaround was enabled on every kernel that is possible
> > to boot on affected hardware last time I looked.
> > This is what you seem to be missing, it's not optional.
> > If its possible to boot that kernel on affected hardware,
> > it needs errata workarounds.
>
> We have a few confusing issues here.
>
> 1. First, your point about affected hardware.
>
> - I don't see anything that prevents a PPro-compiled kernel from booting
> on a P5MMX with the F00F erratum.

Compiled with -m686 - Uses CMOV, won't boot.

> - Nor do I see anything that prevents a PII-compiled kernel from booting
> on a PPro with the store ordering erratum (X86_PPRO_FENCE).

Correct. As noted in another mail, it arguably should contain the
workaround.

> Perhaps it's this apparent hypocrisy which needs healing.

Agreed.

> 2. I'm not sure if you're criticising the other chap who wants
> rid of the AMD errata workaround, or my X86_PREFETCH_FIXUP code.

My criticism was twofold.

1. The splitting of X86_FEATURE_XMM into X86_FEATURE_XMM_PREFETCH and
X86_FEATURE_3DNOW_PREFETCH doesn't seem to really buy us anything
other than complication.
2. This chunk...

+ /* Prefetch works ok? */
+#ifndef CONFIG_X86_PREFETCH_FIXUP
+ if (c->x86_vendor != X86_VENDOR_AMD || c->x86 < 6)
+#endif
+ {
+ if (cpu_has(c, X86_FEATURE_XMM))
+ set_bit(X86_FEATURE_XMM_PREFETCH, c->x86_capability);
+ if (cpu_has(c, X86_FEATURE_3DNOW))
+ set_bit(X86_FEATURE_3DNOW_PREFETCH, c->x86_capability);
+ }

- If we haven't set CONFIG_X86_PREFETCH_FIXUP (say a P4 kernel), this
code path isn't taken, and we end up not doing prefetches on P4's too
as you're not setting X86_FEATURE_XMM_PREFETCH anywhere else, and apply_alternatives
leaves them as NOPs.
- Newer C3s are 686's with prefetch; this nobbles them too.


> In case you hadn't fully grokked it, my code doesn't disable the
> workaround! It simply substitutes it for a smaller, slightly
> slower one, on kernels which are not optimised for AMD.

See above. Or have I missed something ?

> Given that, I'm not sure what the thrust of your argument is.

It's possible I'm missing something silly..

Dave

--
Dave Jones http://www.codemonkey.org.uk

2003-09-30 16:55:14

by Jamie Lokier

Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

Dave Jones wrote:
> > - I don't see anything that prevents a PPro-compiled kernel from booting
> > on a P5MMX with the F00F erratum.
>
> Compiled with -m686 - Uses CMOV, won't boot.

Ok, not PPro, but with Processor Family set to K6, CYRIXIII, or any of
the 3 WINCHIP choices, it is compiled with -march=i586 and without F00F.

(Fixing that by adding F00F to all those non-Intel processors, just to
make sure non-F00F kernels crash with a cmov instruction is too subtle
for my taste.)

Anyway, it should complain about lack of cmov not crash :)

> 1. The splitting of X86_FEATURE_XMM into X86_FEATURE_XMM_PREFETCH and
> X86_FEATURE_3DNOW_PREFETCH doesn't seem to really buy us anything
> other than complication.

I once suggested turning off XMM entirely when prefetch is broken and
not fixed, but that resulted in a mild toasting, hence the extra
synthetic flag.

It's not really split: XMM_PREFETCH is an additional flag, on top of
XMM, which simply means Linux decided it's safe to use the prefetch
instruction.

The only reason it's a feature flag is because the "alternative" macro
needs one.

> + /* Prefetch works ok? */
> +#ifndef CONFIG_X86_PREFETCH_FIXUP
> + if (c->x86_vendor != X86_VENDOR_AMD || c->x86 < 6)
> +#endif
> + {
> + if (cpu_has(c, X86_FEATURE_XMM))
> + set_bit(X86_FEATURE_XMM_PREFETCH, c->x86_capability);
> + if (cpu_has(c, X86_FEATURE_3DNOW))
> + set_bit(X86_FEATURE_3DNOW_PREFETCH, c->x86_capability);
> + }
>
> - If we haven't set CONFIG_X86_PREFETCH_FIXUP (say a P4 kernel), this
> code path isn't taken, and we end up not doing prefetches on P4's too
> as you're not setting X86_FEATURE_XMM_PREFETCH anywhere else, and apply_alternatives
> leaves them as NOPs.
> - Newer C3s are 686's with prefetch, this nobbles them too.

Read the code again. It does what you think it doesn't do, so to speak.

-- Jamie

2003-09-30 17:26:46

by Dave Jones

Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

On Tue, Sep 30, 2003 at 05:54:50PM +0100, Jamie Lokier wrote:

> > > - I don't see anything that prevents a PPro-compiled kernel from booting
> > > on a P5MMX with the F00F erratum.
> > Compiled with -m686 - Uses CMOV, won't boot.
> Ok, not PPro, but with Processor Family set to K6, CYRIXIII, or any of
> the 3 WINCHIP choices, it is compiled with -march=i586 and without F00F.

In theory, these should be interchangeable. Not tested though.

> (Fixing that by adding F00F too all those non-Intel processors, just to
> make sure non-F00F kernels crash with a cmov instruction is too subtle
> for my taste.)

this case is a little more obscure imo, and not worth the effort.

> Anyway, it should complain about lack of cmov not crash :)

not easy, given we execute cmov instructions before we even hit
a printk. Such a test & output needs to be done in assembly in early
startup.

> > 1. The splitting of X86_FEATURE_XMM into X86_FEATURE_XMM_PREFETCH and
> > X86_FEATURE_3DNOW_PREFETCH doesn't seem to really buy us anything
> > other than complication.
> I once suggested turning off XMM entirely when prefetch is broken and
> not fixed, but that resulted in a mild toasting, hence the extra
> synthetic flag.

Which gets us back to the question of why this is needed at all ?
You said earlier "In case you hadn't fully grokked it, my code doesn't
disable the workaround!" So why do you need this ?

> > - If we haven't set CONFIG_X86_PREFETCH_FIXUP (say a P4 kernel), this
> > code path isn't taken, and we end up not doing prefetches on P4's too
> > as you're not setting X86_FEATURE_XMM_PREFETCH anywhere else, and apply_alternatives
> > leaves them as NOPs.
> > - Newer C3s are 686's with prefetch, this nobbles them too.
> Read the code again. It does what you think it doesn't do, so to speak.

Yep, brain fart.

Dave

--
Dave Jones http://www.codemonkey.org.uk

2003-09-30 19:08:12

by Andi Kleen

Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

Dave Jones <[email protected]> writes:
>
> > Anyway, it should complain about lack of cmov not crash :)
>
> not easy, given we execute cmov instructions before we even hit
> a printk. Such a test & output needs to be done in assembly in early
> startup.

I implemented it for long mode on x86-64.

It has to be done before the vesafb is initialized too, otherwise
you cannot see the error message.

You could copy the code from arch/x86_64/boot/setup.S
(starting with /* check for long mode. */) and change it to
check for the CPUID bits you want. x86-64 checks a basic collection
that has been determined to be the base set of x86-64 supporting CPUs.
But it could be less or more.

-Andi

2003-09-30 20:09:07

by H. Peter Anvin

Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

Followup to: <[email protected]>
By author: Andi Kleen <[email protected]>
In newsgroup: linux.dev.kernel
>
> I implemented it for long mode on x86-64.
>
> It has to be done before the vesafb is initialized too, otherwise
> you cannot see the error message.
>
> You could copy the code from arch/x86_64/boot/setup.S
> (starting with /* check for long mode. */) and change it to
> check for the CPUID bits you want. x86-64 checks a basic collection
> that has been determined to be the base set of x86-64 supporting CPUs.
> But it could be less or more.
>

Do it in boot/setup.S so that you can still issue a message via the
BIOS. However, don't forget you might need bug fixes/workarounds like
the P6 SEP in there, too. From that perspective it would be nice to
have something that can be written in C. Recent binutils actually
allow gcc-generated .s-files to be assembled for a 16-bit environment
(with lots of overrides.)

-hpa
--
<[email protected]> at work, <[email protected]> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

2003-10-01 00:27:30

by Andrew Morton

Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

Jamie Lokier <[email protected]> wrote:
>
> What I'd really like your opinion on is the appropriate userspace
> behaviour. If we don't care about fixing up userspace, then
> __ex_table is a much tidier workaround for the prefetch bug.

I think we should fix up userspace.

> If we do
> care about fixing up userspace, then do we need a policy decision that
> says it's not acceptable to run on AMD without userspace fixups from
> 2.6.0 onwards - it must fixup userspace or refuse to run?

If you're saying that the kernel should refuse to boot if it discovers that
a) the CPU needs the workaround and b) the kernel does not implement the
workaround then yup.

2003-09-30 23:58:10

by Jamie Lokier

Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

Dave Jones wrote:
> Which gets us back to the question of why this is needed at all ?
> You said earlier "In case you hadn't fully grokked it, my code doesn't
> disable the workaround!" So why do you need this ?

To change the prefetch workaround from a critical requirement to an
optimisation knob.

X86_USE_3DNOW is very similar: if it's enabled, the kernel has some
extra code to make certain CPUs run faster, but they also run fine
without it. X86_OOSTORE is another.
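
As a sketch of the "optimisation knob" shape, the workaround could be guarded by a Kconfig entry along these lines (the option name here is invented for illustration; the CPU-type symbols follow arch/i386/Kconfig conventions):

```
# Hypothetical option: compile the prefetch-fault fixup only into
# kernels targeted at the affected AMD CPU types, the same way
# X86_USE_3DNOW and X86_OOSTORE are enabled per CPU type.
config X86_PREFETCH_FIXUP
	bool
	depends on MK7 || MK8
	default y
```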

What I'd really like your opinion on is the appropriate userspace
behaviour. If we don't care about fixing up userspace, then
__ex_table is a much tidier workaround for the prefetch bug. If we do
care about fixing up userspace, then do we need a policy decision that
says it's not acceptable to run on AMD without userspace fixups from
2.6.0 onwards - it must fixup userspace or refuse to run?

-- Jamie

2003-10-01 01:54:15

by Nakajima, Jun

Subject: RE: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

> I think we should fix up userspace.
What do you mean by this? Patch user code at runtime (it's much more
complex than it sounds) or remove prefetch instructions from userspace?

Jun

> -----Original Message-----
> From: [email protected] [mailto:linux-kernel-
> [email protected]] On Behalf Of Andrew Morton
> Sent: Tuesday, September 30, 2003 5:27 PM
> To: Jamie Lokier
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata
> patch
>
> Jamie Lokier <[email protected]> wrote:
> >
> > What I'd really like your opinion on is the appropriate userspace
> > behaviour. If we don't care about fixing up userspace, then
> > __ex_table is a much tidier workaround for the prefetch bug.
>
> I think we should fix up userspace.
>
> > If we do
> > care about fixing up userspace, then do we need a policy decision that
> > says it's not acceptable to run on AMD without userspace fixups from
> > 2.6.0 onwards - it must fixup userspace or refuse to run?
>
> If you're saying that the kernel should refuse to boot if it discovers that
> a) the CPU needs the workaround and b) the kernel does not implement the
> workaround then yup.
>

2003-10-01 02:08:30

by Mike Fedyk

Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

On Tue, Sep 30, 2003 at 06:54:06PM -0700, Nakajima, Jun wrote:
> > I think we should fix up userspace.
> What do you mean by this? Patch user code at runtime (it's much more
> complex than it sounds) or remove prefetch instructions from userspace?

No, just make sure that userspace doesn't see the exceptions that the errata
will produce in obscure cases. It is all explained in several threads about
this. Search for "Andi Kleen" and "prefetch".

2003-10-01 02:06:19

by Andrew Morton

Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

"Nakajima, Jun" <[email protected]> wrote:
>
> > I think we should fix up userspace.
> What do you mean by this? Patch user code at runtime (it's much more
> complex than it sounds) or remove prefetch instructions from userspace?

Detect when user code stumbles over this CPU errata and make it look like
nothing happened.

2003-10-01 02:23:20

by Nakajima, Jun

Subject: RE: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

Oh, I thought you already closed this issue and you were discussing a
different thing.

I agree. The kernel should fix up userspace for this CPU errata.

Jun

> -----Original Message-----
> From: Andrew Morton [mailto:[email protected]]
> Sent: Tuesday, September 30, 2003 7:07 PM
> To: Nakajima, Jun
> Cc: [email protected]; [email protected]; [email protected]; linux-
> [email protected]; [email protected]
> Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata
> patch
>
> "Nakajima, Jun" <[email protected]> wrote:
> >
> > > I think we should fix up userspace.
> > What do you mean by this? Patch user code at runtime (it's much more
> > complex than it sounds) or remove prefetch instructions from userspace?
>
> Detect when user code stumbles over this CPU errata and make it look like
> nothing happened.

2003-10-01 02:51:47

by Jamie Lokier

Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

Nakajima, Jun wrote:
> Oh, I thought you already closed this issue and you were discussing a
> different thing.
>
> I agree. The kernel should fix up userspace for this CPU errata.

Our question is whether kernels configured specifically for non-AMD
(e.g. Winchip or Elan kernels) must have the AMD fixup code (it is a
few hundred bytes), refuse to boot on AMD, ignore the problem, or work
but not fix up userspace.

-- Jamie

2003-10-01 03:13:26

by Andrew Morton

Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

Jamie Lokier <[email protected]> wrote:
>
> Nakajima, Jun wrote:
> > Oh, I thought you already closed this issue and you were discussing a
> > different thing.
> >
> > > I agree. The kernel should fix up userspace for this CPU errata.
>
> Our question is whether kernels configured specifically for non-AMD
> (e.g. Winchip or Elan kernels) must have the AMD fixup code (it is a
> few hundred bytes), refuse to boot on AMD, ignore the problem, or work
> but not fix up userspace.

Refusing to boot would be best I think. Fixing it up anyway would be OK
too.

Ignoring the problem in kernel and/or userspace could be viewed as a
response to pilot error but as always it would be nice if, given a choice,
we can help the pilot.