2001-10-05 10:46:53

by Paul Mackerras

[permalink] [raw]
Subject: [PATCH] change name of rep_nop

I was a bit dismayed by the changes to smp_init() in init/main.c in
the pre4 patch, because:

* rep_nop() is a silly name, it is not at all obvious what it does, in
fact even if you know it does the rep; nop instruction sequence on
x86 it still isn't obvious even to most x86 users what it would do.
I propose that we change the name to cpu_relax as a more descriptive
name.

* The change breaks the SMP compile on all platforms except x86, since
rep_nop is only defined for x86.

* Why are we using a volatile attribute on wait_init_idle instead of
using a barrier?

Here is a patch that addresses those three issues. It adds an empty
definition of cpu_relax for all architectures except x86 (for x86 it
is defined to be rep_nop), and it changes smp_init to use a barrier
instead of making wait_init_idle be volatile.

The patch also includes definitions for prefetch* on PPC since they
were also in our local asm-ppc/processor.h.

This compiles and runs correctly on a 4-way PPC box here.

Linus, please apply this to your tree.

Thanks,
Paul.

diff -urN linux/include/asm-alpha/processor.h pmac/include/asm-alpha/processor.h
--- linux/include/asm-alpha/processor.h Mon Sep 24 09:31:35 2001
+++ pmac/include/asm-alpha/processor.h Fri Oct 5 15:58:47 2001
@@ -148,6 +148,8 @@
#define init_task (init_task_union.task)
#define init_stack (init_task_union.stack)

+#define cpu_relax() do { } while (0)
+
#define ARCH_HAS_PREFETCH
#define ARCH_HAS_PREFETCHW
#define ARCH_HAS_SPINLOCK_PREFETCH
diff -urN linux/include/asm-arm/processor.h pmac/include/asm-arm/processor.h
--- linux/include/asm-arm/processor.h Mon Sep 24 09:31:35 2001
+++ pmac/include/asm-arm/processor.h Fri Oct 5 15:59:05 2001
@@ -112,6 +112,8 @@
#define init_task (init_task_union.task)
#define init_stack (init_task_union.stack)

+#define cpu_relax() do { } while (0)
+
/*
* Create a new kernel thread
*/
diff -urN linux/include/asm-cris/processor.h pmac/include/asm-cris/processor.h
--- linux/include/asm-cris/processor.h Mon Sep 24 09:31:35 2001
+++ pmac/include/asm-cris/processor.h Fri Oct 5 15:59:21 2001
@@ -141,4 +141,6 @@
#define init_task (init_task_union.task)
#define init_stack (init_task_union.stack)

+#define cpu_relax() do { } while (0)
+
#endif /* __ASM_CRIS_PROCESSOR_H */
diff -urN linux/include/asm-i386/processor.h pmac/include/asm-i386/processor.h
--- linux/include/asm-i386/processor.h Mon Sep 24 09:31:35 2001
+++ pmac/include/asm-i386/processor.h Fri Oct 5 15:58:05 2001
@@ -476,6 +476,8 @@
__asm__ __volatile__("rep;nop");
}

+#define cpu_relax() rep_nop()
+
/* Prefetch instructions for Pentium III and AMD Athlon */
#ifdef CONFIG_MPENTIUMIII

diff -urN linux/include/asm-ia64/processor.h pmac/include/asm-ia64/processor.h
--- linux/include/asm-ia64/processor.h Mon Sep 24 09:31:35 2001
+++ pmac/include/asm-ia64/processor.h Fri Oct 5 15:59:38 2001
@@ -969,6 +969,8 @@
return result;
}

+#define cpu_relax() do { } while (0)
+

#define ARCH_HAS_PREFETCH
#define ARCH_HAS_PREFETCHW
diff -urN linux/include/asm-m68k/processor.h pmac/include/asm-m68k/processor.h
--- linux/include/asm-m68k/processor.h Mon Sep 24 09:31:35 2001
+++ pmac/include/asm-m68k/processor.h Fri Oct 5 15:59:52 2001
@@ -155,4 +155,6 @@
#define init_task (init_task_union.task)
#define init_stack (init_task_union.stack)

+#define cpu_relax() do { } while (0)
+
#endif
diff -urN linux/include/asm-mips/processor.h pmac/include/asm-mips/processor.h
--- linux/include/asm-mips/processor.h Mon Sep 24 09:31:35 2001
+++ pmac/include/asm-mips/processor.h Fri Oct 5 16:00:04 2001
@@ -259,6 +259,8 @@
#define init_task (init_task_union.task)
#define init_stack (init_task_union.stack)

+#define cpu_relax() do { } while (0)
+
#endif /* !defined (_LANGUAGE_ASSEMBLY) */
#endif /* __KERNEL__ */

diff -urN linux/include/asm-mips64/processor.h pmac/include/asm-mips64/processor.h
--- linux/include/asm-mips64/processor.h Mon Sep 24 09:31:36 2001
+++ pmac/include/asm-mips64/processor.h Fri Oct 5 16:00:18 2001
@@ -290,6 +290,8 @@
#define init_task (init_task_union.task)
#define init_stack (init_task_union.stack)

+#define cpu_relax() do { } while (0)
+
#endif /* !defined (_LANGUAGE_ASSEMBLY) */
#endif /* __KERNEL__ */

diff -urN linux/include/asm-parisc/processor.h pmac/include/asm-parisc/processor.h
--- linux/include/asm-parisc/processor.h Mon Sep 24 09:31:36 2001
+++ pmac/include/asm-parisc/processor.h Fri Oct 5 16:00:29 2001
@@ -333,5 +333,7 @@
#define init_task (init_task_union.task)
#define init_stack (init_task_union.stack)

+#define cpu_relax() do { } while (0)
+

#endif /* __ASM_PARISC_PROCESSOR_H */
diff -urN linux/include/asm-ppc/processor.h pmac/include/asm-ppc/processor.h
--- linux/include/asm-ppc/processor.h Mon Sep 24 09:31:36 2001
+++ pmac/include/asm-ppc/processor.h Fri Oct 5 16:26:22 2001
@@ -1,5 +1,5 @@
/*
- * BK Id: SCCS/s.processor.h 1.28 08/17/01 15:23:17 paulus
+ * BK Id: SCCS/s.processor.h 1.31 10/05/01 16:26:22 paulus
*/
#ifdef __KERNEL__
#ifndef __ASM_PPC_PROCESSOR_H
@@ -654,6 +654,27 @@

/* In misc.c */
void _nmask_and_or_msr(unsigned long nmask, unsigned long or_val);
+
+#define cpu_relax() do { } while (0)
+
+/*
+ * Prefetch macros.
+ */
+#define ARCH_HAS_PREFETCH
+#define ARCH_HAS_PREFETCHW
+#define ARCH_HAS_SPINLOCK_PREFETCH
+
+extern inline void prefetch(const void *x)
+{
+ __asm__ __volatile__ ("dcbt 0,%0" : : "r" (x));
+}
+
+extern inline void prefetchw(const void *x)
+{
+ __asm__ __volatile__ ("dcbtst 0,%0" : : "r" (x));
+}
+
+#define spin_lock_prefetch(x) prefetchw(x)

#endif /* !__ASSEMBLY__ */

diff -urN linux/include/asm-s390/processor.h pmac/include/asm-s390/processor.h
--- linux/include/asm-s390/processor.h Mon Sep 24 09:31:36 2001
+++ pmac/include/asm-s390/processor.h Fri Oct 5 16:01:03 2001
@@ -149,6 +149,8 @@
#define init_task (init_task_union.task)
#define init_stack (init_task_union.stack)

+#define cpu_relax() do { } while (0)
+
/*
* Set of msr bits that gdb can change on behalf of a process.
*/
diff -urN linux/include/asm-s390x/processor.h pmac/include/asm-s390x/processor.h
--- linux/include/asm-s390x/processor.h Mon Sep 24 09:31:36 2001
+++ pmac/include/asm-s390x/processor.h Fri Oct 5 16:01:12 2001
@@ -159,6 +159,8 @@
#define init_task (init_task_union.task)
#define init_stack (init_task_union.stack)

+#define cpu_relax() do { } while (0)
+
/*
* Set of msr bits that gdb can change on behalf of a process.
*/
diff -urN linux/include/asm-sh/processor.h pmac/include/asm-sh/processor.h
--- linux/include/asm-sh/processor.h Mon Sep 24 09:31:37 2001
+++ pmac/include/asm-sh/processor.h Fri Oct 5 16:01:20 2001
@@ -218,4 +218,6 @@
#define init_task (init_task_union.task)
#define init_stack (init_task_union.stack)

+#define cpu_relax() do { } while (0)
+
#endif /* __ASM_SH_PROCESSOR_H */
diff -urN linux/include/asm-sparc/processor.h pmac/include/asm-sparc/processor.h
--- linux/include/asm-sparc/processor.h Mon Sep 24 09:31:37 2001
+++ pmac/include/asm-sparc/processor.h Fri Oct 5 16:01:36 2001
@@ -201,6 +201,8 @@
#define init_task (init_task_union.task)
#define init_stack (init_task_union.stack)

+#define cpu_relax() do { } while (0)
+
#endif

#endif /* __ASM_SPARC_PROCESSOR_H */
diff -urN linux/include/asm-sparc64/processor.h pmac/include/asm-sparc64/processor.h
--- linux/include/asm-sparc64/processor.h Mon Sep 24 09:31:37 2001
+++ pmac/include/asm-sparc64/processor.h Fri Oct 5 16:01:46 2001
@@ -276,6 +276,8 @@
#define init_task (init_task_union.task)
#define init_stack (init_task_union.stack)

+#define cpu_relax() do { } while (0)
+
#endif /* __KERNEL__ */

#endif /* !(__ASSEMBLY__) */
diff -urN linux/init/main.c pmac/init/main.c
--- linux/init/main.c Fri Oct 5 14:35:22 2001
+++ pmac/init/main.c Fri Oct 5 16:09:38 2001
@@ -482,7 +482,7 @@
extern void setup_arch(char **);
extern void cpu_idle(void);

-volatile unsigned long wait_init_idle = 0UL;
+unsigned long wait_init_idle;

#ifndef CONFIG_SMP

@@ -510,13 +510,12 @@
smp_commence();

/* Wait for the other cpus to set up their idle processes */
- while (1) {
- if (!wait_init_idle)
- break;
- rep_nop();
- }
+ while (wait_init_idle) {
+ cpu_relax();
+ barrier();
+ }
printk("All processors have done init_idle\n");
-}
+}

#endif

diff -urN linux/kernel/sched.c pmac/kernel/sched.c
--- linux/kernel/sched.c Fri Oct 5 14:35:22 2001
+++ pmac/kernel/sched.c Fri Oct 5 16:03:03 2001
@@ -1309,7 +1309,7 @@
atomic_inc(&current->files->count);
}

-extern volatile unsigned long wait_init_idle;
+extern unsigned long wait_init_idle;

void __init init_idle(void)
{


2001-10-05 14:32:23

by Alan

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

> Here is a patch that addresses those three issues. It adds an empty
> definition of cpu_relax for all architectures except x86 (for x86 it
> is defined to be rep_nop), and it changes smp_init to use a barrier
> instead of making wait_init_idle be volatile.
>

Looks good to me

2001-10-05 18:08:12

by Peter Rival

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

Alan Cox wrote:

>>Here is a patch that addresses those three issues. It adds an empty
>>definition of cpu_relax for all architectures except x86 (for x86 it
>>is defined to be rep_nop), and it changes smp_init to use a barrier
>>instead of making wait_init_idle be volatile.
>>
>>
>
> Looks good to me


You also need to move the call to smp_boot_cpus() below the
clear_bit(...) line in smp_init(). Without it, my Wildfire doesn't get
past the while(wait_init_idle) loop - seems all of the CPUs have already
done their work before the mask is set. Besides, it's the right place
for it anyway. I'd generate a patch, but my system is bogged down in a
benchmark for the next couple of hours. If someone says so, I'll
generate the patch after that...

- Pete


2001-10-05 23:28:36

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

Peter Rival writes:

> You also need to move the call to smp_boot_cpus() below the
> clear_bit(...) line in smp_init(). Without it, my Wildfire doesn't get

No, that won't work for me, because cpu_online_map is set by
smp_boot_cpus(), at least on PPC (in fact each CPU sets its bit in
cpu_online_map as it spins up).

There shouldn't be a race on x86 at all, because the secondary
processors don't call init_idle until after they see that the primary
cpu has call smp_commence. (There is currently a race on PPC since we
call init_idle before waiting for smp_commence, but that would not be
your problem.)

> past the while(wait_init_idle) loop - seems all of the CPUs have already
> done their work before the mask is set. Besides, it's the right place
> for it anyway.

No, I think it should be smp_boot_cpus, set wait_init_idle,
smp_commence, then the secondaries start clearing their bits. Which
AFAICS is the way it is on x86. What architecture is your wildfire?

Paul.

2001-10-05 23:58:32

by Martin J. Bligh

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

>> You also need to move the call to smp_boot_cpus() below the
>> clear_bit(...) line in smp_init(). Without it, my Wildfire doesn't get
>
> No, that won't work for me, because cpu_online_map is set by
> smp_boot_cpus(), at least on PPC (in fact each CPU sets its bit in
> cpu_online_map as it spins up).
>
> There shouldn't be a race on x86 at all, because the secondary
> processors don't call init_idle until after they see that the primary
> cpu has call smp_commence. (There is currently a race on PPC since we
> call init_idle before waiting for smp_commence, but that would not be
> your problem.)

There *is* a race on x86 - the problem is that the primary cpu can
get to reschedule_idle before the secondarys have done init_idle.
I'm not claiming the way I fixed it is beautiful, but the race definitely
exists (I hit it) and the patch makes the problem go away.

Sounds like rep_nop was the wrong way to spin - aplogies.

Martin.

2001-10-06 01:41:14

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

Martin J. Bligh writes:

> There *is* a race on x86 - the problem is that the primary cpu can
> get to reschedule_idle before the secondarys have done init_idle.
> I'm not claiming the way I fixed it is beautiful, but the race definitely
> exists (I hit it) and the patch makes the problem go away.

I meant that there isn't a race on x86 in pre4, now that we are using
your patch. I didn't mean to say that there wasn't a race on x86
before your patch went in.

There is a race on PPC with your patch, but I can fix that by removing
the init_idle() call from smp_callin() in arch/ppc/kernel/smp.c.
At a quick glance it looks like alpha and sparc (sun4m) may have the
same problem since they also call init_idle before waiting for
smp_commence() (or smp_threads_ready != 0).

> Sounds like rep_nop was the wrong way to spin - aplogies.

Well it was just that the name was a bit x86-centric, and a bit
non-obvious, even to people who know a bit of x86 assembly (one could
be forgiven for thinking that rep nop was a slow way to set CX to 0).

Paul.

2001-10-08 19:29:32

by Peter Rival

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

diff -urN linux-2.4.11-pre5/arch/alpha/kernel/smp.c linux-2.4.11-pre5+fix/arch/alpha/kernel/smp.c
--- linux-2.4.11-pre5/arch/alpha/kernel/smp.c Mon Oct 8 13:37:57 2001
+++ linux-2.4.11-pre5+fix/arch/alpha/kernel/smp.c Mon Oct 8 14:17:50 2001
@@ -171,13 +171,6 @@
/* Set interrupt vector. */
wrent(entInt, 0);

- /* Setup the scheduler for this processor. */
- init_idle();
-
- /* ??? This should be in init_idle. */
- atomic_inc(&init_mm.mm_count);
- current->active_mm = &init_mm;
-
/* Get our local ticker going. */
smp_setup_percpu_timer(cpuid);

@@ -207,6 +200,12 @@
DBGS(("smp_callin: commencing CPU %d current %p\n",
cpuid, current));

+ /* Setup the scheduler for this processor. */
+ init_idle();
+
+ /* ??? This should be in init_idle. */
+ atomic_inc(&init_mm.mm_count);
+ current->active_mm = &init_mm;
/* Do nothing. */
cpu_idle();
}


Attachments:
alpha_patch (880.00 B)

2001-10-08 22:36:55

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop


While we're on the subject of stupidly named routines and x86-isms, I'm
having trouble reconciling this text in Documentation/cachetlb.txt:

1) void flush_cache_all(void)

The most severe flush of all. After this interface runs,
the entire cpu cache is flushed.

... with this implementation in include/asm-i386/pgtable.h:

#define flush_cache_all() do { } while (0)

That really doesn't seem to be doing what it says on the tin.

Some people have asserted, falsely, that it's never sane to want an i386 to
flush its cache. Even if that were true, it wouldn't really be an excuse for
the above discrepancy.

It's probably too late to fix it - this function seems to have evolved
completely undocumented and different semantics, and many architectures now
have a NOP implementation of it. Maybe we need to introduce a new call which
actually _does_ flush the cache, called simon_says_flush_cache_all() ?


Index: Documentation/cachetlb.txt
===================================================================
RCS file: /inst/cvs/linux/Documentation/Attic/cachetlb.txt,v
retrieving revision 1.1.2.6
diff -u -r1.1.2.6 cachetlb.txt
--- Documentation/cachetlb.txt 2001/04/05 14:06:40 1.1.2.6
+++ Documentation/cachetlb.txt 2001/10/08 22:29:25
@@ -163,6 +163,11 @@
This is usually invoked when the kernel page tables are
changed, since such translations are "global" in nature.

+ NB. Some architecture maintainers have decided that their
+ architecture should treat this call as a NOP. Those
+ architectures may implement simon_says_flush_cache_all()
+ which actually does as it's told. YMMV.
+
2) void flush_cache_mm(struct mm_struct *mm)

This interface flushes an entire user address space from
Index: arch/i386/kernel/mtrr.c
===================================================================
RCS file: /inst/cvs/linux/arch/i386/kernel/mtrr.c,v
retrieving revision 1.2.2.30
diff -u -r1.2.2.30 mtrr.c
--- arch/i386/kernel/mtrr.c 2001/09/22 17:36:09 1.2.2.30
+++ arch/i386/kernel/mtrr.c 2001/10/08 22:29:25
@@ -394,13 +394,13 @@
write_cr4(ctxt->cr4val & (unsigned char) ~(1<<7));
}

- /* Disable and flush caches. Note that wbinvd flushes the TLBs as
- a side-effect */
+ /* Disable and flush caches. Note that simon_says_flush_cache_all
+ flushes the TLBs as a side-effect */
{
unsigned int cr0 = read_cr0() | 0x40000000;
- wbinvd();
+ simon_says_flush_cache_all();
write_cr0( cr0 );
- wbinvd();
+ simon_says_flush_cache_all();
}

if ( mtrr_if == MTRR_IF_INTEL ) {
@@ -424,7 +424,7 @@
}

/* Flush caches and TLBs */
- wbinvd();
+ simon_says_flush_cache_all();

/* Restore MTRRdefType */
if ( mtrr_if == MTRR_IF_INTEL ) {
@@ -784,7 +784,7 @@
* The writeback rule is quite specific. See the manual. Its
* disable local interrupts, write back the cache, set the mtrr
*/
- wbinvd();
+ simon_says_flush_cache_all();
wrmsr (MSR_K6_UWCCR, regs[0], regs[1]);
if (do_safe) set_mtrr_done (&ctxt);
} /* End Function amd_set_mtrr_up */
Index: arch/i386/kernel/setup.c
===================================================================
RCS file: /inst/cvs/linux/arch/i386/kernel/setup.c,v
retrieving revision 1.4.2.70
diff -u -r1.4.2.70 setup.c
--- arch/i386/kernel/setup.c 2001/10/08 16:25:04 1.4.2.70
+++ arch/i386/kernel/setup.c 2001/10/08 22:29:25
@@ -1213,7 +1213,7 @@
unsigned long flags;
l=(1<<0)|((mbytes/4)<<1);
local_irq_save(flags);
- wbinvd();
+ simon_says_flush_cache_all();
wrmsr(MSR_K6_WHCR, l, h);
local_irq_restore(flags);
printk(KERN_INFO "Enabling old style K6 write allocation for %d Mb\n",
@@ -1234,7 +1234,7 @@
unsigned long flags;
l=((mbytes>>2)<<22)|(1<<16);
local_irq_save(flags);
- wbinvd();
+ simon_says_flush_cache_all();
wrmsr(MSR_K6_WHCR, l, h);
local_irq_restore(flags);
printk(KERN_INFO "Enabling new style K6 write allocation for %d Mb\n",
Index: drivers/acpi/hardware/hwsleep.c
===================================================================
RCS file: /inst/cvs/linux/drivers/acpi/hardware/Attic/hwsleep.c,v
retrieving revision 1.1.2.4
diff -u -r1.1.2.4 hwsleep.c
--- drivers/acpi/hardware/hwsleep.c 2001/09/23 20:45:20 1.1.2.4
+++ drivers/acpi/hardware/hwsleep.c 2001/10/08 22:29:25
@@ -199,7 +199,7 @@

/* flush caches */

- wbinvd();
+ simon_says_flush_cache_all();

/* write #2: SLP_TYP + SLP_EN */

Index: drivers/acpi/include/platform/acgcc.h
===================================================================
RCS file: /inst/cvs/linux/drivers/acpi/include/platform/Attic/acgcc.h,v
retrieving revision 1.1.2.3
diff -u -r1.1.2.3 acgcc.h
--- drivers/acpi/include/platform/acgcc.h 2001/09/23 20:45:20 1.1.2.3
+++ drivers/acpi/include/platform/acgcc.h 2001/10/08 22:29:25
@@ -39,7 +39,6 @@
#define BREAKPOINT3
#define disable() __cli()
#define enable() __sti()
-#define wbinvd()

/*! [Begin] no source code translation */

@@ -101,7 +100,6 @@
#define disable() __cli()
#define enable() __sti()
#define halt() __asm__ __volatile__ ("sti; hlt":::"memory")
-#define wbinvd() __asm__ __volatile__ ("wbinvd":::"memory")

/*! [Begin] no source code translation
*
Index: include/asm-i386/system.h
===================================================================
RCS file: /inst/cvs/linux/include/asm-i386/system.h,v
retrieving revision 1.2.2.19
diff -u -r1.2.2.19 system.h
--- include/asm-i386/system.h 2001/09/07 11:08:42 1.2.2.19
+++ include/asm-i386/system.h 2001/10/08 22:29:26
@@ -124,7 +124,7 @@

#endif /* __KERNEL__ */

-#define wbinvd() \
+#define simon_says_flush_cache_all() \
__asm__ __volatile__ ("wbinvd": : :"memory");

static inline unsigned long get_limit(unsigned long segment)

--
dwmw2


2001-10-08 22:45:26

by Alan

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

> While we're on the subject of stupidly named routines and x86-isms, I'm
> having trouble reconciling this text in Documentation/cachetlb.txt:
>
> 1) void flush_cache_all(void)
>
> The most severe flush of all. After this interface runs,
> the entire cpu cache is flushed.

I suspect the real definition should be

"After this interface runs the device and other views of
memory includes all dirty cache lines on the processor. On
processors with cache coherent bus interfaces this may well
be a no operation

Footnote: AGP tends to have its own coherency and caching setup
so the AGP drivers use their own interfaces to avoid needless
overhead"

2001-10-08 22:49:05

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

From: David Woodhouse <[email protected]>
Date: Mon, 08 Oct 2001 23:36:26 +0100

While we're on the subject of stupidly named routines and x86-isms, I'm
having trouble reconciling this text in Documentation/cachetlb.txt:

1) void flush_cache_all(void)

The most severe flush of all. After this interface runs,
the entire cpu cache is flushed.

... with this implementation in include/asm-i386/pgtable.h:

#define flush_cache_all() do { } while (0)

That really doesn't seem to be doing what it says on the tin.

"for the purposes of having the processor maintain cache coherency
due to a kernel level TLB mapping change"

Yes, I know the text isn't there, but that is the implication.
Add the text, don't add a stupid "simon_says.." interface.

The mtrr stuff, if it really does need the flush, should probably
make it's own macro/inline with a huge comment about it explaining
why the flush is actually needed.

Some people have asserted, falsely, that it's never sane to want an i386 to
flush its cache.

"for the purposes of having the processor maintain cache coherency
due to a kernel level TLB mapping change"

All of these flush_foo interfaces are about cache flushes needed when
address space changes occur. They are not meant to be a way to deal
with all sorts of other cache details, for those we have the PCI DMA
interfaces, flush_dcache_page etc.

Even if that were true, it wouldn't really be an excuse for
the above discrepancy.

There is no discrepancy, only missing text in cachetlb.txt, please
add it, but I thought that the location of the routine made it obvious
what context it is meant to operate and be used.

Franks a lot,
David S. Miller
[email protected]

2001-10-08 23:06:37

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop


[email protected] said:
> Yes, I know the text isn't there, but that is the implication. Add
> the text, don't add a stupid "simon_says.." interface.

> The mtrr stuff, if it really does need the flush, should probably make
> it's own macro/inline with a huge comment about it explaining why the
> flush is actually needed.

It's not just mtrr stuff, and it's not just arch-specific code either. In
some cases, there is a need for a function which actually does flush the
cache.

Feel free to suggest an alternative name for it, as long as it's not
'wbinvd' - I would have picked flush_cache_all(), but that is already taken
by a function which generally doesn't actually flush the cache :)

--
dwmw2


2001-10-08 23:11:47

by Alan

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

> The mtrr stuff, if it really does need the flush, should probably
> make it's own macro/inline with a huge comment about it explaining
> why the flush is actually needed.

The wbinvd in the mtrr handling is basically cpu specific deep magic from
the chip documentation.

> with all sorts of other cache details, for those we have the PCI DMA
> interfaces, flush_dcache_page etc.

We need to work out how to fix the pci dma interfaces on the PC. The PPro
has an interesting errata where writes to combining memory can pass writes
to uncached memory. That means to fix it I have to lob in a locked store
or other workaround. That costs clocks - and isnt needed on pre ppro boxes,
ditto the spin_unlock using xchg fix isnt needed except on ppro.

That raises the question of whether x86 should seperate the "386" "486" ..
kernels by adding "Generic" for building a kernel that has all the work
arounds for everyones randomly buggy processors

2001-10-08 23:11:37

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

From: David Woodhouse <[email protected]>
Date: Tue, 09 Oct 2001 00:06:28 +0100

It's not just mtrr stuff, and it's not just arch-specific code either. In
some cases, there is a need for a function which actually does flush the
cache.

Example of this on ix86?

Regardless, the purpose of the cachetlb.txt interfaces is for the
generic VM subsystem of the kernel. Nothing more. If AGP, mtrr,
whatever weird device stuff needs this, it belongs in a different
area.

Franks a lot,
David S. Miller
[email protected]

2001-10-08 23:24:20

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

On Tue, 9 Oct 2001, Alan Cox wrote:

> That raises the question of whether x86 should seperate the "386" "486" ..
> kernels by adding "Generic" for building a kernel that has all the work
> arounds for everyones randomly buggy processors

How do you propose to do this without turning setup.c and friends
into a #ifdef nightmare ? setup_intel.c, setup_amd.c etc ??

Some of the bits I've planned for setup.c in 2.5 will make this
look not so bad. Its grown quite large, and is continuing to do so
as more and more vendors make more and more hardware bugs for us
to work around.

Splitting out things like the memory detection to a seperate
file should bring this back down to a more sensible size.

regards,

Dave.

--
| Dave Jones. http://www.suse.de/~davej
| SuSE Labs

2001-10-08 23:29:01

by Alan

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

> How do you propose to do this without turning setup.c and friends
> into a #ifdef nightmare ? setup_intel.c, setup_amd.c etc ??

By leaving well alone. That is all init and non runtime overhead. Things
like lock addl $0, (%esp) all through the code are not.

2001-10-08 23:33:12

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

From: Dave Jones <[email protected]>
Date: Tue, 9 Oct 2001 01:24:18 +0200 (CEST)

How do you propose to do this without turning setup.c and friends
into a #ifdef nightmare ? setup_intel.c, setup_amd.c etc ??

It is possible to contain the mess in header files. Other
architectures have done this...

You can either macro inline the cpu tests and/or use function
pointers. The munging in the header files nops it out into
whatever CONFIG_CPUTYPE was chosen if you didn't ask for the
generic build.

Franks a lot,
David S. Miller
[email protected]

2001-10-08 23:42:12

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop


[email protected] said:
> Example of this on ix86?

AGP and ACPI are the cases which already use it. Of course if it was
x86-only code it wouldn't matter if we kept on calling it wbinvd().

The case which originally drew my attention was something which hasn't been
implemented yet - flash drivers where you need a cached mapping in order to
do burst reads from the chip, but obviously you still need to be able to
flush the cache on demand.

In fact, this is something we probably won't do on x86 - only on other
architectures. You need both cached and uncached mappings of the chip to
make this work (although often you have physical aliases so even on x86 you
can map one address cached and another uncached), and also you don't have
fine-grained flushes of selected ranges on x86 so it's likely to have a very
noticeable effect on performance - so much so that it's probably worth just
doing it all uncached in the first place.

But x86 isn't particularly interesting - it'd be useful to have a
flush_dcache_range() which actually works across other architectures anyway.

> Regardless, the purpose of the cachetlb.txt interfaces is for the
> generic VM subsystem of the kernel. Nothing more.

So they should probably have less misleading names, perchance including the
letter 'v' and the letter 'm' somewhere? And they should _certainly_ have
less misleading documentation. :)

--
dwmw2


2001-10-08 23:47:02

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop


On Mon, 8 Oct 2001, David Woodhouse wrote:
>
> While we're on the subject of stupidly named routines and x86-isms, I'm
> having trouble reconciling this text in Documentation/cachetlb.txt:
>
> 1) void flush_cache_all(void)
>
> The most severe flush of all. After this interface runs,
> the entire cpu cache is flushed.
>
> ... with this implementation in include/asm-i386/pgtable.h:
>
> #define flush_cache_all() do { } while (0)
>
> That really doesn't seem to be doing what it says on the tin.

There's no way we should implement "simon_says".

There's a difference between stupiud hardware changing memory from under
us through mapping tricks, and cache coherency in general.

What you want is the "memory_went_away_from_us()" kind of cache flush,
which has nothing at all to do with cache coherency. And it should be
explicitly and clearly named THAT, and you should not blame the fact that
x86 is always 100% cache coherent and that the normal cache coherency
routines should _never_ be anything but a nop.

Also note that wbinvd is known to corrupted the caches and lead to lockups
on certain x86 cores. You need to be a _lot_ more careful than just doing
it indiscriminately from a driver.

Linus

2001-10-08 23:48:52

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

From: David Woodhouse <[email protected]>
Date: Tue, 09 Oct 2001 00:42:04 +0100

But x86 isn't particularly interesting - it'd be useful to have a
flush_dcache_range() which actually works across other architectures anyway.

The memory technology device case is weird, give it a solution
such as "asm/memdev.h".

> Regardless, the purpose of the cachetlb.txt interfaces is for the
> generic VM subsystem of the kernel. Nothing more.

So they should probably have less misleading names, perchance including the
letter 'v' and the letter 'm' somewhere? And they should _certainly_ have
less misleading documentation. :)

Why? find_get_page says nothing about "page cache", but people
understand that is what it is used for.

The documention should be more specific, thats all.

Franks a lot,
David S. Miller
[email protected]

2001-10-09 00:03:23

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop


[email protected] said:
> There's no way we should implement "simon_says".
> There's a difference between stupiud hardware changing memory from
> under us through mapping tricks, and cache coherency in general.

True. Although for simplicity's sake I wasn't talking about the mapping
tricks, this was just for writing/erasing flash chips without doing any
paging - it you're mapping different chips into the same hole you need the
same cache-flush tricks even for read-only chips.

> What you want is the "memory_went_away_from_us()" kind of cache flush,
> which has nothing at all to do with cache coherency. And it should be
> explicitly and clearly named THAT

As you wish. How about:

void memory_went_away_from_us(void);
and
void memory_range_went_away_from_us(unsigned long start, unsigned long len);

Where 'start' is an ioremap cookie.

> and you should not blame the fact that x86 is always 100% cache coherent
> and that the normal cache coherency routines should _never_ be anything
> but a nop.

Indeed. That's eminently sane - it was just the nomenclature and the
documentation which was less so.

> Also note that wbinvd is known to corrupted the caches and lead to
> lockups on certain x86 cores. You need to be a _lot_ more careful than
> just doing it indiscriminately from a driver.

Yeah - but x86 isn't a particularly interesting architecture in this
context. If you can't selectively flush a range, you'll probably find that
you haven't gained enough from being able to do burst reads to offset the
cost of the complete cache flushes.

--
dwmw2


2001-10-09 05:01:38

by George Greer

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

On Tue, 9 Oct 2001, Dave Jones wrote:

>On Tue, 9 Oct 2001, Alan Cox wrote:
>
>> That raises the question of whether x86 should seperate the "386" "486" ..
>> kernels by adding "Generic" for building a kernel that has all the work
>> arounds for everyones randomly buggy processors
>
>How do you propose to do this without turning setup.c and friends
>into a #ifdef nightmare ? setup_intel.c, setup_amd.c etc ??

I did a patch for that in the 2.2.x days that simply modified the existing
#ifdef's to be more specific. The Configure file then set define_bool
options correctly for whatever option you chose. It was a very simple
strategy, but not a completely comprehensive patch.

--
George Greer, [email protected]
http://www.m-l.org/~greerga/

2001-10-09 08:51:36

by Etienne Lorrain

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

Hello,
> __asm__ __volatile__("rep;nop");

"The behavior of the REP prefix is undefined when used with non-string
instructions." page 3-404 of Intel documentation, in "CHAPTER 3,
INSTRUCTION SET REFERENCE"...

How about: __asm__ __volatile__("loop ." : "+c" (nbloop)); ?

Etienne.

___________________________________________________________
Un nouveau Nokia Game commence.
Allez sur http://fr.yahoo.com/nokiagame avant le 3 novembre
pour participer ? cette aventure tous m?dias.

2001-10-09 10:33:51

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

>That raises the question of whether x86 should seperate the "386" "486" ..
>kernels by adding "Generic" for building a kernel that has all the work
>arounds for everyones randomly buggy processors

One approach we take on PPC that you may or may not like for this is
dynamic patching.

Basically, we have some very early init code that probes the CPU type,
and extract from a table a bit mask of "CPU features". Those can be
real features, like has an FPU or an Altivec, but can also be known
erratas.

Then, we have a a couple of macros that look like this : (to be used
in .S files or in inline assembly)

some asm code...
BEGIN_FEATURE_SECTION()
some asm code specific to the presence
or absence of a CPU feature bit
END_FEATURE_SECTION(mask, value)

basically, what those macros do is to add references to the enclosed
bit of code to a separate ELF section, along with the mask & values
32 bit values.

The early CPU probe code, after having determined the feature bits mask
of the CPU will then walk that additional ELF section, and for each
entry in it (an entry is a start address, an end address, a mask and
a value), will test if (cpu_feature & mask) == value. If the result is
false, then the entire section of code referenced is replaced with nop's.

This works well for small bits of code (we use it for commenting out
some altivec code in the context switch path on non-altivec machines,
and for nop'ing out some "sync" intructions that are only necessary
on some older buggy CPUs). You still have the cost of executing a NOP
(which is pretty minimal on PPC, but other archs may want a more suitable
instruction as NOP can be context synchronising on some CPUs).

If you happen to have large code sections covered by this mecanism,
the patching code can be improved to insert a branch to the end of
the nop'ed out section on it's first instruction.

Regards,
Ben.


2001-10-09 11:25:05

by Alan

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

> Hello,
> > __asm__ __volatile__("rep;nop");
>
> "The behavior of the REP prefix is undefined when used with non-string
> instructions." page 3-404 of Intel documentation, in "CHAPTER 3,
> INSTRUCTION SET REFERENCE"...
>
> How about: __asm__ __volatile__("loop ." : "+c" (nbloop)); ?

rep nop is a magic instruction to powersave momentarily on the Pentium 4.
It happens to be a true nop on the older processors

2001-10-09 11:30:06

by Keith Owens

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

On Tue, 9 Oct 2001 12:33:56 +0200,
Benjamin Herrenschmidt <[email protected]> wrote:
>One approach we take on PPC that you may or may not like for this is
>dynamic patching.

BIG RED WARNING: Anybody thinking of using dynamic patching must
consider modules as well as the kernel. modutils 2.4.8 added support
for ppc archdata to allow dynamic patching of modules using the ftr
data. There also has to be code in kernel/module.c::module_arch_init()
to take the archdata and do whatever is required.

If anybody starts doing dynamic patching, please let me know so I can
handle modutils and module_arch_init().

2001-10-09 12:11:12

by Alan

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

> consider modules as well as the kernel. modutils 2.4.8 added support
> for ppc archdata to allow dynamic patching of modules using the ftr
> data. There also has to be code in kernel/module.c::module_arch_init()
> to take the archdata and do whatever is required.

I am considering the possibility for the x86 ppro fix but firstly I want
to be sure the spin_unlock change is necessary. The PCI one is much less
of an impact since we don't go around doing it all the time.

> If anybody starts doing dynamic patching, please let me know so I can
> handle modutils and module_arch_init().

For testing purposes it won't be needed thankfully - a non nop patched
module might just be a tiny bit slower.

Anyway firstly I have to reasonably show PPro errata 66/92 are what is
causing the odd weird ppro crash (the evidence so far is the xchg
based unlock helps). The WC v UC misordered store one is a proven case,
as is the 0x70000000->0x7003FFFF wbinvd hang which turns up on some ppro
boxes that never got the bios E820 fixup

Alan

2001-10-09 12:13:03

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] change name of rep_nop

>>One approach we take on PPC that you may or may not like for this is
>>dynamic patching.
>
>BIG RED WARNING: Anybody thinking of using dynamic patching must
>consider modules as well as the kernel. modutils 2.4.8 added support
>for ppc archdata to allow dynamic patching of modules using the ftr
>data. There also has to be code in kernel/module.c::module_arch_init()
>to take the archdata and do whatever is required.
>
>If anybody starts doing dynamic patching, please let me know so I can
>handle modutils and module_arch_init().

Yes, definitely an issue. The mecanism we used on PPC was so specific
to some low-level arch asm code that modules were a non-issue... until
some of that critical code ended up beeing inlined.

Ben.