The following series implements lightweight SYNC memory barriers for SMP Linux
and correct use of SYNCs around the LL-SC loops in atomics, futexes, spinlocks
etc. - the basic building blocks of all atomics on MIPS.
Historically, generic MIPS does not use memory barriers around the LL-SC loops
in atomics, spinlocks etc. However, the Architecture documents never state that
an LL-SC loop creates a memory barrier. Some non-generic MIPS vendors have
already felt the pain and enforce barriers there. With the introduction of
aggressive speculative memory reads in recent out-of-order superscalar MIPS
processors, it is a real problem now.
The generic MIPS memory barrier instruction SYNC (aka SYNC 0) is rather
heavyweight because it was designed to propagate the barrier down to memory.
MIPS R2 introduced lightweight SYNC instructions which correspond to the
smp_*() set of SMP barriers. Their description is very HW-specific and they
were never used; however, they are much less trouble for processor pipelines
and can be used in smp_mb()/smp_rmb()/smp_wmb() as well as for acquire/release
barrier semantics. After prolonged discussions with the HW team it became clear
that the lightweight SYNCs were designed specifically with smp_*() in mind, but
their description is written in terms of timeline ordering.
The problem was spotted recently in engineering tests, and it was confirmed
with tests that without a memory barrier loads and stores may pass LL/SC
instructions in both directions, even on old MIPS R2 processors.
Aggressive speculation in MIPS R6 and the MIPS P5600 processor adds more fuel
to this issue.
Three patches introduce a configurable control for lightweight SYNCs around
LL/SC loops. For MIPS32 R2 it is left configurable whether to enforce the SYNCs
or not (keep as is), because some old MIPS32 R2 cores may be happy without
them. For MIPS R6 I chose to make the SYNCs around LL/SC mandatory, because all
of those processors have aggressive speculation and delayed write buffers. For
that processor series it is still possible to configure the use of SYNC 0
instead of the lightweight SYNCs - just in case of some trouble in the
implementation of a specific CPU. However, it is considered safe not to
implement some or all of the lightweight SYNCs in a specific core, because the
Architecture requires HW to map unimplemented SYNC stypes to SYNC 0.
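For illustration only (not part of the series), this is roughly what a fully
ordered LL/SC sequence ends up looking like once both the lightweight SYNCs and
the LL/SC barriers are enabled; hypothetical_atomic_add() and the exact asm
constraints are mine, the barrier macros and stype values are the ones used in
the patches:

	static inline void hypothetical_atomic_add(int i, atomic_t *v)
	{
		int temp;

		smp_mb__before_llsc();	/* release barrier: "sync 0x12" */
		__asm__ __volatile__(
		"1:	ll	%0, %1		# load-linked old value	\n"
		"	addu	%0, %2		# add the increment	\n"
		"	sc	%0, %1		# store-conditional	\n"
		"	beqz	%0, 1b		# retry if SC failed	\n"
		: "=&r" (temp), "+m" (v->counter)
		: "Ir" (i));
		smp_llsc_mb();		/* acquire barrier: "sync 0x11" */
	}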
---
Leonid Yegoshin (3):
MIPS: R6: Use lightweight SYNC instruction in smp_* memory barriers
MIPS: enforce LL-SC loop enclosing with SYNC (ACQUIRE and RELEASE)
MIPS: bugfix - replace smp_mb with release barrier function in unlocks
arch/mips/Kconfig | 47 ++++++++++++++++++++++++++++++++++++++
arch/mips/include/asm/barrier.h | 32 +++++++++++++++++++++++---
arch/mips/include/asm/bitops.h | 2 +-
arch/mips/include/asm/spinlock.h | 2 +-
4 files changed, 77 insertions(+), 6 deletions(-)
--
Signature
These instructions were specifically designed to work for the smp_*() sort of
memory barriers in MIPS R2/R3/R5 and R6.
Unfortunately, their description is very cryptic and written in a HW
engineering style, which has prevented their use by SW. These instructions are
not mandatory, but there is a mandatory requirement - if they are not
implemented, then a regular MIPS SYNC 0 must be executed instead.
The reason for this change is that SYNC 0 is heavyweight on many CPUs; it may
disrupt other cores' pipelines etc.
Due to concerns about verification of old MIPS R2 compatible cores from other
vendors it is enforced only for MIPS R6, and other MIPS32/64 R2/R5 processors
have it configurable.
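As a reference (my reading of the SYNC stype assignments, not text from this
patch), the stype values used by this series correspond to the following named
barriers:

	/* Assumed architectural stype encodings (names are illustrative): */
	#define STYPE_SYNC_WMB		0x04	/* smp_wmb()       */
	#define STYPE_SYNC_MB		0x10	/* smp_mb()        */
	#define STYPE_SYNC_ACQUIRE	0x11	/* acquire barrier */
	#define STYPE_SYNC_RELEASE	0x12	/* release barrier */
	#define STYPE_SYNC_RMB		0x13	/* smp_rmb()       */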
Signed-off-by: Leonid Yegoshin <[email protected]>
---
arch/mips/Kconfig | 22 ++++++++++++++++++++++
arch/mips/include/asm/barrier.h | 6 ++++++
2 files changed, 28 insertions(+)
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index be384d6a58bb..c7d0cacece3d 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1347,6 +1347,7 @@ config CPU_MIPS32_R2
select CPU_SUPPORTS_32BIT_KERNEL
select CPU_SUPPORTS_HIGHMEM
select CPU_SUPPORTS_MSA
+ select CPU_SUPPORTS_MIPS_LIGHTWEIGHT_SYNC
select HAVE_KVM
help
Choose this option to build a kernel for release 2 or later of the
@@ -1365,6 +1366,8 @@ config CPU_MIPS32_R6
select GENERIC_CSUM
select HAVE_KVM
select MIPS_O32_FP64_SUPPORT
+ select CPU_SUPPORTS_MIPS_LIGHTWEIGHT_SYNC
+ select WEAK_REORDERING_BEYOND_LLSC
help
Choose this option to build a kernel for release 6 or later of the
MIPS32 architecture. New MIPS processors, starting with the Warrior
@@ -1399,6 +1402,7 @@ config CPU_MIPS64_R2
select CPU_SUPPORTS_HIGHMEM
select CPU_SUPPORTS_HUGEPAGES
select CPU_SUPPORTS_MSA
+ select CPU_SUPPORTS_MIPS_LIGHTWEIGHT_SYNC
help
Choose this option to build a kernel for release 2 or later of the
MIPS64 architecture. Many modern embedded systems with a 64-bit
@@ -1415,6 +1419,8 @@ config CPU_MIPS64_R6
select CPU_SUPPORTS_HIGHMEM
select CPU_SUPPORTS_MSA
select GENERIC_CSUM
+ select CPU_SUPPORTS_MIPS_LIGHTWEIGHT_SYNC
+ select WEAK_REORDERING_BEYOND_LLSC
help
Choose this option to build a kernel for release 6 or later of the
MIPS64 architecture. New MIPS processors, starting with the Warrior
@@ -1876,6 +1882,20 @@ config WEAK_ORDERING
#
config WEAK_REORDERING_BEYOND_LLSC
bool
+
+config MIPS_LIGHTWEIGHT_SYNC
+ bool "CPU lightweight SYNC instruction for weak reordering"
+ depends on CPU_SUPPORTS_MIPS_LIGHTWEIGHT_SYNC && WEAK_ORDERING
+ default y if CPU_MIPSR6
+ help
+ This option enforces a use of "lightweight sync" instructions
+ for SMP (multi-CPU) memory barriers. This instructions are much
+ more faster than a traditional "SYNC 0".
+
+ If that instructions are not implemented in processor then it is
+ converted to generic "SYNC 0".
+
+ If you unsure, say N here, it may slightly decrease your performance
endmenu
#
@@ -1928,6 +1948,8 @@ config CPU_SUPPORTS_HUGEPAGES
bool
config CPU_SUPPORTS_UNCACHED_ACCELERATED
bool
+config CPU_SUPPORTS_MIPS_LIGHTWEIGHT_SYNC
+ bool
config MIPS_PGD_C0_CONTEXT
bool
default y if 64BIT && CPU_MIPSR2 && !CPU_XLP
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 2b8bbbcb9be0..d2a63abfc7c6 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -96,9 +96,15 @@
# define smp_rmb() barrier()
# define smp_wmb() __syncw()
# else
+# ifdef CONFIG_MIPS_LIGHTWEIGHT_SYNC
+# define smp_mb() __asm__ __volatile__("sync 0x10" : : :"memory")
+# define smp_rmb() __asm__ __volatile__("sync 0x13" : : :"memory")
+# define smp_wmb() __asm__ __volatile__("sync 0x4" : : :"memory")
+# else
# define smp_mb() __asm__ __volatile__("sync" : : :"memory")
# define smp_rmb() __asm__ __volatile__("sync" : : :"memory")
# define smp_wmb() __asm__ __volatile__("sync" : : :"memory")
+# endif
# endif
#else
#define smp_mb() barrier()
Many MIPS32 R2 and all MIPS R6 CPUs execute out of order, so they need memory
barriers in an SMP environment. However, past cores may have a pipeline short
enough to get away with ignoring that requirement, so the problem may never
have shown up until recently.
This patch gives an option to enclose LL-SC loops with SYNC barriers in
spinlocks, atomics, futexes, cmpxchg and bitops.
The option is user-selectable for MIPS32 R2 only because, according to the HW
team, those recent CPUs may occasionally hit this problem. Most MIPS64 R2
vendor processors already have some kind of memory barrier, and the only
generic one, the 5KE, has a pretty short pipeline.
Using memory barriers on MIPS R6 is mandatory: all of those processors do
speculative memory reads, which can cause trouble without correct use of
barriers around LL-SC loops. The same is true for the MIPS32 R5 P5600
processor.
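For illustration only, a minimal message-passing sketch of the acquire/release
helpers this patch weakens; producer()/consumer() and the data/flag variables
are hypothetical, smp_store_release()/smp_load_acquire() are the generic kernel
helpers touched below:

	static int data;
	static int flag;

	static void producer(void)
	{
		data = 42;
		/* with lightweight SYNC: "sync 0x12", then the store */
		smp_store_release(&flag, 1);
	}

	static int consumer(void)
	{
		/* the load, then "sync 0x11" */
		while (!smp_load_acquire(&flag))
			cpu_relax();
		return data;		/* guaranteed to observe 42 */
	}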
Signed-off-by: Leonid Yegoshin <[email protected]>
---
arch/mips/Kconfig | 25 +++++++++++++++++++++++++
arch/mips/include/asm/barrier.h | 26 ++++++++++++++++++++++----
2 files changed, 47 insertions(+), 4 deletions(-)
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index c7d0cacece3d..676eb64f5545 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1896,6 +1896,30 @@ config MIPS_LIGHTWEIGHT_SYNC
converted to generic "SYNC 0".
If you unsure, say N here, it may slightly decrease your performance
+
+config MIPS_ENFORCE_WEAK_REORDERING_BEYOND_LLSC
+ bool "Enforce memory barriers at LLSC loops - atomics, spinlocks etc"
+ depends on CPU_MIPS32_R2
+ default y if CPU_MIPSR6
+ select WEAK_REORDERING_BEYOND_LLSC
+ help
+ Many MIPS32 R2 and all MIPS R6 CPUs are out of order execution, so it
+ needs memory barriers in SMP environment. However, past cores may have
+ a pipeline short enough to ignore that requirements and problem may
+ never occurs until recently.
+
+ So, this option is defined for MIPS32 R2 only, because that recent
+ CPUs may occasionally have problems in accordance with HW team.
+ And MIPS64 R2 vendor processors already have some kind of memory
+ barrier and the only one generic 5KEs has a pretty short pipeline.
+
+ Using memory barriers in MIPS R6 is mandatory, all that
+ processors have a speculative memory read which can inflict a trouble
+ without a correct use of barriers in LL-SC loop cycles.
+ The same is actually for MIPS32 R5 I5600 processor.
+
+ If you unsure, say Y here, it may slightly decrease your performance
+ but increase a reliability.
endmenu
#
@@ -1924,6 +1948,7 @@ config CPU_MIPSR2
config CPU_MIPSR6
bool
default y if CPU_MIPS32_R6 || CPU_MIPS64_R6
+ select MIPS_ENFORCE_WEAK_REORDERING_BEYOND_LLSC
select MIPS_SPRAM
config EVA
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index d2a63abfc7c6..f3cc7a91ac0d 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -95,33 +95,51 @@
# define smp_mb() __sync()
# define smp_rmb() barrier()
# define smp_wmb() __syncw()
+# define smp_acquire() __sync()
+# define smp_release() __sync()
# else
# ifdef CONFIG_MIPS_LIGHTWEIGHT_SYNC
# define smp_mb() __asm__ __volatile__("sync 0x10" : : :"memory")
# define smp_rmb() __asm__ __volatile__("sync 0x13" : : :"memory")
# define smp_wmb() __asm__ __volatile__("sync 0x4" : : :"memory")
+# define smp_acquire() __asm__ __volatile__("sync 0x11" : : :"memory")
+# define smp_release() __asm__ __volatile__("sync 0x12" : : :"memory")
# else
# define smp_mb() __asm__ __volatile__("sync" : : :"memory")
# define smp_rmb() __asm__ __volatile__("sync" : : :"memory")
# define smp_wmb() __asm__ __volatile__("sync" : : :"memory")
+# define smp_acquire() __asm__ __volatile__("sync" : : :"memory")
+# define smp_release() __asm__ __volatile__("sync" : : :"memory")
# endif
# endif
#else
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+#define smp_acquire() barrier()
+#define smp_release() barrier()
#endif
#if defined(CONFIG_WEAK_REORDERING_BEYOND_LLSC) && defined(CONFIG_SMP)
+#ifdef CONFIG_MIPS_LIGHTWEIGHT_SYNC
+#define __WEAK_LLSC_MB " sync 0x10 \n"
+#define __WEAK_ACQUIRE " sync 0x11 \n"
+#define __WEAK_RELEASE " sync 0x12 \n"
+#else
#define __WEAK_LLSC_MB " sync \n"
+#define __WEAK_ACQUIRE __WEAK_LLSC_MB
+#define __WEAK_RELEASE __WEAK_LLSC_MB
+#endif
#else
#define __WEAK_LLSC_MB " \n"
+#define __WEAK_ACQUIRE __WEAK_LLSC_MB
+#define __WEAK_RELEASE __WEAK_LLSC_MB
#endif
#define set_mb(var, value) \
do { var = value; smp_mb(); } while (0)
-#define smp_llsc_mb() __asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
+#define smp_llsc_mb() __asm__ __volatile__(__WEAK_ACQUIRE : : :"memory")
#ifdef CONFIG_CPU_CAVIUM_OCTEON
#define smp_mb__before_llsc() smp_wmb()
@@ -131,14 +149,14 @@
"syncw\n\t" \
".set pop" : : : "memory")
#else
-#define smp_mb__before_llsc() smp_llsc_mb()
+#define smp_mb__before_llsc() __asm__ __volatile__(__WEAK_RELEASE : : :"memory")
#define nudge_writes() mb()
#endif
#define smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
- smp_mb(); \
+ smp_release(); \
ACCESS_ONCE(*p) = (v); \
} while (0)
@@ -146,7 +164,7 @@ do { \
({ \
typeof(*p) ___p1 = ACCESS_ONCE(*p); \
compiletime_assert_atomic_type(*p); \
- smp_mb(); \
+ smp_acquire(); \
___p1; \
})
Replace the smp_mb() in arch_write_unlock() and __clear_bit_unlock() with an
smp_mb__before_llsc() call, which provides "release" barrier functionality.
It seems this was missed in commit f252ffd50c97dae87b45f1dbad24f71358ccfbd6
during the introduction of "acquire" and "release" semantics.
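For illustration (my wording, not from the patch), the ordering an unlock path
actually needs is only "release":

	/*
	 *	<critical section loads/stores>
	 *	sync	0x12		# release (smp_mb__before_llsc() here)
	 *	sw	$0, lock	# the store that releases the lock
	 *
	 * Prior accesses must be visible before the releasing store, but
	 * nothing has to order that store against later accesses, so a
	 * full "sync" (smp_mb()) is stronger than required.
	 */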
Signed-off-by: Leonid Yegoshin <[email protected]>
---
arch/mips/include/asm/bitops.h | 2 +-
arch/mips/include/asm/spinlock.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 0cf29bd5dc5c..ce9666cf1499 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -469,7 +469,7 @@ static inline int test_and_change_bit(unsigned long nr,
*/
static inline void __clear_bit_unlock(unsigned long nr, volatile unsigned long *addr)
{
- smp_mb();
+ smp_mb__before_llsc();
__clear_bit(nr, addr);
}
diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
index 1fca2e0793dc..7c7f3b2bd3de 100644
--- a/arch/mips/include/asm/spinlock.h
+++ b/arch/mips/include/asm/spinlock.h
@@ -317,7 +317,7 @@ static inline void arch_write_lock(arch_rwlock_t *rw)
static inline void arch_write_unlock(arch_rwlock_t *rw)
{
- smp_mb();
+ smp_mb__before_llsc();
__asm__ __volatile__(
" # arch_write_unlock \n"
On 06/01/2015 20:09, Leonid Yegoshin wrote:
> The following series implements lightweight SYNC memory barriers for SMP Linux
> and a correct use of SYNCs around atomics, futexes, spinlocks etc LL-SC loops -
> the basic building blocks of any atomics in MIPS.
>
> Historically, a generic MIPS doesn't use memory barriers around LL-SC loops in
> atomics, spinlocks etc. However, Architecture documents never specify that LL-SC
> loop creates a memory barrier. Some non-generic MIPS vendors already feel
> the pain and enforces it. With introduction in a recent out-of-order superscalar
> MIPS processors an aggressive speculative memory read it is a problem now.
>
> The generic MIPS memory barrier instruction SYNC (aka SYNC 0) is something
> very heavvy because it was designed for propogating barrier down to memory.
> MIPS R2 introduced lightweight SYNC instructions which correspond to smp_*()
> set of SMP barriers. The description was very HW-specific and it was never
> used, however, it is much less trouble for processor pipelines and can be used
> in smp_mb()/smp_rmb()/smp_wmb() as is as in acquire/release barrier semantics.
> After prolonged discussions with HW team it became clear that lightweight SYNCs
> were designed specifically with smp_*() in mind but description is in timeline
> ordering space.
>
> So, the problem was spotted recently in engineering tests and it was confirmed
> with tests that without memory barrier load and store may pass LL/SC
> instructions in both directions, even in old MIPS R2 processors.
> Aggressive speculation in MIPS R6 and MIPS I5600 processors adds more fire to
> this issue.
>
> 3 patches introduces a configurable control for lightweight SYNCs around LL/SC
> loops and for MIPS32 R2 it was allowed to choose an enforcing SYNCs or not
> (keep as is) because some old MIPS32 R2 may be happy without that SYNCs.
> In MIPS R6 I chose to have SYNC around LL/SC mandatory because all of that
> processors have an agressive speculation and delayed write buffers. In that
> processors series it is still possible the use of SYNC 0 instead of
> lightweight SYNCs in configuration - just in case of some trouble in
> implementation in specific CPU. However, it is considered safe do not implement
> some or any lightweight SYNC in specific core because Architecture requires
> HW map of unimplemented SYNCs to SYNC 0.
How useful might this be for older hardware, such as the R10k CPUs? Just
fallbacks to the old sync insn?
--J
On Tue, Jun 02, 2015 at 04:41:21AM -0400, Joshua Kinard wrote:
> On 06/01/2015 20:09, Leonid Yegoshin wrote:
> > The following series implements lightweight SYNC memory barriers for SMP Linux
> > and a correct use of SYNCs around atomics, futexes, spinlocks etc LL-SC loops -
> > the basic building blocks of any atomics in MIPS.
> >
> > Historically, a generic MIPS doesn't use memory barriers around LL-SC loops in
> > atomics, spinlocks etc. However, Architecture documents never specify that LL-SC
> > loop creates a memory barrier. Some non-generic MIPS vendors already feel
> > the pain and enforces it. With introduction in a recent out-of-order superscalar
> > MIPS processors an aggressive speculative memory read it is a problem now.
> >
> > The generic MIPS memory barrier instruction SYNC (aka SYNC 0) is something
> > very heavvy because it was designed for propogating barrier down to memory.
> > MIPS R2 introduced lightweight SYNC instructions which correspond to smp_*()
> > set of SMP barriers. The description was very HW-specific and it was never
> > used, however, it is much less trouble for processor pipelines and can be used
> > in smp_mb()/smp_rmb()/smp_wmb() as is as in acquire/release barrier semantics.
> > After prolonged discussions with HW team it became clear that lightweight SYNCs
> > were designed specifically with smp_*() in mind but description is in timeline
> > ordering space.
> >
> > So, the problem was spotted recently in engineering tests and it was confirmed
> > with tests that without memory barrier load and store may pass LL/SC
> > instructions in both directions, even in old MIPS R2 processors.
> > Aggressive speculation in MIPS R6 and MIPS I5600 processors adds more fire to
> > this issue.
> >
> > 3 patches introduces a configurable control for lightweight SYNCs around LL/SC
> > loops and for MIPS32 R2 it was allowed to choose an enforcing SYNCs or not
> > (keep as is) because some old MIPS32 R2 may be happy without that SYNCs.
> > In MIPS R6 I chose to have SYNC around LL/SC mandatory because all of that
> > processors have an agressive speculation and delayed write buffers. In that
> > processors series it is still possible the use of SYNC 0 instead of
> > lightweight SYNCs in configuration - just in case of some trouble in
> > implementation in specific CPU. However, it is considered safe do not implement
> > some or any lightweight SYNC in specific core because Architecture requires
> > HW map of unimplemented SYNCs to SYNC 0.
>
> How useful might this be for older hardware, such as the R10k CPUs? Just
> fallbacks to the old sync insn?
The R10000 family is strongly ordered so there is no SYNC instruction
required in the entire kernel even though some Origin hardware documentation
incorrectly claims otherwise.
Ralf
Hi Leonid,
On Mon, Jun 01, 2015 at 05:09:34PM -0700, Leonid Yegoshin wrote:
> This instructions were specifically designed to work for smp_*() sort of
> memory barriers in MIPS R2/R3/R5 and R6.
>
> Unfortunately, it's description is very cryptic and is done in HW engineering
> style which prevents use of it by SW.
FYI this reads to me like "I couldn't figure it out from the manuals"
which you might want to leave out.
> This instructions are not mandatory but
> there is a mandatory requirement - if not implemented, then a regular MIPS
> SYNC 0 must be used instead.
>
> The reason for this change is - SYNC 0 is a heavvy-weighted in many CPUs, it may
> disrupt other cores pipelines etc.
>
> Due to concern about verification of old MIPS R2 compatible cores of other
> vendors it is enforced only for MIPS R6 and other MIPS32/64 R2/R5 processors
> have it configurable.
>
> Signed-off-by: Leonid Yegoshin <[email protected]>
> ---
> arch/mips/Kconfig | 22 ++++++++++++++++++++++
> arch/mips/include/asm/barrier.h | 6 ++++++
> 2 files changed, 28 insertions(+)
>
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index be384d6a58bb..c7d0cacece3d 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -1347,6 +1347,7 @@ config CPU_MIPS32_R2
> select CPU_SUPPORTS_32BIT_KERNEL
> select CPU_SUPPORTS_HIGHMEM
> select CPU_SUPPORTS_MSA
> + select CPU_SUPPORTS_MIPS_LIGHTWEIGHT_SYNC
> select HAVE_KVM
> help
> Choose this option to build a kernel for release 2 or later of the
> @@ -1365,6 +1366,8 @@ config CPU_MIPS32_R6
> select GENERIC_CSUM
> select HAVE_KVM
> select MIPS_O32_FP64_SUPPORT
> + select CPU_SUPPORTS_MIPS_LIGHTWEIGHT_SYNC
> + select WEAK_REORDERING_BEYOND_LLSC
This WEAK_REORDERING_BEYOND_LLSC change should probably be split out
into a separate patch, since it has nothing to do with the smp_*
barriers and is just left as a sync 0 (__WEAK_LLSC_MB) as of this patch.
> help
> Choose this option to build a kernel for release 6 or later of the
> MIPS32 architecture. New MIPS processors, starting with the Warrior
> @@ -1399,6 +1402,7 @@ config CPU_MIPS64_R2
> select CPU_SUPPORTS_HIGHMEM
> select CPU_SUPPORTS_HUGEPAGES
> select CPU_SUPPORTS_MSA
> + select CPU_SUPPORTS_MIPS_LIGHTWEIGHT_SYNC
> help
> Choose this option to build a kernel for release 2 or later of the
> MIPS64 architecture. Many modern embedded systems with a 64-bit
> @@ -1415,6 +1419,8 @@ config CPU_MIPS64_R6
> select CPU_SUPPORTS_HIGHMEM
> select CPU_SUPPORTS_MSA
> select GENERIC_CSUM
> + select CPU_SUPPORTS_MIPS_LIGHTWEIGHT_SYNC
> + select WEAK_REORDERING_BEYOND_LLSC
Ditto.
> help
> Choose this option to build a kernel for release 6 or later of the
> MIPS64 architecture. New MIPS processors, starting with the Warrior
> @@ -1876,6 +1882,20 @@ config WEAK_ORDERING
> #
> config WEAK_REORDERING_BEYOND_LLSC
> bool
> +
> +config MIPS_LIGHTWEIGHT_SYNC
> + bool "CPU lightweight SYNC instruction for weak reordering"
> + depends on CPU_SUPPORTS_MIPS_LIGHTWEIGHT_SYNC && WEAK_ORDERING
> + default y if CPU_MIPSR6
> + help
> + This option enforces a use of "lightweight sync" instructions
> + for SMP (multi-CPU) memory barriers. This instructions are much
> + more faster than a traditional "SYNC 0".
"enforces a use" would probably read better as "makes use", and "much
more faster" should just be "much faster". Personally I'd not make the
SYNC so shouty - sync is perfectly valid & generally the way the
instruction is named even in the kernel.
> +
> + If that instructions are not implemented in processor then it is
> + converted to generic "SYNC 0".
I think this would read better as something like:
If a processor does not implement the lightweight sync operations then
the architecture requires that they interpret the corresponding sync
instructions as the typical heavyweight "sync 0". Therefore this
should be safe to enable on all CPUs implementing release 2 or
later of the MIPS architecture.
> +
> + If you unsure, say N here, it may slightly decrease your performance
> endmenu
>
> #
> @@ -1928,6 +1948,8 @@ config CPU_SUPPORTS_HUGEPAGES
> bool
> config CPU_SUPPORTS_UNCACHED_ACCELERATED
> bool
> +config CPU_SUPPORTS_MIPS_LIGHTWEIGHT_SYNC
> + bool
I'm not sure putting MIPS_ in the name of this makes sense either BTW.
Other things around it don't bother (eg. UNCACHED_ACCELERATED right
above) and it's already under arch/mips/Kconfig.
> config MIPS_PGD_C0_CONTEXT
> bool
> default y if 64BIT && CPU_MIPSR2 && !CPU_XLP
> diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
> index 2b8bbbcb9be0..d2a63abfc7c6 100644
> --- a/arch/mips/include/asm/barrier.h
> +++ b/arch/mips/include/asm/barrier.h
> @@ -96,9 +96,15 @@
> # define smp_rmb() barrier()
> # define smp_wmb() __syncw()
> # else
> +# ifdef CONFIG_MIPS_LIGHTWEIGHT_SYNC
> +# define smp_mb() __asm__ __volatile__("sync 0x10" : : :"memory")
> +# define smp_rmb() __asm__ __volatile__("sync 0x13" : : :"memory")
> +# define smp_wmb() __asm__ __volatile__("sync 0x4" : : :"memory")
Tabs please.
Thanks,
Paul
> +# else
> # define smp_mb() __asm__ __volatile__("sync" : : :"memory")
> # define smp_rmb() __asm__ __volatile__("sync" : : :"memory")
> # define smp_wmb() __asm__ __volatile__("sync" : : :"memory")
> +# endif
> # endif
> #else
> #define smp_mb() barrier()
>
>
Hi Leonid,
On 02/06/15 01:09, Leonid Yegoshin wrote:
> This instructions were specifically designed to work for smp_*() sort of
> memory barriers in MIPS R2/R3/R5 and R6.
>
> Unfortunately, it's description is very cryptic and is done in HW engineering
> style which prevents use of it by SW. This instructions are not mandatory but
> there is a mandatory requirement - if not implemented, then a regular MIPS
> SYNC 0 must be used instead.
>
> The reason for this change is - SYNC 0 is a heavvy-weighted in many CPUs, it may
heavy
> disrupt other cores pipelines etc.
>
> Due to concern about verification of old MIPS R2 compatible cores of other
> vendors it is enforced only for MIPS R6 and other MIPS32/64 R2/R5 processors
> have it configurable.
Is it worth inverting the logic to exclude the platforms you're
concerned about (which also makes it explicit what hardware needs
verifying)? For new platforms we don't need to worry about kernel
regressions so much, so it probably makes sense to have them used by
default.
>
> Signed-off-by: Leonid Yegoshin <[email protected]>
> ---
> arch/mips/Kconfig | 22 ++++++++++++++++++++++
> arch/mips/include/asm/barrier.h | 6 ++++++
> 2 files changed, 28 insertions(+)
>
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index be384d6a58bb..c7d0cacece3d 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -1347,6 +1347,7 @@ config CPU_MIPS32_R2
> select CPU_SUPPORTS_32BIT_KERNEL
> select CPU_SUPPORTS_HIGHMEM
> select CPU_SUPPORTS_MSA
> + select CPU_SUPPORTS_MIPS_LIGHTWEIGHT_SYNC
> select HAVE_KVM
> help
> Choose this option to build a kernel for release 2 or later of the
> @@ -1365,6 +1366,8 @@ config CPU_MIPS32_R6
> select GENERIC_CSUM
> select HAVE_KVM
> select MIPS_O32_FP64_SUPPORT
> + select CPU_SUPPORTS_MIPS_LIGHTWEIGHT_SYNC
> + select WEAK_REORDERING_BEYOND_LLSC
> help
> Choose this option to build a kernel for release 6 or later of the
> MIPS32 architecture. New MIPS processors, starting with the Warrior
> @@ -1399,6 +1402,7 @@ config CPU_MIPS64_R2
> select CPU_SUPPORTS_HIGHMEM
> select CPU_SUPPORTS_HUGEPAGES
> select CPU_SUPPORTS_MSA
> + select CPU_SUPPORTS_MIPS_LIGHTWEIGHT_SYNC
> help
> Choose this option to build a kernel for release 2 or later of the
> MIPS64 architecture. Many modern embedded systems with a 64-bit
> @@ -1415,6 +1419,8 @@ config CPU_MIPS64_R6
> select CPU_SUPPORTS_HIGHMEM
> select CPU_SUPPORTS_MSA
> select GENERIC_CSUM
> + select CPU_SUPPORTS_MIPS_LIGHTWEIGHT_SYNC
> + select WEAK_REORDERING_BEYOND_LLSC
> help
> Choose this option to build a kernel for release 6 or later of the
> MIPS64 architecture. New MIPS processors, starting with the Warrior
> @@ -1876,6 +1882,20 @@ config WEAK_ORDERING
> #
> config WEAK_REORDERING_BEYOND_LLSC
> bool
> +
> +config MIPS_LIGHTWEIGHT_SYNC
> + bool "CPU lightweight SYNC instruction for weak reordering"
> + depends on CPU_SUPPORTS_MIPS_LIGHTWEIGHT_SYNC && WEAK_ORDERING
Worth depending on SMP, so as not to give the user more options than
they need?
> + default y if CPU_MIPSR6
> + help
> + This option enforces a use of "lightweight sync" instructions
> + for SMP (multi-CPU) memory barriers. This instructions are much
> + more faster than a traditional "SYNC 0".
> +
> + If that instructions are not implemented in processor then it is
> + converted to generic "SYNC 0".
> +
> + If you unsure, say N here, it may slightly decrease your performance
"it" is ambiguous. "Saying N" or "this option"? (I guess "saying N", but
its not obvious to an uninformed kernel configurer).
> endmenu
>
> #
> @@ -1928,6 +1948,8 @@ config CPU_SUPPORTS_HUGEPAGES
> bool
> config CPU_SUPPORTS_UNCACHED_ACCELERATED
> bool
> +config CPU_SUPPORTS_MIPS_LIGHTWEIGHT_SYNC
> + bool
> config MIPS_PGD_C0_CONTEXT
> bool
> default y if 64BIT && CPU_MIPSR2 && !CPU_XLP
> diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
> index 2b8bbbcb9be0..d2a63abfc7c6 100644
> --- a/arch/mips/include/asm/barrier.h
> +++ b/arch/mips/include/asm/barrier.h
> @@ -96,9 +96,15 @@
> # define smp_rmb() barrier()
> # define smp_wmb() __syncw()
> # else
> +# ifdef CONFIG_MIPS_LIGHTWEIGHT_SYNC
> +# define smp_mb() __asm__ __volatile__("sync 0x10" : : :"memory")
> +# define smp_rmb() __asm__ __volatile__("sync 0x13" : : :"memory")
> +# define smp_wmb() __asm__ __volatile__("sync 0x4" : : :"memory")
binutils appears to support the sync_mb, sync_rmb, sync_wmb aliases
since version 2.21. Can we safely use them?
Cheers
James
> +# else
> # define smp_mb() __asm__ __volatile__("sync" : : :"memory")
> # define smp_rmb() __asm__ __volatile__("sync" : : :"memory")
> # define smp_wmb() __asm__ __volatile__("sync" : : :"memory")
> +# endif
> # endif
> #else
> #define smp_mb() barrier()
>
>
Hi Leonid,
On 02/06/15 01:09, Leonid Yegoshin wrote:
> Many MIPS32 R2 and all MIPS R6 CPUs are out of order execution, so it
> needs memory barriers in SMP environment. However, past cores may have
> a pipeline short enough to ignore that requirements and problem may
> never occurs until recently.
>
> This patch gives an option to enclose LL-SC loops by SYNC barriers in spinlocks,
> atomics, futexes, cmpxchg and bitops.
Please reflow text.
>
> So, this option is defined for MIPS32 R2 only, because that recent
s/that/those/
> CPUs may occasionally have problems in accordance with HW team.
"have problems in accordance with HW team" is a bit confusing. What do
you mean?
> And most of MIPS64 R2 vendor processors already have some kind of memory
> barrier and the only one generic 5KEs has a pretty short pipeline.
>
> Using memory barriers in MIPS R6 is mandatory, all that
s/that/those/
> processors have a speculative memory read which can inflict a trouble
s/a //
> without a correct use of barriers in LL-SC loop cycles.
> The same is actually for MIPS32 R5 I5600 processor.
Actually *true*?
P5600 you mean?
Same in Kconfig help text.
>
> Signed-off-by: Leonid Yegoshin <[email protected]>
> ---
> arch/mips/Kconfig | 25 +++++++++++++++++++++++++
> arch/mips/include/asm/barrier.h | 26 ++++++++++++++++++++++----
> 2 files changed, 47 insertions(+), 4 deletions(-)
>
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index c7d0cacece3d..676eb64f5545 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -1896,6 +1896,30 @@ config MIPS_LIGHTWEIGHT_SYNC
> converted to generic "SYNC 0".
>
> If you unsure, say N here, it may slightly decrease your performance
> +
> +config MIPS_ENFORCE_WEAK_REORDERING_BEYOND_LLSC
> + bool "Enforce memory barriers at LLSC loops - atomics, spinlocks etc"
> + depends on CPU_MIPS32_R2
> + default y if CPU_MIPSR6
> + select WEAK_REORDERING_BEYOND_LLSC
> + help
> + Many MIPS32 R2 and all MIPS R6 CPUs are out of order execution, so it
> + needs memory barriers in SMP environment. However, past cores may have
> + a pipeline short enough to ignore that requirements and problem may
> + never occurs until recently.
> +
> + So, this option is defined for MIPS32 R2 only, because that recent
> + CPUs may occasionally have problems in accordance with HW team.
> + And MIPS64 R2 vendor processors already have some kind of memory
> + barrier and the only one generic 5KEs has a pretty short pipeline.
> +
> + Using memory barriers in MIPS R6 is mandatory, all that
> + processors have a speculative memory read which can inflict a trouble
> + without a correct use of barriers in LL-SC loop cycles.
> + The same is actually for MIPS32 R5 I5600 processor.
> +
> + If you unsure, say Y here, it may slightly decrease your performance
If you *are* unsure (same in patch 1). Same comment as patch 1 too about
ambiguity of "it".
> + but increase a reliability.
well only if the hardware has weak ordering, and only because it would
be technically wrong in that case to say N here.
It feels wrong to be giving the user this option. Can't we just select
WEAK_REORDERING_BEYOND_LLSC automatically based on the hardware that
needs to be supported by the kernel configuration (e.g. CPU_MIPSR6 or
CPU_MIPS32_R2)? Those who care about mips r2 performance on hardware
which doesn't strictly need it can always speak up / add an exception.
Out of interest, are futex operations safe with weak llsc ordering, on
the premise that they're mainly used by userland so ordering with normal
kernel accesses just doesn't matter in practice?
> endmenu
>
> #
> @@ -1924,6 +1948,7 @@ config CPU_MIPSR2
> config CPU_MIPSR6
> bool
> default y if CPU_MIPS32_R6 || CPU_MIPS64_R6
> + select MIPS_ENFORCE_WEAK_REORDERING_BEYOND_LLSC
> select MIPS_SPRAM
>
> config EVA
> diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
> index d2a63abfc7c6..f3cc7a91ac0d 100644
> --- a/arch/mips/include/asm/barrier.h
> +++ b/arch/mips/include/asm/barrier.h
> @@ -95,33 +95,51 @@
> # define smp_mb() __sync()
> # define smp_rmb() barrier()
> # define smp_wmb() __syncw()
> +# define smp_acquire() __sync()
> +# define smp_release() __sync()
> # else
> # ifdef CONFIG_MIPS_LIGHTWEIGHT_SYNC
> # define smp_mb() __asm__ __volatile__("sync 0x10" : : :"memory")
> # define smp_rmb() __asm__ __volatile__("sync 0x13" : : :"memory")
> # define smp_wmb() __asm__ __volatile__("sync 0x4" : : :"memory")
> +# define smp_acquire() __asm__ __volatile__("sync 0x11" : : :"memory")
> +# define smp_release() __asm__ __volatile__("sync 0x12" : : :"memory")
same question about use of sync_acquire/sync_release instructions which
binutils understands since 2.21.
> # else
> # define smp_mb() __asm__ __volatile__("sync" : : :"memory")
> # define smp_rmb() __asm__ __volatile__("sync" : : :"memory")
> # define smp_wmb() __asm__ __volatile__("sync" : : :"memory")
> +# define smp_acquire() __asm__ __volatile__("sync" : : :"memory")
> +# define smp_release() __asm__ __volatile__("sync" : : :"memory")
> # endif
> # endif
> #else
> #define smp_mb() barrier()
> #define smp_rmb() barrier()
> #define smp_wmb() barrier()
> +#define smp_acquire() barrier()
> +#define smp_release() barrier()
> #endif
>
> #if defined(CONFIG_WEAK_REORDERING_BEYOND_LLSC) && defined(CONFIG_SMP)
> +#ifdef CONFIG_MIPS_LIGHTWEIGHT_SYNC
> +#define __WEAK_LLSC_MB " sync 0x10 \n"
> +#define __WEAK_ACQUIRE " sync 0x11 \n"
> +#define __WEAK_RELEASE " sync 0x12 \n"
tabs
__WEAK_ACQUIRE and __WEAK_RELEASE are specific to llsc, maybe call them
__WEAK_LLSC_ACQUIRE and __WEAK_LLSC_RELEASE instead to avoid confusion.
> +#else
> #define __WEAK_LLSC_MB " sync \n"
> +#define __WEAK_ACQUIRE __WEAK_LLSC_MB
> +#define __WEAK_RELEASE __WEAK_LLSC_MB
tabs
> +#endif
> #else
> #define __WEAK_LLSC_MB " \n"
> +#define __WEAK_ACQUIRE __WEAK_LLSC_MB
> +#define __WEAK_RELEASE __WEAK_LLSC_MB
tabs
> #endif
>
> #define set_mb(var, value) \
> do { var = value; smp_mb(); } while (0)
>
> -#define smp_llsc_mb() __asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
> +#define smp_llsc_mb() __asm__ __volatile__(__WEAK_ACQUIRE : : :"memory")
tabs
>
> #ifdef CONFIG_CPU_CAVIUM_OCTEON
> #define smp_mb__before_llsc() smp_wmb()
> @@ -131,14 +149,14 @@
> "syncw\n\t" \
> ".set pop" : : : "memory")
> #else
> -#define smp_mb__before_llsc() smp_llsc_mb()
> +#define smp_mb__before_llsc() __asm__ __volatile__(__WEAK_RELEASE : : :"memory")
> #define nudge_writes() mb()
> #endif
>
> #define smp_store_release(p, v) \
> do { \
> compiletime_assert_atomic_type(*p); \
> - smp_mb(); \
> + smp_release(); \
> ACCESS_ONCE(*p) = (v); \
> } while (0)
>
> @@ -146,7 +164,7 @@ do { \
> ({ \
> typeof(*p) ___p1 = ACCESS_ONCE(*p); \
> compiletime_assert_atomic_type(*p); \
> - smp_mb(); \
> + smp_acquire(); \
> ___p1; \
> })
>
>
>
This patch does 3 logically separable things:
1) add smp_release/smp_acquire based on MIPS_LIGHTWEIGHT_SYNC and weaken
smp_store_release()/smp_load_acquire() to use them.
2) weaken llsc barriers when MIPS_LIGHTWEIGHT_SYNC.
3) the MIPS_ENFORCE_WEAK_REORDERING_BEYOND_LLSC Kconfig stuff (or
whatever method to select WEAK_REORDERING_BEYOND_LLSC more often).
Any reason not to split them, and give a clear description of each?
Cheers
James
On 02/06/15 01:09, Leonid Yegoshin wrote:
> Repleace smp_mb() in arch_write_unlock() and __clear_bit_unlock() to
Replace.
> smp_mb__before_llsc() call which does "release" barrier functionality.
>
> It seems like it was missed in commit f252ffd50c97dae87b45f1dbad24f71358ccfbd6
> during introduction of "acquire" and "release" semantics.
>
> Signed-off-by: Leonid Yegoshin <[email protected]>
> ---
> arch/mips/include/asm/bitops.h | 2 +-
> arch/mips/include/asm/spinlock.h | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
> index 0cf29bd5dc5c..ce9666cf1499 100644
> --- a/arch/mips/include/asm/bitops.h
> +++ b/arch/mips/include/asm/bitops.h
> @@ -469,7 +469,7 @@ static inline int test_and_change_bit(unsigned long nr,
> */
> static inline void __clear_bit_unlock(unsigned long nr, volatile unsigned long *addr)
> {
> - smp_mb();
> + smp_mb__before_llsc();
> __clear_bit(nr, addr);
> }
>
> diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
> index 1fca2e0793dc..7c7f3b2bd3de 100644
> --- a/arch/mips/include/asm/spinlock.h
> +++ b/arch/mips/include/asm/spinlock.h
> @@ -317,7 +317,7 @@ static inline void arch_write_lock(arch_rwlock_t *rw)
>
> static inline void arch_write_unlock(arch_rwlock_t *rw)
> {
> - smp_mb();
> + smp_mb__before_llsc();
arch_write_unlock appears to just use sw, not sc, and __clear_bit
appears to be implemented in plain C, so is smp_mb__before_llsc() really
appropriate? Would smp_release() be more accurate/correct in both cases?
Cheers
James
>
> __asm__ __volatile__(
> " # arch_write_unlock \n"
>
>
On Tue, Jun 02, 2015 at 11:08:35AM +0100, Paul Burton wrote:
> Hi Leonid,
>
<snip>
> > +
> > + If that instructions are not implemented in processor then it is
> > + converted to generic "SYNC 0".
>
> I think this would read better as something like:
>
> If a processor does not implement the lightweight sync operations then
> the architecture requires that they interpret the corresponding sync
> instructions as the typical heavyweight "sync 0". Therefore this
> should be safe to enable on all CPUs implementing release 2 or
> later of the MIPS architecture.
>
Is it really the case for release 2?
I'm asking because recently I needed to do something similar and I couldn't
find this guarantee in the revision 2.00 of the manual.
Maybe it's just poorly formulated, but here is what I find in it:
- "The stype values 1-31 are reserved for future extensions to the architecture."
(ok)
- "A value of zero will always be defined such that it performs all defined
synchronization operations." (ok)
- "Non-zero values may be defined to remove some synchronization operations."
(ok, certainly if we understand the word "weaker" instead of "remove")
- "As such, software should never use a non-zero value of the stype field, as
this may inadvertently cause future failures if non-zero values remove
synchronization operations." (Mmmm, ok but ...)
Nowhere is there something close to what is found in the revision 5.0 or later:
"If an implementation does not use one of these non-zero values to define a
different synchronization behavior, then that non-zero value of stype must
act the same as stype zero completion barrier."
The wording may have changed since revision 2.8 but I don't have access to the
corresponding manual.
Luc Van Oostenryck
On Tue, Jun 02, 2015 at 02:12:29PM +0200, Luc Van Oostenryck wrote:
> On Tue, Jun 02, 2015 at 11:08:35AM +0100, Paul Burton wrote:
> > Hi Leonid,
> >
>
> <snip>
>
> > > +
> > > + If that instructions are not implemented in processor then it is
> > > + converted to generic "SYNC 0".
> >
> > I think this would read better as something like:
> >
> > If a processor does not implement the lightweight sync operations then
> > the architecture requires that they interpret the corresponding sync
> > instructions as the typical heavyweight "sync 0". Therefore this
> > should be safe to enable on all CPUs implementing release 2 or
> > later of the MIPS architecture.
> >
>
> Is it really the case for release 2?
>
> I'm asking because recently I needed to do something similar and I couldn't
> find this garantee in the revision 2.00 of the manual.
> May it's just poorly formulated but here is what I find in it:
> - "The stype values 1-31 are reserved for future extensions to the architecture."
> (ok)
> - "A value of zero will always be defined such that it performs all defined
> synchronization operations." (ok)
> - "Non-zero values may be defined to remove some synchronization operations."
> (ok, certainly if we understand the word "weaker" instead of "remove")
Yes, "weaker" is what was meant here.
> - "As such, software should never use a non-zero value of the stype field, as
> this may inadvertently cause future failures if non-zero values remove
> synchronization operations." (Mmmm, ok but ...)
> Nowhere is there something close to what is found in the revision 5.0 or later:
I think that's just a very convoluted way to say non-zero values are
reserved and the CPU may bite you and your kittens if you dare to use
such values.
> "If an implementation does not use one of these non-zero values to define a
> different synchronization behavior, then that non-zero value of stype must
> act the same as stype zero completion barrier."
"We try to be nice to bad code but ... you've been warned!" :-)
> The wording may have changed since revision 2.8 but I don't have access to the
> corresponding manual.
The page about the SYNC instruction has changed significantly over time -
the SYNC instruction's documentation manual has grown from one page for the
R4000 to four pages for MIPS32 R1 and R2.5 to five pages for MIPS64 R3.02
and newer. R3 added read, write, rw, acquire and release sync types.
But the sentence in question exists unchanged even in the R6 manual.
Ralf
On Tue, Jun 02, 2015 at 12:42:40PM +0100, James Hogan wrote:
> Replace.
>
> > smp_mb__before_llsc() call which does "release" barrier functionality.
> >
> > It seems like it was missed in commit f252ffd50c97dae87b45f1dbad24f71358ccfbd6
> > during introduction of "acquire" and "release" semantics.
> >
> > Signed-off-by: Leonid Yegoshin <[email protected]>
> > ---
> > arch/mips/include/asm/bitops.h | 2 +-
> > arch/mips/include/asm/spinlock.h | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
> > index 0cf29bd5dc5c..ce9666cf1499 100644
> > --- a/arch/mips/include/asm/bitops.h
> > +++ b/arch/mips/include/asm/bitops.h
> > @@ -469,7 +469,7 @@ static inline int test_and_change_bit(unsigned long nr,
> > */
> > static inline void __clear_bit_unlock(unsigned long nr, volatile unsigned long *addr)
> > {
> > - smp_mb();
> > + smp_mb__before_llsc();
> > __clear_bit(nr, addr);
> > }
> >
> > diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
> > index 1fca2e0793dc..7c7f3b2bd3de 100644
> > --- a/arch/mips/include/asm/spinlock.h
> > +++ b/arch/mips/include/asm/spinlock.h
> > @@ -317,7 +317,7 @@ static inline void arch_write_lock(arch_rwlock_t *rw)
> >
> > static inline void arch_write_unlock(arch_rwlock_t *rw)
> > {
> > - smp_mb();
> > + smp_mb__before_llsc();
>
> arch_write_unlock appears to just use sw, not sc, and __clear_bit
> appears to be implemented in plain C, so is smp_mb__before_llsc() really
> appropriate? Would smp_release() be more accurate/correct in both cases?
Yes on the both questions.
Ralf
On Tue, 2 Jun 2015, James Hogan wrote:
> > diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
> > index 2b8bbbcb9be0..d2a63abfc7c6 100644
> > --- a/arch/mips/include/asm/barrier.h
> > +++ b/arch/mips/include/asm/barrier.h
> > @@ -96,9 +96,15 @@
> > # define smp_rmb() barrier()
> > # define smp_wmb() __syncw()
> > # else
> > +# ifdef CONFIG_MIPS_LIGHTWEIGHT_SYNC
> > +# define smp_mb() __asm__ __volatile__("sync 0x10" : : :"memory")
> > +# define smp_rmb() __asm__ __volatile__("sync 0x13" : : :"memory")
> > +# define smp_wmb() __asm__ __volatile__("sync 0x4" : : :"memory")
>
> binutils appears to support the sync_mb, sync_rmb, sync_wmb aliases
> since version 2.21. Can we safely use them?
I suggest that we don't -- we still officially support binutils 2.12 and
have other places where we even use `.word' to insert instructions current
versions of binutils properly handle. It may be worth noting in a comment
though, that these encodings correspond to the operations that you named.
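For illustration, such a comment could look like this (a sketch; the stype
values are the ones already used in the patch, and the alias names are the
binutils 2.21+ mnemonics James mentioned):

	/* Lightweight sync encodings; binutils >= 2.21 knows them by name: */
	# define smp_mb()	__asm__ __volatile__("sync 0x10" : : :"memory") /* sync_mb */
	# define smp_rmb()	__asm__ __volatile__("sync 0x13" : : :"memory") /* sync_rmb */
	# define smp_wmb()	__asm__ __volatile__("sync 0x4" : : :"memory")  /* sync_wmb */
	# define smp_acquire()	__asm__ __volatile__("sync 0x11" : : :"memory") /* sync_acquire */
	# define smp_release()	__asm__ __volatile__("sync 0x12" : : :"memory") /* sync_release */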
Maciej
On 06/02/2015 05:12 AM, Luc Van Oostenryck wrote:
> On Tue, Jun 02, 2015 at 11:08:35AM +0100, Paul Burton wrote:
>
>> I think this would read better as something like:
>>
>> If a processor does not implement the lightweight sync operations then
>> the architecture requires that they interpret the corresponding sync
>> instructions as the typical heavyweight "sync 0". Therefore this
>> should be safe to enable on all CPUs implementing release 2 or
>> later of the MIPS architecture.
>>
> Is it really the case for release 2?
>
> I'm asking because recently I needed to do something similar and I couldn't
> find this garantee in the revision 2.00 of the manual.
Yes. MD00086/MD00084/MD00087 Rev 2.60 are technically MIPS R2. And this
revision explicitly lists optional codes and it has a clear statement:
> Implementations that do not use any of the non-zero values of stype to
> define different barriers, such as ordering barriers, must make those
> stype values act the same as stype zero.
(don't blame me that Rev 2.60 is 5 years after initial 2.00, it is still
MIPS R2).
- Leonid.
On 06/02/2015 04:39 AM, James Hogan wrote:
> Hi Leonid,
>
> On 02/06/15 01:09, Leonid Yegoshin wrote:
>
>> CPUs may occasionally have problems in accordance with HW team.
> "have problems in accordance with HW team" is a bit confusing. What do
> you mean?
I wrote about memory barriers and the problems that may happen without them.
Some details from internal e-mail exchange:
-------------------------------------------
yes, it is possible for the stores to be observed out of order
The SC B can complete if it has ownership and the SW A is still
waiting for ownership. This scenario was also possible on older
cores.
(snip)
-----Original Message-----
From: Leonid Yegoshin
Subject: more SYNC issues
...
I want to know - do XXXX orders stores without SYNC?
Let's consider a scenario:
SW $0, A (cache miss)
...
LL $1, B (cache hit)
..
SC $1, B (cache hit again)
B (is not taken!)
..
SYNC 0x10
Is it possible for other core to get new value of B before new value of
A between SC-B and SYNC?
-------------------------------------------------------------
Another mail, another processor:
==========================================
Hi Leonid
I looked into the LSU RTL and I do not see speculation being blocked for
younger loads following LL.
I also ran a testcase to confirm my observation:
LL (cacheable)Miss
LW (cacheable)Miss, different cacheline
Miss request for LW goes out before the Miss request for LL, also the
GPR updated for LW happens before the LL GPR update.
==========================================
> Actually *true*? P5600 you mean? Same in Kconfig help text.
Yes, thank you, it is P5600 and I5600 doesn't exist.
> It feels wrong to be giving the user this option. Can't we just select
> WEAK_REORDERING_BEYOND_LLSC automatically based on the hardware that
> needs to be supported by the kernel configuration (e.g. CPU_MIPSR6 or
> CPU_MIPS32_R2)?
No, we can't - a lot of old processors are in-order, and all of them are
still MIPS R2.
> Those who care about mips r2 performance on hardware
> which doesn't strictly need it can always speak up / add an exception.
>
> Out of interest, are futex operations safe with weak llsc ordering, on
> the premise that they're mainly used by userland so ordering with normal
> kernel accesses just doesn't matter in practice?
I think futexes are used to communicate between user threads, and the problem
is theoretically still there.
> This patch does 3 logically separable things:
> 1) add smp_release/smp_acquire based on MIPS_LIGHTWEIGHT_SYNC and weaken
> smp_store_release()/smp_load_acquire() to use them.
> 2) weaken llsc barriers when MIPS_LIGHTWEIGHT_SYNC.
> 3) the MIPS_ENFORCE_WEAK_REORDERING_BEYOND_LLSC Kconfig stuff (or
> whatever method to select WEAK_REORDERING_BEYOND_LLSC more often).
>
> Any reason not to split them, and give a clear description of each?
>
>
I don't see a reason to split it.
- Leonid.
On 06/02/2015 05:59, Ralf Baechle wrote:
> On Tue, Jun 02, 2015 at 04:41:21AM -0400, Joshua Kinard wrote:
>
>> On 06/01/2015 20:09, Leonid Yegoshin wrote:
>>> The following series implements lightweight SYNC memory barriers for SMP Linux
>>> and a correct use of SYNCs around atomics, futexes, spinlocks etc LL-SC loops -
>>> the basic building blocks of any atomics in MIPS.
>>>
>>> Historically, a generic MIPS doesn't use memory barriers around LL-SC loops in
>>> atomics, spinlocks etc. However, Architecture documents never specify that LL-SC
>>> loop creates a memory barrier. Some non-generic MIPS vendors already feel
>>> the pain and enforces it. With introduction in a recent out-of-order superscalar
>>> MIPS processors an aggressive speculative memory read it is a problem now.
>>>
>>> The generic MIPS memory barrier instruction SYNC (aka SYNC 0) is something
>>> very heavvy because it was designed for propogating barrier down to memory.
>>> MIPS R2 introduced lightweight SYNC instructions which correspond to smp_*()
>>> set of SMP barriers. The description was very HW-specific and it was never
>>> used, however, it is much less trouble for processor pipelines and can be used
>>> in smp_mb()/smp_rmb()/smp_wmb() as is as in acquire/release barrier semantics.
>>> After prolonged discussions with HW team it became clear that lightweight SYNCs
>>> were designed specifically with smp_*() in mind but description is in timeline
>>> ordering space.
>>>
>>> So, the problem was spotted recently in engineering tests and it was confirmed
>>> with tests that without memory barrier load and store may pass LL/SC
>>> instructions in both directions, even in old MIPS R2 processors.
>>> Aggressive speculation in MIPS R6 and MIPS I5600 processors adds more fire to
>>> this issue.
>>>
>>> 3 patches introduces a configurable control for lightweight SYNCs around LL/SC
>>> loops and for MIPS32 R2 it was allowed to choose an enforcing SYNCs or not
>>> (keep as is) because some old MIPS32 R2 may be happy without that SYNCs.
>>> In MIPS R6 I chose to have SYNC around LL/SC mandatory because all of that
>>> processors have an agressive speculation and delayed write buffers. In that
>>> processors series it is still possible the use of SYNC 0 instead of
>>> lightweight SYNCs in configuration - just in case of some trouble in
>>> implementation in specific CPU. However, it is considered safe do not implement
>>> some or any lightweight SYNC in specific core because Architecture requires
>>> HW map of unimplemented SYNCs to SYNC 0.
>>
>> How useful might this be for older hardware, such as the R10k CPUs? Just
>> fallbacks to the old sync insn?
>
> The R10000 family is strongly ordered so there is no SYNC instruction
> required in the entire kernel even though some Origin hardware documentation
> incorrectly claims otherwise.
So no benefits even in the speculative execution case on noncoherent hardware
like IP28 and IP32?
--J
On Tue, Jun 02, 2015 at 02:59:38PM -0400, Joshua Kinard wrote:
> >> How useful might this be for older hardware, such as the R10k CPUs? Just
> >> fallbacks to the old sync insn?
> >
> > The R10000 family is strongly ordered so there is no SYNC instruction
> > required in the entire kernel even though some Origin hardware documentation
> > incorrectly claims otherwise.
>
> So no benefits even in the speculative execution case on noncoherent hardware
> like IP28 and IP32?
That's handled entirely differently by using a CACHE BARRIER instruction,
something which is specific to the R10000-family. It's also used
differently, by putting one such instruction at the end of every basic
block that might result in speculatively dirty cache lines.
Note that the systems affected by this speculation issue are all
non-coherent uniprocessor systems, while Leonid's patch matters for
SMP kernels; the primitives he's changed will not generate any code for
a !CONFIG_SMP kernel.
Ralf
On 06/02/2015 09:15 AM, Maciej W. Rozycki wrote:
> On Tue, 2 Jun 2015, James Hogan wrote:
>
>>> diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
>>> index 2b8bbbcb9be0..d2a63abfc7c6 100644
>>> --- a/arch/mips/include/asm/barrier.h
>>> +++ b/arch/mips/include/asm/barrier.h
>>> @@ -96,9 +96,15 @@
>>> # define smp_rmb() barrier()
>>> # define smp_wmb() __syncw()
>>> # else
>>> +# ifdef CONFIG_MIPS_LIGHTWEIGHT_SYNC
>>> +# define smp_mb() __asm__ __volatile__("sync 0x10" : : :"memory")
>>> +# define smp_rmb() __asm__ __volatile__("sync 0x13" : : :"memory")
>>> +# define smp_wmb() __asm__ __volatile__("sync 0x4" : : :"memory")
>>
>> binutils appears to support the sync_mb, sync_rmb, sync_wmb aliases
>> since version 2.21. Can we safely use them?
>
> I suggest that we don't -- we still officially support binutils 2.12 and
> have other places where we even use `.word' to insert instructions current
> versions of binutils properly handle. It may be worth noting in a comment
> though that these encodings correspond to these operations that you named.
>
Surely the other MIPSr6 instructions are not supported in binutils 2.12
either. So if it is for r6, why not require modern tools, and put
something user readable in here?
David Daney
On 06/02/2015 04:56 PM, David Daney wrote:
> On 06/02/2015 09:15 AM, Maciej W. Rozycki wrote:
>> On Tue, 2 Jun 2015, James Hogan wrote:
>>
>>>
>>> binutils appears to support the sync_mb, sync_rmb, sync_wmb aliases
>>> since version 2.21. Can we safely use them?
>>
>> I suggest that we don't -- we still officially support binutils
>> 2.12 and
>> have other places where we even use `.word' to insert instructions
>> current
>> versions of binutils properly handle. It may be worth noting in a
>> comment
>> though that these encodings correspond to these operations that you
>> named.
>>
>
> Surely the other MIPSr6 instructions are not supported in binutils
> 2.12 either. So if it is for r6, why not require modern tools, and
> put something user readable in here?
>
>
No, it can be used for MIPS R2 also.
On Mon, Jun 01, 2015 at 05:09:34PM -0700, Leonid Yegoshin wrote:
Leonid,
to me the biggest technical problem with this patch is that the new Kconfig
option is user visible. This is the kind of deeply technical option
which exceeds the technical knowledge of most users, so it should probably
be driven by a select.
We probably also want to enquire how old CPUs from before the invention
of the stype field behave. If, as I hope, they all treat an
stype != 0 as stype 0, we could simply drop the option. But we might
simply be out of luck - dunno.
Maciej,
do you have an R4000 / R4600 / R5000 / R7000 / SiByte system at hand to
test this? I think we don't need to test that SYNC actually works as
intended but the simpler test that SYNC <stype != 0> is not causing an
illegal instruction exception is sufficient, that is if something like
int main(int argc, char *argv[])
{
	asm("	.set	mips2	\n"
	    "	sync	0x10	\n"
	    "	sync	0x13	\n"
	    "	sync	0x04	\n"
	    "	.set	mips0	\n");
	return 0;
}
doesn't crash we should be ok.
The kernel's SYNC emulation should already be ok. We ignore the stype
field entirely and for a uniprocessor R2000/R3000 that should be just
the right thing.
Ralf
On Fri, 5 Jun 2015, Ralf Baechle wrote:
> do you have an R4000 / R4600 / R5000 / R7000 / SiByte system at hand to
> test this?
I should be able to check R4400 (that is virtually the same as R4000)
next week or so. As to SiByte -- not before next month I'm afraid. I
don't have access to any of the other processors you named. You may
want to find a better person if you want to accept this change soon.
Maciej