2018-10-08 00:40:36

by Maciej W. Rozycki

[permalink] [raw]
Subject: [PATCH 0/4] MIPS: Ordering enforcement fixes for MMIO accessors

Hi,

This patch series is a follow-up to my earlier consideration about MMIO
access ordering recorded here: <https://lkml.org/lkml/2014/4/28/201>.

As I have learnt in a recent Alpha/Linux discussion starting here:
<https://marc.info/?i=alpine.LRH.2.02.1808161556450.13597%20()%20file01%20!%20intranet%20!%20prod%20!%20int%20!%20rdu2%20!%20redhat%20!%20com>
related to MMIO accessor ordering barriers ports are actually required to
follow the x86 strongly ordered semantics. As the ordering is not
specified in the MIPS architecture except for the SYNC instruction we do
have to put explicit barriers in MMIO accessors as otherwise ordering may
not be guaranteed.

Fortunately on strongly ordered systems SYNC is expected to be as cheap
as a NOP, and on weakly ordered ones it is needed anyway. As from
revision 2.60 of the MIPS architecture specification however we have a
number of SYNC operations defined, and SYNC 0 has been upgraded from an
ordering to a completion barrier. We currently don't make use of these
extra operations and always use SYNC 0 instead, which this means that we
may be doing too much synchronisation with the barriers we have already
defined.

This patch series does not make an attempt to optimise for SYNC operation
use, which belongs to a separate improvement. Instead it focuses on
fixing MMIO accesses so that drivers can rely on our own API definition.

Following the original consideration specific MMIO barrier operations are
added. As they have turned out to be required to be implied by MMIO
accessors there is no immediate need to make them form a generic
cross-architecture internal Linux API. Therefore I defined them for the
MIPS architecture only, using the names originally coined by mostly taking
them from the PowerPC port.

Then I have used them to fix `mmiowb', and then `readX' and `writeX'
accessors. Finally I have updated the `_relaxed' accessors to avoid
unnecessary synchronisation WRT DMA.

See individual commit descriptions for further details.

As a follow-up clean-up places across the architecture tree could be
reviewed for barrier use that is actually related to MMIO rather than
memory and updated to use the new names of the MMIO barrier operations. I
plan to do this for the DECstation and possibly the SiByte platform,
however I am leaving it for someone else to do it elsewhere.

Similarly I think the DMA barrier in `readX' and `inX' should be using
`dma_rmb' rather than `rmb', but I'm leaving it for someone else to
handle.

These changes have been verified at run time with an R3000 (MIPS I)
DECstation machine (32-bit kernel, little endianness), an R4400 (MIPS III)
DECstation machine (64-bit kernel, little endianness) and an SB-1 (MIPS64)
SWARM machine (64-bit kernel, big endianness), by booting them into the
multiuser mode and running them for a couple of hours.

Please apply.

Maciej


2018-10-08 00:41:30

by Maciej W. Rozycki

[permalink] [raw]
Subject: [PATCH 2/4] MIPS: Correct `mmiowb' barrier for `wbflush' platforms

Redefine `mmiowb' in terms of `iobarrier_w' so that it works correctly
for MIPS I platforms, which have no SYNC machine instruction and use a
call to `wbflush' instead.

This doesn't change the semantics for CONFIG_CPU_CAVIUM_OCTEON, because
`iobarrier_w' expands to `wmb', which is ultimately the same as the
current arrangement. For MIPS I platforms this not only makes any code
that would happen to use `mmiowb' build and run, but it actually
enforces the ordering required as well, as `iobarrier_w' has it already
covered with the use of `wmb'.

Signed-off-by: Maciej W. Rozycki <[email protected]>
---
arch/mips/include/asm/io.h | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)

linux-mips-mmiowb.patch
Index: linux-20180930-3maxp/arch/mips/include/asm/io.h
===================================================================
--- linux-20180930-3maxp.orig/arch/mips/include/asm/io.h
+++ linux-20180930-3maxp/arch/mips/include/asm/io.h
@@ -91,6 +91,9 @@ static inline void set_io_port_base(unsi
#define iobarrier_w() wmb()
#define iobarrier_sync() iob()

+/* Some callers use this older API instead. */
+#define mmiowb() iobarrier_w()
+
/*
* Thanks to James van Artsdalen for a better timing-fix than
* the two short jumps: using outb's to a nonexistent port seems
@@ -573,14 +576,6 @@ BUILDSTRING(l, u32)
BUILDSTRING(q, u64)
#endif

-
-#ifdef CONFIG_CPU_CAVIUM_OCTEON
-#define mmiowb() wmb()
-#else
-/* Depends on MIPS II instruction set */
-#define mmiowb() asm volatile ("sync" ::: "memory")
-#endif
-
static inline void memset_io(volatile void __iomem *addr, unsigned char val, int count)
{
memset((void __force *) addr, val, count);

2018-10-08 00:42:00

by Maciej W. Rozycki

[permalink] [raw]
Subject: [PATCH 1/4] MIPS: Define MMIO ordering barriers

Define MMIO ordering barriers as separate operations so as to allow
making places where such a barrier is required distinct from places
where a memory or a DMA barrier is needed.

Architecturally MIPS does not specify ordering requirements for uncached
bus accesses such as MMIO operations normally use and therefore explicit
barriers have to be inserted between MMIO accesses where unspecified
ordering of operations would cause unpredictable results.

MIPS MMIO ordering barriers are implemented using the same underlying
mechanism that memory or a DMA barrier ordering barriers use, that is
either a suitable SYNC instruction or a platform-specific `wbflush'
call. However platforms may implement different ordering rules for
different kinds of bus activity, so having a separate API makes it
possible to remove unnecessary barriers and avoid a performance hit they
may cause due to unrelated bus activity by making their implementation
expand to nil while keeping the necessary ones.

Also having distinct barriers for each kind of use makes it easier for
the reader to understand what code has been intended to do.

Signed-off-by: Maciej W. Rozycki <[email protected]>
---
arch/mips/include/asm/io.h | 12 ++++++++++++
1 file changed, 12 insertions(+)

linux-mips-iobarrier.patch
Index: linux-20180930-3maxp/arch/mips/include/asm/io.h
===================================================================
--- linux-20180930-3maxp.orig/arch/mips/include/asm/io.h
+++ linux-20180930-3maxp/arch/mips/include/asm/io.h
@@ -20,6 +20,7 @@
#include <linux/irqflags.h>

#include <asm/addrspace.h>
+#include <asm/barrier.h>
#include <asm/bug.h>
#include <asm/byteorder.h>
#include <asm/cpu.h>
@@ -80,6 +81,17 @@ static inline void set_io_port_base(unsi
}

/*
+ * Enforce in-order execution of data I/O. In the MIPS architecture
+ * these are equivalent to corresponding platform-specific memory
+ * barriers defined in <asm/barrier.h>. API pinched from PowerPC,
+ * with sync additionally defined.
+ */
+#define iobarrier_rw() mb()
+#define iobarrier_r() rmb()
+#define iobarrier_w() wmb()
+#define iobarrier_sync() iob()
+
+/*
* Thanks to James van Artsdalen for a better timing-fix than
* the two short jumps: using outb's to a nonexistent port seems
* to guarantee better timings even on fast machines.

2018-10-08 00:42:39

by Maciej W. Rozycki

[permalink] [raw]
Subject: [PATCH 4/4] MIPS: Provide actually relaxed MMIO accessors

Improve performance for the relevant systems and remove the DMA ordering
barrier from `readX_relaxed' and `writeX_relaxed' MMIO accessors, where
it is not needed according to our requirements[1]. For consistency make
the same arrangement with low-level port I/O accessors, but do not
actually provide any accessors making use of it.

References:

[1] "LINUX KERNEL MEMORY BARRIERS", Documentation/memory-barriers.txt,
Section "KERNEL I/O BARRIER EFFECTS"

Signed-off-by: Maciej W. Rozycki <[email protected]>
---
arch/mips/include/asm/io.h | 48 ++++++++++++++++++++++++++-------------------
1 file changed, 28 insertions(+), 20 deletions(-)

linux-mips-readx-writex-relaxed.patch
Index: linux-20180930-3maxp/arch/mips/include/asm/io.h
===================================================================
--- linux-20180930-3maxp.orig/arch/mips/include/asm/io.h
+++ linux-20180930-3maxp/arch/mips/include/asm/io.h
@@ -51,6 +51,11 @@
# define __raw_ioswabq(a, x) (x)
# define ____raw_ioswabq(a, x) (x)

+# define __relaxed_ioswabb ioswabb
+# define __relaxed_ioswabw ioswabw
+# define __relaxed_ioswabl ioswabl
+# define __relaxed_ioswabq ioswabq
+
/* ioswab[bwlq], __mem_ioswab[bwlq] are defined in mangle-port.h */

#define IO_SPACE_LIMIT 0xffff
@@ -337,7 +342,7 @@ static inline void iounmap(const volatil
#define war_io_reorder_wmb() barrier()
#endif

-#define __BUILD_MEMORY_SINGLE(pfx, bwlq, type, barrier, irq) \
+#define __BUILD_MEMORY_SINGLE(pfx, bwlq, type, barrier, relax, irq) \
\
static inline void pfx##write##bwlq(type val, \
volatile void __iomem *mem) \
@@ -411,11 +416,12 @@ static inline type pfx##read##bwlq(const
} \
\
/* prevent prefetching of coherent DMA data prematurely */ \
- rmb(); \
+ if (!relax) \
+ rmb(); \
return pfx##ioswab##bwlq(__mem, __val); \
}

-#define __BUILD_IOPORT_SINGLE(pfx, bwlq, type, barrier, p, slow) \
+#define __BUILD_IOPORT_SINGLE(pfx, bwlq, type, barrier, relax, p, slow) \
\
static inline void pfx##out##bwlq##p(type val, unsigned long port) \
{ \
@@ -454,19 +460,21 @@ static inline type pfx##in##bwlq##p(unsi
slow; \
\
/* prevent prefetching of coherent DMA data prematurely */ \
- rmb(); \
+ if (!relax) \
+ rmb(); \
return pfx##ioswab##bwlq(__addr, __val); \
}

-#define __BUILD_MEMORY_PFX(bus, bwlq, type) \
+#define __BUILD_MEMORY_PFX(bus, bwlq, type, relax) \
\
-__BUILD_MEMORY_SINGLE(bus, bwlq, type, 1, 1)
+__BUILD_MEMORY_SINGLE(bus, bwlq, type, 1, relax, 1)

#define BUILDIO_MEM(bwlq, type) \
\
-__BUILD_MEMORY_PFX(__raw_, bwlq, type) \
-__BUILD_MEMORY_PFX(, bwlq, type) \
-__BUILD_MEMORY_PFX(__mem_, bwlq, type) \
+__BUILD_MEMORY_PFX(__raw_, bwlq, type, 0) \
+__BUILD_MEMORY_PFX(__relaxed_, bwlq, type, 1) \
+__BUILD_MEMORY_PFX(__mem_, bwlq, type, 0) \
+__BUILD_MEMORY_PFX(, bwlq, type, 0)

BUILDIO_MEM(b, u8)
BUILDIO_MEM(w, u16)
@@ -474,8 +482,8 @@ BUILDIO_MEM(l, u32)
BUILDIO_MEM(q, u64)

#define __BUILD_IOPORT_PFX(bus, bwlq, type) \
- __BUILD_IOPORT_SINGLE(bus, bwlq, type, 1, ,) \
- __BUILD_IOPORT_SINGLE(bus, bwlq, type, 1, _p, SLOW_DOWN_IO)
+ __BUILD_IOPORT_SINGLE(bus, bwlq, type, 1, 0, ,) \
+ __BUILD_IOPORT_SINGLE(bus, bwlq, type, 1, 0, _p, SLOW_DOWN_IO)

#define BUILDIO_IOPORT(bwlq, type) \
__BUILD_IOPORT_PFX(, bwlq, type) \
@@ -490,19 +498,19 @@ BUILDIO_IOPORT(q, u64)

#define __BUILDIO(bwlq, type) \
\
-__BUILD_MEMORY_SINGLE(____raw_, bwlq, type, 1, 0)
+__BUILD_MEMORY_SINGLE(____raw_, bwlq, type, 1, 0, 0)

__BUILDIO(q, u64)

-#define readb_relaxed readb
-#define readw_relaxed readw
-#define readl_relaxed readl
-#define readq_relaxed readq
+#define readb_relaxed __relaxed_readb
+#define readw_relaxed __relaxed_readw
+#define readl_relaxed __relaxed_readl
+#define readq_relaxed __relaxed_readq

-#define writeb_relaxed writeb
-#define writew_relaxed writew
-#define writel_relaxed writel
-#define writeq_relaxed writeq
+#define writeb_relaxed __relaxed_writeb
+#define writew_relaxed __relaxed_writew
+#define writel_relaxed __relaxed_writel
+#define writeq_relaxed __relaxed_writeq

#define readb_be(addr) \
__raw_readb((__force unsigned *)(addr))

2018-10-08 00:43:09

by Maciej W. Rozycki

[permalink] [raw]
Subject: [PATCH 3/4] MIPS: Enforce strong ordering for MMIO accessors

Architecturally the MIPS ISA does not specify ordering requirements for
uncached bus accesses such as MMIO operations normally use and therefore
explicit barriers have to be inserted between MMIO accesses where
unspecified ordering of operations would cause unpredictable results.

For example the R2020 write buffer implements write gathering and
combining[1] and as used with the DECstation models 2100 and 3100 for
MMIO accesses it bypasses the read buffer entirely, because conflicts
are resolved by the memory controller for DRAM accesses only[2] (NB the
R2020 and R3020 buffers are the same except for the maximum clock rate).

Consequently if a device has say a 16-bit control register at offset 0,
a 16-bit event mask register at offset 2 and a 16-bit reset register at
offset 4, and the initial value of the control register is 0x1111, then
in the absence of barriers a hypothetical code sequence like this:

u16 init_dev(u16 __iomem *dev);
u16 x;

write16(dev + 2, 0xffff);
write16(dev + 0, 0x2222);
x = read16(dev + 0);
write16(dev + 1, 0x3333);
write16(dev + 0, 0x4444);

return x;
}

will return 0x1111 and issue a single 32-bit write of 0x33334444 (in the
little-endian bus configuration) to offset 0 on the system bus.

This is because the read to set `x' from offset 0 bypasses the write of
0x2222 that is still in the write buffer pending the completion of the
write of 0xffff to the reset register. Then the write of 0x3333 to the
event mask register is merged with the preceding write to the control
register as they share the same word address, making it a 32-bit write
of 0x33332222 to offset 0. Finally the write of 0x4444 to the control
register is combined with the outstanding 32-bit write of 0x33332222 to
offset 0, because, again, it shares the same address.

This is an example from a legacy system, given here because it is well
documented and affects a machine we actually support. But likewise
modern MIPS systems may implement weak MMIO ordering, possibly even
without having it clearly documented except for being compliant with the
architecture specification with respect to the currently defined SYNC
instruction variants[3].

Considering the above and that we are required to implement MMIO
accessors such that individual accesses made with them are strongly
ordered with respect to each other[4], add the necessary barriers to our
`inX', `outX', `readX' and `writeX' handlers, as well the associated
special use variants. It's up to platforms then to possibly define the
respective barriers so as to expand to nil if no ordering enforcement is
actually needed for a given system; SYNC is supposed to be as cheap as
a NOP on strongly ordered MIPS implementations though.

Retain the option to generate weakly-ordered accessors, so that the
arrangement for `war_io_reorder_wmb' is not lost in case we need it for
fully raw accessors in the future. The reason for this is that it is
unclear from commit 1e820da3c9af ("MIPS: Loongson-3: Introduce
CONFIG_LOONGSON3_ENHANCEMENT") and especially commit 8faca49a6731
("MIPS: Modify core io.h macros to account for the Octeon Errata
Core-301.") why they are needed there under the previous assumption that
these accessors can be weakly ordered.

References:

[1] "LR3020 Write Buffer", LSI Logic Corporation, September 1988,
Section "Byte Gathering", pp. 6-7

[2] "DECstation 3100 Desktop Workstation Functional Specification",
Digital Equipment Corporation, Revision 1.3, August 28, 1990,
Section 6.1 "Processor", p. 4

[3] "MIPS Architecture For Programmers, Volume II-A: The MIPS32
Instruction Set Manual", Imagination Technologies LTD, Document
Number: MD00086, Revision 6.06, December 15, 2016, Table 5.5
"Encodings of the Bits[10:6] of the SYNC instruction; the SType
Field", p. 409

[4] "LINUX KERNEL MEMORY BARRIERS", Documentation/memory-barriers.txt,
Section "KERNEL I/O BARRIER EFFECTS"

Signed-off-by: Maciej W. Rozycki <[email protected]>
References: 8faca49a6731 ("MIPS: Modify core io.h macros to account for the Octeon Errata Core-301.")
References: 1e820da3c9af ("MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT")
---
arch/mips/include/asm/io.h | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)

linux-mips-readx-writex-barrier.patch
Index: linux-20180930-3maxp/arch/mips/include/asm/io.h
===================================================================
--- linux-20180930-3maxp.orig/arch/mips/include/asm/io.h
+++ linux-20180930-3maxp/arch/mips/include/asm/io.h
@@ -337,7 +337,7 @@ static inline void iounmap(const volatil
#define war_io_reorder_wmb() barrier()
#endif

-#define __BUILD_MEMORY_SINGLE(pfx, bwlq, type, irq) \
+#define __BUILD_MEMORY_SINGLE(pfx, bwlq, type, barrier, irq) \
\
static inline void pfx##write##bwlq(type val, \
volatile void __iomem *mem) \
@@ -345,7 +345,10 @@ static inline void pfx##write##bwlq(type
volatile type *__mem; \
type __val; \
\
- war_io_reorder_wmb(); \
+ if (barrier) \
+ iobarrier_rw(); \
+ else \
+ war_io_reorder_wmb(); \
\
__mem = (void *)__swizzle_addr_##bwlq((unsigned long)(mem)); \
\
@@ -382,6 +385,9 @@ static inline type pfx##read##bwlq(const
\
__mem = (void *)__swizzle_addr_##bwlq((unsigned long)(mem)); \
\
+ if (barrier) \
+ iobarrier_rw(); \
+ \
if (sizeof(type) != sizeof(u64) || sizeof(u64) == sizeof(long)) \
__val = *__mem; \
else if (cpu_has_64bits) { \
@@ -409,14 +415,17 @@ static inline type pfx##read##bwlq(const
return pfx##ioswab##bwlq(__mem, __val); \
}

-#define __BUILD_IOPORT_SINGLE(pfx, bwlq, type, p, slow) \
+#define __BUILD_IOPORT_SINGLE(pfx, bwlq, type, barrier, p, slow) \
\
static inline void pfx##out##bwlq##p(type val, unsigned long port) \
{ \
volatile type *__addr; \
type __val; \
\
- war_io_reorder_wmb(); \
+ if (barrier) \
+ iobarrier_rw(); \
+ else \
+ war_io_reorder_wmb(); \
\
__addr = (void *)__swizzle_addr_##bwlq(mips_io_port_base + port); \
\
@@ -438,6 +447,9 @@ static inline type pfx##in##bwlq##p(unsi
\
BUILD_BUG_ON(sizeof(type) > sizeof(unsigned long)); \
\
+ if (barrier) \
+ iobarrier_rw(); \
+ \
__val = *__addr; \
slow; \
\
@@ -448,7 +460,7 @@ static inline type pfx##in##bwlq##p(unsi

#define __BUILD_MEMORY_PFX(bus, bwlq, type) \
\
-__BUILD_MEMORY_SINGLE(bus, bwlq, type, 1)
+__BUILD_MEMORY_SINGLE(bus, bwlq, type, 1, 1)

#define BUILDIO_MEM(bwlq, type) \
\
@@ -462,8 +474,8 @@ BUILDIO_MEM(l, u32)
BUILDIO_MEM(q, u64)

#define __BUILD_IOPORT_PFX(bus, bwlq, type) \
- __BUILD_IOPORT_SINGLE(bus, bwlq, type, ,) \
- __BUILD_IOPORT_SINGLE(bus, bwlq, type, _p, SLOW_DOWN_IO)
+ __BUILD_IOPORT_SINGLE(bus, bwlq, type, 1, ,) \
+ __BUILD_IOPORT_SINGLE(bus, bwlq, type, 1, _p, SLOW_DOWN_IO)

#define BUILDIO_IOPORT(bwlq, type) \
__BUILD_IOPORT_PFX(, bwlq, type) \
@@ -478,7 +490,7 @@ BUILDIO_IOPORT(q, u64)

#define __BUILDIO(bwlq, type) \
\
-__BUILD_MEMORY_SINGLE(____raw_, bwlq, type, 0)
+__BUILD_MEMORY_SINGLE(____raw_, bwlq, type, 1, 0)

__BUILDIO(q, u64)


2018-10-11 16:41:49

by Paul Burton

[permalink] [raw]
Subject: Re: [PATCH 0/4] MIPS: Ordering enforcement fixes for MMIO accessors

Hi Maciej,

On Mon, Oct 08, 2018 at 01:36:55AM +0100, Maciej W. Rozycki wrote:
> This patch series is a follow-up to my earlier consideration about MMIO
> access ordering recorded here: <https://lkml.org/lkml/2014/4/28/201>.
>
> As I have learnt in a recent Alpha/Linux discussion starting here:
> <https://marc.info/?i=alpine.LRH.2.02.1808161556450.13597%20()%20file01%20!%20intranet%20!%20prod%20!%20int%20!%20rdu2%20!%20redhat%20!%20com>
> related to MMIO accessor ordering barriers ports are actually required to
> follow the x86 strongly ordered semantics. As the ordering is not
> specified in the MIPS architecture except for the SYNC instruction we do
> have to put explicit barriers in MMIO accessors as otherwise ordering may
> not be guaranteed.
>
> Fortunately on strongly ordered systems SYNC is expected to be as cheap
> as a NOP, and on weakly ordered ones it is needed anyway. As from
> revision 2.60 of the MIPS architecture specification however we have a
> number of SYNC operations defined, and SYNC 0 has been upgraded from an
> ordering to a completion barrier. We currently don't make use of these
> extra operations and always use SYNC 0 instead, which this means that we
> may be doing too much synchronisation with the barriers we have already
> defined.
>
> This patch series does not make an attempt to optimise for SYNC operation
> use, which belongs to a separate improvement. Instead it focuses on
> fixing MMIO accesses so that drivers can rely on our own API definition.

Agreed, using the lightweight sync types is a whole other can of worms.
I did speak with the architecture team about the description of SYNC
recently (in the context of nanoMIPS documentation if I recall) and hope
the tweaks that were made to the architectural description of it might
help with using them one day soon.

> Following the original consideration specific MMIO barrier operations are
> added. As they have turned out to be required to be implied by MMIO
> accessors there is no immediate need to make them form a generic
> cross-architecture internal Linux API. Therefore I defined them for the
> MIPS architecture only, using the names originally coined by mostly taking
> them from the PowerPC port.
>
> Then I have used them to fix `mmiowb', and then `readX' and `writeX'
> accessors. Finally I have updated the `_relaxed' accessors to avoid
> unnecessary synchronisation WRT DMA.

Thanks - this definitely leaves us in a better place than we were :)

All 4 patches applied to mips-next for 4.20.

Paul