2018-04-05 13:12:13

by Sinan Kaya

[permalink] [raw]
Subject: [PATCH v4 1/5] io: define several IO & PIO barrier types for the asm-generic version

Getting ready to harden readX()/writeX() and inX()/outX() semantics for the
generic implementation.

Defining two set of macros as __io_br() and __io_ar() to indicate actions
to be taken before and after MMIO read.

Defining two set of macros as __io_bw() and __io_aw() to indicate actions
to be taken before and after MMIO write.

Defining two set of macros as __io_pbw() and __io_paw() to indicate actions
to be taken before and after Port IO write.

Defining two set of macros as __io_pbr() and __io_par() to indicate actions
to be taken before and after Port IO read.

If rmb() is available for the architecture, prefer rmb() as the default
implementation of __io_ar()/__io_par().

If wmb() is available for the architecture, prefer wmb() as the default
implementation of __io_bw()/__io_pbw().

Signed-off-by: Sinan Kaya <[email protected]>
---
include/asm-generic/io.h | 44 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index b4531e3..570433b 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -25,6 +25,50 @@
#define mmiowb() do {} while (0)
#endif

+#ifndef __io_br
+#define __io_br() barrier()
+#endif
+
+/* prevent prefetching of coherent DMA data ahead of a dma-complete */
+#ifndef __io_ar
+#ifdef rmb
+#define __io_ar() rmb()
+#else
+#define __io_ar() barrier()
+#endif
+#endif
+
+/* flush writes to coherent DMA data before possibly triggering a DMA read */
+#ifndef __io_bw
+#ifdef wmb
+#define __io_bw() wmb()
+#else
+#define __io_bw() barrier()
+#endif
+#endif
+
+/* serialize device access against a spin_unlock, usually handled there. */
+#ifndef __io_aw
+#define __io_aw() barrier()
+#endif
+
+#ifndef __io_pbw
+#define __io_pbw() __io_bw()
+#endif
+
+#ifndef __io_paw
+#define __io_paw() __io_aw()
+#endif
+
+#ifndef __io_pbr
+#define __io_pbr() __io_br()
+#endif
+
+#ifndef __io_par
+#define __io_par() __io_ar()
+#endif
+
+
/*
* __raw_{read,write}{b,w,l,q}() access memory in native endianness.
*
--
2.7.4



2018-04-05 13:11:08

by Sinan Kaya

[permalink] [raw]
Subject: [PATCH v4 2/5] io: define stronger ordering for the default readX() implementation

The default implementation of mapping readX() to __raw_readX() is wrong.
readX() has stronger ordering semantics. Compiler is allowed to reorder
__raw_readX() against the memory accesses following register read.

Use the previously defined __io_ar() and __io_br() macros to harden
code generation according to architecture support.

Signed-off-by: Sinan Kaya <[email protected]>
---
include/asm-generic/io.h | 28 ++++++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index 570433b..d27e8af 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -154,7 +154,12 @@ static inline void __raw_writeq(u64 value, volatile void __iomem *addr)
#define readb readb
static inline u8 readb(const volatile void __iomem *addr)
{
- return __raw_readb(addr);
+ u8 val;
+
+ __io_br();
+ val = __raw_readb(addr);
+ __io_ar();
+ return val;
}
#endif

@@ -162,7 +167,12 @@ static inline u8 readb(const volatile void __iomem *addr)
#define readw readw
static inline u16 readw(const volatile void __iomem *addr)
{
- return __le16_to_cpu(__raw_readw(addr));
+ u16 val;
+
+ __io_br();
+ val = __le16_to_cpu(__raw_readw(addr));
+ __io_ar();
+ return val;
}
#endif

@@ -170,7 +180,12 @@ static inline u16 readw(const volatile void __iomem *addr)
#define readl readl
static inline u32 readl(const volatile void __iomem *addr)
{
- return __le32_to_cpu(__raw_readl(addr));
+ u32 val;
+
+ __io_br();
+ val = __le32_to_cpu(__raw_readl(addr));
+ __io_ar();
+ return val;
}
#endif

@@ -179,7 +194,12 @@ static inline u32 readl(const volatile void __iomem *addr)
#define readq readq
static inline u64 readq(const volatile void __iomem *addr)
{
- return __le64_to_cpu(__raw_readq(addr));
+ u64 val;
+
+ __io_br();
+ val = __le64_to_cpu(__raw_readq(addr));
+ __io_ar();
+ return val;
}
#endif
#endif /* CONFIG_64BIT */
--
2.7.4


2018-04-05 13:11:21

by Sinan Kaya

[permalink] [raw]
Subject: [PATCH v4 4/5] io: change outX() to have their own IO barrier overrides

Open code writeX() inside outX() so that outX() variants have their own
overrideable Port IO barrier combinations as __io_pbw() and __io_paw() for
actions to be taken before port IO and after port IO write.

Signed-off-by: Sinan Kaya <[email protected]>
---
include/asm-generic/io.h | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index 964725e..53226d9 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -457,7 +457,9 @@ static inline u32 inl(unsigned long addr)
#define outb outb
static inline void outb(u8 value, unsigned long addr)
{
- writeb(value, PCI_IOBASE + addr);
+ __io_pbw();
+ __raw_writeb(value, PCI_IOBASE + addr);
+ __io_paw();
}
#endif

@@ -465,7 +467,9 @@ static inline void outb(u8 value, unsigned long addr)
#define outw outw
static inline void outw(u16 value, unsigned long addr)
{
- writew(value, PCI_IOBASE + addr);
+ __io_pbw();
+ __raw_writew(cpu_to_le16(value), PCI_IOBASE + addr);
+ __io_paw();
}
#endif

@@ -473,7 +477,9 @@ static inline void outw(u16 value, unsigned long addr)
#define outl outl
static inline void outl(u32 value, unsigned long addr)
{
- writel(value, PCI_IOBASE + addr);
+ __io_pbw();
+ __raw_writel(cpu_to_le32(value), PCI_IOBASE + addr);
+ __io_paw();
}
#endif

--
2.7.4


2018-04-05 13:11:22

by Sinan Kaya

[permalink] [raw]
Subject: [PATCH v4 5/5] io: change inX() to have their own IO barrier overrides

Open code readX() inside inX() so that inX() variants have their own
overrideable Port IO barrier combinations as __io_pbr() and __io_par() for
actions to be taken before port IO and after port IO read.

Signed-off-by: Sinan Kaya <[email protected]>
---
include/asm-generic/io.h | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index 53226d9..578b688 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -433,7 +433,12 @@ static inline void writesq(volatile void __iomem *addr, const void *buffer,
#define inb inb
static inline u8 inb(unsigned long addr)
{
- return readb(PCI_IOBASE + addr);
+ u8 val;
+
+ __io_pbr();
+ val = __raw_readb(PCI_IOBASE + addr);
+ __io_par();
+ return val;
}
#endif

@@ -441,7 +446,12 @@ static inline u8 inb(unsigned long addr)
#define inw inw
static inline u16 inw(unsigned long addr)
{
- return readw(PCI_IOBASE + addr);
+ u16 val;
+
+ __io_pbr();
+ val = __le16_to_cpu(__raw_readw(PCI_IOBASE + addr));
+ __io_par();
+ return val;
}
#endif

@@ -449,7 +459,12 @@ static inline u16 inw(unsigned long addr)
#define inl inl
static inline u32 inl(unsigned long addr)
{
- return readl(PCI_IOBASE + addr);
+ u32 val;
+
+ __io_pbr();
+ val = __le32_to_cpu(__raw_readl(PCI_IOBASE + addr));
+ __io_par();
+ return val;
}
#endif

--
2.7.4


2018-04-05 13:11:52

by Sinan Kaya

[permalink] [raw]
Subject: [PATCH v4 3/5] io: define stronger ordering for the default writeX() implementation

The default implementation of mapping writeX() to __raw_writeX() is wrong.
writeX() has stronger ordering semantics. Compiler is allowed to reorder
memory writes against __raw_writeX().

Use the previously defined __io_aw() and __io_bw() macros to harden
code generation according to architecture support.

Signed-off-by: Sinan Kaya <[email protected]>
---
include/asm-generic/io.h | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index d27e8af..964725e 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -208,7 +208,9 @@ static inline u64 readq(const volatile void __iomem *addr)
#define writeb writeb
static inline void writeb(u8 value, volatile void __iomem *addr)
{
+ __io_bw();
__raw_writeb(value, addr);
+ __io_aw();
}
#endif

@@ -216,7 +218,9 @@ static inline void writeb(u8 value, volatile void __iomem *addr)
#define writew writew
static inline void writew(u16 value, volatile void __iomem *addr)
{
+ __io_bw();
__raw_writew(cpu_to_le16(value), addr);
+ __io_aw();
}
#endif

@@ -224,7 +228,9 @@ static inline void writew(u16 value, volatile void __iomem *addr)
#define writel writel
static inline void writel(u32 value, volatile void __iomem *addr)
{
+ __io_bw();
__raw_writel(__cpu_to_le32(value), addr);
+ __io_aw();
}
#endif

@@ -233,7 +239,9 @@ static inline void writel(u32 value, volatile void __iomem *addr)
#define writeq writeq
static inline void writeq(u64 value, volatile void __iomem *addr)
{
+ __io_bw();
__raw_writeq(__cpu_to_le64(value), addr);
+ __io_aw();
}
#endif
#endif /* CONFIG_64BIT */
--
2.7.4


2018-04-06 10:21:20

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] io: define several IO & PIO barrier types for the asm-generic version

On Thu, Apr 5, 2018 at 3:09 PM, Sinan Kaya <[email protected]> wrote:
> Getting ready to harden readX()/writeX() and inX()/outX() semantics for the
> generic implementation.
>
> Defining two set of macros as __io_br() and __io_ar() to indicate actions
> to be taken before and after MMIO read.
>
> Defining two set of macros as __io_bw() and __io_aw() to indicate actions
> to be taken before and after MMIO write.
>
> Defining two set of macros as __io_pbw() and __io_paw() to indicate actions
> to be taken before and after Port IO write.
>
> Defining two set of macros as __io_pbr() and __io_par() to indicate actions
> to be taken before and after Port IO read.
>
> If rmb() is available for the architecture, prefer rmb() as the default
> implementation of __io_ar()/__io_par().
>
> If wmb() is available for the architecture, prefer wmb() as the default
> implementation of __io_bw()/__io_pbw().
>
> Signed-off-by: Sinan Kaya <[email protected]>

I've applied the series to my asm-generic tree now, I will give it a few days
in linux-next to see if any obvious regressions happen, and then send
a pull request.

Checking the list of architectures that are affected by this, I see
h8300, microblaze, nios2, openrisc, s390, sparc, um, unicore32,
and xtensa, all of which use asm-generic/io.h without overriding
the default readl/writel.

I would guess that at least s390 doesn't need the barriers
(maintainers on Cc now), but there may be others that want to
override the new barriers with weaker ones where an MMIO
access is guaranteed to serialize against DMA, or where
a specialized barrier for this case exists.

Looking over the asm-generic implementation once more now,
I wonder if we should change the relaxed accessors to not have
any barriers (back to the version before your series) rather than
defaulting them to having the same barriers as the regular
readl/writel.

Arnd

2018-04-06 12:52:20

by Sinan Kaya

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] io: define several IO & PIO barrier types for the asm-generic version

On 2018-04-06 06:19, Arnd Bergmann wrote:
> On Thu, Apr 5, 2018 at 3:09 PM, Sinan Kaya <[email protected]>
> wrote:
>> Getting ready to harden readX()/writeX() and inX()/outX() semantics
>> for the
>> generic implementation.
>>
>> Defining two set of macros as __io_br() and __io_ar() to indicate
>> actions
>> to be taken before and after MMIO read.
>>
>> Defining two set of macros as __io_bw() and __io_aw() to indicate
>> actions
>> to be taken before and after MMIO write.
>>
>> Defining two set of macros as __io_pbw() and __io_paw() to indicate
>> actions
>> to be taken before and after Port IO write.
>>
>> Defining two set of macros as __io_pbr() and __io_par() to indicate
>> actions
>> to be taken before and after Port IO read.
>>
>> If rmb() is available for the architecture, prefer rmb() as the
>> default
>> implementation of __io_ar()/__io_par().
>>
>> If wmb() is available for the architecture, prefer wmb() as the
>> default
>> implementation of __io_bw()/__io_pbw().
>>
>> Signed-off-by: Sinan Kaya <[email protected]>
>
> I've applied the series to my asm-generic tree now, I will give it a
> few days
> in linux-next to see if any obvious regressions happen, and then send
> a pull request.
>
> Checking the list of architectures that are affected by this, I see
> h8300, microblaze, nios2, openrisc, s390, sparc, um, unicore32,
> and xtensa, all of which use asm-generic/io.h without overriding
> the default readl/writel.
>
> I would guess that at least s390 doesn't need the barriers
> (maintainers on Cc now), but there may be others that want to
> override the new barriers with weaker ones where an MMIO
> access is guaranteed to serialize against DMA, or where
> a specialized barrier for this case exists.
>
> Looking over the asm-generic implementation once more now,
> I wonder if we should change the relaxed accessors to not have
> any barriers (back to the version before your series) rather than
> defaulting them to having the same barriers as the regular
> readl/writel.

I can do a follow up patch. You want to map them to raw api without any
barriers as before. Right?


>
> Arnd

2018-04-06 13:22:00

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] io: define several IO & PIO barrier types for the asm-generic version

On Fri, Apr 6, 2018 at 2:50 PM, <[email protected]> wrote:
> On 2018-04-06 06:19, Arnd Bergmann wrote:
>>
>> On Thu, Apr 5, 2018 at 3:09 PM, Sinan Kaya <[email protected]> wrote:
>>>
>>
>> I would guess that at least s390 doesn't need the barriers
>> (maintainers on Cc now), but there may be others that want to
>> override the new barriers with weaker ones where an MMIO
>> access is guaranteed to serialize against DMA, or where
>> a specialized barrier for this case exists.
>>
>> Looking over the asm-generic implementation once more now,
>> I wonder if we should change the relaxed accessors to not have
>> any barriers (back to the version before your series) rather than
>> defaulting them to having the same barriers as the regular
>> readl/writel.
>
>
> I can do a follow up patch. You want to map them to raw api without any
> barriers as before. Right?

Right, but of course with the byteswap.

Arnd