2019-02-11 20:34:41

by Will Deacon

[permalink] [raw]
Subject: [PATCH 0/2] Ensure inX() is ordered wrt delay() routines

Hi all,

Ordering port read accesses against non-memory-mapped clocksource reads can
require funky dependency code in conjunction with memory barriers. This isn't
possible to implement with the asm-generic definition of io.h, since the value
read from the device is not passed through to the underlying barrier macro and
therefore the dependency information is lost.

This series passes the value through and hooks up the fence on arm64.

Will

--->8

Will Deacon (2):
asm-generic/io: Pass result on inX() accessor to __io_par()
arm64: io: Hook up __io_par() for inX() ordering

arch/arm64/include/asm/io.h | 1 +
include/asm-generic/io.h | 8 ++++----
2 files changed, 5 insertions(+), 4 deletions(-)

--
2.11.0



2019-02-11 20:34:46

by Will Deacon

[permalink] [raw]
Subject: [PATCH 2/2] arm64: io: Hook up __io_par() for inX() ordering

Ensure that inX() provides the same ordering guarantees as readX()
by hooking up __io_par() so that it maps directly to __iormb().

Reported-by: Andrew Murray <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
arch/arm64/include/asm/io.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index ee723835c1f4..2985febe63ec 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -121,6 +121,7 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
: "memory"); \
})

+#define __io_par __iormb
#define __iowmb() wmb()

#define mmiowb() do { } while (0)
--
2.11.0


2019-02-11 20:35:20

by Will Deacon

[permalink] [raw]
Subject: [PATCH 1/2] asm-generic/io: Pass result on inX() accessor to __io_par()

The inX() I/O accessors must enforce ordering against subsequent calls
to the delay() routines, so that a read-back from a device can be used
to postpone a subsequent write to the same device.

On some architectures, including arm64, this ordering can only be
achieved by creating a dependency on the value returned by the inX()
operation, so we need to pass the value we read to the __io_par()
macro in this case.

Reported-by: Andrew Murray <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
include/asm-generic/io.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index d356f802945a..b5737c0d8316 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -65,7 +65,7 @@
#endif

#ifndef __io_par
-#define __io_par() __io_ar()
+#define __io_par(v) __io_ar()
#endif


@@ -471,7 +471,7 @@ static inline u8 inb(unsigned long addr)

__io_pbr();
val = __raw_readb(PCI_IOBASE + addr);
- __io_par();
+ __io_par(val);
return val;
}
#endif
@@ -484,7 +484,7 @@ static inline u16 inw(unsigned long addr)

__io_pbr();
val = __le16_to_cpu(__raw_readw(PCI_IOBASE + addr));
- __io_par();
+ __io_par(val);
return val;
}
#endif
@@ -497,7 +497,7 @@ static inline u32 inl(unsigned long addr)

__io_pbr();
val = __le32_to_cpu(__raw_readl(PCI_IOBASE + addr));
- __io_par();
+ __io_par(val);
return val;
}
#endif
--
2.11.0


2019-02-12 10:44:58

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH 2/2] arm64: io: Hook up __io_par() for inX() ordering

Hi Will,

On Mon, Feb 11, 2019 at 9:34 PM Will Deacon <[email protected]> wrote:
> Ensure that inX() provides the same ordering guarantees as readX()
> by hooking up __io_par() so that it maps directly to __iormb().
>
> Reported-by: Andrew Murray <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>

Thanks for your patch!

> --- a/arch/arm64/include/asm/io.h
> +++ b/arch/arm64/include/asm/io.h
> @@ -121,6 +121,7 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
> : "memory"); \
> })
>
> +#define __io_par __iormb

I think it makes sense to make the parameter passing explicit, for
documentation purposes:

#define __io_par(v) __iormb(v)

> #define __iowmb() wmb()
>
> #define mmiowb() do { } while (0)

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2019-02-12 12:07:54

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 1/2] asm-generic/io: Pass result on inX() accessor to __io_par()

On Mon, Feb 11, 2019 at 6:45 PM Will Deacon <[email protected]> wrote:
>
> The inX() I/O accessors must enforce ordering against subsequent calls
> to the delay() routines, so that a read-back from a device can be used
> to postpone a subsequent write to the same device.
>
> On some architectures, including arm64, this ordering can only be
> achieved by creating a dependency on the value returned by the inX()
> operation, so we need to pass the value we read to the __io_par()
> macro in this case.
>
> Reported-by: Andrew Murray <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> ---
> include/asm-generic/io.h | 8 ++++----

For changing the asm-generic file in the arm64 tree,

Acked-by: Arnd Bergmann <[email protected]>

For all I can see, this should not conflict with the usage of the
same macros on RISC-V, though it does make add a significant
difference, so I'd like to see an Ack from the RISC-V folks as
well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h
to do a corresponding change.

Arnd

2019-02-13 23:06:07

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH 2/2] arm64: io: Hook up __io_par() for inX() ordering

On Tue, Feb 12, 2019 at 11:42:44AM +0100, Geert Uytterhoeven wrote:
> On Mon, Feb 11, 2019 at 9:34 PM Will Deacon <[email protected]> wrote:
> > Ensure that inX() provides the same ordering guarantees as readX()
> > by hooking up __io_par() so that it maps directly to __iormb().
> >
> > Reported-by: Andrew Murray <[email protected]>
> > Signed-off-by: Will Deacon <[email protected]>
>
> Thanks for your patch!
>
> > --- a/arch/arm64/include/asm/io.h
> > +++ b/arch/arm64/include/asm/io.h
> > @@ -121,6 +121,7 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
> > : "memory"); \
> > })
> >
> > +#define __io_par __iormb
>
> I think it makes sense to make the parameter passing explicit, for
> documentation purposes:
>
> #define __io_par(v) __iormb(v)

Thanks, I'll make that change for v2.

Will

2019-02-13 23:08:21

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH 1/2] asm-generic/io: Pass result on inX() accessor to __io_par()

On Tue, Feb 12, 2019 at 12:55:17PM +0100, Arnd Bergmann wrote:
> On Mon, Feb 11, 2019 at 6:45 PM Will Deacon <[email protected]> wrote:
> >
> > The inX() I/O accessors must enforce ordering against subsequent calls
> > to the delay() routines, so that a read-back from a device can be used
> > to postpone a subsequent write to the same device.
> >
> > On some architectures, including arm64, this ordering can only be
> > achieved by creating a dependency on the value returned by the inX()
> > operation, so we need to pass the value we read to the __io_par()
> > macro in this case.
> >
> > Reported-by: Andrew Murray <[email protected]>
> > Signed-off-by: Will Deacon <[email protected]>
> > ---
> > include/asm-generic/io.h | 8 ++++----
>
> For changing the asm-generic file in the arm64 tree,
>
> Acked-by: Arnd Bergmann <[email protected]>

Thanks, Arnd.

> For all I can see, this should not conflict with the usage of the
> same macros on RISC-V, though it does make add a significant
> difference, so I'd like to see an Ack from the RISC-V folks as
> well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h
> to do a corresponding change.

There's already a comment in that header which says that the accesses are
ordered wrt timer reads, so I don't think anything needs to change there.
For consistency with the macro arguments, I could augment their __io_par to
take the read value as an unused argument, if that's what you mean?

Will

2019-02-14 08:34:08

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 1/2] asm-generic/io: Pass result on inX() accessor to __io_par()

On Wed, Feb 13, 2019 at 6:46 PM Will Deacon <[email protected]> wrote:

> On Tue, Feb 12, 2019 at 12:55:17PM +0100, Arnd Bergmann wrote:
>
> > For all I can see, this should not conflict with the usage of the
> > same macros on RISC-V, though it does make add a significant
> > difference, so I'd like to see an Ack from the RISC-V folks as
> > well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h
> > to do a corresponding change.
>
> There's already a comment in that header which says that the accesses are
> ordered wrt timer reads, so I don't think anything needs to change there.
> For consistency with the macro arguments, I could augment their __io_par to
> take the read value as an unused argument, if that's what you mean?

Yes, that's what I meant, I should have been clearer there.

Arnd

2019-02-14 09:06:10

by Palmer Dabbelt

[permalink] [raw]
Subject: Re: [PATCH 1/2] asm-generic/io: Pass result on inX() accessor to __io_par()

On Wed, 13 Feb 2019 12:59:28 PST (-0800), Arnd Bergmann wrote:
> On Wed, Feb 13, 2019 at 6:46 PM Will Deacon <[email protected]> wrote:
>
>> On Tue, Feb 12, 2019 at 12:55:17PM +0100, Arnd Bergmann wrote:
>>
>> > For all I can see, this should not conflict with the usage of the
>> > same macros on RISC-V, though it does make add a significant
>> > difference, so I'd like to see an Ack from the RISC-V folks as
>> > well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h
>> > to do a corresponding change.

Thanks, the original patches didn't make it through my filters.

>> There's already a comment in that header which says that the accesses are
>> ordered wrt timer reads, so I don't think anything needs to change there.
>> For consistency with the macro arguments, I could augment their __io_par to
>> take the read value as an unused argument, if that's what you mean?

FWIW, we don't really have a way to mandate this in the ISA yet as there's no
formal model for either CSR orderings or the IO memory space.

> Yes, that's what I meant, I should have been clearer there.

That sounds reasonable to me. It looks like we can also go ahead and delete a
bunch of arch/riscv/include/asm/io.h now that this stuff is in asm-generic,
which would cause us to actually start using these things. I didn't know this
had all been moved to asm-generic otherwise I would have cleaned this up
earlier.

I think this should do it, but this does bring up a bit of an issue: the RISC-V
versions of reads and friends put barriers outside the loop, while the
asm-generic version don't. What are these actually supposed to do?

Either way that resolves, feel free to consider something like

diff --git a/arch/riscv/include/asm/io.h b/arch/riscv/include/asm/io.h
index b269451e7e85..378975f180a7 100644
--- a/arch/riscv/include/asm/io.h
+++ b/arch/riscv/include/asm/io.h
@@ -198,20 +198,20 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
* writes.
*/
#define __io_pbr() __asm__ __volatile__ ("fence io,i" : : : "memory");
-#define __io_par() __asm__ __volatile__ ("fence i,ior" : : : "memory");
+#define __io_par(v) __asm__ __volatile__ ("fence i,ior" : : : "memory");
#define __io_pbw() __asm__ __volatile__ ("fence iow,o" : : : "memory");
#define __io_paw() __asm__ __volatile__ ("fence o,io" : : : "memory");

-#define inb(c) ({ u8 __v; __io_pbr(); __v = readb_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; })
-#define inw(c) ({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; })
-#define inl(c) ({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; })
+#define inb(c) ({ u8 __v; __io_pbr(); __v = readb_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
+#define inw(c) ({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
+#define inl(c) ({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })

#define outb(v,c) ({ __io_pbw(); writeb_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
#define outw(v,c) ({ __io_pbw(); writew_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
#define outl(v,c) ({ __io_pbw(); writel_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })

#ifdef CONFIG_64BIT
-#define inq(c) ({ u64 __v; __io_pbr(); __v = readq_cpu((void*)(c)); __io_par(); __v; })
+#define inq(c) ({ u64 __v; __io_pbr(); __v = readq_cpu((void*)(c)); __io_par(__v); __v; })
#define outq(v,c) ({ __io_pbw(); writeq_cpu((v),(void*)(c)); __io_paw(); })
#endif

@@ -261,9 +261,9 @@ __io_reads_ins(reads, u32, l, __io_br(), __io_ar())
#define readsw(addr, buffer, count) __readsw(addr, buffer, count)
#define readsl(addr, buffer, count) __readsl(addr, buffer, count)

-__io_reads_ins(ins, u8, b, __io_pbr(), __io_par())
-__io_reads_ins(ins, u16, w, __io_pbr(), __io_par())
-__io_reads_ins(ins, u32, l, __io_pbr(), __io_par())
+__io_reads_ins(ins, u8, b, __io_pbr(), __io_par(addr))
+__io_reads_ins(ins, u16, w, __io_pbr(), __io_par(addr))
+__io_reads_ins(ins, u32, l, __io_pbr(), __io_par(addr))
#define insb(addr, buffer, count) __insb((void __iomem *)(long)addr, buffer, count)
#define insw(addr, buffer, count) __insw((void __iomem *)(long)addr, buffer, count)
#define insl(addr, buffer, count) __insl((void __iomem *)(long)addr, buffer, count)

as

Revewied-by: Palmer Dabbelt <[email protected]>

when included along with the other diff. That way we can at least keep the
macro signatures matching, the cleanup can come later...

Thanks!

2019-02-18 18:50:43

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH 1/2] asm-generic/io: Pass result on inX() accessor to __io_par()

On Wed, Feb 13, 2019 at 01:57:50PM -0800, Palmer Dabbelt wrote:
> On Wed, 13 Feb 2019 12:59:28 PST (-0800), Arnd Bergmann wrote:
> > On Wed, Feb 13, 2019 at 6:46 PM Will Deacon <[email protected]> wrote:
> >
> > > On Tue, Feb 12, 2019 at 12:55:17PM +0100, Arnd Bergmann wrote:
> > >
> > > > For all I can see, this should not conflict with the usage of the
> > > > same macros on RISC-V, though it does make add a significant
> > > > difference, so I'd like to see an Ack from the RISC-V folks as
> > > > well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h
> > > > to do a corresponding change.
>
> Thanks, the original patches didn't make it through my filters.
>
> > > There's already a comment in that header which says that the accesses are
> > > ordered wrt timer reads, so I don't think anything needs to change there.
> > > For consistency with the macro arguments, I could augment their __io_par to
> > > take the read value as an unused argument, if that's what you mean?
>
> FWIW, we don't really have a way to mandate this in the ISA yet as there's
> no formal model for either CSR orderings or the IO memory space.

Ah, so you may end up needing the dependency trick too, depending on where
you land with the ISA.

> > Yes, that's what I meant, I should have been clearer there.
>
> That sounds reasonable to me. It looks like we can also go ahead and delete
> a bunch of arch/riscv/include/asm/io.h now that this stuff is in
> asm-generic, which would cause us to actually start using these things. I
> didn't know this had all been moved to asm-generic otherwise I would have
> cleaned this up earlier.
>
> I think this should do it, but this does bring up a bit of an issue: the
> RISC-V versions of reads and friends put barriers outside the loop, while
> the asm-generic version don't. What are these actually supposed to do?

You're referring to the string accessors (e.g. insb() and readsw()), right?
arm and arm64 don't provide barriers here either, and I don't think they
should have to given that these routines are usually used to poll data
register-based FIFOs and therefore don't need to provide ordering guarantees
against DMA operations. However, this is woefully undocumented and I shall
try to address this in the next version of my memory-barriers.txt patch
relating to this area [1].

> Either way that resolves, feel free to consider something like
>
> diff --git a/arch/riscv/include/asm/io.h b/arch/riscv/include/asm/io.h
> index b269451e7e85..378975f180a7 100644
> --- a/arch/riscv/include/asm/io.h
> +++ b/arch/riscv/include/asm/io.h
> @@ -198,20 +198,20 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
> * writes.
> */
> #define __io_pbr() __asm__ __volatile__ ("fence io,i" : : : "memory");
> -#define __io_par() __asm__ __volatile__ ("fence i,ior" : : : "memory");
> +#define __io_par(v) __asm__ __volatile__ ("fence i,ior" : : : "memory");
> #define __io_pbw() __asm__ __volatile__ ("fence iow,o" : : : "memory");
> #define __io_paw() __asm__ __volatile__ ("fence o,io" : : : "memory");
>
> -#define inb(c) ({ u8 __v; __io_pbr(); __v = readb_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; })
> -#define inw(c) ({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; })
> -#define inl(c) ({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; })
> +#define inb(c) ({ u8 __v; __io_pbr(); __v = readb_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
> +#define inw(c) ({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
> +#define inl(c) ({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
>
> #define outb(v,c) ({ __io_pbw(); writeb_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
> #define outw(v,c) ({ __io_pbw(); writew_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
> #define outl(v,c) ({ __io_pbw(); writel_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
>
> #ifdef CONFIG_64BIT
> -#define inq(c) ({ u64 __v; __io_pbr(); __v = readq_cpu((void*)(c)); __io_par(); __v; })
> +#define inq(c) ({ u64 __v; __io_pbr(); __v = readq_cpu((void*)(c)); __io_par(__v); __v; })
> #define outq(v,c) ({ __io_pbw(); writeq_cpu((v),(void*)(c)); __io_paw(); })
> #endif
>
> @@ -261,9 +261,9 @@ __io_reads_ins(reads, u32, l, __io_br(), __io_ar())
> #define readsw(addr, buffer, count) __readsw(addr, buffer, count)
> #define readsl(addr, buffer, count) __readsl(addr, buffer, count)
>
> -__io_reads_ins(ins, u8, b, __io_pbr(), __io_par())
> -__io_reads_ins(ins, u16, w, __io_pbr(), __io_par())
> -__io_reads_ins(ins, u32, l, __io_pbr(), __io_par())
> +__io_reads_ins(ins, u8, b, __io_pbr(), __io_par(addr))
> +__io_reads_ins(ins, u16, w, __io_pbr(), __io_par(addr))
> +__io_reads_ins(ins, u32, l, __io_pbr(), __io_par(addr))
> #define insb(addr, buffer, count) __insb((void __iomem *)(long)addr, buffer, count)
> #define insw(addr, buffer, count) __insw((void __iomem *)(long)addr, buffer, count)
> #define insl(addr, buffer, count) __insl((void __iomem *)(long)addr, buffer, count)
>
> as
>
> Revewied-by: Palmer Dabbelt <[email protected]>
>
> when included along with the other diff. That way we can at least keep the
> macro signatures matching, the cleanup can come later...

Thanks, Palmer! I'll send a v2 of this patch, updated to fix up insq() as
well as the readX() macros too, since they're likely to suffer the exact
same issues as inX() in this regard.

Will

[1] https://lkml.org/lkml/2019/2/11/1803