2019-11-08 13:07:08

by Rasmus Villemoes

[permalink] [raw]
Subject: [PATCH v4 04/47] soc: fsl: qe: introduce qe_io{read,write}* wrappers

The QUICC engine drivers use the powerpc-specific out_be32() etc. In
order to allow those drivers to build for other architectures, those
must be replaced by iowrite32be(). However, on powerpc, out_be32() is
a simple inline function while iowrite32be() is out-of-line. So in
order not to introduce a performance regression on powerpc when making
the drivers work on other architectures, introduce qe_io* helpers.

Also define the qe_{clr,set,clrset}bits* helpers in terms of these new
macros.

Signed-off-by: Rasmus Villemoes <[email protected]>
---
include/soc/fsl/qe/qe.h | 34 +++++++++++++++++++++++++---------
1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/include/soc/fsl/qe/qe.h b/include/soc/fsl/qe/qe.h
index a1aa4eb28f0c..9cac04c692fd 100644
--- a/include/soc/fsl/qe/qe.h
+++ b/include/soc/fsl/qe/qe.h
@@ -241,21 +241,37 @@ static inline int qe_alive_during_sleep(void)
#define qe_muram_offset cpm_muram_offset
#define qe_muram_dma cpm_muram_dma

-#define qe_setbits_be32(_addr, _v) iowrite32be(ioread32be(_addr) | (_v), (_addr))
-#define qe_clrbits_be32(_addr, _v) iowrite32be(ioread32be(_addr) & ~(_v), (_addr))
+#ifdef CONFIG_PPC32
+#define qe_iowrite8(val, addr) out_8(addr, val)
+#define qe_iowrite16be(val, addr) out_be16(addr, val)
+#define qe_iowrite32be(val, addr) out_be32(addr, val)
+#define qe_ioread8(addr) in_8(addr)
+#define qe_ioread16be(addr) in_be16(addr)
+#define qe_ioread32be(addr) in_be32(addr)
+#else
+#define qe_iowrite8(val, addr) iowrite8(val, addr)
+#define qe_iowrite16be(val, addr) iowrite16be(val, addr)
+#define qe_iowrite32be(val, addr) iowrite32be(val, addr)
+#define qe_ioread8(addr) ioread8(addr)
+#define qe_ioread16be(addr) ioread16be(addr)
+#define qe_ioread32be(addr) ioread32be(addr)
+#endif
+
+#define qe_setbits_be32(_addr, _v) qe_iowrite32be(qe_ioread32be(_addr) | (_v), (_addr))
+#define qe_clrbits_be32(_addr, _v) qe_iowrite32be(qe_ioread32be(_addr) & ~(_v), (_addr))

-#define qe_setbits_be16(_addr, _v) iowrite16be(ioread16be(_addr) | (_v), (_addr))
-#define qe_clrbits_be16(_addr, _v) iowrite16be(ioread16be(_addr) & ~(_v), (_addr))
+#define qe_setbits_be16(_addr, _v) qe_iowrite16be(qe_ioread16be(_addr) | (_v), (_addr))
+#define qe_clrbits_be16(_addr, _v) qe_iowrite16be(qe_ioread16be(_addr) & ~(_v), (_addr))

-#define qe_setbits_8(_addr, _v) iowrite8(ioread8(_addr) | (_v), (_addr))
-#define qe_clrbits_8(_addr, _v) iowrite8(ioread8(_addr) & ~(_v), (_addr))
+#define qe_setbits_8(_addr, _v) qe_iowrite8(qe_ioread8(_addr) | (_v), (_addr))
+#define qe_clrbits_8(_addr, _v) qe_iowrite8(qe_ioread8(_addr) & ~(_v), (_addr))

#define qe_clrsetbits_be32(addr, clear, set) \
- iowrite32be((ioread32be(addr) & ~(clear)) | (set), (addr))
+ qe_iowrite32be((qe_ioread32be(addr) & ~(clear)) | (set), (addr))
#define qe_clrsetbits_be16(addr, clear, set) \
- iowrite16be((ioread16be(addr) & ~(clear)) | (set), (addr))
+ qe_iowrite16be((qe_ioread16be(addr) & ~(clear)) | (set), (addr))
#define qe_clrsetbits_8(addr, clear, set) \
- iowrite8((ioread8(addr) & ~(clear)) | (set), (addr))
+ qe_iowrite8((qe_ioread8(addr) & ~(clear)) | (set), (addr))

/* Structure that defines QE firmware binary files.
*
--
2.23.0


2019-11-12 05:21:20

by Timur Tabi

[permalink] [raw]
Subject: Re: [PATCH v4 04/47] soc: fsl: qe: introduce qe_io{read,write}* wrappers

On Fri, Nov 8, 2019 at 7:03 AM Rasmus Villemoes
<[email protected]> wrote:
>
> The QUICC engine drivers use the powerpc-specific out_be32() etc. In
> order to allow those drivers to build for other architectures, those
> must be replaced by iowrite32be(). However, on powerpc, out_be32() is
> a simple inline function while iowrite32be() is out-of-line. So in
> order not to introduce a performance regression on powerpc when making
> the drivers work on other architectures, introduce qe_io* helpers.

Isn't it also true that iowrite32be() assumes a little-endian platform
and always does a byte swap?

2019-11-12 07:16:19

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [PATCH v4 04/47] soc: fsl: qe: introduce qe_io{read,write}* wrappers

On 12/11/2019 06.17, Timur Tabi wrote:
> On Fri, Nov 8, 2019 at 7:03 AM Rasmus Villemoes
> <[email protected]> wrote:
>>
>> The QUICC engine drivers use the powerpc-specific out_be32() etc. In
>> order to allow those drivers to build for other architectures, those
>> must be replaced by iowrite32be(). However, on powerpc, out_be32() is
>> a simple inline function while iowrite32be() is out-of-line. So in
>> order not to introduce a performance regression on powerpc when making
>> the drivers work on other architectures, introduce qe_io* helpers.
>
> Isn't it also true that iowrite32be() assumes a little-endian platform
> and always does a byte swap?
>

No. You're probably thinking of the implementation in lib/iomap.c where
one has

#define mmio_read32be(addr) swab32(readl(addr))
unsigned int ioread32be(void __iomem *addr)
{
IO_COND(addr, return pio_read32be(port), return
mmio_read32be(addr));
return 0xffffffff;
}


#define mmio_write32be(val,port) writel(swab32(val),port)
void iowrite32be(u32 val, void __iomem *addr)
{
IO_COND(addr, pio_write32be(val,port), mmio_write32be(val, addr));
}

but that's because readl and writel by definition work on little-endian
registers. I.e., on a BE platform, the readl and writel implementation
must themselves contain a swab, so the above would end up doing two
swabs on a BE platform.

(On PPC, there's a separate definition of mmio_read32be, namely
writel_be, which in turn does a out_be32, so on PPC that doesn't
actually end up doing two swabs).

So ioread32be etc. have well-defined semantics: access a big-endian
register and return the result in native endianness.

2019-11-14 05:09:41

by Timur Tabi

[permalink] [raw]
Subject: Re: [PATCH v4 04/47] soc: fsl: qe: introduce qe_io{read,write}* wrappers

On 11/12/19 1:14 AM, Rasmus Villemoes wrote:
> but that's because readl and writel by definition work on little-endian
> registers. I.e., on a BE platform, the readl and writel implementation
> must themselves contain a swab, so the above would end up doing two
> swabs on a BE platform.

Do you know whether the compiler optimizes-out the double swab?

> (On PPC, there's a separate definition of mmio_read32be, namely
> writel_be, which in turn does a out_be32, so on PPC that doesn't
> actually end up doing two swabs).
>
> So ioread32be etc. have well-defined semantics: access a big-endian
> register and return the result in native endianness.

It seems weird that there aren't any cross-arch lightweight
endian-specific I/O accessors.

2019-11-14 08:57:44

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [PATCH v4 04/47] soc: fsl: qe: introduce qe_io{read,write}* wrappers

On 14/11/2019 06.08, Timur Tabi wrote:
> On 11/12/19 1:14 AM, Rasmus Villemoes wrote:
>> but that's because readl and writel by definition work on little-endian
>> registers. I.e., on a BE platform, the readl and writel implementation
>> must themselves contain a swab, so the above would end up doing two
>> swabs on a BE platform.
>
> Do you know whether the compiler optimizes-out the double swab?
>

Depends. It's almost impossible to figure out how swab32() is defined,
so how much visibility gcc has into how it works is hard to say. But a
further complication is that the arch may not have, say (simplifying
somewhat)

#define readl(x) swab32(*(volatile u32*)x)

but instead have readl implemented as inline asm which includes the
byteswap. PPC being a case in point, where the readl is in_le32 which is
done with a lwbrx instruction, and certainly gcc couldn't in any way
change a swab32(asm("lwbrx")) into asm("lwz"). But ppc defines its own
mmio_read32be, so that's not an issue.

>> (On PPC, there's a separate definition of mmio_read32be, namely
>> writel_be, which in turn does a out_be32, so on PPC that doesn't
>> actually end up doing two swabs).
>>
>> So ioread32be etc. have well-defined semantics: access a big-endian
>> register and return the result in native endianness.
>
> It seems weird that there aren't any cross-arch lightweight
> endian-specific I/O accessors.

Agreed, but I'm really not prepared for trying to go down that rabbit
hole again.

Rasmus