2015-06-15 15:59:13

by Steffen Trumtrar

[permalink] [raw]
Subject: [BUG?] crypto: caam: little/big endianness on ARM vs PPC

Hi!

I'm working on CAAM support for the ARM-based i.MX6 SoCs. The current
drivers/crypto/caam driver only works for PowerPC AFAIK.
Actually, there isn't that much to do, to get support for the i.MX6 but
one patch breaks the driver severely:

commit ef94b1d834aace7101de77c3a7c2631b9ae9c5f6
crypto: caam - Add definition of rd/wr_reg64 for little endian platform

This patch adds

+#ifdef __LITTLE_ENDIAN
+static inline void wr_reg64(u64 __iomem *reg, u64 data)
+{
+ wr_reg32((u32 __iomem *)reg + 1, (data & 0xffffffff00000000ull) >> 32);
+ wr_reg32((u32 __iomem *)reg, data & 0x00000000ffffffffull);
+}

The wr_reg64 function is only used in one place in the drivers/crypto/caam/jr.c
driver: to write the dma_addr_t to the register. Without that patch everything
works fine on ARM (little endian, 32bit), with that patch the driver will write
0's into the register that holds the DMA address (the numerically-higher) -> kernel hangs.
Also, from my understanding, the comment above the defines, stating that you
have to first write the numerically-lower and then the numerically-higher address
on 32bit systems doesn't match with the implementation.

What I don't know/understand is if this makes any sense for any PowerPC implementation.

So, the question is, how to fix this? I'd prefer to do it directly in the jr driver
instead of the ifdef-ery.

Something like
if (sizeof(dma_addr_t) == sizeof(u32))
wr_reg32(&jrp->rregs->inpring_base + 1, inpbusaddr);
else if (sizeof(dma_addr_t) == sizeof(u64))
wr_reg64(...)

or just go by DT compatible and then remove the inline function definitions.

As far as I can tell, the compatible wouldn't be needed for anything else in the
jr driver, so maybe that is not optimal. On the other hand the sizeof(..)
solution would only catch little endian on 32bit and not big endian (?!)
I however don't know what combinations actually *have* to be caught, as I don't
know, which exist.

So, what do you think people?

Thanks,
Steffen Trumtrar

--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |


2015-06-15 16:28:26

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [BUG?] crypto: caam: little/big endianness on ARM vs PPC

On Mon, Jun 15, 2015 at 05:59:07PM +0200, Steffen Trumtrar wrote:
> I'm working on CAAM support for the ARM-based i.MX6 SoCs. The current
> drivers/crypto/caam driver only works for PowerPC AFAIK.
> Actually, there isn't that much to do, to get support for the i.MX6 but
> one patch breaks the driver severely:
>
> commit ef94b1d834aace7101de77c3a7c2631b9ae9c5f6
> crypto: caam - Add definition of rd/wr_reg64 for little endian platform

You're not the only one who's hitting problems with this - Jon Nettleton
has been working on it recently.

The way this has been done is fairly yucky to start with: several things
about it are particularly horrid. The first is the repetitive code
for the BE and LE cases, when all that's actually different is the
register order between the two code cases.

The second thing is the excessive use of masking - I'm pretty sure the
compiler won't complain with the below.

diff --git a/drivers/crypto/caam/regs.h b/drivers/crypto/caam/regs.h
index 378ddc17f60e..ba0fa630a25d 100644
--- a/drivers/crypto/caam/regs.h
+++ b/drivers/crypto/caam/regs.h
@@ -84,34 +84,28 @@
#endif

#ifndef CONFIG_64BIT
-#ifdef __BIG_ENDIAN
-static inline void wr_reg64(u64 __iomem *reg, u64 data)
-{
- wr_reg32((u32 __iomem *)reg, (data & 0xffffffff00000000ull) >> 32);
- wr_reg32((u32 __iomem *)reg + 1, data & 0x00000000ffffffffull);
-}
-
-static inline u64 rd_reg64(u64 __iomem *reg)
-{
- return (((u64)rd_reg32((u32 __iomem *)reg)) << 32) |
- ((u64)rd_reg32((u32 __iomem *)reg + 1));
-}
+#if defined(__BIG_ENDIAN)
+#define REG64_HI32(reg) ((u32 __iomem *)(reg))
+#define REG64_LO32(reg) ((u32 __iomem *)(reg) + 1)
+#elif defined(__LITTLE_ENDIAN)
+#define REG64_HI32(reg) ((u32 __iomem *)(reg) + 1)
+#define REG64_LO32(reg) ((u32 __iomem *)(reg))
#else
-#ifdef __LITTLE_ENDIAN
+#error Broken endian?
+#endif
+
static inline void wr_reg64(u64 __iomem *reg, u64 data)
{
- wr_reg32((u32 __iomem *)reg + 1, (data & 0xffffffff00000000ull) >> 32);
- wr_reg32((u32 __iomem *)reg, data & 0x00000000ffffffffull);
+ wr_reg32(REG64_HI32(reg), data >> 32);
+ wr_reg32(REG64_LO32(reg), data);
}

static inline u64 rd_reg64(u64 __iomem *reg)
{
- return (((u64)rd_reg32((u32 __iomem *)reg + 1)) << 32) |
- ((u64)rd_reg32((u32 __iomem *)reg));
+ return ((u64)rd_reg32(REG64_HI32(reg))) << 32 |
+ rd_reg32(REG64_LO32(reg));
}
#endif
-#endif
-#endif

/*
* jr_outentry

The second issue is that apparently, the register order doesn't actually
change for LE devices - in other words, the byte order within each register
does change, but they aren't a 64-bit register, they're two separate 32-bit
registers. So, they should always be written as such.

So, I'd get rid of the #if defined(__BIG_ENDIAN) stuff and instead have:

+/*
+ * The DMA address register is a pair of 32-bit registers, and should
+ * always be accessed as such.
+ */
+#define REG64_HI32(reg) ((u32 __iomem *)(reg))
+#define REG64_LO32(reg) ((u32 __iomem *)(reg) + 1)


--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

2015-06-15 16:33:19

by Jon Nettleton

[permalink] [raw]
Subject: Re: [BUG?] crypto: caam: little/big endianness on ARM vs PPC

On Mon, Jun 15, 2015 at 5:59 PM, Steffen Trumtrar
<[email protected]> wrote:
> Hi!
>
> I'm working on CAAM support for the ARM-based i.MX6 SoCs. The current
> drivers/crypto/caam driver only works for PowerPC AFAIK.
> Actually, there isn't that much to do, to get support for the i.MX6 but
> one patch breaks the driver severely:
>
> commit ef94b1d834aace7101de77c3a7c2631b9ae9c5f6
> crypto: caam - Add definition of rd/wr_reg64 for little endian platform
>
> This patch adds
>
> +#ifdef __LITTLE_ENDIAN
> +static inline void wr_reg64(u64 __iomem *reg, u64 data)
> +{
> + wr_reg32((u32 __iomem *)reg + 1, (data & 0xffffffff00000000ull) >> 32);
> + wr_reg32((u32 __iomem *)reg, data & 0x00000000ffffffffull);
> +}
>
> The wr_reg64 function is only used in one place in the drivers/crypto/caam/jr.c
> driver: to write the dma_addr_t to the register. Without that patch everything
> works fine on ARM (little endian, 32bit), with that patch the driver will write
> 0's into the register that holds the DMA address (the numerically-higher) -> kernel hangs.
> Also, from my understanding, the comment above the defines, stating that you
> have to first write the numerically-lower and then the numerically-higher address
> on 32bit systems doesn't match with the implementation.
>
> What I don't know/understand is if this makes any sense for any PowerPC implementation.
>
> So, the question is, how to fix this? I'd prefer to do it directly in the jr driver
> instead of the ifdef-ery.
>
> Something like
> if (sizeof(dma_addr_t) == sizeof(u32))
> wr_reg32(&jrp->rregs->inpring_base + 1, inpbusaddr);
> else if (sizeof(dma_addr_t) == sizeof(u64))
> wr_reg64(...)
>
> or just go by DT compatible and then remove the inline function definitions.
>
> As far as I can tell, the compatible wouldn't be needed for anything else in the
> jr driver, so maybe that is not optimal. On the other hand the sizeof(..)
> solution would only catch little endian on 32bit and not big endian (?!)
> I however don't know what combinations actually *have* to be caught, as I don't
> know, which exist.
>
> So, what do you think people?

Funny enough I tackled this problem over the weekend as well. My
approach was to switch the driver over to use the *_relaxed() io
functions and then special case the bits missing from the various
ARCHs. Basically adding setbits32 and clrbits32 for !PPC
architectures and letting PPC and ARM share a writeq/readq set of
functions. I left the existing LITTLE_ENDIAN special case until I
could verify if it was needed, or had been tested.

For reference, the patch against 4.1-rc5/6 is attached.

-Jon


Attachments:
0001-crypto-caam-use-generic-_relaxed-io-functions.patch (13.82 kB)

2015-06-15 17:18:48

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [BUG?] crypto: caam: little/big endianness on ARM vs PPC

On Mon, Jun 15, 2015 at 06:33:17PM +0200, Jon Nettleton wrote:
> Funny enough I tackled this problem over the weekend as well. My
> approach was to switch the driver over to use the *_relaxed() io
> functions and then special case the bits missing from the various
> ARCHs. Basically adding setbits32 and clrbits32 for !PPC
> architectures and letting PPC and ARM share a writeq/readq set of
> functions. I left the existing LITTLE_ENDIAN special case until I
> could verify if it was needed, or had been tested.

I'll follow up here with what I've mentioned elsewhere, and some further
thoughts.

I think this shows the dangers of using struct { } to define register
offsets. Let's start here:

/*
* caam_job_ring - direct job ring setup
* 1-4 possible per instantiation, base + 1000/2000/3000/4000
* Padded out to 0x1000
*/
struct caam_job_ring {
/* Input ring */
u64 inpring_base; /* IRBAx - Input desc ring baseaddr */
u32 rsvd1;

Apparently, this is a CPU-endian 64-bit value (it's not defined using
le64 or be64 which would "fix" it's endian.)

The second question, which comes up in light of the breakage that's
being reported is: is this really a 64-bit register, or is it a pair
of 32-bit registers side-by-side?

The documentation I'm looking at doesn't document the register at
base + 0x1000, but documents the one at base + 0x1004, and the one
at 0x1004 is given the name "IRBAR0_LS", which presumably stands
for "input ring base address register 0, least significant".

As the code originally stood for PPC, IRBAR0_LS is also at 0x1004,
but appears to be big endian.

On ARM, IRBAR0_LS appears at the same address, but is little endian.

This is *not* a 64-bit register at all, but is a pair of 32-bit
registers side by side. Moreover, readq() should not be used - no
amount of arch mangling could ever produce a sane readq() which
coped with this.

So, the CAAM code is buggy in this regard: using readq() here when
endian-portability is required is wrong. It's got to be two 32-bit
reads or two 32-bit writes in the appropriate endian.

Also, note that there's a big difference between __raw_readl() and
readl_relaxed(). readl_relaxed() is always little-endian. __raw_readl()
is god-knows-what-the-archtecture-decides endian. Switching PPC
drivers from __raw_readl() to readl_relaxed() is really not a good
idea unless someone from the PPC camp reviews and tests the code.

So, what I'd suggest is just fixing rd_reg64() and wr_reg64() to do
the right thing: keeping the two 32-bit words in the same order
irrespective of the endian-ness, and staying with the __raw_*
accessors until PPC people can look at this.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

2015-06-15 22:05:13

by Herbert Xu

[permalink] [raw]
Subject: Re: [BUG?] crypto: caam: little/big endianness on ARM vs PPC

On Mon, Jun 15, 2015 at 05:59:07PM +0200, Steffen Trumtrar wrote:
> Hi!
>
> I'm working on CAAM support for the ARM-based i.MX6 SoCs. The current
> drivers/crypto/caam driver only works for PowerPC AFAIK.
> Actually, there isn't that much to do, to get support for the i.MX6 but
> one patch breaks the driver severely:
>
> commit ef94b1d834aace7101de77c3a7c2631b9ae9c5f6
> crypto: caam - Add definition of rd/wr_reg64 for little endian platform
>
> This patch adds
>
> +#ifdef __LITTLE_ENDIAN
> +static inline void wr_reg64(u64 __iomem *reg, u64 data)
> +{
> + wr_reg32((u32 __iomem *)reg + 1, (data & 0xffffffff00000000ull) >> 32);
> + wr_reg32((u32 __iomem *)reg, data & 0x00000000ffffffffull);
> +}
>
> The wr_reg64 function is only used in one place in the drivers/crypto/caam/jr.c
> driver: to write the dma_addr_t to the register. Without that patch everything
> works fine on ARM (little endian, 32bit), with that patch the driver will write
> 0's into the register that holds the DMA address (the numerically-higher) -> kernel hangs.
> Also, from my understanding, the comment above the defines, stating that you
> have to first write the numerically-lower and then the numerically-higher address
> on 32bit systems doesn't match with the implementation.
>
> What I don't know/understand is if this makes any sense for any PowerPC implementation.
>
> So, the question is, how to fix this? I'd prefer to do it directly in the jr driver
> instead of the ifdef-ery.
>
> Something like
> if (sizeof(dma_addr_t) == sizeof(u32))
> wr_reg32(&jrp->rregs->inpring_base + 1, inpbusaddr);
> else if (sizeof(dma_addr_t) == sizeof(u64))
> wr_reg64(...)
>
> or just go by DT compatible and then remove the inline function definitions.
>
> As far as I can tell, the compatible wouldn't be needed for anything else in the
> jr driver, so maybe that is not optimal. On the other hand the sizeof(..)
> solution would only catch little endian on 32bit and not big endian (?!)
> I however don't know what combinations actually *have* to be caught, as I don't
> know, which exist.
>
> So, what do you think people?

I'm adding a couple of CCs.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-06-16 03:27:54

by Victoria Milhoan

[permalink] [raw]
Subject: RE: [BUG?] crypto: caam: little/big endianness on ARM vs PPC

All,

Freescale has been adding i.MX6 support to the CAAM driver and testing on both i.MX6 and QorIQ platforms. The patch series is now available for review. Your feedback for the provided patches is appreciated.

Thanks,
Victoria

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Herbert Xu
Sent: Monday, June 15, 2015 3:05 PM
To: Steffen Trumtrar
Cc: [email protected]; [email protected]; [email protected]; Gupta Ruchika-R66431; [email protected]; Geanta Neag Horia Ioan-B05471; Kim Phillips
Subject: Re: [BUG?] crypto: caam: little/big endianness on ARM vs PPC

On Mon, Jun 15, 2015 at 05:59:07PM +0200, Steffen Trumtrar wrote:
> Hi!
>
> I'm working on CAAM support for the ARM-based i.MX6 SoCs. The current
> drivers/crypto/caam driver only works for PowerPC AFAIK.
> Actually, there isn't that much to do, to get support for the i.MX6
> but one patch breaks the driver severely:
>
> commit ef94b1d834aace7101de77c3a7c2631b9ae9c5f6
> crypto: caam - Add definition of rd/wr_reg64 for little endian
> platform
>
> This patch adds
>
> +#ifdef __LITTLE_ENDIAN
> +static inline void wr_reg64(u64 __iomem *reg, u64 data) {
> + wr_reg32((u32 __iomem *)reg + 1, (data & 0xffffffff00000000ull) >> 32);
> + wr_reg32((u32 __iomem *)reg, data & 0x00000000ffffffffull); }
>
> The wr_reg64 function is only used in one place in the
> drivers/crypto/caam/jr.c
> driver: to write the dma_addr_t to the register. Without that patch
> everything works fine on ARM (little endian, 32bit), with that patch
> the driver will write 0's into the register that holds the DMA address (the numerically-higher) -> kernel hangs.
> Also, from my understanding, the comment above the defines, stating
> that you have to first write the numerically-lower and then the
> numerically-higher address on 32bit systems doesn't match with the implementation.
>
> What I don't know/understand is if this makes any sense for any PowerPC implementation.
>
> So, the question is, how to fix this? I'd prefer to do it directly in
> the jr driver instead of the ifdef-ery.
>
> Something like
> if (sizeof(dma_addr_t) == sizeof(u32))
> wr_reg32(&jrp->rregs->inpring_base + 1, inpbusaddr);
> else if (sizeof(dma_addr_t) == sizeof(u64))
> wr_reg64(...)
>
> or just go by DT compatible and then remove the inline function definitions.
>
> As far as I can tell, the compatible wouldn't be needed for anything
> else in the jr driver, so maybe that is not optimal. On the other hand
> the sizeof(..) solution would only catch little endian on 32bit and
> not big endian (?!) I however don't know what combinations actually
> *have* to be caught, as I don't know, which exist.
>
> So, what do you think people?

I'm adding a couple of CCs.

Thanks,
--
Email: Herbert Xu <[email protected]> Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-06-16 07:45:34

by Steffen Trumtrar

[permalink] [raw]
Subject: Re: [BUG?] crypto: caam: little/big endianness on ARM vs PPC

Hi!

On Mon, Jun 15, 2015 at 05:28:16PM +0100, Russell King - ARM Linux wrote:
> On Mon, Jun 15, 2015 at 05:59:07PM +0200, Steffen Trumtrar wrote:
> > I'm working on CAAM support for the ARM-based i.MX6 SoCs. The current
> > drivers/crypto/caam driver only works for PowerPC AFAIK.
> > Actually, there isn't that much to do, to get support for the i.MX6 but
> > one patch breaks the driver severely:
> >
> > commit ef94b1d834aace7101de77c3a7c2631b9ae9c5f6
> > crypto: caam - Add definition of rd/wr_reg64 for little endian platform
>
> You're not the only one who's hitting problems with this - Jon Nettleton
> has been working on it recently.
>
> The way this has been done is fairly yucky to start with: several things
> about it are particularly horrid. The first is the repetitive code
> for the BE and LE cases, when all that's actually different is the
> register order between the two code cases.
>
> The second thing is the excessive use of masking - I'm pretty sure the
> compiler won't complain with the below.
>
> diff --git a/drivers/crypto/caam/regs.h b/drivers/crypto/caam/regs.h
> index 378ddc17f60e..ba0fa630a25d 100644
> --- a/drivers/crypto/caam/regs.h
> +++ b/drivers/crypto/caam/regs.h
> @@ -84,34 +84,28 @@
> #endif
>
> #ifndef CONFIG_64BIT
> -#ifdef __BIG_ENDIAN
> -static inline void wr_reg64(u64 __iomem *reg, u64 data)
> -{
> - wr_reg32((u32 __iomem *)reg, (data & 0xffffffff00000000ull) >> 32);
> - wr_reg32((u32 __iomem *)reg + 1, data & 0x00000000ffffffffull);
> -}
> -
> -static inline u64 rd_reg64(u64 __iomem *reg)
> -{
> - return (((u64)rd_reg32((u32 __iomem *)reg)) << 32) |
> - ((u64)rd_reg32((u32 __iomem *)reg + 1));
> -}
> +#if defined(__BIG_ENDIAN)
> +#define REG64_HI32(reg) ((u32 __iomem *)(reg))
> +#define REG64_LO32(reg) ((u32 __iomem *)(reg) + 1)
> +#elif defined(__LITTLE_ENDIAN)
> +#define REG64_HI32(reg) ((u32 __iomem *)(reg) + 1)
> +#define REG64_LO32(reg) ((u32 __iomem *)(reg))
> #else
> -#ifdef __LITTLE_ENDIAN
> +#error Broken endian?
> +#endif
> +
> static inline void wr_reg64(u64 __iomem *reg, u64 data)
> {
> - wr_reg32((u32 __iomem *)reg + 1, (data & 0xffffffff00000000ull) >> 32);
> - wr_reg32((u32 __iomem *)reg, data & 0x00000000ffffffffull);
> + wr_reg32(REG64_HI32(reg), data >> 32);
> + wr_reg32(REG64_LO32(reg), data);
> }
>
> static inline u64 rd_reg64(u64 __iomem *reg)
> {
> - return (((u64)rd_reg32((u32 __iomem *)reg + 1)) << 32) |
> - ((u64)rd_reg32((u32 __iomem *)reg));
> + return ((u64)rd_reg32(REG64_HI32(reg))) << 32 |
> + rd_reg32(REG64_LO32(reg));
> }
> #endif
> -#endif
> -#endif
>
> /*
> * jr_outentry
>
> The second issue is that apparently, the register order doesn't actually
> change for LE devices - in other words, the byte order within each register
> does change, but they aren't a 64-bit register, they're two separate 32-bit
> registers. So, they should always be written as such.
>
> So, I'd get rid of the #if defined(__BIG_ENDIAN) stuff and instead have:
>
> +/*
> + * The DMA address register is a pair of 32-bit registers, and should
> + * always be accessed as such.
> + */
> +#define REG64_HI32(reg) ((u32 __iomem *)(reg))
> +#define REG64_LO32(reg) ((u32 __iomem *)(reg) + 1)
>

Thanks for all your explanations (in the other mail, too).
I, personally, like this approach the best; at least in the current state
of the driver.

Thanks,
Steffen

--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2015-06-16 08:11:10

by Jon Nettleton

[permalink] [raw]
Subject: Re: [BUG?] crypto: caam: little/big endianness on ARM vs PPC

Victoria,

I was hoping you would join the conversation. I know you have a
series of patches in Freescale's 3.14 git repository. Have you
updated those for mainline and published them for review and inclusion
in the upstream kernel? If yes to any could you post a link?

-Jon

On Tue, Jun 16, 2015 at 5:27 AM, Victoria Milhoan
<[email protected]> wrote:
> All,
>
> Freescale has been adding i.MX6 support to the CAAM driver and testing on both i.MX6 and QorIQ platforms. The patch series is now available for review. Your feedback for the provided patches is appreciated.
>
> Thanks,
> Victoria
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Herbert Xu
> Sent: Monday, June 15, 2015 3:05 PM
> To: Steffen Trumtrar
> Cc: [email protected]; [email protected]; [email protected]; Gupta Ruchika-R66431; [email protected]; Geanta Neag Horia Ioan-B05471; Kim Phillips
> Subject: Re: [BUG?] crypto: caam: little/big endianness on ARM vs PPC
>
> On Mon, Jun 15, 2015 at 05:59:07PM +0200, Steffen Trumtrar wrote:
>> Hi!
>>
>> I'm working on CAAM support for the ARM-based i.MX6 SoCs. The current
>> drivers/crypto/caam driver only works for PowerPC AFAIK.
>> Actually, there isn't that much to do, to get support for the i.MX6
>> but one patch breaks the driver severely:
>>
>> commit ef94b1d834aace7101de77c3a7c2631b9ae9c5f6
>> crypto: caam - Add definition of rd/wr_reg64 for little endian
>> platform
>>
>> This patch adds
>>
>> +#ifdef __LITTLE_ENDIAN
>> +static inline void wr_reg64(u64 __iomem *reg, u64 data) {
>> + wr_reg32((u32 __iomem *)reg + 1, (data & 0xffffffff00000000ull) >> 32);
>> + wr_reg32((u32 __iomem *)reg, data & 0x00000000ffffffffull); }
>>
>> The wr_reg64 function is only used in one place in the
>> drivers/crypto/caam/jr.c
>> driver: to write the dma_addr_t to the register. Without that patch
>> everything works fine on ARM (little endian, 32bit), with that patch
>> the driver will write 0's into the register that holds the DMA address (the numerically-higher) -> kernel hangs.
>> Also, from my understanding, the comment above the defines, stating
>> that you have to first write the numerically-lower and then the
>> numerically-higher address on 32bit systems doesn't match with the implementation.
>>
>> What I don't know/understand is if this makes any sense for any PowerPC implementation.
>>
>> So, the question is, how to fix this? I'd prefer to do it directly in
>> the jr driver instead of the ifdef-ery.
>>
>> Something like
>> if (sizeof(dma_addr_t) == sizeof(u32))
>> wr_reg32(&jrp->rregs->inpring_base + 1, inpbusaddr);
>> else if (sizeof(dma_addr_t) == sizeof(u64))
>> wr_reg64(...)
>>
>> or just go by DT compatible and then remove the inline function definitions.
>>
>> As far as I can tell, the compatible wouldn't be needed for anything
>> else in the jr driver, so maybe that is not optimal. On the other hand
>> the sizeof(..) solution would only catch little endian on 32bit and
>> not big endian (?!) I however don't know what combinations actually
>> *have* to be caught, as I don't know, which exist.
>>
>> So, what do you think people?
>
> I'm adding a couple of CCs.
>
> Thanks,
> --
> Email: Herbert Xu <[email protected]> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> --
> To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

2015-06-16 12:53:58

by Steffen Trumtrar

[permalink] [raw]
Subject: Re: [BUG?] crypto: caam: little/big endianness on ARM vs PPC

Hi!

On Tue, Jun 16, 2015 at 03:27:54AM +0000, Victoria Milhoan wrote:
> All,
>
> Freescale has been adding i.MX6 support to the CAAM driver and testing on both i.MX6 and QorIQ platforms. The patch series is now available for review. Your feedback for the provided patches is appreciated.
>

Could you maybe bounce me that thread?
I can't find it in a format that allows me to reply to the patches.

Thanks,
Steffen

--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2015-06-16 16:00:09

by Victoria Milhoan

[permalink] [raw]
Subject: RE: [BUG?] crypto: caam: little/big endianness on ARM vs PPC

Jon,

The patches published to the mailing list yesterday for comment (titled crypto: caam - Add i.MX6 support to the Freescale CAAM driver) are based on the patches in the Freescale 3.14 git repository which enable i.MX6 support. The initial patch set is intended to provide a functional driver, based on the mainline kernel, which supports both QorIQ and i.MX6.

Thanks,
Victoria

-----Original Message-----
From: Jon Nettleton [mailto:[email protected]]
Sent: Tuesday, June 16, 2015 1:11 AM
To: Milhoan Victoria-B42089
Cc: Herbert Xu; Steffen Trumtrar; Kim Phillips; [email protected]; Gupta Ruchika-R66431; [email protected]; [email protected]; Geanta Neag Horia Ioan-B05471; [email protected]
Subject: Re: [BUG?] crypto: caam: little/big endianness on ARM vs PPC

Victoria,

I was hoping you would join the conversation. I know you have a series of patches in Freescale's 3.14 git repository. Have you updated those for mainline and published them for review and inclusion in the upstream kernel? If yes to any could you post a link?

-Jon

On Tue, Jun 16, 2015 at 5:27 AM, Victoria Milhoan <[email protected]> wrote:
> All,
>
> Freescale has been adding i.MX6 support to the CAAM driver and testing on both i.MX6 and QorIQ platforms. The patch series is now available for review. Your feedback for the provided patches is appreciated.
>
> Thanks,
> Victoria
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Herbert Xu
> Sent: Monday, June 15, 2015 3:05 PM
> To: Steffen Trumtrar
> Cc: [email protected];
> [email protected]; [email protected];
> Gupta Ruchika-R66431; [email protected]; Geanta Neag Horia
> Ioan-B05471; Kim Phillips
> Subject: Re: [BUG?] crypto: caam: little/big endianness on ARM vs PPC
>
> On Mon, Jun 15, 2015 at 05:59:07PM +0200, Steffen Trumtrar wrote:
>> Hi!
>>
>> I'm working on CAAM support for the ARM-based i.MX6 SoCs. The current
>> drivers/crypto/caam driver only works for PowerPC AFAIK.
>> Actually, there isn't that much to do, to get support for the i.MX6
>> but one patch breaks the driver severely:
>>
>> commit ef94b1d834aace7101de77c3a7c2631b9ae9c5f6
>> crypto: caam - Add definition of rd/wr_reg64 for little endian
>> platform
>>
>> This patch adds
>>
>> +#ifdef __LITTLE_ENDIAN
>> +static inline void wr_reg64(u64 __iomem *reg, u64 data) {
>> + wr_reg32((u32 __iomem *)reg + 1, (data & 0xffffffff00000000ull) >> 32);
>> + wr_reg32((u32 __iomem *)reg, data & 0x00000000ffffffffull); }
>>
>> The wr_reg64 function is only used in one place in the
>> drivers/crypto/caam/jr.c
>> driver: to write the dma_addr_t to the register. Without that patch
>> everything works fine on ARM (little endian, 32bit), with that patch
>> the driver will write 0's into the register that holds the DMA address (the numerically-higher) -> kernel hangs.
>> Also, from my understanding, the comment above the defines, stating
>> that you have to first write the numerically-lower and then the
>> numerically-higher address on 32bit systems doesn't match with the implementation.
>>
>> What I don't know/understand is if this makes any sense for any PowerPC implementation.
>>
>> So, the question is, how to fix this? I'd prefer to do it directly in
>> the jr driver instead of the ifdef-ery.
>>
>> Something like
>> if (sizeof(dma_addr_t) == sizeof(u32))
>> wr_reg32(&jrp->rregs->inpring_base + 1, inpbusaddr);
>> else if (sizeof(dma_addr_t) == sizeof(u64))
>> wr_reg64(...)
>>
>> or just go by DT compatible and then remove the inline function definitions.
>>
>> As far as I can tell, the compatible wouldn't be needed for anything
>> else in the jr driver, so maybe that is not optimal. On the other
>> hand the sizeof(..) solution would only catch little endian on 32bit
>> and not big endian (?!) I however don't know what combinations
>> actually
>> *have* to be caught, as I don't know, which exist.
>>
>> So, what do you think people?
>
> I'm adding a couple of CCs.
>
> Thanks,
> --
> Email: Herbert Xu <[email protected]> Home Page:
> http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> --
> To unsubscribe from this list: send the line "unsubscribe
> linux-crypto" in the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

2015-08-04 17:56:02

by Horia Geantă

[permalink] [raw]
Subject: Re: [BUG?] crypto: caam: little/big endianness on ARM vs PPC

On 6/15/2015 8:18 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 15, 2015 at 06:33:17PM +0200, Jon Nettleton wrote:
>> Funny enough I tackled this problem over the weekend as well. My
>> approach was to switch the driver over to use the *_relaxed() io
>> functions and then special case the bits missing from the various
>> ARCHs. Basically adding setbits32 and clrbits32 for !PPC
>> architectures and letting PPC and ARM share a writeq/readq set of
>> functions. I left the existing LITTLE_ENDIAN special case until I
>> could verify if it was needed, or had been tested.
>
> I'll follow up here with what I've mentioned elsewhere, and some further
> thoughts.
>
> I think this shows the dangers of using struct { } to define register
> offsets. Let's start here:
>
> /*
> * caam_job_ring - direct job ring setup
> * 1-4 possible per instantiation, base + 1000/2000/3000/4000
> * Padded out to 0x1000
> */
> struct caam_job_ring {
> /* Input ring */
> u64 inpring_base; /* IRBAx - Input desc ring baseaddr */
> u32 rsvd1;
>
> Apparently, this is a CPU-endian 64-bit value (it's not defined using
> le64 or be64 which would "fix" it's endian.)

The IP in question (CAAM) has different endianness, depending on the
integration on the SoC. Would it be ok to #define a sec64 type which
would translate either to be64 or le64?

>
> The second question, which comes up in light of the breakage that's
> being reported is: is this really a 64-bit register, or is it a pair
> of 32-bit registers side-by-side?
>
> The documentation I'm looking at doesn't document the register at
> base + 0x1000, but documents the one at base + 0x1004, and the one
> at 0x1004 is given the name "IRBAR0_LS", which presumably stands
> for "input ring base address register 0, least significant".
>
> As the code originally stood for PPC, IRBAR0_LS is also at 0x1004,
> but appears to be big endian.
>
> On ARM, IRBAR0_LS appears at the same address, but is little endian.

In some cases, CAAM endianness does not match CPU endianness (for e.g.
i.MX). We're talking about default endianness here, both for CAAM and
for CPU - no CONFIG_CPU_{BIG,LITTLE}_ENDIAN.
ARCH CPU CAAM SoC
PPC BIG BIG P4080, T4240 etc.
ARM LITTLE LITTLE LS1021A, LS2085A etc.
ARM LITTLE BIG i.MX6, i.MX7, LS1043A

> This is *not* a 64-bit register at all, but is a pair of 32-bit
> registers side by side. Moreover, readq() should not be used - no
> amount of arch mangling could ever produce a sane readq() which
> coped with this.
>
> So, the CAAM code is buggy in this regard: using readq() here when
> endian-portability is required is wrong. It's got to be two 32-bit
> reads or two 32-bit writes in the appropriate endian.
>
> Also, note that there's a big difference between __raw_readl() and
> readl_relaxed(). readl_relaxed() is always little-endian. __raw_readl()
> is god-knows-what-the-archtecture-decides endian. Switching PPC
> drivers from __raw_readl() to readl_relaxed() is really not a good
> idea unless someone from the PPC camp reviews and tests the code.
>
> So, what I'd suggest is just fixing rd_reg64() and wr_reg64() to do
> the right thing: keeping the two 32-bit words in the same order
> irrespective of the endian-ness, and staying with the __raw_*
> accessors until PPC people can look at this.

Agree that the I/O accessors need to be revisited.
Unfortunately, the proposed change does not work for LE CAAM (LS1021A).
I'll send a fix.

Thanks,
Horia