2005-10-28 07:01:10

by John Bowler

[permalink] [raw]
Subject: [PATCH] 2.6.14-rc3 ixp4xx_copy_from little endian/alignment

I'm resubmitting this because originally we just submitted it to
linux-arm-kernel and I believe this means it hasn't been seen in
the right places to get it into the MM patch set. This patch and
the ones I will be submitting immediately afterwards or that we have
submitted before are sufficient to allow a bootable, working,
kernel on the LinkSys NSLU2 in either little-endian or big-endian
mode. The patches have been tested against both 2.6.14 and
2.6.14-rc5-mm1 with a variety of rootfs's (arm and thumb uclibc,
arm glibc).

John Bowler <[email protected]>

>From: Alessandro Zummo
this short patch will modify the ixp4xx mtd code in order to fix
a couple of problems we have verified while working
with the NSLU2.

- word-wide accesses vs byte-wide
- little endian support

There's an insightful description in the first lines of the
patch.

The author, John Bowler, has put a lot of efforts in this patch,
verifying that this is the cleanest and shortest patch that
can fix those problem.

John and others wroted at least three different solutions
analyzing both the current code, our needs and the
philosophical aspects of this code.. and I'm not kidding!

The patch has been tested to work on the NSLU2 in both
LE and BE mode.

Signed-off-by: John Bowler <[email protected]>
Signed-off-by: Alessandro Zummo <[email protected]>

--- linux-2.6.13/drivers/mtd/maps/ixp4xx.c 2005-10-07 15:55:08.958509801 -0700
+++ linux-2.6.13/drivers/mtd/maps/ixp4xx.c 2005-10-07 19:06:22.352484966 -0700
@@ -22,6 +22,7 @@
#include <linux/string.h>
#include <linux/mtd/mtd.h>
#include <linux/mtd/map.h>
+#include <linux/mtd/cfi_endian.h>
#include <linux/mtd/partitions.h>
#include <linux/ioport.h>
#include <linux/device.h>
@@ -30,18 +31,45 @@

#include <linux/reboot.h>

+/* On a little-endian IXP4XX system (tested on NSLU2) contrary to the
+ * Intel documentation LDRH/STRH appears to XOR the address with 10b.
+ * This causes the cfi commands (sent to the command address, 0xAA for
+ * 16 bit flash) to fail. This is fixed here by XOR'ing the address
+ * before use with 10b. The cost of this is that the flash layout ends
+ * up with pdp-endiannes (on an LE syste), however this is not a problem
+ * as the access code consistently only accesses half words - so the
+ * endianness is not determinable on stuff which is written and read
+ * consistently in the little endian world.
+ *
+ * For flash data from the big-endian world, however, the results are
+ * weird - the pdp-endianness results in the data apparently being
+ * 2-byte swapped (as in dd conv=swab). To work round this the 16
+ * bit values are written and read using cpu_to_cfi16 and cfi16_to_cpu,
+ * by default these are no-ops, but if the MTD driver is configed with
+ * CONFIG_MTD_CFI_BE_BYTE_SWAP the macros will byte swap the data,
+ * resulting in a consistently BE view of the flash on both BE (no
+ * op) and LE systems. This config setting also causes the command
+ * data from the CFI implementation to get swapped - as is required
+ * so that this code will *unswap* it and give the correct command
+ * data to the flash.
+ */
#ifndef __ARMEB__
#define BYTE0(h) ((h) & 0xFF)
#define BYTE1(h) (((h) >> 8) & 0xFF)
+#define FLASHWORD(a) (*(__u16*)((u32)(a) ^ 2))
#else
#define BYTE0(h) (((h) >> 8) & 0xFF)
#define BYTE1(h) ((h) & 0xFF)
+#define FLASHWORD(a) (*(__u16*)(a))
#endif

+#define FLASHW(a) cfi16_to_cpu(FLASHWORD(a))
+#define FLASHSET(a,v) (FLASHWORD(a) = cpu_to_cfi16(v))
+
static map_word ixp4xx_read16(struct map_info *map, unsigned long ofs)
{
map_word val;
- val.x[0] = *(__u16 *) (map->map_priv_1 + ofs);
+ val.x[0] = FLASHW(map->map_priv_1 + ofs);
return val;
}

@@ -53,19 +81,23 @@
static void ixp4xx_copy_from(struct map_info *map, void *to,
unsigned long from, ssize_t len)
{
- int i;
- u8 *dest = (u8 *) to;
- u16 *src = (u16 *) (map->map_priv_1 + from);
- u16 data;
+ if (len <= 0)
+ return;

- for (i = 0; i < (len / 2); i++) {
- data = src[i];
- dest[i * 2] = BYTE0(data);
- dest[i * 2 + 1] = BYTE1(data);
+ u8 *dest = (u8 *) to;
+ u8 *src = (u8 *) (map->map_priv_1 + from);
+ if (from & 1)
+ *dest++ = BYTE1(FLASHW(src-1)), ++src, --len;
+
+ while (len >= 2) {
+ u16 data = FLASHW(src); src += 2;
+ *dest++ = BYTE0(data);
+ *dest++ = BYTE1(data);
+ len -= 2;
}

- if (len & 1)
- dest[len - 1] = BYTE0(src[i]);
+ if (len > 0)
+ *dest++ = BYTE0(FLASHW(src));
}

/*
@@ -75,7 +107,7 @@
static void ixp4xx_probe_write16(struct map_info *map, map_word d, unsigned long adr)
{
if (!(adr & 1))
- *(__u16 *) (map->map_priv_1 + adr) = d.x[0];
+ FLASHSET(map->map_priv_1 + adr, d.x[0]);
}

/*
@@ -83,7 +115,7 @@
*/
static void ixp4xx_write16(struct map_info *map, map_word d, unsigned long adr)
{
- *(__u16 *) (map->map_priv_1 + adr) = d.x[0];
+ FLASHSET(map->map_priv_1 + adr, d.x[0]);
}

struct ixp4xx_flash_info {

--

Best regards,

Alessandro Zummo,
Tower Technologies - Turin, Italy

http://www.towertech.it


2005-10-28 08:14:04

by Lennert Buytenhek

[permalink] [raw]
Subject: Re: [PATCH] 2.6.14-rc3 ixp4xx_copy_from little endian/alignment

On Thu, Oct 27, 2005 at 11:09:31PM -0700, John Bowler wrote:

> +/* On a little-endian IXP4XX system (tested on NSLU2) contrary to the
> + * Intel documentation LDRH/STRH appears to XOR the address with 10b.

It's unlikely that ldrh/strh in LE mode behave differently than
on any other little-endian ARM system out there. Maybe what was
meant is that this flaw only applies to accesses to the flash
controller (more likely)? If so, can the comment be updated to
reflect that?


thanks,
Lennert

2005-10-28 12:40:40

by David Vrabel

[permalink] [raw]
Subject: Re: [PATCH] 2.6.14-rc3 ixp4xx_copy_from little endian/alignment

John Bowler wrote:
>
> +/* On a little-endian IXP4XX system (tested on NSLU2) contrary to the
> + * Intel documentation LDRH/STRH appears to XOR the address with 10b.

I don't think this is correct. i.e., I think the Intel docmentation is
correct.

The Application Note on IXP4xx endianess operation[1] says that (by
default) the XScale core operates in address coherency mode (i.e., it
flips address bits). I suspect you need to set BYTE_SWAP_EN in
EXP_CNFG1 and use the P bit in the MMU to get data coherency mode for
various peripherals (probably all expansion bus periperals and possibly
all the APB peripherals).

Also, I've noticed that the PCI_CSR is mis-configured when the XScale
core is in little-endian mode. ABE (AHB is big-endian) /must/ always be
set -- remember that the NPEs are always big-endian devices.

Since I'd never run an IXP4xx in little-endian mode I've not looked at
this issue in any great depth so I could be wrong here. Regardless, the
proposed hack to the flash map driver is wrong since all expansion bus
peripherals are affected not just flash (i.e., the solution needs to be
more generic rather than flash driver specific).

David Vrabel

[1] http://www.intel.com/design/network/applnots/25423701.pdf

2005-10-28 17:00:20

by John Bowler

[permalink] [raw]
Subject: RE: [PATCH] 2.6.14-rc3 ixp4xx_copy_from little endian/alignment

From: David Vrabel [mailto:[email protected]]
>John Bowler wrote:
>> +/* On a little-endian IXP4XX system (tested on NSLU2) contrary to the
>> + * Intel documentation LDRH/STRH appears to XOR the address with 10b.

This turns out to be a misleading comment, and I have deleted it:

>The Application Note on IXP4xx endianess operation[1] says that (by
>default) the XScale core operates in address coherency mode (i.e., it
>flips address bits). I suspect you need to set BYTE_SWAP_EN in
>EXP_CNFG1 and use the P bit in the MMU to get data coherency mode for
>various peripherals (probably all expansion bus periperals and possibly
>all the APB peripherals).

This is the observed behaviour (address coherency) - the address bits are
flipped on LE - and I hope the rest of the comments in the patch make
this clear because the central problem being fixed is that the command
address seen by the CPU is flipped from 0xAA to 0xA8 on LE systems.

The only problem with the address coherency setting is that the flash
command address bit flipping causes the flash not to work (there's no
way of configuring the MTD code to use a different address for command,
operations which require data coherency, and data operations which require
address coherency, because the command code *does* do data reads to
get the results back.)

*All* the other peripherals on the system work - including the USB which
is on the PCI bus - we do have this patch:

>Also, I've noticed that the PCI_CSR is mis-configured when the XScale
>core is in little-endian mode. ABE (AHB is big-endian) /must/ always be
>set -- remember that the NPEs are always big-endian devices.

This doesn't affect the flash (we've verified that - i.e. *with* the
patch the flash works in LE regardless of the patch for the PCI_CSR
setting).

>Since I'd never run an IXP4xx in little-endian mode I've not looked at
>this issue in any great depth so I could be wrong here. Regardless, the
>proposed hack to the flash map driver is wrong since all expansion bus
>peripherals are affected not just flash (i.e., the solution needs to be
>more generic rather than flash driver specific).

No, that's incorrect. The patch has been demonstrated to be correct with
all devices (along with the PCI_CSR patch, which Deepak has already pushed
upstream). I.e. *without* the patch everything works (BE and LE) except
the flash is unuseable, *with* the patch the flash works too.

We examined a number of solutions (i.e. I tried them, had systems which
booted with them and looked at the resultant problems). I haven't tried
setting data coherency, but I'm assuming this would break at least some
of the peripherals.

Of the solutions I tried this is the only solution which allows us to switch
from LE to BE kernel/rootfs and share data which does not change (the NSLU2
configuration partition, the all important NSLU2 ethernet MAC which is
actually inside the boot loader and the RedBoot FIS directory.)

So I'm effectively saying we need data coherency in the flash, but what we
have in everything *else* is working just find with address coherency.

John Bowler <[email protected]>

2005-10-28 20:34:13

by David Vrabel

[permalink] [raw]
Subject: Re: [PATCH] 2.6.14-rc3 ixp4xx_copy_from little endian/alignment

John Bowler wrote:
>
>>Also, I've noticed that the PCI_CSR is mis-configured when the XScale
>>core is in little-endian mode. ABE (AHB is big-endian) /must/ always be
>>set -- remember that the NPEs are always big-endian devices.
>
>
> This doesn't affect the flash (we've verified that - i.e. *with* the
> patch the flash works in LE regardless of the patch for the PCI_CSR
> setting).

Now that you mention it I do remember seeing this patch floating around.

>>Since I'd never run an IXP4xx in little-endian mode I've not looked at
>>this issue in any great depth so I could be wrong here. Regardless, the
>>proposed hack to the flash map driver is wrong since all expansion bus
>>peripherals are affected not just flash (i.e., the solution needs to be
>>more generic rather than flash driver specific).
>
>
> No, that's incorrect. The patch has been demonstrated to be correct with
> all devices (along with the PCI_CSR patch, which Deepak has already pushed
> upstream). I.e. *without* the patch everything works (BE and LE) except
> the flash is unuseable, *with* the patch the flash works too.

It appears that the NSLU2 only has the flash on the expansion bus which
is why you believe it's a flash specific problem.

> So I'm effectively saying we need data coherency in the flash, but what we
> have in everything *else* is working just find with address coherency.

Data coherency can be set on a per 1 Mibyte page basis so all other (APB
and PCI) peripherals would continue to use address coherency and thus
would continue to function as they are now.

David Vrabel

2005-10-29 01:00:21

by John Bowler

[permalink] [raw]
Subject: RE: [PATCH] 2.6.14-rc3 ixp4xx_copy_from little endian/alignment

From: David Vrabel [mailto:[email protected]]
>It appears that the NSLU2 only has the flash on the expansion bus which
>is why you believe it's a flash specific problem.

Which would also explain why a flash specific solution works... I'll
try an experiment without the XOR on the address and with data coherency.

The issue here is whether that's the right answer for hypothetical other
IXP4XX LE systems which have both flash and non-flash peripherals on EXP.

Still, it doesn't much matter - NSLU2 doesn't have such devices (so far as
I know - i.e. I believe that your statement about there being nothing else
on the EXP bus is correct) and we implemented this patch for NSLU2, even
though it isn't NSLU2 specific.

>Data coherency can be set on a per 1 Mibyte page basis so all other (APB
>and PCI) peripherals would continue to use address coherency and thus
>would continue to function as they are now.

Ok... so I'll try to find a way to do this in the board level code, or
maybe better in the IXP4XX setup (drivers/mtd/maps/ixp4xx.c?)

It's worth considering that, so far as I can see, nothing which uses
drivers/mtd/maps/ixp4xx.c will currently work in LE unless it is already
setting data coherency on the flash addresses. Is NSLU2 the first IXP4XX
system to run LE and to access the flash (from a running system, not from
the boot loader?)

John Bowler <[email protected]>