This fixed some jffs2 alignment problems we saw on an IXP425 based
XScale board. I just got pinged that I was supposed to post this patch
in case anyone else finds it usefull. This was against a modified 2.4.19
kernel.
Greg Weeks
Signed-off-by: Greg Weeks <[email protected]> under TS0067
--- kernel/drivers/mtd/chips/cfi_cmdset_0001.c-orig
+++ kernel/drivers/mtd/chips/cfi_cmdset_0001.c
@@ -933,7 +933,7 @@
cfi_word status, status_OK;
unsigned long cmd_adr, timeo;
DECLARE_WAITQUEUE(wait, current);
- int wbufsize, z, suspended=0, ret=0;
+ int wbufsize, z, y, suspended=0, ret=0;
wbufsize = CFIDEV_INTERLEAVE << cfi->cfiq->MaxBufWriteSize;
adr += chip->start;
@@ -1064,12 +1064,18 @@
/* Write data */
for (z = 0; z < len; z += CFIDEV_BUSWIDTH) {
+ cfi_word aligned_val;
+ u_char *align_ptr = (u_char *) &aligned_val;
+
+ for (y = 0; y < CFIDEV_BUSWIDTH; y++)
+ align_ptr[y] = *(buf++);
+
if (cfi_buswidth_is_1()) {
- map->write8 (map, *((__u8*)buf)++, adr+z);
+ map->write8 (map, *((__u8 *) &aligned_val), adr+z);
} else if (cfi_buswidth_is_2()) {
- map->write16 (map, *((__u16*)buf)++, adr+z);
+ map->write16 (map, *((__u16 *) &aligned_val),
adr+z);
} else if (cfi_buswidth_is_4()) {
- map->write32 (map, *((__u32*)buf)++, adr+z);
+ map->write32 (map, *((__u32 *) &aligned_val),
adr+z);
} else {
DISABLE_VPP(map);
ret = -EINVAL;
On Monday 07 June 2004 17:08, Greg Weeks wrote:
> This fixed some jffs2 alignment problems we saw on an IXP425 based
> XScale board. I just got pinged that I was supposed to post this patch
> in case anyone else finds it usefull. This was against a modified 2.4.19
> kernel.
Enable CONFIG_ALIGNMENT_TRAP instead of tweaking the code.
JFFS2 / MTD must be allowed to do unaligned access
--
Thomas
________________________________________________________________________
Steve Ballmer quotes the statistic that IT pros spend 70 percent of their
time managing existing systems. That couldn’t have anything to do with
the fact that 99 percent of these systems run Windows, could it?
________________________________________________________________________
linutronix - competence in embedded & realtime linux
http://www.linutronix.de
mail: [email protected]
On Mon, 7 Jun 2004, Thomas Gleixner wrote:
>
> On Monday 07 June 2004 17:08, Greg Weeks wrote:
> > This fixed some jffs2 alignment problems we saw on an IXP425 based
> > XScale board. I just got pinged that I was supposed to post this patch
> > in case anyone else finds it usefull. This was against a modified 2.4.19
> > kernel.
>
> Enable CONFIG_ALIGNMENT_TRAP instead of tweaking the code.
> JFFS2 / MTD must be allowed to do unaligned access
Wrong.
Pleas fix jffs2 to use the proper "get_unaligned()"/"put_unaligned()"
instead.
Emulating unaligned accesses with traps (even even the architecture
supports it, which isn't universally true) is _stupid_ when we have
perfectly well-defined macros for them that do it faster and are
_designed_ for this.
On architectures where it doesn't matter, the macros just do the access,
so it's not like you're slowing anything down.
Linus
On Mon, Jun 07, 2004 at 09:03:07AM -0700, Linus Torvalds wrote:
> On Mon, 7 Jun 2004, Thomas Gleixner wrote:
> >
> > On Monday 07 June 2004 17:08, Greg Weeks wrote:
> > > This fixed some jffs2 alignment problems we saw on an IXP425 based
> > > XScale board. I just got pinged that I was supposed to post this patch
> > > in case anyone else finds it usefull. This was against a modified 2.4.19
> > > kernel.
> >
> > Enable CONFIG_ALIGNMENT_TRAP instead of tweaking the code.
> > JFFS2 / MTD must be allowed to do unaligned access
>
> Wrong.
>
> Pleas fix jffs2 to use the proper "get_unaligned()"/"put_unaligned()"
> instead.
>
> Emulating unaligned accesses with traps (even even the architecture
> supports it, which isn't universally true) is _stupid_ when we have
> perfectly well-defined macros for them that do it faster and are
> _designed_ for this.
>
> On architectures where it doesn't matter, the macros just do the access,
> so it's not like you're slowing anything down.
>
> Linus
I'll let you have a bun fight with dwmw2 and networking people over
this. I'm standing well clear. 8)
[Added dwmw2 and dropped linux-arm-kernel]
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
On Mon, 2004-06-07 at 17:41 +0100, Russell King wrote:
> > Pleas fix jffs2 to use the proper "get_unaligned()"/"put_unaligned()"
> > instead.
> I'll let you have a bun fight with dwmw2 and networking people over
> this. I'm standing well clear. 8)
S'fine by me. I already added get_unaligned() to the flash drivers years
ago -- it's actually those, not JFFS2 itself which needs it. Alan
insisted I remove it on the basis that much of our network code is
similarly broken anyway; an arch without fixups is dead in the water
whatever happens.
Admittedly it's not _entirely_ stupid in the network code, because a
direct load is often a lot faster than the byte-wise load emitted by
get_unaligned(). If alignment is the common case, it makes some sense to
optimise that and take the trap only occasionally.
But with uClinux being merged (many uClinux arches can't fix up
alignment at all), I think we should revisit that decision.
Anything which _could_ be unaligned should be marked as such, even if we
do have two levels ('possibly unaligned', 'probably unaligned') where
the latter behaves purely as an optimisation on most arches, just like
our current get_unaligned() does.
Or in fact since the ratio between the speeds of the inline and the trap
version varies wildly from arch to arch, and hence the point at which it
becomes more efficient just to use the current get_unaligned() varies --
perhaps we should have 'get_unaligned(ptr, probability)' and let the
arch code decide where the threshold should be. It's not that hard to
make it go away completely given __builtin_constant_p(probability) and
some preprocessor macros.
--
dwmw2
On Mon, 7 Jun 2004, David Woodhouse wrote:
> > > Pleas fix jffs2 to use the proper "get_unaligned()"/"put_unaligned()"
> > > instead.
>
> > I'll let you have a bun fight with dwmw2 and networking people over
> > this. I'm standing well clear. 8)
>
> S'fine by me. I already added get_unaligned() to the flash drivers years
> ago -- it's actually those, not JFFS2 itself which needs it. Alan
> insisted I remove it on the basis that much of our network code is
> similarly broken anyway; an arch without fixups is dead in the water
> whatever happens.
I don't see it as a correctness issue, I see it as a performance issue.
Yes, the old ARM chips that can't do unaligned accesses and can't even
trap on them probably have a number of cases where they literally do the
wrong thing, and I think most people will say "tough luck, don't do that
then".
But get_unaligned() makes sense quite apart from that usage too. Notably,
many architectures can cheaply do unaligned accesses when they are known
to be unaligned, but take thousands of cycles to fix up traps. Alpha comes
to mind, and this was actually what "get_unaligned()" was really designed
for.
> Anything which _could_ be unaligned should be marked as such, even if we
> do have two levels ('possibly unaligned', 'probably unaligned') where
> the latter behaves purely as an optimisation on most arches, just like
> our current get_unaligned() does.
Right now we might as well consider the "get_unaligned()" to be a "quite
possibly unaligned" as opposed to "this just _might_ be unaligned".
That's the current usage pattern anyway. Nothing really _guarantees_ that
unaligned accesses are marked with "get_unaligned()", but the unaligned
fixup is at least supposed to be _rare_.
Put another way: imagine if the unaligned handler did a printk(). That
shouldn't flood the logs under normal load.
Linus
On Mon, 2004-06-07 at 12:22 -0700, Linus Torvalds wrote:
> I don't see it as a correctness issue, I see it as a performance issue.
In the case in question it's very much _not_ a performance issue. We're
writing a buffer to FLASH memory. The time it takes to read the word
from RAM is entirely lost in the noise compared with the time it takes
to write it to the flash.
Most of the time the buffer passed to the flash write routines will be
word-aligned. Occasionally it'll be unaligned but since there's a
distinct correlation between those arches on which we can't do fixups
and those machines on which flash is primary storage, putting
get_unaligned() in is a _correctness_ issue. And I don't care that it
might be slightly slower in the common case; as I said, it's in the
noise.
> Yes, the old ARM chips that can't do unaligned accesses and can't even
> trap on them probably have a number of cases where they literally do the
> wrong thing, and I think most people will say "tough luck, don't do that
> then".
Not just old ARM chips. Some new chips too; especially MMU-less ones.
> But get_unaligned() makes sense quite apart from that usage too. Notably,
> many architectures can cheaply do unaligned accesses when they are known
> to be unaligned, but take thousands of cycles to fix up traps. Alpha comes
> to mind, and this was actually what "get_unaligned()" was really designed
> for.
Yes. That's why I suggested we should have two forms -- for 'possibly'
and 'probably' unaligned.
> > Anything which _could_ be unaligned should be marked as such, even if we
> > do have two levels ('possibly unaligned', 'probably unaligned') where
> > the latter behaves purely as an optimisation on most arches, just like
> > our current get_unaligned() does.
>
> Right now we might as well consider the "get_unaligned()" to be a "quite
> possibly unaligned" as opposed to "this just _might_ be unaligned".
Yes. That's why I was told to remove our current get_unaligned() from
the flash drivers. I'm perfectly happy to put it back.
--
dwmw2
On Mon, 7 Jun 2004, David Woodhouse wrote:
>
> On Mon, 2004-06-07 at 12:22 -0700, Linus Torvalds wrote:
> > I don't see it as a correctness issue, I see it as a performance issue.
>
> In the case in question it's very much _not_ a performance issue. We're
> writing a buffer to FLASH memory. The time it takes to read the word
> from RAM is entirely lost in the noise compared with the time it takes
> to write it to the flash.
Not if you have to take an alignment fault, which is easily several
thousand cycles.
Think of "get_unaligned()" as a worst-case limiter. It can make the best
case be worse on architectures where it matters, but it can make the worst
case go from thousands of cycles to just single cycles.
And your flash isn't _that_ slow. Thousands of cycles that can't even
overlap with any flash IO _does_ show up.
Now, whether the unaligned case is common enough for people to even worry,
I don't know.
Linus
On Monday 07 June 2004 22:54, Linus Torvalds wrote:
> On Mon, 7 Jun 2004, David Woodhouse wrote:
> > On Mon, 2004-06-07 at 12:22 -0700, Linus Torvalds wrote:
> > > I don't see it as a correctness issue, I see it as a performance issue.
> >
> > In the case in question it's very much _not_ a performance issue. We're
> > writing a buffer to FLASH memory. The time it takes to read the word
> > from RAM is entirely lost in the noise compared with the time it takes
> > to write it to the flash.
>
> Not if you have to take an alignment fault, which is easily several
> thousand cycles.
>
> Think of "get_unaligned()" as a worst-case limiter. It can make the best
> case be worse on architectures where it matters, but it can make the worst
> case go from thousands of cycles to just single cycles.
>
> And your flash isn't _that_ slow. Thousands of cycles that can't even
> overlap with any flash IO _does_ show up.
Not the IO write to the FLASH is the slow and noisy part, its the programming
of the FLASH after writing the data which blocks for a non deteministic time
in the range of milliseconds.
So we did not care if it took ms + x µs due to an alignement trap
--
Thomas
________________________________________________________________________
Steve Ballmer quotes the statistic that IT pros spend 70 percent of their
time managing existing systems. That couldn’t have anything to do with
the fact that 99 percent of these systems run Windows, could it?
________________________________________________________________________
linutronix - competence in embedded & realtime linux
http://www.linutronix.de
mail: [email protected]
On Mon, 2004-06-07 at 22:56 +0200, Thomas Gleixner wrote:
> So we did not care if it took ms + x µs due to an alignement trap
Indeed. But since many machines I care about can't fix up unaligned
accesses I'm _more_ than happy to obey the original decree that I should
put back the get_unaligned() calls; just ignoring the stated reasons.
Let's not argue Linus out of it :)
Greg, please make sure you Cc me on flash or jffs2-related changes in
future.
--
dwmw2
On Monday 07 June 2004 23:29, David Woodhouse wrote:
> On Mon, 2004-06-07 at 22:56 +0200, Thomas Gleixner wrote:
> > So we did not care if it took ms + x µs due to an alignement trap
>
> Indeed. But since many machines I care about can't fix up unaligned
> accesses I'm _more_ than happy to obey the original decree that I should
> put back the get_unaligned() calls; just ignoring the stated reasons.
> Let's not argue Linus out of it :)
I do not argue. I just stated why nobody took care of it.
Cite from old mail's:
> Yes. Enable the alignment abort handler. In later kernels, its a
> fundamental requirement that the alignment abort handler is enabled
> before you'll even get to see the jffs2 or MTD options.
>> Checking this before the unaligned access would be faster <SNIP>
> It's handled by the alignment abort handler already.
--
Thomas
________________________________________________________________________
Steve Ballmer quotes the statistic that IT pros spend 70 percent of their
time managing existing systems. That couldn’t have anything to do with
the fact that 99 percent of these systems run Windows, could it?
________________________________________________________________________
linutronix - competence in embedded & realtime linux
http://www.linutronix.de
mail: [email protected]