2011-02-02 16:00:38

by Arnd Bergmann

[permalink] [raw]
Subject: ARM unaligned MMIO access with attribute((packed))

As noticed by Peter Maydell, the EHCI device driver in Linux gets
miscompiled by some versions of arm-gcc (still need to find out which)
due to a combination of problems:

1. In include/linux/usb/ehci_def.h, struct ehci_caps is defined
with __attribute__((packed)), for no good reason. This is clearly
a bug and needs to get fixed, but other drivers have the same bug
and it used to work. The attribute forces byte access on all members
accessed through pointer dereference, which is not allowed on
MMIO accesses in general. The specific code triggering the problem
in Peter's case is in ehci-omap.c:
omap->ehci->regs = hcd->regs
+ HC_LENGTH(readl(&omap->ehci->caps->hc_capbase));


2. The ARM version of the readl() function is implemented as a macro
doing a direct pointer dereference with a typecast:

#define __raw_readl(a) (__chk_io_ptr(a), *(volatile unsigned int __force *)(a))
#define readl_relaxed(c) ({ u32 __v = le32_to_cpu((__force __le32) \
__raw_readl(__mem_pci(c))); __v; })
#define readl(c) ({ u32 __v = readl_relaxed(c); __iormb(); __v; })

On other architectures, readl() is implemented using an inline assembly
specifically to prevent gcc from issuing anything but a single 32-bit
load instruction. readl() only makes sense on aligned memory, so in case
of a misaligned pointer argument, it should cause a trap anyway.

3. gcc does not seem to clearly define what happens during a cast between
aligned an packed pointers. In this case, the original pointer is packed
(byte aligned), while the access is done through a 32-bit aligned
volatile unsigned int pointer. In gcc-4.4, casting from "unsigned int
__attribute__((packed))" to "volatile unsigned int" resulted in a 32-bit
aligned access, while casting to "unsigned int" (without volatile) resulted
in four byte accesses. gcc-4.5 seems to have changed this to always do
a byte access in both cases, but still does not document the behavior.
(need to confirm this).

I would suggest fixing this by:

1. auditing all uses of __attribute__((packed)) in the Linux USB code
and other drivers, removing the ones that are potentially harmful.

2. Changing the ARM MMIO functions to use inline assembly instead of
direct pointer dereference.

3. Documenting the gcc behavior as undefined.

Other suggestions?

Arnd


2011-02-02 16:37:43

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

On Wed, Feb 02, 2011 at 05:00:20PM +0100, Arnd Bergmann wrote:
> I would suggest fixing this by:
>
> 1. auditing all uses of __attribute__((packed)) in the Linux USB code
> and other drivers, removing the ones that are potentially harmful.
>
> 2. Changing the ARM MMIO functions to use inline assembly instead of
> direct pointer dereference.
>
> 3. Documenting the gcc behavior as undefined.

We used to use inline assembly at one point, but that got chucked out.
The problem is that using asm() for this causes GCC to generate horrid
code.

1. there's no way to tell GCC that the inline assembly is a load
instruction and therefore it needs to schedule the following
instructions appropriately.

2. GCC will needlessly reload pointers from structures and other such
behaviour because it can't be told clearly what the inline assembly
is doing, so the inline asm needs to have a "memory" clobber.

3. It seems to misses out using the pre-index addressing, prefering to
create add/sub instructions prior to each inline assembly load/store.

4. There are no (documented) constraints in GCC to allow you to represent
the offset format for the half-word instructions.

Overall, it means greater register pressure, more instructions, larger
functions, greater instruction cache pressure, etc.

2011-02-02 16:51:30

by Richard Biener

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

On Wed, Feb 2, 2011 at 5:00 PM, Arnd Bergmann <[email protected]> wrote:
> As noticed by Peter Maydell, the EHCI device driver in Linux gets
> miscompiled by some versions of arm-gcc (still need to find out which)
> due to a combination of problems:
>
> 1. In include/linux/usb/ehci_def.h, struct ehci_caps is defined
> with __attribute__((packed)), for no good reason. This is clearly
> a bug and needs to get fixed, but other drivers have the same bug
> and it used to work. The attribute forces byte access on all members
> accessed through pointer dereference, which is not allowed on
> MMIO accesses in general. The specific code triggering the problem
> in Peter's case is in ehci-omap.c:
> ? ? ? ?omap->ehci->regs = hcd->regs
> ? ? ? ? ? ? ? ?+ HC_LENGTH(readl(&omap->ehci->caps->hc_capbase));
>
>
> 2. The ARM version of the readl() function is implemented as a macro
> doing a direct pointer dereference with a typecast:
>
> #define __raw_readl(a) ? ? ? ? ?(__chk_io_ptr(a), *(volatile unsigned int __force ? *)(a))
> #define readl_relaxed(c) ({ u32 __v = le32_to_cpu((__force __le32) \
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?__raw_readl(__mem_pci(c))); __v; })
> #define readl(c) ? ? ? ? ? ? ? ?({ u32 __v = readl_relaxed(c); __iormb(); __v; })
>
> On other architectures, readl() is implemented using an inline assembly
> specifically to prevent gcc from issuing anything but a single 32-bit
> load instruction. readl() only makes sense on aligned memory, so in case
> of a misaligned pointer argument, it should cause a trap anyway.
>
> 3. gcc does not seem to clearly define what happens during a cast between
> aligned an packed pointers. In this case, the original pointer is packed
> (byte aligned), while the access is done through a 32-bit aligned
> volatile unsigned int pointer. In gcc-4.4, casting from "unsigned int
> __attribute__((packed))" to "volatile unsigned int" resulted in a 32-bit
> aligned access, while casting to "unsigned int" (without volatile) resulted
> in four byte accesses. gcc-4.5 seems to have changed this to always do
> a byte access in both cases, but still does not document the behavior.
> (need to confirm this).
>
> I would suggest fixing this by:
>
> 1. auditing all uses of __attribute__((packed)) in the Linux USB code
> and other drivers, removing the ones that are potentially harmful.
>
> 2. Changing the ARM MMIO functions to use inline assembly instead of
> direct pointer dereference.
>
> 3. Documenting the gcc behavior as undefined.

The pointer conversions already invoke undefined behavior as specified by the
C standard (6.3.2.3/7).

Richard.

2011-02-02 17:10:24

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

On Wed, Feb 02, 2011 at 05:51:27PM +0100, Richard Guenther wrote:
> > I would suggest fixing this by:
> >
> > 1. auditing all uses of __attribute__((packed)) in the Linux USB code
> > and other drivers, removing the ones that are potentially harmful.
> >
> > 2. Changing the ARM MMIO functions to use inline assembly instead of
> > direct pointer dereference.
> >
> > 3. Documenting the gcc behavior as undefined.
>
> The pointer conversions already invoke undefined behavior as specified by the
> C standard (6.3.2.3/7).

Just to be clear: you are not saying that the ARM implementation is
undefined.

What you're saying is that converting from a pointer with less strict
alignment requirements to a pointer with more strict alignment
requirements is undefined.

IOW:

unsigned long *blah(unsigned char *c)
{
return (unsigned long *)c;
}

would be undefined, but:

unsigned char *blah(unsigned long *c)
{
return (unsigned char *)c;
}

would not be.

If you're saying something else, please explain with reference to the
point in the C standard you quote above.

2011-02-02 17:39:32

by Arnd Bergmann

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

On Wednesday 02 February 2011 17:37:02 Russell King - ARM Linux wrote:
> We used to use inline assembly at one point, but that got chucked out.
> The problem is that using asm() for this causes GCC to generate horrid
> code.
>
> 1. there's no way to tell GCC that the inline assembly is a load
> instruction and therefore it needs to schedule the following
> instructions appropriately.
>
> 2. GCC will needlessly reload pointers from structures and other such
> behaviour because it can't be told clearly what the inline assembly
> is doing, so the inline asm needs to have a "memory" clobber.
>
> 3. It seems to misses out using the pre-index addressing, prefering to
> create add/sub instructions prior to each inline assembly load/store.
>
> 4. There are no (documented) constraints in GCC to allow you to represent
> the offset format for the half-word instructions.
>
> Overall, it means greater register pressure, more instructions, larger
> functions, greater instruction cache pressure, etc.

Another solution would be to declare the readl function extern and define
it out of line, but I assume that this would be at least as bad as an
inline assembly for all the points you brought up, right?

Would it be possible to add the proper constraints for defining readl
in an efficient way to a future version of gcc? That wouldn't help us
in the near future, but we could at some points use those in a number
of places.

Arnd

2011-02-02 17:46:17

by Joseph Myers

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

On Wed, 2 Feb 2011, Richard Guenther wrote:

> The pointer conversions already invoke undefined behavior as specified by the
> C standard (6.3.2.3/7).

I would say: the conversions are undefined if the pointer is
insufficiently aligned for any of the pointer types involved (source,
destination or intermediate), where the appropriate alignment for a packed
type is 1. Thus, the conversion from packed to non-packed is OK iff the
pointer target is sufficiently aligned for the non-packed type.

In general from a sequence of casts the compiler is permitted to deduce
that the pointer is sufficiently aligned for whatever type in the sequence
has the greatest alignment requirement (the middle-end may not have that
information at present, but the front end could insert some form of
alignment assertion if useful for optimization). *But* that is what is
permitted in standards terms; it is not necessarily safe in practice. In
particular, on non-strict-alignment targets such as x86 people do in
practice assume that unaligned accesses are OK at the C level, not just
the assembly level (glibc does so, for example), so it might be a bad idea
to assume alignment in a way that would cause that to break.

--
Joseph S. Myers
[email protected]

2011-02-02 19:14:54

by Ian Lance Taylor

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

Arnd Bergmann <[email protected]> writes:

> On Wednesday 02 February 2011 17:37:02 Russell King - ARM Linux wrote:
>> We used to use inline assembly at one point, but that got chucked out.
>> The problem is that using asm() for this causes GCC to generate horrid
>> code.
>>
>> 1. there's no way to tell GCC that the inline assembly is a load
>> instruction and therefore it needs to schedule the following
>> instructions appropriately.
>>
>> 2. GCC will needlessly reload pointers from structures and other such
>> behaviour because it can't be told clearly what the inline assembly
>> is doing, so the inline asm needs to have a "memory" clobber.
>>
>> 3. It seems to misses out using the pre-index addressing, prefering to
>> create add/sub instructions prior to each inline assembly load/store.
>>
>> 4. There are no (documented) constraints in GCC to allow you to represent
>> the offset format for the half-word instructions.
>>
>> Overall, it means greater register pressure, more instructions, larger
>> functions, greater instruction cache pressure, etc.
>
> Another solution would be to declare the readl function extern and define
> it out of line, but I assume that this would be at least as bad as an
> inline assembly for all the points you brought up, right?
>
> Would it be possible to add the proper constraints for defining readl
> in an efficient way to a future version of gcc? That wouldn't help us
> in the near future, but we could at some points use those in a number
> of places.

I think it would be quite difficult to implement item 1 above in a way
that was actually usable. It would require some way to describe the
scheduling requirements of an asm. But the details of scheduling are
backend specific. Internally there are define_insn_reservation
structures which have names, but the names are processor specific which
is not what you want in source code (by processor specific I mean
specific to particular CPUs within a family). There are define_cpu_unit
structures which also have names, but are again processor specific.
What you want here is some non-processor-specific way to describe the
characteristics of an instruction. gcc does not have that today.

Even if somebody implemented all that, most inline asms are not a single
instructions and thus would find it difficult to take advantage of it.
I don't see this as paying off in the long run.

A more likely payoff would be to develop builtin functions for whatever
functionality is required that can not expressed in source code.

Item 2 above can be done. It is possible to describe precisely which
areas of memory are clobbered.

Item 3 above seems impossible to me. There is no way to combine
compiler generated instructions with user written inline asm such that
pre-index addressing can be used. Perhaps I misunderstand.

Item 4 can be implemented; please consider opening a feature request in
bugzilla.

Ian

2011-02-02 21:37:57

by David Miller

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

From: Russell King - ARM Linux <[email protected]>
Date: Wed, 2 Feb 2011 16:37:02 +0000

> 1. there's no way to tell GCC that the inline assembly is a load
> instruction and therefore it needs to schedule the following
> instructions appropriately.

Just add a dummy '"m" (pointer)' asm input argument to the inline asm
statement. Just make sure "typeof(pointer)" has a size matching the
size of the load your are performing.

> 2. GCC will needlessly reload pointers from structures and other such
> behaviour because it can't be told clearly what the inline assembly
> is doing, so the inline asm needs to have a "memory" clobber.

This behavior is correct, and in fact needed. Writing to chip registers
can trigger changes to arbitrary main memory locations.

> 3. It seems to misses out using the pre-index addressing, prefering to
> create add/sub instructions prior to each inline assembly load/store.

Yes, this is indeed a problem.

But you really need that memory clobber there whether you like it or
not, see above.

2011-02-02 21:45:57

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

On Wed, Feb 02, 2011 at 01:38:31PM -0800, David Miller wrote:
> From: Russell King - ARM Linux <[email protected]>
> Date: Wed, 2 Feb 2011 16:37:02 +0000
>
> > 1. there's no way to tell GCC that the inline assembly is a load
> > instruction and therefore it needs to schedule the following
> > instructions appropriately.
>
> Just add a dummy '"m" (pointer)' asm input argument to the inline asm
> statement. Just make sure "typeof(pointer)" has a size matching the
> size of the load your are performing.

That involves this problematical cast from a packed struct pointer to
an unsigned long pointer, which according to the C standard and GCC
folk is undefined.

> > 2. GCC will needlessly reload pointers from structures and other such
> > behaviour because it can't be told clearly what the inline assembly
> > is doing, so the inline asm needs to have a "memory" clobber.
>
> This behavior is correct, and in fact needed. Writing to chip registers
> can trigger changes to arbitrary main memory locations.

That is really not an argument which stands up to analysis.

When does main memory locations change as a result of a write to a chip
register? The answer is: when DMA is performed - which could be
many microseconds or even milliseconds after you've written the
register, which would be long after you've exited the function doing
the writing.

Not only that, but we have the DMA API to deal with the implications of
that. On ARM, that's a function call, and GCC can't make any assumptions
about memory contents across function calls where it doesn't know what
the function does.

Practice over the last 15 years on ARM has also shown that this is not
necessary.

2011-02-02 21:59:20

by David Miller

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

From: Russell King - ARM Linux <[email protected]>
Date: Wed, 2 Feb 2011 21:45:22 +0000

> On Wed, Feb 02, 2011 at 01:38:31PM -0800, David Miller wrote:
>> From: Russell King - ARM Linux <[email protected]>
>> Date: Wed, 2 Feb 2011 16:37:02 +0000
>>
>> > 1. there's no way to tell GCC that the inline assembly is a load
>> > instruction and therefore it needs to schedule the following
>> > instructions appropriately.
>>
>> Just add a dummy '"m" (pointer)' asm input argument to the inline asm
>> statement. Just make sure "typeof(pointer)" has a size matching the
>> size of the load your are performing.
>
> That involves this problematical cast from a packed struct pointer to
> an unsigned long pointer, which according to the C standard and GCC
> folk is undefined.

It's alignment may be undefined, but it's size definitely is well
defined and that's what matters here.

> Practice over the last 15 years on ARM has also shown that this is not
> necessary.

Sorry oh big super man, little ole' me is only a kernel newbie.

2011-02-02 23:08:22

by Måns Rullgård

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

David Miller <[email protected]> writes:

> From: Russell King - ARM Linux <[email protected]>
> Date: Wed, 2 Feb 2011 16:37:02 +0000
>
>> 1. there's no way to tell GCC that the inline assembly is a load
>> instruction and therefore it needs to schedule the following
>> instructions appropriately.
>
> Just add a dummy '"m" (pointer)' asm input argument to the inline asm
> statement. Just make sure "typeof(pointer)" has a size matching the
> size of the load your are performing.

That should be "m"(*pointer).

>> 2. GCC will needlessly reload pointers from structures and other such
>> behaviour because it can't be told clearly what the inline assembly
>> is doing, so the inline asm needs to have a "memory" clobber.
>
> This behavior is correct, and in fact needed. Writing to chip registers
> can trigger changes to arbitrary main memory locations.
>
>> 3. It seems to misses out using the pre-index addressing, prefering to
>> create add/sub instructions prior to each inline assembly load/store.
>
> Yes, this is indeed a problem.

GCC has trouble doing anything more complicated than simple indexing.
Load/store instructions with writeback seem not to be in its
vocabulary at all.

> But you really need that memory clobber there whether you like it or
> not, see above.

I don't know of any device where the side-effects are not explicitly
indicated by other means in the code triggering them, so it probably
is safe without the clobber as Russel says.

--
M?ns Rullg?rd
[email protected]

2011-02-02 23:23:16

by David Miller

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

From: M?ns Rullg?rd <[email protected]>
Date: Wed, 02 Feb 2011 23:08:01 +0000

> David Miller <[email protected]> writes:
>
>> From: Russell King - ARM Linux <[email protected]>
>> Date: Wed, 2 Feb 2011 16:37:02 +0000
>>
>>> 1. there's no way to tell GCC that the inline assembly is a load
>>> instruction and therefore it needs to schedule the following
>>> instructions appropriately.
>>
>> Just add a dummy '"m" (pointer)' asm input argument to the inline asm
>> statement. Just make sure "typeof(pointer)" has a size matching the
>> size of the load your are performing.
>
> That should be "m"(*pointer).

Right, thanks for the correction.

>> But you really need that memory clobber there whether you like it or
>> not, see above.
>
> I don't know of any device where the side-effects are not explicitly
> indicated by other means in the code triggering them, so it probably
> is safe without the clobber as Russel says.

You're probably right.

2011-02-03 09:27:20

by Arnd Bergmann

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

On Thursday 03 February 2011 00:08:01 M?ns Rullg?rd wrote:
> > But you really need that memory clobber there whether you like it or
> > not, see above.
>
> I don't know of any device where the side-effects are not explicitly
> indicated by other means in the code triggering them, so it probably
> is safe without the clobber as Russel says.

On configurations that have CONFIG_ARM_DMA_MEM_BUFFERABLE set, this is
definitely true, since they use the rmb() and wmb() that include
both an IO memory barrier instruction where required and a compiler barrier
(i.e. __asm__ __volatile__ ("" : : : "memory")):

8<-------------- from arch/arm/include/asm/io.h

#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
#define __iormb() rmb()
#define __iowmb() wmb()
#else
#define __iormb() do { } while (0)
#define __iowmb() do { } while (0)
#endif

#define readb(c) ({ u8 __v = readb_relaxed(c); __iormb(); __v; })
#define readw(c) ({ u16 __v = readw_relaxed(c); __iormb(); __v; })
#define readl(c) ({ u32 __v = readl_relaxed(c); __iormb(); __v; })

#define writeb(v,c) ({ __iowmb(); writeb_relaxed(v,c); })
#define writew(v,c) ({ __iowmb(); writew_relaxed(v,c); })
#define writel(v,c) ({ __iowmb(); writel_relaxed(v,c); })

8<---------------

Also, as Russell mentioned, anything using the streaming DMA mapping API
is fine because of the barriers included in the function calls there.

However, I would think that this fictional piece of code would be valid
for a possible PCI device driver (though inefficient) and not require
any additional synchronizing operations:

void foo_perform_operation(struct foo_dev *d, u32 in, u32 *out)
{
dma_addr_t dma_addr;
u32 *cpu_addr;

/*
* get memory from the consistent DMA mapping API, typically
* uncached memory on ARM, but could be anywhere if the DMA
* is coherent.
*/
cpu_addr = dma_alloc_coherent(&d->dev, sizeof (*cpu_addr),
&dma_addr, GFP_KERNEL);

/* lock the device, not required for the example, but normally
* needed in practice for SMP operation.
*/
spin_lock(&d->lock);

/* initialize the DMA data */
*cpu_addr = in;

/*
* send a posted 32 bit write to the device, triggering the
* start of the DMA read from *cpu_addr, which is followed by
* a DMA write to *cpu_addr. writel includes a barrier that
* makes sure that the previous store to *cpu_addr is visible
* to the DMA, but does not block until the completion like
* outl() would.
*/
writel(dma_addr, d->mmio_addr);

/*
* synchronize the outbound posted write, wait for the device
* to complete the DMA and synchronize the DMA data on its
* inbound path.
*/
(void)readl(d->mmio_addr);

/*
* *cpu_addr contains data just written to by the device, and
* the readl includes all the necessary barriers to ensure
* it's really there when we access it.
*/
*out = *cpu_addr;

/* unlock the device */
spin_unlock(&d->lock);

/* free the DMA memory */
dma_free_coherent(&d->dev, sizeof (*cpu_addr), cpu_addr, dma_addr);
}

However, when readl contains no explicit or implicit synchronization, the
load from *cpu_addr might get pulled in front of the load from mmio_addr,
resulting in invalid output data. If this is the case, it is be a problem
on almost all architectures (not x86, powerpc or sparc64).

Am I missing something here?

Arnd

2011-02-03 15:04:15

by Arnd Bergmann

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

On Wednesday 02 February 2011, Russell King - ARM Linux wrote:
> On Wed, Feb 02, 2011 at 05:00:20PM +0100, Arnd Bergmann wrote:
> > I would suggest fixing this by:
> >
> > 2. Changing the ARM MMIO functions to use inline assembly instead of
> > direct pointer dereference.
>
> We used to use inline assembly at one point, but that got chucked out.
> The problem is that using asm() for this causes GCC to generate horrid
> code.

Here is an alternative suggestion, would that work?

8<-----------
arm: avoid unaligned arguments to MMIO functions

The readw/writew/readl/writel functions require aligned naturally arguments
which are accessed only using atomic load and store instructions, never
using byte or read-modify-write write accesses.

At least one driver (ehci-hcd) annotates its MMIO registers as
__attribute__((packed)), which causes some versions of gcc to generate
byte accesses due to an undefined behavior when casting a packed u32
to an aligned u32 variable.

There does not seem to be an efficient way to force gcc to do word
accesses, but we can detect the problem and refuse to build broken
code: This adds a check in these functions to ensure that their arguments
are either naturally aligned or they are void pointers.

Signed-off-by: Arnd Bergmann <[email protected]>
---

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index 20e0f7c..b98f0bc 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -180,17 +180,36 @@ extern void _memset_io(volatile void __iomem *, int, size_t);
* IO port primitives for more information.
*/
#ifdef __mem_pci
+
+/*
+ * Ensure natural alignment of arguments:
+ * It is not allowed to pass a pointer to a packed variable into
+ * the readl/writel family of functions, because gcc may decide
+ * to create byte accesses that are illegal on multi-byte MMIO
+ * registers.
+ * A lot of code uses void pointers, which is fine.
+ */
+#define __checkalign(p) BUILD_BUG_ON( \
+ !__builtin_types_compatible_p(__typeof__(p), void *) && \
+ (__alignof__ (*(p)) < sizeof (*(p))))
+
#define readb_relaxed(c) ({ u8 __v = __raw_readb(__mem_pci(c)); __v; })
-#define readw_relaxed(c) ({ u16 __v = le16_to_cpu((__force __le16) \
- __raw_readw(__mem_pci(c))); __v; })
-#define readl_relaxed(c) ({ u32 __v = le32_to_cpu((__force __le32) \
- __raw_readl(__mem_pci(c))); __v; })
+#define readw_relaxed(c) ({ u16 __v = le16_to_cpu((__force __le16) \
+ __raw_readw(__mem_pci(c))); \
+ __checkalign(c); __v; })
+#define readl_relaxed(c) ({ u32 __v = le32_to_cpu((__force __le32) \
+ __raw_readl(__mem_pci(c))); \
+ __checkalign(c); __v; })

#define writeb_relaxed(v,c) ((void)__raw_writeb(v,__mem_pci(c)))
-#define writew_relaxed(v,c) ((void)__raw_writew((__force u16) \
- cpu_to_le16(v),__mem_pci(c)))
-#define writel_relaxed(v,c) ((void)__raw_writel((__force u32) \
- cpu_to_le32(v),__mem_pci(c)))
+#define writew_relaxed(v,c) do { (void)__raw_writew((__force u16) \
+ cpu_to_le16(v),__mem_pci(c)); \
+ __checkalign(c); \
+ } while (0);
+#define writel_relaxed(v,c) do { (void)__raw_writel((__force u32) \
+ cpu_to_le32(v),__mem_pci(c)); \
+ __checkalign(c); \
+ } while (0);

#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
#define __iormb() rmb()

2011-04-26 15:01:34

by Rabin Vincent

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

On Wed, Feb 2, 2011 at 21:30, Arnd Bergmann <[email protected]> wrote:
> As noticed by Peter Maydell, the EHCI device driver in Linux gets
> miscompiled by some versions of arm-gcc (still need to find out which)
> due to a combination of problems:
>
> 1. In include/linux/usb/ehci_def.h, struct ehci_caps is defined
> with __attribute__((packed)), for no good reason. This is clearly
> a bug and needs to get fixed, but other drivers have the same bug

Was a patch submitted for this? I couldn't find it in the archives.
U-Boot seems to be fixing this by adding an "aligned(4)" instead
of removing the packed:

http://www.mail-archive.com/[email protected]/msg51418.html

> and it used to work. The attribute forces byte access on all members
> accessed through pointer dereference, which is not allowed on
> MMIO accesses in general. The specific code triggering the problem
> in Peter's case is in ehci-omap.c:
> ? ? ? ?omap->ehci->regs = hcd->regs
> ? ? ? ? ? ? ? ?+ HC_LENGTH(readl(&omap->ehci->caps->hc_capbase));

In my case it's this writel() in ehci-hub.c that gets chopped into
strbs:

/* force reset to complete */
ehci_writel(ehci, temp & ~(PORT_RWC_BITS | PORT_RESET),
status_reg);

2011-04-26 18:51:04

by Alan Stern

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

On Tue, 26 Apr 2011, Rabin Vincent wrote:

> On Wed, Feb 2, 2011 at 21:30, Arnd Bergmann <[email protected]> wrote:
> > As noticed by Peter Maydell, the EHCI device driver in Linux gets
> > miscompiled by some versions of arm-gcc (still need to find out which)
> > due to a combination of problems:
> >
> > 1. In include/linux/usb/ehci_def.h, struct ehci_caps is defined
> > with __attribute__((packed)), for no good reason. This is clearly
> > a bug and needs to get fixed, but other drivers have the same bug
>
> Was a patch submitted for this? I couldn't find it in the archives.
> U-Boot seems to be fixing this by adding an "aligned(4)" instead
> of removing the packed:
>
> http://www.mail-archive.com/[email protected]/msg51418.html

ISTR a patch was submitted, but apparently it never got picked up.

> > and it used to work. The attribute forces byte access on all members
> > accessed through pointer dereference, which is not allowed on
> > MMIO accesses in general. The specific code triggering the problem
> > in Peter's case is in ehci-omap.c:
> > ? ? ? ?omap->ehci->regs = hcd->regs
> > ? ? ? ? ? ? ? ?+ HC_LENGTH(readl(&omap->ehci->caps->hc_capbase));
>
> In my case it's this writel() in ehci-hub.c that gets chopped into
> strbs:
>
> /* force reset to complete */
> ehci_writel(ehci, temp & ~(PORT_RWC_BITS | PORT_RESET),
> status_reg);

Why would that get messed up? The status_reg variable doesn't have any
__atribute__((packed)) associated with it.

Alan Stern

2011-04-27 14:06:45

by Rabin Vincent

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

On Wed, Apr 27, 2011 at 00:21, Alan Stern <[email protected]> wrote:
> On Tue, 26 Apr 2011, Rabin Vincent wrote:
>> In my case it's this writel() in ehci-hub.c that gets chopped into
>> strbs:
>>
>> ? ? ? /* force reset to complete */
>> ? ? ? ehci_writel(ehci, temp & ~(PORT_RWC_BITS | PORT_RESET),
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? status_reg);
>
> Why would that get messed up? ?The status_reg variable doesn't have any
> __atribute__((packed)) associated with it.

The initialization of status_reg is:

u32 __iomem *status_reg
= &ehci->regs->port_status[(wIndex & 0xff) - 1];

where ehci->regs is a pointer to the packed struct ehci_regs. So, this
is the same problem of casting pointers to stricter alignment.

2011-04-27 16:25:43

by Alan Stern

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

On Wed, 27 Apr 2011, Rabin Vincent wrote:

> On Wed, Apr 27, 2011 at 00:21, Alan Stern <[email protected]> wrote:
> > On Tue, 26 Apr 2011, Rabin Vincent wrote:
> >> In my case it's this writel() in ehci-hub.c that gets chopped into
> >> strbs:
> >>
> >> ? ? ? /* force reset to complete */
> >> ? ? ? ehci_writel(ehci, temp & ~(PORT_RWC_BITS | PORT_RESET),
> >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? status_reg);
> >
> > Why would that get messed up? ?The status_reg variable doesn't have any
> > __atribute__((packed)) associated with it.
>
> The initialization of status_reg is:
>
> u32 __iomem *status_reg
> = &ehci->regs->port_status[(wIndex & 0xff) - 1];
>
> where ehci->regs is a pointer to the packed struct ehci_regs. So, this
> is the same problem of casting pointers to stricter alignment.

Right. I can understand the compiler complaining about the cast to
stricter alignment during the initialization. But I don't understand
why that would affect the code generated for the writel function.

Alan Stern

2011-04-27 16:39:12

by Arnd Bergmann

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

On Wednesday 27 April 2011 18:25:40 Alan Stern wrote:
> On Wed, 27 Apr 2011, Rabin Vincent wrote:
>
> > On Wed, Apr 27, 2011 at 00:21, Alan Stern <[email protected]> wrote:
> > > On Tue, 26 Apr 2011, Rabin Vincent wrote:
> > >> In my case it's this writel() in ehci-hub.c that gets chopped into
> > >> strbs:
> > >>
> > >> � � � /* force reset to complete */
> > >> � � � ehci_writel(ehci, temp & ~(PORT_RWC_BITS | PORT_RESET),
> > >> � � � � � � � � � � � � � � � status_reg);
> > >
> > > Why would that get messed up? �The status_reg variable doesn't have any
> > > __atribute__((packed)) associated with it.
> >
> > The initialization of status_reg is:
> >
> > u32 __iomem *status_reg
> > = &ehci->regs->port_status[(wIndex & 0xff) - 1];
> >
> > where ehci->regs is a pointer to the packed struct ehci_regs. So, this
> > is the same problem of casting pointers to stricter alignment.
>
> Right. I can understand the compiler complaining about the cast to
> stricter alignment during the initialization. But I don't understand
> why that would affect the code generated for the writel function.

The compiler does not complain, it just silently assumes that it needs
to do byte accesses. There is no way to tell the compiler to ignore
what it knows about the alignment, other than using inline assembly
for the actual pointer dereference. Most architectures today do that,
but on ARM it comes down to "*(u32 *)status_reg = temp".

Arnd

2011-04-28 13:35:29

by Alan Stern

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

On Wed, 27 Apr 2011, Arnd Bergmann wrote:

> On Wednesday 27 April 2011 18:25:40 Alan Stern wrote:
> > On Wed, 27 Apr 2011, Rabin Vincent wrote:
> >
> > > On Wed, Apr 27, 2011 at 00:21, Alan Stern <[email protected]> wrote:
> > > > On Tue, 26 Apr 2011, Rabin Vincent wrote:
> > > >> In my case it's this writel() in ehci-hub.c that gets chopped into
> > > >> strbs:
> > > >>
> > > >> � � � /* force reset to complete */
> > > >> � � � ehci_writel(ehci, temp & ~(PORT_RWC_BITS | PORT_RESET),
> > > >> � � � � � � � � � � � � � � � status_reg);
> > > >
> > > > Why would that get messed up? �The status_reg variable doesn't have any
> > > > __atribute__((packed)) associated with it.
> > >
> > > The initialization of status_reg is:
> > >
> > > u32 __iomem *status_reg
> > > = &ehci->regs->port_status[(wIndex & 0xff) - 1];
> > >
> > > where ehci->regs is a pointer to the packed struct ehci_regs. So, this
> > > is the same problem of casting pointers to stricter alignment.
> >
> > Right. I can understand the compiler complaining about the cast to
> > stricter alignment during the initialization. But I don't understand
> > why that would affect the code generated for the writel function.
>
> The compiler does not complain, it just silently assumes that it needs
> to do byte accesses. There is no way to tell the compiler to ignore
> what it knows about the alignment, other than using inline assembly
> for the actual pointer dereference. Most architectures today do that,
> but on ARM it comes down to "*(u32 *)status_reg = temp".

Ah -- so the compiler associates the alignment attribute with the data
value and not with the variable's type? I didn't know that.

Alan Stern

2011-04-28 14:18:07

by Arnd Bergmann

[permalink] [raw]
Subject: Re: ARM unaligned MMIO access with attribute((packed))

On Thursday 28 April 2011, Alan Stern wrote:
> > The compiler does not complain, it just silently assumes that it needs
> > to do byte accesses. There is no way to tell the compiler to ignore
> > what it knows about the alignment, other than using inline assembly
> > for the actual pointer dereference. Most architectures today do that,
> > but on ARM it comes down to "*(u32 *)status_reg = temp".
>
> Ah -- so the compiler associates the alignment attribute with the data
> value and not with the variable's type? I didn't know that.

The behavior here is unspecified because the underlying typecase
is not valid. Gcc apparently uses some heuristics trying to do
the right thing, and in recent versions that heuristic seems to
have changed.

Arnd