2010-08-10 20:36:27

by Krzysztof Halasa

[permalink] [raw]
Subject: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

Hi,

Noticed this problem a bit late :-(

6fee48cd330c68332f9712bc968d934a1a84a32a broke
pci_set_consistent_dma_mask() on IXP4xx and most probably PXA. Affected
devices are e.g. IDE controller (CS5536-based: disk inaccessible) and
e1000 ethernet ("Detected Tx Unit Hang").

The attached patch makes it work again, though I'm not sure it's the
best solution.

Ideas?

Signed-off-by: Krzysztof HaƂasa <[email protected]>

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index ca32ed7..bd2a7d3 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -129,6 +129,14 @@ static inline u64 dma_get_mask(struct device *dev)

static inline int dma_set_coherent_mask(struct device *dev, u64 mask)
{
+#ifdef CONFIG_DMABOUNCE
+ if (dev->archdata.dmabounce) {
+ if (mask >= ISA_DMA_THRESHOLD)
+ return 0;
+ else
+ return -EIO;
+ }
+#endif
if (!dma_supported(dev, mask))
return -EIO;
dev->coherent_dma_mask = mask;


2010-08-11 02:06:25

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Tue, 10 Aug 2010 22:36:21 +0200
Krzysztof Halasa <[email protected]> wrote:

> 6fee48cd330c68332f9712bc968d934a1a84a32a broke
> pci_set_consistent_dma_mask() on IXP4xx and most probably PXA. Affected
> devices are e.g. IDE controller (CS5536-based: disk inaccessible) and
> e1000 ethernet ("Detected Tx Unit Hang").

Sorry about that.

> The attached patch makes it work again, though I'm not sure it's the
> best solution.

I think that we should avoid adding "#ifdef CONFIG_DMABOUNCE" to a
generic place.

Why the above patch breaks dmabounce.c? We can't set dev->coherent_dma_mask?


> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index ca32ed7..bd2a7d3 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -129,6 +129,14 @@ static inline u64 dma_get_mask(struct device *dev)
>
> static inline int dma_set_coherent_mask(struct device *dev, u64 mask)
> {
> +#ifdef CONFIG_DMABOUNCE
> + if (dev->archdata.dmabounce) {
> + if (mask >= ISA_DMA_THRESHOLD)
> + return 0;
> + else
> + return -EIO;
> + }
> +#endif
> if (!dma_supported(dev, mask))
> return -EIO;
> dev->coherent_dma_mask = mask;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2010-08-11 07:25:55

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Wed, Aug 11, 2010 at 11:06:00AM +0900, FUJITA Tomonori wrote:
> On Tue, 10 Aug 2010 22:36:21 +0200
> Krzysztof Halasa <[email protected]> wrote:
>
> > 6fee48cd330c68332f9712bc968d934a1a84a32a broke
> > pci_set_consistent_dma_mask() on IXP4xx and most probably PXA. Affected
> > devices are e.g. IDE controller (CS5536-based: disk inaccessible) and
> > e1000 ethernet ("Detected Tx Unit Hang").
>
> Sorry about that.
>
> > The attached patch makes it work again, though I'm not sure it's the
> > best solution.
>
> I think that we should avoid adding "#ifdef CONFIG_DMABOUNCE" to a
> generic place.
>
> Why the above patch breaks dmabounce.c? We can't set dev->coherent_dma_mask?

It doesn't break dmabounce.

What it breaks is the fact that a PCI device which can do 32-bit DMA is
connected to a PCI bus which can only access the first 64MB of memory
through the host bridge, but the system has more than 64MB available.

Allowing a 32-bit DMA mask means that dmabounce can't detect that memory
above 64MB needs to be bounced to memory below the 64MB boundary.

2010-08-13 06:24:27

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Wed, 11 Aug 2010 08:25:32 +0100
Russell King - ARM Linux <[email protected]> wrote:

> On Wed, Aug 11, 2010 at 11:06:00AM +0900, FUJITA Tomonori wrote:
> > On Tue, 10 Aug 2010 22:36:21 +0200
> > Krzysztof Halasa <[email protected]> wrote:
> >
> > > 6fee48cd330c68332f9712bc968d934a1a84a32a broke
> > > pci_set_consistent_dma_mask() on IXP4xx and most probably PXA. Affected
> > > devices are e.g. IDE controller (CS5536-based: disk inaccessible) and
> > > e1000 ethernet ("Detected Tx Unit Hang").
> >
> > Sorry about that.
> >
> > > The attached patch makes it work again, though I'm not sure it's the
> > > best solution.
> >
> > I think that we should avoid adding "#ifdef CONFIG_DMABOUNCE" to a
> > generic place.
> >
> > Why the above patch breaks dmabounce.c? We can't set dev->coherent_dma_mask?
>
> It doesn't break dmabounce.
>
> What it breaks is the fact that a PCI device which can do 32-bit DMA is
> connected to a PCI bus which can only access the first 64MB of memory
> through the host bridge, but the system has more than 64MB available.
>
> Allowing a 32-bit DMA mask means that dmabounce can't detect that memory
> above 64MB needs to be bounced to memory below the 64MB boundary.

But dmabounce doesn't look at dev->coherent_dma_mask.

The change breaks __dma_alloc_buffer()? If we set dev->coherent_dma_mask
to DMA_BIT_MASK(32) for ixp4xx's pci devices, __dma_alloc_buffer()
doesn't use GFP_DMA.

2010-08-13 21:54:33

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Fri, Aug 13, 2010 at 03:23:53PM +0900, FUJITA Tomonori wrote:
> On Wed, 11 Aug 2010 08:25:32 +0100
> Russell King - ARM Linux <[email protected]> wrote:
> > It doesn't break dmabounce.
> >
> > What it breaks is the fact that a PCI device which can do 32-bit DMA is
> > connected to a PCI bus which can only access the first 64MB of memory
> > through the host bridge, but the system has more than 64MB available.
> >
> > Allowing a 32-bit DMA mask means that dmabounce can't detect that memory
> > above 64MB needs to be bounced to memory below the 64MB boundary.
>
> But dmabounce doesn't look at dev->coherent_dma_mask.
>
> The change breaks __dma_alloc_buffer()? If we set dev->coherent_dma_mask
> to DMA_BIT_MASK(32) for ixp4xx's pci devices, __dma_alloc_buffer()
> doesn't use GFP_DMA.

With an incorrect coherent_dma_mask, dma_alloc_coherent() will return
memory outside of the 64MB window. This means that when dmabounce comes
to allocate the replacement buffer, it gets a buffer which won't be
accessible to the DMA controller - which is a condition it doesn't check
for.

Ergo, it breaks dmabounce.

2010-08-14 09:31:16

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Fri, 13 Aug 2010 22:54:13 +0100
Russell King - ARM Linux <[email protected]> wrote:

> On Fri, Aug 13, 2010 at 03:23:53PM +0900, FUJITA Tomonori wrote:
> > On Wed, 11 Aug 2010 08:25:32 +0100
> > Russell King - ARM Linux <[email protected]> wrote:
> > > It doesn't break dmabounce.
> > >
> > > What it breaks is the fact that a PCI device which can do 32-bit DMA is
> > > connected to a PCI bus which can only access the first 64MB of memory
> > > through the host bridge, but the system has more than 64MB available.
> > >
> > > Allowing a 32-bit DMA mask means that dmabounce can't detect that memory
> > > above 64MB needs to be bounced to memory below the 64MB boundary.
> >
> > But dmabounce doesn't look at dev->coherent_dma_mask.
> >
> > The change breaks __dma_alloc_buffer()? If we set dev->coherent_dma_mask
> > to DMA_BIT_MASK(32) for ixp4xx's pci devices, __dma_alloc_buffer()
> > doesn't use GFP_DMA.
>
> With an incorrect coherent_dma_mask, dma_alloc_coherent() will return
> memory outside of the 64MB window.

Yeah, that's what I wrote above, I think.


> This means that when dmabounce comes to allocate the replacement
> buffer, it gets a buffer which won't be accessible to the DMA
> controller

Really? looks like dmabounce does nothing for coherent memory that
dma_alloc_coherent() allocates.

The following very hacky patch works?

Or we could introduce something like ARCH_HAS_DMA_SET_COHERENT_MASK to
let architectures to have dma_set_coherent_mask.

A long solution would be having two dma_mask for a device and a
bus. We also need something to represent a DMA-capable range instead
of the dma mask.

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index c704eed..2a3fc2e 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -77,6 +77,11 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
if (mask < 0xffffffffULL)
gfp |= GFP_DMA;

+#ifdef CONFIG_DMABOUNCE
+ if (dev->archdata.dmabounce)
+ gfp |= GFP_DMA;
+#endif
+
page = alloc_pages(gfp, order);
if (!page)
return NULL;

2010-08-14 18:46:20

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Sat, Aug 14, 2010 at 06:30:37PM +0900, FUJITA Tomonori wrote:
> On Fri, 13 Aug 2010 22:54:13 +0100
> Russell King - ARM Linux <[email protected]> wrote:
> > This means that when dmabounce comes to allocate the replacement
> > buffer, it gets a buffer which won't be accessible to the DMA
> > controller
>
> Really? looks like dmabounce does nothing for coherent memory that
> dma_alloc_coherent() allocates.
>
> The following very hacky patch works?

So what happens if you use a driver which uses dma_alloc_coherent()
directly? Should the driver really be passed memory which is
inaccessible to the device because its outside the host bridge PCI
window?

No, this is clearly wrong, so this patch doesn't fix anything. It's
a bodge, nothing more. The real solution is to have _both_ masks
both reduced down according to the host bridge, as we used to do.

So I suggest 6fee48c is reverted so that these platforms don't regress
for -rc1.

As I said when you sent the originally patch, it _looked_ okay, but I
don't have any way to test it. It seems from testing (which
unfortunately only seems to only happen after patches hit mainline)
that it is not okay after all.

2010-08-15 05:43:16

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Sat, 14 Aug 2010 19:46:05 +0100
Russell King - ARM Linux <[email protected]> wrote:

> On Sat, Aug 14, 2010 at 06:30:37PM +0900, FUJITA Tomonori wrote:
> > On Fri, 13 Aug 2010 22:54:13 +0100
> > Russell King - ARM Linux <[email protected]> wrote:
> > > This means that when dmabounce comes to allocate the replacement
> > > buffer, it gets a buffer which won't be accessible to the DMA
> > > controller
> >
> > Really? looks like dmabounce does nothing for coherent memory that
> > dma_alloc_coherent() allocates.
> >
> > The following very hacky patch works?
>
> So what happens if you use a driver which uses dma_alloc_coherent()
> directly? Should the driver really be passed memory which is
> inaccessible to the device because its outside the host bridge PCI
> window?

I'm not sure what you mean.

A driver which uses dma_alloc_coherent() directly should
work. dma_alloc_coherent() allocates memory with GFP_DMA with that
patch for dmabounce devices. So the driver gets the access-able
memory.

The memory that dma_alloc_coherent() returns should be always
consistent. We can't bounce it. All we can do is returning a memory
that a device (and its bus) can access to.

Krzysztof, can you try the patch?


> No, this is clearly wrong, so this patch doesn't fix anything. It's
> a bodge, nothing more. The real solution is to have _both_ masks
> both reduced down according to the host bridge, as we used to do.
>
> So I suggest 6fee48c is reverted so that these platforms don't regress
> for -rc1.

Sorry, the original ARM code is wrong. dev->dma_mask and
dev->coherent_dma_mask represent the driver of dma restriction. ARM
tries to use them for both a driver and a bus so ARM needs a
workaround that doesn't set the driver of dma restriction to
dev->dma_mask and dev->coherent_dma_mask.

The proper solution is having a separate dma mask for a bus. I think
that POWERPC already does the similar. It has max_direct_dma_addr in
struct dev_archdata, which represents the dma restriction of a bus.

2010-08-15 15:39:11

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Sun, Aug 15, 2010 at 02:42:51PM +0900, FUJITA Tomonori wrote:
> On Sat, 14 Aug 2010 19:46:05 +0100
> Russell King - ARM Linux <[email protected]> wrote:
>
> > On Sat, Aug 14, 2010 at 06:30:37PM +0900, FUJITA Tomonori wrote:
> > > On Fri, 13 Aug 2010 22:54:13 +0100
> > > Russell King - ARM Linux <[email protected]> wrote:
> > > > This means that when dmabounce comes to allocate the replacement
> > > > buffer, it gets a buffer which won't be accessible to the DMA
> > > > controller
> > >
> > > Really? looks like dmabounce does nothing for coherent memory that
> > > dma_alloc_coherent() allocates.
> > >
> > > The following very hacky patch works?
> >
> > So what happens if you use a driver which uses dma_alloc_coherent()
> > directly? Should the driver really be passed memory which is
> > inaccessible to the device because its outside the host bridge PCI
> > window?
>
> I'm not sure what you mean.
>
> A driver which uses dma_alloc_coherent() directly should
> work. dma_alloc_coherent() allocates memory with GFP_DMA with that
> patch for dmabounce devices. So the driver gets the access-able
> memory.
>
> The memory that dma_alloc_coherent() returns should be always
> consistent. We can't bounce it. All we can do is returning a memory
> that a device (and its bus) can access to.
>
> Krzysztof, can you try the patch?

Why bother when we both agree that the patch is a dirty hack?

Come up with something cleaner first.

2010-08-15 15:56:16

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Sun, 15 Aug 2010 09:23:28 +0100
Russell King - ARM Linux <[email protected]> wrote:

> On Sun, Aug 15, 2010 at 02:42:51PM +0900, FUJITA Tomonori wrote:
> > On Sat, 14 Aug 2010 19:46:05 +0100
> > Russell King - ARM Linux <[email protected]> wrote:
> >
> > > On Sat, Aug 14, 2010 at 06:30:37PM +0900, FUJITA Tomonori wrote:
> > > > On Fri, 13 Aug 2010 22:54:13 +0100
> > > > Russell King - ARM Linux <[email protected]> wrote:
> > > > > This means that when dmabounce comes to allocate the replacement
> > > > > buffer, it gets a buffer which won't be accessible to the DMA
> > > > > controller
> > > >
> > > > Really? looks like dmabounce does nothing for coherent memory that
> > > > dma_alloc_coherent() allocates.
> > > >
> > > > The following very hacky patch works?
> > >
> > > So what happens if you use a driver which uses dma_alloc_coherent()
> > > directly? Should the driver really be passed memory which is
> > > inaccessible to the device because its outside the host bridge PCI
> > > window?
> >
> > I'm not sure what you mean.
> >
> > A driver which uses dma_alloc_coherent() directly should
> > work. dma_alloc_coherent() allocates memory with GFP_DMA with that
> > patch for dmabounce devices. So the driver gets the access-able
> > memory.
> >
> > The memory that dma_alloc_coherent() returns should be always
> > consistent. We can't bounce it. All we can do is returning a memory
> > that a device (and its bus) can access to.
> >
> > Krzysztof, can you try the patch?
>
> Why bother when we both agree that the patch is a dirty hack?
>
> Come up with something cleaner first.

Because this fix needs to go to stable trees too. A simple patch is
better even if it's hacky.

For example, we can unify dma_needs_bounce functions in arm with a
clean solution, I think. But dma_needs_bounce() was changed after
2.6.35 so it would be difficult to backport a clean solution.

btw, will we have more like this case? If so, I think that it's worth
having a generic solution for this case instead of having the arch
(arm and powerpc) specific solution.

2010-08-16 23:29:56

by Krzysztof Halasa

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

Hi,

FUJITA Tomonori <[email protected]> writes:

> A long solution would be having two dma_mask for a device and a
> bus. We also need something to represent a DMA-capable range instead
> of the dma mask.
>
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -77,6 +77,11 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
> if (mask < 0xffffffffULL)
> gfp |= GFP_DMA;
>
> +#ifdef CONFIG_DMABOUNCE
> + if (dev->archdata.dmabounce)
> + gfp |= GFP_DMA;
> +#endif
> +
> page = alloc_pages(gfp, order);
> if (!page)
> return NULL;

This patch fixes the problem on my IXP425.
--
Krzysztof Halasa

2010-08-19 08:52:25

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Tue, 17 Aug 2010 01:29:49 +0200
Krzysztof Halasa <[email protected]> wrote:

> Hi,
>
> FUJITA Tomonori <[email protected]> writes:
>
> > A long solution would be having two dma_mask for a device and a
> > bus. We also need something to represent a DMA-capable range instead
> > of the dma mask.
> >
> > --- a/arch/arm/mm/dma-mapping.c
> > +++ b/arch/arm/mm/dma-mapping.c
> > @@ -77,6 +77,11 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
> > if (mask < 0xffffffffULL)
> > gfp |= GFP_DMA;
> >
> > +#ifdef CONFIG_DMABOUNCE
> > + if (dev->archdata.dmabounce)
> > + gfp |= GFP_DMA;
> > +#endif
> > +
> > page = alloc_pages(gfp, order);
> > if (!page)
> > return NULL;
>
> This patch fixes the problem on my IXP425.

Thanks a lot!

I'll re-send the patch in the proper format. Can you send it to
mainline for 2.6.36?

I'll work on the proper solution for this issue for 2.6.37.

2010-08-19 10:31:52

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Sat, 2010-08-14 at 18:30 +0900, FUJITA Tomonori wrote:
>
> A long solution would be having two dma_mask for a device and a
> bus. We also need something to represent a DMA-capable range instead
> of the dma mask.

I'd rather have the arch (aka the bus) be able to filter the mask,
better than having to deal with multiple masks in the generic code.
Besides, in embedded-land, you never know how many busses are stacked
before you reach the device, ie, you'd end up having to AND quite a few
masks before getting there in some cases.

Sounds better to establish that once, at set_coherent_dma_mask() time.

Cheers,
Ben.

2010-08-19 14:50:37

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Thu, 19 Aug 2010 20:31:22 +1000
Benjamin Herrenschmidt <[email protected]> wrote:

> On Sat, 2010-08-14 at 18:30 +0900, FUJITA Tomonori wrote:
> >
> > A long solution would be having two dma_mask for a device and a
> > bus. We also need something to represent a DMA-capable range instead
> > of the dma mask.
>
> I'd rather have the arch (aka the bus) be able to filter the mask,
> better than having to deal with multiple masks in the generic code.
> Besides, in embedded-land, you never know how many busses are stacked
> before you reach the device, ie, you'd end up having to AND quite a few
> masks before getting there in some cases.

You mean that you like to permit architectures to modify
dev->coherent_dma_mask behind a device? If so, I'm against it because
it means dev->coherent_dma_mask has two meanings. That's confusing.

I don't plan to have the generic code to deal with multiple masks. I
thought about simply moving max_direct_dma_addr in POWERPC's
dev_archdata to a generic place (possibly, struct
device_dma_parameters). I think that having the generic place for bus'
dma mask would be better rather than architecture specific
places. Adding a new API to set bus' dma mask would make sense too.


> Besides, in embedded-land, you never know how many busses are stacked
> before you reach the device, ie, you'd end up having to AND quite a
> few masks before getting there in some cases.
>
> Sounds better to establish that once, at set_coherent_dma_mask() time.

As long as dev->coherent_dma_mask represents the same thing on every
architecture, permitting architectures to have the own
dma_set_coherent_mask() is fine by me. I like to avoid it if possible
though.

2010-08-19 16:53:42

by Krzysztof Halasa

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

FUJITA Tomonori <[email protected]> writes:

>> I'd rather have the arch (aka the bus) be able to filter the mask,
>> better than having to deal with multiple masks in the generic code.
>> Besides, in embedded-land, you never know how many busses are stacked
>> before you reach the device, ie, you'd end up having to AND quite a few
>> masks before getting there in some cases.
>
> You mean that you like to permit architectures to modify
> dev->coherent_dma_mask behind a device? If so, I'm against it because
> it means dev->coherent_dma_mask has two meanings. That's confusing.

Well, I think it may be the only really correct solution, and in fact
it's arch-independent.

The coherent_dma_mask would mean one thing: address space shared between
the CPU(s) and the device.
This usually equals device's address space - only because CPU and
bridges next to it have wide (logical) address busses. It's not always
the case, though, and may be not the case on any arch.

We should make sure we got it right (including drivers), since any
reduction of the dma*mask would be irreversible (new masks would be
ANDed with the existing masks).

> I think that having the generic place for bus'
> dma mask would be better rather than architecture specific
> places.

Definitely, if possible. BTW the dmabounce (and equivalent code on other
archs, including probably swiotlb on x86-64) could probably be merged as
well. I don't know the internals very well, though. At least it may be
worth it looking at them.

> Adding a new API to set bus' dma mask would make sense too.

Not sure. Which bus? There could be many :-)
In practice - 64-bit PCIe -> 32-bit PCI -> 24-bit ISA - etc.
Or, like with IXP/PXA - 26-bit PCI -> 32-bit device.

> As long as dev->coherent_dma_mask represents the same thing on every
> architecture, permitting architectures to have the own
> dma_set_coherent_mask() is fine by me.

That would be ideal. Buses work on all archs the same after all.
--
Krzysztof Halasa

2010-08-19 16:56:58

by Krzysztof Halasa

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

FUJITA Tomonori <[email protected]> writes:

>> > +++ b/arch/arm/mm/dma-mapping.c
>> > @@ -77,6 +77,11 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
>> > if (mask < 0xffffffffULL)
>> > gfp |= GFP_DMA;
>> >
>> > +#ifdef CONFIG_DMABOUNCE
>> > + if (dev->archdata.dmabounce)
>> > + gfp |= GFP_DMA;
>> > +#endif
>> > +
>> > page = alloc_pages(gfp, order);
>> > if (!page)
>> > return NULL;

> I'll re-send the patch in the proper format. Can you send it to
> mainline for 2.6.36?

Guess that's a question for Russell.
--
Krzysztof Halasa

2010-08-19 17:20:58

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Thu, 19 Aug 2010 18:53:38 +0200
Krzysztof Halasa <[email protected]> wrote:

> FUJITA Tomonori <[email protected]> writes:
>
> >> I'd rather have the arch (aka the bus) be able to filter the mask,
> >> better than having to deal with multiple masks in the generic code.
> >> Besides, in embedded-land, you never know how many busses are stacked
> >> before you reach the device, ie, you'd end up having to AND quite a few
> >> masks before getting there in some cases.
> >
> > You mean that you like to permit architectures to modify
> > dev->coherent_dma_mask behind a device? If so, I'm against it because
> > it means dev->coherent_dma_mask has two meanings. That's confusing.
>
> Well, I think it may be the only really correct solution, and in fact
> it's arch-independent.
>
> The coherent_dma_mask would mean one thing: address space shared between
> the CPU(s) and the device.

linux/device.h defines that it's the device dma mask.

Except for ARM, coherent_dma_mask represents the device dma mask.

We sometimes want to know the device dma mask. Moving to your
definition means that we lose that information.


> This usually equals device's address space - only because CPU and
> bridges next to it have wide (logical) address busses. It's not always
> the case, though, and may be not the case on any arch.
>
> We should make sure we got it right (including drivers), since any
> reduction of the dma*mask would be irreversible (new masks would be
> ANDed with the existing masks).
>
> > I think that having the generic place for bus'
> > dma mask would be better rather than architecture specific
> > places.
>
> Definitely, if possible. BTW the dmabounce (and equivalent code on other
> archs, including probably swiotlb on x86-64) could probably be merged as
> well. I don't know the internals very well, though. At least it may be
> worth it looking at them.

btw, swiotlb is used by x86_64, ia64, and powerpc.

I'm sure that I can convert ARM to use it instead of dmabounce. But
well, even a bug fix patch for dmabounce haven't been merged so I'm
not sure that ARM people would accept such change.

http://marc.info/?l=linux-arm-kernel&m=128047064008554&w=2


> > Adding a new API to set bus' dma mask would make sense too.
>
> Not sure. Which bus? There could be many :-)
> In practice - 64-bit PCIe -> 32-bit PCI -> 24-bit ISA - etc.
> Or, like with IXP/PXA - 26-bit PCI -> 32-bit device.

Seems that we are not on the same page. As I said before, have you
seen how POWERPC uses max_direct_dma_addr in dev_archdata struct? I
was talking about moving it to the generic place and the API to set
it.

2010-08-19 21:52:26

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Thu, 2010-08-19 at 23:50 +0900, FUJITA Tomonori wrote:
>
> You mean that you like to permit architectures to modify
> dev->coherent_dma_mask behind a device? If so, I'm against it because
> it means dev->coherent_dma_mask has two meanings. That's confusing.

No it's not. It has one and only one meaning which is the mask defining
where the coherent memory can come from for that device. Nobody cares if
the device can do more and has been "clipped" at set_coherent_dma_mask()
time by the bus. This is not useful information.

So I beleive the arch should hook the later and modify the mask as it
gets applied -once-.

> I don't plan to have the generic code to deal with multiple masks. I
> thought about simply moving max_direct_dma_addr in POWERPC's
> dev_archdata to a generic place (possibly, struct
> device_dma_parameters). I think that having the generic place for bus'
> dma mask would be better rather than architecture specific
> places. Adding a new API to set bus' dma mask would make sense too.

Well, max_direct_dma_addr used on powerpc is a bit nasty because it
doesn't necessarily represent a power of 2, so a mask is no good here
indeed.

> > Besides, in embedded-land, you never know how many busses are
> stacked
> > before you reach the device, ie, you'd end up having to AND quite a
> > few masks before getting there in some cases.
> >
> > Sounds better to establish that once, at set_coherent_dma_mask()
> time.
>
> As long as dev->coherent_dma_mask represents the same thing on every
> architecture, permitting architectures to have the own
> dma_set_coherent_mask() is fine by me. I like to avoid it if possible
> though.

Ben.

2010-08-19 21:54:58

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)


> linux/device.h defines that it's the device dma mask.
>
> Except for ARM, coherent_dma_mask represents the device dma mask.

That's just verbiage in a header file. Nobody cares. The point is, the
device DMA mask per-se is a completely useless piece of information, and
thus there is absolutely no point keeping it around there.

The only thing that matters is the intersection of all the masks on the
way to memory, which defines where memory can be allocated.

Now the mask thing itself might end up not being enough for ARM in the
long run, I wouldn't be surprised if we end up with busses that can DMA
to areas of memory that are not 0 based, but that's a discussion for
another day.

> We sometimes want to know the device dma mask. Moving to your
> definition means that we lose that information.

When ?

Cheers,
Ben.

> > This usually equals device's address space - only because CPU and
> > bridges next to it have wide (logical) address busses. It's not always
> > the case, though, and may be not the case on any arch.
> >
> > We should make sure we got it right (including drivers), since any
> > reduction of the dma*mask would be irreversible (new masks would be
> > ANDed with the existing masks).
> >
> > > I think that having the generic place for bus'
> > > dma mask would be better rather than architecture specific
> > > places.
> >
> > Definitely, if possible. BTW the dmabounce (and equivalent code on other
> > archs, including probably swiotlb on x86-64) could probably be merged as
> > well. I don't know the internals very well, though. At least it may be
> > worth it looking at them.
>
> btw, swiotlb is used by x86_64, ia64, and powerpc.
>
> I'm sure that I can convert ARM to use it instead of dmabounce. But
> well, even a bug fix patch for dmabounce haven't been merged so I'm
> not sure that ARM people would accept such change.
>
> http://marc.info/?l=linux-arm-kernel&m=128047064008554&w=2
>
>
> > > Adding a new API to set bus' dma mask would make sense too.
> >
> > Not sure. Which bus? There could be many :-)
> > In practice - 64-bit PCIe -> 32-bit PCI -> 24-bit ISA - etc.
> > Or, like with IXP/PXA - 26-bit PCI -> 32-bit device.
>
> Seems that we are not on the same page. As I said before, have you
> seen how POWERPC uses max_direct_dma_addr in dev_archdata struct? I
> was talking about moving it to the generic place and the API to set
> it.

2010-08-26 11:55:49

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Fri, 20 Aug 2010 07:51:52 +1000
Benjamin Herrenschmidt <[email protected]> wrote:

> On Thu, 2010-08-19 at 23:50 +0900, FUJITA Tomonori wrote:
> >
> > You mean that you like to permit architectures to modify
> > dev->coherent_dma_mask behind a device? If so, I'm against it because
> > it means dev->coherent_dma_mask has two meanings. That's confusing.
>
> No it's not. It has one and only one meaning which is the mask defining
> where the coherent memory can come from for that device. Nobody cares if
> the device can do more and has been "clipped" at set_coherent_dma_mask()
> time by the bus. This is not useful information.

Ok.

I've seen that someone submitted a patch to print the dma_mask under
sysfs, I supposed, for debugging to check if a driver misuses the DMA
API or the bus can't do 64bit DMA when the device can support 64bit
DMA but only gets a buffer under 32bit.

But yeah, we can live withtout it.

Lots of drivers call dma_set_coherent_mask with 64bit mask and then
call it with 32bit mask if 64bit mask fails. So we don't have driver's
coherent mask anyway.


> So I beleive the arch should hook the later and modify the mask as it
> gets applied -once-.

OK. like dma_set_mask(), we could make every architecutre have the own
dma_set_coherent_mask(). But looks like only ARM needs own
dma_set_coherent_mask() (at least now), so adding
ARCH_HAS_DMA_SET_COHERENT_MASK define might be better. I don't like
the asymmetry of dma_set_mask and dma_set_coherent_mask much though.

2010-08-26 13:55:04

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Thu, 26 Aug 2010 20:55:09 +0900
FUJITA Tomonori <[email protected]> wrote:

> On Fri, 20 Aug 2010 07:51:52 +1000
> Benjamin Herrenschmidt <[email protected]> wrote:
>
> > On Thu, 2010-08-19 at 23:50 +0900, FUJITA Tomonori wrote:
> > >
> > > You mean that you like to permit architectures to modify
> > > dev->coherent_dma_mask behind a device? If so, I'm against it because
> > > it means dev->coherent_dma_mask has two meanings. That's confusing.
> >
> > No it's not. It has one and only one meaning which is the mask defining
> > where the coherent memory can come from for that device. Nobody cares if
> > the device can do more and has been "clipped" at set_coherent_dma_mask()
> > time by the bus. This is not useful information.
>
> Ok.

btw, I'm still not sure, letting architectures to clip the dma mask
(and coherent mask) behind a driver is correct by defintion of the DMA
API (it's not a real problem).

DMA-API.txt defines dma_set_mask is "checks to see if the mask is
possible and update the device parameters if it is". It means that
architectures can't clip the mask behind a driver, I think.

Lots of drivers do something like:

if (dma_set_mask(dev, DMA_BIT_MASK(64)))
if (dma_set_mask(dev, DMA_BIT_MASK(32)))

What arm does is accepting whatever dma mask and setting the clipped
mask behind a driver. If we use this sementics (archs are free to clip
the mask), drivers don't need the second dma_set_mask call. And the
driver wrongly assumes that it successfully set 64bit dma mask (and
possibly set the hardware to 64bit dma mode needlessly).

2010-08-26 16:02:51

by Krzysztof Halasa

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

FUJITA Tomonori <[email protected]> writes:

> Lots of drivers call dma_set_coherent_mask with 64bit mask and then
> call it with 32bit mask if 64bit mask fails.

Which seems strange to me. If the driver asks for 64-bit mask and the
system can only give it 32-bits, why return an error? Every 32-bit
address is also 64-bit, with the most significant bits simply cleared.

It makes sense the other way around, if the device wants e.g. 24-bit
mask (ISA or something) but the OS doesn't have a memory pool smaller
than e.g. 4 GB then returning with error is ok.

Same with IXP4xx (and perhaps PXA) - the device wants 32 bits and it's
fine, even if the mask is set to 28 bits since the CPU can't let PCI
access more.

If the device wants to know if it should issue DACs or something like
that, sure - just check the current masks (effective).
--
Krzysztof Halasa

2010-08-26 17:57:38

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Thu, Aug 26, 2010 at 10:54:39PM +0900, FUJITA Tomonori wrote:
> On Thu, 26 Aug 2010 20:55:09 +0900
> FUJITA Tomonori <[email protected]> wrote:
>
> > On Fri, 20 Aug 2010 07:51:52 +1000
> > Benjamin Herrenschmidt <[email protected]> wrote:
> >
> > > On Thu, 2010-08-19 at 23:50 +0900, FUJITA Tomonori wrote:
> > > >
> > > > You mean that you like to permit architectures to modify
> > > > dev->coherent_dma_mask behind a device? If so, I'm against it because
> > > > it means dev->coherent_dma_mask has two meanings. That's confusing.
> > >
> > > No it's not. It has one and only one meaning which is the mask defining
> > > where the coherent memory can come from for that device. Nobody cares if
> > > the device can do more and has been "clipped" at set_coherent_dma_mask()
> > > time by the bus. This is not useful information.
> >
> > Ok.
>
> btw, I'm still not sure, letting architectures to clip the dma mask
> (and coherent mask) behind a driver is correct by defintion of the DMA
> API (it's not a real problem).
>
> DMA-API.txt defines dma_set_mask is "checks to see if the mask is
> possible and update the device parameters if it is". It means that
> architectures can't clip the mask behind a driver, I think.
>
> Lots of drivers do something like:
>
> if (dma_set_mask(dev, DMA_BIT_MASK(64)))
> if (dma_set_mask(dev, DMA_BIT_MASK(32)))
>
> What arm does is accepting whatever dma mask and setting the clipped
> mask behind a driver. If we use this sementics (archs are free to clip
> the mask), drivers don't need the second dma_set_mask call. And the
> driver wrongly assumes that it successfully set 64bit dma mask (and
> possibly set the hardware to 64bit dma mode needlessly).

Ok, in that case lets disable all PCI drivers which do this on IXP4xx
then, because they obviously can't cope with the 64MB window that this
platform has.

Clearly they need to be rewritten such that they can cope with this,
irrespective of the fact that they've worked for ages with the current
solution.

2010-08-27 00:26:47

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Thu, 26 Aug 2010 18:02:46 +0200
Krzysztof Halasa <[email protected]> wrote:

> FUJITA Tomonori <[email protected]> writes:
>
> > Lots of drivers call dma_set_coherent_mask with 64bit mask and then
> > call it with 32bit mask if 64bit mask fails.
>
> Which seems strange to me. If the driver asks for 64-bit mask and the
> system can only give it 32-bits, why return an error? Every 32-bit
> address is also 64-bit, with the most significant bits simply cleared.

As I wrote, the DMA API simply wasn't designed in that way (let
architectures to clip the mask), I guess.

If it was, dma_set_coherent_mask might return the actual mask.

2010-08-27 06:55:48

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

On Thu, 26 Aug 2010 18:57:19 +0100
Russell King - ARM Linux <[email protected]> wrote:

> On Thu, Aug 26, 2010 at 10:54:39PM +0900, FUJITA Tomonori wrote:
> > On Thu, 26 Aug 2010 20:55:09 +0900
> > FUJITA Tomonori <[email protected]> wrote:
> >
> > > On Fri, 20 Aug 2010 07:51:52 +1000
> > > Benjamin Herrenschmidt <[email protected]> wrote:
> > >
> > > > On Thu, 2010-08-19 at 23:50 +0900, FUJITA Tomonori wrote:
> > > > >
> > > > > You mean that you like to permit architectures to modify
> > > > > dev->coherent_dma_mask behind a device? If so, I'm against it because
> > > > > it means dev->coherent_dma_mask has two meanings. That's confusing.
> > > >
> > > > No it's not. It has one and only one meaning which is the mask defining
> > > > where the coherent memory can come from for that device. Nobody cares if
> > > > the device can do more and has been "clipped" at set_coherent_dma_mask()
> > > > time by the bus. This is not useful information.
> > >
> > > Ok.
> >
> > btw, I'm still not sure, letting architectures to clip the dma mask
> > (and coherent mask) behind a driver is correct by defintion of the DMA
> > API (it's not a real problem).
> >
> > DMA-API.txt defines dma_set_mask is "checks to see if the mask is
> > possible and update the device parameters if it is". It means that
> > architectures can't clip the mask behind a driver, I think.
> >
> > Lots of drivers do something like:
> >
> > if (dma_set_mask(dev, DMA_BIT_MASK(64)))
> > if (dma_set_mask(dev, DMA_BIT_MASK(32)))
> >
> > What arm does is accepting whatever dma mask and setting the clipped
> > mask behind a driver. If we use this sementics (archs are free to clip
> > the mask), drivers don't need the second dma_set_mask call. And the
> > driver wrongly assumes that it successfully set 64bit dma mask (and
> > possibly set the hardware to 64bit dma mode needlessly).
>
> Ok, in that case lets disable all PCI drivers which do this on IXP4xx
> then, because they obviously can't cope with the 64MB window that this
> platform has.
>
> Clearly they need to be rewritten such that they can cope with this,
> irrespective of the fact that they've worked for ages with the current
> solution.

I didn't insist such (I wrote, "it's not a real proble").

As I wrote in another mail, we could make every architecutre have the own
dma_set_coherent_mask(). It's fine by me.

I simply wanted to know your opinions:

Looks like that the DMA API doesn't expect architectures to clip the
mask. It might be better to add the new API to set the dma mask in the
long term? (i.e. calling dma_set_mask twice is the best interface?)
It's better to forget it since it's only about one architecture (too
much change provides little benefit).