2008-11-10 04:48:06

by Yasunori Goto

Subject: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?


Hello.

I have a (maybe dumb) question about dma_alloc_coherent() on ia64.


Why does dma_alloc_coherent() on ia64 still force GFP_DMA?
And why is swiotlb_dma_alloc_coherent() the default routine for
platform_dma_alloc_coherent()?

-------
#define dma_alloc_coherent(dev, size, handle, gfp) \
platform_dma_alloc_coherent(dev, size, handle, (gfp) | GFP_DMA)
--------

Even if a device can access memory above 4G and the driver doesn't specify
GFP_DMA, dma_alloc_coherent() returns memory below 4G.
I guess many people think this is not a big issue because drivers generally
need only a small amount of memory.
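A minimal userspace model of the macro quoted above, to make the effect concrete. The flag values, struct fake_device and fake_platform_alloc() are hypothetical stand-ins, not kernel definitions; the sketch only shows that the caller's gfp is widened with GFP_DMA regardless of the device's mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; the real gfp_t bits differ. */
#define FAKE_GFP_KERNEL 0x10u
#define FAKE_GFP_DMA    0x01u

struct fake_device {
	uint64_t coherent_dma_mask;	/* highest address the device can reach */
};

/* Stand-in for platform_dma_alloc_coherent(): reports the flags it received. */
static unsigned int fake_platform_alloc(struct fake_device *dev, unsigned int gfp)
{
	assert(dev != NULL);
	return gfp;
}

/* Model of the ia64 macro: GFP_DMA is OR'd in unconditionally. */
#define fake_dma_alloc_coherent(dev, gfp) \
	fake_platform_alloc((dev), (gfp) | FAKE_GFP_DMA)
```

For example, a device whose coherent_dma_mask is UINT64_MAX, called with FAKE_GFP_KERNEL, still reaches the platform hook with FAKE_GFP_DMA set.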

However, I think this could be the finishing blow that triggers OOM.
For example,

1) Page caches occupy the normal zone, and the DMA zone is free.
2) A user's application requires a few GB of memory and mlocks it.
   The whole DMA zone ends up occupied by it.
3) A device which can access memory above 4GB is hot-added.
   But dma_alloc_coherent() tries to allocate from the DMA zone.
   Then OOM occurs because there are no freeable pages in the DMA zone.
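The scenario above can be reproduced with a toy zone model. Everything here (struct toy_zone, the page counts, the reclaim rule) is a simplification assumed for illustration: an allocation walks from the highest permitted zone downward, as the kernel's zonelist fallback does, clean page cache in a permitted zone can be dropped, and mlocked pages are never reclaimed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy zone: all sizes are in pages. Not a kernel structure. */
struct toy_zone {
	const char *name;
	size_t free;		/* free pages */
	size_t reclaimable;	/* clean page cache that could be dropped */
	size_t mlocked;		/* pinned by mlock(); never reclaimed */
};

enum { TOY_DMA, TOY_NORMAL, TOY_NR_ZONES };

/*
 * Allocate 'n' pages from zones 'highest' down to TOY_DMA.
 * GFP_DMA corresponds to highest == TOY_DMA.
 */
static bool toy_alloc(struct toy_zone *zones, int highest, size_t n)
{
	assert(zones != NULL && highest >= 0 && highest < TOY_NR_ZONES);
	for (int z = highest; z >= 0; z--) {
		struct toy_zone *zone = &zones[z];

		/* Reclaim clean page cache in this zone if that is enough. */
		if (zone->free < n && zone->free + zone->reclaimable >= n) {
			size_t need = n - zone->free;
			zone->reclaimable -= need;
			zone->free += need;
		}
		if (zone->free >= n) {
			zone->free -= n;
			return true;
		}
	}
	return false;	/* the kernel would head toward the OOM killer here */
}
```

With the DMA zone full of mlocked pages and the normal zone full of page cache, a DMA-restricted request of a few pages fails, while the same request allowed into the normal zone succeeds by dropping cache.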

I have heard there are some users who need to mlock a few GB.
There has been similar trouble in the past.


If GFP_DMA is removed from the above definition of dma_alloc_coherent(),
what will happen?

If that is not allowed, how about the following?

dma_alloc_coherent()
-> platform_dma_alloc_coherent()
-> normal_alloc_coherent()
{
    if (dma_mask allows addresses above 4G) {
        ret = __get_free_pages();
        :
        (check that the returned address is valid for the device)
        :
    } else
        ret = swiotlb_alloc_coherent();
}
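The proposal can be sketched as plain C under some assumptions: toy_page_alloc() and toy_swiotlb_alloc() are hypothetical stand-ins for __get_free_pages() and swiotlb_alloc_coherent() that return fixed bus addresses, and a failed address check falls back to swiotlb, a case the pseudocode leaves open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_4G_MASK 0xffffffffULL

/* Hypothetical stand-ins returning fixed addresses. */
static uint64_t toy_page_alloc(void)    { return 0x180000000ULL; }	/* 6G */
static uint64_t toy_swiotlb_alloc(void) { return 0x10000000ULL; }	/* bounce pool below 4G */

/* Sketch of the proposed normal_alloc_coherent(); returns a bus address. */
static uint64_t toy_normal_alloc_coherent(uint64_t dma_mask, size_t size)
{
	assert(size > 0);
	if (dma_mask > TOY_4G_MASK) {
		uint64_t addr = toy_page_alloc();

		/* Validate: the whole buffer must lie within the device's mask. */
		if (addr != 0 && addr + size - 1 <= dma_mask)
			return addr;
		/* Otherwise fall through to swiotlb (an assumption, see above). */
	}
	return toy_swiotlb_alloc();
}
```

A 64-bit device gets ordinary pages without touching the DMA zone; a 32-bit device, or one whose mask does not cover the returned page, goes through swiotlb.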

If I'm misunderstanding something, please let me know.


Thanks.

--
Yasunori Goto


2008-11-10 12:42:39

by Andi Kleen

Subject: Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

Yasunori Goto <[email protected]> writes:
>
> However, I think this has the possibility of a finishing blow of OOM.
> For example,
>
> 1) Page caches occupy normal zone, and DMA zone is free.
> 2) A user's application requires a few GB memory and mlock it.
> All DMA zone is occupied by it.

The VM has special "lower zone protection" to protect against these
kinds of deadlocks. They can be circumvented, but it takes effort.

> 3) A device which allows over 4GB is hot-added.
> But dma_alloc_coherent() try to allocate DMA zone.
> Then OOM occurs because there is no freeable pages.
>
> I heard there are some users who require a few GB mlock.

Normally mlock is limited to half the memory exactly to avoid
such problems.

Also I believe there are some issues with non-contiguous memory on
some IA64 systems -- e.g. Altixes iirc have the requirement
that you use the IOMMU for higher memory. So it's probably
not easy to change.

-Andi

--
[email protected]

2008-11-10 19:07:27

by Robin Holt

Subject: Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

On Mon, Nov 10, 2008 at 01:59:57PM +0100, Andi Kleen wrote:
> Yasunori Goto <[email protected]> writes:

> > 3) A device which allows over 4GB is hot-added.
> > But dma_alloc_coherent() try to allocate DMA zone.
> > Then OOM occurs because there is no freeable pages.
> >
> > I heard there are some users who require a few GB mlock.
>
> Normally mlock is limited to half the memory exactly to avoid
> such problems.
>
> Also I believe there are some issues with non continuous memory on
> some IA64 systems -- e.g. Altixes iirc have the requirement
> that you use the IOMMU for higher memory. So it's probably
> not easy to change.

I am not sure what is being referred to here, but all of an Altix's memory is
DMA capable with the exception of the stuff covered by the MSPEC driver
(that is, uncached memory). There are certainly all sorts of special
requirements for doing transfers on Altix to eliminate memory ordering
problems, but nothing specific that I recall related to address ranges
and DMA.

I apologize in advance if I am answering a different question than the one
that was asked.

Thanks,
Robin

2008-11-10 21:58:24

by Christoph Lameter

Subject: Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

On Mon, 10 Nov 2008, Robin Holt wrote:

> I am not sure what is be referred to here, but all of an Altix's memory is
> DMA capable with the exception of the stuff covered by the MSPEC driver
> (that is uncached memory). There are certainly all sort of special
> requirements for doing transfers on Altix to eliminate memory ordering
> problems, but nothing specific that I recall related to address ranges
> and DMA.

But then ZONE_DMA has nothing to do with memory being DMA-able or not.
ZONE_DMA is for legacy devices that cannot do DMA to all of memory.
I vaguely remember having stuffed all the memory into ZONE_NORMAL at some
point. ZONE_DMA vanishes for Altix configurations.

2008-11-10 22:07:23

by Christoph Lameter

Subject: Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

On Mon, 10 Nov 2008, Yasunori Goto wrote:

> Even if a device allows over 4G access and the driver doesn't specify
> GFP_DMA, dma_alloc_coherent() returns under 4G area.

GFP_DMA can become 0 for configurations that have
!CONFIG_ZONE_DMA. Then all of memory is available.
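A minimal model of how the flag can vanish: when the zone is not configured, GFP_DMA is defined as 0, so OR-ing it in changes nothing and zone selection falls through to the normal zone. TOY_CONFIG_ZONE_DMA and the other toy_ names are hypothetical stand-ins for the kernel's config option and its gfp_zone() logic, simplified to two zones.

```c
#include <assert.h>

/* Define this to model a kernel built with ZONE_DMA; left undefined here to
 * model a !CONFIG_ZONE_DMA build such as an Altix-only configuration. */
/* #define TOY_CONFIG_ZONE_DMA 1 */

#define TOY___GFP_DMA  0x01u
#define TOY_GFP_KERNEL 0x10u

#ifdef TOY_CONFIG_ZONE_DMA
#define TOY_GFP_DMA TOY___GFP_DMA
#else
#define TOY_GFP_DMA 0u		/* the flag disappears: OR-ing it in is a no-op */
#endif

enum toy_zone_type { TOY_ZONE_DMA, TOY_ZONE_NORMAL };

/* Simplified gfp_zone(): pick the highest zone an allocation may use. */
static enum toy_zone_type toy_gfp_zone(unsigned int gfp)
{
	/* Only the two modeled flags are expected here. */
	assert((gfp & ~(TOY___GFP_DMA | TOY_GFP_KERNEL)) == 0);
#ifdef TOY_CONFIG_ZONE_DMA
	if (gfp & TOY___GFP_DMA)
		return TOY_ZONE_DMA;
#endif
	return TOY_ZONE_NORMAL;
}
```

Built as shown, TOY_GFP_KERNEL | TOY_GFP_DMA selects the normal zone, so all of memory is available to the allocation.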

The call is subarch-specific. For example, Altix's sn_dma_alloc_coherent does
not set __GFP_DMA.

If you have an IA64 arch that only supports 32-bit I/O, then __GFP_DMA in
dma_alloc_coherent makes sense.




2008-11-11 05:14:30

by Yasunori Goto

Subject: Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

> Yasunori Goto <[email protected]> writes:
> >
> > However, I think this has the possibility of a finishing blow of OOM.
> > For example,
> >
> > 1) Page caches occupy normal zone, and DMA zone is free.
> > 2) A user's application requires a few GB memory and mlock it.
> > All DMA zone is occupied by it.
>
> The VM has special "lower zone protection" to protect against these
> kinds of deadlocks. They can be circumvented, but it takes effort.

I wrote documentation for lowmem_reserve_ratio, which is the current name of
lower_zone_protection, in Documentation/filesystems/proc.txt.
But I really doubt there are many users who understand how to estimate
its values.
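As a worked illustration of what lowmem_reserve_ratio controls, here is a sketch of the reserve computation and watermark check as I understand them from mm/page_alloc.c of this era (details vary between kernel versions). The sizes are assumptions: 16KB pages, a 2GB DMA zone, a 6GB normal zone, and a DMA ratio of 256, which I believe was the default.

```c
#include <assert.h>

enum { TOY_ZONE_DMA, TOY_ZONE_NORMAL, TOY_NR_ZONES };

/*
 * reserve[i][j]: pages zone i holds back from allocations whose highest
 * usable zone is j. For i < j it is (pages in zones i+1..j) / ratio[i].
 */
static void toy_setup_lowmem_reserve(const unsigned long present[TOY_NR_ZONES],
				     const unsigned long ratio[TOY_NR_ZONES],
				     unsigned long reserve[TOY_NR_ZONES][TOY_NR_ZONES])
{
	for (int j = 0; j < TOY_NR_ZONES; j++) {
		unsigned long pages = present[j];

		reserve[j][j] = 0;
		for (int idx = j - 1; idx >= 0; idx--) {
			assert(ratio[idx] >= 1);
			reserve[idx][j] = pages / ratio[idx];
			pages += present[idx];
		}
	}
}

/* May an allocation whose highest usable zone is 'classzone' take pages from zone 'z'? */
static int toy_watermark_ok(unsigned long free, unsigned long min_wmark,
			    unsigned long reserve[TOY_NR_ZONES][TOY_NR_ZONES],
			    int z, int classzone)
{
	return free > min_wmark + reserve[z][classzone];
}
```

With these numbers, reserve[DMA][NORMAL] = 393216 / 256 = 1536 pages (24MB with 16KB pages): an allocation that could have used ZONE_NORMAL may dip into ZONE_DMA only while more than min + 1536 pages remain free there.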


>
> > 3) A device which allows over 4GB is hot-added.
> > But dma_alloc_coherent() try to allocate DMA zone.
> > Then OOM occurs because there is no freeable pages.
> >
> > I heard there are some users who require a few GB mlock.
>
> Normally mlock is limited to half the memory exactly to avoid
> such problems.

Half? I don't know which documentation describes that. :-P
Probably most users don't know it...

To be honest, I understand that kernel hackers hope there are NO users who
mlock several GB of memory. However, reality is relentless.
There were real users who tried it. I remember a user who had an 8GB
box and mlocked 5GB.
But even if they had mlocked only 4GB, OOM would still occur.

(In addition, Fujitsu PrimeQuest reserves an area of about 2GB below 4GB for
the maximum I/O configuration. So ZONE_DMA is only 2GB..... (Ueeeeeeeep!))


Anyway, I don't want to discuss mlock's specification, because users can
understand the side effects of mlock if OOM occurs.
But they can't understand why ZONE_DMA is used even when the driver doesn't
require ZONE_DMA. When that deals the finishing blow for OOM, it simply looks
like a BUG from the user's point of view. I have to explain why this BUG(?)
still remains, but I'm a newbie in this area. So, I would like to know.


Thanks.

--
Yasunori Goto

2008-11-11 05:39:35

by Yasunori Goto

Subject: Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

> On Mon, 10 Nov 2008, Yasunori Goto wrote:
>
> > Even if a device allows over 4G access and the driver doesn't specify
> > GFP_DMA, dma_alloc_coherent() returns under 4G area.
>
> GFP_DMA can become 0 for configurations that have
> !CONFIG_ZONE_DMA. Then all of memory is available.
>
> The call is subarch specific. So f.e. Altix sn_dma_alloc_coherent does
> not set __GFP_DMA.

I heard that Altix has an IOMMU, so __GFP_DMA is not necessary there.

> If you have an IA64 arch that only support 32bit I/O then __GFP_DMA in
> dma_alloc_coherent makes sense.

Agreed.
But our box supports both 32-bit I/O and 64-bit I/O without an IOMMU.
Is it an abnormal platform? Does our box need its own interface, like Altix?


Bye.

--
Yasunori Goto

2008-11-11 05:42:14

by FUJITA Tomonori

Subject: Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

On Mon, 10 Nov 2008 13:47:51 +0900
Yasunori Goto <[email protected]> wrote:

> I have a (may be dumb) question about dma_alloc_coherent() for ia64.
>
>
> Why does dma_alloc_coherent() of ia64 force GFP_DMA yet?
> And why is swiotlb_dma_alloc_coherent() default routine of
> platform_dma_alloc_coherent()?
>
> -------
> #define dma_alloc_coherent(dev, size, handle, gfp) \
> platform_dma_alloc_coherent(dev, size, handle, (gfp) | GFP_DMA)
> --------
>
> Even if a device allows over 4G access and the driver doesn't specify
> GFP_DMA, dma_alloc_coherent() returns under 4G area.
> I guess many people think this is not so big issue because drivers require
> very small memory generally.
>
> However, I think this has the possibility of a finishing blow of OOM.
> For example,
>
> 1) Page caches occupy normal zone, and DMA zone is free.
> 2) A user's application requires a few GB memory and mlock it.
> All DMA zone is occupied by it.
> 3) A device which allows over 4GB is hot-added.
> But dma_alloc_coherent() try to allocate DMA zone.
> Then OOM occurs because there is no freeable pages.
>
> I heard there are some users who require a few GB mlock.
> There are similar trouble in past.
>
>
> If GFP_DMA is removed from above definition of dma_alloc_coherent(),
> what will happen?

Probably, it breaks swiotlb with devices that don't have
DMA_64BIT_MASK coherent_dma_mask.
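Why dropping GFP_DMA would hurt swiotlb can be sketched as follows. This follows the general shape of swiotlb_alloc_coherent() in kernels of this period as I understand it (try the page allocator, reject a buffer the device cannot reach, fall back to the preallocated bounce pool); the struct, the addresses and the pool size are hypothetical. Without GFP_DMA, pages above 4G are rejected for a 32-bit device and every such allocation eats into the small pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_alloc_result {
	uint64_t addr;		/* bus address, 0 on failure */
	int from_bounce_pool;
};

/* page_addr: where the page allocator happened to place the buffer (0 = none). */
static struct toy_alloc_result toy_swiotlb_alloc_coherent(uint64_t page_addr,
							   size_t size, uint64_t mask,
							   size_t *pool_left)
{
	struct toy_alloc_result r = { 0, 0 };

	assert(size > 0 && pool_left != NULL);
	if (page_addr != 0 && page_addr + size - 1 <= mask) {
		r.addr = page_addr;	/* directly reachable by the device */
		return r;
	}
	/* Unreachable or no page: consume the limited bounce pool instead. */
	if (*pool_left >= size) {
		*pool_left -= size;
		r.addr = 0x10000000ULL;	/* hypothetical pool address below 4G */
		r.from_bounce_pool = 1;
	}
	return r;
}
```

Once the pool is exhausted, allocations for the 32-bit device fail outright even though plenty of memory above 4G is free.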


> If it is not allowed, how is followings?
>
> dma_alloc_coherent()
> -> platform_dma_alloc_coherent()
> -> normal_alloc_coherent()
> {
> if (dma_mask allow over 4G)
> ret = __get_free_pages();
> :
> (check validation of returned address)
> :
> else
> swiotlb_alloc_coherent();
> }
> }
>
> If I'm something misunderstanding, please let me know.

Hmm, isn't platform_dma_alloc_coherent supposed to handle multiple DMA
ops: swiotlb, hardware IOMMUs like VT-d and sba, etc.?

In IA64, the gfp zone flag matters only for swiotlb, I think.


=
From: FUJITA Tomonori <[email protected]>
Subject: [PATCH] IA64: use GFP_DMA in dma_alloc_coherent only when necessary

For swiotlb, we need to set GFP_DMA if a device doesn't have a
DMA_64BIT_MASK coherent_dma_mask. Hardware IOMMUs like VT-d and sba
should ignore the gfp zone flag.

diff --git a/arch/ia64/include/asm/dma-mapping.h b/arch/ia64/include/asm/dma-mapping.h
index bbab7e2..d4de41b 100644
--- a/arch/ia64/include/asm/dma-mapping.h
+++ b/arch/ia64/include/asm/dma-mapping.h
@@ -52,7 +52,9 @@ extern struct ia64_machine_vector ia64_mv;
extern void set_iommu_machvec(void);

#define dma_alloc_coherent(dev, size, handle, gfp) \
- platform_dma_alloc_coherent(dev, size, handle, (gfp) | GFP_DMA)
+ platform_dma_alloc_coherent(dev, size, handle, \
+ (dev)->coherent_dma_mask != DMA_64BIT_MASK ? \
+ (gfp) | GFP_DMA : gfp)

/* coherent mem. is cheap */
static inline void *
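The patched macro's decision can be modeled as an ordinary function, which also avoids evaluating dev and gfp more than once. The toy_ names and flag values are hypothetical; only the comparison against the full 64-bit mask mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_DMA_64BIT_MASK (~0ULL)
#define TOY_DMA_32BIT_MASK 0xffffffffULL
#define TOY_GFP_KERNEL 0x10u
#define TOY_GFP_DMA    0x01u

struct toy_device {
	uint64_t coherent_dma_mask;
};

/* Add GFP_DMA only when the device cannot address the full 64-bit space. */
static unsigned int toy_coherent_gfp(const struct toy_device *dev, unsigned int gfp)
{
	assert(dev != NULL);
	return dev->coherent_dma_mask != TOY_DMA_64BIT_MASK ? (gfp | TOY_GFP_DMA) : gfp;
}
```

Because the test is for equality, a device with, say, a 40-bit mask still gets GFP_DMA; only an exact 64-bit mask skips it.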

2008-11-11 05:48:39

by FUJITA Tomonori

Subject: Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

On Mon, 10 Nov 2008 13:59:57 +0100
Andi Kleen <[email protected]> wrote:

> Also I believe there are some issues with non continuous memory on
> some IA64 systems -- e.g. Altixes iirc have the requirement
> that you use the IOMMU for higher memory. So it's probably
> not easy to change.

Are you talking about this odd dma requirement?

http://www.gelato.unsw.edu.au/archives/linux-ia64/0305/5604.html

2008-11-11 05:54:18

by FUJITA Tomonori

Subject: Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

On Mon, 10 Nov 2008 16:06:52 -0600 (CST)
Christoph Lameter <[email protected]> wrote:

> On Mon, 10 Nov 2008, Yasunori Goto wrote:
>
> > Even if a device allows over 4G access and the driver doesn't specify
> > GFP_DMA, dma_alloc_coherent() returns under 4G area.
>
> GFP_DMA can become 0 for configurations that have
> !CONFIG_ZONE_DMA. Then all of memory is available.
>
> The call is subarch specific. So f.e. Altix sn_dma_alloc_coherent does
> not set __GFP_DMA.

Is it because it does some kind of address translation
(provider->dma_map_consistent) later? The zone flag is meaningless if
you do any sort of address translation (e.g. a hardware IOMMU like VT-d).

2008-11-11 06:22:01

by Yasunori Goto

Subject: Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

> On Mon, 10 Nov 2008 13:47:51 +0900
> Yasunori Goto <[email protected]> wrote:
>
> > I have a (may be dumb) question about dma_alloc_coherent() for ia64.
> >
> >
> > Why does dma_alloc_coherent() of ia64 force GFP_DMA yet?
> > And why is swiotlb_dma_alloc_coherent() default routine of
> > platform_dma_alloc_coherent()?
> >
> > -------
> > #define dma_alloc_coherent(dev, size, handle, gfp) \
> > platform_dma_alloc_coherent(dev, size, handle, (gfp) | GFP_DMA)
> > --------
> >
> > Even if a device allows over 4G access and the driver doesn't specify
> > GFP_DMA, dma_alloc_coherent() returns under 4G area.
> > I guess many people think this is not so big issue because drivers require
> > very small memory generally.
> >
> > However, I think this has the possibility of a finishing blow of OOM.
> > For example,
> >
> > 1) Page caches occupy normal zone, and DMA zone is free.
> > 2) A user's application requires a few GB memory and mlock it.
> > All DMA zone is occupied by it.
> > 3) A device which allows over 4GB is hot-added.
> > But dma_alloc_coherent() try to allocate DMA zone.
> > Then OOM occurs because there is no freeable pages.
> >
> > I heard there are some users who require a few GB mlock.
> > There are similar trouble in past.
> >
> >
> > If GFP_DMA is removed from above definition of dma_alloc_coherent(),
> > what will happen?
>
> Probably, it breaks swiotlb with devices that don't have
> DMA_64BIT_MASK coherent_dma_mask.
>
>
> > If it is not allowed, how is followings?
> >
> > dma_alloc_coherent()
> > -> platform_dma_alloc_coherent()
> > -> normal_alloc_coherent()
> > {
> > if (dma_mask allow over 4G)
> > ret = __get_free_pages();
> > :
> > (check validation of returned address)
> > :
> > else
> > swiotlb_alloc_coherent();
> > }
> > }
> >
> > If I'm something misunderstanding, please let me know.
>
> Hmm, platform_dma_alloc_coherent is supposed to handle multiple dma
> ops, swiotlb, hardware IOMMUs like VT-d and sba, etc?
>
> In IA64, the gfp zone flag matters for only swiotlb, I think.
>
>
> =
> From: FUJITA Tomonori <[email protected]>
> Subject: [PATCH] IA64: use GFP_DMA in dma_alloc_coherent only when necessary
>
> For swiotlb, we need to set GFP_DMA if a device doesn't have
> DMA_64BIT_MASK coherent_dma_mask. hardware IOMMUs like VT-d and sba
> should ignore gfp zone flag.
>
> diff --git a/arch/ia64/include/asm/dma-mapping.h b/arch/ia64/include/asm/dma-mapping.h
> index bbab7e2..d4de41b 100644
> --- a/arch/ia64/include/asm/dma-mapping.h
> +++ b/arch/ia64/include/asm/dma-mapping.h
> @@ -52,7 +52,9 @@ extern struct ia64_machine_vector ia64_mv;
> extern void set_iommu_machvec(void);
>
> #define dma_alloc_coherent(dev, size, handle, gfp) \
> - platform_dma_alloc_coherent(dev, size, handle, (gfp) | GFP_DMA)
> + platform_dma_alloc_coherent(dev, size, handle, \
> + (dev)->coherent_dma_mask != DMA_64BIT_MASK ? \
> + (gfp) | GFP_DMA : gfp)
>
> /* coherent mem. is cheap */
> static inline void *

Wow! Seems reasonable! Ack Ack!

Acked-by: Yasunori Goto <[email protected]>

Thanks a lot!

--
Yasunori Goto

2008-11-11 20:33:17

by Christoph Lameter

Subject: Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

On Tue, 11 Nov 2008, FUJITA Tomonori wrote:

> Is it because it does some kinda address translation
> (provider->dma_map_consistent) later? The zone flag is meaningless if
> you do sorta address translation (e.g. hardware IOMMU like VT-d).

Yes, it can do address translation. Therefore a < 4G bus address can map to
memory at any 64-bit physical address, so there is no need for a special DMA
zone. The same is true for x86_64 platforms that have an IOMMU.
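A toy model of the translation described here: the IOMMU hands out bus addresses from a window below 4G and records, per page, which physical page each one maps to, so memory anywhere in the 64-bit space is reachable by a 32-bit device. The window base and size, the 16KB page size, and the bump allocator are illustrative assumptions, not how VT-d or the Altix hardware actually manage their tables.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 14			/* 16KB pages, a common ia64 size */
#define TOY_IOVA_BASE  0x80000000ULL		/* hypothetical DMA window below 4G */
#define TOY_IOVA_PAGES 64

/* One translation entry per page of the bus-address window. */
struct toy_iommu {
	uint64_t pte[TOY_IOVA_PAGES];	/* physical page address */
	unsigned int next;		/* naive bump allocator over the window */
};

/* Map one page at any 64-bit physical address; return its bus address or 0. */
static uint64_t toy_iommu_map(struct toy_iommu *mmu, uint64_t phys)
{
	assert((phys & ((1ULL << TOY_PAGE_SHIFT) - 1)) == 0);	/* page aligned */
	if (mmu->next >= TOY_IOVA_PAGES)
		return 0;
	mmu->pte[mmu->next] = phys;
	return TOY_IOVA_BASE + ((uint64_t)mmu->next++ << TOY_PAGE_SHIFT);
}

/* What the hardware does on each device access: bus address -> physical. */
static uint64_t toy_iommu_translate(const struct toy_iommu *mmu, uint64_t bus)
{
	uint64_t idx = (bus - TOY_IOVA_BASE) >> TOY_PAGE_SHIFT;
	uint64_t off = bus & ((1ULL << TOY_PAGE_SHIFT) - 1);

	assert(bus >= TOY_IOVA_BASE && idx < TOY_IOVA_PAGES);
	return mmu->pte[idx] | off;
}
```

A page at 15GB physical, for instance, is handed to the device as bus address 0x80000000, so the allocator never needed ZONE_DMA for it.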

2008-11-11 20:34:45

by Christoph Lameter

Subject: Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

On Tue, 11 Nov 2008, Yasunori Goto wrote:

> But our box supports both of 32bit I/O and 64bit I/O without IOMMU.
> Is it abnormal platform? New interface is necessary for our box like Altix?

No, it's like x86 with the GFP_DMA zone for < 16MB addresses. The special
memory creates an imbalance that sometimes leads to weird VM behavior. I'd
make sure to set GFP_DMA only for devices that actually require < 4GB
memory, and only use it if no IOMMU-like facility is available. It's best
not to use GFP_DMA.

2008-11-12 01:41:24

by FUJITA Tomonori

Subject: Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

On Tue, 11 Nov 2008 14:32:46 -0600 (CST)
Christoph Lameter <[email protected]> wrote:

> On Tue, 11 Nov 2008, FUJITA Tomonori wrote:
>
> > Is it because it does some kinda address translation
> > (provider->dma_map_consistent) later? The zone flag is meaningless if
> > you do sorta address translation (e.g. hardware IOMMU like VT-d).
>
> Yes it can do address translation. Therefore a < 4G address can show up at
> any 64 bit address. So no need for a special DMA zone. The same is true
> for more x86_64 platforms that have an IOMMU.

Yes, with address translation hardware such as an IOMMU, the zone is
meaningless. The IOMMU drivers ignore the zone flag
(e.g. intel_alloc_coherent).

But the GFP_DMA in IA64's platform_dma_alloc_coherent() is still
necessary for swiotlb with devices that don't have a DMA_64BIT_MASK
coherent_dma_mask. They need an address below 4G.

This is exactly what x86 and x86_64 do in dma_alloc_coherent() in
arch/x86/include/asm/dma-mapping.h: it sets GFP_DMA and GFP_DMA32 for
swiotlb and pci-nommu.c when necessary.
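A sketch of that x86-style choice, loosely modeled on the logic described above (the exact code differs between kernel versions, and ZONE_DMA32 exists only on x86_64). The toy_ names and flag values are hypothetical. A translating IOMMU path leaves the flags alone, matching the point that IOMMU drivers ignore the zone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_GFP_KERNEL 0x10u
#define TOY_GFP_DMA    0x01u	/* ZONE_DMA: below 16MB on x86 */
#define TOY_GFP_DMA32  0x04u	/* ZONE_DMA32: below 4GB on x86_64 */

static inline uint64_t toy_bit_mask(unsigned int n)
{
	return n >= 64 ? ~0ULL : (1ULL << n) - 1;
}

/* Pick the smallest zone the mask needs, but only for direct-mapped paths. */
static unsigned int toy_x86_coherent_gfp(uint64_t mask, unsigned int gfp,
					 bool translating_iommu)
{
	unsigned int out = gfp;

	if (!translating_iommu) {
		if (mask <= toy_bit_mask(24))
			out |= TOY_GFP_DMA;
		else if (mask <= toy_bit_mask(32) && !(out & TOY_GFP_DMA))
			out |= TOY_GFP_DMA32;
	}
	/* A request must never name both low zones at once. */
	assert((out & (TOY_GFP_DMA | TOY_GFP_DMA32)) != (TOY_GFP_DMA | TOY_GFP_DMA32));
	return out;
}
```

A 24-bit ISA-style device lands in ZONE_DMA, a 32-bit device in ZONE_DMA32, and a 64-bit device or any device behind a translating IOMMU keeps the caller's flags.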