The bounce buffer logic is included on systems that do not need it.
If a system does not have zones like ZONE_DMA and ZONE_HIGHMEM that
can lead to the use of bounce buffers then there is no need to reserve
memory pools etc etc. This is true f.e. for SGI Altix.
Also nicifies the Makefile and gets rid of the tricky "and" there.
Signed-off-by: Christoph Lameter <[email protected]>
---
include/linux/blkdev.h | 2 +-
mm/Kconfig | 4 ++++
mm/Makefile | 4 +---
3 files changed, 6 insertions(+), 4 deletions(-)
Index: linux-2.6/include/linux/blkdev.h
===================================================================
--- linux-2.6.orig/include/linux/blkdev.h 2007-05-18 15:29:43.000000000 -0700
+++ linux-2.6/include/linux/blkdev.h 2007-05-18 15:32:48.000000000 -0700
@@ -607,7 +607,7 @@ extern unsigned long blk_max_low_pfn, bl
#define BLK_BOUNCE_ANY ((u64)blk_max_pfn << PAGE_SHIFT)
#define BLK_BOUNCE_ISA (ISA_DMA_THRESHOLD)
-#ifdef CONFIG_MMU
+#ifdef CONFIG_BOUNCE
extern int init_emergency_isa_pool(void);
extern void blk_queue_bounce(request_queue_t *q, struct bio **bio);
#else
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig 2007-05-18 15:31:18.000000000 -0700
+++ linux-2.6/mm/Kconfig 2007-05-18 15:38:25.000000000 -0700
@@ -163,6 +163,10 @@ config ZONE_DMA_FLAG
default "0" if !ZONE_DMA
default "1"
+config BOUNCE
+ def_bool y
+ depends on BLOCK && MMU && (ZONE_DMA || HIGHMEM)
+
config NR_QUICK
int
depends on QUICKLIST
Index: linux-2.6/mm/Makefile
===================================================================
--- linux-2.6.orig/mm/Makefile 2007-05-18 15:27:57.000000000 -0700
+++ linux-2.6/mm/Makefile 2007-05-18 15:33:17.000000000 -0700
@@ -13,9 +13,7 @@ obj-y := bootmem.o filemap.o mempool.o
prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
$(mmu-y)
-ifeq ($(CONFIG_MMU)$(CONFIG_BLOCK),yy)
-obj-y += bounce.o
-endif
+obj-$(CONFIG_BOUNCE) += bounce.o
obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
obj-$(CONFIG_HUGETLBFS) += hugetlb.o
obj-$(CONFIG_NUMA) += mempolicy.o
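
For readers less familiar with the pattern, the blkdev.h hunk above can be modelled roughly as follows. This is a stand-alone sketch, not the kernel header: the struct definitions and the `bounced` field are hypothetical stand-ins, and the bodies under CONFIG_BOUNCE are placeholders for what mm/bounce.o would provide. The point is only that, with the symbol unset, callers still compile but the calls collapse to empty inline stubs.

```c
/* Hypothetical stand-ins for the kernel types, for illustration only. */
struct bio { int bounced; };
struct request_queue { int unused; };

#ifdef CONFIG_BOUNCE
/* Placeholder bodies: in the kernel these are extern declarations and
 * the real code is built from mm/bounce.c. */
static int init_emergency_isa_pool(void)
{
	return 0;
}
static void blk_queue_bounce(struct request_queue *q, struct bio **bio)
{
	(void)q;
	(*bio)->bounced = 1;	/* model: the bio went through the bounce path */
}
#else
/* No ZONE_DMA and no HIGHMEM: nothing can bounce, so both calls are no-ops
 * and no emergency pool is reserved. */
static inline int init_emergency_isa_pool(void)
{
	return 0;
}
static inline void blk_queue_bounce(struct request_queue *q, struct bio **bio)
{
	(void)q;
	(void)bio;
}
#endif
```

Building the sketch with -DCONFIG_BOUNCE selects the first branch, which is the role the new Kconfig symbol plays for the real header and Makefile.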
On Mon, 21 May 2007 21:03:40 -0700 (PDT)
Christoph Lameter <[email protected]> wrote:
> The bounce buffer logic is included on systems that do not need it.
> If a system does not have zones like ZONE_DMA and ZONE_HIGHMEM that
> can lead to the use of bounce buffers then there is no need to reserve
> memory pools etc etc. This is true f.e. for SGI Altix.
> +config BOUNCE
> + def_bool y
> + depends on BLOCK && MMU && (ZONE_DMA || HIGHMEM)
> +
AFAIK, ppc has only ZONE_DMA and it never needs bounce.
Is this ok ?
-Kame
On Tue, 22 May 2007, KAMEZAWA Hiroyuki wrote:
> > +config BOUNCE
> > + def_bool y
> > + depends on BLOCK && MMU && (ZONE_DMA || HIGHMEM)
> > +
>
> AFAIK, ppc has only ZONE_DMA and it never needs bounce.
> Is this ok ?
That is wrong. ppc should have ZONE_NORMAL and no ZONE_DMA.
Otherwise you cannot switch off ZONE_DMA and you cannot switch off
bounce. ZONE_DMA is a zone for exceptional allocs. If you do not have
those then you only have normal allocs -> ZONE_NORMAL.
On Mon, May 21 2007, Christoph Lameter wrote:
> The bounce buffer logic is included on systems that do not need it.
> If a system does not have zones like ZONE_DMA and ZONE_HIGHMEM that
> can lead to the use of bounce buffers then there is no need to reserve
> memory pools etc etc. This is true f.e. for SGI Altix.
>
> Also nicifies the Makefile and gets rid of the tricky "and" there.
>
> Signed-off-by: Christoph Lameter <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
--
Jens Axboe
On Mon, May 21, 2007 at 10:27:16PM -0700, Christoph Lameter wrote:
> On Tue, 22 May 2007, KAMEZAWA Hiroyuki wrote:
>
> > > +config BOUNCE
> > > + def_bool y
> > > + depends on BLOCK && MMU && (ZONE_DMA || HIGHMEM)
> > > +
> >
> > AFAIK, ppc has only ZONE_DMA and it never needs bounce.
> > Is this ok ?
>
> That is wrong. ppc should have ZONE_NORMAL and no ZONE_DMA.
> Otherwise you cannot switch off ZONE_DMA and you cannot switch off
> bounce. ZONE_DMA is a zone for exceptional allocs. If you do not have
> those then you only have normal allocs -> ZONE_NORMAL.
That sounds very wrong to me. Since about 1995 ARM has always placed
all DMA-able memory in the DMA zone, and none in the normal zone.
The reason for doing this is that normal allocations fall back to DMA
allocations when the normal zone becomes full/empty. However, DMA
allocations can never be satisfied by allocations from the normal zone.
Moreover, special casing the "doesn't use __GFP_DMA allocations on this
machine so places all memory in ZONE_NORMAL" is just too complicated -
I've no idea which of the 100+ ARM machine support currently merged
into the Linux kernel uses __GFP_DMA allocations and which don't.
The DMA zone is for memory allocations _for_ _DMA_. If all your memory
is DMA-able then it belongs in the DMA zone.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
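
The fallback asymmetry described above can be illustrated with a small model. This is a minimal sketch under stated assumptions, not the kernel's zonelist code: `model_node`, `model_alloc_page` and the two-zone layout are hypothetical names invented for the example. A normal request may fall back to the DMA zone, while a DMA request can only ever be served from the DMA zone.

```c
/* Hypothetical two-zone model; lower index means more restricted memory. */
enum model_zone { MODEL_ZONE_DMA, MODEL_ZONE_NORMAL, MODEL_NR_ZONES };

struct model_node {
	unsigned long free_pages[MODEL_NR_ZONES];
};

/*
 * Take one page. Returns the zone it came from, or -1 if no permitted
 * zone has a free page. A DMA request starts (and ends) at the DMA zone;
 * a normal request starts at the normal zone and walks downwards.
 */
static int model_alloc_page(struct model_node *node, int want_dma)
{
	int z = want_dma ? MODEL_ZONE_DMA : MODEL_ZONE_NORMAL;

	for (; z >= MODEL_ZONE_DMA; z--) {
		if (node->free_pages[z] > 0) {
			node->free_pages[z]--;
			return z;
		}
	}
	return -1;
}
```

With all memory placed in the DMA zone, both kinds of request succeed from that one zone, which is the ARM arrangement described in the message. With all memory in the normal zone, DMA requests in this model always fail.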
On Wed, 23 May 2007, Russell King wrote:
> > That is wrong. ppc should have ZONE_NORMAL and no ZONE_DMA.
> > Otherwise you cannot switch off ZONE_DMA and you cannot switch off
> > bounce. ZONE_DMA is a zone for exceptional allocs. If you do not have
> > those then you only have normal allocs -> ZONE_NORMAL.
>
> That sounds very wrong to me. Since about 1995 ARM has always placed
> all DMA-able memory in the DMA zone, and none in the normal zone.
>
> The reason for doing this is that normal allocations fall back to DMA
> allocations when the normal zone becomes full/empty. However, DMA
> allocations can never be satisfied by allocations from the normal zone.
Usually DMA is done via ZONE_NORMAL allocations. GFP_DMA allocs and
ZONE_DMA are for devices that cannot perform DMA to all of memory.
There is no need to fall back if you do not have such devices. So no need
for ZONE_DMA.
> Moreover, special casing the "doesn't use __GFP_DMA allocations on this
> machine so places all memory in ZONE_NORMAL" is just too complicated -
> I've no idea which of the 100+ ARM machine support currently merged
> into the Linux kernel uses __GFP_DMA allocations and which don't.
GFP_DMA allocations are an exception and that exception can be removed
from the core VM by not defining ZONE_DMA. You cannot switch off the
NORMAL zone.
> The DMA zone is for memory allocations _for_ _DMA_. If all your memory
> is DMA-able then it belongs in the DMA zone.
Nope. The DMA zone is for crappy DMA devices that can only use a portion
of memory.
On Wed, May 23, 2007 at 10:15:10AM -0700, Christoph Lameter wrote:
> On Wed, 23 May 2007, Russell King wrote:
>
> > > That is wrong. ppc should have ZONE_NORMAL and no ZONE_DMA.
> > > Otherwise you cannot switch off ZONE_DMA and you cannot switch off
> > > bounce. ZONE_DMA is a zone for exceptional allocs. If you do not have
> > > those then you only have normal allocs -> ZONE_NORMAL.
> >
> > That sounds very wrong to me. Since about 1995 ARM has always placed
> > all DMA-able memory in the DMA zone, and none in the normal zone.
> >
> > The reason for doing this is that normal allocations fall back to DMA
> > allocations when the normal zone becomes full/empty. However, DMA
> > allocations can never be satisfied by allocations from the normal zone.
>
> Usually DMA is done via ZONE_NORMAL allocations. GFP_DMA allocs and
> ZONE_DMA are for devices that cannot perform DMA to all of memory.
>
> There is no need to fall back if you do not have such devices. So no need
> for ZONE_DMA.
While that is true, that assertion does not hold everywhere.
> > Moreover, special casing the "doesn't use __GFP_DMA allocations on this
> > machine so places all memory in ZONE_NORMAL" is just too complicated -
> > I've no idea which of the 100+ ARM machine support currently merged
> > into the Linux kernel uses __GFP_DMA allocations and which don't.
>
> GFP_DMA allocations are an exception and that exception can be removed
> from the core VM by not defining ZONE_DMA. You cannot switch off the
> NORMAL zone.
I'd like to be able to switch off the normal and highmem zones and leave
just the DMA zone behind. The normal and highmem zones are just a waste
of space on ARM.
> > The DMA zone is for memory allocations _for_ _DMA_. If all your memory
> > is DMA-able then it belongs in the DMA zone.
>
> Nope. The DMA zone is for crappy DMA devices that can only use a portion
> of memory.
I'm sorry, we're going to have to agree to disagree then.
As for your assertion that it's "crappy DMA devices" there are modern
PCI based platforms being produced by Marvell (which were designed by
Intel) which can only DMA from 64MB of memory, in spite of the PCI
device having full PCI busmastering capabilities.
At the end of the day, it is _far_ simpler from an architectural point
of view for memory to live in the DMA zone and disable the normal and
highmem zones than it is to selectively populate the normal zone
depending on whether device X is configured, and then also have to hack
around various drivers which decide they want to use __GFP_DMA because
of some antiquated x86ism which doesn't apply on non-x86.
--
Russell King
On Wed, 23 May 2007, Russell King wrote:
> > GFP_DMA allocations are an exception and that exception can be removed
> > from the core VM by not defining ZONE_DMA. You cannot switch off the
> > NORMAL zone.
>
> I'd like to be able to switch off the normal and highmem zones and leave
> just the DMA zone behind. The normal and highmem zones are just a waste
> of space on ARM.
If you do not have exceptional memory requirements then you do not need a
DMA zone. The core code allows you to switch off ZONE_DMA but not
ZONE_NORMAL. For a description of the roles of the zones see
include/linux/mmzone.h
> > Nope. The DMA zone is for crappy DMA devices that can only use a portion
> > of memory.
>
> I'm sorry, we're going to have to agree to disagree then.
>
> As for your assertion that it's "crappy DMA devices" there are modern
> PCI based platforms being produced by Marvell (which were designed by
> Intel) which can only DMA from 64MB of memory, in spite of the PCI
> device having full PCI busmastering capabilities.
Ok sorry for the word choice.
> At the end of the day, it is _far_ simpler from an architectural point
> of view for memory to live in the DMA zone and disable the normal and
> highmem zones than it is to selectively populate the normal zone
> depending on whether device X is configured, and then also have to hack
> around various drivers which decide they want to use __GFP_DMA because
> of some antiquated x86ism which doesn't apply on non-x86.
If you switch off CONFIG_ZONE_DMA then __GFP_DMA becomes a no op. So no
problem. Many of us want to rid the kernel of __GFP_DMA. Please join the
club and nuke the useless ZONE_DMA on your platforms.
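
The "no op" behaviour can be sketched as below. This is a simplified user-space model, not the kernel's actual zone selection code: `SKETCH_GFP_DMA`, `sketch_zone` and `sketch_gfp_zone` are hypothetical names and the flag value is arbitrary. The idea is that, once CONFIG_ZONE_DMA is not defined, the DMA zone does not exist and the flag no longer influences which zone is chosen.

```c
/* Illustrative flag value; not the kernel's __GFP_DMA definition. */
#define SKETCH_GFP_DMA 0x01u

enum sketch_zone {
#ifdef CONFIG_ZONE_DMA
	SKETCH_ZONE_DMA,	/* only exists when the option is enabled */
#endif
	SKETCH_ZONE_NORMAL,
};

/* Pick the zone for an allocation request from its flags. */
static enum sketch_zone sketch_gfp_zone(unsigned int flags)
{
#ifdef CONFIG_ZONE_DMA
	if (flags & SKETCH_GFP_DMA)
		return SKETCH_ZONE_DMA;
#endif
	(void)flags;	/* without ZONE_DMA the flag is simply ignored */
	return SKETCH_ZONE_NORMAL;
}
```

Drivers that still pass the flag keep compiling in this model, and their requests are served from the normal zone.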
Christoph Lameter writes:
> > The DMA zone is for memory allocations _for_ _DMA_. If all your memory
> > is DMA-able then it belongs in the DMA zone.
>
> Nope. The DMA zone is for crappy DMA devices that can only use a portion
> of memory.
That is (presumably) true today, but is in fact a redefinition of what
ZONE_DMA historically was for.
Also there is the problem that some drivers use ZONE_DMA allocations
because their device can only generate addresses below some limit, but
on a platform with an IOMMU there is in fact no restriction on what
memory the device can access.
Paul.
On Thu, 24 May 2007, Paul Mackerras wrote:
> That is (presumably) true today, but is in fact a redefinition of what
> ZONE_DMA historically was for.
I do not know too much about the history but when I tried to correlate
all the different ways that arches use the zone this definition was the
most consistent between all of them. Add the fact that we always had the
MAX_DMA_ADDRESS limit for DMA. I think the history is due to the
platform at the time only being able to do DMA to low memory addresses.
The definition of ZONE_DMA is rather problematic. The only common
denominator is the limitation by MAX_DMA_ADDRESS. But that limit varies
from platform to platform. Thus the meaning of GFP_DMA is also varying
from platform to platform.
> Also there is the problem that some drivers use ZONE_DMA allocations
> because their device can only generate addresses below some limit, but
> on a platform with an IOMMU there is in fact no restriction on what
> memory the device can access.
That problem is to some extent addressed by switching ZONE_DMA off which
results in GFP_DMA becoming meaningless. And if GFP_DMA and ZONE_DMA are
gone from a platform then the MAX_DMA_ADDRESS inconsistencies are solved
since the cause of the inconsistencies has evaporated.
On Wed, May 23, 2007 at 04:07:12PM -0700, Christoph Lameter wrote:
> On Wed, 23 May 2007, Russell King wrote:
> > At the end of the day, it is _far_ simpler from an architectural point
> > of view for memory to live in the DMA zone and disable the normal and
> > highmem zones than it is to selectively populate the normal zone
> > depending on whether device X is configured, and then also have to hack
> > around various drivers which decide they want to use __GFP_DMA because
> > of some antiquated x86ism which doesn't apply on non-x86.
>
> If you switch off CONFIG_ZONE_DMA then __GFP_DMA becomes a no op. So no
> problem. Many of us want to rid the kernel of __GFP_DMA. Please join the
> club and nuke the useless ZONE_DMA on your platforms.
Absolutely and utterly impossible - with over 100 different machine
types I don't have anything approaching the necessary knowledge to
_special_ _case_ those platforms which _might_ be able to survive
without the DMA zone. And that's what it'll be - a set of special
cases in the initialisation path.
As I've tried to explain, from day 1 of the multi-zone Linux MM, it's
always made more sense to put everything into zone DMA and (eventually)
hope that the normal and highmem zones eventually go away.
Who cares what happens to __GFP_DMA - it's irrelevant, since both
__GFP_DMA and non-__GFP_DMA allocations will come from the DMA zone
when the normal zone is empty. Where the normal zone is _never_
populated with any memory, defining __GFP_DMA to zero will have no
effect.
So, let me repeat: it makes much much much more sense to get rid of
__GFP_DMA _and_ the normal and highmem zones than it does to get rid
of the DMA zone.
--
Russell King
> Also there is the problem that some drivers use ZONE_DMA allocations
> because their device can only generate addresses below some limit, but
> on a platform with an IOMMU there is in fact no restriction on what
> memory the device can access.
Bzzzzzt - have to call your bluff on that one
The IOMMU mapping window itself may be out of range of some devices. This
is a big problem on AMD64 where the GART window is above the 2GB boundary
- you can do 64->32 nicely but 32->31/30/28/24, all of which turn up on
PC hardware, are not solved by the IOMMU, only by GFP_DMA
Alan
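
The address-width problem can be made concrete with a small check. This is a sketch under assumptions, not kernel code: `addr_mask` and `device_can_reach` are hypothetical helpers. It only tests whether a buffer lies entirely within the range a device can address with a given number of address bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask covering the low `bits` address bits; 64 or more means everything. */
static uint64_t addr_mask(unsigned int bits)
{
	return bits >= 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/*
 * True if every byte of [addr, addr + len) can be addressed by a device
 * limited to `bits` address bits. Written to avoid overflow in addr + len.
 */
static bool device_can_reach(uint64_t addr, uint64_t len, unsigned int bits)
{
	uint64_t mask = addr_mask(bits);

	if (len == 0 || addr > mask)
		return false;
	return len - 1 <= mask - addr;
}
```

Taking a mapping window that starts at the 2GB boundary as an assumed example, a 32-bit device can reach it but a 31-bit or 24-bit device cannot, so remapping through that window does not help such devices and the buffer itself has to sit in low memory.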
On Wed, May 23, 2007 at 09:03:02PM -0700, Christoph Lameter wrote:
> On Thu, 24 May 2007, Paul Mackerras wrote:
> > Also there is the problem that some drivers use ZONE_DMA allocations
> > because their device can only generate addresses below some limit, but
> > on a platform with an IOMMU there is in fact no restriction on what
> > memory the device can access.
>
> That problem is to some extent addressed by switching ZONE_DMA off which
> results in GFP_DMA becoming meaningless. And if GFP_DMA and ZONE_DMA are
> gone from a platform then the MAX_DMA_ADDRESS inconsistencies are solved
> since the cause of the inconsistencies has evaporated.
Suffice it to say then that with this approach ARM will _never_ be able
to have ZONE_DMA turned off, even on platforms where there are no DMA
restrictions. I guess that's something we'll just have to live with.
--
Russell King