2007-05-10 21:43:40

by Andrew Morton

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On Thu, 10 May 2007 14:28:18 -0700
[email protected] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=8464
>
> Summary: autoreconf: page allocation failure. order:2,
> mode:0x84020
> Kernel Version: 2.6.21-mm2 with SLUB
> Status: NEW
> Severity: normal
> Owner: [email protected]
> Submitter: [email protected]
> CC: [email protected]
>
>
> Most recent kernel where this bug did *NOT* occur: 2.6.21-rc6.mm1 with SLAB
> Distribution: Fedora Devel
> Hardware Environment: AMD X2 on CK804
> Software Environment: N/A
> Problem Description:
>
> Just noticed this in kernel logs :
>
> autoreconf: page allocation failure. order:2, mode:0x84020
> May 10 20:13:13 rousalka kernel:
> May 10 20:13:13 rousalka kernel: Call Trace:
> May 10 20:13:13 rousalka kernel: [<ffffffff8025b56a>] __alloc_pages+0x2aa/0x2c3
> May 10 20:13:13 rousalka kernel: [<ffffffff8029c05f>] bio_alloc+0x10/0x1f
> May 10 20:13:13 rousalka kernel: [<ffffffff8027519d>] __slab_alloc+0x196/0x586
> May 10 20:13:13 rousalka kernel: [<ffffffff80300d21>]
> radix_tree_node_alloc+0x36/0x7e
> May 10 20:13:13 rousalka kernel: [<ffffffff80275922>] kmem_cache_alloc+0x32/0x4e
> May 10 20:13:13 rousalka kernel: [<ffffffff80300d21>]
> radix_tree_node_alloc+0x36/0x7e
> May 10 20:13:13 rousalka kernel: [<ffffffff803011a4>] radix_tree_insert+0xcb/0x18c
> May 10 20:13:13 rousalka kernel: [<ffffffff88029bd0>] :ext3:ext3_get_block+0x0/0xe4
> May 10 20:13:13 rousalka kernel: [<ffffffff80256ac4>] add_to_page_cache+0x3d/0x95
> May 10 20:13:13 rousalka kernel: [<ffffffff8029fe29>] mpage_readpages+0x85/0x12c
> May 10 20:13:13 rousalka kernel: [<ffffffff88029bd0>] :ext3:ext3_get_block+0x0/0xe4
> May 10 20:13:13 rousalka kernel: [<ffffffff8025cde1>]
> __do_page_cache_readahead+0x158/0x22d
> May 10 20:13:13 rousalka kernel: [<ffffffff88084aa7>]
> :dm_mod:dm_table_any_congested+0x46/0x63
> May 10 20:13:13 rousalka kernel: [<ffffffff88082ce8>]
> :dm_mod:dm_any_congested+0x3b/0x42
> May 10 20:13:13 rousalka kernel: [<ffffffff80258802>] filemap_fault+0x162/0x347
> May 10 20:13:13 rousalka kernel: [<ffffffff80261c66>] __do_fault+0x66/0x446
> May 10 20:13:13 rousalka kernel: [<ffffffff80263ca9>] __handle_mm_fault+0x4b1/0x8f5
> May 10 20:13:13 rousalka kernel: [<ffffffff80419e84>] do_page_fault+0x39a/0x7b7
> May 10 20:13:13 rousalka kernel: [<ffffffff80419f31>] do_page_fault+0x447/0x7b7
> May 10 20:13:13 rousalka kernel: [<ffffffff8041847d>] error_exit+0x0/0x84
>

This looks bad.

It's a bit hard to tell who failed - was it bio_alloc() or was it
radix-tree node allocation? Given the allocation mode, I'd suspect
bio_alloc(), but I don't immediately see where we'd be doing an atomic
allocation for a bio.
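
For what it's worth, decoding 0x84020 against the gfp bit values of this
era (quoted from memory and from how the antifrag flags were later merged,
so treat the exact constants as an assumption) supports the atomic reading:

    /* Standalone sketch - not kernel code. Bit values are assumed
     * 2.6.21/-mm era definitions and should be checked against the tree. */
    #include <stdio.h>

    #define GFP_BIT_WAIT        0x00010u
    #define GFP_BIT_HIGH        0x00020u
    #define GFP_BIT_IO          0x00040u
    #define GFP_BIT_FS          0x00080u
    #define GFP_BIT_COMP        0x04000u
    #define GFP_BIT_RECLAIMABLE 0x80000u    /* -mm antifrag placement hint */

    int main(void)
    {
        unsigned int mode = 0x84020;    /* the mode reported in the failure */

        /* Prints: wait=0 high=1 io=0 fs=0 comp=1 reclaimable=1 */
        printf("wait=%d high=%d io=%d fs=%d comp=%d reclaimable=%d\n",
               !!(mode & GFP_BIT_WAIT), !!(mode & GFP_BIT_HIGH),
               !!(mode & GFP_BIT_IO), !!(mode & GFP_BIT_FS),
               !!(mode & GFP_BIT_COMP), !!(mode & GFP_BIT_RECLAIMABLE));
        return 0;
    }

With __GFP_WAIT, __GFP_IO and __GFP_FS all clear and __GFP_COMP set, this
looks like an order-2 refill that may neither sleep nor reclaim.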

Either way, it would worry me greatly if slub is fiddling with the mapping
of object-size-to-page-allocation-order. A _lot_ of things which were
previously reliable and hugely tested would become less reliable, and less
tested.

Christoph, can we please take a look at /proc/slabinfo and its slub
equivalent (I forget what that is?) and review any and all changes to the
underlying allocation size for each cache?

Because this is *not* something we should change lightly.


2007-05-10 21:49:39

by Christoph Lameter

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On Thu, 10 May 2007, Andrew Morton wrote:

> Christoph, can we please take a look at /proc/slabinfo and its slub
> equivalent (I forget what that is?) and review any and all changes to the
> underlying allocation size for each cache?
>
> Because this is *not* something we should change lightly.

It was changed specially for mm in order to stress the antifrag code. If
this causes trouble then do not merge the patches against SLUB that
exploit the antifrag methods. This failure should help us see how effective
Mel's antifrag patches are. He needs to be in on this discussion.

Upstream has slub_max_order=1.

2007-05-10 22:12:00

by Christoph Lameter

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On Thu, 10 May 2007, Mel Gorman wrote:

> I see the gfpmask was 0x84020. That doesn't look like __GFP_WAIT was set,
> right? Does that mean that SLUB is trying to allocate pages atomically? If so,
> it would explain why this situation could still occur even though high-order
> allocations that could sleep would succeed.

SLUB is following the gfp mask of the caller like all well behaved slab
allocators do. If the caller does not set __GFP_WAIT then the page
allocator also cannot wait.
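
To make that concrete, here is a rough sketch of what a slab refill ends up
doing (illustrative only; the function and field names only approximate the
SLUB code of this period, they are not quoted from it):

    /*
     * Sketch: the slab refill passes the caller's gfp flags straight to
     * the page allocator, but at the cache's configured slab order.
     */
    static struct page *sketch_allocate_slab(struct kmem_cache *s, gfp_t flags)
    {
        int order = s->order;   /* e.g. 2 if objects are packed into 16KB slabs */

        if (order)
            flags |= __GFP_COMP;

        /*
         * If the original caller did not pass __GFP_WAIT, this is now a
         * high-order allocation that may neither sleep nor run reclaim -
         * which is the failure reported in this bug.
         */
        return alloc_pages_node(numa_node_id(), flags, order);
    }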

2007-05-10 22:16:32

by mel

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On (10/05/07 15:11), Christoph Lameter didst pronounce:
> On Thu, 10 May 2007, Mel Gorman wrote:
>
> > I see the gfpmask was 0x84020. That doesn't look like __GFP_WAIT was set,
> > right? Does that mean that SLUB is trying to allocate pages atomically? If so,
> > it would explain why this situation could still occur even though high-order
> > allocations that could sleep would succeed.
>
> SLUB is following the gfp mask of the caller like all well behaved slab
> allocators do. If the caller does not set __GFP_WAIT then the page
> allocator also cannot wait.

Then SLUB should not use the higher orders for slab allocations that cannot
sleep during allocations. What could be done in the longer term is decide
how to tell kswapd to keep pages free at an order other than 0 when it is
known there are a large number of high-order long-lived allocations like this.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2007-05-10 22:27:31

by Christoph Lameter

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On Thu, 10 May 2007, Mel Gorman wrote:

> On (10/05/07 15:11), Christoph Lameter didst pronounce:
> > On Thu, 10 May 2007, Mel Gorman wrote:
> >
> > > I see the gfpmask was 0x84020. That doesn't look like __GFP_WAIT was set,
> > > right? Does that mean that SLUB is trying to allocate pages atomically? If so,
> > > it would explain why this situation could still occur even though high-order
> > > allocations that could sleep would succeed.
> >
> > SLUB is following the gfp mask of the caller like all well behaved slab
> > allocators do. If the caller does not set __GFP_WAIT then the page
> > allocator also cannot wait.
>
> Then SLUB should not use the higher orders for slab allocations that cannot
> sleep during allocations. What could be done in the longer term is decide
> how to tell kswapd to keep pages free at an order other than 0 when it is
> known there are a large number of high-order long-lived allocations like this.

I cannot predict how allocations on a slab will be performed. In order
to avoid the higher order allocations we would have to add a flag
that tells SLUB at slab creation time that this cache will be
used for atomic allocs and thus we can avoid configuring slabs in such a
way that they use higher order allocs.
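
A sketch of how such a creation-time flag could be used to cap the slab
order (the flag name, its value and the helper are all hypothetical; nothing
like this exists in the tree being discussed):

    /* Hypothetical: mark a cache whose refills may happen from atomic context */
    #define SLAB_USES_ATOMIC    0x00400000UL    /* illustrative bit value only */

    static int sketch_slab_order(unsigned long cache_flags, int preferred_order)
    {
        /* Atomic-refill caches stay at order 0 so a refill never needs a
         * high-order page; everyone else keeps the packing-driven order. */
        if (cache_flags & SLAB_USES_ATOMIC)
            return 0;
        return preferred_order;
    }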

The other solution is not to use higher order allocations by dropping the
antifrag patches in mm that allow SLUB to use higher order allocations.
But then there would be no higher order allocations at all that would use
the benefits of antifrag measures.

2007-05-10 22:35:24

by mel

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On (10/05/07 14:49), Christoph Lameter didst pronounce:
> On Thu, 10 May 2007, Andrew Morton wrote:
>
> > Christoph, can we please take a look at /proc/slabinfo and its slub
> > equivalent (I forget what that is?) and review any and all changes to the
> > underlying allocation size for each cache?
> >
> > Because this is *not* something we should change lightly.
>
> It was changed specially for mm in order to stress the antifrag code. If
> this causes trouble then do not merge the patches against SLUB that
> exploit the antifrag methods. This failure should help us see how effective
> Mel's antifrag patches are. He needs to be in on this discussion.
>

The antifrag mechanism depends on the caller being able to sleep and reclaim
pages if necessary to get the contiguous allocation. No attempt is
currently made to keep pages at a particular order free.

I see the gfpmask was 0x84020. That doesn't look like __GFP_WAIT was set,
right? Does that mean that SLUB is trying to allocate pages atomically? If so,
it would explain why this situation could still occur even though high-order
allocations that could sleep would succeed.

> Upstream has slub_max_order=1.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2007-05-10 22:44:52

by mel

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On (10/05/07 15:27), Christoph Lameter didst pronounce:
> On Thu, 10 May 2007, Mel Gorman wrote:
>
> > On (10/05/07 15:11), Christoph Lameter didst pronounce:
> > > On Thu, 10 May 2007, Mel Gorman wrote:
> > >
> > > > I see the gfpmask was 0x84020. That doesn't look like __GFP_WAIT was set,
> > > > right? Does that mean that SLUB is trying to allocate pages atomically? If so,
> > > > it would explain why this situation could still occur even though high-order
> > > > allocations that could sleep would succeed.
> > >
> > > SLUB is following the gfp mask of the caller like all well behaved slab
> > > allocators do. If the caller does not set __GFP_WAIT then the page
> > > allocator also cannot wait.
> >
> > Then SLUB should not use the higher orders for slab allocations that cannot
> > sleep during allocations. What could be done in the longer term is decide
> > how to tell kswapd to keep pages free at an order other than 0 when it is
> > known there are a large number of high-order long-lived allocations like this.
>
> I cannot predict how allocations on a slab will be performed. In order
> to avoid the higher order allocations we would have to add a flag
> that tells SLUB at slab creation time that this cache will be
> used for atomic allocs and thus we can avoid configuring slabs in such a
> way that they use higher order allocs.
>

It is an option. I had the gfp flags passed in to kmem_cache_create() in
mind for determining this but SLUB creates slabs differently and different
flags could be passed into kmem_cache_alloc() of course.

> The other solution is not to use higher order allocations by dropping the
> antifrag patches in mm that allow SLUB to use higher order allocations.
> But then there would be no higher order allocations at all that would
> use the benefits of antifrag measures.

That would be an immediate solution.

Another alternative is that anti-frag used to also group high-order
allocations together and make it hard to fall back to those areas
for non-atomic allocations. It is currently backed out by the
patch dont-group-high-order-atomic-allocations.patch because
it was intended for rare high-order short-lived allocations
such as e1000, which are currently dealt with by MIGRATE_RESERVE
(bias-the-location-of-pages-freed-for-min_free_kbytes-in-the-same-max_order_nr_pages-blocks.patch).
The high-order atomic groupings may help here because the high-order
allocations are long-lived and would claim contiguous areas.

The last alternative, which I think I mentioned already, is to make the
minimum order kswapd reclaims at the same as the order SLUB uses instead
of 0, so that min_free_kbytes is kept at higher orders than it is
currently.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2007-05-10 22:49:36

by Christoph Lameter

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On Thu, 10 May 2007, Mel Gorman wrote:

> > I cannot predict how allocations on a slab will be performed. In order
> > to avoid the higher order allocations we would have to add a flag
> > that tells SLUB at slab creation time that this cache will be
> > used for atomic allocs and thus we can avoid configuring slabs in such a
> > way that they use higher order allocs.
> >
>
> It is an option. I had the gfp flags passed in to kmem_cache_create() in
> mind for determining this but SLUB creates slabs differently and different
> flags could be passed into kmem_cache_alloc() of course.

So we have a collection of flags to add

SLAB_USES_ATOMIC
SLAB_TEMPORARY
SLAB_PERSISTENT
SLAB_RECLAIMABLE
SLAB_MOVABLE

?
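
As a usage example, the radix_tree_node cache (one of the callers in the
trace above) might then be created along these lines - SLAB_USES_ATOMIC is
only a proposal at this point, and the 2.6.21-era kmem_cache_create()
signature here is quoted from memory:

    /* Hypothetical: SLAB_USES_ATOMIC does not exist yet */
    radix_tree_node_cachep = kmem_cache_create("radix_tree_node",
                sizeof(struct radix_tree_node), 0,
                SLAB_USES_ATOMIC | SLAB_PANIC,
                radix_tree_node_ctor, NULL);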

> Another alternative is that anti-frag used to also group high-order
> allocations together and make it hard to fall back to those areas
> for non-atomic allocations. It is currently backed out by the
> patch dont-group-high-order-atomic-allocations.patch because
> it was intended for rare high-order short-lived allocations
> such as e1000, which are currently dealt with by MIGRATE_RESERVE
> (bias-the-location-of-pages-freed-for-min_free_kbytes-in-the-same-max_order_nr_pages-blocks.patch).
> The high-order atomic groupings may help here because the high-order
> allocations are long-lived and would claim contiguous areas.
>
> The last alternative, which I think I mentioned already, is to make the
> minimum order kswapd reclaims at the same as the order SLUB uses instead
> of 0, so that min_free_kbytes is kept at higher orders than it is
> currently.

Would you get a patch to Nicolas to test either of these solutions?

2007-05-10 23:00:58

by mel

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On (10/05/07 15:49), Christoph Lameter didst pronounce:
> On Thu, 10 May 2007, Mel Gorman wrote:
>
> > > I cannot predict how allocations on a slab will be performed. In order
> > > to avoid the higher order allocations we would have to add a flag
> > > that tells SLUB at slab creation time that this cache will be
> > > used for atomic allocs and thus we can avoid configuring slabs in such a
> > > way that they use higher order allocs.
> > >
> >
> > It is an option. I had the gfp flags passed in to kmem_cache_create() in
> > mind for determining this but SLUB creates slabs differently and different
> > flags could be passed into kmem_cache_alloc() of course.
>
> So we have a collection of flags to add
>
> SLAB_USES_ATOMIC

This is a possibility.

> SLAB_TEMPORARY

I have a patch for this sitting in a queue waiting for testing

> SLAB_PERSISTENT
> SLAB_RECLAIMABLE
> SLAB_MOVABLE

I don't think these are required because the necessary information is
available from the GFP flags.

>
> ?
>
> > Another alternative is that anti-frag used to also group high-order
> > allocations together and make it hard to fall back to those areas
> > for non-atomic allocations. It is currently backed out by the
> > patch dont-group-high-order-atomic-allocations.patch because
> > it was intended for rare high-order short-lived allocations
> > such as e1000, which are currently dealt with by MIGRATE_RESERVE
> > (bias-the-location-of-pages-freed-for-min_free_kbytes-in-the-same-max_order_nr_pages-blocks.patch).
> > The high-order atomic groupings may help here because the high-order
> > allocations are long-lived and would claim contiguous areas.
> >
> > The last alternative, which I think I mentioned already, is to make the
> > minimum order kswapd reclaims at the same as the order SLUB uses instead
> > of 0, so that min_free_kbytes is kept at higher orders than it is
> > currently.
>
> Would you get a patch to Nicolas to test either of these solutions?

I do not have a kswapd related patch ready but the first alternative is
readily available.

Nicolas, could you back out the patch
dont-group-high-order-atomic-allocations.patch and test again please?
The following patch has the same effect. Thanks

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/include/linux/mmzone.h linux-2.6.21-mm2-grouphigh/include/linux/mmzone.h
--- linux-2.6.21-mm2-clean/include/linux/mmzone.h 2007-05-09 10:21:28.000000000 +0100
+++ linux-2.6.21-mm2-grouphigh/include/linux/mmzone.h 2007-05-10 23:54:45.000000000 +0100
@@ -38,8 +38,9 @@ extern int page_group_by_mobility_disabl
#define MIGRATE_UNMOVABLE 0
#define MIGRATE_RECLAIMABLE 1
#define MIGRATE_MOVABLE 2
-#define MIGRATE_RESERVE 3
-#define MIGRATE_TYPES 4
+#define MIGRATE_HIGHATOMIC 3
+#define MIGRATE_RESERVE 4
+#define MIGRATE_TYPES 5

#define for_each_migratetype_order(order, type) \
for (order = 0; order < MAX_ORDER; order++) \
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/include/linux/pageblock-flags.h linux-2.6.21-mm2-grouphigh/include/linux/pageblock-flags.h
--- linux-2.6.21-mm2-clean/include/linux/pageblock-flags.h 2007-05-09 10:21:28.000000000 +0100
+++ linux-2.6.21-mm2-grouphigh/include/linux/pageblock-flags.h 2007-05-10 23:54:45.000000000 +0100
@@ -31,7 +31,7 @@

/* Bit indices that affect a whole block of pages */
enum pageblock_bits {
- PB_range(PB_migrate, 2), /* 2 bits required for migrate types */
+ PB_range(PB_migrate, 3), /* 3 bits required for migrate types */
NR_PAGEBLOCK_BITS
};

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/mm/page_alloc.c linux-2.6.21-mm2-grouphigh/mm/page_alloc.c
--- linux-2.6.21-mm2-clean/mm/page_alloc.c 2007-05-09 10:21:28.000000000 +0100
+++ linux-2.6.21-mm2-grouphigh/mm/page_alloc.c 2007-05-10 23:54:45.000000000 +0100
@@ -167,6 +167,11 @@ static inline int allocflags_to_migratet
if (unlikely(page_group_by_mobility_disabled))
return MIGRATE_UNMOVABLE;

+ /* Cluster high-order atomic allocations together */
+ if (unlikely(order > 0) &&
+ (!(gfp_flags & __GFP_WAIT) || in_interrupt()))
+ return MIGRATE_HIGHATOMIC;
+
/* Cluster based on mobility */
return (((gfp_flags & __GFP_MOVABLE) != 0) << 1) |
((gfp_flags & __GFP_RECLAIMABLE) != 0);
@@ -713,10 +718,11 @@ static struct page *__rmqueue_smallest(s
* the free lists for the desirable migrate type are depleted
*/
static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
- [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
- [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
- [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
- [MIGRATE_RESERVE] = { MIGRATE_RESERVE, MIGRATE_RESERVE, MIGRATE_RESERVE }, /* Never used */
+ [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_HIGHATOMIC, MIGRATE_RESERVE },
+ [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_HIGHATOMIC, MIGRATE_RESERVE },
+ [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_HIGHATOMIC, MIGRATE_RESERVE },
+ [MIGRATE_HIGHATOMIC] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
+ [MIGRATE_RESERVE] = { MIGRATE_RESERVE, MIGRATE_RESERVE, MIGRATE_RESERVE, MIGRATE_RESERVE }, /* Never used */
};

/*
@@ -810,7 +816,9 @@ static struct page *__rmqueue_fallback(s
int current_order;
struct page *page;
int migratetype, i;
+ int nonatomic_fallback_atomic = 0;

+retry:
/* Find the largest possible block of pages in the other list */
for (current_order = MAX_ORDER-1; current_order >= order;
--current_order) {
@@ -820,6 +828,14 @@ static struct page *__rmqueue_fallback(s
/* MIGRATE_RESERVE handled later if necessary */
if (migratetype == MIGRATE_RESERVE)
continue;
+ /*
+ * Make it hard to fallback to blocks used for
+ * high-order atomic allocations
+ */
+ if (migratetype == MIGRATE_HIGHATOMIC &&
+ start_migratetype != MIGRATE_UNMOVABLE &&
+ !nonatomic_fallback_atomic)
+ continue;

area = &(zone->free_area[current_order]);
if (list_empty(&area->free_list[migratetype]))
@@ -845,7 +861,8 @@ static struct page *__rmqueue_fallback(s
start_migratetype);

/* Claim the whole block if over half of it is free */
- if ((pages << current_order) >= (1 << (MAX_ORDER-2)))
+ if ((pages << current_order) >= (1 << (MAX_ORDER-2)) &&
+ migratetype != MIGRATE_HIGHATOMIC)
set_pageblock_migratetype(page,
start_migratetype);

@@ -867,6 +884,12 @@ static struct page *__rmqueue_fallback(s
}
}

+ /* Allow fallback to high-order atomic blocks if memory is that low */
+ if (!nonatomic_fallback_atomic) {
+ nonatomic_fallback_atomic = 1;
+ goto retry;
+ }
+
/* Use MIGRATE_RESERVE rather than fail an allocation */
return __rmqueue_smallest(zone, order, MIGRATE_RESERVE);
}
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2007-05-10 23:01:47

by Christoph Lameter

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On Fri, 11 May 2007, Mel Gorman wrote:

> Nicolas, could you back out the patch
> dont-group-high-order-atomic-allocations.patch and test again please?
> The following patch has the same effect. Thanks

Great! Thanks.

2007-05-11 05:57:52

by Nicolas Mailhot

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On Thursday 10 May 2007 at 16:01 -0700, Christoph Lameter wrote:
> On Fri, 11 May 2007, Mel Gorman wrote:
>
> > Nicolas, could you back out the patch
> > dont-group-high-order-atomic-allocations.patch and test again please?
> > The following patch has the same effect. Thanks
>
> Great! Thanks.

The proposed patch did not apply

+ cd /builddir/build/BUILD
+ rm -rf linux-2.6.21
+ /usr/bin/bzip2 -dc /builddir/build/SOURCES/linux-2.6.21.tar.bz2
+ tar -xf -
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd linux-2.6.21
++ /usr/bin/id -u
+ '[' 499 = 0 ']'
++ /usr/bin/id -u
+ '[' 499 = 0 ']'
+ /bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ echo 'Patch #2 (2.6.21-mm2.bz2):'
Patch #2 (2.6.21-mm2.bz2):
+ /usr/bin/bzip2 -d
+ patch -p1 -s
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ echo 'Patch #3 (md-improve-partition-detection-in-md-array.patch):'
Patch #3 (md-improve-partition-detection-in-md-array.patch):
+ patch -p1 -R -s
+ echo 'Patch #4 (bug-8464.patch):'
Patch #4 (bug-8464.patch):
+ patch -p1 -s
1 out of 1 hunk FAILED -- saving rejects to file include/linux/pageblock-flags.h.rej
6 out of 6 hunks FAILED -- saving rejects to file mm/page_alloc.c.rej

Backing out dont-group-high-order-atomic-allocations.patch worked and
seems to have cured the system so far (need to run it a bit longer to
be sure)

--
Nicolas Mailhot



2007-05-11 09:09:44

by mel

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On (11/05/07 07:56), Nicolas Mailhot didst pronounce:
> On Thursday 10 May 2007 at 16:01 -0700, Christoph Lameter wrote:
> > On Fri, 11 May 2007, Mel Gorman wrote:
> >
> > > Nicolas, could you back out the patch
> > > dont-group-high-order-atomic-allocations.patch and test again please?
> > > The following patch has the same effect. Thanks
> >
> > Great! Thanks.
>
> The proposed patch did not apply
>
> + cd /builddir/build/BUILD
> + rm -rf linux-2.6.21
> + /usr/bin/bzip2 -dc /builddir/build/SOURCES/linux-2.6.21.tar.bz2
> + tar -xf -
> + STATUS=0
> + '[' 0 -ne 0 ']'
> + cd linux-2.6.21
> ++ /usr/bin/id -u
> + '[' 499 = 0 ']'
> ++ /usr/bin/id -u
> + '[' 499 = 0 ']'
> + /bin/chmod -Rf a+rX,u+w,g-w,o-w .
> + echo 'Patch #2 (2.6.21-mm2.bz2):'
> Patch #2 (2.6.21-mm2.bz2):
> + /usr/bin/bzip2 -d
> + patch -p1 -s
> + STATUS=0
> + '[' 0 -ne 0 ']'
> + echo 'Patch #3 (md-improve-partition-detection-in-md-array.patch):'
> Patch #3 (md-improve-partition-detection-in-md-array.patch):
> + patch -p1 -R -s
> + echo 'Patch #4 (bug-8464.patch):'
> Patch #4 (bug-8464.patch):
> + patch -p1 -s
> 1 out of 1 hunk FAILED -- saving rejects to file include/linux/pageblock-flags.h.rej
> 6 out of 6 hunks FAILED -- saving rejects to file mm/page_alloc.c.rej
>
> Backing out dont-group-high-order-atomic-allocations.patch worked and

Odd, because they should have been the same thing. As long as it
worked...

> seems to have cured the system so far (need to run it a bit longer to
> be sure)
>

The longer it runs the better, particularly under load and after
updatedb has run. Thanks a lot for testing

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2007-05-11 11:52:10

by Nicolas Mailhot

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On Friday 11 May 2007 at 10:08 +0100, Mel Gorman wrote:

> > seems to have cured the system so far (need to run it a bit longer to
> > be sure)
> >
>
> The longer it runs the better, particularly under load and after
> updatedb has run. Thanks a lot for testing

After a few hours of load testing still nothing in the logs, so the
revert was probably the right thing to do

--
Nicolas Mailhot



2007-05-11 17:38:22

by mel

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On (11/05/07 13:51), Nicolas Mailhot didst pronounce:
> On Friday 11 May 2007 at 10:08 +0100, Mel Gorman wrote:
>
> > > seems to have cured the system so far (need to run it a bit longer to
> > > be sure)
> > >
> >
> > The longer it runs the better, particularly under load and after
> > updatedb has run. Thanks a lot for testing
>
> After a few hours of load testing still nothing in the logs, so the
> revert was probably the right thing to do

Excellent. I am somewhat surprised by the result so I'd like to look at the
alternative option with kswapd as well. Could you put that patch back in again
please and try the following patch instead? The patch causes kswapd to reclaim
at higher orders if it's requested to. Christoph, can you look at the patch
as well and make sure it's doing the right thing with respect to SLUB please?

Ultimately, it's probably still a bad plan for atomic allocations to
depend on high-order allocations being possible, but it's interesting to
see how it behaves.

Thanks

=====

Subject: [RFC] Have kswapd keep a minimum order free other than order-0

kswapd normally reclaims at order 0 unless there is a higher-order allocation
under way. However, in some cases it is known that there is a minimum
order of general interest, such as the SLUB allocator requiring regular
high-order allocations. This allows a minimum order to be set so that
min_free_kbytes is kept at higher orders.

With a simple stress test, buddyinfo looks like this at the end with kswapd at its default:

Node 0, zone DMA 10 12 11 10 10 10 6 5 4 1 0
Node 0, zone Normal 87 232 601 490 369 282 197 116 79 39 28

With kswapd attempting to keep pages free at order-4, it looks like

Node 0, zone DMA 35 37 29 28 22 18 8 6 0 1 0
Node 0, zone Normal 96 203 361 355 265 203 141 97 57 34 48
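
(For reference: each /proc/buddyinfo column is the count of free blocks at
order 0, 1, 2, ... for that zone, so in the default run above the Normal
zone has 601 free order-2 blocks; the point of a minimum reclaim order is
to stop the higher columns from draining to zero.)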

---
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/include/linux/mmzone.h linux-2.6.21-mm2-kswapd_minorder/include/linux/mmzone.h
--- linux-2.6.21-mm2-clean/include/linux/mmzone.h 2007-05-09 10:21:28.000000000 +0100
+++ linux-2.6.21-mm2-kswapd_minorder/include/linux/mmzone.h 2007-05-11 11:12:43.000000000 +0100
@@ -499,6 +499,8 @@ typedef struct pglist_data {
void get_zone_counts(unsigned long *active, unsigned long *inactive,
unsigned long *free);
void build_all_zonelists(void);
+int kswapd_order(unsigned int order);
+void set_kswapd_order(unsigned int order);
void wakeup_kswapd(struct zone *zone, int order);
int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
int classzone_idx, int alloc_flags);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/mm/slub.c linux-2.6.21-mm2-kswapd_minorder/mm/slub.c
--- linux-2.6.21-mm2-clean/mm/slub.c 2007-05-09 10:21:28.000000000 +0100
+++ linux-2.6.21-mm2-kswapd_minorder/mm/slub.c 2007-05-11 11:10:08.000000000 +0100
@@ -2131,6 +2131,7 @@ static struct kmem_cache *kmalloc_caches
static int __init setup_slub_min_order(char *str)
{
get_option (&str, &slub_min_order);
+ set_kswapd_order(slub_min_order);
user_override = 1;
return 1;
}
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/mm/vmscan.c linux-2.6.21-mm2-kswapd_minorder/mm/vmscan.c
--- linux-2.6.21-mm2-clean/mm/vmscan.c 2007-05-09 10:21:28.000000000 +0100
+++ linux-2.6.21-mm2-kswapd_minorder/mm/vmscan.c 2007-05-11 11:55:45.000000000 +0100
@@ -1407,6 +1407,32 @@ out:
return nr_reclaimed;
}

+static unsigned int kswapd_min_order __read_mostly;
+
+/**
+ * set_kswapd_order - Set the minimum order that kswapd reclaims at
+ * @order: The new minimum order
+ *
+ * kswapd normally reclaims at order 0 unless there is a higher-order
+ * allocation under way. However, in some cases, it is known that there
+ * are a minimum order size of general interest such as the SLUB allocator
+ * requiring regular high-order allocations. This allows a minimum order
+ * to be set to that min_free_kbytes is kept at higher orders
+ */
+void set_kswapd_order(unsigned int order)
+{
+ if (order >= MAX_ORDER)
+ return;
+
+ printk(KERN_INFO "kswapd reclaim order set to %d\n", order);
+ kswapd_min_order = order;
+}
+
+int kswapd_order(unsigned int order)
+{
+ return max(kswapd_min_order, order);
+}
+
/*
* The background pageout daemon, started as a kernel thread
* from the init process.
@@ -1450,13 +1476,13 @@ static int kswapd(void *p)
*/
tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;

- order = 0;
+ order = kswapd_order(0);
for ( ; ; ) {
unsigned long new_order;

prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
- new_order = pgdat->kswapd_max_order;
- pgdat->kswapd_max_order = 0;
+ new_order = kswapd_order(pgdat->kswapd_max_order);
+ pgdat->kswapd_max_order = kswapd_order(0);
if (order < new_order) {
/*
* Don't sleep if someone wants a larger 'order'
@@ -1467,7 +1493,7 @@ static int kswapd(void *p)
if (!freezing(current))
schedule();

- order = pgdat->kswapd_max_order;
+ order = kswapd_order(pgdat->kswapd_max_order);
}
finish_wait(&pgdat->kswapd_wait, &wait);

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2007-05-11 17:46:23

by Nicolas Mailhot

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On Friday 11 May 2007 at 18:38 +0100, Mel Gorman wrote:
> On (11/05/07 13:51), Nicolas Mailhot didst pronounce:
> > On Friday 11 May 2007 at 10:08 +0100, Mel Gorman wrote:
> >
> > > > seems to have cured the system so far (need to run it a bit longer to
> > > > be sure)
> > > >
> > >
> > > The longer it runs the better, particularly under load and after
> > > updatedb has run. Thanks a lot for testing
> >
> > After a few hours of load testing still nothing in the logs, so the
> > revert was probably the right thing to do
>
> Excellent. I am somewhat surprised by the result

And you're probably right, it just blew up after a day of working fine

19:20:00 tar: page allocation failure. order:2, mode:0x84020
19:20:00
19:20:00 Call Trace:
19:20:00 [<ffffffff8025b5c3>] __alloc_pages+0x2aa/0x2c3
19:20:00 [<ffffffff802751f5>] __slab_alloc+0x196/0x586
19:20:00 [<ffffffff80300d79>] radix_tree_node_alloc+0x36/0x7e
19:20:00 [<ffffffff8027597a>] kmem_cache_alloc+0x32/0x4e
19:20:00 [<ffffffff80300d79>] radix_tree_node_alloc+0x36/0x7e
19:20:00 [<ffffffff8030118e>] radix_tree_insert+0x5d/0x18c
19:20:00 [<ffffffff80256ac4>] add_to_page_cache+0x3d/0x95
19:20:00 [<ffffffff80257aa4>] generic_file_buffered_write+0x222/0x7c8
19:20:00 [<ffffffff88013c74>] :jbd:do_get_write_access+0x506/0x53d
19:20:00 [<ffffffff8022c7d5>] current_fs_time+0x3b/0x40
19:20:00 [<ffffffff8025838c>] __generic_file_aio_write_nolock+0x342/0x3ac
19:20:00 [<ffffffff80416ac1>] __mutex_lock_slowpath+0x216/0x221
19:20:00 [<ffffffff80258457>] generic_file_aio_write+0x61/0xc1
19:20:00 [<ffffffff880271be>] :ext3:ext3_file_write+0x16/0x94
19:20:00 [<ffffffff8027938c>] do_sync_write+0xc9/0x10c
19:20:00 [<ffffffff80239c56>] autoremove_wake_function+0x0/0x2e
19:20:00 [<ffffffff80279ba7>] vfs_write+0xce/0x177
19:20:00 [<ffffffff8027a16a>] sys_write+0x45/0x6e
19:20:00 [<ffffffff8020955c>] tracesys+0xdc/0xe1
19:20:00
19:20:00 Mem-info:
19:20:00 DMA per-cpu:
19:20:00 CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
19:20:00 CPU 1: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
19:20:00 DMA32 per-cpu:
19:20:00 CPU 0: Hot: hi: 186, btch: 31 usd: 149 Cold: hi: 62, btch: 15 usd: 19
19:20:00 CPU 1: Hot: hi: 186, btch: 31 usd: 147 Cold: hi: 62, btch: 15 usd: 2
19:20:00 Active:348968 inactive:105561 dirty:23054 writeback:0 unstable:0
19:20:00 free:9776 slab:28092 mapped:23015 pagetables:10226 bounce:0
19:20:00 DMA free:7960kB min:20kB low:24kB high:28kB active:0kB inactive:0kB present:7648kB pages_scanned:0 all_unreclaimable? yes
19:20:00 lowmem_reserve[]: 0 1988 1988 1988
19:20:00 DMA32 free:31144kB min:5692kB low:7112kB high:8536kB active:1395872kB inactive:422244kB present:2036004kB pages_scanned:0 all_unreclaimable? no
19:20:00 lowmem_reserve[]: 0 0 0 0
19:20:00 DMA: 6*4kB 6*8kB 7*16kB 3*32kB 8*64kB 8*128kB 6*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 7960kB
19:20:00 DMA32: 7560*4kB 0*8kB 8*16kB 0*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 31072kB
19:20:00 Swap cache: add 1527, delete 1521, find 216/286, race 397+0
19:20:00 Free swap = 4192824kB
19:20:00 Total swap = 4192944kB
19:20:00 Free swap: 4192824kB
19:20:00 524272 pages of RAM
19:20:00 14123 reserved pages
19:20:00 252562 pages shared
19:20:00 6 pages swap cached

> so I'd like to look at the
> alternative option with kswapd as well. Could you put that patch back in again
> please and try the following patch instead?

I'll try this one now (if it applies)

Regards,

--
Nicolas Mailhot



2007-05-11 17:46:39

by Christoph Lameter

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On Fri, 11 May 2007, Mel Gorman wrote:

> Excellent. I am somewhat surprised by the result so I'd like to look at the
> alternative option with kswapd as well. Could you put that patch back in again
> please and try the following patch instead? The patch causes kswapd to reclaim
> at higher orders if it's requested to. Christoph, can you look at the patch
> as well and make sure it's doing the right thing with respect to SLUB please?

Well this gives the impression that SLUB depends on larger orders. It
*can* take advantage of higher order allocations, but it is not a must. It may be a
performance benefit to be able to do higher order allocs though (it is not
really established yet what kind of tradeoffs there are).

Looks fine to me. If this is stable then I want this to be merged ASAP
(deal with the issues later???) .... Good stuff.

2007-05-11 18:31:53

by Nicolas Mailhot

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On Friday 11 May 2007 at 19:45 +0200, Nicolas Mailhot wrote:
> On Friday 11 May 2007 at 18:38 +0100, Mel Gorman wrote:

> > so I'd like to look at the
> > alternative option with kswapd as well. Could you put that patch back in again
> > please and try the following patch instead?
>
> I'll try this one now (if it applies)

Well it doesn't seem to apply. Are you sure you have a clean tree?
(I have vanilla mm2 + revert of
md-improve-partition-detection-in-md-array.patch for another bug)

+ umask 022
+ cd /builddir/build/BUILD
+ LANG=C
+ export LANG
+ unset DISPLAY
+ cd /builddir/build/BUILD
+ rm -rf linux-2.6.21
+ /usr/bin/bzip2 -dc /builddir/build/SOURCES/linux-2.6.21.tar.bz2
+ tar -xf -
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd linux-2.6.21
++ /usr/bin/id -u
+ '[' 499 = 0 ']'
++ /usr/bin/id -u
+ '[' 499 = 0 ']'
+ /bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ echo 'Patch #2 (2.6.21-mm2.bz2):'
Patch #2 (2.6.21-mm2.bz2):
+ /usr/bin/bzip2 -d
+ patch -p1 -s
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ echo 'Patch #3 (md-improve-partition-detection-in-md-array.patch):'
Patch #3 (md-improve-partition-detection-in-md-array.patch):
+ patch -p1 -R -s
+ echo 'Patch #4 (bug-8464.patch):'
Patch #4 (bug-8464.patch):
+ patch -p1 -s
1 out of 1 hunk FAILED -- saving rejects to file mm/slub.c.rej
2 out of 3 hunks FAILED -- saving rejects to file mm/vmscan.c.r
--
Nicolas Mailhot



2007-05-11 20:36:21

by mel

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On (11/05/07 20:30), Nicolas Mailhot didst pronounce:
> On Friday 11 May 2007 at 19:45 +0200, Nicolas Mailhot wrote:
> > On Friday 11 May 2007 at 18:38 +0100, Mel Gorman wrote:
>
> > > so I'd like to look at the
> > > alternative option with kswapd as well. Could you put that patch back in again
> > > please and try the following patch instead?
> >
> > I'll try this one now (if it applies)
>
> Well it doesn't seem to apply. Are you sure you have a clean tree?
> (I have vanilla mm2 + revert of
> md-improve-partition-detection-in-md-array.patch for another bug)
>

I'm pretty sure I have. I recreated the tree and reverted the same patch as
you and regenerated the diff below. I sent it to myself and it appeared ok
and another automated system was able to use it.

In case it's a mailer problem, the patch can be downloaded from
http://www.csn.ul.ie/~mel/kswapd-minorder.patch . Here is a rediff
against the tree you describe.

Sorry for the confusion.

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-revertmd/include/linux/mmzone.h linux-2.6.21-mm2-kswapdorder/include/linux/mmzone.h
--- linux-2.6.21-mm2-revertmd/include/linux/mmzone.h 2007-05-11 21:16:56.000000000 +0100
+++ linux-2.6.21-mm2-kswapdorder/include/linux/mmzone.h 2007-05-11 21:23:00.000000000 +0100
@@ -499,6 +499,8 @@ typedef struct pglist_data {
void get_zone_counts(unsigned long *active, unsigned long *inactive,
unsigned long *free);
void build_all_zonelists(void);
+int kswapd_order(unsigned int order);
+void set_kswapd_order(unsigned int order);
void wakeup_kswapd(struct zone *zone, int order);
int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
int classzone_idx, int alloc_flags);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-revertmd/mm/slub.c linux-2.6.21-mm2-kswapdorder/mm/slub.c
--- linux-2.6.21-mm2-revertmd/mm/slub.c 2007-05-11 21:16:57.000000000 +0100
+++ linux-2.6.21-mm2-kswapdorder/mm/slub.c 2007-05-11 21:23:00.000000000 +0100
@@ -2131,6 +2131,7 @@ static struct kmem_cache *kmalloc_caches
static int __init setup_slub_min_order(char *str)
{
get_option (&str, &slub_min_order);
+ set_kswapd_order(slub_min_order);
user_override = 1;
return 1;
}
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-revertmd/mm/vmscan.c linux-2.6.21-mm2-kswapdorder/mm/vmscan.c
--- linux-2.6.21-mm2-revertmd/mm/vmscan.c 2007-05-11 21:16:57.000000000 +0100
+++ linux-2.6.21-mm2-kswapdorder/mm/vmscan.c 2007-05-11 21:23:00.000000000 +0100
@@ -1407,6 +1407,32 @@ out:
return nr_reclaimed;
}

+static unsigned int kswapd_min_order __read_mostly;
+
+/**
+ * set_kswapd_order - Set the minimum order that kswapd reclaims at
+ * @order: The new minimum order
+ *
+ * kswapd normally reclaims at order 0 unless there is a higher-order
+ * allocation under way. However, in some cases, it is known that there
+ * are a minimum order size of general interest such as the SLUB allocator
+ * requiring regular high-order allocations. This allows a minimum order
+ * to be set to that min_free_kbytes is kept at higher orders
+ */
+void set_kswapd_order(unsigned int order)
+{
+ if (order >= MAX_ORDER)
+ return;
+
+ printk(KERN_INFO "kswapd reclaim order set to %d\n", order);
+ kswapd_min_order = order;
+}
+
+int kswapd_order(unsigned int order)
+{
+ return max(kswapd_min_order, order);
+}
+
/*
* The background pageout daemon, started as a kernel thread
* from the init process.
@@ -1450,13 +1476,13 @@ static int kswapd(void *p)
*/
tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;

- order = 0;
+ order = kswapd_order(0);
for ( ; ; ) {
unsigned long new_order;

prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
- new_order = pgdat->kswapd_max_order;
- pgdat->kswapd_max_order = 0;
+ new_order = kswapd_order(pgdat->kswapd_max_order);
+ pgdat->kswapd_max_order = kswapd_order(0);
if (order < new_order) {
/*
* Don't sleep if someone wants a larger 'order'
@@ -1467,7 +1493,7 @@ static int kswapd(void *p)
if (!freezing(current))
schedule();

- order = pgdat->kswapd_max_order;
+ order = kswapd_order(pgdat->kswapd_max_order);
}
finish_wait(&pgdat->kswapd_wait, &wait);

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2007-05-12 08:13:09

by Nicolas Mailhot

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On Friday 11 May 2007 at 21:36 +0100, Mel Gorman wrote:

> I'm pretty sure I have. I recreated the tree and reverted the same patch as
> you and regenerated the diff below. I sent it to myself and it appeared ok
> and another automated system was able to use it.
>
> In case it's a mailer problem, the patch can be downloaded from
> http://www.csn.ul.ie/~mel/kswapd-minorder.patch .

This one applies, but the kernel still has allocation failures (I just
found rpm -Va was a good trigger). So far we have two proposed fixes,
neither of which works.

--
Nicolas Mailhot


Attachments:
alloc-failure.txt (60.03 kB)

2007-05-12 16:42:51

by mel

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On (12/05/07 10:11), Nicolas Mailhot didst pronounce:
> On Friday 11 May 2007 at 21:36 +0100, Mel Gorman wrote:
>
> > I'm pretty sure I have. I recreated the tree and reverted the same patch as
> > you and regenerated the diff below. I sent it to myself and it appeared ok
> > and another automated system was able to use it.
> >
> > In case it's a mailer problem, the patch can be downloaded from
> > http://www.csn.ul.ie/~mel/kswapd-minorder.patch .
>
> This one applies, but the kernel still has allocation failures (I just
> found rpm -Va was a good trigger). So far we have two proposed fixes,
> neither of which works.
>

Sorry about this. What is most perplexing is that the memory was free.
In your log we see:

> May 12 10:00:47 rousalka kernel: DMA: 6*4kB 4*8kB 9*16kB 3*32kB 6*64kB 7*128kB 5*256kB 0*512kB 1*1024kB 0*2048kB 1*4096kB = 7976kB
> May 12 10:00:47 rousalka kernel: DMA32: 2619*4kB 27*8kB 6*16kB 0*32kB 0*64kB 2*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 11556kB

and

> May 12 10:00:47 rousalka kernel: DMA: 6*4kB 4*8kB 9*16kB 3*32kB 6*64kB 7*128kB 5*256kB 0*512kB 1*1024kB 0*2048kB 1*4096kB = 7976kB
> May 12 10:00:47 rousalka kernel: DMA32: 1651*4kB 29*8kB 10*16kB 0*32kB 0*64kB 2*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 7764kB

order-2 (at least 19 pages but more are there) and higher pages were free
and this was a NORMAL allocation. It should also be above watermarks so
something screwy is happening

*peers suspiciously*
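
The order-aware part of the watermark check is what makes this interesting;
paraphrasing the mainline zone_watermark_ok() logic of this era from memory
(details simplified, so treat it as a sketch rather than the exact code):

    /*
     * Sketch only: the ALLOC_HIGH/ALLOC_HARDER adjustments and lowmem_reserve
     * are omitted. The key point is that for an order-N request, free pages
     * below order N do not count towards the watermark at all.
     */
    static int sketch_watermark_ok(struct zone *z, int order, long min, long free_pages)
    {
        int o;

        if (free_pages <= min)
            return 0;
        for (o = 0; o < order; o++) {
            /* blocks smaller than the request cannot satisfy it */
            free_pages -= z->free_area[o].nr_free << o;
            /* but require proportionally fewer pages free at each step up */
            min >>= 1;
            if (free_pages <= min)
                return 0;
        }
        return 1;
    }

So a zone whose free memory is almost entirely order-0 pages can look
healthy in the totals and still sit right at the order-2 cutoff.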

Can you try the following patch on top of the kswapd patch please? It is
also available from http://www.csn.ul.ie/~mel/watermarks.patch

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-revertmd/mm/page_alloc.c linux-2.6.21-mm2-watermarks/mm/page_alloc.c
--- linux-2.6.21-mm2-revertmd/mm/page_alloc.c 2007-05-11 21:16:57.000000000 +0100
+++ linux-2.6.21-mm2-watermarks/mm/page_alloc.c 2007-05-12 17:34:10.000000000 +0100
@@ -1627,7 +1627,7 @@ restart:
/* This allocation should allow future memory freeing. */

rebalance:
- if (((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))
+ if (((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE) || !wait))
&& !in_interrupt()) {
if (!(gfp_mask & __GFP_NOMEMALLOC)) {
nofail_alloc:
@@ -1636,7 +1636,7 @@ nofail_alloc:
zonelist, ALLOC_NO_WATERMARKS);
if (page)
goto got_pg;
- if (gfp_mask & __GFP_NOFAIL) {
+ if (gfp_mask & __GFP_NOFAIL && wait) {
congestion_wait(WRITE, HZ/50);
goto nofail_alloc;
}
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2007-05-12 18:09:52

by Nicolas Mailhot

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On Saturday 12 May 2007 at 17:42 +0100, Mel Gorman wrote:

> order-2 (at least 19 pages but more are there) and higher pages were free
> and this was a NORMAL allocation. It should also be above watermarks so
> something screwy is happening
>
> *peers suspiciously*
>
> Can you try the following patch on top of the kswapd patch please? It is
> also available from http://www.csn.ul.ie/~mel/watermarks.patch

Ok, testing now

--
Nicolas Mailhot



2007-05-12 18:59:59

by Nicolas Mailhot

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On Saturday 12 May 2007 at 20:09 +0200, Nicolas Mailhot wrote:
> On Saturday 12 May 2007 at 17:42 +0100, Mel Gorman wrote:
>
> > order-2 (at least 19 pages but more are there) and higher pages were free
> > and this was a NORMAL allocation. It should also be above watermarks so
> > something screwy is happening
> >
> > *peers suspiciously*
> >
> > Can you try the following patch on top of the kswapd patch please? It is
> > also available from http://www.csn.ul.ie/~mel/watermarks.patch
>
> Ok, testing now

And this one failed testing too

--
Nicolas Mailhot


Attachments:
crash.txt (57.10 kB)

2007-05-12 19:24:20

by mel

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On (12/05/07 20:58), Nicolas Mailhot didst pronounce:
> On Saturday 12 May 2007 at 20:09 +0200, Nicolas Mailhot wrote:
> > On Saturday 12 May 2007 at 17:42 +0100, Mel Gorman wrote:
> >
> > > order-2 (at least 19 pages but more are there) and higher pages were free
> > > and this was a NORMAL allocation. It should also be above watermarks so
> > > something screwy is happening
> > >
> > > *peers suspiciously*
> > >
> > > Can you try the following patch on top of the kswapd patch please? It is
> > > also available from http://www.csn.ul.ie/~mel/watermarks.patch
> >
> > Ok, testing now
>
> And this one failed testing too

And same thing, you have suitable free memory. The last patch was
wrong because I forgot the !in_interrupt() part which was careless
and dumb. Please try the following, again on top of the kswapd patch -
http://www.csn.ul.ie/~mel/watermarks-v2.patch

Thanks for all the testing, it's appreciated.

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-revertmd/mm/page_alloc.c linux-2.6.21-mm2-watermarks/mm/page_alloc.c
--- linux-2.6.21-mm2-revertmd/mm/page_alloc.c 2007-05-11 21:16:57.000000000 +0100
+++ linux-2.6.21-mm2-watermarks/mm/page_alloc.c 2007-05-12 20:20:19.000000000 +0100
@@ -1645,8 +1645,16 @@ nofail_alloc:
}

/* Atomic allocations - we can't balance anything */
- if (!wait)
+ if (!wait) {
+
+ /* Attempt to allocate ignoring watermarks */
+ page = get_page_from_freelist(gfp_mask, order,
+ zonelist, ALLOC_NO_WATERMARKS);
+ if (page)
+ goto got_pg;
+
goto nopage;
+ }

cond_resched();

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2007-05-13 08:17:41

by Nicolas Mailhot

[permalink] [raw]
Subject: Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On Saturday 12 May 2007 at 20:24 +0100, Mel Gorman wrote:
> On (12/05/07 20:58), Nicolas Mailhot didst pronounce:
> > On Saturday 12 May 2007 at 20:09 +0200, Nicolas Mailhot wrote:
> > > On Saturday 12 May 2007 at 17:42 +0100, Mel Gorman wrote:
> > >
> > > > order-2 (at least 19 pages but more are there) and higher pages were free
> > > > and this was a NORMAL allocation. It should also be above watermarks so
> > > > something screwy is happening
> > > >
> > > > *peers suspiciously*
> > > >
> > > > Can you try the following patch on top of the kswapd patch please? It is
> > > > also available from http://www.csn.ul.ie/~mel/watermarks.patch

> > And this one failed testing too
>
> And same thing, you have suitable free memory. The last patch was
> wrong because I forgot the !in_interrupt() part which was careless
> and dumb. Please try the following, again on top of the kswapd patch -
> http://www.csn.ul.ie/~mel/watermarks-v2.patch

This one survived 12h of testing so far.

Regards,

--
Nicolas Mailhot

