2005-05-16 20:06:53

by christoph

[permalink] [raw]
Subject: [PATCH] Factor in buddy allocator alignment requirements in node memory alignment

Memory for nodes on i386 is currently aligned on 2 MB boundaries.
However, the buddy allocator needs pages to be aligned on
PAGE_SIZE << MAX_ORDER which is 8MB if MAX_ORDER = 11.

The following patch determines the maximum alignment needed and will
chose 2MB boundaries if MAX_ORDER is very low but will align correctly
to a MAX_ORDER boundary if the alignment requirements of the page allocator
are greater than 2MB.

Signed-off-by: Christoph Lameter <[email protected]>
Signed-off-by: Shai Fultheim <[email protected]>

Index: linux-2.6.11/arch/i386/mm/discontig.c
===================================================================
--- linux-2.6.11.orig/arch/i386/mm/discontig.c 2005-05-12 20:04:56.000000000 -0700
+++ linux-2.6.11/arch/i386/mm/discontig.c 2005-05-12 20:05:58.000000000 -0700
@@ -100,8 +100,6 @@ extern unsigned long max_low_pfn;
extern unsigned long totalram_pages;
extern unsigned long totalhigh_pages;

-#define LARGE_PAGE_BYTES (PTRS_PER_PTE * PAGE_SIZE)
-
unsigned long node_remap_start_pfn[MAX_NUMNODES];
unsigned long node_remap_size[MAX_NUMNODES];
unsigned long node_remap_offset[MAX_NUMNODES];
@@ -185,6 +183,12 @@ static unsigned long calculate_numa_rema
{
int nid;
unsigned long size, reserve_pages = 0;
+ /*
+ * Alignment is either to the PMD boundary or to the boundary of the
+ * largest order allowed by the buddy allocator whichever is bigger
+ */
+ unsigned long alignment = (( 1 << MAX_ORDER ) > PTRS_PER_PTE) ?
+ 1 << MAX_ORDER : PTRS_PER_PTE;

for_each_online_node(nid) {
if (nid == 0)
@@ -204,10 +208,10 @@ static unsigned long calculate_numa_rema
/* ensure the remap includes space for the pgdat. */
size = node_remap_size[nid] + sizeof(pg_data_t);

- /* convert size to large (pmd size) pages, rounding up */
- size = (size + LARGE_PAGE_BYTES - 1) / LARGE_PAGE_BYTES;
+ /* convert size to the alignment, rounding up */
+ size = (size + (alignment * PAGE_SIZE) - 1) / (alignment * PAGE_SIZE);
/* now the roundup is correct, convert to PAGE_SIZE pages */
- size = size * PTRS_PER_PTE;
+ size = size * alignment;
printk("Reserving %ld pages of KVA for lmem_map of node %d\n",
size, nid);
node_remap_size[nid] = size;


2005-05-16 20:19:17

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] Factor in buddy allocator alignment requirements in node memory alignment

On Mon, 2005-05-16 at 12:05 -0700, christoph wrote:
> Memory for nodes on i386 is currently aligned on 2 MB boundaries.
> However, the buddy allocator needs pages to be aligned on
> PAGE_SIZE << MAX_ORDER which is 8MB if MAX_ORDER = 11.

Why do you need this? Are you planning on allowing NUMA KVA remap pages
to be handed over to the buddy allocator? That would be a major
departure from what we do now, and I'd be very interested in seeing how
that is implemented before a infrastructure for it goes in.

BTW, how sure are you that those alignment restrictions really still
exist? Some of them went away when we got rid of the buddy bitmap. You
might want to check that you definitely need this.

-- Dave

2005-05-16 20:41:38

by christoph

[permalink] [raw]
Subject: Re: [PATCH] Factor in buddy allocator alignment requirements in node memory alignment

On Mon, 16 May 2005, Dave Hansen wrote:

> On Mon, 2005-05-16 at 12:05 -0700, christoph wrote:
> > Memory for nodes on i386 is currently aligned on 2 MB boundaries.
> > However, the buddy allocator needs pages to be aligned on
> > PAGE_SIZE << MAX_ORDER which is 8MB if MAX_ORDER = 11.
>
> Why do you need this? Are you planning on allowing NUMA KVA remap pages
> to be handed over to the buddy allocator? That would be a major
> departure from what we do now, and I'd be very interested in seeing how
> that is implemented before a infrastructure for it goes in.

Because the buddy allocator is complaining about wrongly allocated zones!

in page_alloc.c:

static void __init free_area_init_core(struct pglist_data *pgdat,
unsigned long *zones_size, unsigned long *zholes_size)
{
...

const unsigned long zone_required_alignment = 1UL << (MAX_ORDER-1);

...

if ((zone_start_pfn) & (zone_required_alignment-1))
printk(KERN_CRIT "BUG: wrong zone alignment, it will crash\n");


2005-05-16 20:47:44

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] Factor in buddy allocator alignment requirements in node memory alignment

On Mon, 2005-05-16 at 12:43 -0700, christoph wrote:
> On Mon, 16 May 2005, Dave Hansen wrote:
> > On Mon, 2005-05-16 at 12:05 -0700, christoph wrote:
> > > Memory for nodes on i386 is currently aligned on 2 MB boundaries.
> > > However, the buddy allocator needs pages to be aligned on
> > > PAGE_SIZE << MAX_ORDER which is 8MB if MAX_ORDER = 11.
> >
> > Why do you need this? Are you planning on allowing NUMA KVA remap pages
> > to be handed over to the buddy allocator? That would be a major
> > departure from what we do now, and I'd be very interested in seeing how
> > that is implemented before a infrastructure for it goes in.
>
> Because the buddy allocator is complaining about wrongly allocated zones!

Just because it complains doesn't mean that anything is actually
wrong :)

Do you know which pieces of code actually break if the alignment doesn't
meet what that warning says?

-- Dave

2005-05-16 20:51:52

by Martin Bligh

[permalink] [raw]
Subject: Re: [PATCH] Factor in buddy allocator alignment requirements in node memory alignment



--On Monday, May 16, 2005 12:43:17 -0700 christoph <[email protected]> wrote:

> On Mon, 16 May 2005, Dave Hansen wrote:
>
>> On Mon, 2005-05-16 at 12:05 -0700, christoph wrote:
>> > Memory for nodes on i386 is currently aligned on 2 MB boundaries.
>> > However, the buddy allocator needs pages to be aligned on
>> > PAGE_SIZE << MAX_ORDER which is 8MB if MAX_ORDER = 11.
>>
>> Why do you need this? Are you planning on allowing NUMA KVA remap pages
>> to be handed over to the buddy allocator? That would be a major
>> departure from what we do now, and I'd be very interested in seeing how
>> that is implemented before a infrastructure for it goes in.
>
> Because the buddy allocator is complaining about wrongly allocated zones!
>
> in page_alloc.c:
>
> static void __init free_area_init_core(struct pglist_data *pgdat,
> unsigned long *zones_size, unsigned long *zholes_size)
> {
> ...
>
> const unsigned long zone_required_alignment = 1UL << (MAX_ORDER-1);
>
> ...
>
> if ((zone_start_pfn) & (zone_required_alignment-1))
> printk(KERN_CRIT "BUG: wrong zone alignment, it will crash\n");

IIRC, we decided that warning was worthless ... and I *think* Andy fixed
it to do non-aligned zones, though it might have been someone else. Andy,
can you remember what you did to fix this up?

M.


2005-05-16 20:54:14

by christoph

[permalink] [raw]
Subject: Re: [PATCH] Factor in buddy allocator alignment requirements in node memory alignment

On Mon, 16 May 2005, Dave Hansen wrote:

> > Because the buddy allocator is complaining about wrongly allocated zones!
>
> Just because it complains doesn't mean that anything is actually
> wrong :)
>
> Do you know which pieces of code actually break if the alignment doesn't
> meet what that warning says?

I have seen nothing break but 4 MB allocations f.e. will not be allocated
on a 4MB boundary with a 2 MB zone alignment. The page allocator always
returnes properly aligned pages but 4MB allocations are an exception?

Some present or future hardware device or some other code may find that
surprising and crash.

2005-05-16 20:57:33

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] Factor in buddy allocator alignment requirements in node memory alignment

On Mon, 2005-05-16 at 12:55 -0700, christoph wrote:
> On Mon, 16 May 2005, Dave Hansen wrote:
> > > Because the buddy allocator is complaining about wrongly allocated zones!
> >
> > Just because it complains doesn't mean that anything is actually
> > wrong :)
> >
> > Do you know which pieces of code actually break if the alignment doesn't
> > meet what that warning says?
>
> I have seen nothing break but 4 MB allocations f.e. will not be allocated
> on a 4MB boundary with a 2 MB zone alignment. The page allocator always
> returnes properly aligned pages but 4MB allocations are an exception?

I wasn't aware there was an alignment exception in the allocator for 4MB
pages. Could you provide some examples?

-- Dave

2005-05-16 21:09:21

by Martin Bligh

[permalink] [raw]
Subject: Re: [PATCH] Factor in buddy allocator alignment requirements in node memory alignment



--On Monday, May 16, 2005 12:55:39 -0700 christoph <[email protected]> wrote:

> On Mon, 16 May 2005, Dave Hansen wrote:
>
>> > Because the buddy allocator is complaining about wrongly allocated zones!
>>
>> Just because it complains doesn't mean that anything is actually
>> wrong :)
>>
>> Do you know which pieces of code actually break if the alignment doesn't
>> meet what that warning says?
>
> I have seen nothing break but 4 MB allocations f.e. will not be allocated
> on a 4MB boundary with a 2 MB zone alignment. The page allocator always
> returnes properly aligned pages but 4MB allocations are an exception?
>
> Some present or future hardware device or some other code may find that
> surprising and crash.

Now that it's fixed it is meant to notice that the start of the zone is
not aligned, and not key off that, but the aligment itself ... the start
and end roundoff bits shouldn't throw the rest out of alignment, as long
as we do the calculation sensibly for how we regroup buddies.

M.

2005-05-16 21:13:40

by christoph

[permalink] [raw]
Subject: Re: [PATCH] Factor in buddy allocator alignment requirements in node memory alignment

On Mon, 16 May 2005, Dave Hansen wrote:

> > > Do you know which pieces of code actually break if the alignment doesn't
> > > meet what that warning says?
> >
> > I have seen nothing break but 4 MB allocations f.e. will not be allocated
> > on a 4MB boundary with a 2 MB zone alignment. The page allocator always
> > returnes properly aligned pages but 4MB allocations are an exception?
>
> I wasn't aware there was an alignment exception in the allocator for 4MB
> pages. Could you provide some examples?

I never said that there was an aligment exception. The special case for
4MB pages is created by the failure to properly align the zones in
discontig.c.

But may be that is okay? Then we just need to remove the lines that
detect the misalignment in the page allocator.

2005-05-17 12:25:41

by Andy Whitcroft

[permalink] [raw]
Subject: Re: [PATCH] Factor in buddy allocator alignment requirements in node memory alignment

> if ((zone_start_pfn) & (zone_required_alignment-1))
> printk(KERN_CRIT "BUG: wrong zone alignment, it will crash\n");

I am confused. Your patch attempts to change the alignment of the
mem_map for the node. However, the warning which is triggering is
talking about the relative alignment of the physical pages which
make up the zone, ie those which will be represented by the mem_map.
The alignment of the mem_map is being forced because we are going
to use large pages to map them for performance and tlb coverage
not to match the alignment constraint above.

That said, I believe that this warning is no longer accurate.
The order in which buddies are combined and the alignment thereof
is handled by __free_pages_bulk. This now uses the low order
bits of the physical page frame number to calculate the free'd
objects relative position within the MAX_ORDER aligned buddies,
not the relative position of the page structure within the mem_map.
This allows _either_ edge of the zone to contain partial MAX_ORDER
sized buddies. These simply never will have matching buddies and
thus will never make it to the 'top' of the pyramid.

Indeed looking back at the original patch I commented on how this
change fixed the problem highlighted by the warning, it seems I
failed to follow up and remove it.

In short I think that this warning is now bogus and can be removed.

Andrew, please consider this patch for -mm.

-apw

===
Originally __free_pages_bulk used the relative page number within
a zone to define its buddies. This meant that to maintain the
"maximally aligned" requirements (that an allocation of size N will
be aligned at least to N physically) zones had to also be aligned to
1<<MAX_ORDER pages. When __free_pages_bulk was updated to use the
relative page frame numbers of the free'd pages to pair buddies this
released the alignment constraint on the 'left' edge of the zone.
This allows _either_ edge of the zone to contain partial MAX_ORDER
sized buddies. These simply never will have matching buddies and
thus will never make it to the 'top' of the pyramid.

The patch below removes a now redundant check ensuring that the
mem_map was aligned to MAX_ORDER.

Signed-off-by: Andy Whitcroft <[email protected]>

diffstat free_area_init_core-remove-bogus-warning
---
page_alloc.c | 4 ----
1 files changed, 4 deletions(-)

diff -X /home/apw/brief/lib/vdiff.excl -rupN reference/mm/page_alloc.c current/mm/page_alloc.c
--- reference/mm/page_alloc.c
+++ current/mm/page_alloc.c
@@ -1942,7 +1942,6 @@ static void __init free_area_init_core(s
unsigned long *zones_size, unsigned long *zholes_size)
{
unsigned long i, j;
- const unsigned long zone_required_alignment = 1UL << (MAX_ORDER-1);
int cpu, nid = pgdat->node_id;
unsigned long zone_start_pfn = pgdat->node_start_pfn;

@@ -2033,9 +2032,6 @@ static void __init free_area_init_core(s
zone->zone_mem_map = pfn_to_page(zone_start_pfn);
zone->zone_start_pfn = zone_start_pfn;

- if ((zone_start_pfn) & (zone_required_alignment-1))
- printk(KERN_CRIT "BUG: wrong zone alignment, it will crash\n");
-
memmap_init(size, nid, j, zone_start_pfn);

zonetable_add(zone, nid, j, zone_start_pfn, size);

2005-05-17 13:12:23

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH] Factor in buddy allocator alignment requirements in node memory alignment

On Mon, May 16, 2005 at 01:47:19PM -0700, Dave Hansen wrote:
> Just because it complains doesn't mean that anything is actually
> wrong :)
>
> Do you know which pieces of code actually break if the alignment doesn't
> meet what that warning says?

Be sure in early 2001 the alpha wildfire wasn't booting without having
natural alingment from the 2^order allocation, after several days of
debugging and crashing eventually I figured it out and added the printk
(it couldn't be a BUG since it was early in the boot to see it). The
kernel stack on x86 w/o 4k stacks depends on the natural alignment of
the 2^order buddy allocations for example. No idea how much other code
would break with not naturally aligned 2^order allocations.

2005-05-17 14:18:20

by Andy Whitcroft

[permalink] [raw]
Subject: Re: [PATCH] Factor in buddy allocator alignment requirements in node memory alignment

Andrea Arcangeli wrote:
> On Mon, May 16, 2005 at 01:47:19PM -0700, Dave Hansen wrote:
>
>>Just because it complains doesn't mean that anything is actually
>>wrong :)
>>
>>Do you know which pieces of code actually break if the alignment doesn't
>>meet what that warning says?
>
>
> Be sure in early 2001 the alpha wildfire wasn't booting without having
> natural alingment from the 2^order allocation, after several days of
> debugging and crashing eventually I figured it out and added the printk
> (it couldn't be a BUG since it was early in the boot to see it). The
> kernel stack on x86 w/o 4k stacks depends on the natural alignment of
> the 2^order buddy allocations for example. No idea how much other code
> would break with not naturally aligned 2^order allocations.

Absolutly there are cases which will break if the alignment of
allocations arn't correct. The key here is the free algorithm will now
correctly merge buddies at the physical alignement. This allows the
boundries of the zones to be miss-aligned. Partial pages simply have no
buddies at the nigher level and do not coalesce. The warning is
checking for such a miss-alignment and now is no longer required.

-apw

2005-05-17 18:18:27

by christoph

[permalink] [raw]
Subject: Re: [PATCH] Factor in buddy allocator alignment requirements in node memory alignment

On Tue, 17 May 2005, Andy Whitcroft wrote:

> Andrew, please consider this patch for -mm.

I agree. Forget about my patch and include this one.

> Originally __free_pages_bulk used the relative page number within
> a zone to define its buddies. This meant that to maintain the
> "maximally aligned" requirements (that an allocation of size N will
> be aligned at least to N physically) zones had to also be aligned to
> 1<<MAX_ORDER pages. When __free_pages_bulk was updated to use the
> relative page frame numbers of the free'd pages to pair buddies this
> released the alignment constraint on the 'left' edge of the zone.
> This allows _either_ edge of the zone to contain partial MAX_ORDER
> sized buddies. These simply never will have matching buddies and
> thus will never make it to the 'top' of the pyramid.
>
> The patch below removes a now redundant check ensuring that the
> mem_map was aligned to MAX_ORDER.
>
> Signed-off-by: Andy Whitcroft <[email protected]>
>
> diffstat free_area_init_core-remove-bogus-warning
> ---
> page_alloc.c | 4 ----
> 1 files changed, 4 deletions(-)
>
> diff -X /home/apw/brief/lib/vdiff.excl -rupN reference/mm/page_alloc.c current/mm/page_alloc.c
> --- reference/mm/page_alloc.c
> +++ current/mm/page_alloc.c
> @@ -1942,7 +1942,6 @@ static void __init free_area_init_core(s
> unsigned long *zones_size, unsigned long *zholes_size)
> {
> unsigned long i, j;
> - const unsigned long zone_required_alignment = 1UL << (MAX_ORDER-1);
> int cpu, nid = pgdat->node_id;
> unsigned long zone_start_pfn = pgdat->node_start_pfn;
>
> @@ -2033,9 +2032,6 @@ static void __init free_area_init_core(s
> zone->zone_mem_map = pfn_to_page(zone_start_pfn);
> zone->zone_start_pfn = zone_start_pfn;
>
> - if ((zone_start_pfn) & (zone_required_alignment-1))
> - printk(KERN_CRIT "BUG: wrong zone alignment, it will crash\n");
> -
> memmap_init(size, nid, j, zone_start_pfn);
>
> zonetable_add(zone, nid, j, zone_start_pfn, size);
>
>
>