2007-10-29 17:51:24

by Dave Jones

Subject: 2.6.23 boot failures on x86-64.

We've had a number of people reporting that their x86-64s stopped booting
when they moved to 2.6.23. It rebooted just after discovering the AGP bridge
as a result of the IOMMU init.

Martin tracked this down to the following commit.


commit 2e1c49db4c640b35df13889b86b9d62215ade4b6
Author: Zou Nan hai <[email protected]>
Date: Fri Jun 1 00:46:28 2007 -0700

x86_64: allocate sparsemem memmap above 4G

On systems with a huge amount of physical memory, the VFS cache and the
memory memmap may eat all available system memory under 4G, and the system
may then fail to allocate the swiotlb bounce buffer.

There was a fix for this issue in arch/x86_64/mm/numa.c, but that fix does
not cover the sparsemem model.

This patch adds a fix to the sparsemem model by first trying to allocate the
memmap above 4G.

Signed-off-by: Zou Nan hai <[email protected]>
Acked-by: Suresh Siddha <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>


This should probably be reverted for 2.6.23-stable, and either fixed
properly in .24 or reverted there too.

More info at https://bugzilla.redhat.com/show_bug.cgi?id=249174

Dave

--
http://www.codemonkey.org.uk


2007-10-29 18:05:41

by Greg KH

Subject: Re: [stable] 2.6.23 boot failures on x86-64.

On Mon, Oct 29, 2007 at 01:50:14PM -0400, Dave Jones wrote:
> We've had a number of people reporting that their x86-64s stopped booting
> when they moved to 2.6.23. It rebooted just after discovering the AGP bridge
> as a result of the IOMMU init.
>
> Martin tracked this down to the following commit.
>
>
> commit 2e1c49db4c640b35df13889b86b9d62215ade4b6
> Author: Zou Nan hai <[email protected]>
> Date: Fri Jun 1 00:46:28 2007 -0700
>
> x86_64: allocate sparsemem memmap above 4G
>
> On systems with a huge amount of physical memory, the VFS cache and the
> memory memmap may eat all available system memory under 4G, and the system
> may then fail to allocate the swiotlb bounce buffer.
>
> There was a fix for this issue in arch/x86_64/mm/numa.c, but that fix does
> not cover the sparsemem model.
>
> This patch adds a fix to the sparsemem model by first trying to allocate the
> memmap above 4G.
>
> Signed-off-by: Zou Nan hai <[email protected]>
> Acked-by: Suresh Siddha <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> Signed-off-by: Linus Torvalds <[email protected]>
>
>
> This should probably be reverted for 2.6.23-stable, and either fixed
> properly in .24 or reverted there too.

I'll be glad to revert it in -stable, if it's also reverted in Linus's
tree first :)

thanks,

greg k-h

2007-10-29 18:19:01

by Andi Kleen

Subject: Re: 2.6.23 boot failures on x86-64.

On Monday 29 October 2007 18:50:14 Dave Jones wrote:
> We've had a number of people reporting that their x86-64s stopped booting
> when they moved to 2.6.23. It rebooted just after discovering the AGP bridge
> as a result of the IOMMU init.

It's probably the usual "nobody tests sparsemem at all" issue.

But if allocating bootmem >4G doesn't work on these systems,
they most likely have more problems anyway. It might be better
to find out exactly what goes wrong.

-Andi

2007-10-29 18:39:43

by Linus Torvalds

Subject: Re: [stable] 2.6.23 boot failures on x86-64.



On Mon, 29 Oct 2007, Greg KH wrote:
>
> I'll be glad to revert it in -stable, if it's also reverted in Linus's
> tree first :)

We've had some changes since 2.6.23, and afaik, the
"alloc_bootmem_high_node()" code is already effectively dead there. It's
only called if CONFIG_SPARSEMEM_VMEMMAP is *not* enabled, and I *think* we
enable it by force on x86-64 these days.

More people added to Cc, just to clarify whether I'm just confused.

Andy, Christoph, Mel: commit 2e1c49db4c640b35df13889b86b9d62215ade4b6 aka
"x86_64: allocate sparsemem memmap above 4G" is the one that causes the
failures, just fyi.

Martin - it would be great if you could try out your failing machine with
2.6.24-rc1 (or a nightly snapshot or current git.. the more recent the
better).

But if I'm right, that commit should be reverted from 2.6.24 just because
it's pointless (even if the bug itself is gone). And if I'm wrong, it
should be reverted. So something like the appended would make sense
regardless.

Can I get a "tested-by"? And/or ack/nack's on my half-arsed theory above?

Linus
--
From: Linus Torvalds <[email protected]>

Revert "x86_64: allocate sparsemem memmap above 4G"

This reverts commit 2e1c49db4c640b35df13889b86b9d62215ade4b6, since
testing in Fedora has shown it to cause boot failures, as per Dave
Jones. Bisected down by Martin Ebourne.

Cc: Dave Jones <[email protected]>
Cc: Martin Ebourne <[email protected]>
Cc: Zou Nan hai <[email protected]>
Cc: Suresh Siddha <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 1e3862e..a7308b2 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -728,12 +728,6 @@ int in_gate_area_no_task(unsigned long addr)
return (addr >= VSYSCALL_START) && (addr < VSYSCALL_END);
}

-void * __init alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size)
-{
- return __alloc_bootmem_core(pgdat->bdata, size,
- SMP_CACHE_BYTES, (4UL*1024*1024*1024), 0);
-}
-
const char *arch_vma_name(struct vm_area_struct *vma)
{
if (vma->vm_mm && vma->vm_start == (long)vma->vm_mm->context.vdso)
diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index c83534e..0365ec9 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -59,7 +59,6 @@ extern void *__alloc_bootmem_core(struct bootmem_data *bdata,
unsigned long align,
unsigned long goal,
unsigned long limit);
-extern void *alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size);

#ifndef CONFIG_HAVE_ARCH_BOOTMEM_NODE
extern void reserve_bootmem(unsigned long addr, unsigned long size);
diff --git a/mm/sparse.c b/mm/sparse.c
index 08fb14f..e06f514 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -220,12 +220,6 @@ static int __meminit sparse_init_one_section(struct mem_section *ms,
return 1;
}

-__attribute__((weak)) __init
-void *alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size)
-{
- return NULL;
-}
-
static unsigned long usemap_size(void)
{
unsigned long size_bytes;
@@ -267,11 +261,6 @@ struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid)
if (map)
return map;

- map = alloc_bootmem_high_node(NODE_DATA(nid),
- sizeof(struct page) * PAGES_PER_SECTION);
- if (map)
- return map;
-
map = alloc_bootmem_node(NODE_DATA(nid),
sizeof(struct page) * PAGES_PER_SECTION);
return map;

2007-10-29 18:48:33

by Dave Jones

Subject: Re: 2.6.23 boot failures on x86-64.

On Mon, Oct 29, 2007 at 07:18:43PM +0100, Andi Kleen wrote:
> On Monday 29 October 2007 18:50:14 Dave Jones wrote:
> > We've had a number of people reporting that their x86-64s stopped booting
> > when they moved to 2.6.23. It rebooted just after discovering the AGP bridge
> > as a result of the IOMMU init.
>
> It's probably the usual "nobody tests sparsemem at all" issue.

We've been using SPARSEMEM in Fedora for a *long* time. So long, in fact,
that I forget why we moved away from DISCONTIGMEM; a significant number of
users have been running that configuration for some time.

> But if allocating bootmem >4G doesn't work on these systems
> most likely they have more problems anyways. It might be better
> to find out what goes wrong exactly.

Any ideas on what to instrument ?

Dave

--
http://www.codemonkey.org.uk

2007-10-29 19:03:26

by Andi Kleen

Subject: Re: 2.6.23 boot failures on x86-64.

On Monday 29 October 2007 19:47:47 Dave Jones wrote:
> On Mon, Oct 29, 2007 at 07:18:43PM +0100, Andi Kleen wrote:
> > On Monday 29 October 2007 18:50:14 Dave Jones wrote:
> > > We've had a number of people reporting that their x86-64s stopped booting
> > > when they moved to 2.6.23. It rebooted just after discovering the AGP bridge
> > > as a result of the IOMMU init.
> >
> > It's probably the usual "nobody tests sparsemem at all" issue.
>
> We've been using SPARSEMEM in Fedora for a *long* time.
> So long in fact, I forget why we moved away from DISCONTIGMEM, so there's
> a significant number of users using that configuration for some time.

Supposedly you wanted a slower kernel that needs more memory?

OK, I wasn't aware of that. I tended to get sparsemem reports at least
1-2 releases after the fact, so it looked like it was undertested.

>
> > But if allocating bootmem >4G doesn't work on these systems
> > most likely they have more problems anyways. It might be better
> > to find out what goes wrong exactly.
>
> Any ideas on what to instrument ?

See what address the bootmem_alloc_high call returns; check whether it
overlaps with something, etc.

Fill the memory on the system and see if it can access all of its memory.

-Andi

2007-10-29 19:46:57

by Dave Jones

Subject: Re: 2.6.23 boot failures on x86-64.

On Mon, Oct 29, 2007 at 08:03:09PM +0100, Andi Kleen wrote:

> > > It's probably the usual "nobody tests sparsemem at all" issue.
> >
> > We've been using SPARSEMEM in Fedora for a *long* time.
> > So long in fact, I forget why we moved away from DISCONTIGMEM, so there's
> > a significant number of users using that configuration for some time.
>
> Supposedly you wanted a slower kernel that needs more memory?
>
> Ok I wasn't aware of that. I tended to get sparsemem reports usually
> at least 1-2 releases after the fact, so it looked like it was undertested.

Looking at the CVS history, I can't figure out what the reasoning was,
but every Fedora (and RHEL5) kernel since 2006/07/05 has been built that way.

Curious how no-one noticed either of the side-effects you mention.

> > > But if allocating bootmem >4G doesn't work on these systems
> > > most likely they have more problems anyways. It might be better
> > > to find out what goes wrong exactly.
> > Any ideas on what to instrument ?
>
> See what address the bootmem_alloc_high returns; check if it overlaps
> with something etc.
>
> Fill the memory on the system and see if it can access all of its memory.

Martin, as you have one of the affected systems, do you feel up to this?

Dave

--
http://www.codemonkey.org.uk

2007-10-29 19:52:00

by Christoph Lameter

Subject: Re: [stable] 2.6.23 boot failures on x86-64.

On Mon, 29 Oct 2007, Linus Torvalds wrote:

> We've had some changes since 2.6.23, and afaik, the
> "alloc_bootmem_high_node()" code is already effectively dead there. It's
> only called if CONFIG_SPARSEMEM_VMEMMAP is *not* enabled, and I *think* we
> enable it by force on x86-64 these days.

CONFIG_SPARSEMEM_VMEMMAP was introduced in 2.6.24-rc1.

If I read this Kconfig.x86_64 correctly then it seems that DISCONTIG is
still the default. Andy?

config ARCH_DISCONTIGMEM_ENABLE
bool
depends on NUMA
default y

config ARCH_DISCONTIGMEM_DEFAULT
def_bool y
depends on NUMA

config ARCH_SPARSEMEM_ENABLE
def_bool y
depends on (NUMA || EXPERIMENTAL)
select SPARSEMEM_VMEMMAP_ENABLE

config ARCH_MEMORY_PROBE
def_bool y
depends on MEMORY_HOTPLUG

config ARCH_FLATMEM_ENABLE
def_bool y
depends on !NUMA

2007-10-29 19:55:37

by Suresh Siddha

Subject: Re: [stable] 2.6.23 boot failures on x86-64.

On Mon, Oct 29, 2007 at 11:37:40AM -0700, Linus Torvalds wrote:
>
>
> On Mon, 29 Oct 2007, Greg KH wrote:
> >
> > I'll be glad to revert it in -stable, if it's also reverted in Linus's
> > tree first :)
>
> We've had some changes since 2.6.23, and afaik, the
> "alloc_bootmem_high_node()" code is already effectively dead there. It's
> only called if CONFIG_SPARSEMEM_VMEMMAP is *not* enabled, and I *think* we
> enable it by force on x86-64 these days.

If so, we (Nanhai and myself) will take a look at the VMEMMAP changes and
see whether the bug that commit 2e1c49db4c640b35df13889b86b9d62215ade4b6
tries to fix is still open in the latest git.

But I can't explain how 2e1c49db4c640b35df13889b86b9d62215ade4b6 can be
the root cause of Dave's issue in 2.6.23.

thanks,
suresh

2007-10-29 19:56:19

by Andi Kleen

Subject: Re: 2.6.23 boot failures on x86-64.

On Monday 29 October 2007 20:43:11 Dave Jones wrote:
> On Mon, Oct 29, 2007 at 08:03:09PM +0100, Andi Kleen wrote:
>
> > > > It's probably the usual "nobody tests sparsemem at all" issue.
> > >
> > > We've been using SPARSEMEM in Fedora for a *long* time.
> > > So long in fact, I forget why we moved away from DISCONTIGMEM, so there's
> > > a significant number of users using that configuration for some time.
> >
> > Supposedly you wanted a slower kernel that needs more memory?
> >
> > Ok I wasn't aware of that. I tended to get sparsemem reports usually
> > at least 1-2 releases after the fact, so it looked like it was undertested.
>
> Looking at cvs history, I can't figure out what the reasoning was,
> but every Fedora (and RHEL5) kernel since 2006/07/05 has been that way.
>
> Curious how no-one noticed either of the side-effects you mention.

It's a few percent on a few benchmarks iirc. vmemmap (now in .24) was supposed
to address that.

-Andi

2007-10-29 20:07:35

by Dave Jones

Subject: Re: 2.6.23 boot failures on x86-64.

On Mon, Oct 29, 2007 at 08:03:09PM +0100, Andi Kleen wrote:
> On Monday 29 October 2007 19:47:47 Dave Jones wrote:
> > On Mon, Oct 29, 2007 at 07:18:43PM +0100, Andi Kleen wrote:
> > > On Monday 29 October 2007 18:50:14 Dave Jones wrote:
> > > > We've had a number of people reporting that their x86-64s stopped booting
> > > > when they moved to 2.6.23. It rebooted just after discovering the AGP bridge
> > > > as a result of the IOMMU init.
> > >
> > > It's probably the usual "nobody tests sparsemem at all" issue.
> >
> > We've been using SPARSEMEM in Fedora for a *long* time.
> > So long in fact, I forget why we moved away from DISCONTIGMEM, so there's
> > a significant number of users using that configuration for some time.
>
> Supposedly you wanted a slower kernel that needs more memory?

Actually, if what you say is true, the Kconfig entry for sparsemem
could use changing, as it suggests the opposite...

This option provides some potential
performance benefits, along with decreased code complexity,
but it is newer, and more experimental.

I'm still unclear why exactly we enabled it. The other comment
in the Kconfig..

This will be the only option for some systems, including
memory hotplug systems. This is normal.

Sounds unlikely to be the reason, but maybe.
Maybe benchmarking at some point in history showed sparsemem
actually beat out discontigmem. I'm at a loss to explain it
thanks to a particularly unhelpful changelog entry I wrote
at the time.

Dave

--
http://www.codemonkey.org.uk

2007-10-29 20:10:01

by Christoph Lameter

Subject: Re: [stable] 2.6.23 boot failures on x86-64.

On Mon, 29 Oct 2007, Siddha, Suresh B wrote:

> But I can't explain how 2e1c49db4c640b35df13889b86b9d62215ade4b6 can be
> the root cause of Dave's issue in 2.6.23.

2.6.23 has no VMEMMAP support for x86_64.

2007-10-29 20:24:43

by Andy Whitcroft

Subject: Re: [stable] 2.6.23 boot failures on x86-64.

On Mon, Oct 29, 2007 at 11:37:40AM -0700, Linus Torvalds wrote:
>
>
> On Mon, 29 Oct 2007, Greg KH wrote:
> >
> > I'll be glad to revert it in -stable, if it's also reverted in Linus's
> > tree first :)
>
> We've had some changes since 2.6.23, and afaik, the
> "alloc_bootmem_high_node()" code is already effectively dead there. It's
> only called if CONFIG_SPARSEMEM_VMEMMAP is *not* enabled, and I *think* we
> enable it by force on x86-64 these days.

CONFIG_SPARSEMEM_VMEMMAP is the default when SPARSEMEM is enabled
on x86_64. The overall default remains DISCONTIGMEM, mainly as a
safety measure while the i386/x86_64 => x86 merge stabilises. But yes,
this code is only used when SPARSEMEM is enabled but VMEMMAP is not,
so it is effectively redundant.

> More people added to Cc, just to clarify whether I'm just confused.
>
> Andy, Christoph, Mel: commit 2e1c49db4c640b35df13889b86b9d62215ade4b6 aka
> "x86_64: allocate sparsemem memmap above 4G" is the one that causes the
> failures, just fyi.

That patch has the laudable goal of trying to push the memory which
backs the sparsemem memmap out to non-DMA memory. I would have expected
that call to actually succeed: the bootmem allocator seems to try first at
the goal (which would likely be outside the node on a small machine), and
then to retry without a goal, which is what the code without the goal does.
Most illogical.

> Martin - it would be great if you could try out your failing machine with
> 2.6.24-rc1 (or a nightly snapshot or current git.. the more recent the
> better).
>
> But if I'm right, that commit should be reverted from 2.6.24 just because
> it's pointless (even if the bug itself is gone). And if I'm wrong, it
> should be reverted. So something like the appended would make sense
> regardless.
>
> Can I get a "tested-by"? And/or ack/nack's on my half-arsed theory above?

This code is definitely only used when SPARSEMEM is enabled and VMEMMAP
is not, which is not a combination we see on x86_64.

Acked-by: Andy Whitcroft <[email protected]>

> Linus
> --
> From: Linus Torvalds <[email protected]>
>
> Revert "x86_64: allocate sparsemem memmap above 4G"
>
> This reverts commit 2e1c49db4c640b35df13889b86b9d62215ade4b6, since
> testing in Fedora has shown it to cause boot failures, as per Dave
> Jones. Bisected down by Martin Ebourne.
>
> Cc: Dave Jones <[email protected]>
> Cc: Martin Ebourne <[email protected]>
> Cc: Zou Nan hai <[email protected]>
> Cc: Suresh Siddha <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Signed-off-by: Linus Torvalds <[email protected]>
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 1e3862e..a7308b2 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -728,12 +728,6 @@ int in_gate_area_no_task(unsigned long addr)
> return (addr >= VSYSCALL_START) && (addr < VSYSCALL_END);
> }
>
> -void * __init alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size)
> -{
> - return __alloc_bootmem_core(pgdat->bdata, size,
> - SMP_CACHE_BYTES, (4UL*1024*1024*1024), 0);
> -}
> -
> const char *arch_vma_name(struct vm_area_struct *vma)
> {
> if (vma->vm_mm && vma->vm_start == (long)vma->vm_mm->context.vdso)
> diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
> index c83534e..0365ec9 100644
> --- a/include/linux/bootmem.h
> +++ b/include/linux/bootmem.h
> @@ -59,7 +59,6 @@ extern void *__alloc_bootmem_core(struct bootmem_data *bdata,
> unsigned long align,
> unsigned long goal,
> unsigned long limit);
> -extern void *alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size);
>
> #ifndef CONFIG_HAVE_ARCH_BOOTMEM_NODE
> extern void reserve_bootmem(unsigned long addr, unsigned long size);
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 08fb14f..e06f514 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -220,12 +220,6 @@ static int __meminit sparse_init_one_section(struct mem_section *ms,
> return 1;
> }
>
> -__attribute__((weak)) __init
> -void *alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size)
> -{
> - return NULL;
> -}
> -
> static unsigned long usemap_size(void)
> {
> unsigned long size_bytes;
> @@ -267,11 +261,6 @@ struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid)
> if (map)
> return map;
>
> - map = alloc_bootmem_high_node(NODE_DATA(nid),
> - sizeof(struct page) * PAGES_PER_SECTION);
> - if (map)
> - return map;
> -
> map = alloc_bootmem_node(NODE_DATA(nid),
> sizeof(struct page) * PAGES_PER_SECTION);
> return map;

-apw

2007-10-29 21:21:33

by Martin Ebourne

Subject: Re: 2.6.23 boot failures on x86-64.

On Mon, 2007-10-29 at 15:43 -0400, Dave Jones wrote:
> On Mon, Oct 29, 2007 at 08:03:09PM +0100, Andi Kleen wrote:
> > > > But if allocating bootmem >4G doesn't work on these systems
> > > > most likely they have more problems anyways. It might be better
> > > > to find out what goes wrong exactly.
> > > Any ideas on what to instrument ?
> >
> > See what address the bootmem_alloc_high returns; check if it overlaps
> > with something etc.
> >
> > Fill the memory on the system and see if it can access all of its memory.
>
> Martin, as you have one of the affected systems, do you feel up to this?

Faking a node at 0000000000000000-000000001fff0000
Bootmem setup node 0 0000000000000000-000000001fff0000
sparse_early_mem_map_alloc: returned address ffff81000070b000

My box has 512MB of RAM.

Cheers,

Martin.

2007-10-29 21:58:29

by Martin Ebourne

Subject: Re: [stable] 2.6.23 boot failures on x86-64.

On Mon, 2007-10-29 at 11:37 -0700, Linus Torvalds wrote:
> Martin - it would be great if you could try out your failing machine with
> 2.6.24-rc1 (or a nightly snapshot or current git.. the more recent the
> better).
>
> But if I'm right, that commit should be reverted from 2.6.24 just because
> it's pointless (even if the bug itself is gone). And if I'm wrong, it
> should be reverted. So something like the appended would make sense
> regardless.
>
> Can I get a "tested-by"? And/or ack/nack's on my half-arsed theory above?

Current git boots OK as-is.

I used the config from the Fedora 2.6.23 kernel and accepted the defaults
for all the new options.

Cheers,

Martin.

2007-10-31 06:11:23

by Zou, Nanhai

Subject: Re: 2.6.23 boot failures on x86-64.

On Tue, 2007-10-30 at 05:21, Martin Ebourne wrote:
> On Mon, 2007-10-29 at 15:43 -0400, Dave Jones wrote:
> > On Mon, Oct 29, 2007 at 08:03:09PM +0100, Andi Kleen wrote:
> > > > > But if allocating bootmem >4G doesn't work on these systems
> > > > > most likely they have more problems anyways. It might be better
> > > > > to find out what goes wrong exactly.
> > > > Any ideas on what to instrument ?
> > >
> > > See what address the bootmem_alloc_high returns; check if it overlaps
> > > with something etc.
> > >
> > > Fill the memory on the system and see if it can access all of its memory.
> >
> > Martin, as you have one of the affected systems, do you feel up to this?
>
> Faking a node at 0000000000000000-000000001fff0000
> Bootmem setup node 0 0000000000000000-000000001fff0000
> sparse_early_mem_map_alloc: returned address ffff81000070b000
>
> My box has 512MB of RAM.
>
> Cheers,
>
> Martin.

Oops, sorry, this seems to be a mistake of mine.
I forgot to exclude the DMA range.

Does the following patch fix the issue?

Thanks
Zou Nan hai

--- a/arch/x86/mm/init_64.c 2007-10-31 11:24:11.000000000 +0800
+++ b/arch/x86/mm/init_64.c 2007-10-31 12:31:02.000000000 +0800
@@ -731,7 +731,7 @@ int in_gate_area_no_task(unsigned long a
void * __init alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size)
{
return __alloc_bootmem_core(pgdat->bdata, size,
- SMP_CACHE_BYTES, (4UL*1024*1024*1024), 0);
+ SMP_CACHE_BYTES, (4UL*1024*1024*1024), __pa(MAX_DMA_ADDRESS));
}

const char *arch_vma_name(struct vm_area_struct *vma)




2007-10-31 06:26:31

by Zou, Nanhai

Subject: Re: 2.6.23 boot failures on x86-64.

On Wed, 2007-10-31 at 14:04, Zou Nan hai wrote:
> On Tue, 2007-10-30 at 05:21, Martin Ebourne wrote:
> > On Mon, 2007-10-29 at 15:43 -0400, Dave Jones wrote:
> > > On Mon, Oct 29, 2007 at 08:03:09PM +0100, Andi Kleen wrote:
> > > > > > But if allocating bootmem >4G doesn't work on these systems
> > > > > > most likely they have more problems anyways. It might be better
> > > > > > to find out what goes wrong exactly.
> > > > > Any ideas on what to instrument ?
> > > >
> > > > See what address the bootmem_alloc_high returns; check if it overlaps
> > > > with something etc.
> > > >
> > > > Fill the memory on the system and see if it can access all of its memory.
> > >
> > > Martin, as you have one of the affected systems, do you feel up to this?
> >
> > Faking a node at 0000000000000000-000000001fff0000
> > Bootmem setup node 0 0000000000000000-000000001fff0000
> > sparse_early_mem_map_alloc: returned address ffff81000070b000
> >
> > My box has 512MB of RAM.
> >
> > Cheers,
> >
> > Martin.
>
> Oops, sorry,
> seem to be a mistake of me.
> I forget to exclude the DMA range.
>
> Does the following patch fix the issue?
>
> Thanks
> Zou Nan hai
>
> --- a/arch/x86/mm/init_64.c 2007-10-31 11:24:11.000000000 +0800
> +++ b/arch/x86/mm/init_64.c 2007-10-31 12:31:02.000000000 +0800
> @@ -731,7 +731,7 @@ int in_gate_area_no_task(unsigned long a
> void * __init alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size)
> {
> return __alloc_bootmem_core(pgdat->bdata, size,
> - SMP_CACHE_BYTES, (4UL*1024*1024*1024), 0);
> + SMP_CACHE_BYTES, (4UL*1024*1024*1024), __pa(MAX_DMA_ADDRESS));
> }
>
> const char *arch_vma_name(struct vm_area_struct *vma)
>
>
>
>

Please ignore the patch; it is wrong.

However, I think the root cause is that when __alloc_bootmem_core fails to
allocate memory above 4G, it falls back to allocating from the lowest free
page, which then sometimes happens to land in the DMA region...

Since this code path is dead, I am OK with reverting the patch.

Suresh and I will check the CONFIG_SPARSEMEM_VMEMMAP path.
Thanks
Zou Nan hai