2009-09-24 00:53:50

by Tejun Heo

[permalink] [raw]
Subject: [PATCH] percpu: make pcpu_setup_first_chunk() failures more verbose

The parameters to pcpu_setup_first_chunk() come from different sources
depending on architecture and can be quite complex. The function runs
various sanity checks on the parameters and triggers BUG() if
something isn't right. However, this is very early during the boot
and not reporting exactly what the problem is makes debugging even
harder.

Add PCPU_SETUP_BUG() macro which prints out enough information about
the parameters. As the macro still puts separate BUG() for each
check, it won't lose any information even on the situations where only
the program counter can be retrieved.

While at it, also bump pcpu_dump_alloc_info() message to KERN_INFO so
that it's visible on the console if boot fails to complete.

Signed-off-by: Tejun Heo <[email protected]>
---
Here's more complete version to make pcpu init failure more verbose.
Tony, you can also use this version but the original one should work
fine for your specific case.

If there's no objection, I'll push to Linus with other percpu fixes in
a week or so.

Thanks.

mm/percpu.c | 37 ++++++++++++++++++++++++++-----------
1 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 43d8cac..bb2ca3c 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1574,6 +1574,7 @@ static void pcpu_dump_alloc_info(const char *lvl,
int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
void *base_addr)
{
+ static char cpus_buf[4096] __initdata;
static int smap[2], dmap[2];
size_t dyn_size = ai->dyn_size;
size_t size_sum = ai->static_size + ai->reserved_size + dyn_size;
@@ -1585,17 +1586,26 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
int *unit_map;
int group, unit, i;

+ cpumask_scnprintf(cpus_buf, sizeof(cpus_buf), cpu_possible_mask);
+
+#define PCPU_SETUP_BUG_ON(cond) do { \
+ if (unlikely(cond)) { \
+ printk(KERN_CRIT "PERCPU: failed to initialize, %s", #cond); \
+ printk(KERN_CRIT "PERCPU: cpu_possible_mask=%s\n", cpus_buf); \
+ pcpu_dump_alloc_info(KERN_CRIT, ai); \
+ BUG(); \
+ } \
+} while (0)
+
/* sanity checks */
BUILD_BUG_ON(ARRAY_SIZE(smap) >= PCPU_DFL_MAP_ALLOC ||
ARRAY_SIZE(dmap) >= PCPU_DFL_MAP_ALLOC);
- BUG_ON(ai->nr_groups <= 0);
- BUG_ON(!ai->static_size);
- BUG_ON(!base_addr);
- BUG_ON(ai->unit_size < size_sum);
- BUG_ON(ai->unit_size & ~PAGE_MASK);
- BUG_ON(ai->unit_size < PCPU_MIN_UNIT_SIZE);
-
- pcpu_dump_alloc_info(KERN_DEBUG, ai);
+ PCPU_SETUP_BUG_ON(ai->nr_groups <= 0);
+ PCPU_SETUP_BUG_ON(!ai->static_size);
+ PCPU_SETUP_BUG_ON(!base_addr);
+ PCPU_SETUP_BUG_ON(ai->unit_size < size_sum);
+ PCPU_SETUP_BUG_ON(ai->unit_size & ~PAGE_MASK);
+ PCPU_SETUP_BUG_ON(ai->unit_size < PCPU_MIN_UNIT_SIZE);

/* process group information and build config tables accordingly */
group_offsets = alloc_bootmem(ai->nr_groups * sizeof(group_offsets[0]));
@@ -1618,8 +1628,9 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
if (cpu == NR_CPUS)
continue;

- BUG_ON(cpu > nr_cpu_ids || !cpu_possible(cpu));
- BUG_ON(unit_map[cpu] != NR_CPUS);
+ PCPU_SETUP_BUG_ON(cpu > nr_cpu_ids);
+ PCPU_SETUP_BUG_ON(!cpu_possible(cpu));
+ PCPU_SETUP_BUG_ON(unit_map[cpu] != NR_CPUS);

unit_map[cpu] = unit + i;
unit_off[cpu] = gi->base_offset + i * ai->unit_size;
@@ -1632,7 +1643,11 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
pcpu_nr_units = unit;

for_each_possible_cpu(cpu)
- BUG_ON(unit_map[cpu] == NR_CPUS);
+ PCPU_SETUP_BUG_ON(unit_map[cpu] == NR_CPUS);
+
+ /* we're done parsing the input, undefine BUG macro and dump config */
+#undef PCPU_SETUP_BUG_ON
+ pcpu_dump_alloc_info(KERN_INFO, ai);

pcpu_nr_groups = ai->nr_groups;
pcpu_group_offsets = group_offsets;
--
1.6.4.2


2009-09-24 12:50:20

by Tejun Heo

[permalink] [raw]
Subject: [PATCH] percpu: make allocation failures more verbose

Warn and dump stack when percpu allocation fails. percpu allocator is
still young and unchecked NULL percpu pointer usage can result in
random memory corruption when combined with the pointer shifting in
access macros. Allocation failures should be rare and the warning
message will be disabled after certain times.

Signed-off-by: Tejun Heo <[email protected]>
---
Another patch to make things more verbose when things don't go too
well.

Thanks.

mm/percpu.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

Index: work/mm/percpu.c
===================================================================
--- work.orig/mm/percpu.c
+++ work/mm/percpu.c
@@ -1043,7 +1043,9 @@ static struct pcpu_chunk *alloc_pcpu_chu
*/
static void *pcpu_alloc(size_t size, size_t align, bool reserved)
{
+ static int warn_limit = 10;
struct pcpu_chunk *chunk;
+ const char *err;
int slot, off;

if (unlikely(!size || size > PCPU_MIN_UNIT_SIZE || align > PAGE_SIZE)) {
@@ -1059,11 +1061,14 @@ static void *pcpu_alloc(size_t size, siz
if (reserved && pcpu_reserved_chunk) {
chunk = pcpu_reserved_chunk;
if (size > chunk->contig_hint ||
- pcpu_extend_area_map(chunk) < 0)
+ pcpu_extend_area_map(chunk) < 0) {
+ err = "failed to extend area map of reserved chunk";
goto fail_unlock;
+ }
off = pcpu_alloc_area(chunk, size, align);
if (off >= 0)
goto area_found;
+ err = "alloc from reserved chunk failed";
goto fail_unlock;
}

@@ -1080,6 +1085,7 @@ restart:
case 1:
goto restart; /* pcpu_lock dropped, restart */
default:
+ err = "failed to extend area map";
goto fail_unlock;
}

@@ -1093,8 +1099,10 @@ restart:
spin_unlock_irq(&pcpu_lock);

chunk = alloc_pcpu_chunk();
- if (!chunk)
+ if (!chunk) {
+ err = "failed to allocate new chunk";
goto fail_unlock_mutex;
+ }

spin_lock_irq(&pcpu_lock);
pcpu_chunk_relocate(chunk, -1);
@@ -1107,6 +1115,7 @@ area_found:
if (pcpu_populate_chunk(chunk, off, size)) {
spin_lock_irq(&pcpu_lock);
pcpu_free_area(chunk, off);
+ err = "failed to populate";
goto fail_unlock;
}

@@ -1119,6 +1128,13 @@ fail_unlock:
spin_unlock_irq(&pcpu_lock);
fail_unlock_mutex:
mutex_unlock(&pcpu_alloc_mutex);
+ if (warn_limit) {
+ pr_warning("PERCPU: allocation failed, size=%zu align=%zu, "
+ "%s\n", size, align, err);
+ dump_stack();
+ if (!--warn_limit)
+ pr_info("PERCPU: limit reached, disable warning\n");
+ }
return NULL;
}

2009-09-27 12:48:21

by Tony Vroon

[permalink] [raw]
Subject: 2.6.31-09194-g0d9df25 Early boot exception

On Thu, 2009-09-24 at 09:53 +0900, Tejun Heo wrote:
> Add PCPU_SETUP_BUG() macro which prints out enough information about
> the parameters. As the macro still puts separate BUG() for each
> check, it won't lose any information even on the situations where only
> the program counter can be retrieved.

On Thu, 2009-09-24 at 21:49 +0900, Tejun Heo wrote:
> Warn and dump stack when percpu allocation fails.

Thank you Tejun, both patches applied, more information is now visible:

ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
Using ACPI (MADT) for SMP configuration information
ACPI: HPET id: 0x10de8201 base: 0xfed00000
SMP: allowing 12 CPUs, 0 hotplug CPUs
Allocating PCI resources starting at b00000000 (gap: b0000000:30000000)
NR_CPUS: 12 nr_cpumask_bits:12 nr_cpu_ids:12 nr_node_ids:2
PERCPU: Embedded 23 pages/cpu @ffff880028200000 s73176 r0 d21032 u262144
PERCPU: failed to initialize, unit_map[cpu] == NR_CPUS
PERCPU: cpu_possible_mask=fff
pcpu-alloc: s73176 r0 d21032 u262144 alloc=1*2097152
pcpu-alloc: [0] 00 01 02 03 04 05 -- -- [1] 06 07 08 09 10 11 -- --
PANIC: early exception 06 rip 10:ffffffff81597feb error 0 cr2 d0fff6
Pid: 0, comm: swapper Not tainted 2.6.31-09194-g0d9df25-dirty #1
Call Trace:
[<ffffffff81582195>] early_idt_handler+0x55/0x68
[<ffffffff81597feb>] ? pcpu_setup_first_chunk+0x472/0x769
[<ffffffff81597feb>] ? pcpu_setup_first_chunk+0x472/0x769
[<ffffffff815987e8>] pcpu_embed_first_chunk+0x1e2/0x241
[<ffffffff8158b51d>] ? pcpu_fc_alloc+0x0/0xac
[<ffffffff8158b4fe>] ? pcpu_fc_free+0x0/0x1f
[<ffffffff8158b33f>] setup_per_cpu_areas+0x65/0x219
[<ffffffff815829df>] start_kernel+0x124/0x2c5
[<ffffffff81582272>] x86_64_start_reservations+0x82/0x86
[<ffffffff8158235a>] x86_64_start_kernel+0xe4/0xeb
RIP pcpu_setup_first_chunk+0x472/0x769

If you would like the vmlinux image again, let me know.

Regards,
Tony V.


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part

2009-09-28 16:20:08

by Tejun Heo

[permalink] [raw]
Subject: [PATCH] percpu: fix unit_map[] verification in pcpu_setup_first_chunk()

pcpu_setup_first_chunk() incorrectly used NR_CPUS as the impossible
unit number while unit number can equal and go over NR_CPUS with
sparse unit map. This triggers BUG_ON() spuriously on machines which
have non-power-of-two number of cpus. Use UINT_MAX instead.

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Tony Vroon <[email protected]>
---
Erghh... stupid mistake. This should fix it. Can you please verify?

Thanks.

mm/percpu.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

Index: work/mm/percpu.c
===================================================================
--- work.orig/mm/percpu.c
+++ work/mm/percpu.c
@@ -1604,7 +1604,7 @@ int __init pcpu_setup_first_chunk(const
unit_off = alloc_bootmem(nr_cpu_ids * sizeof(unit_off[0]));

for (cpu = 0; cpu < nr_cpu_ids; cpu++)
- unit_map[cpu] = NR_CPUS;
+ unit_map[cpu] = UINT_MAX;
pcpu_first_unit_cpu = NR_CPUS;

for (group = 0, unit = 0; group < ai->nr_groups; group++, unit += i) {
@@ -1619,7 +1619,7 @@ int __init pcpu_setup_first_chunk(const
continue;

BUG_ON(cpu > nr_cpu_ids || !cpu_possible(cpu));
- BUG_ON(unit_map[cpu] != NR_CPUS);
+ BUG_ON(unit_map[cpu] != UINT_MAX);

unit_map[cpu] = unit + i;
unit_off[cpu] = gi->base_offset + i * ai->unit_size;
@@ -1632,7 +1632,7 @@ int __init pcpu_setup_first_chunk(const
pcpu_nr_units = unit;

for_each_possible_cpu(cpu)
- BUG_ON(unit_map[cpu] == NR_CPUS);
+ BUG_ON(unit_map[cpu] == UINT_MAX);

pcpu_nr_groups = ai->nr_groups;
pcpu_group_offsets = group_offsets;

2009-09-28 16:30:12

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH] percpu: fix unit_map[] verification in pcpu_setup_first_chunk()

On Tue, 29 Sep 2009, Tejun Heo wrote:

> pcpu_setup_first_chunk() incorrectly used NR_CPUS as the impossible
> unit number while unit number can equal and go over NR_CPUS with
> sparse unit map. This triggers BUG_ON() spuriously on machines which
> have non-power-of-two number of cpus. Use UINT_MAX instead.

Uhhh. Funky. The assumption nr_cpu_ids < NR_CPUS has been broken. Wonder
what other effects that will have. In particular since the slab and page
allocators have arrays indexed by the cpu number and those arrays are
dimensioned for NR_CPUS.

2009-09-28 16:36:01

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] percpu: fix unit_map[] verification in pcpu_setup_first_chunk()

Christoph Lameter wrote:
> On Tue, 29 Sep 2009, Tejun Heo wrote:
>
>> pcpu_setup_first_chunk() incorrectly used NR_CPUS as the impossible
>> unit number while unit number can equal and go over NR_CPUS with
>> sparse unit map. This triggers BUG_ON() spuriously on machines which
>> have non-power-of-two number of cpus. Use UINT_MAX instead.
>
> Uhhh. Funky. The assumption nr_cpu_ids < NR_CPUS has been broken. Wonder
> what other effects that will have. In particular since the slab and page
> allocators have arrays indexed by the cpu number and those arrays are
> dimensioned for NR_CPUS.

Heh.. that's not what's broken. unit# can legally go over NR_CPUS or
nr_cpu_ids as cpus can be sparsely mapped to units. The problem is
that I didn't fix unit_map allocation. Will update the patch. Just a
sec.

--
tejun

2009-09-28 16:39:39

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] percpu: fix unit_map[] verification in pcpu_setup_first_chunk()

Tejun Heo wrote:
> Christoph Lameter wrote:
>> On Tue, 29 Sep 2009, Tejun Heo wrote:
>>
>>> pcpu_setup_first_chunk() incorrectly used NR_CPUS as the impossible
>>> unit number while unit number can equal and go over NR_CPUS with
>>> sparse unit map. This triggers BUG_ON() spuriously on machines which
>>> have non-power-of-two number of cpus. Use UINT_MAX instead.
>> Uhhh. Funky. The assumption nr_cpu_ids < NR_CPUS has been broken. Wonder
>> what other effects that will have. In particular since the slab and page
>> allocators have arrays indexed by the cpu number and those arrays are
>> dimensioned for NR_CPUS.
>
> Heh.. that's not what's broken. unit# can legally go over NR_CPUS or
> nr_cpu_ids as cpus can be sparsely mapped to units. The problem is
> that
^
I'm delusional at the moment. :-)

Christoph, can you please elaborate why nr_cpu_ids < NR_CPUS is
broken?

Thanks.

--
tejun

2009-09-28 16:45:33

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH] percpu: fix unit_map[] verification in pcpu_setup_first_chunk()

On Tue, 29 Sep 2009, Tejun Heo wrote:

>
> Christoph, can you please elaborate why nr_cpu_ids < NR_CPUS is
> broken?

If its just an internal number that becomes larger than NR_CPUS then
everything is fine.

But if a real cpu id (returned by smp_processor_id()) gets larger than
NR_CPUS then we get into trouble with the parts of the kernel that index
by cpu id.

2009-09-28 16:53:23

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] percpu: fix unit_map[] verification in pcpu_setup_first_chunk()

Christoph Lameter wrote:
> On Tue, 29 Sep 2009, Tejun Heo wrote:
>
>> Christoph, can you please elaborate why nr_cpu_ids < NR_CPUS is
>> broken?
>
> If its just an internal number that becomes larger than NR_CPUS then
> everything is fine.
>
> But if a real cpu id (returned by smp_processor_id()) gets larger than
> NR_CPUS then we get into trouble with the parts of the kernel that index
> by cpu id.

It's a unit number. Each cpu has exactly one unit number assigned to
it but there can be holes, so unit number can legally go over both
nr_cpu_ids and NR_CPUS, so yeap, it's just an internal number used
inside percpu allocator.

Tony, the original patch should be correct. Please verify.

Thanks.

--
tejun

2009-09-28 22:06:09

by Tony Vroon

[permalink] [raw]
Subject: Re: [PATCH] percpu: fix unit_map[] verification in pcpu_setup_first_chunk()

On Tue, 2009-09-29 at 01:19 +0900, Tejun Heo wrote:
> pcpu_setup_first_chunk() incorrectly used NR_CPUS as the impossible
> unit number while unit number can equal and go over NR_CPUS with
> sparse unit map. This triggers BUG_ON() spuriously on machines which
> have non-power-of-two number of cpus. Use UINT_MAX instead.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Reported-by: Tony Vroon <[email protected]>
Tested-by: Tony Vroon <[email protected]>

> Erghh... stupid mistake. This should fix it. Can you please verify?

Confirmed fixed. Thank you Tejun.

Regards,
Tony V.


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part

2009-09-29 00:15:35

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] percpu: fix unit_map[] verification in pcpu_setup_first_chunk()

Tony Vroon wrote:
> On Tue, 2009-09-29 at 01:19 +0900, Tejun Heo wrote:
>> pcpu_setup_first_chunk() incorrectly used NR_CPUS as the impossible
>> unit number while unit number can equal and go over NR_CPUS with
>> sparse unit map. This triggers BUG_ON() spuriously on machines which
>> have non-power-of-two number of cpus. Use UINT_MAX instead.
>>
>> Signed-off-by: Tejun Heo <[email protected]>
>> Reported-by: Tony Vroon <[email protected]>
> Tested-by: Tony Vroon <[email protected]>
>
>> Erghh... stupid mistake. This should fix it. Can you please verify?
>
> Confirmed fixed. Thank you Tejun.

Thank you. Will push out later today.

--
tejun