2012-11-06 20:49:58

by Dave Jones

Subject: sched_debug / traverse allocation failures.

While fuzz-testing, I frequently run into this:

trinity-child4: page allocation failure: order:4, mode:0x40d0
Pid: 21842, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54
Call Trace:
[<ffffffff81179ce9>] warn_alloc_failed+0xe9/0x150
[<ffffffff816c24fc>] ? __alloc_pages_direct_compact+0x1f8/0x209
[<ffffffff8117e6f6>] __alloc_pages_nodemask+0x936/0xb60
[<ffffffff811c2020>] alloc_pages_current+0xb0/0x120
[<ffffffff81178ada>] __get_free_pages+0x2a/0x80
[<ffffffff811ce299>] kmalloc_order_trace+0x39/0x190
[<ffffffff811ce66d>] __kmalloc+0x27d/0x2b0
[<ffffffff8120cc8d>] traverse+0x1bd/0x230
[<ffffffff8120d0c0>] ? seq_lseek+0x120/0x120
[<ffffffff8120d0c0>] ? seq_lseek+0x120/0x120
[<ffffffff8120d41b>] seq_read+0x35b/0x3e0
[<ffffffff816d1969>] ? sub_preempt_count+0x79/0xd0
[<ffffffff8120d0c0>] ? seq_lseek+0x120/0x120
[<ffffffff81257890>] proc_reg_read+0x80/0xc0
[<ffffffff811e72dc>] vfs_read+0xac/0x180
[<ffffffff811e758a>] sys_pread64+0x9a/0xb0
[<ffffffff816d5f88>] tracesys+0xe1/0xe6


I added some instrumentation to traverse, and it appears that the /proc file
in question is 'sched_debug'.

Most of the time this is quite small, but it seems it can grow quite large,
which under memory fragmentation results in the spew above.
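
For a sense of scale, here is a rough sketch of how output length maps to the allocation order in the warning. It assumes 4 KiB pages and that the seq_file buffer starts at one page and doubles each time the output fills it; order_for_output() and PAGE_SIZE_BYTES are made-up names for illustration, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096UL	/* assumption: 4 KiB pages */

/*
 * Smallest allocation order whose buffer holds `len` bytes of output,
 * for a buffer that starts at one page and doubles on every overflow.
 * Assumption: a completely full buffer counts as an overflow, so the
 * buffer has to be strictly larger than the output.
 */
static unsigned int order_for_output(size_t len)
{
	size_t size = PAGE_SIZE_BYTES;
	unsigned int order = 0;

	assert(len > 0);
	while (size <= len) {
		size <<= 1;	/* double the buffer, as on overflow */
		order++;
	}
	return order;
}
```

Under these assumptions, order_for_output(40000) gives 4, so an order:4 request (16 contiguous pages, 64 KiB) would correspond to somewhere between 32 KiB and 64 KiB of sched_debug output.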

From my reading of the code, it doesn't actually use the seq_operations
to print out things record-at-a-time, but just dumps everything
in its ->open routine.

Thoughts?

Dave


2012-11-06 22:33:36

by Al Viro

Subject: Re: sched_debug / traverse allocation failures.

On Tue, Nov 06, 2012 at 03:49:47PM -0500, Dave Jones wrote:

> I added some instrumentation to traverse, and it appears that the /proc file
> in question is 'sched_debug'.
>
> Most of the time this is quite small, but it seems it can grow quite large,
> which under memory fragmentation results in the spew above.
>
> From my reading of the code, it doesn't actually use the seq_operations
> to print out things record-at-a-time, but just dumps everything
> in its ->open routine.

Not ->open(), first ->read(), actually. I'd suggest turning that
sucker into a saner iterator, though - it spews a smallish header
followed by a set of per-CPU entries, so I'd probably start with
something like

static void *c_start(struct seq_file *m, loff_t *pos)
{
	unsigned long n = *pos;

	if (n == 0)
		return (void *)1;	/* header */
	n--;
	/* find the first online CPU >= requested position */
	if (n > 0)
		n = cpumask_next(n - 1, cpu_online_mask);
	else
		n = cpumask_first(cpu_online_mask);
	*pos = n + 1;
	if (n < nr_cpu_ids)
		return (void *)(unsigned long)(n + 2);	/* CPU #n */
	return NULL;	/* EOF */
}

static void *c_next(struct seq_file *m, void *v, loff_t *pos)
{
	(*pos)++;
	return c_start(m, pos);
}

static int c_show(struct seq_file *m, void *v)
{
	if (v == (void *)1) {
		/* print the header */
	} else {
		unsigned long cpu = (unsigned long)v - 2;
		/* print the entry for that cpu */
	}
	return 0;
}
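
To check how positions map to records, the same logic can be exercised in userspace. This is only a sketch: the online mask (CPUs 0, 2 and 3 out of 8 ids) is invented, and mock_cpumask_next(), mock_start(), mock_next() and walk() are hypothetical stand-ins, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPU_IDS 8	/* hypothetical size of the CPU id space */

/* Hypothetical online mask: CPUs 0, 2 and 3 are online. */
static const unsigned long online_mask = (1UL << 0) | (1UL << 2) | (1UL << 3);

/* Stand-in for cpumask_next(): first online CPU after n, or NR_CPU_IDS. */
static unsigned long mock_cpumask_next(long n)
{
	unsigned long cpu;

	for (cpu = (unsigned long)(n + 1); cpu < NR_CPU_IDS; cpu++)
		if (online_mask & (1UL << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/*
 * Same position mapping as c_start(): position 0 is the header and
 * position k + 1 is CPU k. Returns 1 for the header, cpu + 2 for a
 * CPU record and 0 at EOF.
 */
static unsigned long mock_start(unsigned long *pos)
{
	unsigned long n = *pos;

	if (n == 0)
		return 1;
	n--;
	/* first online CPU >= n; n == 0 degenerates to "first online CPU" */
	n = mock_cpumask_next((long)n - 1);
	*pos = n + 1;
	return n < NR_CPU_IDS ? n + 2 : 0;
}

static unsigned long mock_next(unsigned long *pos)
{
	(*pos)++;
	return mock_start(pos);
}

/* Walk the whole sequence, storing each CPU visited; returns the count. */
static size_t walk(unsigned long *cpus, size_t max)
{
	unsigned long pos = 0, v;
	size_t count = 0;

	for (v = mock_start(&pos); v != 0; v = mock_next(&pos)) {
		if (v == 1)
			continue;	/* header record */
		assert(count < max);
		cpus[count++] = v - 2;
	}
	return count;
}
```

With that mask, walk() visits CPUs 0, 2 and 3 after the header and then hits EOF, so offline CPUs are skipped without any record being emitted for them.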