LinuxLists.cc - Re: Random panic in load_balance() with 3.16-rc

2014-07-23 16:54:28

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 8:55 AM, Peter Zijlstra <[email protected]> wrote:
>>
>> I haven't seen the full oops, can you forward the screenshot? The
>> exact register state might give some clues.
>
> Sure, here goes.

So the length is fine, and the disassembly shows that it is fixed (16
32-bit words - why the heck does it use "movsl" rather than "movsq",
whatever).

The problem is %rdi, which has the value ffff10043c803e8c, which isn't
canonical. Which is why it GP-faults.

That value is loaded from the stack:

mov -0x88(%rbp),%rdi

so apparently the original "__get_cpu_var(load_balance_mask)" is
already corrupted, or something has corrupted it on the stack since
loading (but that looks unlikely).

And I wonder if I have a clue. Look, load_balance_mask is a
"cpumask_var_t", but I don't see a "alloc_cpumask_var()" for it.
That's broken with CONFIG_CPUMASK_OFFSTACK.

I think you actually want "load_balance_mask" to be a "struct cpumask *", no?

Alternatively, keep it a "cpumask_var_t", but then you need to use
__get_cpu_pointer() to get the address of it, and use
"alloc_cpumask_var()" to allocate area for the OFFSTACK case.

TOTALLY UNTESTED AND PROBABLY PURE CRAP PATCH ATTACHED.

WARNING! WARNING! WARNING! This is just looking at the code, not
really knowing it, and saying "that looks really really wrong". Maybe
I'm full of shit.

Linus

Attachments:

patch.diff (1.26 kB)

2014-07-23 17:03:34

by Peter Zijlstra

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 09:54:23AM -0700, Linus Torvalds wrote:
> On Wed, Jul 23, 2014 at 8:55 AM, Peter Zijlstra <[email protected]> wrote:
> >>
> >> I haven't seen the full oops, can you forward the screenshot? The
> >> exact register state might give some clues.
> >
> > Sure, here goes.
>
> So the length is fine, and the disassembly shows that it is fixed (16
> 32-bit words - why the heck does it use "movsl" rather than "movsq",
> whatever).
>
> The problem is %rdi, which has the value ffff10043c803e8c, which isn't
> canonical. Which is why it GP-faults.
>
> That value is loaded from the stack:
>
> mov -0x88(%rbp),%rdi
>
> so apparently the original "__get_cpu_var(load_balance_mask)" is
> already corrupted, or something has corrupted it on the stack since
> loading (but that looks unlikely).
>
> And I wonder if I have a clue. Look, load_balance_mask is a
> "cpumask_var_t", but I don't see a "alloc_cpumask_var()" for it.
> That's broken with CONFIG_CPUMASK_OFFSTACK.

kernel/sched/core.c:sched_init()

plays horrible allocation tricks.. which I suppose we should clean up,
sched_init() appears to be called late enough to use regular per-cpu
allocations.

> I think you actually want "load_balance_mask" to be a "struct cpumask *", no?
>
> Alternatively, keep it a "cpumask_var_t", but then you need to use
> __get_cpu_pointer() to get the address of it, and use
> "alloc_cpumask_var()" to allocate area for the OFFSTACK case.

I'm always terminally confused on that interface.. but this code hasn't
changed in a long while and I would expect other crashes if this was
really funky like that.

2014-07-23 17:04:53

by Peter Zijlstra

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 09:54:23AM -0700, Linus Torvalds wrote:

> So the length is fine, and the disassembly shows that it is fixed (16
> 32-bit words - why the heck does it use "movsl" rather than "movsq",
> whatever).

Which is exactly right btw, he's got CONFIG_NR_CPUS=512 and 8*4*16=512.

2014-07-23 17:12:37

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 10:03 AM, Peter Zijlstra <[email protected]> wrote:
> On Wed, Jul 23, 2014 at 09:54:23AM -0700, Linus Torvalds wrote:
>>
>> And I wonder if I have a clue. Look, load_balance_mask is a
>> "cpumask_var_t", but I don't see a "alloc_cpumask_var()" for it.
>> That's broken with CONFIG_CPUMASK_OFFSTACK.
>
> kernel/sched/core.c:sched_init()
>
> plays horrible allocation tricks..

No it does not. It allocates a cpumask. Nothing more. If you think it
allocates a "cpumask_var()", you are wrong.

I agree that the code is an unreadable mess, but that's what
"cpumask_size()" is: the minimum required size of the bitmask in a
cpumask.

A cpumask_var is TOTALLY DIFFERENT. It's *either* a cpumask _or_ just
a pointer to an externally allocated cpumask.

sched_init() definitely does _not_ allocate a cpumask_var.

Linus

2014-07-23 17:15:26

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 10:04 AM, Peter Zijlstra <[email protected]> wrote:
> On Wed, Jul 23, 2014 at 09:54:23AM -0700, Linus Torvalds wrote:
>
>> So the length is fine, and the disassembly shows that it is fixed (16
>> 32-bit words - why the heck does it use "movsl" rather than "movsq",
>> whatever).
>
> Which is exactly right btw, he's got CONFIG_NR_CPUS=512 and 8*4*16=512.

That's not my point. Why the f*ck does it use "movsl", when "movsq"
should work as well or better.

Then it should use a count of 8. Because 8*8*8 is also 512 bits.

Of course, with the enhanced string instructions, it's quite possible
that "movsb" with a count of 64 (64*8) is the best option.

Anyway, my gcc version creates a series of 8 "movq" pairs instead,
which will beat all other cases, at the cost of much bigger code
footprint.

Linus

2014-07-23 17:26:24

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 10:12 AM, Linus Torvalds
<[email protected]> wrote:
>
> sched_init() definitely does _not_ allocate a cpumask_var.

Side note: another good rule of thumb for per-cpu variables is:

- if you use __get_cpu_var() without taking the address of it, you're
doing something wrong and stupid.

The whole - and really *only* - point of __get_cpu_var is to get the
address of a a cpu variable. If you want to read the *value* of the
variable, you should use "this_cpu_read()", which can use things like
special instructions or segments to read the percpu area.

I agree that the interface is not all that great, there's historical
baggage there. We would have been better off with
"__this_cpu_ptr(var)" instead of "&__get_cpu_var(var)". But that
"__get_cpu_var()" is the old way of doing things (predating the new
and better "this_cpu_read/write/ops()" stuff), which is why we have
that odd interface with "&__get_cpu_var()".

Linus

2014-07-23 18:07:46

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 10:12 AM, Linus Torvalds
<[email protected]> wrote:
>
> A cpumask_var is TOTALLY DIFFERENT. It's *either* a cpumask _or_ just
> a pointer to an externally allocated cpumask.
>
> sched_init() definitely does _not_ allocate a cpumask_var.

I take that back. It does end up allocating it properly, it just
avoids all the correct abstractions.

In general, the rule of thumb should be:

- stack allocations should use "cpumask_var_t cpus" and they
absolutely *have* to be paired with an "alloc_cpumask_var(&cpus,
GFP_KERNEL)". Having a "struct cpumask" on stack is very wrong.

- random single nonstack allocations should probably just use a plain
"struct cpumask" (or cpumask_t, but we really shouldn't use typedef's
unless they actively abstract some per-config *changing* type).

- dynamic allocations that are size-conscious (because there's a lot
of them) should allocate a "struct cpumask *" by using
"cpumask_size()". They have a pointer anyway, they allocate things
dynamically anyway, extra indirection through a cpumask_var_t would
just be unnecessary.

- *static* per-cpu allocations might want to use "cpumask_var_t" (to
avoid having a full "struct cpumask_t") along with doing a
"zalloc_cpumask_var_node(..)" for each cpu.

sched_init() follows that last pattern, except it open-codes that
zalloc_cpumask_var_node() in an odd way that confused me.

So I take my patch back. It's wrong, because it only allocates that
cpumask_size() if CONFIG_CPUMASK_OFFSTACK is true.

Ugh, that code really is unreadable.

Linus

2014-07-23 18:23:14

by Peter Zijlstra

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 09:54:23AM -0700, Linus Torvalds wrote:
> Alternatively, keep it a "cpumask_var_t", but then you need to use
> __get_cpu_pointer() to get the address of it, and use
> "alloc_cpumask_var()" to allocate area for the OFFSTACK case.
>
> TOTALLY UNTESTED AND PROBABLY PURE CRAP PATCH ATTACHED.
>
> WARNING! WARNING! WARNING! This is just looking at the code, not
> really knowing it, and saying "that looks really really wrong". Maybe
> I'm full of shit.

If we're doing that, then we also need to unconditionally allocate
memory for that pointer.

The below is something that seems to be consistent and uses struct
cpumask * as you suggest.

Still wondering how the heck any of that worked and didn't generate more
crashing.

---
kernel/sched/core.c | 17 +++++++----------
kernel/sched/fair.c | 4 ++--
2 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7bc599dc4aa4..976d520587a8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6867,7 +6867,7 @@ struct task_group root_task_group;
LIST_HEAD(task_groups);
#endif

-DECLARE_PER_CPU(cpumask_var_t, load_balance_mask);
+DECLARE_PER_CPU(struct cpumask *, load_balance_mask);

void __init sched_init(void)
{
@@ -6880,9 +6880,6 @@ void __init sched_init(void)
#ifdef CONFIG_RT_GROUP_SCHED
alloc_size += 2 * nr_cpu_ids * sizeof(void **);
#endif
-#ifdef CONFIG_CPUMASK_OFFSTACK
- alloc_size += num_possible_cpus() * cpumask_size();
-#endif
if (alloc_size) {
ptr = (unsigned long)kzalloc(alloc_size, GFP_NOWAIT);

@@ -6902,12 +6899,12 @@ void __init sched_init(void)
ptr += nr_cpu_ids * sizeof(void **);

#endif /* CONFIG_RT_GROUP_SCHED */
-#ifdef CONFIG_CPUMASK_OFFSTACK
- for_each_possible_cpu(i) {
- per_cpu(load_balance_mask, i) = (void *)ptr;
- ptr += cpumask_size();
- }
-#endif /* CONFIG_CPUMASK_OFFSTACK */
+ }
+
+ for_each_possible_cpu(i) {
+ per_cpu(load_balance_mask, i) = kzalloc_node(cpumask_size(),
+ GFP_NOWAIT,
+ cpu_to_node(i));
}

init_rt_bandwidth(&def_rt_bandwidth,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 45943b2fa82b..e4d939dc1084 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6469,7 +6469,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
#define MAX_PINNED_INTERVAL 512

/* Working cpumask for load_balance and load_balance_newidle. */
-DEFINE_PER_CPU(cpumask_var_t, load_balance_mask);
+DEFINE_PER_CPU(struct cpumask *, load_balance_mask);

static int need_active_balance(struct lb_env *env)
{
@@ -6538,7 +6538,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
struct sched_group *group;
struct rq *busiest;
unsigned long flags;
- struct cpumask *cpus = __get_cpu_var(load_balance_mask);
+ struct cpumask *cpus = this_cpu_read(load_balance_mask);

struct lb_env env = {
.sd = sd,

2014-07-23 18:24:16

by Peter Zijlstra

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 10:12:35AM -0700, Linus Torvalds wrote:
> On Wed, Jul 23, 2014 at 10:03 AM, Peter Zijlstra <[email protected]> wrote:
> > On Wed, Jul 23, 2014 at 09:54:23AM -0700, Linus Torvalds wrote:
> >>
> >> And I wonder if I have a clue. Look, load_balance_mask is a
> >> "cpumask_var_t", but I don't see a "alloc_cpumask_var()" for it.
> >> That's broken with CONFIG_CPUMASK_OFFSTACK.
> >
> > kernel/sched/core.c:sched_init()
> >
> > plays horrible allocation tricks..
>
> No it does not. It allocates a cpumask. Nothing more. If you think it
> allocates a "cpumask_var()", you are wrong.

I was merely saying 'something' got allocated, but yes I'll agree its
not a cpumask_var_t thingy.

> I agree that the code is an unreadable mess, but that's what
> "cpumask_size()" is: the minimum required size of the bitmask in a
> cpumask.

Latest patch cures at least this part of that mess.

2014-07-23 18:25:30

by Peter Zijlstra

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 10:26:21AM -0700, Linus Torvalds wrote:
> On Wed, Jul 23, 2014 at 10:12 AM, Linus Torvalds
> <[email protected]> wrote:
> >
> > sched_init() definitely does _not_ allocate a cpumask_var.
>
> Side note: another good rule of thumb for per-cpu variables is:
>
> - if you use __get_cpu_var() without taking the address of it, you're
> doing something wrong and stupid.
>
> The whole - and really *only* - point of __get_cpu_var is to get the
> address of a a cpu variable. If you want to read the *value* of the
> variable, you should use "this_cpu_read()", which can use things like
> special instructions or segments to read the percpu area.

I think this code predates all the this_cpu* magic. But yes, agreed.

> I agree that the interface is not all that great, there's historical
> baggage there. We would have been better off with
> "__this_cpu_ptr(var)" instead of "&__get_cpu_var(var)". But that
> "__get_cpu_var()" is the old way of doing things (predating the new
> and better "this_cpu_read/write/ops()" stuff), which is why we have
> that odd interface with "&__get_cpu_var()".

I think there's a whole bunch of patches by Christoph Lameter, queued by
TJ that remove all __get_cpu_var usage and eventually the interface.

2014-07-23 18:25:54

by Peter Zijlstra

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 10:15:23AM -0700, Linus Torvalds wrote:
> On Wed, Jul 23, 2014 at 10:04 AM, Peter Zijlstra <[email protected]> wrote:
> > On Wed, Jul 23, 2014 at 09:54:23AM -0700, Linus Torvalds wrote:
> >
> >> So the length is fine, and the disassembly shows that it is fixed (16
> >> 32-bit words - why the heck does it use "movsl" rather than "movsq",
> >> whatever).
> >
> > Which is exactly right btw, he's got CONFIG_NR_CPUS=512 and 8*4*16=512.
>
> That's not my point. Why the f*ck does it use "movsl", when "movsq"
> should work as well or better.

Agreed, big question there. I was merely pointing out that the value is
consistent with his .config.

2014-07-23 18:31:21

by Peter Zijlstra

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 11:07:40AM -0700, Linus Torvalds wrote:

> - *static* per-cpu allocations might want to use "cpumask_var_t" (to
> avoid having a full "struct cpumask_t") along with doing a
> "zalloc_cpumask_var_node(..)" for each cpu.
>
> sched_init() follows that last pattern, except it open-codes that
> zalloc_cpumask_var_node() in an odd way that confused me.
>
> So I take my patch back. It's wrong, because it only allocates that
> cpumask_size() if CONFIG_CPUMASK_OFFSTACK is true.

What it doesn't do is keep the allocation on the right node, so killing
that stuff is helping more than just readability.

2014-07-23 18:35:10

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 11:25 AM, Peter Zijlstra <[email protected]> wrote:
> On Wed, Jul 23, 2014 at 10:26:21AM -0700, Linus Torvalds wrote:
>>
>> The whole - and really *only* - point of __get_cpu_var is to get the
>> address of a a cpu variable. If you want to read the *value* of the
>> variable, you should use "this_cpu_read()", which can use things like
>> special instructions or segments to read the percpu area.
>
> I think this code predates all the this_cpu* magic. But yes, agreed.

It turns out - now that I've stared at the code for much too long for
my own sanity - that this code actually depends very subtly on
"__get_cpu_var()".

And not in good ways.

So what happens is that the games that "cpumask_var_t" plays in order
to make code work correctly with both on-stack and off-stack
configurations are really toxic to good per-cpu use.

For the off-stack case, a cpumask_var_t is a pointer to the real
allocation, and using "__get_cpu_var()" is very suboptimal, because it
gets that pointer by following the percpu offset explicitly. So we
load the percpu offset, then we load the offset to "load_balance_mask"
within that, and then we load the pointer off that. We'd be much
better off using "this_cpu_read()", which can just use the percpu area
directly to read the pointer.

HOWEVER.

For the direct case, a "cpumask_var_t" is an array, exactly so that
accessing it will just return the address of it, so that you can get
the "struct cpumask *" directly. And there the whole dance with
adding the percpu offset is actually the right thing, because what you
want is the address to the percpu area. And you cannot use
"this_cpu_read()", because that wants to read the _value_, which is
not what we want at all.

Ugh. So it's not just the initialization that is subtle, the use of
those per-cpu "cpumask_var_t" is sadly suboptimal too.

But the code does appear to be correct. It just is messy, avoids the
proper abstractions, and generates suboptimal code for the off-stack
case.

Linus

2014-07-23 18:41:14

by Peter Zijlstra

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 11:35:06AM -0700, Linus Torvalds wrote:
> But the code does appear to be correct. It just is messy, avoids the
> proper abstractions, and generates suboptimal code for the off-stack
> case.

OK, that leaves us agreeing we want to clean that up, but still no
closer to explaining WTH happened on Michel's machine. Weird that.

2014-07-23 18:55:44

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 11:41 AM, Peter Zijlstra <[email protected]> wrote:
>
> OK, that leaves us agreeing we want to clean that up, but still no
> closer to explaining WTH happened on Michel's machine. Weird that.

So looking at that destination pointer value (ffff10043c803e8c), it
*looks* like a pointer. Almost. If it was 0xffff80.. instead of
0xffff10.. it would probably be a fine pointer.

Can you send me the config? I'm wondering what else might be close to
that load_balance_mask percpu allocation, that migth perhaps have
overflowed and written a stray byte into the pointer or something?

Linus

2014-07-23 19:02:37

by Peter Zijlstra

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 11:55:42AM -0700, Linus Torvalds wrote:
> On Wed, Jul 23, 2014 at 11:41 AM, Peter Zijlstra <[email protected]> wrote:
> >
> > OK, that leaves us agreeing we want to clean that up, but still no
> > closer to explaining WTH happened on Michel's machine. Weird that.
>
> So looking at that destination pointer value (ffff10043c803e8c), it
> *looks* like a pointer. Almost. If it was 0xffff80.. instead of
> 0xffff10.. it would probably be a fine pointer.
>
> Can you send me the config? I'm wondering what else might be close to
> that load_balance_mask percpu allocation, that migth perhaps have
> overflowed and written a stray byte into the pointer or something?

Here goes..

Attachments:

(No filename) (705.00 B)
michel-config.txt (129.94 kB)
Download all attachments

2014-07-23 19:20:57

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 12:02 PM, Peter Zijlstra <[email protected]> wrote:
>
> Here goes..

Oh. So this doesn't have CPUMASK_OFFSTACK set at all, so the pointer
has never been loaded from memory in the first place. The calculation
has been (for me) something like

movq $load_balance_mask, %rax
add %gs:this_cpu_off, %rax

and then gcc is being stupid and saving it to the frame and reloading
it for no good reason (at least for me it *also* saved the value in
%rbx in order to save it into "env.cpus", and the stack spill seems to
be just moronic).

In Michel's oops, %rbx doesn't contain the pointer any more, though,
so he clearly does have a different compiler. His frame offsets are
rather different too ("-136(%rbp)" vs "-168(%rbp)") so looking at
whether possibly some stack frame got overwritten is clearly very
compiler-specific.

Michel, mind doing

make kernel/sched/fair.s

and sending us the resulting file?

Linus

2014-07-24 01:43:54

by Michel Dänzer

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On 24.07.2014 04:20, Linus Torvalds wrote:
> On Wed, Jul 23, 2014 at 12:02 PM, Peter Zijlstra <[email protected]> wrote:
>>
>> Here goes..
>
> Oh. So this doesn't have CPUMASK_OFFSTACK set at all, so the pointer
> has never been loaded from memory in the first place. The calculation
> has been (for me) something like
>
> movq $load_balance_mask, %rax
> add %gs:this_cpu_off, %rax
>
> and then gcc is being stupid and saving it to the frame and reloading
> it for no good reason (at least for me it *also* saved the value in
> %rbx in order to save it into "env.cpus", and the stack spill seems to
> be just moronic).
>
> In Michel's oops, %rbx doesn't contain the pointer any more, though,
> so he clearly does have a different compiler. His frame offsets are
> rather different too ("-136(%rbp)" vs "-168(%rbp)") so looking at
> whether possibly some stack frame got overwritten is clearly very
> compiler-specific.
>
> Michel, mind doing
>
> make kernel/sched/fair.s
>
> and sending us the resulting file?

Here it is, gzipped, hope that's okay.

Note that my tree is now based on 3.16-rc6.

--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

Attachments:

fair.s.gz (183.30 kB)

2014-07-24 18:47:21

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 6:43 PM, Michel Dänzer <[email protected]> wrote:
>>
>> Michel, mind doing
>>
>> make kernel/sched/fair.s
>>
>> and sending us the resulting file?
>
> Here it is, gzipped, hope that's okay.
>
> Note that my tree is now based on 3.16-rc6.

Ok, so I'm looking at the code generation and your compiler is pure
and utter *shit*.

Adding Jakub to the cc, because gcc-4.9.0 seems to be terminally broken.

Lookie here, your compiler does some absolutely insane things with the
spilling, including spilling a *constant*. For chrissake, that
compiler shouldn't have been allowed to graduate from kindergarten.
We're talking "sloth that was dropped on the head as a baby" level
retardation levels here:

...
movq $load_balance_mask, -136(%rbp) #, %sfp
subq $184, %rsp #,
movq (%rdx), %rax # sd_22(D)->parent, sd_parent
movl %edi, -144(%rbp) # this_cpu, %sfp
movl %ecx, -140(%rbp) # idle, %sfp
movq %r8, -200(%rbp) # continue_balancing, %sfp
movq %rax, -184(%rbp) # sd_parent, %sfp
movq -136(%rbp), %rax # %sfp, tcp_ptr__
#APP
add %gs:this_cpu_off, %rax # this_cpu_off, tcp_ptr__
#NO_APP
...

Note the contents of -136(%rbp). Seriously. That's an
_immediate_constant_ that the compiler is spilling.

Somebody needs to raise that as a gcc bug. Because it damn well is
some seriously crazy shit.

However, that constant spilling part just counts as "too stupid to
live". The real bug is this:

movq $load_balance_mask, -136(%rbp) #, %sfp
subq $184, %rsp #,

where gcc creates the stack frame *after* having already used it to
save that constant *deep* below the stack frame.

The x86-64 ABI specifies a 128-byte red-zone under the stack pointer,
and this is ok by that limit. It looks like it's illegal (136 > 128),
but the fact is, we've had four "pushq"s to update %rsp since loading
the frame pointer, so it's just *barely* legal with the red-zoning.

But we build the kernel with -mno-red-zone. We do *not* follow the
x86-64 ABI wrt redzoning, because we *cannot*: interrupts while in
kernel mode *will* use the stack without a redzone. So that
"-mno-red-zone" is not some "optional guideline". It's a hard and
harsh requirement for the kernel, and gcc-4.9 is a buggy piece of shit
for ignoring it. And your bug happens becuase you happen to hit an
interrupt _just_ in that single instruction window (or perhaps hit
some other similar case and corrupted kernel data structures earlier).

Now, I suspect that this redzoning bug might actually be related to
the fact that gcc is stupid in spilling a constant. I would not be
surprised if there is some liveness analysis going on to decide *when*
to insert the stack decrement, and constants are being ignored because
clearly liveness isn't an issue for a constant value. So the two bugs
("stupid constant spilling" and "invalid use or red zone stack") go
hand in hand. But who knows.

Anyway, this is not a kernel bug. This is your compiler creating
completely broken code. We may need to add a warning to make sure
nobody compiles with gcc-4.9.0, and the Debian people should probably
downgrate their shiny new compiler.

Jakub, any ideas?

Linus

2014-07-24 18:59:43

by Peter Zijlstra

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Thu, Jul 24, 2014 at 11:47:17AM -0700, Linus Torvalds wrote:
> However, that constant spilling part just counts as "too stupid to
> live". The real bug is this:
>
> movq $load_balance_mask, -136(%rbp) #, %sfp
> subq $184, %rsp #,
>
> where gcc creates the stack frame *after* having already used it to
> save that constant *deep* below the stack frame.
>
> The x86-64 ABI specifies a 128-byte red-zone under the stack pointer,
> and this is ok by that limit. It looks like it's illegal (136 > 128),
> but the fact is, we've had four "pushq"s to update %rsp since loading
> the frame pointer, so it's just *barely* legal with the red-zoning.
>
> But we build the kernel with -mno-red-zone. We do *not* follow the
> x86-64 ABI wrt redzoning, because we *cannot*: interrupts while in
> kernel mode *will* use the stack without a redzone. So that
> "-mno-red-zone" is not some "optional guideline". It's a hard and
> harsh requirement for the kernel, and gcc-4.9 is a buggy piece of shit
> for ignoring it. And your bug happens becuase you happen to hit an
> interrupt _just_ in that single instruction window (or perhaps hit
> some other similar case and corrupted kernel data structures earlier).

Ooh, shiny, I so missed all that (also didn't know about red-zones
etc..).

Glad this got sorted.

2014-07-25 01:25:26

by Michel Dänzer

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

[ Adding the Debian kernel and gcc teams to Cc ]

On 25.07.2014 03:47, Linus Torvalds wrote:
> On Wed, Jul 23, 2014 at 6:43 PM, Michel Dänzer <[email protected]> wrote:
>>>
>>> Michel, mind doing
>>>
>>> make kernel/sched/fair.s
>>>
>>> and sending us the resulting file?
>>
>> Here it is, gzipped, hope that's okay.
>>
>> Note that my tree is now based on 3.16-rc6.
>
> Ok, so I'm looking at the code generation and your compiler is pure
> and utter *shit*.
>
> Adding Jakub to the cc, because gcc-4.9.0 seems to be terminally broken.
>
> Lookie here, your compiler does some absolutely insane things with the
> spilling, including spilling a *constant*. For chrissake, that
> compiler shouldn't have been allowed to graduate from kindergarten.
> We're talking "sloth that was dropped on the head as a baby" level
> retardation levels here:
>
> ...
> movq $load_balance_mask, -136(%rbp) #, %sfp
> subq $184, %rsp #,
> movq (%rdx), %rax # sd_22(D)->parent, sd_parent
> movl %edi, -144(%rbp) # this_cpu, %sfp
> movl %ecx, -140(%rbp) # idle, %sfp
> movq %r8, -200(%rbp) # continue_balancing, %sfp
> movq %rax, -184(%rbp) # sd_parent, %sfp
> movq -136(%rbp), %rax # %sfp, tcp_ptr__
> #APP
> add %gs:this_cpu_off, %rax # this_cpu_off, tcp_ptr__
> #NO_APP
> ...
>
> Note the contents of -136(%rbp). Seriously. That's an
> _immediate_constant_ that the compiler is spilling.
>
> Somebody needs to raise that as a gcc bug. Because it damn well is
> some seriously crazy shit.
>
> However, that constant spilling part just counts as "too stupid to
> live". The real bug is this:
>
> movq $load_balance_mask, -136(%rbp) #, %sfp
> subq $184, %rsp #,
>
> where gcc creates the stack frame *after* having already used it to
> save that constant *deep* below the stack frame.
>
> The x86-64 ABI specifies a 128-byte red-zone under the stack pointer,
> and this is ok by that limit. It looks like it's illegal (136 > 128),
> but the fact is, we've had four "pushq"s to update %rsp since loading
> the frame pointer, so it's just *barely* legal with the red-zoning.
>
> But we build the kernel with -mno-red-zone. We do *not* follow the
> x86-64 ABI wrt redzoning, because we *cannot*: interrupts while in
> kernel mode *will* use the stack without a redzone. So that
> "-mno-red-zone" is not some "optional guideline". It's a hard and
> harsh requirement for the kernel, and gcc-4.9 is a buggy piece of shit
> for ignoring it. And your bug happens becuase you happen to hit an
> interrupt _just_ in that single instruction window (or perhaps hit
> some other similar case and corrupted kernel data structures earlier).
>
> Now, I suspect that this redzoning bug might actually be related to
> the fact that gcc is stupid in spilling a constant. I would not be
> surprised if there is some liveness analysis going on to decide *when*
> to insert the stack decrement, and constants are being ignored because
> clearly liveness isn't an issue for a constant value. So the two bugs
> ("stupid constant spilling" and "invalid use or red zone stack") go
> hand in hand. But who knows.
>
> Anyway, this is not a kernel bug. This is your compiler creating
> completely broken code. We may need to add a warning to make sure
> nobody compiles with gcc-4.9.0, and the Debian people should probably
> downgrate their shiny new compiler.

Attached is fair.s from Debian gcc 4.8.3-5. Does that look better? I'm
going to try reproducing the problem with a kernel built by that now.

--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

Attachments:

fair.s-4.8.3-5.gz (180.97 kB)

2014-07-25 02:33:13

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Thu, Jul 24, 2014 at 6:25 PM, Michel Dänzer <[email protected]> wrote:
>
> Attached is fair.s from Debian gcc 4.8.3-5. Does that look better? I'm
> going to try reproducing the problem with a kernel built by that now.

This looks better. For roughly that same code sequence it does
(ignoring the debug line and cfi information):

subq $184, %rsp #,
movq (%r12), %rax # sd_22(D)->parent, sd_parent
movl %edi, -156(%rbp) # this_cpu, %sfp
movl %ecx, -160(%rbp) # idle, %sfp
movq %r8, -184(%rbp) # continue_balancing, %sfp
movq %rax, -176(%rbp) # sd_parent, %sfp
movq $load_balance_mask, %rax #, tcp_ptr__
#APP
add %gs:this_cpu_off, %rax # this_cpu_off, tcp_ptr__
#NO_APP

so it updates the stack pointer before any spills, and it also doesn't
spill that constant value.

I still have no idea why it does the 4-byte rep stosl/movsl thing, but
that's a whole separate guessing game and might have something to do
with the fact that you do CONFIG_CC_OPTIMIZE_FOR_SIZE and the 4-byte
form is one byte smaller.

I'm a big believer in not blowing up the I$ footprint, and I have to
admit to pushing that myself a few years ago, but gcc does some rather
bad things with '-Os', so it's not actually suggested for the kernel
any more. I wish there was some middle ground model that cared about
size, but not to exclusion of everything else. The string instructions
are not good for performance when it's a compile-time known small
size.

Linus

2014-07-25 02:36:57

by Nicholas Krause

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Thu, Jul 24, 2014 at 9:25 PM, Michel Dänzer <[email protected]> wrote:
> [ Adding the Debian kernel and gcc teams to Cc ]
>
> On 25.07.2014 03:47, Linus Torvalds wrote:
>> On Wed, Jul 23, 2014 at 6:43 PM, Michel Dänzer <[email protected]> wrote:
>>>>
>>>> Michel, mind doing
>>>>
>>>> make kernel/sched/fair.s
>>>>
>>>> and sending us the resulting file?
>>>
>>> Here it is, gzipped, hope that's okay.
>>>
>>> Note that my tree is now based on 3.16-rc6.
>>
>> Ok, so I'm looking at the code generation and your compiler is pure
>> and utter *shit*.
>>
>> Adding Jakub to the cc, because gcc-4.9.0 seems to be terminally broken.
>>
>> Lookie here, your compiler does some absolutely insane things with the
>> spilling, including spilling a *constant*. For chrissake, that
>> compiler shouldn't have been allowed to graduate from kindergarten.
>> We're talking "sloth that was dropped on the head as a baby" level
>> retardation levels here:
>>
>> ...
>> movq $load_balance_mask, -136(%rbp) #, %sfp
>> subq $184, %rsp #,
>> movq (%rdx), %rax # sd_22(D)->parent, sd_parent
>> movl %edi, -144(%rbp) # this_cpu, %sfp
>> movl %ecx, -140(%rbp) # idle, %sfp
>> movq %r8, -200(%rbp) # continue_balancing, %sfp
>> movq %rax, -184(%rbp) # sd_parent, %sfp
>> movq -136(%rbp), %rax # %sfp, tcp_ptr__
>> #APP
>> add %gs:this_cpu_off, %rax # this_cpu_off, tcp_ptr__
>> #NO_APP
>> ...
>>
>> Note the contents of -136(%rbp). Seriously. That's an
>> _immediate_constant_ that the compiler is spilling.
>>
>> Somebody needs to raise that as a gcc bug. Because it damn well is
>> some seriously crazy shit.
>>
>> However, that constant spilling part just counts as "too stupid to
>> live". The real bug is this:
>>
>> movq $load_balance_mask, -136(%rbp) #, %sfp
>> subq $184, %rsp #,
>>
>> where gcc creates the stack frame *after* having already used it to
>> save that constant *deep* below the stack frame.
>>
>> The x86-64 ABI specifies a 128-byte red-zone under the stack pointer,
>> and this is ok by that limit. It looks like it's illegal (136 > 128),
>> but the fact is, we've had four "pushq"s to update %rsp since loading
>> the frame pointer, so it's just *barely* legal with the red-zoning.
>>
>> But we build the kernel with -mno-red-zone. We do *not* follow the
>> x86-64 ABI wrt redzoning, because we *cannot*: interrupts while in
>> kernel mode *will* use the stack without a redzone. So that
>> "-mno-red-zone" is not some "optional guideline". It's a hard and
>> harsh requirement for the kernel, and gcc-4.9 is a buggy piece of shit
>> for ignoring it. And your bug happens becuase you happen to hit an
>> interrupt _just_ in that single instruction window (or perhaps hit
>> some other similar case and corrupted kernel data structures earlier).
>>
>> Now, I suspect that this redzoning bug might actually be related to
>> the fact that gcc is stupid in spilling a constant. I would not be
>> surprised if there is some liveness analysis going on to decide *when*
>> to insert the stack decrement, and constants are being ignored because
>> clearly liveness isn't an issue for a constant value. So the two bugs
>> ("stupid constant spilling" and "invalid use or red zone stack") go
>> hand in hand. But who knows.
>>
>> Anyway, this is not a kernel bug. This is your compiler creating
>> completely broken code. We may need to add a warning to make sure
>> nobody compiles with gcc-4.9.0, and the Debian people should probably
>> downgrate their shiny new compiler.
>
> Attached is fair.s from Debian gcc 4.8.3-5. Does that look better? I'm
> going to try reproducing the problem with a kernel built by that now.
>
>
> --
> Earthling Michel Dänzer http://www.amd.com
> Libre software enthusiast Mesa and X developer

Hey guys,
I am new so please bear with me here but I can build test this on
Ubuntu 14.04 with gcc 4.8.1.
I am wondering through after only speed reading these messages , if
you can just give me
a one line summary of what I need to be looking for :).
Nick

2014-07-25 02:50:22

by Nicholas Krause

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Thu, Jul 24, 2014 at 10:33 PM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Jul 24, 2014 at 6:25 PM, Michel Dänzer <[email protected]> wrote:
>>
>> Attached is fair.s from Debian gcc 4.8.3-5. Does that look better? I'm
>> going to try reproducing the problem with a kernel built by that now.
>
> This looks better. For roughly that same code sequence it does
> (ignoring the debug line and cfi information):
>
> subq $184, %rsp #,
> movq (%r12), %rax # sd_22(D)->parent, sd_parent
> movl %edi, -156(%rbp) # this_cpu, %sfp
> movl %ecx, -160(%rbp) # idle, %sfp
> movq %r8, -184(%rbp) # continue_balancing, %sfp
> movq %rax, -176(%rbp) # sd_parent, %sfp
> movq $load_balance_mask, %rax #, tcp_ptr__
> #APP
> add %gs:this_cpu_off, %rax # this_cpu_off, tcp_ptr__
> #NO_APP
>
> so it updates the stack pointer before any spills, and it also doesn't
> spill that constant value.
>
> I still have no idea why it does the 4-byte rep stosl/movsl thing, but
> that's a whole separate guessing game and might have something to do
> with the fact that you do CONFIG_CC_OPTIMIZE_FOR_SIZE and the 4-byte
> form is one byte smaller.
>
> I'm a big believer in not blowing up the I$ footprint, and I have to
> admit to pushing that myself a few years ago, but gcc does some rather
> bad things with '-Os', so it's not actually suggested for the kernel
> any more. I wish there was some middle ground model that cared about
> size, but not to exclusion of everything else. The string instructions
> are not good for performance when it's a compile-time known small
> size.
>
> Linus
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

Seems better after looking at it too, seems I don't need to test this and this
bug is in gcc 4.9 related versions.
Cheers Nick

2014-07-25 03:55:33

by Alexei Starovoitov

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Fri, Jul 25, 2014 at 10:25:03AM +0900, Michel D?nzer wrote:
> [ Adding the Debian kernel and gcc teams to Cc ]
>
> > movq $load_balance_mask, -136(%rbp) #, %sfp
> > subq $184, %rsp #,
> >
> > Anyway, this is not a kernel bug. This is your compiler creating
> > completely broken code. We may need to add a warning to make sure
> > nobody compiles with gcc-4.9.0, and the Debian people should probably
> > downgrate their shiny new compiler.
>
> Attached is fair.s from Debian gcc 4.8.3-5. Does that look better? I'm
> going to try reproducing the problem with a kernel built by that now.

4.8 and 4.7 don't hit the problem on this test.
4.9 with -O2 compiles this file ok. 4.9 with -Os triggers it.

-mno-red-zone only affected prologue emition in gcc. This part didn't
change between the releases. So the bug is quite deep.
What seems to be happening is that 2nd pass of instruction scheduler
(after emit prologue and reg alloc) is ignoring barrier properties
of 'subq $184, %rsp' and moving 'movq $.., -136(%rbp)' instruction
ahead of it. afaik rtl sched was never aware of 'red-zone'.
As an ex-compiler guy, I'm worried that this bug exists in earlier
releases. rtl backend guys need to take a serious look at it.
imo this is very serious bug, since broken red-zone is extremelly
hard to debug.
There are two weak test in gcc testsuite related to -mno-red-zone,
but not a single test that actually check that it is doing
the right thing. It is scary. I hope I'm wrong with this analysis.

2014-07-25 04:00:41

by Nicholas Krause

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Thu, Jul 24, 2014 at 11:55 PM, Alexei Starovoitov <[email protected]> wrote:
> On Fri, Jul 25, 2014 at 10:25:03AM +0900, Michel Dänzer wrote:
>> [ Adding the Debian kernel and gcc teams to Cc ]
>>
>> > movq $load_balance_mask, -136(%rbp) #, %sfp
>> > subq $184, %rsp #,
>> >
>> > Anyway, this is not a kernel bug. This is your compiler creating
>> > completely broken code. We may need to add a warning to make sure
>> > nobody compiles with gcc-4.9.0, and the Debian people should probably
>> > downgrate their shiny new compiler.
>>
>> Attached is fair.s from Debian gcc 4.8.3-5. Does that look better? I'm
>> going to try reproducing the problem with a kernel built by that now.
>
> 4.8 and 4.7 don't hit the problem on this test.
> 4.9 with -O2 compiles this file ok. 4.9 with -Os triggers it.
>
> -mno-red-zone only affected prologue emition in gcc. This part didn't
> change between the releases. So the bug is quite deep.
> What seems to be happening is that 2nd pass of instruction scheduler
> (after emit prologue and reg alloc) is ignoring barrier properties
> of 'subq $184, %rsp' and moving 'movq $.., -136(%rbp)' instruction
> ahead of it. afaik rtl sched was never aware of 'red-zone'.
> As an ex-compiler guy, I'm worried that this bug exists in earlier
> releases. rtl backend guys need to take a serious look at it.
> imo this is very serious bug, since broken red-zone is extremelly
> hard to debug.
> There are two weak test in gcc testsuite related to -mno-red-zone,
> but not a single test that actually check that it is doing
> the right thing. It is scary. I hope I'm wrong with this analysis.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

Alexi,
Thanks for replying and sending this email to the Debian developers,
seems to me a very serious issue with gcc. I don't known much about
compilers but I trust your experience here.
Nick

2014-07-25 06:49:22

by Jakub Jelinek

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Thu, Jul 24, 2014 at 11:47:17AM -0700, Linus Torvalds wrote:
> Adding Jakub to the cc, because gcc-4.9.0 seems to be terminally broken.
...
> Jakub, any ideas?

Can I ask anyone involved in this for preprocessed source and all gcc command
line options to reproduce it, best in the form of a http://gcc.gnu.org/bugzilla/
bugreport?

Thanks.

Jakub

2014-07-25 08:15:38

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Thu, Jul 24, 2014 at 11:48 PM, Jakub Jelinek <[email protected]> wrote:
>
> Can I ask anyone involved in this for preprocessed source and all gcc command
> line options to reproduce it, best in the form of a http://gcc.gnu.org/bugzilla/
> bugreport?

I've created bug 61904 for this:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61904

Note that I don't personally have a reproducer (my machine has
gcc-4.8.3, and I don't see the same behavior), but I included the
incorrect fair.s file that Michel sent me (which has all the command
line options in it), and a pre-processed "fair.i" source file that I
generated and that *should* match the configuration that was the
source for that result. So there might be some version/configuration
skew there between the two files, but I think they match.

Holler if you cannot reproduce the problem based on that.

Linus

2014-07-25 09:04:03

by Michel Dänzer

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On 25.07.2014 17:15, Linus Torvalds wrote:
> On Thu, Jul 24, 2014 at 11:48 PM, Jakub Jelinek <[email protected]> wrote:
>>
>> Can I ask anyone involved in this for preprocessed source and all gcc command
>> line options to reproduce it, best in the form of a http://gcc.gnu.org/bugzilla/
>> bugreport?
>
> I've created bug 61904 for this:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61904
>
> Note that I don't personally have a reproducer (my machine has
> gcc-4.8.3, and I don't see the same behavior), but I included the
> incorrect fair.s file that Michel sent me (which has all the command
> line options in it), and a pre-processed "fair.i" source file that I
> generated and that *should* match the configuration that was the
> source for that result. So there might be some version/configuration
> skew there between the two files, but I think they match.

Thank you Linus for tracking down the problem and filing the bug report!
I would have done the latter as well, but I could only have channeled
your analysis anyway. :)

> Holler if you cannot reproduce the problem based on that.

Likewise. I can create a GCC bugzilla account and add myself to the CC
list if that helps.

FWIW though, the fair.i file you attached is basically identical to what
I'm getting, the only difference being a handful of file path strings.

--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2014-07-25 09:28:13

by Markus Trippelsdorf

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On 2014.07.25 at 18:03 +0900, Michel D?nzer wrote:
> On 25.07.2014 17:15, Linus Torvalds wrote:
> > On Thu, Jul 24, 2014 at 11:48 PM, Jakub Jelinek <[email protected]> wrote:
> >>
> >> Can I ask anyone involved in this for preprocessed source and all gcc command
> >> line options to reproduce it, best in the form of a http://gcc.gnu.org/bugzilla/
> >> bugreport?
> >
> > I've created bug 61904 for this:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61904
> >
> > Note that I don't personally have a reproducer (my machine has
> > gcc-4.8.3, and I don't see the same behavior), but I included the
> > incorrect fair.s file that Michel sent me (which has all the command
> > line options in it), and a pre-processed "fair.i" source file that I
> > generated and that *should* match the configuration that was the
> > source for that result. So there might be some version/configuration
> > skew there between the two files, but I think they match.
>
> Thank you Linus for tracking down the problem and filing the bug report!
> I would have done the latter as well, but I could only have channeled
> your analysis anyway. :)
>
>
> > Holler if you cannot reproduce the problem based on that.
>
> Likewise. I can create a GCC bugzilla account and add myself to the CC
> list if that helps.
>
> FWIW though, the fair.i file you attached is basically identical to what
> I'm getting, the only difference being a handful of file path strings.

Unfortunately I cannot reproduce the issue. Please post your complete
compiler invocation while also adding -v to the options.

--
Markus

2014-07-25 09:43:07

by Markus Trippelsdorf

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On 2014.07.25 at 11:21 +0200, Markus Trippelsdorf wrote:
> On 2014.07.25 at 18:03 +0900, Michel D?nzer wrote:
> > On 25.07.2014 17:15, Linus Torvalds wrote:
> > > On Thu, Jul 24, 2014 at 11:48 PM, Jakub Jelinek <[email protected]> wrote:
> > >>
> > >> Can I ask anyone involved in this for preprocessed source and all gcc command
> > >> line options to reproduce it, best in the form of a http://gcc.gnu.org/bugzilla/
> > >> bugreport?
> > >
> > > I've created bug 61904 for this:
> > >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61904
> > >
> > > Note that I don't personally have a reproducer (my machine has
> > > gcc-4.8.3, and I don't see the same behavior), but I included the
> > > incorrect fair.s file that Michel sent me (which has all the command
> > > line options in it), and a pre-processed "fair.i" source file that I
> > > generated and that *should* match the configuration that was the
> > > source for that result. So there might be some version/configuration
> > > skew there between the two files, but I think they match.
> >
> > Thank you Linus for tracking down the problem and filing the bug report!
> > I would have done the latter as well, but I could only have channeled
> > your analysis anyway. :)
> >
> >
> > > Holler if you cannot reproduce the problem based on that.
> >
> > Likewise. I can create a GCC bugzilla account and add myself to the CC
> > list if that helps.
> >
> > FWIW though, the fair.i file you attached is basically identical to what
> > I'm getting, the only difference being a handful of file path strings.
>
> Unfortunately I cannot reproduce the issue. Please post your complete
> compiler invocation while also adding -v to the options.

The bug only happens for CONFIG_DEBUG_INFO !CONFIG_DEBUG_INFO_REDUCED
configurations.

--
Markus

2014-07-25 14:03:42

by Steven Rostedt

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Thu, Jul 24, 2014 at 08:55:28PM -0700, Alexei Starovoitov wrote:
>
> -mno-red-zone only affected prologue emition in gcc. This part didn't
> change between the releases. So the bug is quite deep.
> What seems to be happening is that 2nd pass of instruction scheduler
> (after emit prologue and reg alloc) is ignoring barrier properties
> of 'subq $184, %rsp' and moving 'movq $.., -136(%rbp)' instruction
> ahead of it. afaik rtl sched was never aware of 'red-zone'.
> As an ex-compiler guy, I'm worried that this bug exists in earlier
> releases. rtl backend guys need to take a serious look at it.
> imo this is very serious bug, since broken red-zone is extremelly
> hard to debug.

But wouldn't it be rather trivial to run a static analyzer on the final
vmlinux to make sure there are no red zones? I mean, you would only need
to read each function and check to make sure that the offset of rbp is
within the change of rsp, wouldn't you?

Almost seems like an objdump -rd into a perl script could do this.

-- Steve

> There are two weak test in gcc testsuite related to -mno-red-zone,
> but not a single test that actually check that it is doing
> the right thing. It is scary. I hope I'm wrong with this analysis.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2014-07-25 18:29:10

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Fri, Jul 25, 2014 at 7:02 AM, Steven Rostedt <[email protected]> wrote:
>
> But wouldn't it be rather trivial to run a static analyzer on the final
> vmlinux to make sure there are no red zones? I mean, you would only need
> to read each function and check to make sure that the offset of rbp is
> within the change of rsp, wouldn't you?
>
> Almost seems like an objdump -rd into a perl script could do this.

I'm sure it's possible, but it sounds potentially complicated. It's
not like the function prologue is fixed, and gcc will create code
(including conditional branches etc) before the whole frame setup if
there are simple things that can be done purely with the
callee-clobbered registers etc.

Some simple pattern to make sure that the "sub $frame-size,%rsp" comes
before any accesses to (%rbp) (when frame pointers are enabled)
*might* work, but it might also end up missing things.

You want to try?

Linus

2014-07-25 19:10:23

by Steven Rostedt

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Fri, 25 Jul 2014 11:29:06 -0700
Linus Torvalds <[email protected]> wrote:

> On Fri, Jul 25, 2014 at 7:02 AM, Steven Rostedt <[email protected]> wrote:
> >
> > But wouldn't it be rather trivial to run a static analyzer on the final
> > vmlinux to make sure there are no red zones? I mean, you would only need
> > to read each function and check to make sure that the offset of rbp is
> > within the change of rsp, wouldn't you?
> >
> > Almost seems like an objdump -rd into a perl script could do this.
>
> I'm sure it's possible, but it sounds potentially complicated. It's
> not like the function prologue is fixed, and gcc will create code
> (including conditional branches etc) before the whole frame setup if
> there are simple things that can be done purely with the
> callee-clobbered registers etc.
>
> Some simple pattern to make sure that the "sub $frame-size,%rsp" comes
> before any accesses to (%rbp) (when frame pointers are enabled)
> *might* work, but it might also end up missing things.
>
> You want to try?
>

Yeah, I could write something up. I probably wont get to it for a week
or two, but it shouldn't be too hard.

As you said, it will probably miss the complex cases where gcc finishes
the frame later in the function or with branches and such. But at least
it should be able to catch any totally retard set up. I compiled
Michel's file and I'll make sure that it at least catches that:

3572: 48 c7 85 78 ff ff ff movq $0x0,-0x88(%rbp)
3579: 00 00 00 00
3579: R_X86_64_32S load_balance_mask
357d: 48 81 ec b8 00 00 00 sub $0xb8,%rsp

-- Steve

2014-07-25 20:01:13

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Fri, Jul 25, 2014 at 11:29 AM, Linus Torvalds
<[email protected]> wrote:
>
> Some simple pattern to make sure that the "sub $frame-size,%rsp" comes
> before any accesses to (%rbp) (when frame pointers are enabled)
> *might* work, but it might also end up missing things.

You're going to have a hard time doing that pattern. Just for fun, I
did something really quick in awk:

/>:/ { state = 0 }
/%rsp,%rbp/ { state = 1 }
/\$.*rsp/ { state = 2 }
/lea/ { next }
/\(%rbp\)/ { if (state == 1) print "Error: " $0; state = 2; }

which is incomprehensible line noise, but it's a trivial state machine
where "beginning of function" starts state 0, "mov %rsp,%rbp" starts
state 1 ("have frame pointer in function"), sub/add constant of %rsp
starts state 2 ("created frame"), and then we ignore "lea" (because we
don't follow address calculations off %rbp) and error out if we see an
access through %rbp in a function with a frame pointer but without a
frame created.

That thing is excessively stupid, in other words, but hey, it's good
to see "ok, what does that tell us".

And what it tells me is that gcc does some crazy things.

For example, gcc will not create a small stack frame with "sub
$8,%rsp". No, what gcc does is to use a random "push" instruction.
Fair enough, but that really makes things much harder to see. Here's
an example:

ffffffff813143a3 <dock_notify>:
ffffffff813143a3: 55 push %rbp
ffffffff813143a4: 48 89 e5 mov %rsp,%rbp
ffffffff813143a7: 41 57 push %r15
ffffffff813143a9: 41 56 push %r14
ffffffff813143ab: 49 89 fe mov %rdi,%r14
ffffffff813143ae: 41 55 push %r13
ffffffff813143b0: 41 89 f5 mov %esi,%r13d
ffffffff813143b3: 41 54 push %r12
ffffffff813143b5: 53 push %rbx
ffffffff813143b6: 51 push %rcx
...
ffffffff81314501: 48 8b 7e 08 mov 0x8(%rsi),%rdi
ffffffff81314505: 48 89 75 d0 mov %rsi,-0x30(%rbp)
ffffffff81314509: e8 5f d1 ff ff callq
ffffffff8131166d <acpi_bus_scan>
ffffffff8131450e: 85 c0 test %eax,%eax
...
ffffffff813145d6: 5a pop %rdx
ffffffff813145d7: 5b pop %rbx
ffffffff813145d8: 44 89 e0 mov %r12d,%eax
ffffffff813145db: 41 5c pop %r12
ffffffff813145dd: 41 5d pop %r13
ffffffff813145df: 41 5e pop %r14
ffffffff813145e1: 41 5f pop %r15
ffffffff813145e3: 5d pop %rbp
ffffffff813145e4: c3 retq

note the use (deep down in the function) of -0x30(%rbp), and note how
it does "pop %rdx" twice to undo the "push %rcx". It was just to
allocate space.

So you definitely have to track the actual stack pointer updates, not
just the patterns of add/sub to %rsp.

Linus

2014-07-25 20:13:25

by Steven Rostedt

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Fri, 25 Jul 2014 13:01:11 -0700
Linus Torvalds <[email protected]> wrote:

> For example, gcc will not create a small stack frame with "sub
> $8,%rsp". No, what gcc does is to use a random "push" instruction.
> Fair enough, but that really makes things much harder to see. Here's
> an example:
>
> ffffffff813143a3 <dock_notify>:
> ffffffff813143a3: 55 push %rbp
> ffffffff813143a4: 48 89 e5 mov %rsp,%rbp
> ffffffff813143a7: 41 57 push %r15
> ffffffff813143a9: 41 56 push %r14
> ffffffff813143ab: 49 89 fe mov %rdi,%r14
> ffffffff813143ae: 41 55 push %r13
> ffffffff813143b0: 41 89 f5 mov %esi,%r13d
> ffffffff813143b3: 41 54 push %r12
> ffffffff813143b5: 53 push %rbx
> ffffffff813143b6: 51 push %rcx
> ...
> ffffffff81314501: 48 8b 7e 08 mov 0x8(%rsi),%rdi
> ffffffff81314505: 48 89 75 d0 mov %rsi,-0x30(%rbp)
> ffffffff81314509: e8 5f d1 ff ff callq
> ffffffff8131166d <acpi_bus_scan>
> ffffffff8131450e: 85 c0 test %eax,%eax
> ...
> ffffffff813145d6: 5a pop %rdx
> ffffffff813145d7: 5b pop %rbx
> ffffffff813145d8: 44 89 e0 mov %r12d,%eax
> ffffffff813145db: 41 5c pop %r12
> ffffffff813145dd: 41 5d pop %r13
> ffffffff813145df: 41 5e pop %r14
> ffffffff813145e1: 41 5f pop %r15
> ffffffff813145e3: 5d pop %rbp
> ffffffff813145e4: c3 retq
>
> note the use (deep down in the function) of -0x30(%rbp), and note how
> it does "pop %rdx" twice to undo the "push %rcx". It was just to
> allocate space.

I don't see a pop %rdx twice. Sure you're not suffering from a little
dyslexia? ;-) But I do get your point. The rdx is popped where the rcx
was, and both are useless, as rcx and rdx are volatile regs.

>
> So you definitely have to track the actual stack pointer updates, not
> just the patterns of add/sub to %rsp.

With Perl that would be rather trivial. I'm more concerned with branch
logic. I'll see if I can include some simple branch logic too to
flatten paths. But I wont really know the depth of this until I start
hacking at it.

-- Steve

2014-07-25 21:27:17

by Jakub Jelinek

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Fri, Jul 25, 2014 at 01:01:11PM -0700, Linus Torvalds wrote:
> For example, gcc will not create a small stack frame with "sub
> $8,%rsp". No, what gcc does is to use a random "push" instruction.
> Fair enough, but that really makes things much harder to see. Here's
> an example:

That is because for -Os, push is certainly shorter than sub $8,%rsp.

If you want to test for this gcc bug in the kernel, supposedly one should
just take the short testcase from the GCC PR, try to compile it and see if
you e.g. get a -fcompare-debug -Os failure with the testcase.
In that case, you could instead of giving up completely just
-fno-var-tracking-assignments.

Jakub

2014-07-26 18:28:13

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Fri, Jul 25, 2014 at 11:29 AM, Linus Torvalds
<[email protected]> wrote:
>
> I'm sure it's possible, but it sounds potentially complicated.

Hmm. The bugzilla entry just taught me a new gcc flag:
"-fcompare-debug". That apparently makes gcc compile things twice,
once with debugging and once without, and verify that the result is
the same.

And you can enable it for the whole kernel build with just a simple

export GCC_COMPARE_DEBUG=1

before doing the build.

It actually results in a failure for me even on my standard small
localized kernel build, even with gcc-4.8.3. I get a compare failure
for (at least) fs/ext4/inode.c.

That's a bit worrisome. I haven't actually checked if the code
generation differs in significant ways yet..

Linus

2014-07-26 18:39:17

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Sat, Jul 26, 2014 at 11:28 AM, Linus Torvalds
<[email protected]> wrote:
>
> That's a bit worrisome. I haven't actually checked if the code
> generation differs in significant ways yet..

Nope. Just three instructions that got re-ordered from ABC to CAB in a
way that makes no difference. But just the knowledge that "-g" affects
code generation is nasty. And with "allmodconfig" my build fails
almost immediately (failures on at least arch/x86/kernel/process_64.c,
kernel/exit.c and mm/vmalloc.c in that case. I was too lazy to check
what the differences were).

Does anybody have current gcc build and can verify that current gcc
tip passes the kernel compile with that

export GCC_COMPARE_DEBUG=1

thing?

Linus

2014-07-26 18:43:34

by Steven Chamberlain

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

Hi Michel,

On 25/07/14 02:25, Michel Dänzer wrote:
> Attached is fair.s from Debian gcc 4.8.3-5. Does that look better? I'm
> going to try reproducing the problem with a kernel built by that now.

It looks like gcc-4.9 Debian package version 4.9.1-2 available in
sid/jessie may have already fixed this (though Debian's buildd systems
may not have been updated with it yet). If you are able to test with
that version and let us know, that would be wonderful.

Thank you,
Regards,
--
Steven Chamberlain
[email protected]

2014-07-26 19:36:09

by Markus Trippelsdorf

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On 2014.07.26 at 11:39 -0700, Linus Torvalds wrote:
> On Sat, Jul 26, 2014 at 11:28 AM, Linus Torvalds
> <[email protected]> wrote:
> >
> > That's a bit worrisome. I haven't actually checked if the code
> > generation differs in significant ways yet..
>
> Nope. Just three instructions that got re-ordered from ABC to CAB in a
> way that makes no difference. But just the knowledge that "-g" affects
> code generation is nasty. And with "allmodconfig" my build fails
> almost immediately (failures on at least arch/x86/kernel/process_64.c,
> kernel/exit.c and mm/vmalloc.c in that case. I was too lazy to check
> what the differences were).
>
> Does anybody have current gcc build and can verify that current gcc
> tip passes the kernel compile with that
>
> export GCC_COMPARE_DEBUG=1
>
> thing?

The fs/ext4/inode.c issue is a variant of the gcc bug and isn't fixed
yet. I will open a bug report. The kernel/exit.c issue is already fixed,
see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61801 for a reduced
testcase.

But fortunately the workaround for the new inode.c bug is the same as
for the original bug: -fno-var-tracking-assignments.

It would make sense to enabled it unconditionally for all debug
configurations for now.

--
Markus

2014-07-26 19:55:22

by Theodore Ts'o

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Sat, Jul 26, 2014 at 09:35:57PM +0200, Markus Trippelsdorf wrote:
>
> But fortunately the workaround for the new inode.c bug is the same as
> for the original bug: -fno-var-tracking-assignments.
>
> It would make sense to enabled it unconditionally for all debug
> configurations for now.

What's the downside of enabling this unconditionally on a compiler
with the bug fixed? I assume a certain amount of optimization will
lost, but is it significant/measurable?

- Ted

2014-07-26 19:56:24

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Sat, Jul 26, 2014 at 12:35 PM, Markus Trippelsdorf
<[email protected]> wrote:
>
> But fortunately the workaround for the new inode.c bug is the same as
> for the original bug: -fno-var-tracking-assignments.
>
> It would make sense to enabled it unconditionally for all debug
> configurations for now.

So how is code generation affected - if at all? Does the whole
"var-tracking-assignments" thing *only* matter for debug information?

Also, when was it introduced as an option? Can we just unconditionally
enable it, or do we need to be careful about gcc versions?

I'd *like* a debug kernel to not differ significantly from a non-debug
kernel. Most sane kernel developers (where "sane" is "me" by
definition) do not tend to use debug kernels, because the debug
overhead is absolutely disgustingly enormous at build time. But if we
then have most users using distro kernels that had debug info enabled,
it would be sad if code generation differences are huge.

So I'd prefer to just unconditionally add that
"-fno-var-tracking-assignments" to the build.

I just tested it on one file (fs/dcache.c) and it made no difference
at all for my non-debug build. Is this expected?

Because if the only effect of "-fno-var-tracking-assignments" is
potentially slightly worse debug information for variables, I'll
enable it in a jiffy if it fixes this code generation bug. But I'd
like to get that confirmed.

Finally, for CONFIG_DEBUG_INFO_REDUCED, we already use
"-fno-var-tracking". Is that a stronger version that also disables
"var-tracking-assignments"?

Also, Michel - can you try this patch if you still have your
gcc-4.9.0 install, and send me the resulting fair.s file again?

Linus

Attachments:

patch.diff (377.00 B)

2014-07-26 20:03:43

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Sat, Jul 26, 2014 at 12:56 PM, Linus Torvalds
<[email protected]> wrote:
>
> Also, Michel - can you try this patch if you still have your
> gcc-4.9.0 install, and send me the resulting fair.s file again?

Hmm. The good news is that with that patch, the GCC_COMPARE_DEBUG
build succeeds. At least for my small local config.

So I think that's the answer for now, although I'll wait to hear about
any possible non-debug downsides..

Linus

2014-07-26 20:19:21

by Markus Trippelsdorf

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On 2014.07.26 at 12:56 -0700, Linus Torvalds wrote:
> On Sat, Jul 26, 2014 at 12:35 PM, Markus Trippelsdorf
> <[email protected]> wrote:
> >
> > But fortunately the workaround for the new inode.c bug is the same as
> > for the original bug: -fno-var-tracking-assignments.
> >
> > It would make sense to enabled it unconditionally for all debug
> > configurations for now.
>
> So how is code generation affected - if at all? Does the whole
> "var-tracking-assignments" thing *only* matter for debug information?

Yes. It should only affect the quality of the debugging information.
If code generation is affected it is a compiler bug.

> Also, when was it introduced as an option? Can we just unconditionally
> enable it, or do we need to be careful about gcc versions?

Git blame says it was introduced: Wed Sep 2 02:42:21 2009.

> I'd *like* a debug kernel to not differ significantly from a non-debug
> kernel. Most sane kernel developers (where "sane" is "me" by
> definition) do not tend to use debug kernels, because the debug
> overhead is absolutely disgustingly enormous at build time. But if we
> then have most users using distro kernels that had debug info enabled,
> it would be sad if code generation differences are huge.
>
> So I'd prefer to just unconditionally add that
> "-fno-var-tracking-assignments" to the build.
>
> I just tested it on one file (fs/dcache.c) and it made no difference
> at all for my non-debug build. Is this expected?

Yes. The option only affects -g builds.

> Because if the only effect of "-fno-var-tracking-assignments" is
> potentially slightly worse debug information for variables, I'll
> enable it in a jiffy if it fixes this code generation bug. But I'd
> like to get that confirmed.
>
> Finally, for CONFIG_DEBUG_INFO_REDUCED, we already use
> "-fno-var-tracking". Is that a stronger version that also disables
> "var-tracking-assignments"?

Yes.

So, the option should only be enabled for debugging builds. Something
like the following should be sufficient:

diff --git a/Makefile b/Makefile
index 6b2774145d66..037b78d0f407 100644
--- a/Makefile
+++ b/Makefile
@@ -689,7 +689,7 @@ endif
endif

ifdef CONFIG_DEBUG_INFO
-KBUILD_CFLAGS += -g
+KBUILD_CFLAGS += -g -fno-var-tracking-assignments
KBUILD_AFLAGS += -Wa,-gdwarf-2
endif

--
Markus

2014-07-26 20:20:58

by Markus Trippelsdorf

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On 2014.07.26 at 15:55 -0400, Theodore Ts'o wrote:
> On Sat, Jul 26, 2014 at 09:35:57PM +0200, Markus Trippelsdorf wrote:
> >
> > But fortunately the workaround for the new inode.c bug is the same as
> > for the original bug: -fno-var-tracking-assignments.
> >
> > It would make sense to enabled it unconditionally for all debug
> > configurations for now.
>
> What's the downside of enabling this unconditionally on a compiler
> with the bug fixed? I assume a certain amount of optimization will
> lost, but is it significant/measurable?

Only the quality of the debug info would suffer a bit.

--
Markus

2014-07-26 20:39:56

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Sat, Jul 26, 2014 at 1:19 PM, Markus Trippelsdorf
<[email protected]> wrote:
>
> Yes. The option only affects -g builds.

Ok, good. I'll wait a bit to hopefully get confirmation from Michel's
setup, but this does seem to be the solution.

> So, the option should only be enabled for debugging builds. Something
> like the following should be sufficient:

Actually, I prefer my patch that did it with cc-option checking, and
does it unconditionally.

Because if we do it even for non-debug builds - where it ostensibly
shouldn't matter - we then have that GCC_COMPARE_DEBUG thing working
regardless of configuration.

Linus

2014-07-26 22:09:51

by Jakub Jelinek

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Sat, Jul 26, 2014 at 10:20:55PM +0200, Markus Trippelsdorf wrote:
> On 2014.07.26 at 15:55 -0400, Theodore Ts'o wrote:
> > On Sat, Jul 26, 2014 at 09:35:57PM +0200, Markus Trippelsdorf wrote:
> > >
> > > But fortunately the workaround for the new inode.c bug is the same as
> > > for the original bug: -fno-var-tracking-assignments.
> > >
> > > It would make sense to enabled it unconditionally for all debug
> > > configurations for now.
> >
> > What's the downside of enabling this unconditionally on a compiler
> > with the bug fixed? I assume a certain amount of optimization will
> > lost, but is it significant/measurable?
>
> Only the quality of the debug info would suffer a bit.

Which for various tools that use kernel's debug info is a significant
difference.
So adding the option even for fixed gcc is undesirable (and, tracking
gcc version numbers only is not enough, I guess most of the distro gccs
will backport the fix soon).

This PR is the first -fcompare-debug wrong-code in the last few years
I remember. There are -fcompare-debug failures from time to time, but
usually they are just that either there is insignificant code change or
no change at all, just changes in the text dump files -fcompare-debug
uses to check whether there might be code differences or not.
GCC's stated goal is that -g should not affect code generation, so we
treat all such differences as bugs, but most of the time they aren't
breaking anything.

Jakub

2014-07-28 03:47:26

by Michel Dänzer

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On 27.07.2014 04:56, Linus Torvalds wrote:
>
> Also, Michel - can you try this patch if you still have your
> gcc-4.9.0 install, and send me the resulting fair.s file again?

Attached.

I also verified that this patch fixes the compilation of fair.c with
GCC_COMPARE_DEBUG=1.

--
Earthling Michel D?nzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

Attachments:

fair.s-4.9.1-1-patch.gz (166.82 kB)

2014-07-28 12:28:36

by Frank Ch. Eigler

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

torvalds wrote:

> [...]
> Actually, I prefer my patch that did it with cc-option checking, and
> does it unconditionally.
>
> Because if we do it even for non-debug builds - where it ostensibly
> shouldn't matter - we then have that GCC_COMPARE_DEBUG thing working
> regardless of configuration.

Please note that the data produced by "-g -fvar-tracking" is consumed
by tools like systemtap, perf, crash, and makes a significant
difference to the observability of debug AND non-debug kernels. (The
presence of compiled-in DEBUG_* self-checking code is orthogonal to
kernel observability via debuginfo.) Please consider only disabling
var-tracking optionally/temporarily to work around this already-fixed
compiler bug, but not losing high-quality dwarf data permanently.

- FChE

2014-07-28 13:10:22

by Theodore Ts'o

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Mon, Jul 28, 2014 at 08:26:59AM -0400, Frank Ch. Eigler wrote:
> Please note that the data produced by "-g -fvar-tracking" is consumed
> by tools like systemtap, perf, crash, and makes a significant
> difference to the observability of debug AND non-debug kernels. (The
> presence of compiled-in DEBUG_* self-checking code is orthogonal to
> kernel observability via debuginfo.) Please consider only disabling
> var-tracking optionally/temporarily to work around this already-fixed
> compiler bug, but not losing high-quality dwarf data permanently.

I thought Markus told us that -fno-var-tracking-assignments makes
absolutely no difference for non-debug kernels?

For cases where it's really critical that userspace know whether a
particular kernel bug or feature is present, one of the tricks I use
is the presence or absense of a file in /sys/fs/ext4/features. That
way, userspace can reliably detect if feature or bug fix is present,
without relying solely on a version check which doesn't take into
account enterprise distro backports.

Is there some equivalent signalling system that gcc could use, so that
the Makefile can test whether or not -fvar-tracking is needed to avoid
creating an unstable kernel, and which takes into account that (a)
some users will try building bleeding edge kernels on RHEL systems
that might not have the bug fix backported, and it would be nice if
they don't get a buggy compiled kernel, and (b) once the bug fix is
backported, companies like Red Hat would prefer that the workaround
gets disabled to avoid the side effects for things like Systemtap?

Hopefully such a feature will only be needed Very, VERY, *VERY*
rarely, but when the need arises, it's nice to have.

Regards,

- Ted

2014-07-28 14:11:59

by Frank Ch. Eigler

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

Hi -

On Mon, Jul 28, 2014 at 09:10:04AM -0400, Theodore Ts'o wrote:
> [...]
> I thought Markus told us that -fno-var-tracking-assignments makes
> absolutely no difference for non-debug kernels?

It does affect CONFIG_DEBUG_INFO kernels, and that config option is
set for all Red Hat kernels (-debug or plain).

> [...] Is there some equivalent signalling system that gcc could use
> [...]

I'm not aware of anything trivial like a gcc --report-fixed-PRs kind
of thing. But, kbuild could conceivably have a run-time test
involving test-running gcc with in that compare-debug mode with a
suitable test case. We use the latter technique in systemtap for
auto-configuring to kernel versions/features; we got the $(CHECK_BUILD)
trick from vmware module makefiles. It could be recast as a variant
of $(cc-option ...).

- FChE

2014-07-28 16:45:49

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Mon, Jul 28, 2014 at 5:26 AM, Frank Ch. Eigler <[email protected]> wrote:
>
> Please note that the data produced by "-g -fvar-tracking" is consumed
> by tools like systemtap, perf, crash, and makes a significant
> difference to the observability of debug AND non-debug kernels.

Yeah, and compared to having a buggy kernel, I care exactly this much: "".

Besides, "significant difference" is very debatable indeed. It may be
noticeable, it's likely not actually significant.

Linus

2014-07-28 16:48:27

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Sun, Jul 27, 2014 at 8:47 PM, Michel Dänzer <[email protected]> wrote:
> On 27.07.2014 04:56, Linus Torvalds wrote:
>>
>> Also, Michel - can you try this patch if you still have your
>> gcc-4.9.0 install, and send me the resulting fair.s file again?
>
> Attached.

The frame setup looks fine to me now (apart from the spilling of a
constant value, but hey, that wasn't _buggy_, that was just stupid).

And I'm assuming everything runs fine too?

Linus

2014-07-28 17:27:46

by Alexei Starovoitov

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Mon, Jul 28, 2014 at 09:45:45AM -0700, Linus Torvalds wrote:
> On Mon, Jul 28, 2014 at 5:26 AM, Frank Ch. Eigler <[email protected]> wrote:
> >
> > Please note that the data produced by "-g -fvar-tracking" is consumed
> > by tools like systemtap, perf, crash, and makes a significant
> > difference to the observability of debug AND non-debug kernels.
>
> Yeah, and compared to having a buggy kernel, I care exactly this much: "".
>
> Besides, "significant difference" is very debatable indeed. It may be
> noticeable, it's likely not actually significant.

as far as I can see in gcc code, -fno-var-tracking-assignments is the main
switch for 'vartrack' pass. So -fno-var-tracking-assignments is pretty
much equivalent to -fno-var-tracking
Both kill majority of debug info for local variables and function arguments.
The end effect for perf/systemtap will be quite noticeable.

It's not pretty, but adding it unconditionally was the right thing to do.
Black listing compiler versions is too fragile.
Look at the flip side: now size of build dir will be much smaller :)

For gcc it's obviously a big blow. Now 10k lines pass developed
over 10+ years will not be used :(

2014-07-28 18:09:10

by Markus Trippelsdorf

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On 2014.07.28 at 10:27 -0700, Alexei Starovoitov wrote:
> On Mon, Jul 28, 2014 at 09:45:45AM -0700, Linus Torvalds wrote:
> > On Mon, Jul 28, 2014 at 5:26 AM, Frank Ch. Eigler <[email protected]> wrote:
> > >
> > > Please note that the data produced by "-g -fvar-tracking" is consumed
> > > by tools like systemtap, perf, crash, and makes a significant
> > > difference to the observability of debug AND non-debug kernels.
> >
> > Yeah, and compared to having a buggy kernel, I care exactly this much: "".
>
> It's not pretty, but adding it unconditionally was the right thing to do.
> Black listing compiler versions is too fragile.
> Look at the flip side: now size of build dir will be much smaller :)

It shouldn't be too hard to implement a simple check for the bug in the
next release. Just compile the gcc/testsuite/gcc.target/i386/pr61801.c
testcase with -fcompare-debug. If gcc returns 0 then
-fvar-tracking-assignments could safely be enabled again.

Here's the testcase:

int a, b, c;
void fn1 ()
{
int d;
if (fn2 () && !0)
{
b = (
{
int e;
fn3 ();
switch (0)
default:
asm volatile("" : "=a"(e) : "0"(a), "i"(0));
e;
});
d = b;
}
c = d;
}

--
Markus

2014-07-28 18:28:46

by Linus Torvalds

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Mon, Jul 28, 2014 at 11:09 AM, Markus Trippelsdorf
<[email protected]> wrote:
>
> It shouldn't be too hard to implement a simple check for the bug in the
> next release. Just compile the gcc/testsuite/gcc.target/i386/pr61801.c
> testcase with -fcompare-debug. If gcc returns 0 then
> -fvar-tracking-assignments could safely be enabled again.

We don't really have any good infrastructure for things like this,
though. We probably *should* have a way to generate config options by
compiler version, but right now we don't. We do random ugly things
from within Makefile shell escapes (see all the helpers for this we do
in scripts/Kbuild.include, for example), and we could add yet another
one. But this is a whole new level of "ugly hack". It would be better
if we could do things like this at config time, not at build-time with
Makefile hacks.

Also, the test-case seems to be very sensitive to compiler options: it
passes with "-O", but fails with "-O2" or "-Os" for me. So I wonder
how reliable it is in the face of compiler version differences (ie is
it really robust wrt the bug actually being *fixed*, or is it a bit of
a happenstance)

Linus

2014-07-28 18:41:56

by Markus Trippelsdorf

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On 2014.07.28 at 11:28 -0700, Linus Torvalds wrote:
> On Mon, Jul 28, 2014 at 11:09 AM, Markus Trippelsdorf
> <[email protected]> wrote:
> >
> > It shouldn't be too hard to implement a simple check for the bug in the
> > next release. Just compile the gcc/testsuite/gcc.target/i386/pr61801.c
> > testcase with -fcompare-debug. If gcc returns 0 then
> > -fvar-tracking-assignments could safely be enabled again.
>
> We don't really have any good infrastructure for things like this,
> though. We probably *should* have a way to generate config options by
> compiler version, but right now we don't. We do random ugly things
> from within Makefile shell escapes (see all the helpers for this we do
> in scripts/Kbuild.include, for example), and we could add yet another
> one. But this is a whole new level of "ugly hack". It would be better
> if we could do things like this at config time, not at build-time with
> Makefile hacks.
>
> Also, the test-case seems to be very sensitive to compiler options: it
> passes with "-O", but fails with "-O2" or "-Os" for me. So I wonder
> how reliable it is in the face of compiler version differences (ie is
> it really robust wrt the bug actually being *fixed*, or is it a bit of
> a happenstance)

It is robust with -O2 and -Os for all supported series that I've
checked: 4.8, 4.9 and 5.0. I haven't checked older releases.

--
Markus

2014-07-28 19:50:27

by Theodore Ts'o

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Mon, Jul 28, 2014 at 10:27:39AM -0700, Alexei Starovoitov wrote:
>
> It's not pretty, but adding it unconditionally was the right thing to do.
> Black listing compiler versions is too fragile.
> Look at the flip side: now size of build dir will be much smaller :)

White-listing the fixed compiler version should be safe though, and
that should pretty much fix things for community distros fairly
quickly. For enterprise distros, once the compiler is patched, their
kernel sources could also be hacked to add a distro-specific version
whitelist. This would all be fixed in 4.9.2, right?

- Ted

2014-07-29 02:29:18

by Michel Dänzer

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On 29.07.2014 01:48, Linus Torvalds wrote:
> On Sun, Jul 27, 2014 at 8:47 PM, Michel Dänzer <[email protected]> wrote:
>> On 27.07.2014 04:56, Linus Torvalds wrote:
>>>
>>> Also, Michel - can you try this patch if you still have your
>>> gcc-4.9.0 install, and send me the resulting fair.s file again?
>>
>> Attached.
>
> The frame setup looks fine to me now (apart from the spilling of a
> constant value, but hey, that wasn't _buggy_, that was just stupid).
>
> And I'm assuming everything runs fine too?

I've rebased my .config on the one from the current Debian kernel
packages, which no longer enables CONFIG_CC_OPTIMIZE_FOR_SIZE. So I'm no
longer affected by this bug anyway. (I used the old .config for
compiling the fair.s above though)

--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2014-07-29 08:59:37

by Jakub Jelinek

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On Mon, Jul 28, 2014 at 08:09:02PM +0200, Markus Trippelsdorf wrote:
> Here's the testcase:
>
> int a, b, c;
> void fn1 ()
> {
> int d;
> if (fn2 () && !0)
> {
> b = (
> {
> int e;
> fn3 ();
> switch (0)
> default:
> asm volatile("" : "=a"(e) : "0"(a), "i"(0));
> e;
> });
> d = b;
> }
> c = d;
> }

int a, c;
int bar (void);
void baz (void);

void
foo (void)
{
int d;
if (bar ())
{
int e;
baz ();
asm volatile ("" : "=a" (e) : "0" (a), "i" (0));
d = e;
}
c = d;
}

fails the same way and has more creduce cruft removed.
Fails also with 4.7 at -O2 -fcompare-debug.

Jakub

2014-07-29 09:20:23

by Michel Dänzer

[permalink] [raw]

Subject: Re: Random panic in load_balance() with 3.16-rc

On 27.07.2014 03:02, Steven Chamberlain wrote:
> On 25/07/14 02:25, Michel Dänzer wrote:
>> Attached is fair.s from Debian gcc 4.8.3-5. Does that look better? I'm
>> going to try reproducing the problem with a kernel built by that now.
>
> It looks like gcc-4.9 Debian package version 4.9.1-2 available in
> sid/jessie may have already fixed this (though Debian's buildd systems
> may not have been updated with it yet). If you are able to test with
> that version and let us know, that would be wonderful.

I couldn't test 4.9.1-2, but 4.9.1-3 seems to fix the problem.

--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer