2021-05-05 03:43:57

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] sched: Work around undefined behavior in sched class checking

From: Andi Kleen <[email protected]>

The scheduler initialization code checks that the scheduling
classes are consecutive in memory by comparing the end
addresses with the next address.

Technically in ISO C comparing symbol addresseses outside different objects
is undefined. With LTO gcc 10 tries to exploits this and creates an
unconditional BUG_ON in the scheduler initialization, resulting
in a boot hang.

Use RELOC_HIDE to make this work. This hides the symbols from gcc,
so the optimizer won't make these assumption. I also split
the BUG_ONs in multiple.

Signed-off-by: Andi Kleen <[email protected]>
---
kernel/sched/core.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 98191218d891..272a654f5293 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8086,12 +8086,20 @@ void __init sched_init(void)
unsigned long ptr = 0;
int i;

- /* Make sure the linker didn't screw up */
- BUG_ON(&idle_sched_class + 1 != &fair_sched_class ||
- &fair_sched_class + 1 != &rt_sched_class ||
- &rt_sched_class + 1 != &dl_sched_class);
+ /*
+ * Make sure the linker didn't screw up.
+ * We have to use RELOC_HIDE to prevent gcc from optimizing
+ * them to true because they're technically undefined in ISO-C.
+ */
+ BUG_ON(RELOC_HIDE(&idle_sched_class, 0) + 1 !=
+ RELOC_HIDE(&fair_sched_class, 0));
+ BUG_ON(RELOC_HIDE(&fair_sched_class, 0) + 1 !=
+ RELOC_HIDE(&rt_sched_class, 0));
+ BUG_ON(RELOC_HIDE(&rt_sched_class, 0) + 1 !=
+ RELOC_HIDE(&dl_sched_class, 0));
#ifdef CONFIG_SMP
- BUG_ON(&dl_sched_class + 1 != &stop_sched_class);
+ BUG_ON(RELOC_HIDE(&dl_sched_class, 0) + 1 !=
+ RELOC_HIDE(&stop_sched_class, 0));
#endif

wait_bit_init();
--
2.25.4


2021-05-05 06:49:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched: Work around undefined behavior in sched class checking

On Tue, May 04, 2021 at 08:39:45PM -0700, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> The scheduler initialization code checks that the scheduling
> classes are consecutive in memory by comparing the end
> addresses with the next address.
>
> Technically in ISO C comparing symbol addresseses outside different objects
> is undefined. With LTO gcc 10 tries to exploits this and creates an
> unconditional BUG_ON in the scheduler initialization, resulting
> in a boot hang.
>
> Use RELOC_HIDE to make this work. This hides the symbols from gcc,
> so the optimizer won't make these assumption. I also split
> the BUG_ONs in multiple.

Urgh, that insanity again :/ Can't we pretty please get a GCC flag to
disable that?

2021-05-05 08:50:35

by Florian Weimer

[permalink] [raw]
Subject: Re: [PATCH] sched: Work around undefined behavior in sched class checking

* Peter Zijlstra:

> On Tue, May 04, 2021 at 08:39:45PM -0700, Andi Kleen wrote:
>> From: Andi Kleen <[email protected]>
>>
>> The scheduler initialization code checks that the scheduling
>> classes are consecutive in memory by comparing the end
>> addresses with the next address.
>>
>> Technically in ISO C comparing symbol addresseses outside different objects
>> is undefined. With LTO gcc 10 tries to exploits this and creates an
>> unconditional BUG_ON in the scheduler initialization, resulting
>> in a boot hang.
>>
>> Use RELOC_HIDE to make this work. This hides the symbols from gcc,
>> so the optimizer won't make these assumption. I also split
>> the BUG_ONs in multiple.
>
> Urgh, that insanity again :/ Can't we pretty please get a GCC flag to
> disable that?

Context:

<https://lore.kernel.org/lkml/[email protected]/>

Obviously, GCC doesn't do this in general. Would you please provide a
minimal test case?

Thanks,
Florian

2021-05-05 09:06:19

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched: Work around undefined behavior in sched class checking

On Wed, May 05, 2021 at 10:47:07AM +0200, Florian Weimer wrote:
> * Peter Zijlstra:
>
> > On Tue, May 04, 2021 at 08:39:45PM -0700, Andi Kleen wrote:
> >> From: Andi Kleen <[email protected]>
> >>
> >> The scheduler initialization code checks that the scheduling
> >> classes are consecutive in memory by comparing the end
> >> addresses with the next address.
> >>
> >> Technically in ISO C comparing symbol addresseses outside different objects
> >> is undefined. With LTO gcc 10 tries to exploits this and creates an
> >> unconditional BUG_ON in the scheduler initialization, resulting
> >> in a boot hang.
> >>
> >> Use RELOC_HIDE to make this work. This hides the symbols from gcc,
> >> so the optimizer won't make these assumption. I also split
> >> the BUG_ONs in multiple.
> >
> > Urgh, that insanity again :/ Can't we pretty please get a GCC flag to
> > disable that?
>
> Context:
>
> <https://lore.kernel.org/lkml/[email protected]/>
>
> Obviously, GCC doesn't do this in general. Would you please provide a
> minimal test case?

Andi has this GCC-LTO patch-set that triggers this, but the thing I'd
like fixed is the UB mentioned above. Not this particular instance.

And, we've had the problem before, see all the RELOC_HIDE crud. Having
this pointer arith outside object be UB is just really annoying. And in
the spirit of UB bad, can we please get a flag to remove the UB and have
it do the obvious, just do the arithmetic and don't do daft things.

Pretty please.

2021-05-05 14:40:49

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] sched: Work around undefined behavior in sched class checking

> Context:
>
> <https://lore.kernel.org/lkml/[email protected]/>
>
> Obviously, GCC doesn't do this in general.

We've seen it in other cases before, that's why RELOC_HIDE exists.
A classic case was __pa_symbol()

That dates back nearly two decades at this point.

> Would you please provide a
> minimal test case?

You can only reproduce it with a LTO build because it needs knowledge
between different translation units for this specific case.

But gcc will totally do the optimization even without LTO if it can
prove the same inside a single TU.

If you want to reproduce it you can use my tree here
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc lto-5.12-3
and revert the fix. The kernel will not boot.

-Andi

2021-05-05 15:37:02

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched: Work around undefined behavior in sched class checking

On Wed, May 05, 2021 at 07:34:42AM -0700, Andi Kleen wrote:
> > > Use RELOC_HIDE to make this work. This hides the symbols from gcc,
> > > so the optimizer won't make these assumption. I also split
> > > the BUG_ONs in multiple.
> >
> > Urgh, that insanity again :/ Can't we pretty please get a GCC flag to
> > disable that?
>
> Even if that was done (I could totally see the gcc people pushing back on this;
> why should they add special flags just for Linux developers not understanding
> ISO-C?)

I understand C fine, I just don't agree with it. I also want to
explicitly define as much UB as is possible, because UB is just utter
garbage.

So just like we do with -fwrapv and others, add more knobs that
explictly define away UB. Less UB is more better. This being C it's
unlikely we'll ever get to no UB, but we should damn well try :-)

> you would still need the fix for already shipping compilers.

Yes, there is that.

2021-05-05 17:02:05

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] sched: Work around undefined behavior in sched class checking


On 5/5/2021 9:41 AM, Nick Desaulniers wrote:
> On Wed, May 5, 2021 at 7:39 AM Andi Kleen <[email protected]> wrote:
>>> Would you please provide a
>>> minimal test case?
>> You can only reproduce it with a LTO build because it needs knowledge
>> between different translation units for this specific case.
>>
>> But gcc will totally do the optimization even without LTO if it can
>> prove the same inside a single TU.
> It would be helpful to isolate a test case that doesn't rely on LTO,
> if possible.

Like I wrote earlier we used to see it all the time in __pa_symbol
before it used RELOC_HIDE. I bet if you make RELOC_HIDE a nop you'll see
multiple instances.

But not sure why you want a test case?

-Andi

2021-05-05 17:26:03

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH] sched: Work around undefined behavior in sched class checking

On Wed, May 5, 2021 at 7:39 AM Andi Kleen <[email protected]> wrote:
>
> > Would you please provide a
> > minimal test case?
>
> You can only reproduce it with a LTO build because it needs knowledge
> between different translation units for this specific case.
>
> But gcc will totally do the optimization even without LTO if it can
> prove the same inside a single TU.

It would be helpful to isolate a test case that doesn't rely on LTO,
if possible.
--
Thanks,
~Nick Desaulniers

2021-05-05 17:43:14

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH] sched: Work around undefined behavior in sched class checking

On Wed, May 5, 2021 at 9:49 AM Andi Kleen <[email protected]> wrote:
>
>
> On 5/5/2021 9:41 AM, Nick Desaulniers wrote:
> > On Wed, May 5, 2021 at 7:39 AM Andi Kleen <[email protected]> wrote:
> >>> Would you please provide a
> >>> minimal test case?
> >> You can only reproduce it with a LTO build because it needs knowledge
> >> between different translation units for this specific case.
> >>
> >> But gcc will totally do the optimization even without LTO if it can
> >> prove the same inside a single TU.
> > It would be helpful to isolate a test case that doesn't rely on LTO,
> > if possible.
>
> Like I wrote earlier we used to see it all the time in __pa_symbol
> before it used RELOC_HIDE. I bet if you make RELOC_HIDE a nop you'll see
> multiple instances.
>
> But not sure why you want a test case?

In general,
when making a feature request to a compiler vendor, having a
digestible snippet of code that demonstrates the problem goes a long
way, much further than "clone this branch of my fork of this project
and do a build and something goes wrong somewhere." We're too busy to
do that, please take the time to isolate it before making such
requests.

--
Thanks,
~Nick Desaulniers

2021-05-05 19:29:17

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] sched: Work around undefined behavior in sched class checking

> > Use RELOC_HIDE to make this work. This hides the symbols from gcc,
> > so the optimizer won't make these assumption. I also split
> > the BUG_ONs in multiple.
>
> Urgh, that insanity again :/ Can't we pretty please get a GCC flag to
> disable that?

Even if that was done (I could totally see the gcc people pushing back on this;
why should they add special flags just for Linux developers not understanding
ISO-C?) you would still need the fix for already shipping compilers.

-Andi

2021-05-05 21:56:25

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] sched: Work around undefined behavior in sched class checking


>> But not sure why you want a test case?
> In general,
> when making a feature request to a compiler vendor, having a
> digestible snippet of code that demonstrates the problem goes a long
> way, much further than "clone this branch of my fork of this project
> and do a build and something goes wrong somewhere." We're too busy to
> do that, please take the time to isolate it before making such
> requests.

Ah you misunderstood. I'm not making any feature requests.

-Andi