From: Andi Kleen <[email protected]>
The scheduler initialization code checks that the scheduling
classes are consecutive in memory by comparing the end
addresses with the next address.
Technically in ISO C comparing symbol addresseses outside different objects
is undefined. With LTO gcc 10 tries to exploits this and creates an
unconditional BUG_ON in the scheduler initialization, resulting
in a boot hang.
Use RELOC_HIDE to make this work. This hides the symbols from gcc,
so the optimizer won't make these assumption. I also split
the BUG_ONs in multiple.
Signed-off-by: Andi Kleen <[email protected]>
---
kernel/sched/core.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 98191218d891..272a654f5293 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8086,12 +8086,20 @@ void __init sched_init(void)
unsigned long ptr = 0;
int i;
- /* Make sure the linker didn't screw up */
- BUG_ON(&idle_sched_class + 1 != &fair_sched_class ||
- &fair_sched_class + 1 != &rt_sched_class ||
- &rt_sched_class + 1 != &dl_sched_class);
+ /*
+ * Make sure the linker didn't screw up.
+ * We have to use RELOC_HIDE to prevent gcc from optimizing
+ * them to true because they're technically undefined in ISO-C.
+ */
+ BUG_ON(RELOC_HIDE(&idle_sched_class, 0) + 1 !=
+ RELOC_HIDE(&fair_sched_class, 0));
+ BUG_ON(RELOC_HIDE(&fair_sched_class, 0) + 1 !=
+ RELOC_HIDE(&rt_sched_class, 0));
+ BUG_ON(RELOC_HIDE(&rt_sched_class, 0) + 1 !=
+ RELOC_HIDE(&dl_sched_class, 0));
#ifdef CONFIG_SMP
- BUG_ON(&dl_sched_class + 1 != &stop_sched_class);
+ BUG_ON(RELOC_HIDE(&dl_sched_class, 0) + 1 !=
+ RELOC_HIDE(&stop_sched_class, 0));
#endif
wait_bit_init();
--
2.25.4
On Tue, May 04, 2021 at 08:39:45PM -0700, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> The scheduler initialization code checks that the scheduling
> classes are consecutive in memory by comparing the end
> addresses with the next address.
>
> Technically in ISO C comparing symbol addresseses outside different objects
> is undefined. With LTO gcc 10 tries to exploits this and creates an
> unconditional BUG_ON in the scheduler initialization, resulting
> in a boot hang.
>
> Use RELOC_HIDE to make this work. This hides the symbols from gcc,
> so the optimizer won't make these assumption. I also split
> the BUG_ONs in multiple.
Urgh, that insanity again :/ Can't we pretty please get a GCC flag to
disable that?
* Peter Zijlstra:
> On Tue, May 04, 2021 at 08:39:45PM -0700, Andi Kleen wrote:
>> From: Andi Kleen <[email protected]>
>>
>> The scheduler initialization code checks that the scheduling
>> classes are consecutive in memory by comparing the end
>> addresses with the next address.
>>
>> Technically in ISO C comparing symbol addresseses outside different objects
>> is undefined. With LTO gcc 10 tries to exploits this and creates an
>> unconditional BUG_ON in the scheduler initialization, resulting
>> in a boot hang.
>>
>> Use RELOC_HIDE to make this work. This hides the symbols from gcc,
>> so the optimizer won't make these assumption. I also split
>> the BUG_ONs in multiple.
>
> Urgh, that insanity again :/ Can't we pretty please get a GCC flag to
> disable that?
Context:
<https://lore.kernel.org/lkml/[email protected]/>
Obviously, GCC doesn't do this in general. Would you please provide a
minimal test case?
Thanks,
Florian
On Wed, May 05, 2021 at 10:47:07AM +0200, Florian Weimer wrote:
> * Peter Zijlstra:
>
> > On Tue, May 04, 2021 at 08:39:45PM -0700, Andi Kleen wrote:
> >> From: Andi Kleen <[email protected]>
> >>
> >> The scheduler initialization code checks that the scheduling
> >> classes are consecutive in memory by comparing the end
> >> addresses with the next address.
> >>
> >> Technically in ISO C comparing symbol addresseses outside different objects
> >> is undefined. With LTO gcc 10 tries to exploits this and creates an
> >> unconditional BUG_ON in the scheduler initialization, resulting
> >> in a boot hang.
> >>
> >> Use RELOC_HIDE to make this work. This hides the symbols from gcc,
> >> so the optimizer won't make these assumption. I also split
> >> the BUG_ONs in multiple.
> >
> > Urgh, that insanity again :/ Can't we pretty please get a GCC flag to
> > disable that?
>
> Context:
>
> <https://lore.kernel.org/lkml/[email protected]/>
>
> Obviously, GCC doesn't do this in general. Would you please provide a
> minimal test case?
Andi has this GCC-LTO patch-set that triggers this, but the thing I'd
like fixed is the UB mentioned above. Not this particular instance.
And, we've had the problem before, see all the RELOC_HIDE crud. Having
this pointer arith outside object be UB is just really annoying. And in
the spirit of UB bad, can we please get a flag to remove the UB and have
it do the obvious, just do the arithmetic and don't do daft things.
Pretty please.
> Context:
>
> <https://lore.kernel.org/lkml/[email protected]/>
>
> Obviously, GCC doesn't do this in general.
We've seen it in other cases before, that's why RELOC_HIDE exists.
A classic case was __pa_symbol()
That dates back nearly two decades at this point.
> Would you please provide a
> minimal test case?
You can only reproduce it with a LTO build because it needs knowledge
between different translation units for this specific case.
But gcc will totally do the optimization even without LTO if it can
prove the same inside a single TU.
If you want to reproduce it you can use my tree here
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc lto-5.12-3
and revert the fix. The kernel will not boot.
-Andi
On Wed, May 05, 2021 at 07:34:42AM -0700, Andi Kleen wrote:
> > > Use RELOC_HIDE to make this work. This hides the symbols from gcc,
> > > so the optimizer won't make these assumption. I also split
> > > the BUG_ONs in multiple.
> >
> > Urgh, that insanity again :/ Can't we pretty please get a GCC flag to
> > disable that?
>
> Even if that was done (I could totally see the gcc people pushing back on this;
> why should they add special flags just for Linux developers not understanding
> ISO-C?)
I understand C fine, I just don't agree with it. I also want to
explicitly define as much UB as is possible, because UB is just utter
garbage.
So just like we do with -fwrapv and others, add more knobs that
explictly define away UB. Less UB is more better. This being C it's
unlikely we'll ever get to no UB, but we should damn well try :-)
> you would still need the fix for already shipping compilers.
Yes, there is that.
On 5/5/2021 9:41 AM, Nick Desaulniers wrote:
> On Wed, May 5, 2021 at 7:39 AM Andi Kleen <[email protected]> wrote:
>>> Would you please provide a
>>> minimal test case?
>> You can only reproduce it with a LTO build because it needs knowledge
>> between different translation units for this specific case.
>>
>> But gcc will totally do the optimization even without LTO if it can
>> prove the same inside a single TU.
> It would be helpful to isolate a test case that doesn't rely on LTO,
> if possible.
Like I wrote earlier we used to see it all the time in __pa_symbol
before it used RELOC_HIDE. I bet if you make RELOC_HIDE a nop you'll see
multiple instances.
But not sure why you want a test case?
-Andi
On Wed, May 5, 2021 at 7:39 AM Andi Kleen <[email protected]> wrote:
>
> > Would you please provide a
> > minimal test case?
>
> You can only reproduce it with a LTO build because it needs knowledge
> between different translation units for this specific case.
>
> But gcc will totally do the optimization even without LTO if it can
> prove the same inside a single TU.
It would be helpful to isolate a test case that doesn't rely on LTO,
if possible.
--
Thanks,
~Nick Desaulniers
On Wed, May 5, 2021 at 9:49 AM Andi Kleen <[email protected]> wrote:
>
>
> On 5/5/2021 9:41 AM, Nick Desaulniers wrote:
> > On Wed, May 5, 2021 at 7:39 AM Andi Kleen <[email protected]> wrote:
> >>> Would you please provide a
> >>> minimal test case?
> >> You can only reproduce it with a LTO build because it needs knowledge
> >> between different translation units for this specific case.
> >>
> >> But gcc will totally do the optimization even without LTO if it can
> >> prove the same inside a single TU.
> > It would be helpful to isolate a test case that doesn't rely on LTO,
> > if possible.
>
> Like I wrote earlier we used to see it all the time in __pa_symbol
> before it used RELOC_HIDE. I bet if you make RELOC_HIDE a nop you'll see
> multiple instances.
>
> But not sure why you want a test case?
In general,
when making a feature request to a compiler vendor, having a
digestible snippet of code that demonstrates the problem goes a long
way, much further than "clone this branch of my fork of this project
and do a build and something goes wrong somewhere." We're too busy to
do that, please take the time to isolate it before making such
requests.
--
Thanks,
~Nick Desaulniers
> > Use RELOC_HIDE to make this work. This hides the symbols from gcc,
> > so the optimizer won't make these assumption. I also split
> > the BUG_ONs in multiple.
>
> Urgh, that insanity again :/ Can't we pretty please get a GCC flag to
> disable that?
Even if that was done (I could totally see the gcc people pushing back on this;
why should they add special flags just for Linux developers not understanding
ISO-C?) you would still need the fix for already shipping compilers.
-Andi
>> But not sure why you want a test case?
> In general,
> when making a feature request to a compiler vendor, having a
> digestible snippet of code that demonstrates the problem goes a long
> way, much further than "clone this branch of my fork of this project
> and do a build and something goes wrong somewhere." We're too busy to
> do that, please take the time to isolate it before making such
> requests.
Ah you misunderstood. I'm not making any feature requests.
-Andi