The latest GCC in Fedora rawhide contains some serious bug (or provokes a latent one in the kernel) that makes every kernel built unbootable. It just locks up halfway through the init. Kernels that previously worked fine all now experience the same symptom. Even RH's own kernels exhibit this. The kernel built Nov 24th works, Nov 26th doesn't. gcc was updated 26th, 14 hours earlier.
The last message printed is:
    isapnp: Scanning for PnP cards...
Comparing with the working kernel, the next steps are:
    isapnp: Scanning for PnP cards...
    Switched to high resolution mode on CPU 0
    isapnp: No Plug & Play device found
Any ideas on how I can work around this? I'm rather unproductive when I can't build working kernels.. :/
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
On Sat, 1 Dec 2007 15:42:23 +0100
Pierre Ossman <[email protected]> wrote:
> The latest GCC in Fedora rawhide contains some serious bug (or provokes a latent one in the kernel) that makes every kernel built unbootable. It just locks up halfway through the init. Kernels that previously worked fine all now experience the same symptom. Even RH's own kernels exhibit this. The kernel built Nov 24th works, Nov 26th doesn't. gcc was updated 26th, 14 hours earlier.
>
Digging a bit further, it is indeed the high-res stuff (the first missing message) that hangs. If I hard code the kernel to just be non-high-res capable, it boots, but time keeping is horribly broken.
Anyway, hopefully this means I'll soon have the object file that gets miscompiled. Jakub also pointed me to an older gcc RPM so that I can produce an object file with that as well and see what differs.
Rgds
--
-- Pierre Ossman
Pierre Ossman wrote:
> On Sat, 1 Dec 2007 15:42:23 +0100
> Pierre Ossman <[email protected]> wrote:
>
>> The latest GCC in Fedora rawhide contains some serious bug (or provokes a latent one in the kernel) that makes every kernel built unbootable. It just locks up halfway through the init. Kernels that previously worked fine all now experience the same symptom. Even RH's own kernels exhibit this. The kernel built Nov 24th works, Nov 26th doesn't. gcc was updated 26th, 14 hours earlier.
>>
>
> Digging a bit further, it is indeed the high-res stuff (the first missing message) that hangs. If I hard code the kernel to just be non-high-res capable, it boots, but time keeping is horribly broken.
>
> Anyway, hopefully this means I'll soon have the object file that gets miscompiled. Jakub also pointed me to an older gcc RPM so that I can produce an object file with that as well and see what differs.
>
If you are referring to the "compat" RPMs, be aware that they use the
current headers, which is a good or bad thing depending on what you want
to do. If you want to build old software, you get to keep a down-rev
virtual machine to do it right :-(
--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
On Sat, 01 Dec 2007 13:37:44 -0500
Bill Davidsen <[email protected]> wrote:
> If you are referring to the "compat" RPMs, be aware that they use the
> current headers, which is a good or bad thing depending on what you want
> to do. If you want to build old software, you get to keep a down-rev
> virtual machine to do it right :-(
>
Nah. The previous gcc package is the one shipped with Fedora 8. So I could just grab that one (plus cpp and libgomp) and downgrade.
Rgds
--
-- Pierre Ossman
On Sat, 1 Dec 2007 18:47:52 +0100
Pierre Ossman <[email protected]> wrote:
> On Sat, 1 Dec 2007 15:42:23 +0100
> Pierre Ossman <[email protected]> wrote:
>
> > The latest GCC in Fedora rawhide contains some serious bug (or provokes a latent one in the kernel) that makes every kernel built unbootable. It just locks up halfway through the init. Kernels that previously worked fine all now experience the same symptom. Even RH's own kernels exhibit this. The kernel built Nov 24th works, Nov 26th doesn't. gcc was updated 26th, 14 hours earlier.
> >
>
> Digging a bit further, it is indeed the high-res stuff (the first missing message) that hangs. If I hard code the kernel to just be non-high-res capable, it boots, but time keeping is horribly broken.
>
> Anyway, hopefully this means I'll soon have the object file that gets miscompiled. Jakub also pointed me to an older gcc RPM so that I can produce an object file with that as well and see what differs.
>
I've now pinpointed where it hangs. And it doesn't hang in fact. It gets stuck in an infinite loop in tick_setup_sched_timer():
    for (;;) {
        hrtimer_forward(&ts->sched_timer, now, tick_period);
        hrtimer_start(&ts->sched_timer, ts->sched_timer.expires,
                      HRTIMER_MODE_ABS);
        /* Check, if the timer was already in the past */
        if (hrtimer_active(&ts->sched_timer))
            break;
        now = ktime_get();
    }
I've added Thomas as cc as this is his domain, so perhaps he has some idea what the compiler does wrong here. I've also included the two object files (one good, one bad). HEAD is v2.6.24-rc3.
Rgds
--
-- Pierre Ossman
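To make the retry loop above easier to reason about, here is a minimal user-space sketch of its shape. Everything in it is an assumption made for illustration: the model_* names, the fake clock, and the fixed 300 ns cost of arming are hypothetical and are not kernel APIs. The "stuck" forward step only shows one way such a loop can spin forever (the expiry never moves past the current time); it is not a claim about what the compiler actually got wrong.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
typedef int64_t model_ns_t;

struct model_timer {
    model_ns_t expires;   /* absolute expiry time */
    bool active;          /* true once the timer has been armed */
};

/* Fake clock: it only advances when the model "does work". */
static model_ns_t model_clock;

/* Correct forward step: push expires past 'now' in whole intervals. */
static void model_forward(struct model_timer *t, model_ns_t now,
                          model_ns_t interval)
{
    assert(interval > 0);
    if (t->expires > now)
        return;
    t->expires += ((now - t->expires) / interval + 1) * interval;
}

/* Faulty forward step: the new expiry is never stored. */
static void model_forward_stuck(struct model_timer *t, model_ns_t now,
                                model_ns_t interval)
{
    (void)t;
    (void)now;
    (void)interval;
}

/* Arming costs 300 ns of fake time; an expiry already in the past is refused. */
static void model_start(struct model_timer *t)
{
    model_clock += 300;
    t->active = t->expires > model_clock;
}

typedef void (*model_forward_fn)(struct model_timer *, model_ns_t, model_ns_t);

/*
 * Same shape as the quoted loop, but bounded so that it cannot hang.
 * Returns the number of passes needed, or -1 after max_tries passes.
 */
static int model_setup(struct model_timer *t, model_forward_fn forward,
                       model_ns_t interval, int max_tries)
{
    model_ns_t now = model_clock;

    for (int tries = 1; tries <= max_tries; tries++) {
        forward(t, now, interval);
        model_start(t);
        /* Check whether the timer was already in the past */
        if (t->active)
            return tries;
        now = model_clock;
    }
    return -1;
}
```

With a 1000 ns interval and the fake clock starting at 950, the correct forward step needs a second pass because arming takes the clock past the first expiry. The stuck variant never gets the timer armed, which in the unbounded original would look exactly like a hang.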
> On Sat, 1 Dec 2007 18:47:52 +0100
> Pierre Ossman <[email protected]> wrote:
>>> The latest GCC in Fedora rawhide contains some serious bug (or
>>> provokes a latent one in the kernel) that makes every kernel built
>>> unbootable. It just locks up halfway through the init. Kernels that
>>> previously worked fine all now experience the same symptom. Even RH's
>>> own kernels exhibit this. The kernel built Nov 24th works, Nov 26th
>>> doesn't. gcc was updated 26th, 14 hours earlier.
>>>
>>
>> Digging a bit further, it is indeed the high-res stuff (the first
>> missing message) that hangs. If I hard code the kernel to just be
>> non-high-res capable, it boots, but time keeping is horribly broken.
>>
>> Anyway, hopefully this means I'll soon have the object file that gets
>> miscompiled. Jakub also pointed me to an older gcc RPM so that I can
>> produce an object file with that as well and see what differs.
>>
>
> I've now pinpointed where it hangs. And it doesn't hang in fact. It gets
> stuck in an infinite loop in tick_setup_sched_timer():
>
>     for (;;) {
>         hrtimer_forward(&ts->sched_timer, now, tick_period);
>         hrtimer_start(&ts->sched_timer, ts->sched_timer.expires,
>                       HRTIMER_MODE_ABS);
>         /* Check, if the timer was already in the past */
>         if (hrtimer_active(&ts->sched_timer))
>             break;
>         now = ktime_get();
>     }
>
> I've added Thomas as cc as this is his domain, so perhaps he has some idea
> what the compiler does wrong here. I've also included the two object files
> (one good, one bad). HEAD is v2.6.24-rc3.
I looked at the disassembly but I can not spot the problem.
I think the real problem is somewhere else. Likely candidates are
hrtimer_forward() or hrtimer_start() - in that order.
Thanks,
tglx
P.S.: I have restricted network access today, so I cannot reproduce it myself.
On Mon, Dec 03, 2007 at 09:17:22AM +0100, Thomas Gleixner wrote:
> I looked at the disassembly but I can not spot the problem.
>
> I think the real problem is somewhere else. Likely candidates are
> hrtimer_forward() or hrtimer_start() - in that order.
This should hopefully be fixed in the latest Fedora gcc. The problem was in code like
typedef union { long long int s; } U;
typedef struct { U u; } S;

void foo (S *s, long long int x, unsigned long int y)
{
  s->u = ({ (U) { .s = s->u.s + x * y }; });
}
where a backport of a recent optimization of mine sets the LHS object
directly to the compound literal's initializer rather than forcing creation
of a temporary object (the compound literal). Without that optimization gcc
handles initializers from compound literals terribly, and hrtimer uses them
just everywhere (why can't ktime.h, for
#if BITS_PER_LONG == 64 || defined(CONFIG_KTIME_SCALAR), just use a scalar
rather than a union with a scalar in it??). Unfortunately the gimplifier
had some bugs in the case where the initializer references (or at least might
reference) parts of the LHS object. Fixed by backporting 2 Ada bugfixes for
the gimplifier from GCC trunk (Ada was hitting those bugs even without this
compound literal optimization).
Jakub
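For anyone who wants to check a given compiler against the pattern described above, a minimal sketch follows. It assumes GCC or Clang, since the statement expression is a GNU extension. The foo_ref() and foo_matches_reference() helpers are hypothetical additions for illustration, as is the noinline attribute, which is only there so that each function body is compiled on its own. Whether this reduced case triggers the bug on an affected gcc build is not confirmed here; a correct compiler must make both functions agree.

```c
#include <assert.h>

typedef union { long long int s; } U;
typedef struct { U u; } S;

/* The pattern from the message above: the initializer reads the LHS object. */
__attribute__((noinline))
void foo(S *s, long long int x, unsigned long int y)
{
    s->u = ({ (U) { .s = s->u.s + x * y }; });
}

/*
 * Hypothetical reference version: same arithmetic, but the new value is
 * computed into a plain temporary before the LHS object is written.
 */
__attribute__((noinline))
void foo_ref(S *s, long long int x, unsigned long int y)
{
    long long int tmp = s->u.s + x * y;

    s->u.s = tmp;
}

/* Returns 1 if both versions produce the same result for these inputs. */
int foo_matches_reference(long long int start, long long int x,
                          unsigned long int y)
{
    S a = { .u = { .s = start } };
    S b = a;

    foo(&a, x, y);
    foo_ref(&b, x, y);
    return a.u.s == b.u.s;
}
```

Starting from 5 with x = 3 and y = 4, the stored value should be 17. A result that ignores the old contents of s->u (12, for example) would point at the LHS being overwritten before the initializer is evaluated.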
On Mon, 3 Dec 2007, Jakub Jelinek wrote:
> On Mon, Dec 03, 2007 at 09:17:22AM +0100, Thomas Gleixner wrote:
> > I looked at the disassembly but I can not spot the problem.
> >
> > I think the real problem is somewhere else. Likely candidates are
> > hrtimer_forward() or hrtimer_start() - in that order.
>
> Should be hopefully fixed in latest Fedora gcc. The problem was in code like
> typedef union { long long int s; } U;
> typedef struct { U u; } S;
>
> void foo (S *s, long long int x, unsigned long int y)
> {
>   s->u = ({ (U) { .s = s->u.s + x * y }; });
> }
>
> where a backport of a recent optimization of mine, without which gcc handles
> terribly initializers from compound literals (which is something hrtimer
> uses just everywhere - why can't ktime.h for #if BITS_PER_LONG == 64 || defined(CONFIG_KTIME_SCALAR)
> just use a scalar rather than union with a scalar in it??),
Of course just to annoy you :)
Seriously, we want the same code/initializers for both the scalar and the
sec/nsec case. That's where the union comes from.
Thanks,
tglx
On Mon, Dec 03, 2007 at 12:34:17PM +0100, Thomas Gleixner wrote:
> Of course just to annoy you :)
It doesn't matter whether I'm annoyed about this or not, but whether gcc is
able to generate decent code with it or not. And especially with union it
is not, at least through all the tree ssa passes. You already have a lot of
the details hidden in ktime.h accessor inlines, so I don't think it would be
hard to add one or two more.
Anyway, even just using typedef struct ktime { s64 tv64; } ktime_t; could
make things better in case you have just one field. Unlike unions, structs
can be (and in this case most likely will be) scalarized by SRA, so
half of tree SSA passes will see it as integral var and will be able to
perform optimizations on it.
Jakub
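A small sketch of the suggestion above, under stated assumptions: the s64 typedef is a user-space stand-in, and the kt_set(), kt_add() and kt_to_ns() accessors are hypothetical names rather than the real ktime.h interface. It only shows that a single-field struct keeps the same size and the same designated-initializer syntax as the union, so callers that go through accessor inlines would not need to change. It makes no measurement of the generated code.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t s64;   /* stand-in for the kernel's s64 */

/* Union form: a union with a single scalar in it. */
typedef union { s64 tv64; } ktime_union_t;

/* Struct form from the message above: one field, so it can be scalarized. */
typedef struct ktime { s64 tv64; } ktime_struct_t;

/* Both forms have the same size as the bare scalar. */
_Static_assert(sizeof(ktime_union_t) == sizeof(s64), "union adds no padding");
_Static_assert(sizeof(ktime_struct_t) == sizeof(s64), "struct adds no padding");

/* Hypothetical accessors: callers never touch the field directly. */
static inline ktime_struct_t kt_set(s64 ns)
{
    return (ktime_struct_t) { .tv64 = ns };
}

static inline ktime_struct_t kt_add(ktime_struct_t a, ktime_struct_t b)
{
    return (ktime_struct_t) { .tv64 = a.tv64 + b.tv64 };
}

static inline s64 kt_to_ns(ktime_struct_t kt)
{
    return kt.tv64;
}
```

Because the initializer `{ .tv64 = ... }` is valid for both typedefs, swapping the union for the struct in the scalar case would be confined to the type definition as long as users stick to the accessors.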
On Mon, 3 Dec 2007, Jakub Jelinek wrote:
> On Mon, Dec 03, 2007 at 12:34:17PM +0100, Thomas Gleixner wrote:
> > Of course just to annoy you :)
>
> It doesn't matter whether I'm annoyed about this or not, but whether gcc is
> able to generate decent code with it or not. And especially with union it
> is not, at least through all the tree ssa passes. You already have a lot of
> the details hidden in ktime.h accessor inlines, so I don't think it would be
> hard to add further one or two.
>
> Anyway, even just using typedef struct ktime { s64 tv64; } ktime_t; could
> make things better in case you have just one field. Unlike unions, structs
> can be (and in this case most likely will be) scalarized by SRA, so
> half of tree SSA passes will see it as integral var and will be able to
> perform optimizations on it.
Makes sense. I'll look into fixing that.
Thanks,
tglx