2016-04-26 19:00:15

by Stefan Richter

Subject: Regression of v4.6-rc vs. v4.5: hangs after a few minutes after boot

Hi,

v4.6-rc solidly hangs after a short while after boot, login to X11, and
doing nothing much remarkable on the just brought up X desktop.

Hardware: x86-64, E3-1245 v3 (Haswell),
mainboard Supermicro X10SAE,
using integrated Intel graphics (HD P4600, i915 driver),
C226 PCH's AHCI and USB 2/3, ASMedia ASM1062 AHCI,
Intel LAN (i217, igb driver),
several IEEE 1394 controllers, some of them behind
PCIe bridges (IDT, PLX) or PCIe-to-PCI bridges (TI, Tundra)
and one PCI-to-CardBus bridge (Ricoh)

kernel.org kernel, Gentoo Linux userland

1. known good: v4.5-rc5 (gcc 4.9.3)
known bad: v4.6-rc2 (gcc 4.9.3), only tried one time

2. known good: v4.5.2 (gcc 5.2.0)
known bad: v4.6-rc5 (gcc 5.2.0), only tried one time

I will send my linux-4.6-rc5/.config in a follow-up message.

In theory I could collect more info (simplify the hardware, run
netconsole, bisect). In practice I cannot do so for the time being
due to lack of spare time. That's also the reason why I did not already
send a report when I tested v4.6-rc2, and why I did not boot v4.6-rc[25]
more than once yet.
--
Stefan Richter
-======----- -=-- ==-=-
http://arcgraph.de/sr/


2016-04-26 19:05:32

by Stefan Richter

Subject: Re: Regression of v4.6-rc vs. v4.5: hangs after a few minutes after boot

On Apr 26 Stefan Richter wrote:
> Hardware: x86-64, E3-1245 v3 (Haswell),
> mainboard Supermicro X10SAE,
> using integrated Intel graphics (HD P4600, i915 driver),
> C226 PCH's AHCI and USB 2/3, ASMedia ASM1062 AHCI,
> Intel LAN (i217, igb driver),
> several IEEE 1394 controllers, some of them behind
> PCIe bridges (IDT, PLX) or PCIe-to-PCI bridges (TI, Tundra)
> and one PCI-to-CardBus bridge (Ricoh)
>
> kernel.org kernel, Gentoo Linux userland
>
> 1. known good: v4.5-rc5 (gcc 4.9.3)
> known bad: v4.6-rc2 (gcc 4.9.3), only tried one time
>
> 2. known good: v4.5.2 (gcc 5.2.0)
> known bad: v4.6-rc5 (gcc 5.2.0), only tried one time
>
> I will send my linux-4.6-rc5/.config in a follow-up message.
>
> In theory I could collect more info (simplify the hardware, run
> netconsole, bisect). In practice I cannot do so for the time being
> due to lack of spare time. That's also the reason why I did not already
> send a report when I tested v4.6-rc2, and why I did not boot v4.6-rc[25]
> more than once yet.

Attached: linux-4.6-rc5/.config
--
Stefan Richter
-======----- -=-- ==-=-
http://arcgraph.de/sr/


Attachments:
config-4.6.0-rc5 (82.37 kB)

2016-04-26 19:07:52

by Stefan Richter

Subject: Re: Regression of v4.6-rc vs. v4.5: hangs after a few minutes after boot

On Apr 26 Stefan Richter wrote:
> v4.6-rc solidly hangs after a short while after boot, login to X11, and
> doing nothing much remarkable on the just brought up X desktop.
>
> Hardware: x86-64, E3-1245 v3 (Haswell),
> mainboard Supermicro X10SAE,
> using integrated Intel graphics (HD P4600, i915 driver),
> C226 PCH's AHCI and USB 2/3, ASMedia ASM1062 AHCI,
> Intel LAN (i217, igb driver),
> several IEEE 1394 controllers, some of them behind
> PCIe bridges (IDT, PLX) or PCIe-to-PCI bridges (TI, Tundra)
> and one PCI-to-CardBus bridge (Ricoh)
>
> kernel.org kernel, Gentoo Linux userland
>
> 1. known good: v4.5-rc5 (gcc 4.9.3)
> known bad: v4.6-rc2 (gcc 4.9.3), only tried one time
>
> 2. known good: v4.5.2 (gcc 5.2.0)
> known bad: v4.6-rc5 (gcc 5.2.0), only tried one time
>
> I will send my linux-4.6-rc5/.config in a follow-up message.
>
> In theory I could collect more info (simplify the hardware, run
> netconsole, bisect). In practice I cannot do so for the time being
> due to lack of spare time. That's also the reason why I did not already
> send a report when I tested v4.6-rc2, and why I did not boot v4.6-rc[25]
> more than once yet.

Attached: lspci -vvnn, obtained while on the good v4.5.2
--
Stefan Richter
-======----- -=-- ==-=-
http://arcgraph.de/sr/


Attachments:
lspci-vvnn (95.37 kB)

2016-04-27 18:52:01

by Stefan Richter

Subject: Re: Regression of v4.6-rc vs. v4.5: hangs after a few minutes after boot

On Apr 26 Stefan Richter wrote:
> v4.6-rc solidly hangs after a short while after boot, login to X11, and
> doing nothing much remarkable on the just brought up X desktop.
>
> Hardware: x86-64, E3-1245 v3 (Haswell),
> mainboard Supermicro X10SAE,
> using integrated Intel graphics (HD P4600, i915 driver),
> C226 PCH's AHCI and USB 2/3, ASMedia ASM1062 AHCI,
> Intel LAN (i217, igb driver),
> several IEEE 1394 controllers, some of them behind
> PCIe bridges (IDT, PLX) or PCIe-to-PCI bridges (TI, Tundra)
> and one PCI-to-CardBus bridge (Ricoh)
>
> kernel.org kernel, Gentoo Linux userland
>
> 1. known good: v4.5-rc5 (gcc 4.9.3)
> known bad: v4.6-rc2 (gcc 4.9.3), only tried one time
>
> 2. known good: v4.5.2 (gcc 5.2.0)
> known bad: v4.6-rc5 (gcc 5.2.0), only tried one time
>
> I will send my linux-4.6-rc5/.config in a follow-up message.
>
> In theory I could collect more info (simplify the hardware, run
> netconsole, bisect). In practice I cannot do so for the time being
> due to lack of spare time. That's also the reason why I did not already
> send a report when I tested v4.6-rc2, and why I did not boot v4.6-rc[25]
> more than once yet.

Today I booted a 2nd time into v4.6-rc5, and loaded netconsole shortly
after boot and xdm login to try capturing an oops. But throughout 5 hours
uptime now, the hang was not reproduced.
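
For context: netconsole sends kernel log messages as UDP datagrams to a
remote host, by default to port 6666. The thread does not say which
receiver was used. The following is only a minimal Python sketch of one;
the function names open_listener and receive_one are hypothetical, and the
demo binds to the loopback interface on an OS-chosen port so that it can be
tried without a second machine.

```python
import socket


def open_listener(host="127.0.0.1", port=0):
    """Open a UDP socket; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_one(sock, timeout=1.0):
    """Return (sender_ip, text) for one datagram, or None on timeout."""
    sock.settimeout(timeout)
    try:
        data, sender = sock.recvfrom(65535)
    except socket.timeout:
        return None
    return sender[0], data.decode("utf-8", errors="replace")


# Loopback demo: the sent datagram is a stand-in for a real kernel message.
listener = open_listener()
port = listener.getsockname()[1]
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"example kernel message\n", ("127.0.0.1", port))
result = receive_one(listener)
sender.close()
listener.close()
print(result)
```

On a real log host one would bind to the address and port given in the
netconsole module parameters instead of the loopback address. A hard hang
that stops the CPU or the network driver can still leave such a listener
with nothing to receive.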
--
Stefan Richter
-======----- -=-- ==-==
http://arcgraph.de/sr/

2016-04-27 19:22:37

by Stefan Richter

Subject: Re: Regression of v4.6-rc vs. v4.5: hangs after a few minutes after boot

On Apr 27 Stefan Richter wrote:
> Today I booted a 2nd time into v4.6-rc5, and loaded netconsole shortly
> after boot and xdm login to try capturing an oops. But throughout 5 hours
> uptime now, the hang was not reproduced.

...and 20 minutes after this post went out, the PC hung.
There was nothing logged over netconsole, alas.
--
Stefan Richter
-======----- -=-- ==-==
http://arcgraph.de/sr/

2016-04-27 19:37:39

by Stefan Richter

Subject: Re: Regression of v4.6-rc vs. v4.5: hangs after a few minutes after boot

On Apr 27 Stefan Richter wrote:
> On Apr 27 Stefan Richter wrote:
> > Today I booted a 2nd time into v4.6-rc5, and loaded netconsole shortly
> > after boot and xdm login to try capturing an oops. But throughout 5 hours
> > uptime now, the hang was not reproduced.
>
> ...and 20 minutes after this post went out, the PC hung.
> There was nothing logged over netconsole, alas.

One more hang, now after 12 minutes uptime.
Again no netconsole output.

For the time being I can't investigate further.
--
Stefan Richter
-======----- -=-- ==-==
http://arcgraph.de/sr/

2016-04-29 08:07:49

by Stefan Richter

Subject: Re: Regression of v4.6-rc vs. v4.5: hangs after a few minutes after boot

On Apr 26 Stefan Richter wrote:
> v4.6-rc solidly hangs after a short while after boot, login to X11, and
> doing nothing much remarkable on the just brought up X desktop.
>
> Hardware: x86-64, E3-1245 v3 (Haswell),
> mainboard Supermicro X10SAE,
> using integrated Intel graphics (HD P4600, i915 driver),
> C226 PCH's AHCI and USB 2/3, ASMedia ASM1062 AHCI,
> Intel LAN (i217, igb driver),
> several IEEE 1394 controllers, some of them behind
> PCIe bridges (IDT, PLX) or PCIe-to-PCI bridges (TI, Tundra)
> and one PCI-to-CardBus bridge (Ricoh)
>
> kernel.org kernel, Gentoo Linux userland
>
> 1. known good: v4.5-rc5 (gcc 4.9.3)
> known bad: v4.6-rc2 (gcc 4.9.3), only tried one time
>
> 2. known good: v4.5.2 (gcc 5.2.0)
> known bad: v4.6-rc5 (gcc 5.2.0), only tried one time
>
> I will send my linux-4.6-rc5/.config in a follow-up message.

After it proved impossible to capture an oops through netconsole, I
started git bisect. This will apparently take almost a week, as git
estimated 13 bisection steps and I will be allowing about 12 hours of
uptime as a sign of a good kernel. (In my four or five tests of bad
kernels before I started bisection, they hung after anywhere from 3 minutes
to 5.5 hours of uptime, with no discernible difference in workload. Maybe a
12 h cutoff is even too short...)
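
The "almost a week" estimate follows from simple arithmetic, sketched here
in Python. The function name bisect_time_budget is hypothetical. The
result is an upper bound under the assumption that every step runs for the
full 12 hours; a step that hits a bad kernel ends as soon as the machine
hangs.

```python
def bisect_time_budget(steps, hours_per_good_step):
    """Return (total_hours, total_days) if every step runs the full soak time."""
    total_hours = steps * hours_per_good_step
    return total_hours, total_hours / 24


# Figures from the message above: 13 steps, about 12 hours of uptime each.
total_hours, total_days = bisect_time_budget(steps=13, hours_per_good_step=12)
print(f"{total_hours} h = {total_days:.1f} days upper bound")
# prints "156 h = 6.5 days upper bound"
```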
--
Stefan Richter
-======----- -=-- ===-=
http://arcgraph.de/sr/

2016-04-30 13:52:07

by Stefan Richter

Subject: Re: Regression of v4.6-rc vs. v4.5: Merge tag 'drm-intel-next-2016-02-29'

On Apr 29 Stefan Richter wrote:
> On Apr 26 Stefan Richter wrote:
> > v4.6-rc solidly hangs after a short while after boot, login to X11, and
> > doing nothing much remarkable on the just brought up X desktop.
> >
> > Hardware: x86-64, E3-1245 v3 (Haswell),
> > mainboard Supermicro X10SAE,
> > using integrated Intel graphics (HD P4600, i915 driver),
> > C226 PCH's AHCI and USB 2/3, ASMedia ASM1062 AHCI,
> > Intel LAN (i217, igb driver),
> > several IEEE 1394 controllers, some of them behind
> > PCIe bridges (IDT, PLX) or PCIe-to-PCI bridges (TI, Tundra)
> > and one PCI-to-CardBus bridge (Ricoh)
> >
> > kernel.org kernel, Gentoo Linux userland
> >
> > 1. known good: v4.5-rc5 (gcc 4.9.3)
> > known bad: v4.6-rc2 (gcc 4.9.3), only tried one time
> >
> > 2. known good: v4.5.2 (gcc 5.2.0)
> > known bad: v4.6-rc5 (gcc 5.2.0), only tried one time
> >
> > I will send my linux-4.6-rc5/.config in a follow-up message.

.config: http://www.spinics.net/lists/kernel/msg2243444.html
lspci: http://www.spinics.net/lists/kernel/msg2243447.html

Some userland package versions, in case these have any bearing:
x11-base/xorg-drivers-1.17
x11-base/xorg-server-1.17.4
x11-base/xorg-x11-7.4-r2

> After it proved impossible to capture an oops through netconsole, I
> started git bisect. This will apparently take almost a week, as git
> estimated 13 bisection steps and I will be allowing about 12 hours of
> uptime as a sign of a good kernel. (In my four or five tests of bad
> kernels before I started bisection, they hung after anywhere from 3 minutes
> to 5.5 hours of uptime, with no discernible difference in workload. Maybe a
> 12 h cutoff is even too short...)

There are about 9 more bisection steps left to go.
The first few steps sent me straight into DRM land.
My current "git bisect log" with my own annotations:

git bisect start

# bad: [9735a22799b9214d17d3c231fe377fc852f042e9] Linux 4.6-rc2
git bisect bad 9735a22799b9214d17d3c231fe377fc852f042e9

# good: [b562e44f507e863c6792946e4e1b1449fbbac85d] Linux 4.5
git bisect good b562e44f507e863c6792946e4e1b1449fbbac85d

# good: [6b5f04b6cf8ebab9a65d9c0026c650bb2538fd0f] Merge branch 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
# ++ still good after 18 h uptime
git bisect good 6b5f04b6cf8ebab9a65d9c0026c650bb2538fd0f

# good: [2c856e14dad8cb1b085ae1f30c5e125c6d46019b] Merge tag 'arm64-perf' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
# ++ still good after 24 h uptime
git bisect good 2c856e14dad8cb1b085ae1f30c5e125c6d46019b

# bad: [8bb7e27bbb9d0db7ca0e83d40810fb752381cdd5] staging: delete STE RMI4 hackish driver
# -- hung after 3 h uptime
git bisect bad 8bb7e27bbb9d0db7ca0e83d40810fb752381cdd5

# bad: [507d44a9e1bb01661c75b88fd866d2461ab41c9c] Merge tag 'drm-intel-next-2016-02-29' of git://anongit.freedesktop.org/drm-intel into drm-next
# -- hung after 2 h uptime
git bisect bad 507d44a9e1bb01661c75b88fd866d2461ab41c9c
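
As a cross-check of the step count, the marks in the log above can be
tallied with a short Python sketch. The names BISECT_LOG, MARK and
parse_bisect_log are hypothetical; the log text is copied from above with
the comment lines left out.

```python
import re

BISECT_LOG = """\
git bisect start
git bisect bad 9735a22799b9214d17d3c231fe377fc852f042e9
git bisect good b562e44f507e863c6792946e4e1b1449fbbac85d
git bisect good 6b5f04b6cf8ebab9a65d9c0026c650bb2538fd0f
git bisect good 2c856e14dad8cb1b085ae1f30c5e125c6d46019b
git bisect bad 8bb7e27bbb9d0db7ca0e83d40810fb752381cdd5
git bisect bad 507d44a9e1bb01661c75b88fd866d2461ab41c9c
"""

# Matches "git bisect good <sha>" and "git bisect bad <sha>" lines only.
MARK = re.compile(r"^git bisect (good|bad) ([0-9a-f]{7,40})$")


def parse_bisect_log(text):
    """Return a list of (verdict, sha) tuples in log order."""
    marks = []
    for line in text.splitlines():
        match = MARK.match(line.strip())
        if match:
            marks.append((match.group(1), match.group(2)))
    return marks


marks = parse_bisect_log(BISECT_LOG)
tested = marks[2:]  # the first two marks are the v4.6-rc2 and v4.5 endpoints
good = sum(1 for verdict, _ in tested if verdict == "good")
bad = sum(1 for verdict, _ in tested if verdict == "bad")
print(f"{len(tested)} steps tested: {good} good, {bad} bad")
print(f"most recent mark: {marks[-1][0]} {marks[-1][1][:12]}")
```

Four tested steps plus the roughly nine still to go matches git's original
estimate of 13.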
--
Stefan Richter
-======----- -=-- ====-
http://arcgraph.de/sr/