On Sun, Sep 05, 2021 at 01:52:31AM +0200, wim wrote:
>
> Hello Greg,
>
> from kernel-4.9.270 up until now (4.9.282) I experience kernel crashes upon
> loading a GPU module.
> It happens on two out of at least six different machines.
> I can't believe that I'm the only one where that happens, but since the bug
> is still there twelve versions later, I need to report this.
>
> I run Gentoo with vanilla kernels.
> Upon loading i915.ko (automatically or manually) my laptop freezes until
> power-down. (Note that other machines using i915.ko have no problems here.)
> It's an Asus laptop with Intel chipset with a peculiarity:
>
> 00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)
> 01:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940MX] (rev a2)
>
> (It uses Intel natively and nobody knows how to make use of that Nvidia chip)
>
>
> On an AMD desktop I get the same crash upon loading of nouveau.ko .
>
> Something ugly must have been introduced in kernel-4.9.270 .
> Strace modprobe .. only prints two lines on the screen.
> Strace modprobe .. 2>&1 > file produces only an empty file.
>
> Any ideas?
Regards,
Wim Osterholt.
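A note on the empty file: strace writes its trace to stderr by default, and the shell applies redirections from left to right. With `2>&1 > file`, stderr is pointed at the terminal before stdout is moved to the file, so the trace never reaches the file. The Python sketch below reproduces only that ordering effect. The stand-in command and the file name `out.txt` are illustrative assumptions, not the commands from the report, and a POSIX `sh` is assumed to be available. The freeze itself may also have kept output from reaching the disk.

```python
import os
import subprocess
import tempfile

# Stand-in for strace: a command that writes only to stderr (assumption for illustration).
STDERR_ONLY = "sh -c 'echo trace-line 1>&2'"

def run_with_redirect(redirect: str) -> tuple[str, str]:
    """Run the stand-in under `sh` with the given redirection.

    Returns (what reached the caller's stdout, what ended up in the file).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.txt")
        cmd = f"{STDERR_ONLY} {redirect.format(path=path)}"
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        with open(path) as fh:
            return result.stdout, fh.read()

# Order used in the report: stderr is duplicated onto the *old* stdout first.
screen_a, file_a = run_with_redirect("2>&1 > {path}")
# Reversed order: stdout goes to the file first, then stderr follows it.
screen_b, file_b = run_with_redirect("> {path} 2>&1")

print(repr(screen_a), repr(file_a))
print(repr(screen_b), repr(file_b))
```

strace also has an `-o file` option that writes the trace straight to a file, which avoids the redirection question altogether.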
On Sun, Sep 05, 2021 at 09:00:45PM +0200, wim wrote:
> On Sun, Sep 05, 2021 at 01:52:31AM +0200, wim wrote:
> >
> > Hello Greg,
> >
> > from kernel-4.9.270 up until now (4.9.282) I experience kernel crashes upon
> > loading a GPU module.
> > It happens on two out of at least six different machines.
> > I can't believe that I'm the only one where that happens, but since the bug
> > is still there twelve versions later, I need to report this.
> >
> > I run Gentoo with vanilla kernels.
> > Upon loading i915.ko (automatically or manually) my laptop freezes until
> > power-down. (Note that other machines using i915.ko have no problems here.)
> > It's an Asus laptop with Intel chipset with a peculiarity:
> >
> > 00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)
> > 01:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940MX] (rev a2)
> >
> > (It uses Intel natively and nobody knows how to make use of that Nvidia chip)
> >
> >
> > On an AMD desktop I get the same crash upon loading of nouveau.ko .
> >
> > Something ugly must have been introduced in kernel-4.9.270 .
> > Strace modprobe .. only prints two lines on the screen.
> > Strace modprobe .. 2>&1 > file produces only an empty file.
> >
> > Any ideas?
Do you have any kernel log messages when these crashes happen?
Can you use 'git bisect' to track down the offending commit?
And why are you stuck on 4.9.y for these machines? Why not use 5.10 or
newer?
thanks,
greg k-h
On Mon, Sep 06, 2021 at 11:36:11AM +0200, wim wrote:
> On Mon, Sep 06, 2021 at 06:59:22AM +0200, Greg KH wrote:
> > On Sun, Sep 05, 2021 at 09:00:45PM +0200, wim wrote:
> > > On Sun, Sep 05, 2021 at 01:52:31AM +0200, wim wrote:
> > > >
> > > > Hello Greg,
> > > >
> > > > from kernel-4.9.270 up until now (4.9.282) I experience kernel crashes upon
> > > > loading a GPU module.
> > > > It happens on two out of at least six different machines.
> > > > I can't believe that I'm the only one where that happens, but since the bug
> > > > is still there twelve versions later, I need to report this.
> > > > ...
> >
> > Do you have any kernel log messages when these crashes happen?
>
> On the AMD machine:
>
> Aug 1 20:51:24 djo kernel: [drm] Initialized
> Aug 1 20:51:24 djo kernel: checking generic (a0000 10000) vs hw (e0000000 8000000)
> Aug 1 20:51:24 djo kernel: checking generic (a0000 10000) vs hw (ea000000 1000000)
> Aug 1 20:51:24 djo kernel: fb: switching to nouveaufb from VGA16 VGA
> Aug 1 20:51:24 djo kernel: divide error: 0000 [#1] SMP
> Aug 1 20:51:24 djo kernel: Modules linked in: nouveau(+) video drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm agpgart i2c_algo_bit tun lirc_serial(C) lirc_dev arc4 binfmt_misc snd_pcm_oss snd_mixer_oss fbcon bitblit softcursor font tileblit ath9k_htc ath9k_common ath9k_hw ath mac80211 cfg80211 uvcvideo rfkill firmware_class snd_usb_audio sr9700 videobuf2_vmalloc videobuf2_memops snd_usbmidi_lib videobuf2_v4l2 dm9601 videobuf2_core usbnet snd_rawmidi mii usb_storage snd_hda_codec_generic kvm snd_hda_intel irqbypass snd_hda_codec gpio_ich ppdev snd_hwdep pcspkr snd_hda_core snd_pcm uhci_hcd ohci_pci snd_timer ohci_hcd lpc_ich ehci_pci snd ehci_hcd wmi mfd_core usbcore soundcore parport_pc floppy usb_common parport acpi_cpufreq button processor
> Aug 1 20:51:24 djo kernel: CPU: 0 PID: 2791 Comm: modprobe Tainted: G C 4.9.277 #1
> Aug 1 20:51:24 djo kernel: Hardware name: Hewlett-Packard HP xw4300 Workstation/0A00h, BIOS 786D3 v01.08 03/10/2006
> Aug 1 20:51:24 djo kernel: task: f6317080 task.stack: f4058000
> Aug 1 20:51:24 djo kernel: EIP: 0060:[<c02f789d>] EFLAGS: 00010206 CPU: 0
> Aug 1 20:51:24 djo kernel: EAX: 00000190 EBX: ffffffea ECX: 00000019 EDX: 00000000
> Aug 1 20:51:24 djo kernel: ESI: f52db800 EDI: 00000050 EBP: c02f7838 ESP: f4059c10
> Aug 1 20:51:24 djo kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> Aug 1 20:51:24 djo kernel: CR0: 80050033 CR2: 080a1a54 CR3: 35234000 CR4: 00000690
> Aug 1 20:51:24 djo kernel: Stack:
> Aug 1 20:51:24 djo kernel: 00000050 f52db800 00000019 c0340732 00000000 000000a0 000000a0 00000fa0
> Aug 1 20:51:24 djo kernel: f62f4000 0000001e 00000000 00000000 f5a63800 00000000 00000000 00000000
> Aug 1 20:51:24 djo kernel: 00000000 00000000 f6024000 00000000 f52db800 00000001 00000000 00000000
> Aug 1 20:51:24 djo kernel: Call Trace:
> Aug 1 20:51:24 djo kernel: [<c0340732>] ? 0xc0340732
> Aug 1 20:51:24 djo kernel: [<c0340988>] ? 0xc0340988
> Aug 1 20:51:24 djo kernel: [<c02f734a>] ? 0xc02f734a
> Aug 1 20:51:24 djo kernel: [<c033f780>] ? 0xc033f780
> Aug 1 20:51:24 djo kernel: [<c0340b32>] ? 0xc0340b32
> Aug 1 20:51:24 djo kernel: [<c0340d20>] ? 0xc0340d20
> Aug 1 20:51:24 djo kernel: [<f8bc4ef7>] ? 0xf8bc4ef7
> Aug 1 20:51:24 djo kernel: [<c0163715>] ? 0xc0163715
<snip>
These aren't going to help us much, can you turn on debugging symbols
for these crashes for us to see the symbol names?
<snip>
> > Can you use 'git bisect' to track down the offending commit?
>
> If I would know how to do that
'man git bisect' should provide a tutorial on how to do this.
> > And why are you stuck on 4.9.y for these machines? Why not use 5.10 or
> > newer?
>
> Because in 4.10 they dropped lirc-serial and I need that. The new ir-serial
> is no replacement. (The last working version of LIRC is 0.9.6. After that
> they destroyed transmitter support.)
>
> (I believe irda support got dropped too, which I need for my old nokia.)
If the new functionality is not working properly, please work with those
developers to fix that up. Sticking with the 4.9.x kernel isn't going
to be a good long-term solution for you.
thanks,
greg k-h
On Mon, Sep 06, 2021 at 06:59:22AM +0200, Greg KH wrote:
> On Sun, Sep 05, 2021 at 09:00:45PM +0200, wim wrote:
> > On Sun, Sep 05, 2021 at 01:52:31AM +0200, wim wrote:
> > >
> > > Hello Greg,
> > >
> > > from kernel-4.9.270 up until now (4.9.282) I experience kernel crashes upon
> > > loading a GPU module.
> > > It happens on two out of at least six different machines.
> > > I can't believe that I'm the only one where that happens, but since the bug
> > > is still there twelve versions later, I need to report this.
> > > ...
>
> Do you have any kernel log messages when these crashes happen?
On the AMD machine:
Aug 1 20:51:24 djo kernel: [drm] Initialized
Aug 1 20:51:24 djo kernel: checking generic (a0000 10000) vs hw (e0000000 8000000)
Aug 1 20:51:24 djo kernel: checking generic (a0000 10000) vs hw (ea000000 1000000)
Aug 1 20:51:24 djo kernel: fb: switching to nouveaufb from VGA16 VGA
Aug 1 20:51:24 djo kernel: divide error: 0000 [#1] SMP
Aug 1 20:51:24 djo kernel: Modules linked in: nouveau(+) video drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm agpgart i2c_algo_bit tun lirc_serial(C) lirc_dev arc4 binfmt_misc snd_pcm_oss snd_mixer_oss fbcon bitblit softcursor font tileblit ath9k_htc ath9k_common ath9k_hw ath mac80211 cfg80211 uvcvideo rfkill firmware_class snd_usb_audio sr9700 videobuf2_vmalloc videobuf2_memops snd_usbmidi_lib videobuf2_v4l2 dm9601 videobuf2_core usbnet snd_rawmidi mii usb_storage snd_hda_codec_generic kvm snd_hda_intel irqbypass snd_hda_codec gpio_ich ppdev snd_hwdep pcspkr snd_hda_core snd_pcm uhci_hcd ohci_pci snd_timer ohci_hcd lpc_ich ehci_pci snd ehci_hcd wmi mfd_core usbcore soundcore parport_pc floppy usb_common parport acpi_cpufreq button processor
Aug 1 20:51:24 djo kernel: CPU: 0 PID: 2791 Comm: modprobe Tainted: G C 4.9.277 #1
Aug 1 20:51:24 djo kernel: Hardware name: Hewlett-Packard HP xw4300 Workstation/0A00h, BIOS 786D3 v01.08 03/10/2006
Aug 1 20:51:24 djo kernel: task: f6317080 task.stack: f4058000
Aug 1 20:51:24 djo kernel: EIP: 0060:[<c02f789d>] EFLAGS: 00010206 CPU: 0
Aug 1 20:51:24 djo kernel: EAX: 00000190 EBX: ffffffea ECX: 00000019 EDX: 00000000
Aug 1 20:51:24 djo kernel: ESI: f52db800 EDI: 00000050 EBP: c02f7838 ESP: f4059c10
Aug 1 20:51:24 djo kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Aug 1 20:51:24 djo kernel: CR0: 80050033 CR2: 080a1a54 CR3: 35234000 CR4: 00000690
Aug 1 20:51:24 djo kernel: Stack:
Aug 1 20:51:24 djo kernel: 00000050 f52db800 00000019 c0340732 00000000 000000a0 000000a0 00000fa0
Aug 1 20:51:24 djo kernel: f62f4000 0000001e 00000000 00000000 f5a63800 00000000 00000000 00000000
Aug 1 20:51:24 djo kernel: 00000000 00000000 f6024000 00000000 f52db800 00000001 00000000 00000000
Aug 1 20:51:24 djo kernel: Call Trace:
Aug 1 20:51:24 djo kernel: [<c0340732>] ? 0xc0340732
Aug 1 20:51:24 djo kernel: [<c0340988>] ? 0xc0340988
Aug 1 20:51:24 djo kernel: [<c02f734a>] ? 0xc02f734a
Aug 1 20:51:24 djo kernel: [<c033f780>] ? 0xc033f780
Aug 1 20:51:24 djo kernel: [<c0340b32>] ? 0xc0340b32
Aug 1 20:51:24 djo kernel: [<c0340d20>] ? 0xc0340d20
Aug 1 20:51:24 djo kernel: [<f8bc4ef7>] ? 0xf8bc4ef7
Aug 1 20:51:24 djo kernel: [<c0163715>] ? 0xc0163715
Aug 1 20:51:24 djo kernel: [<f8bc4c82>] ? 0xf8bc4c82
Aug 1 20:51:24 djo kernel: [<c014aac4>] ? 0xc014aac4
Aug 1 20:51:24 djo kernel: [<c014ad8a>] ? 0xc014ad8a
Aug 1 20:51:24 djo kernel: [<c014ada6>] ? 0xc014ada6
Aug 1 20:51:24 djo kernel: [<c02f9aa4>] ? 0xc02f9aa4
Aug 1 20:51:24 djo kernel: [<c0168c32>] ? 0xc0168c32
Aug 1 20:51:24 djo kernel: [<c02fa294>] ? 0xc02fa294
Aug 1 20:51:24 djo kernel: [<c02fa47e>] ? 0xc02fa47e
Aug 1 20:51:24 djo kernel: [<c02fa4f5>] ? 0xc02fa4f5
Aug 1 20:51:24 djo kernel: [<f90a5c94>] ? 0xf90a5c94
Aug 1 20:51:24 djo kernel: [<f90a5b88>] ? 0xf90a5b88
Aug 1 20:51:24 djo kernel: [<c02e82de>] ? 0xc02e82de
Aug 1 20:51:24 djo kernel: [<c03545f8>] ? 0xc03545f8
Aug 1 20:51:24 djo kernel: [<c035475d>] ? 0xc035475d
Aug 1 20:51:24 djo kernel: [<c03533a9>] ? 0xc03533a9
Aug 1 20:51:24 djo kernel: [<c035424a>] ? 0xc035424a
Aug 1 20:51:24 djo kernel: [<c0354705>] ? 0xc0354705
Aug 1 20:51:24 djo kernel: [<c0353f3d>] ? 0xc0353f3d
Aug 1 20:51:24 djo kernel: [<c0354e44>] ? 0xc0354e44
Aug 1 20:51:24 djo kernel: [<f9124000>] ? 0xf9124000
Aug 1 20:51:24 djo kernel: [<c01003df>] ? 0xc01003df
Aug 1 20:51:24 djo kernel: [<c01dbb22>] ? 0xc01dbb22
Aug 1 20:51:24 djo kernel: [<c04ba42d>] ? 0xc04ba42d
Aug 1 20:51:24 djo kernel: [<c04ba45c>] ? 0xc04ba45c
Aug 1 20:51:24 djo kernel: [<c01889d5>] ? 0xc01889d5
Aug 1 20:51:24 djo kernel: [<c01e45e4>] ? 0xc01e45e4
Aug 1 20:51:24 djo kernel: [<c0188c2b>] ? 0xc0188c2b
Aug 1 20:51:24 djo kernel: [<c0101211>] ? 0xc0101211
Aug 1 20:51:24 djo kernel: [<c04c0579>] ? 0xc04c0579
Aug 1 20:51:24 djo kernel: Code: 63 c0 eb 53 f6 04 24 01 bb ea ff ff ff 75 4a 0f b6 05 07 c5 6c c0 3b 04 24 72 3e 0f b6 05 0e c5 6c c0 31 d2 0f af 05 08 cc 63 c0 <f7> b6 ec 00 00 00 39 c8 72 24 8b 86 24 02 00 00 31 db 3b 30 75
Aug 1 20:51:24 djo kernel: EIP: [<c02f789d>]
Aug 1 20:51:24 djo kernel: SS:ESP 0068:f4059c10
Aug 1 20:51:24 djo kernel: ---[ end trace 307fdb439b21cfc0 ]---
On the Intel machine:
Sep 5 00:20:26 asusUX410U kernel: Adding 2097148k swap on /dev/sda2. Priority:-1 extents:1 across:2097148k FS
Sep 5 00:20:38 asusUX410U kernel: [drm] Memory usable by graphics device = 4096M
Sep 5 00:20:38 asusUX410U kernel: fb: switching to inteldrmfb from VGA16 VGA
Sep 5 00:20:38 asusUX410U kernel: divide error: 0000 [#1] SMP
Sep 5 00:20:38 asusUX410U kernel: Modules linked in: i915(+) intel_gtt cmac uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core arc4 iwlmvm mac80211 nouveau drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm agpgart btusb btrtl btbcm btintel bluetooth hid_multitouch iwlwifi i2c_designware_platform mxm_wmi i2c_designware_core cfg80211 x86_pkg_temp_thermal intel_powerclamp pcspkr nvidiafb i2c_algo_bit fb_ddc rfkill firmware_class thermal i2c_hid xhci_pci xhci_hcd usbcore battery int3403_thermal wmi video ac int3400_thermal acpi_thermal_rel acpi_pad asus_wireless intel_lpss_pci intel_lpss button processor_thermal_device i2c_i801 intel_soc_dts_iosf i2c_smbus intel_pch_thermal usb_common mfd_core int340x_thermal_zone binfmt_misc snd_hda_codec_generic snd_pcm_oss snd_mixer_oss snd_hda_intel
Sep 5 00:20:38 asusUX410U kernel: snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd soundcore fbcon bitblit softcursor font tileblit
Sep 5 00:20:38 asusUX410U kernel: CPU: 2 PID: 2601 Comm: modprobe Not tainted 4.9.282 #1
Sep 5 00:20:38 asusUX410U kernel: Hardware name: ASUSTeK COMPUTER INC. UX410UQK/UX410UQK, BIOS UX410UQK.301 12/12/2016
Sep 5 00:20:38 asusUX410U kernel: task: ffff880264ac8000 task.stack: ffffc90003ee0000
Sep 5 00:20:38 asusUX410U kernel: RIP: 0010:[<ffffffff8044b341>] [<ffffffff8044b341>] 0xffffffff8044b341
Sep 5 00:20:38 asusUX410U kernel: RSP: 0018:ffffc90003ee38e8 EFLAGS: 00010246
Sep 5 00:20:38 asusUX410U kernel: RAX: 0000000000000190 RBX: 00000000000000a0 RCX: 0000000000000000
Sep 5 00:20:38 asusUX410U kernel: RDX: 0000000000000000 RSI: 0000000000000050 RDI: ffff880256b9b800
Sep 5 00:20:38 asusUX410U kernel: RBP: 0000000000000019 R08: 0000000000000019 R09: 00000000000000a0
Sep 5 00:20:38 asusUX410U kernel: R10: 000000000000001e R11: 0000000000000001 R12: 00000000ffffffea
Sep 5 00:20:38 asusUX410U kernel: R13: ffff880256b9b800 R14: 0000000000000fa0 R15: 0000000000000000
Sep 5 00:20:38 asusUX410U kernel: FS: 00007fb959a4cc00(0000) GS:ffff88026ed00000(0000) knlGS:0000000000000000
Sep 5 00:20:38 asusUX410U kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 5 00:20:38 asusUX410U kernel: CR2: 000056515c106000 CR3: 0000000259500000 CR4: 0000000000360670
Sep 5 00:20:38 asusUX410U kernel: Stack:
Sep 5 00:20:38 asusUX410U kernel: 0000000000000050 ffffffff804a8d05 ffff880259667000 0000000000000000
Sep 5 00:20:38 asusUX410U kernel: ffff88020000001e 000000a000000fa0 00000000000000a0 00000000000000a0
Sep 5 00:20:38 asusUX410U kernel: 0190000000500019 ffff880256b9b800 ffff880256b9b800 0000000000000000
Sep 5 00:20:38 asusUX410U kernel: Call Trace:
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff804a8d05>] ? 0xffffffff804a8d05
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff8044adee>] ? 0xffffffff8044adee
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff804a79dd>] ? 0xffffffff804a79dd
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff804a9160>] ? 0xffffffff804a9160
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff804a9395>] ? 0xffffffff804a9395
Sep 5 00:20:38 asusUX410U kernel: [<ffffffffa000c549>] ? 0xffffffffa000c549
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff80257d40>] ? 0xffffffff80257d40
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff80258077>] ? 0xffffffff80258077
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff8044d551>] ? 0xffffffff8044d551
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff8044e213>] ? 0xffffffff8044e213
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff8044e457>] ? 0xffffffff8044e457
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff8044e4de>] ? 0xffffffff8044e4de
Sep 5 00:20:38 asusUX410U kernel: [<ffffffffa05cb585>] ? 0xffffffffa05cb585
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff80439f8a>] ? 0xffffffff80439f8a
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff804c0d1e>] ? 0xffffffff804c0d1e
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff804c0eda>] ? 0xffffffff804c0eda
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff804c0e72>] ? 0xffffffff804c0e72
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff804bf59b>] ? 0xffffffff804bf59b
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff804c04c9>] ? 0xffffffff804c04c9
Sep 5 00:20:38 asusUX410U kernel: [<ffffffffa06ab000>] ? 0xffffffffa06ab000
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff804c1738>] ? 0xffffffff804c1738
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff80200341>] ? 0xffffffff80200341
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff802962fc>] ? 0xffffffff802962fc
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff80297a24>] ? 0xffffffff80297a24
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff80297e0e>] ? 0xffffffff80297e0e
Sep 5 00:20:38 asusUX410U last message buffered 1 times
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff802014fd>] ? 0xffffffff802014fd
Sep 5 00:20:38 asusUX410U kernel: [<ffffffff80645a3e>] ? 0xffffffff80645a3e
Sep 5 00:20:38 asusUX410U kernel: Code: 65 00 eb 57 41 bc ea ff ff ff 40 f6 c6 01 75 4e 0f b6 05 da 22 75 00 39 f0 72 43 0f b6 05 d6 22 75 00 0f af 05 e9 6d 65 00 31 d2 <f7> b7 7c 01 00 00 44 39 c0 72 28 48 8b 87 00 03 00 00 45 31 e4
Sep 5 00:20:38 asusUX410U kernel: RSP <ffffc90003ee38e8>
Sep 5 00:20:38 asusUX410U kernel: ---[ end trace a46f8400460cdde1 ]---
> Can you use 'git bisect' to track down the offending commit?
If I would know how to do that
> And why are you stuck on 4.9.y for these machines? Why not use 5.10 or
> newer?
Because in 4.10 they dropped lirc-serial and I need that. The new ir-serial
is no replacement. (The last working version of LIRC is 0.9.6. After that
they destroyed transmitter support.)
(I believe irda support got dropped too, which I need for my old nokia.)
Wim.
On Mon, Sep 06, 2021 at 12:52:20PM +0200, Greg KH wrote:
> > > > >
> > > > > from kernel-4.9.270 up until now (4.9.282) I experience kernel crashes upon
> > > > > loading a GPU module.
> > > > > ...
> > >
> > > Do you have any kernel log messages when these crashes happen?
> > ...
> > Aug 1 20:51:24 djo kernel: [<f8bc4ef7>] ? 0xf8bc4ef7
>
> <snip>
>
> These aren't going to help us much, can you turn on debugging symbols
> for these crashes for us to see the symbol names?
ERROR: not enough memory to load nouveau.ko
i915.ko is smaller and my laptop is bigger. Identical crash, no symbols.
> > > Can you use 'git bisect' to track down the offending commit?
> >
> > If I would know how to do that
>
> 'man git bisect' should provide a tutorial on how to do this.
No, it does not.
It would have cost far less time and far fewer GBs if I'd found sooner
the only pointer on the internet that stated:
cd linux
git remote add stable git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
and that brought me reasonably fast to this:
3bd3a8ca5a7b1530f463b6e1cc811c085e6ffa01 is the first bad commit
commit 3bd3a8ca5a7b1530f463b6e1cc811c085e6ffa01
Author: Maciej W. Rozycki <[email protected]>
Date: Thu May 13 11:51:50 2021 +0200
...
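For readers in the same position, the search that `git bisect` automates can be sketched in a few lines. The Python below only illustrates the halving idea on a linear list of release tags, assuming 4.9.269 is good and 4.9.270 is bad as the report implies. The `crashes` function is a hypothetical stand-in for "build, boot, modprobe, see whether it freezes". Real bisection runs over individual commits and is driven with `git bisect start`, `git bisect good` and `git bisect bad`.

```python
from typing import Callable, Sequence

def first_bad(candidates: Sequence[str], is_bad: Callable[[str], bool]) -> tuple[str, int]:
    """Return the first bad entry and the number of tests performed.

    Assumes candidates are ordered oldest to newest, the first is known good,
    the last is known bad, and once an entry is bad all later ones are bad too.
    """
    good, bad = 0, len(candidates) - 1   # indices of known-good / known-bad
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(candidates[mid]):
            bad = mid      # culprit is at mid or earlier
        else:
            good = mid     # culprit is after mid
    return candidates[bad], tests

# Release tags from the assumed last good one through the newest tested.
tags = [f"v4.9.{n}" for n in range(269, 283)]

def crashes(tag: str) -> bool:
    # Hypothetical stand-in for the manual build-and-boot test.
    return int(tag.rsplit(".", 1)[1]) >= 270

culprit, tests = first_bad(tags, crashes)
print(culprit, tests)
```

The number of tests grows with the logarithm of the range, which is why narrowing down to a release first and then bisecting the commits inside it stays manageable even with slow builds.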
> > > And why are you stuck on 4.9.y for these machines? Why not use 5.10 or
> > > newer?
> >
> > Because in 4.10 they dropped lirc-serial and I need that. The new ir-serial
> > is no replacement. (The last working version of LIRC is 0.9.6. After that
> > they destroyed transmitter support.)
Correction: lirc-0.9.0-rc6 it is.
> >
> If the new functionality is not working properly, please work with those
> developers to fix that up.
I can't. I can hardly write and compile 'Hello world', let alone fix some
complex fossil and abandoned software. To make a long LIRC story short:
LIRC got orphaned long ago. A dozen patches from Gentoo kept it alive (until
kernel-3.x where f_dentry got dropped, which Gentoo never fixed). I managed
to get around that problem. By then there was a new maintainer that was not
interested in bug reports and clearly stated that he was against a transmitter
(over the serial port). The new LIRC-0.10 is not popular, to say the least.
The only route for IR blasting nowadays seems to be a Raspberry Pi, where
Raspbian seems to have something like 'ir-ctl' outside of LIRC.
Regards, Wim.
On Wed, Sep 08, 2021 at 03:51:39AM +0200, wim wrote:
> On Mon, Sep 06, 2021 at 12:52:20PM +0200, Greg KH wrote:
> > > > > >
> > > > > > from kernel-4.9.270 up until now (4.9.282) I experience kernel crashes upon
> > > > > > loading a GPU module.
> > > > > > ...
> > > >
> > > > Do you have any kernel log messages when these crashes happen?
> > > ...
> > > Aug 1 20:51:24 djo kernel: [<f8bc4ef7>] ? 0xf8bc4ef7
> >
> > <snip>
> >
> > These aren't going to help us much, can you turn on debugging symbols
> > for these crashes for us to see the symbol names?
>
> ERROR: not enough memory to load nouveau.ko
That's the only error? Maybe you don't have enough memory?
> i915.ko is smaller and my laptop is bigger. Identical crash, no symbols.
Odd.
> > > > Can you use 'git bisect' to track down the offending commit?
> > >
> > > If I would know how to do that
> >
> > 'man git bisect' should provide a tutorial on how to do this.
>
> No, it does not.
> It would have taken an enormous amount of time and GBs less if I'd found
> earlier the only pointer on internet that stated:
>
> cd linux
> git remote add stable git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
>
> and that brought me reasonably fast to this:
>
> 3bd3a8ca5a7b1530f463b6e1cc811c085e6ffa01 is the first bad commit
> commit 3bd3a8ca5a7b1530f463b6e1cc811c085e6ffa01
> Author: Maciej W. Rozycki <[email protected]>
> Date: Thu May 13 11:51:50 2021 +0200
> ...
That is a vt change that handles an issue with a console driver, so this
feels like a false failure.
If you revert this change on a newer kernel release, does it work?
And what about showing us the symbols of that traceback?
thanks,
greg k-h
On Wed, Sep 08, 2021 at 07:30:49AM +0200, Greg KH wrote:
> > > > ...
> > > > Aug 1 20:51:24 djo kernel: [<f8bc4ef7>] ? 0xf8bc4ef7
> > >
> > > <snip>
> > >
> > > These aren't going to help us much, can you turn on debugging symbols
> > > for these crashes for us to see the symbol names?
> >
> > ERROR: not enough memory to load nouveau.ko
>
> That's the only error? Maybe you don't have enough memory?
Nouveau.ko with symbols is really huge. I see only 2GB RAM in that machine,
so I'm not amazed.
> > i915.ko is smaller and my laptop is bigger. Identical crash, no symbols.
>
> Odd.
I've had that before, some years ago. The devs were very reluctant to start
investigating. After a while the bug just vanished. Bugs come and go was
their remark.
This time the bug doesn't vanish spontaneously.
> > > > > Can you use 'git bisect' to track down the offending commit?
> > and that brought me reasonably fast to this:
> >
> > 3bd3a8ca5a7b1530f463b6e1cc811c085e6ffa01 is the first bad commit
> > commit 3bd3a8ca5a7b1530f463b6e1cc811c085e6ffa01
> > Author: Maciej W. Rozycki <[email protected]>
> > Date: Thu May 13 11:51:50 2021 +0200
> > ...
>
> That is a vt change that handles an issue with a console driver, so this
> feels like a false failure.
>
> If you revert this change on a newer kernel release, does it work?
No false failure.
git checkout v4.9.282
git revert <the above patch>
Lo and behold, no crash on modprobe i915 !!!
> And what about showing us the symbols of that traceback?
What symbols of what traceback? It does not crash!
And when it crashes (the previous case) there are no symbols, despite
debugging set to on. Just the same log. Apparently it ran invalid code.
What does the 'Divide Error: 0000' mean? A divide by zero error?
Regards, Wim.
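On that last question: on x86, "divide error" is the #DE exception, raised by DIV/IDIV when the divisor is zero or the quotient does not fit in the destination. The bytes starting at the `<f7>` marker in the `Code:` line of each log are the faulting instruction. As a rough check, the sketch below decodes just that one instruction form (opcode F7 with a base register plus a 32-bit displacement). It is not a general disassembler, and the helper name `decode_f7` is made up for this note.

```python
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
# Opcode F7 is a group: the ModRM "reg" field selects the operation.
GROUP3 = ["test", "test", "not", "neg", "mul", "imul", "div", "idiv"]

def decode_f7(code: bytes, base_regs: list[str]) -> str:
    """Decode `F7 /r` with mod=10 (base register + disp32), no SIB byte.

    Hypothetical helper: handles only the form seen in the two oops logs.
    """
    if code[0] != 0xF7:
        raise ValueError("not an F7 opcode")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only base+disp32 without SIB is handled")
    disp = int.from_bytes(code[2:6], "little", signed=True)
    return f"{GROUP3[reg]} dword [{base_regs[rm]}{disp:+#x}]"

# Bytes at the <f7> marker in the 32-bit (nouveau) and 64-bit (i915) logs.
insn32 = decode_f7(bytes.fromhex("f7b6ec000000"), REGS32)
insn64 = decode_f7(bytes.fromhex("f7b77c010000"), REGS64)
print(insn32)
print(insn64)
```

Both traces decode to an unsigned 32-bit `div` by a value read from memory. EDX/RDX is zero in both register dumps, so the quotient cannot overflow, which leaves a zero divisor as the explanation.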
On Wed, Sep 08, 2021 at 07:30:49AM +0200, Greg KH wrote:
>
> That is a vt change that handles an issue with a console driver, so this
> feels like a false failure.
>
> If you revert this change on a newer kernel release, does it work?
Oh, you mean a higher version number (which wasn't directly obvious to me).
Make oldconfig gives an awful lot of output which I'm not going to read.
I just kept pressing the return key to accept all the defaults.
Kernel-5.10.10 boots into a black screen; I can perform a blind login and
play an audio file. I then tried to revert the patch, but git couldn't
complete it. The closest higher version is 4.14.246, which I tried next.
It also boots into a black screen; I can log in and play audio, but modprobe
fbcon has no effect. Git revert ran fine, but that also gave me a black
screen. It turned out that there was no fbcon.ko; even worse, the option to
build it as a module was gone! Insane.
Since that option was no longer valid, make oldconfig picked the default 'no',
which I didn't know. In-kernel fbcon gives no problems, I guess.
This led to the discovery that the hard crash in 4.9.270(-282) did NOT occur
when fbcon.ko was not loaded. Modprobe fbcon after i915 went fine.
So here you have another reason for not wanting to run a kernel version above 4.9.
I need fbcon.ko as a diagnostic tool. On many machines with i915 I lose
sound when i915.ko gets loaded. I need to fiddle with the rc scripts to make
sure that the snd modules get loaded first. And because the changing fonts and
layout drive me nuts while I watch the progress, I need to load fbcon/i915
at the very end (in rc.local).
Regards, Wim.