2022-02-09 07:13:31

by Steven J Abner

Subject: amd apu crashes

Hi,
I've been trying out kernel 5.16. Lots of amdgpu upgrades? However, it
seems to be getting worse :(
I'm on an AMD Ryzen 5 2400G with elementary OS 5.1.7 (Ubuntu 18.04.6
LTS), Linux 5.15.5-051505-generic and GTK 3.22.30. Background: I was
using 5.16.6 when it started its "triple threat", so I panicked and went
back to 5.15. Previously, back in November, when I had my first triple
threat, I was on a system with btrfs, and it destroyed my hard drive.
I rebuilt with ext4 and am still trying to recreate the losses. I can't
use a newer Ubuntu because I still need AFP to connect with a Mac for
transfers, and elementary went even heavier on GTK, so it crawls. I did
find a better workaround for AFP, but I'm not happy with Ubuntu's
treatment of the bug.
The triple threat is when the monitor flashes 3 times before a total
lockup. The last crash may have been one too, but I was ready and hit
reboot before the third flash, so I didn't get to see whether it would
kill my hard drive.
My guess is that it's not a true kernel problem, but GTK exploiting a
weakness, probably an uninitialized pointer. But with newer kernels, the
crashes seem to be more frequent.
Here are the last few:
$ journalctl -o short-precise -f -k -b -3
-- Logs begin at Mon 2022-01-03 17:21:50 EST. --
Feb 05 08:37:32.229754 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out
Feb 05 08:37:32.230639 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out
Feb 05 08:37:32.273370 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out
Feb 05 08:37:32.668947 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out
Feb 05 08:37:32.794231 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out
Feb 05 08:37:32.919503 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out
Feb 05 08:37:33.044753 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out
Feb 05 08:37:33.169986 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out
Feb 05 08:37:33.295263 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out
Feb 05 08:37:33.420514 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out

$ journalctl -o short-precise -f -k -b -2
-- Logs begin at Mon 2022-01-03 17:21:50 EST. --
Feb 07 06:11:47.495092 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: RW: 0x0
Feb 07 06:11:47.495199 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: [mmhub0] retry page fault (src_id:0 ring:0 vmid:7 pasid:32782, for process WebKitWebProces pid 5037 thread WebKitWebP:cs0 pid 5101)
Feb 07 06:11:47.495304 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: in page starting at address 0x000080010e24d000 from IH client 0x12 (VMC)
Feb 07 06:11:47.495413 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 07 06:11:47.495520 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: Faulty UTCL2 client ID: MP1 (0x0)
Feb 07 06:11:47.495631 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: MORE_FAULTS: 0x0
Feb 07 06:11:47.495766 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: WALKER_ERROR: 0x0
Feb 07 06:11:47.495875 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Feb 07 06:11:47.495987 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: MAPPING_ERROR: 0x0
Feb 07 06:11:47.496108 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: RW: 0x0

$ journalctl -o short-precise -f -k -b -1
-- Logs begin at Mon 2022-01-03 17:21:50 EST. --
Feb 07 16:49:00.229782 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: RW: 0x0
Feb 07 16:49:00.229898 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: [mmhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32769, for process Xorg pid 2061 thread Xorg:cs0 pid 2062)
Feb 07 16:49:00.230010 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: in page starting at address 0x0000800101955000 from IH client 0x12 (VMC)
Feb 07 16:49:00.230114 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 07 16:49:00.230220 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: Faulty UTCL2 client ID: MP1 (0x0)
Feb 07 16:49:00.230425 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: MORE_FAULTS: 0x0
Feb 07 16:49:00.230535 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: WALKER_ERROR: 0x0
Feb 07 16:49:00.230646 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Feb 07 16:49:00.230771 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: MAPPING_ERROR: 0x0
Feb 07 16:49:00.230910 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: RW: 0x0
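The two amdgpu dumps differ mainly in which process (WebKitWebProces vs Xorg) and which address triggered the "[mmhub0] retry page fault". If more of these turn up, a small script can pull those fields out of `journalctl -k` output for comparison. What follows is only a minimal sketch: the regexes are written against the exact lines quoted in this mail and assume each journal entry sits on one line (as journalctl prints it, before mail wrapping). The names `FAULT_RE`, `ADDR_RE` and `summarize` are hypothetical helpers, not part of any kernel tool.

```python
import re
from typing import Dict, List

# Hypothetical pattern for the amdgpu "retry page fault" line, based only on
# the two examples quoted in this report.
FAULT_RE = re.compile(
    r"\[(?P<hub>\w+)\] retry page fault "
    r"\(src_id:(?P<src_id>\d+) ring:(?P<ring>\d+) "
    r"vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+), "
    r"for process (?P<process>.+?) pid (?P<pid>\d+) "
    r"thread (?P<thread>\S+) pid (?P<tid>\d+)\)"
)

# Hypothetical pattern for the follow-up line carrying the faulting address.
ADDR_RE = re.compile(
    r"in page starting at address (?P<addr>0x[0-9a-fA-F]+) "
    r"from IH client (?P<client>0x[0-9a-fA-F]+)"
)


def summarize(lines: List[str]) -> List[Dict[str, str]]:
    """Collect one dict per retry page fault, attaching the address line
    that follows it (if any)."""
    faults: List[Dict[str, str]] = []
    for line in lines:
        fault = FAULT_RE.search(line)
        if fault:
            faults.append(fault.groupdict())
            continue
        addr = ADDR_RE.search(line)
        # Attach the address only to the most recent fault without one.
        if addr and faults and "addr" not in faults[-1]:
            faults[-1]["addr"] = addr.group("addr")
            faults[-1]["ih_client"] = addr.group("client")
    return faults


if __name__ == "__main__":
    import sys

    # Example: journalctl -k -b -1 --no-pager | python3 faults.py
    for f in summarize(sys.stdin.read().splitlines()):
        print(f["process"], f["pid"], f["vmid"], f["pasid"], f.get("addr"))
```

Matching line by line and pairing each fault with the next address line keeps the sketch simple; it assumes the two lines stay adjacent, as they do in both dumps above.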

I haven't dealt with kernel debugging for years, so if more info is
needed, please tell me how to get it; I probably don't remember.
If this is a bother, sorry to have troubled you.
Per 'Do I have to be subscribed to post to the list?':
I wish to be personally CC'ed on the answers/comments posted to the list
in response to my posting, please.
Thanks,
Steve



2022-02-09 12:57:21

by Ilkka Prusi

Subject: Re: amd apu crashes

Hi,

I've seen random crashes during boot (Vega 56/64) after the kernel
switches to the DRM driver. The display is corrupted, and after that the
system crashes or freezes. This is not always reproducible for me, as the
next boot may succeed (or not). Kernel logs don't survive the crash.

Do you have the boot log shown on the display, or a splash screen
(Plymouth)?

--
- Ilkka


On 8.2.2022 23.17, Steven J Abner wrote:
> [...]


2022-02-09 14:26:49

by Steven J Abner

Subject: Re: amd apu crashes

On several crashes, the screen goes black (nothing logged), then comes
back as it was, but with no computer activity: no keyboard or mouse,
though the keyboard lighting still has power. Only reset or the power
button works. The triple threat was a total blackout, and on the
destroyed drive I think only the power button worked; maybe reset
didn't??? So no, there was no kernel panic screen or EFI shell, if
that's the question.
I do have a snapshot of a case that didn't blink out the monitor and
possibly didn't freeze, but I restarted to get out of whatever might
have come next. Note: it was kernel 5.15 both when I took that picture
and when the drive was destroyed. The prior non-issue kernel, I think,
was 5.10. I assumed the first triple threat was btrfs (3 blinks for
destroying the drive's data and its backups). When I took this shot I
was reconstructing older projects and data, and I felt it wasn't the
kernel but rather GTK using the graphics incorrectly. I also can't
recall ever having an issue under 5.10. It just ran! (only minor
annoyances with some programs). Possibly related, now that I think
about it, though I haven't looked into it: Kodi crashes more frequently,
and it was running during a couple of the kernel freezes.
Steve


Attachments:
Screenshot from 2021-12-04 06.41.19.png (57.02 kB)