2020-08-17 22:40:32

by Pavel Machek

Subject: 5.9-rc1: graphics regression moved from -next to mainline

Hi!

After about half an hour of uptime, the screen starts blinking on a
ThinkPad X60 and the machine becomes unusable.

I already reported this in -next, and now it is in mainline. It is a
32-bit x86 system.


Pavel


Aug 17 17:36:04 amd ovpn-castor[2828]: UDPv4 link local (bound): [undef]
Aug 17 17:36:04 amd ovpn-castor[2828]: UDPv4 link remote: [AF_INET]87.138.219.28:1194
Aug 17 17:36:23 amd kernel: BUG: unable to handle page fault for address: f8601000
Aug 17 17:36:23 amd kernel: #PF: supervisor write access in kernel mode
Aug 17 17:36:23 amd kernel: #PF: error_code(0x0002) - not-present page
Aug 17 17:36:23 amd kernel: *pdpt = 00000000318f2001 *pde = 0000000000000000
Aug 17 17:36:23 amd kernel: Oops: 0002 [#1] PREEMPT SMP PTI
Aug 17 17:36:23 amd kernel: CPU: 1 PID: 3004 Comm: Xorg Not tainted 5.9.0-rc1+ #86
Aug 17 17:36:23 amd kernel: Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
Aug 17 17:36:23 amd kernel: EIP: eb_relocate_vma+0xcf6/0xf20
Aug 17 17:36:23 amd kernel: Code: e9 ff f7 ff ff c7 85 c0 fd ff ff ed ff ff ff c7 85 c4 fd ff ff ff ff ff ff 8b 85 c0 fd ff ff e9 a5 f8 ff ff 8b 85 d0 fd ff ff <c7> 03 01 00 40 10 89 43 04 8b 85 b4 fd ff ff 89 43 08 e9 9f f7 ff
Aug 17 17:36:23 amd kernel: EAX: 003c306c EBX: f8601000 ECX: 00847000 EDX: 00000000
Aug 17 17:36:23 amd kernel: ESI: 00847000 EDI: 00000000 EBP: f1947c68 ESP: f19479fc
Aug 17 17:36:23 amd kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246
Aug 17 17:36:23 amd kernel: CR0: 80050033 CR2: f8601000 CR3: 31a1e000 CR4: 000006b0
Aug 17 17:36:23 amd kernel: Call Trace:
Aug 17 17:36:23 amd kernel: ? i915_vma_pin+0xc5/0x8c0
Aug 17 17:36:23 amd kernel: ? __mutex_unlock_slowpath+0x2b/0x280
Aug 17 17:36:23 amd kernel: ? __active_retire+0x7e/0xd0
Aug 17 17:36:23 amd kernel: ? mutex_unlock+0xb/0x10
Aug 17 17:36:23 amd kernel: ? i915_vma_pin+0xc5/0x8c0
Aug 17 17:36:23 amd kernel: ? __lock_acquire.isra.31+0x261/0x530
Aug 17 17:36:23 amd kernel: ? eb_lookup_vmas+0x1f5/0x9e0
Aug 17 17:36:23 amd kernel: i915_gem_do_execbuffer+0xaab/0x2780
Aug 17 17:36:23 amd kernel: ? _raw_spin_unlock_irqrestore+0x27/0x40
Aug 17 17:36:23 amd kernel: ? __lock_acquire.isra.31+0x261/0x530
Aug 17 17:36:23 amd kernel: ? __lock_acquire.isra.31+0x261/0x530
Aug 17 17:36:23 amd kernel: ? kvmalloc_node+0x69/0x70
Aug 17 17:36:23 amd kernel: i915_gem_execbuffer2_ioctl+0xdd/0x360
Aug 17 17:36:23 amd kernel: ? i915_gem_execbuffer_ioctl+0x2b0/0x2b0
Aug 17 17:36:23 amd kernel: drm_ioctl_kernel+0x87/0xd0
Aug 17 17:36:23 amd kernel: drm_ioctl+0x1f4/0x38b
Aug 17 17:36:23 amd kernel: ? i915_gem_execbuffer_ioctl+0x2b0/0x2b0
Aug 17 17:36:23 amd kernel: ? posix_get_monotonic_timespec+0x1c/0x90
Aug 17 17:36:23 amd kernel: ? ktime_get_ts64+0x7a/0x1e0
Aug 17 17:36:23 amd kernel: ? drm_ioctl_kernel+0xd0/0xd0
Aug 17 17:36:23 amd kernel: __ia32_sys_ioctl+0x1ad/0x799
Aug 17 17:36:23 amd kernel: ? debug_smp_processor_id+0x12/0x20
Aug 17 17:36:23 amd kernel: ? exit_to_user_mode_prepare+0x4f/0x100
Aug 17 17:36:23 amd kernel: do_int80_syscall_32+0x2c/0x40
Aug 17 17:36:23 amd kernel: entry_INT80_32+0x111/0x111
Aug 17 17:36:23 amd kernel: EIP: 0xb7fbc092
Aug 17 17:36:23 amd kernel: Code: 00 00 00 e9 90 ff ff ff ff a3 24 00 00 00 68 30 00 00 00 e9 80 ff ff ff ff a3 e8 ff ff ff 66 90 00 00 00 00 00 00 00 00 cd 80 <c3> 8d b4 26 00 00 00 00 8d b6 00 00 00 00 8b 1c 24 c3 8d b4 26 00
Aug 17 17:36:23 amd kernel: EAX: ffffffda EBX: 0000000a ECX: c0406469 EDX: bff0ae3c
Aug 17 17:36:23 amd kernel: ESI: b73aa000 EDI: c0406469 EBP: 0000000a ESP: bff0adb4
Aug 17 17:36:23 amd kernel: DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00200296
Aug 17 17:36:23 amd kernel: ? asm_exc_nmi+0xcc/0x2bc
Aug 17 17:36:23 amd kernel: Modules linked in:
Aug 17 17:36:23 amd kernel: CR2: 00000000f8601000
Aug 17 17:36:23 amd kernel: ---[ end trace 2ca9775068bbac06 ]---

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2020-08-19 00:41:10

by Linus Torvalds

Subject: Re: 5.9-rc1: graphics regression moved from -next to mainline

Ping on this?

The code disassembles to

  24:   8b 85 d0 fd ff ff       mov    -0x230(%ebp),%eax
  2a:*  c7 03 01 00 40 10       movl   $0x10400001,(%ebx)     <-- trapping instruction
  30:   89 43 04                mov    %eax,0x4(%ebx)
  33:   8b 85 b4 fd ff ff       mov    -0x24c(%ebp),%eax
  39:   89 43 08                mov    %eax,0x8(%ebx)
  3c:   e9                      jmp    ...

which looks like it is one of the cases in __reloc_entry_gpu(). I *think*
it's this one:

        } else if (gen >= 3 &&
                   !(IS_I915G(eb->i915) || IS_I915GM(eb->i915))) {
                *batch++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
                *batch++ = addr;
                *batch++ = target_addr;

where that "batch" pointer is 0xf8601000, so it looks like it just
overflowed into the next page that isn't there.
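
A minimal sketch of that reasoning in shell arithmetic. The macro values
are assumptions recalled from the i915 command headers (MI_INSTR(0x20, 1)
for MI_STORE_DWORD_IMM, bit 22 for MI_MEM_VIRTUAL), and 4 KiB pages are
assumed; the immediate and the fault address come from the oops above.
The page_offset helper is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# Assumed i915 command encodings (not quoted in this thread):
#   MI_INSTR(opcode, flags) = (opcode << 23) | flags
MI_STORE_DWORD_IMM=$(( (0x20 << 23) | 1 ))   # assumed MI_INSTR(0x20, 1)
MI_MEM_VIRTUAL=$(( 1 << 22 ))                # assumed flag bit

# First dword written to the batch; compare with the movl immediate.
imm=$(( MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL ))
printf 'first batch dword:  0x%08x\n' "$imm"   # prints 0x10400001

# Hypothetical helper: offset of an address within a 4 KiB page.
page_offset() {
    echo $(( $1 & 0xfff ))
}

# CR2 / EBX from the oops: the address the store faulted on.
fault_addr=$(( 0xf8601000 ))
printf 'offset within page: %d\n' "$(page_offset "$fault_addr")"   # prints 0
```

If the assumed encodings are right, the first value matches the
$0x10400001 immediate in the disassembly, and a page offset of 0 means
the store hit the very first dword of a new page, which is what walking
off the end of the previous page would look like.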

The cleaned-up call trace is

drm_ioctl+0x1f4/0x38b ->
drm_ioctl_kernel+0x87/0xd0 ->
i915_gem_execbuffer2_ioctl+0xdd/0x360 ->
i915_gem_do_execbuffer+0xaab/0x2780 ->
eb_relocate_vma

but there's a lot of inlining going on, so..

The obvious suspect is commit 9e0f9464e2ab ("drm/i915/gem: Async GPU
relocations only") but that's going purely by "that seems to be the
main relocation change this merge window".

Linus


2020-08-19 02:01:29

by Dave Airlie

Subject: Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline

On Wed, 19 Aug 2020 at 10:38, Linus Torvalds
<[email protected]> wrote:
>
> Ping on this?
>
> The code disassembles to
>
>   24:   8b 85 d0 fd ff ff       mov    -0x230(%ebp),%eax
>   2a:*  c7 03 01 00 40 10       movl   $0x10400001,(%ebx)     <-- trapping instruction
>   30:   89 43 04                mov    %eax,0x4(%ebx)
>   33:   8b 85 b4 fd ff ff       mov    -0x24c(%ebp),%eax
>   39:   89 43 08                mov    %eax,0x8(%ebx)
>   3c:   e9                      jmp    ...
>
> which looks like it is one of the cases in __reloc_entry_gpu(). I *think*
> it's this one:
>
>         } else if (gen >= 3 &&
>                    !(IS_I915G(eb->i915) || IS_I915GM(eb->i915))) {
>                 *batch++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
>                 *batch++ = addr;
>                 *batch++ = target_addr;
>
> where that "batch" pointer is 0xf8601000, so it looks like it just
> overflowed into the next page that isn't there.
>
> The cleaned-up call trace is
>
> drm_ioctl+0x1f4/0x38b ->
> drm_ioctl_kernel+0x87/0xd0 ->
> i915_gem_execbuffer2_ioctl+0xdd/0x360 ->
> i915_gem_do_execbuffer+0xaab/0x2780 ->
> eb_relocate_vma
>
> but there's a lot of inlining going on, so..
>
> The obvious suspect is commit 9e0f9464e2ab ("drm/i915/gem: Async GPU
> relocations only") but that's going purely by "that seems to be the
> main relocation change this merge window".

I think there's been some discussion about reverting that change for
other reasons, but it's quite likely the culprit.

Maybe we can push for a revert sooner (cc'ing more of the i915 team).

Dave.

2020-08-19 02:09:07

by Linus Torvalds

Subject: Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline

On Tue, Aug 18, 2020 at 6:13 PM Dave Airlie <[email protected]> wrote:
>
> I think there's been some discussion about reverting that change for
> other reasons, but it's quite likely the culprit.

Hmm. It reverts cleanly, but the end result doesn't work, because of
other changes.

Reverting all of

763fedd6a216 ("drm/i915: Remove i915_gem_object_get_dirty_page()")
7ac2d2536dfa ("drm/i915/gem: Delete unused code")
9e0f9464e2ab ("drm/i915/gem: Async GPU relocations only")

seems to at least build.

Pavel, does doing those three reverts make things work for you?

Linus

2020-08-19 17:01:43

by Pavel Machek

Subject: Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline

Hi!

> > I think there's been some discussion about reverting that change for
> > other reasons, but it's quite likely the culprit.
>
> Hmm. It reverts cleanly, but the end result doesn't work, because of
> other changes.
>
> Reverting all of
>
> 763fedd6a216 ("drm/i915: Remove i915_gem_object_get_dirty_page()")
> 7ac2d2536dfa ("drm/i915/gem: Delete unused code")
> 9e0f9464e2ab ("drm/i915/gem: Async GPU relocations only")
>
> seems to at least build.
>
> Pavel, does doing those three reverts make things work for you?

Thanks.

I got "[PATCH 1/2] drm/i915/gem: Replace reloc chain with terminator
on..." in my inbox; I believe that's related. Let me try those, first.

Best regards,
Pavel

2020-08-19 20:20:46

by Pavel Machek

Subject: Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline

On Tue 2020-08-18 18:59:27, Linus Torvalds wrote:
> On Tue, Aug 18, 2020 at 6:13 PM Dave Airlie <[email protected]> wrote:
> >
> > I think there's been some discussion about reverting that change for
> > other reasons, but it's quite likely the culprit.
>
> Hmm. It reverts cleanly, but the end result doesn't work, because of
> other changes.
>
> Reverting all of
>
> 763fedd6a216 ("drm/i915: Remove i915_gem_object_get_dirty_page()")
> 7ac2d2536dfa ("drm/i915/gem: Delete unused code")
> 9e0f9464e2ab ("drm/i915/gem: Async GPU relocations only")
>
> seems to at least build.
>
> Pavel, does doing those three reverts make things work for you?

Ok, so Chris' patches resulted in a (less severe?) crash; let me try this.

pavel@amd:/data/l/linux-next-32$ git reset --hard 8eb858df0a5f6bcd371b5d5637255c987278b8c9
HEAD is now at 8eb858df0a5f Add linux-next specific files for 20200819
pavel@amd:/data/l/linux-next-32$ git revert 763fedd6a216
Performing inexact rename detection: 100% (1212316/1212316), done.
hint: Waiting for your editor to close the file... Editing file: /data/fast/l/linux-next-32/.git/COMMIT_EDITMSG
/home/pavel/bin/emacsf: line 3: ed: command not found
[detached HEAD 261cbba627b7] Revert "drm/i915: Remove i915_gem_object_get_dirty_page()"
2 files changed, 18 insertions(+)
pavel@amd:/data/l/linux-next-32$ git revert 7ac2d2536dfa
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your merge.renamelimit variable to at least 3877 and retry the command.
hint: Waiting for your editor to close the file... Editing file: /data/fast/l/linux-next-32/.git/COMMIT_EDITMSG
/home/pavel/bin/emacsf: line 3: ed: command not found
[detached HEAD 526af90ea811] Revert "drm/i915/gem: Delete unused code"
1 file changed, 19 insertions(+)
pavel@amd:/data/l/linux-next-32$ git revert 9e0f9464e2ab
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your merge.renamelimit variable to at least 3877 and retry the command.
hint: Waiting for your editor to close the file... Editing file: /data/fast/l/linux-next-32/.git/COMMIT_EDITMSG
/home/pavel/bin/emacsf: line 3: ed: command not found
[detached HEAD 173e46213949] Revert "drm/i915/gem: Async GPU relocations only"
2 files changed, 289 insertions(+), 27 deletions(-)
pavel@amd:/data/l/linux-next-32$
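
The session above trips over the commit-message editor on each revert.
A minimal sketch of the same three reverts without an editor, using
git's `--no-edit` option. The `revert_series` helper and the `DRY_RUN`
variable are hypothetical names introduced here for illustration; with
`DRY_RUN=1` the command is only printed, so the sketch can be tried
outside a kernel tree.

```shell
#!/bin/sh
# Hypothetical helper: revert the given commits in the order listed,
# keeping git's default "Revert ..." message instead of opening an editor.
revert_series() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Dry run: show the command that would be executed.
        echo "git revert --no-edit $*"
    else
        git revert --no-edit "$@"
    fi
}

# Only print the command here; unset DRY_RUN inside a real checkout.
DRY_RUN=1

# The three commits named in the thread, in the order used above.
revert_series 763fedd6a216 7ac2d2536dfa 9e0f9464e2ab
```

Passing all three hashes to one `git revert` applies them one after
another in the order given, which mirrors the three separate commands
in the session.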

It is now running. It seems unison is the thing that usually triggers
this (due to memory pressure?). This time it survived unison (but
without chromium). I'll really know if it works in a day or two.

Best regards,
Pavel

2020-08-20 13:56:11

by Pavel Machek

Subject: Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline

Hi!

> > I think there's been some discussion about reverting that change for
> > other reasons, but it's quite likely the culprit.
>
> Hmm. It reverts cleanly, but the end result doesn't work, because of
> other changes.
>
> Reverting all of
>
> 763fedd6a216 ("drm/i915: Remove i915_gem_object_get_dirty_page()")
> 7ac2d2536dfa ("drm/i915/gem: Delete unused code")
> 9e0f9464e2ab ("drm/i915/gem: Async GPU relocations only")
>
> seems to at least build.
>
> Pavel, does doing those three reverts make things work for you?

Yes, it seems they make things work. (Chris asked for a new patch to be
tested, so I am switching to his kernel, but it survived longer than
it usually does.)

Thanks and best regards,
Pavel

2020-08-20 16:46:41

by Linus Torvalds

Subject: Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline

On Thu, Aug 20, 2020 at 2:23 AM Pavel Machek <[email protected]> wrote:
>
> Yes, it seems they make things work. (Chris asked for new patch to be
> tested, so I am switching to his kernel, but it survived longer than
> it usually does.)

Ok, so at worst we know how to solve it, at best the reverts won't be
needed because Chris' patch will fix the issue properly.

So I'll archive this thread, but remind me if this hasn't gotten
sorted out in the later rc's.

Linus

2020-08-21 09:20:12

by Pavel Machek

Subject: Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline

On Thu 2020-08-20 09:16:18, Linus Torvalds wrote:
> On Thu, Aug 20, 2020 at 2:23 AM Pavel Machek <[email protected]> wrote:
> >
> > Yes, it seems they make things work. (Chris asked for new patch to be
> > tested, so I am switching to his kernel, but it survived longer than
> > it usually does.)
>
> Ok, so at worst we know how to solve it, at best the reverts won't be
> needed because Chris' patch will fix the issue properly.
>
> So I'll archive this thread, but remind me if this hasn't gotten
> sorted out in the later rc's.

Yes, thank you, it seems we have a solution w/o the revert.

Best regards,
Pavel

2020-08-25 09:57:28

by Jani Nikula

Subject: Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline

On Fri, 21 Aug 2020, Pavel Machek <[email protected]> wrote:
> On Thu 2020-08-20 09:16:18, Linus Torvalds wrote:
>> On Thu, Aug 20, 2020 at 2:23 AM Pavel Machek <[email protected]> wrote:
>> >
>> > Yes, it seems they make things work. (Chris asked for new patch to be
>> > tested, so I am switching to his kernel, but it survived longer than
>> > it usually does.)
>>
>> Ok, so at worst we know how to solve it, at best the reverts won't be
>> needed because Chris' patch will fix the issue properly.
>>
>> So I'll archive this thread, but remind me if this hasn't gotten
>> sorted out in the later rc's.
>
> Yes, thank you, it seems we have a solution w/o the revert.

For posterity, I'm told the fix is [1].

BR,
Jani.


[1] https://lore.kernel.org/intel-gfx/[email protected]/


--
Jani Nikula, Intel Open Source Graphics Center

2020-08-25 16:33:17

by Harald Arnesen

Subject: Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline

Jani Nikula [25.08.2020 11:55]:

> On Fri, 21 Aug 2020, Pavel Machek <[email protected]> wrote:
>> On Thu 2020-08-20 09:16:18, Linus Torvalds wrote:
>>> On Thu, Aug 20, 2020 at 2:23 AM Pavel Machek <[email protected]> wrote:
>>> >
>>> > Yes, it seems they make things work. (Chris asked for new patch to be
>>> > tested, so I am switching to his kernel, but it survived longer than
>>> > it usually does.)
>>>
>>> Ok, so at worst we know how to solve it, at best the reverts won't be
>>> needed because Chris' patch will fix the issue properly.
>>>
>>> So I'll archive this thread, but remind me if this hasn't gotten
>>> sorted out in the later rc's.
>>
>> Yes, thank you, it seems we have a solution w/o the revert.
>
> For posterity, I'm told the fix is [1].
>
> BR,
> Jani.
>
>
> [1] https://lore.kernel.org/intel-gfx/[email protected]/

Doesn't fix it for me. As soon as I start XFCE, the mouse and keyboard
freeze. I can still ssh into the machine.

The three reverts (763fedd6a216, 7ac2d2536dfa and 9e0f9464e2ab) fix
the bug for me.
--
Hilsen Harald

2020-08-25 18:22:34

by Linus Torvalds

Subject: Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline

On Tue, Aug 25, 2020 at 9:32 AM Harald Arnesen <[email protected]> wrote:
>
> > For posterity, I'm told the fix is [1].
> >
> > [1] https://lore.kernel.org/intel-gfx/[email protected]/
>
> Doesn't fix it for me. As soon as I start XFCE, the mouse and keyboard
> freeze. I can still ssh into the machine.
>
> The three reverts (763fedd6a216, 7ac2d2536dfa and 9e0f9464e2ab) fix
> the bug for me.

Do you get any oops or other indication of what ends up going wrong?
Since ssh works that should be fairly easy to see.

Linus

2020-08-25 21:42:33

by Harald Arnesen

Subject: Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline

Linus Torvalds [25.08.2020 20:19]:

>> Doesn't fix it for me. As soon as I start XFCE, the mouse and keyboard
>> freeze. I can still ssh into the machine.
>>
>> The three reverts (763fedd6a216, 7ac2d2536dfa and 9e0f9464e2ab) fix
>> the bug for me.
> Do you get any oops or other indication of what ends up going wrong?
> Since ssh works that should be fairly easy to see.

Away from the machine now, will check tomorrow morning (CET).
--
Hilsen Harald

2020-08-26 14:07:33

by Harald Arnesen

Subject: Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline

Linus Torvalds [25.08.2020 20:19]:

> On Tue, Aug 25, 2020 at 9:32 AM Harald Arnesen <[email protected]> wrote:
>>
>> > For posterity, I'm told the fix is [1].
>> >
>> > [1] https://lore.kernel.org/intel-gfx/[email protected]/
>>
>> Doesn't fix it for me. As soon as I start XFCE, the mouse and keyboard
>> freeze. I can still ssh into the machine.
>>
>> The three reverts (763fedd6a216, 7ac2d2536dfa and 9e0f9464e2ab) fix
>> the bug for me.
>
> Do you get any oops or other indication of what ends up going wrong?
> Since ssh works that should be fairly easy to see.

I was wrong about ssh working. The whole machine locks up when X starts.

A strange thing, sometimes I can log in from lightdm before it locks up,
sometimes I cannot even use the login screen. Timing related?

If I don't start X, console login seems to work fine, and I see nothing
obvious in the logs or kernel messages.

I will try to start just a window manager with startx instead of going
through lightdm.
--
Hilsen Harald