2021-06-04 12:28:42

by Vitaly Kuznetsov

Subject: [bug report] Commit ccf953d8f3d6 ("fb_defio: Remove custom address_space_operations") breaks Hyper-V FB driver

Hi,

Commit ccf953d8f3d6 ("fb_defio: Remove custom address_space_operations")
seems to be breaking Hyper-V framebuffer
(drivers/video/fbdev/hyperv_fb.c) driver for me: Hyper-V guest boots
well and plymouth even works but when I try starting Gnome, virtual
screen just goes black. Reverting the above mentioned commit on top of
5.13-rc4 saves the day. The behavior is 100% reproducible. I'm using
Gen2 guest running on Hyper-V 2019. It was also reported that Gen1 guests
are equally broken.

Is this something known?

--
Vitaly


2021-06-04 13:02:30

by Wei Liu

Subject: Re: [bug report] Commit ccf953d8f3d6 ("fb_defio: Remove custom address_space_operations") breaks Hyper-V FB driver

On Fri, Jun 04, 2021 at 02:25:01PM +0200, Vitaly Kuznetsov wrote:
> Hi,
>
> Commit ccf953d8f3d6 ("fb_defio: Remove custom address_space_operations")
> seems to be breaking Hyper-V framebuffer
> (drivers/video/fbdev/hyperv_fb.c) driver for me: Hyper-V guest boots
> well and plymouth even works but when I try starting Gnome, virtual
> screen just goes black. Reverting the above mentioned commit on top of
> 5.13-rc4 saves the day. The behavior is 100% reproducible. I'm using
> Gen2 guest running on Hyper-V 2019. It was also reported that Gen1 guests
> are equally broken.
>
> Is this something known?
>

I've heard a similar report from Vineeth but we didn't get to the bottom
of this.

Wei.

> --
> Vitaly
>

2021-06-04 13:35:30

by Matthew Wilcox

Subject: Re: [bug report] Commit ccf953d8f3d6 ("fb_defio: Remove custom address_space_operations") breaks Hyper-V FB driver

On Fri, Jun 04, 2021 at 02:25:01PM +0200, Vitaly Kuznetsov wrote:
> Commit ccf953d8f3d6 ("fb_defio: Remove custom address_space_operations")
> seems to be breaking Hyper-V framebuffer

https://lore.kernel.org/linux-mm/[email protected]/

2021-06-04 15:48:30

by Vineeth Pillai

Subject: Re: [bug report] Commit ccf953d8f3d6 ("fb_defio: Remove custom address_space_operations") breaks Hyper-V FB driver


On 6/4/2021 9:00 AM, Wei Liu wrote:
> On Fri, Jun 04, 2021 at 02:25:01PM +0200, Vitaly Kuznetsov wrote:
>> Hi,
>>
>> Commit ccf953d8f3d6 ("fb_defio: Remove custom address_space_operations")
>> seems to be breaking Hyper-V framebuffer
>> (drivers/video/fbdev/hyperv_fb.c) driver for me: Hyper-V guest boots
>> well and plymouth even works but when I try starting Gnome, virtual
>> screen just goes black. Reverting the above mentioned commit on top of
>> 5.13-rc4 saves the day. The behavior is 100% reproducible. I'm using
>> Gen2 guest running on Hyper-V 2019. It was also reported that Gen1 guests
>> are equally broken.
>>
>> Is this something known?
>>
> I've heard a similar report from Vineeth but we didn't get to the bottom
> of this.
I have just tried reverting the commit mentioned above and it solves the
GUI freeze I was also seeing. Previously, the login screen was just
freezing, but the VM was accessible through ssh. With the above commit
reverted, I can log in to Gnome.

Looks like I am also experiencing the same bug mentioned here.

Thanks,
Vineeth

2021-06-04 18:19:26

by Dexuan Cui

Subject: RE: [bug report] Commit ccf953d8f3d6 ("fb_defio: Remove custom address_space_operations") breaks Hyper-V FB driver

> From: Vineeth Pillai <[email protected]>
> Sent: Friday, June 4, 2021 8:47 AM
> To: Wei Liu <[email protected]>; vkuznets <[email protected]>
> Cc: Matthew Wilcox <[email protected]>; [email protected];
> [email protected]; [email protected];
> [email protected]; Michael Kelley <[email protected]>;
> Dexuan Cui <[email protected]>
> Subject: Re: [bug report] Commit ccf953d8f3d6 ("fb_defio: Remove custom
> address_space_operations") breaks Hyper-V FB driver
>
>
> On 6/4/2021 9:00 AM, Wei Liu wrote:
> > On Fri, Jun 04, 2021 at 02:25:01PM +0200, Vitaly Kuznetsov wrote:
> >> Hi,
> >>
> >> Commit ccf953d8f3d6 ("fb_defio: Remove custom
> address_space_operations")
> >> seems to be breaking Hyper-V framebuffer
> >> (drivers/video/fbdev/hyperv_fb.c) driver for me: Hyper-V guest boots
> >> well and plymouth even works but when I try starting Gnome, virtual
> >> screen just goes black. Reverting the above mentioned commit on top of
> >> 5.13-rc4 saves the day. The behavior is 100% reproducible. I'm using
> >> Gen2 guest running on Hyper-V 2019. It was also reported that Gen1 guests
> >> are equally broken.
> >>
> >> Is this something known?
> >>
> > I've heard a similar report from Vineeth but we didn't get to the bottom
> > of this.
> I have just tried reverting the commit mentioned above and it solves the
> GUI freeze
> I was also seeing. Previously, login screen was just freezing, but VM
> was accessible
> through ssh. With the above commit reverted, I can login to Gnome.
>
> Looks like I am also experiencing the same bug mentioned here.
>
> Thanks,
> Vineeth

As Matthew mentioned, this is a known issue:
https://lwn.net/ml/linux-kernel/[email protected]/

Matthew has reverted ccf953d8f3d6:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0b78f8bcf4951af30b0ae83ea4fad27d641ab617
so the latest mainline should work now.

Thanks,
Dexuan

2021-06-04 18:39:36

by Dexuan Cui

Subject: RE: [bug report] Commit ccf953d8f3d6 ("fb_defio: Remove custom address_space_operations") breaks Hyper-V FB driver

> From: Dexuan Cui
> Sent: Friday, June 4, 2021 11:17 AM
> > >> ...
> > > I've heard a similar report from Vineeth but we didn't get to the bottom
> > > of this.
> > I have just tried reverting the commit mentioned above and it solves the
> > GUI freeze
> > I was also seeing. Previously, login screen was just freezing, but VM
> > was accessible
> > through ssh. With the above commit reverted, I can login to Gnome.
> >
> > Looks like I am also experiencing the same bug mentioned here.
> >
> > Thanks,
> > Vineeth
>
> As Matthew mentioned, this is a known issue:
> https://lwn.net/ml/linux-kernel/[email protected]/
>
> Matthew has reverted ccf953d8f3d6:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=
> 0b78f8bcf4951af30b0ae83ea4fad27d641ab617
> so the latest mainline should work now.
>
> Thanks,
> Dexuan

Hi Matthew,
With today's latest mainline 16f0596fc1d78a1f3ae4628cff962bb297dc908c,
Xorg works again in a Linux VM on Hyper-V, but when I reboot the VM, I
always see a lot of "BUG: Bad page state in process Xorg" warnings (about
60 of them) before the VM reboots.

BTW, I happen to have an older Mar-28 mainline kernel (36a14638f7c0654),
which has the same warnings.

Any idea which change introduced the warnings?

[ 87.446129] BUG: Bad page state in process Xorg pfn:151807
[ 87.448238] page:000000004f982270 refcount:0 mapcount:0 mapping:00000000dcf68290 index:0x0 pfn:0x151807
[ 87.455506] aops:fb_deferred_io_aops ino:f9 dentry name:"fb0"
[ 87.455506] flags: 0x57fffc000000000(node=1|zone=2|lastcpupid=0x1ffff)
[ 87.455506] raw: 057fffc000000000 dead000000000100 dead000000000122 ffff9fa811c4a280
[ 87.455506] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 87.455506] page dumped because: non-NULL mapping
[ 87.487491] Modules linked in: ...
[ 87.491504] CPU: 6 PID: 1388 Comm: Xorg Tainted: G E 5.13.0-rc4+ #1
[ 87.533204] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
[ 87.533204] Call Trace:
[ 87.533204] dump_stack+0x64/0x7c
[ 87.533204] bad_page.cold.118+0x63/0x93
[ OK ] Stopped Create final runtime dir for shutdown pivot root.
[ 87.533204] free_pcppages_bulk+0x1ac/0x770
[ 87.533204] free_unref_page_list+0x101/0x180
[ 87.533204] release_pages+0x186/0x4c0
[ 87.533204] tlb_flush_mmu+0x44/0x120
[ 87.533204] tlb_finish_mmu+0x3c/0x70
[ 87.533204] unmap_region+0xd1/0x110
[ 87.533204] __do_munmap+0x2a2/0x500
[ 87.533204] __vm_munmap+0x7d/0x130
[ 87.533204] __x64_sys_munmap+0x27/0x30
[ 87.533204] do_syscall_64+0x3c/0xb0
[ 87.533204] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 87.533204] RIP: 0033:0x7f22b49a7567
[ 87.533204] Code: 10 e9 67 ff ff ff 0f 1f 44 00 00 48 8b 15 21 ...
[ 87.533204] RSP: 002b:00007ffd56c4f2b8 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
[ 87.533204] RAX: ffffffffffffffda RBX: 0000557f7299e200 RCX: 00007f22b49a7567
[ 87.533204] RDX: 0000000000000000 RSI: 0000000000400000 RDI: 00007f22b3652000
[ 87.533204] RBP: 0000557f7299edf0 R08: 0000557f729abfc0 R09: 0000000000000005
[ 87.533204] R10: 0000000000000034 R11: 0000000000000206 R12: 0000557f7299edf0
[ 87.533204] R13: 0000000000000000 R14: 0000557f71cf6968 R15: 0000557f71cdbe10
[ 87.533204] Disabling lock debugging due to kernel taint
[ 87.533204] BUG: Bad page state in process Xorg pfn:151806
[ 87.533204] page:00000000cf964098 refcount:0 mapcount:0 mapping:00000000dcf68290 index:0x0 pfn:0x151806
[ 87.533204] aops:fb_deferred_io_aops ino:f9 dentry name:"fb0"
[ 87.533204] flags: 0x57fffc000000000(node=1|zone=2|lastcpupid=0x1ffff)
[ 87.533204] raw: 057fffc000000000 dead000000000100 dead000000000122 ffff9fa811c4a280
[ 87.533204] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 87.533204] page dumped because: non-NULL mapping
[ 87.533204] Modules linked in: ...
[ 87.533204] CPU: 6 PID: 1388 Comm: Xorg Tainted: G B E 5.13.0-rc4+ #1
[ 87.533204] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
[ 87.747523] Call Trace:
[ 87.747523] dump_stack+0x64/0x7c
[ 87.747523] bad_page.cold.118+0x63/0x93
[ 87.747523] free_pcppages_bulk+0x1ac/0x770
[ 87.747523] free_unref_page_list+0x101/0x180
[ 87.747523] release_pages+0x186/0x4c0
[ 87.747523] tlb_flush_mmu+0x44/0x120
[ 87.747523] tlb_finish_mmu+0x3c/0x70
[ 87.747523] unmap_region+0xd1/0x110
[ 87.747523] __do_munmap+0x2a2/0x500
[ 87.747523] __vm_munmap+0x7d/0x130
[ 87.747523] __x64_sys_munmap+0x27/0x30
[ 87.747523] do_syscall_64+0x3c/0xb0
[ 87.747523] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 87.747523] RIP: 0033:0x7f22b49a7567
[ 87.747523] Code: 10 e9 67 ff ff ff 0f 1f 44 00 00 48 8b 15 21 ...
[ 87.747523] RSP: 002b:00007ffd56c4f2b8 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
[ 87.747523] RAX: ffffffffffffffda RBX: 0000557f7299e200 RCX: 00007f22b49a7567
[ 87.747523] RDX: 0000000000000000 RSI: 0000000000400000 RDI: 00007f22b3652000
[ 87.747523] RBP: 0000557f7299edf0 R08: 0000557f729abfc0 R09: 0000000000000005
[ 87.747523] R10: 0000000000000034 R11: 0000000000000206 R12: 0000557f7299edf0
[ 87.747523] R13: 0000000000000000 R14: 0000557f71cf6968 R15: 0000557f71cdbe10

2021-06-04 18:56:38

by Matthew Wilcox

Subject: Re: [bug report] Commit ccf953d8f3d6 ("fb_defio: Remove custom address_space_operations") breaks Hyper-V FB driver

On Fri, Jun 04, 2021 at 06:37:49PM +0000, Dexuan Cui wrote:
> > From: Dexuan Cui
> > Sent: Friday, June 4, 2021 11:17 AM
> > > >> ...
> > > > I've heard a similar report from Vineeth but we didn't get to the bottom
> > > > of this.
> > > I have just tried reverting the commit mentioned above and it solves the
> > > GUI freeze
> > > I was also seeing. Previously, login screen was just freezing, but VM
> > > was accessible
> > > through ssh. With the above commit reverted, I can login to Gnome.
> > >
> > > Looks like I am also experiencing the same bug mentioned here.
> > >
> > > Thanks,
> > > Vineeth
> >
> > As Matthew mentioned, this is a known issue:
> > https://lwn.net/ml/linux-kernel/[email protected]/
> >
> > Matthew has reverted ccf953d8f3d6:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=
> > 0b78f8bcf4951af30b0ae83ea4fad27d641ab617
> > so the latest mainline should work now.
> >
> > Thanks,
> > Dexuan
>
> Hi Matthew,
> With today's latest mainline 16f0596fc1d78a1f3ae4628cff962bb297dc908c,
> the Xorg works again in Linux VM on Hyper-V, but when I reboot the VM, I
> always see a lot of "BUG: Bad page state in process Xorg " warnings (there
> are about 60 such warnings) before the VM reboots.
>
> BTW, I happen to have an older Mar-28 mainline kernel (36a14638f7c0654),
> which has the same warnings.
>
> Any idea which change introduced the warnings?

Looks like someone forgot to call fb_deferred_io_cleanup()?
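
To illustrate the suggestion above, here is a minimal user-space sketch of
why a skipped cleanup could lead to "page dumped because: non-NULL mapping".
It assumes, as I understand the fb_defio code of this era, that the deferred
I/O path leaves page->mapping pointing at the framebuffer's address space and
that fb_deferred_io_cleanup() resets it to NULL for every framebuffer page.
Every model_* name below is hypothetical and exists only for this
illustration; this is not kernel code, and the thread does not confirm that
a missing cleanup call is the actual cause.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MODEL_NPAGES 4

/* Hypothetical stand-ins for struct address_space and struct page. */
struct model_mapping { const char *name; };
struct model_page { struct model_mapping *mapping; };

/* Hypothetical stand-in for a framebuffer with deferred I/O enabled. */
struct model_fb {
	struct model_mapping mapping;
	struct model_page pages[MODEL_NPAGES];
};

/* Models the deferred I/O path tagging each page with the fb mapping. */
static void model_defio_touch_all(struct model_fb *fb)
{
	assert(fb != NULL);
	for (size_t i = 0; i < MODEL_NPAGES; i++)
		fb->pages[i].mapping = &fb->mapping;
}

/* Models the cleanup step: clear the mapping that was set up earlier. */
static void model_defio_cleanup(struct model_fb *fb)
{
	assert(fb != NULL);
	for (size_t i = 0; i < MODEL_NPAGES; i++)
		fb->pages[i].mapping = NULL;
}

/*
 * Models the allocator's sanity check when pages are freed: a page that
 * still carries a mapping is reported as bad. Returns the bad page count.
 */
static int model_free_pages(const struct model_fb *fb)
{
	int bad = 0;

	assert(fb != NULL);
	for (size_t i = 0; i < MODEL_NPAGES; i++) {
		if (fb->pages[i].mapping != NULL) {
			printf("bad page %zu: non-NULL mapping (%s)\n",
			       i, fb->pages[i].mapping->name);
			bad++;
		}
	}
	return bad;
}
```

In this model, freeing the pages right after model_defio_touch_all() reports
every page as bad, while calling model_defio_cleanup() first reports none.
That mirrors the pattern in the log above, where each warning shows the same
mapping and aops:fb_deferred_io_aops.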