2023-10-01 22:02:36

by Oleksandr Natalenko

Subject: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

Hello.

I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:

```
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
drm_gem_put_pages+0x186/0x250
drm_gem_shmem_put_pages_locked+0x43/0xc0
drm_gem_shmem_object_vunmap+0x83/0xe0
drm_gem_vunmap_unlocked+0x46/0xb0
drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
drm_fb_helper_damage_work+0x96/0x170
process_one_work+0x254/0x470
worker_thread+0x55/0x4f0
kthread+0xe8/0x120
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1b/0x30

kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k

allocated by task 51 on cpu 0 at 14.668667s:
drm_gem_get_pages+0x94/0x2b0
drm_gem_shmem_get_pages+0x5d/0x110
drm_gem_shmem_object_vmap+0xc4/0x1e0
drm_gem_vmap_unlocked+0x3c/0x70
drm_client_buffer_vmap+0x23/0x50
drm_fbdev_generic_helper_fb_dirty+0xae/0x310
drm_fb_helper_damage_work+0x96/0x170
process_one_work+0x254/0x470
worker_thread+0x55/0x4f0
kthread+0xe8/0x120
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1b/0x30

freed by task 51 on cpu 0 at 14.668697s:
drm_gem_put_pages+0x186/0x250
drm_gem_shmem_put_pages_locked+0x43/0xc0
drm_gem_shmem_object_vunmap+0x83/0xe0
drm_gem_vunmap_unlocked+0x46/0xb0
drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
drm_fb_helper_damage_work+0x96/0x170
process_one_work+0x254/0x470
worker_thread+0x55/0x4f0
kthread+0xe8/0x120
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1b/0x30

CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e
Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
Workqueue: events drm_fb_helper_damage_work
```

This repeats a couple of times and then stops.

Currently, I'm running v6.5.5. So far, there's no impact on how the VM functions for me.

The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446

Please check.

Thanks.

--
Oleksandr Natalenko (post-factum)



2023-10-02 05:10:39

by Bagas Sanjaya

Subject: Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

On Sun, Oct 01, 2023 at 06:32:34PM +0200, Oleksandr Natalenko wrote:
> Hello.
>
> I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
>
> ```
> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
>
> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> drm_gem_put_pages+0x186/0x250
> drm_gem_shmem_put_pages_locked+0x43/0xc0
> drm_gem_shmem_object_vunmap+0x83/0xe0
> drm_gem_vunmap_unlocked+0x46/0xb0
> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> drm_fb_helper_damage_work+0x96/0x170
> process_one_work+0x254/0x470
> worker_thread+0x55/0x4f0
> kthread+0xe8/0x120
> ret_from_fork+0x34/0x50
> ret_from_fork_asm+0x1b/0x30
>
> kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k
>
> allocated by task 51 on cpu 0 at 14.668667s:
> drm_gem_get_pages+0x94/0x2b0
> drm_gem_shmem_get_pages+0x5d/0x110
> drm_gem_shmem_object_vmap+0xc4/0x1e0
> drm_gem_vmap_unlocked+0x3c/0x70
> drm_client_buffer_vmap+0x23/0x50
> drm_fbdev_generic_helper_fb_dirty+0xae/0x310
> drm_fb_helper_damage_work+0x96/0x170
> process_one_work+0x254/0x470
> worker_thread+0x55/0x4f0
> kthread+0xe8/0x120
> ret_from_fork+0x34/0x50
> ret_from_fork_asm+0x1b/0x30
>
> freed by task 51 on cpu 0 at 14.668697s:
> drm_gem_put_pages+0x186/0x250
> drm_gem_shmem_put_pages_locked+0x43/0xc0
> drm_gem_shmem_object_vunmap+0x83/0xe0
> drm_gem_vunmap_unlocked+0x46/0xb0
> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> drm_fb_helper_damage_work+0x96/0x170
> process_one_work+0x254/0x470
> worker_thread+0x55/0x4f0
> kthread+0xe8/0x120
> ret_from_fork+0x34/0x50
> ret_from_fork_asm+0x1b/0x30
>
> CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e
> Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
> Workqueue: events drm_fb_helper_damage_work
> ```
>
> This repeats a couple of times and then stops.
>
> Currently, I'm running v6.5.5. So far, there's no impact on how VM functions for me.
>
> The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
>

Do you have this issue on v6.4?

--
An old man doll... just what I always wanted! - Clara



2023-10-02 10:56:09

by Bagas Sanjaya

Subject: Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

On Mon, Oct 02, 2023 at 08:20:15AM +0200, Oleksandr Natalenko wrote:
> Hello.
>
> On Monday, 2 October 2023 1:45:44 CEST Bagas Sanjaya wrote:
> > On Sun, Oct 01, 2023 at 06:32:34PM +0200, Oleksandr Natalenko wrote:
> > > Hello.
> > >
> > > I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
> > >
> > > ```
> > > BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
> > >
> > > Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> > > drm_gem_put_pages+0x186/0x250
> > > drm_gem_shmem_put_pages_locked+0x43/0xc0
> > > drm_gem_shmem_object_vunmap+0x83/0xe0
> > > drm_gem_vunmap_unlocked+0x46/0xb0
> > > drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > > drm_fb_helper_damage_work+0x96/0x170
> > > process_one_work+0x254/0x470
> > > worker_thread+0x55/0x4f0
> > > kthread+0xe8/0x120
> > > ret_from_fork+0x34/0x50
> > > ret_from_fork_asm+0x1b/0x30
> > >
> > > kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k
> > >
> > > allocated by task 51 on cpu 0 at 14.668667s:
> > > drm_gem_get_pages+0x94/0x2b0
> > > drm_gem_shmem_get_pages+0x5d/0x110
> > > drm_gem_shmem_object_vmap+0xc4/0x1e0
> > > drm_gem_vmap_unlocked+0x3c/0x70
> > > drm_client_buffer_vmap+0x23/0x50
> > > drm_fbdev_generic_helper_fb_dirty+0xae/0x310
> > > drm_fb_helper_damage_work+0x96/0x170
> > > process_one_work+0x254/0x470
> > > worker_thread+0x55/0x4f0
> > > kthread+0xe8/0x120
> > > ret_from_fork+0x34/0x50
> > > ret_from_fork_asm+0x1b/0x30
> > >
> > > freed by task 51 on cpu 0 at 14.668697s:
> > > drm_gem_put_pages+0x186/0x250
> > > drm_gem_shmem_put_pages_locked+0x43/0xc0
> > > drm_gem_shmem_object_vunmap+0x83/0xe0
> > > drm_gem_vunmap_unlocked+0x46/0xb0
> > > drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > > drm_fb_helper_damage_work+0x96/0x170
> > > process_one_work+0x254/0x470
> > > worker_thread+0x55/0x4f0
> > > kthread+0xe8/0x120
> > > ret_from_fork+0x34/0x50
> > > ret_from_fork_asm+0x1b/0x30
> > >
> > > CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e
> > > Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
> > > Workqueue: events drm_fb_helper_damage_work
> > > ```
> > >
> > > This repeats a couple of times and then stops.
> > >
> > > Currently, I'm running v6.5.5. So far, there's no impact on how VM functions for me.
> > >
> > > The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
> > >
> >
> > Do you have this issue on v6.4?
>
> No, I did not have this issue with v6.4.
>

Then proceed with kernel bisection. You can refer to
Documentation/admin-guide/bug-bisect.rst in the kernel sources for the
process.
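
For reference, a minimal sketch of that workflow (illustrative commands only; the build, install and boot steps between iterations depend on your setup):

```
git bisect start
git bisect bad v6.5     # first version showing the KFENCE splat
git bisect good v6.4    # last known-good version
# build, install and boot the kernel git checks out, watch dmesg for the splat,
# then mark the result and repeat until git prints the first bad commit:
git bisect good         # or: git bisect bad
git bisect reset        # go back to the original branch when finished
```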

--
An old man doll... just what I always wanted! - Clara



2023-10-02 12:06:29

by Bagas Sanjaya

Subject: Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

On Sun, Oct 01, 2023 at 06:32:34PM +0200, Oleksandr Natalenko wrote:
> Hello.
>
> I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
>
> ```
> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
>
> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> drm_gem_put_pages+0x186/0x250
> drm_gem_shmem_put_pages_locked+0x43/0xc0
> drm_gem_shmem_object_vunmap+0x83/0xe0
> drm_gem_vunmap_unlocked+0x46/0xb0
> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> drm_fb_helper_damage_work+0x96/0x170
> process_one_work+0x254/0x470
> worker_thread+0x55/0x4f0
> kthread+0xe8/0x120
> ret_from_fork+0x34/0x50
> ret_from_fork_asm+0x1b/0x30
>
> kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k
>
> allocated by task 51 on cpu 0 at 14.668667s:
> drm_gem_get_pages+0x94/0x2b0
> drm_gem_shmem_get_pages+0x5d/0x110
> drm_gem_shmem_object_vmap+0xc4/0x1e0
> drm_gem_vmap_unlocked+0x3c/0x70
> drm_client_buffer_vmap+0x23/0x50
> drm_fbdev_generic_helper_fb_dirty+0xae/0x310
> drm_fb_helper_damage_work+0x96/0x170
> process_one_work+0x254/0x470
> worker_thread+0x55/0x4f0
> kthread+0xe8/0x120
> ret_from_fork+0x34/0x50
> ret_from_fork_asm+0x1b/0x30
>
> freed by task 51 on cpu 0 at 14.668697s:
> drm_gem_put_pages+0x186/0x250
> drm_gem_shmem_put_pages_locked+0x43/0xc0
> drm_gem_shmem_object_vunmap+0x83/0xe0
> drm_gem_vunmap_unlocked+0x46/0xb0
> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> drm_fb_helper_damage_work+0x96/0x170
> process_one_work+0x254/0x470
> worker_thread+0x55/0x4f0
> kthread+0xe8/0x120
> ret_from_fork+0x34/0x50
> ret_from_fork_asm+0x1b/0x30
>
> CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e
> Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
> Workqueue: events drm_fb_helper_damage_work
> ```
>
> This repeats a couple of times and then stops.
>
> Currently, I'm running v6.5.5. So far, there's no impact on how VM functions for me.
>
> The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
>

Thanks for the regression report. I'm adding it to regzbot:

#regzbot ^introduced: v6.4..v6.5

--
An old man doll... just what I always wanted! - Clara



2023-10-02 17:22:32

by Oleksandr Natalenko

Subject: Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

Hello.

On Monday, 2 October 2023 1:45:44 CEST Bagas Sanjaya wrote:
> On Sun, Oct 01, 2023 at 06:32:34PM +0200, Oleksandr Natalenko wrote:
> > Hello.
> >
> > I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
> >
> > ```
> > BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
> >
> > Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> > drm_gem_put_pages+0x186/0x250
> > drm_gem_shmem_put_pages_locked+0x43/0xc0
> > drm_gem_shmem_object_vunmap+0x83/0xe0
> > drm_gem_vunmap_unlocked+0x46/0xb0
> > drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > drm_fb_helper_damage_work+0x96/0x170
> > process_one_work+0x254/0x470
> > worker_thread+0x55/0x4f0
> > kthread+0xe8/0x120
> > ret_from_fork+0x34/0x50
> > ret_from_fork_asm+0x1b/0x30
> >
> > kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k
> >
> > allocated by task 51 on cpu 0 at 14.668667s:
> > drm_gem_get_pages+0x94/0x2b0
> > drm_gem_shmem_get_pages+0x5d/0x110
> > drm_gem_shmem_object_vmap+0xc4/0x1e0
> > drm_gem_vmap_unlocked+0x3c/0x70
> > drm_client_buffer_vmap+0x23/0x50
> > drm_fbdev_generic_helper_fb_dirty+0xae/0x310
> > drm_fb_helper_damage_work+0x96/0x170
> > process_one_work+0x254/0x470
> > worker_thread+0x55/0x4f0
> > kthread+0xe8/0x120
> > ret_from_fork+0x34/0x50
> > ret_from_fork_asm+0x1b/0x30
> >
> > freed by task 51 on cpu 0 at 14.668697s:
> > drm_gem_put_pages+0x186/0x250
> > drm_gem_shmem_put_pages_locked+0x43/0xc0
> > drm_gem_shmem_object_vunmap+0x83/0xe0
> > drm_gem_vunmap_unlocked+0x46/0xb0
> > drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > drm_fb_helper_damage_work+0x96/0x170
> > process_one_work+0x254/0x470
> > worker_thread+0x55/0x4f0
> > kthread+0xe8/0x120
> > ret_from_fork+0x34/0x50
> > ret_from_fork_asm+0x1b/0x30
> >
> > CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e
> > Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
> > Workqueue: events drm_fb_helper_damage_work
> > ```
> >
> > This repeats a couple of times and then stops.
> >
> > Currently, I'm running v6.5.5. So far, there's no impact on how VM functions for me.
> >
> > The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
> >
>
> Do you have this issue on v6.4?

No, I did not have this issue with v6.4.

Thanks.

--
Oleksandr Natalenko (post-factum)



2023-10-02 21:03:18

by Matthew Wilcox

Subject: Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
> > > > > BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
> > > > >
> > > > > Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> > > > > drm_gem_put_pages+0x186/0x250
> > > > > drm_gem_shmem_put_pages_locked+0x43/0xc0
> > > > > drm_gem_shmem_object_vunmap+0x83/0xe0
> > > > > drm_gem_vunmap_unlocked+0x46/0xb0
> > > > > drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > > > > drm_fb_helper_damage_work+0x96/0x170
>
> Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?

Yes, entirely plausible. I think you have two useful points to look at
before delving into a full bisect -- 863a8e and the parent of 0b62af.
If either of them work, I think you have no more work to do.
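
(For illustration, a rough sketch of trying those two points; the hashes are the ones referenced in this thread, and the kernel build/boot steps are assumed:)

```
# the existing i915-related fix, to see whether it already covers this case
git checkout 863a8eb3f270
# ... build, boot, check dmesg for the KFENCE splat ...

# the state just before the first suspected commit
git checkout 0b62af28f249^
# ... build, boot, check dmesg again ...
```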


2023-10-02 21:56:38

by Oleksandr Natalenko

Subject: Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

/cc Matthew, Andrew (please see below)

On Monday, 2 October 2023 12:42:42 CEST Bagas Sanjaya wrote:
> On Mon, Oct 02, 2023 at 08:20:15AM +0200, Oleksandr Natalenko wrote:
> > Hello.
> >
> > On Monday, 2 October 2023 1:45:44 CEST Bagas Sanjaya wrote:
> > > On Sun, Oct 01, 2023 at 06:32:34PM +0200, Oleksandr Natalenko wrote:
> > > > Hello.
> > > >
> > > > I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
> > > >
> > > > ```
> > > > BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
> > > >
> > > > Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> > > > drm_gem_put_pages+0x186/0x250
> > > > drm_gem_shmem_put_pages_locked+0x43/0xc0
> > > > drm_gem_shmem_object_vunmap+0x83/0xe0
> > > > drm_gem_vunmap_unlocked+0x46/0xb0
> > > > drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > > > drm_fb_helper_damage_work+0x96/0x170
> > > > process_one_work+0x254/0x470
> > > > worker_thread+0x55/0x4f0
> > > > kthread+0xe8/0x120
> > > > ret_from_fork+0x34/0x50
> > > > ret_from_fork_asm+0x1b/0x30
> > > >
> > > > kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k
> > > >
> > > > allocated by task 51 on cpu 0 at 14.668667s:
> > > > drm_gem_get_pages+0x94/0x2b0
> > > > drm_gem_shmem_get_pages+0x5d/0x110
> > > > drm_gem_shmem_object_vmap+0xc4/0x1e0
> > > > drm_gem_vmap_unlocked+0x3c/0x70
> > > > drm_client_buffer_vmap+0x23/0x50
> > > > drm_fbdev_generic_helper_fb_dirty+0xae/0x310
> > > > drm_fb_helper_damage_work+0x96/0x170
> > > > process_one_work+0x254/0x470
> > > > worker_thread+0x55/0x4f0
> > > > kthread+0xe8/0x120
> > > > ret_from_fork+0x34/0x50
> > > > ret_from_fork_asm+0x1b/0x30
> > > >
> > > > freed by task 51 on cpu 0 at 14.668697s:
> > > > drm_gem_put_pages+0x186/0x250
> > > > drm_gem_shmem_put_pages_locked+0x43/0xc0
> > > > drm_gem_shmem_object_vunmap+0x83/0xe0
> > > > drm_gem_vunmap_unlocked+0x46/0xb0
> > > > drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > > > drm_fb_helper_damage_work+0x96/0x170
> > > > process_one_work+0x254/0x470
> > > > worker_thread+0x55/0x4f0
> > > > kthread+0xe8/0x120
> > > > ret_from_fork+0x34/0x50
> > > > ret_from_fork_asm+0x1b/0x30
> > > >
> > > > CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e
> > > > Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
> > > > Workqueue: events drm_fb_helper_damage_work
> > > > ```
> > > >
> > > > This repeats a couple of times and then stops.
> > > >
> > > > Currently, I'm running v6.5.5. So far, there's no impact on how VM functions for me.
> > > >
> > > > The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
> > > >
> > >
> > > Do you have this issue on v6.4?
> >
> > No, I did not have this issue with v6.4.
> >
>
> Then proceed with kernel bisection. You can refer to
> Documentation/admin-guide/bug-bisect.rst in the kernel sources for the
> process.

Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?

In the git log between v6.4 and v6.5 I see this:

```
commit 3291e09a463870610b8227f32b16b19a587edf33
Author: Matthew Wilcox (Oracle) <[email protected]>
Date: Wed Jun 21 17:45:49 2023 +0100

drm: convert drm_gem_put_pages() to use a folio_batch

Remove a few hidden compound_head() calls by converting the returned page
to a folio once and using the folio APIs.
```

Thanks.

--
Oleksandr Natalenko (post-factum)



2023-10-02 23:06:26

by Oleksandr Natalenko

Subject: Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

On Monday, 2 October 2023 16:32:45 CEST Matthew Wilcox wrote:
> On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
> > > > > > BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
> > > > > >
> > > > > > Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> > > > > > drm_gem_put_pages+0x186/0x250
> > > > > > drm_gem_shmem_put_pages_locked+0x43/0xc0
> > > > > > drm_gem_shmem_object_vunmap+0x83/0xe0
> > > > > > drm_gem_vunmap_unlocked+0x46/0xb0
> > > > > > drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > > > > > drm_fb_helper_damage_work+0x96/0x170
> >
> > Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
>
> Yes, entirely plausible. I think you have two useful points to look at
> before delving into a full bisect -- 863a8e and the parent of 0b62af.
> If either of them work, I think you have no more work to do.

OK, I've done this against v6.5.5:

```
git log --oneline HEAD~3..
7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec"
8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()"
fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
```

then rebooted the host multiple times, and the issue is not seen any more.

So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.
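
(For reference, a rough sketch of how such a revert branch can be reproduced; only 3291e09a4638 is quoted by hash above, the two mm commits are identified by their subjects, and the reverts may need small conflict fixes:)

```
git checkout -b test v6.5.5
git revert --no-edit 3291e09a4638   # "drm: convert drm_gem_put_pages() to use a folio_batch"
# the pre-conversion code needs struct pagevec and check_move_unevictable_pages(),
# so their removal commits have to be reverted as well (hashes not quoted here;
# find them by subject in the v6.4..v6.5 history):
git revert --no-edit <"mm: remove check_move_unevictable_pages()">
git revert --no-edit <"mm: remove struct pagevec">
# rebuild, reboot a few times and watch dmesg for the KFENCE splat
```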

Thanks.

--
Oleksandr Natalenko (post-factum)



2023-10-05 13:57:23

by Matthew Wilcox

Subject: Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

On Thu, Oct 05, 2023 at 09:56:03AM +0200, Oleksandr Natalenko wrote:
> Hello.
>
> On Thursday, 5 October 2023 9:44:42 CEST Thomas Zimmermann wrote:
> > Hi
> >
> > On 02.10.23 at 17:38, Oleksandr Natalenko wrote:
> > > On Monday, 2 October 2023 16:32:45 CEST Matthew Wilcox wrote:
> > >> On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
> > >>>>>>> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
> > >>>>>>>
> > >>>>>>> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> > >>>>>>> drm_gem_put_pages+0x186/0x250
> > >>>>>>> drm_gem_shmem_put_pages_locked+0x43/0xc0
> > >>>>>>> drm_gem_shmem_object_vunmap+0x83/0xe0
> > >>>>>>> drm_gem_vunmap_unlocked+0x46/0xb0
> > >>>>>>> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > >>>>>>> drm_fb_helper_damage_work+0x96/0x170
> > >>>
> > >>> Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
> > >>
> > >> Yes, entirely plausible. I think you have two useful points to look at
> > >> before delving into a full bisect -- 863a8e and the parent of 0b62af.
> > >> If either of them work, I think you have no more work to do.
> > >
> > > OK, I've did this against v6.5.5:
> > >
> > > ```
> > > git log --oneline HEAD~3..
> > > 7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec"
> > > 8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()"
> > > fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
> > > ```
> > >
> > > then rebooted the host multiple times, and the issue is not seen any more.
> > >
> > > So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.
> >
> > Ignore my other email. It's apparently been fixed already. Thanks!
>
> Has it? I think I was able to identify offending commit, but I'm not aware of any fix to that.

I don't understand; you said reverting those DRM commits fixed the
problem, so 863a8eb3f270 is the solution. No?

2023-10-05 14:04:22

by Thomas Zimmermann

Subject: Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

Hi

On 01.10.23 at 18:32, Oleksandr Natalenko wrote:
> Hello.
>
> I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
>
> ```
> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
>
> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> drm_gem_put_pages+0x186/0x250
> drm_gem_shmem_put_pages_locked+0x43/0xc0
> drm_gem_shmem_object_vunmap+0x83/0xe0
> drm_gem_vunmap_unlocked+0x46/0xb0
> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> drm_fb_helper_damage_work+0x96/0x170
> process_one_work+0x254/0x470
> worker_thread+0x55/0x4f0
> kthread+0xe8/0x120
> ret_from_fork+0x34/0x50
> ret_from_fork_asm+0x1b/0x30
>
> kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k
>
> allocated by task 51 on cpu 0 at 14.668667s:
> drm_gem_get_pages+0x94/0x2b0
> drm_gem_shmem_get_pages+0x5d/0x110
> drm_gem_shmem_object_vmap+0xc4/0x1e0
> drm_gem_vmap_unlocked+0x3c/0x70
> drm_client_buffer_vmap+0x23/0x50
> drm_fbdev_generic_helper_fb_dirty+0xae/0x310
> drm_fb_helper_damage_work+0x96/0x170
> process_one_work+0x254/0x470
> worker_thread+0x55/0x4f0
> kthread+0xe8/0x120
> ret_from_fork+0x34/0x50
> ret_from_fork_asm+0x1b/0x30
>
> freed by task 51 on cpu 0 at 14.668697s:
> drm_gem_put_pages+0x186/0x250
> drm_gem_shmem_put_pages_locked+0x43/0xc0
> drm_gem_shmem_object_vunmap+0x83/0xe0
> drm_gem_vunmap_unlocked+0x46/0xb0
> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> drm_fb_helper_damage_work+0x96/0x170
> process_one_work+0x254/0x470
> worker_thread+0x55/0x4f0
> kthread+0xe8/0x120
> ret_from_fork+0x34/0x50
> ret_from_fork_asm+0x1b/0x30
>
> CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e
> Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
> Workqueue: events drm_fb_helper_damage_work
> ```
>
> This repeats a couple of times and then stops.
>
> Currently, I'm running v6.5.5. So far, there's no impact on how VM functions for me.
>
> The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446

There's nothing special about the cirrus driver. Can you please provide
the full output of 'lspci -v'?

Would you be able to bisect this bug?

Best regards
Thomas

>
> Please check.
>
> Thanks.
>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)



2023-10-05 14:16:18

by Oleksandr Natalenko

Subject: Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

On Thursday, 5 October 2023 15:05:27 CEST Matthew Wilcox wrote:
> On Thu, Oct 05, 2023 at 02:30:55PM +0200, Oleksandr Natalenko wrote:
> > No-no, sorry for possible confusion. Let me explain again:
> >
> > 1. we had an issue with i915, which was introduced by 0b62af28f249, and later was fixed by 863a8eb3f270
> > 2. now I've discovered another issue, which looks very similar to 1., but in a VM with Cirrus VGA, and it happens even while having 863a8eb3f270 applied
> > 3. I've tried reverting 3291e09a4638, after which I cannot reproduce the issue with Cirrus VGA, but clearly there was no fix for it discussed
> >
> > IOW, 863a8eb3f270 is the fix for 0b62af28f249, but not for 3291e09a4638. It looks like 3291e09a4638 requires a separate fix.
>
> Thank you! Sorry about the misunderstanding. Try this:
>
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index 6129b89bb366..44a948b80ee1 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -540,7 +540,7 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
>  	struct page **pages;
>  	struct folio *folio;
>  	struct folio_batch fbatch;
> -	int i, j, npages;
> +	long i, j, npages;
>  
>  	if (WARN_ON(!obj->filp))
>  		return ERR_PTR(-EINVAL);
> @@ -564,11 +564,13 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
>  
>  	i = 0;
>  	while (i < npages) {
> +		long nr;
>  		folio = shmem_read_folio_gfp(mapping, i,
>  				mapping_gfp_mask(mapping));
>  		if (IS_ERR(folio))
>  			goto fail;
> -		for (j = 0; j < folio_nr_pages(folio); j++, i++)
> +		nr = min(npages - i, folio_nr_pages(folio));
> +		for (j = 0; j < nr; j++, i++)
>  			pages[i] = folio_file_page(folio, i);
>  
>  		/* Make sure shmem keeps __GFP_DMA32 allocated pages in the

No issues after five reboots with this patch applied on top of v6.5.5.

Reported-by: Oleksandr Natalenko <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Fixes: 3291e09a4638 ("drm: convert drm_gem_put_pages() to use a folio_batch")
Cc: [email protected] # 6.5.x

Thank you!

--
Oleksandr Natalenko (post-factum)



2023-10-05 14:17:31

by Thomas Zimmermann

Subject: Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

Hi

On 02.10.23 at 17:38, Oleksandr Natalenko wrote:
> On Monday, 2 October 2023 16:32:45 CEST Matthew Wilcox wrote:
>> On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
>>>>>>> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
>>>>>>>
>>>>>>> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
>>>>>>> drm_gem_put_pages+0x186/0x250
>>>>>>> drm_gem_shmem_put_pages_locked+0x43/0xc0
>>>>>>> drm_gem_shmem_object_vunmap+0x83/0xe0
>>>>>>> drm_gem_vunmap_unlocked+0x46/0xb0
>>>>>>> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
>>>>>>> drm_fb_helper_damage_work+0x96/0x170
>>>
>>> Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
>>
>> Yes, entirely plausible. I think you have two useful points to look at
>> before delving into a full bisect -- 863a8e and the parent of 0b62af.
>> If either of them work, I think you have no more work to do.
>
> OK, I've did this against v6.5.5:
>
> ```
> git log --oneline HEAD~3..
> 7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec"
> 8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()"
> fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
> ```
>
> then rebooted the host multiple times, and the issue is not seen any more.
>
> So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.

Ignore my other email. It's apparently been fixed already. Thanks!

Best regards
Thomas

>
> Thanks.
>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)



2023-10-05 14:30:36

by Oleksandr Natalenko

Subject: Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

Hello.

On Thursday, 5 October 2023 9:44:42 CEST Thomas Zimmermann wrote:
> Hi
>
> On 02.10.23 at 17:38, Oleksandr Natalenko wrote:
> > On Monday, 2 October 2023 16:32:45 CEST Matthew Wilcox wrote:
> >> On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
> >>>>>>> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
> >>>>>>>
> >>>>>>> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> >>>>>>> drm_gem_put_pages+0x186/0x250
> >>>>>>> drm_gem_shmem_put_pages_locked+0x43/0xc0
> >>>>>>> drm_gem_shmem_object_vunmap+0x83/0xe0
> >>>>>>> drm_gem_vunmap_unlocked+0x46/0xb0
> >>>>>>> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> >>>>>>> drm_fb_helper_damage_work+0x96/0x170
> >>>
> >>> Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
> >>
> >> Yes, entirely plausible. I think you have two useful points to look at
> >> before delving into a full bisect -- 863a8e and the parent of 0b62af.
> >> If either of them work, I think you have no more work to do.
> >
> > OK, I've did this against v6.5.5:
> >
> > ```
> > git log --oneline HEAD~3..
> > 7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec"
> > 8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()"
> > fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
> > ```
> >
> > then rebooted the host multiple times, and the issue is not seen any more.
> >
> > So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.
>
> Ignore my other email. It's apparently been fixed already. Thanks!

Has it? I think I was able to identify the offending commit, but I'm not aware of any fix for it.

Thanks.

> Best regards
> Thomas
>
> >
> > Thanks.
> >
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Frankenstrasse 146, 90461 Nuernberg, Germany
> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> HRB 36809 (AG Nuernberg)
>


--
Oleksandr Natalenko (post-factum)



2023-10-05 16:16:21

by Oleksandr Natalenko

Subject: Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

Hello.

On Thursday, 5 October 2023 14:19:44 CEST Matthew Wilcox wrote:
> On Thu, Oct 05, 2023 at 09:56:03AM +0200, Oleksandr Natalenko wrote:
> > Hello.
> >
> > On Thursday, 5 October 2023 9:44:42 CEST Thomas Zimmermann wrote:
> > > Hi
> > >
> > > On 02.10.23 at 17:38, Oleksandr Natalenko wrote:
> > > > On Monday, 2 October 2023 16:32:45 CEST Matthew Wilcox wrote:
> > > >> On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
> > > >>>>>>> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
> > > >>>>>>>
> > > >>>>>>> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> > > >>>>>>> drm_gem_put_pages+0x186/0x250
> > > >>>>>>> drm_gem_shmem_put_pages_locked+0x43/0xc0
> > > >>>>>>> drm_gem_shmem_object_vunmap+0x83/0xe0
> > > >>>>>>> drm_gem_vunmap_unlocked+0x46/0xb0
> > > >>>>>>> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > > >>>>>>> drm_fb_helper_damage_work+0x96/0x170
> > > >>>
> > > >>> Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
> > > >>
> > > >> Yes, entirely plausible. I think you have two useful points to look at
> > > >> before delving into a full bisect -- 863a8e and the parent of 0b62af.
> > > >> If either of them work, I think you have no more work to do.
> > > >
> > > > OK, I've did this against v6.5.5:
> > > >
> > > > ```
> > > > git log --oneline HEAD~3..
> > > > 7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec"
> > > > 8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()"
> > > > fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
> > > > ```
> > > >
> > > > then rebooted the host multiple times, and the issue is not seen any more.
> > > >
> > > > So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.
> > >
> > > Ignore my other email. It's apparently been fixed already. Thanks!
> >
> > Has it? I think I was able to identify offending commit, but I'm not aware of any fix to that.
>
> I don't understand; you said reverting those DRM commits fixed the
> problem, so 863a8eb3f270 is the solution. No?

No-no, sorry for possible confusion. Let me explain again:

1. we had an issue with i915, which was introduced by 0b62af28f249, and later was fixed by 863a8eb3f270
2. now I've discovered another issue, which looks very similar to 1., but in a VM with Cirrus VGA, and it happens even while having 863a8eb3f270 applied
3. I've tried reverting 3291e09a4638, after which I cannot reproduce the issue with Cirrus VGA, but clearly no fix for it has been discussed

IOW, 863a8eb3f270 is the fix for 0b62af28f249, but not for 3291e09a4638. It looks like 3291e09a4638 requires a separate fix.

Hope this clears things up.

Thanks.

--
Oleksandr Natalenko (post-factum)



2023-10-05 16:19:20

by Matthew Wilcox

Subject: Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

On Thu, Oct 05, 2023 at 02:30:55PM +0200, Oleksandr Natalenko wrote:
> No-no, sorry for possible confusion. Let me explain again:
>
> 1. we had an issue with i915, which was introduced by 0b62af28f249, and later was fixed by 863a8eb3f270
> 2. now I've discovered another issue, which looks very similar to 1., but in a VM with Cirrus VGA, and it happens even while having 863a8eb3f270 applied
> 3. I've tried reverting 3291e09a4638, after which I cannot reproduce the issue with Cirrus VGA, but clearly there was no fix for it discussed
>
> IOW, 863a8eb3f270 is the fix for 0b62af28f249, but not for 3291e09a4638. It looks like 3291e09a4638 requires a separate fix.

Thank you! Sorry about the misunderstanding. Try this:

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 6129b89bb366..44a948b80ee1 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -540,7 +540,7 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
 	struct page **pages;
 	struct folio *folio;
 	struct folio_batch fbatch;
-	int i, j, npages;
+	long i, j, npages;
 
 	if (WARN_ON(!obj->filp))
 		return ERR_PTR(-EINVAL);
@@ -564,11 +564,13 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
 
 	i = 0;
 	while (i < npages) {
+		long nr;
 		folio = shmem_read_folio_gfp(mapping, i,
 				mapping_gfp_mask(mapping));
 		if (IS_ERR(folio))
 			goto fail;
-		for (j = 0; j < folio_nr_pages(folio); j++, i++)
+		nr = min(npages - i, folio_nr_pages(folio));
+		for (j = 0; j < nr; j++, i++)
 			pages[i] = folio_file_page(folio, i);
 
 		/* Make sure shmem keeps __GFP_DMA32 allocated pages in the
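
For context: the min() clamp matters when shmem returns a folio with more pages than there are slots left in the pages[] array, which drm_gem_get_pages() sizes for npages entries; without it the inner loop can write past the end of that array, which is consistent with the KFENCE report above (the corrupted kmalloc-4k object is allocated in drm_gem_get_pages() and freed in drm_gem_put_pages()). For anyone who wants to test this on a stable tree, a rough sketch (the branch and patch file names are only examples; build and install steps depend on your setup):

```
git checkout -b drm-gem-pages-fix v6.5.5
git apply the-clamp-fix.patch             # the diff above, saved to a file
make olddefconfig && make -j"$(nproc)"
# install the kernel, reboot a few times and watch dmesg for the KFENCE splat
```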