2023-06-26 09:35:43

by Thomas Hellström

Subject: [PATCH v2 0/4] drm/ttm: Fixes around resources and bulk moves

A couple of TTM fixes for issues that were either hit while developing the
xe driver or, in the case of the resource leak patches, discovered during
code inspection.

v2:
- Avoid a goto in patch 3 (Andi Shyti)
- Add some Reviewed-by tags

Thomas Hellström (4):
drm/ttm: Fix ttm_lru_bulk_move_pos_tail()
drm/ttm: Don't shadow the operation context
drm/ttm: Don't leak a resource on eviction error
drm/ttm: Don't leak a resource on swapout move error

drivers/gpu/drm/ttm/ttm_bo.c | 26 +++++++++++++-------------
drivers/gpu/drm/ttm/ttm_resource.c | 2 ++
2 files changed, 15 insertions(+), 13 deletions(-)

--
2.40.1



2023-06-26 09:45:24

by Thomas Hellström

Subject: [PATCH v2 1/4] drm/ttm: Fix ttm_lru_bulk_move_pos_tail()

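When ttm_lru_bulk_move_pos_tail() moved the first resource of a range to
the tail, pos->first was not updated, so both pos->first and pos->last
ended up pointing at the moved resource. This could lead to errors like
the one below. Fix this by advancing pos->first to the next entry before
the move.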

<3> [218.963342] BUG: KASAN: null-ptr-deref in ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
<3> [218.963456] Read of size 8 at addr 0000000000000038 by task xe_evict/1529
<3> [218.963546]
<3> [218.963566] CPU: 0 PID: 1529 Comm: xe_evict Not tainted 6.3.0-xe #1
<3> [218.963664] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake H DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.4064.A00.2102041619 02/04/2021
<3> [218.963841] Call Trace:
<3> [218.963881] <TASK>
<3> [218.963915] dump_stack_lvl+0x64/0xb0
<3> [218.963976] print_report+0x3e5/0x600
<3> [218.964036] ? ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
<3> [218.964127] kasan_report+0x96/0xc0
<3> [218.964183] ? ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
<3> [218.964276] ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
<3> [218.964365] ttm_bo_set_bulk_move+0x92/0x140 [ttm]
<3> [218.964454] xe_gem_object_close+0xc8/0x120 [xe]
<3> [218.964675] ? __pfx_xe_gem_object_close+0x10/0x10 [xe]
<3> [218.964908] ? drm_gem_object_handle_put_unlocked+0xc7/0x170 [drm]
<3> [218.965071] drm_gem_object_release_handle+0x45/0x80 [drm]
<3> [218.965220] ? __pfx_drm_gem_object_release_handle+0x10/0x10 [drm]
<3> [218.965381] idr_for_each+0xc9/0x180
<3> [218.965437] ? __pfx_idr_for_each+0x10/0x10
<3> [218.965504] drm_gem_release+0x20/0x30 [drm]
<3> [218.965637] drm_file_free.part.0+0x4cb/0x4f0 [drm]
<3> [218.965778] ? drm_close_helper.isra.0+0xb7/0xe0 [drm]
<3> [218.965921] drm_release_noglobal+0x49/0x90 [drm]
<3> [218.966061] __fput+0x122/0x450
<3> [218.966115] task_work_run+0xfe/0x190
<3> [218.966175] ? __pfx_task_work_run+0x10/0x10
<3> [218.966239] ? do_raw_spin_unlock+0xa7/0x140
<3> [218.966308] do_exit+0x55f/0x1430
<3> [218.966364] ? __pfx_lock_release+0x10/0x10
<3> [218.966431] ? do_raw_spin_lock+0x11d/0x1e0
<3> [218.966498] ? __pfx_do_exit+0x10/0x10
<3> [218.966554] ? __pfx_do_raw_spin_lock+0x10/0x10
<3> [218.966625] ? mark_held_locks+0x24/0x90
<3> [218.966688] ? lockdep_hardirqs_on_prepare+0x136/0x210
<3> [218.966768] do_group_exit+0x68/0x110
<3> [218.966828] __x64_sys_exit_group+0x2c/0x30
<3> [218.966896] do_syscall_64+0x3c/0x90
<3> [218.966955] entry_SYSCALL_64_after_hwframe+0x72/0xdc
<3> [218.967035] RIP: 0033:0x7f77b194f146
<3> [218.967094] Code: Unable to access opcode bytes at 0x7f77b194f11c.
<3> [218.967174] RSP: 002b:00007ffc64791188 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
<3> [218.967271] RAX: ffffffffffffffda RBX: 00007f77b1a548a0 RCX: 00007f77b194f146
<3> [218.967364] RDX: 0000000000000062 RSI: 000000000000003c RDI: 0000000000000062
<3> [218.967458] RBP: 0000000000000062 R08: 00000000000000e7 R09: ffffffffffffff78
<3> [218.967553] R10: 0000000000000058 R11: 0000000000000246 R12: 00007f77b1a548a0
<3> [218.967648] R13: 0000000000000003 R14: 00007f77b1a5d2e8 R15: 0000000000000000
<3> [218.967745] </TASK>

Fixes: fee2ede15542 ("drm/ttm: rework bulk move handling v5")
Cc: "Christian König" <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # v5.19+
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/411
Signed-off-by: Thomas Hellström <[email protected]>
---
drivers/gpu/drm/ttm/ttm_resource.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index 7333f7a87a2f..cb05e0a36576 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -86,6 +86,8 @@ static void ttm_lru_bulk_move_pos_tail(struct ttm_lru_bulk_move_pos *pos,
struct ttm_resource *res)
{
if (pos->last != res) {
+ if (pos->first == res)
+ pos->first = list_next_entry(res, lru);
list_move(&res->lru, &pos->last->lru);
pos->last = res;
}
--
2.40.1


2023-06-26 09:49:15

by Thomas Hellström

Subject: [PATCH v2 4/4] drm/ttm: Don't leak a resource on swapout move error

If moving the bo to system memory for swapout failed, the newly
allocated evict_mem resource was leaked. Fix this by freeing it on the
error path.

Fixes: bfa3357ef9ab ("drm/ttm: allocate resource object instead of embedding it v2")
Cc: Christian König <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # v5.14+
Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
---
drivers/gpu/drm/ttm/ttm_bo.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index c0e3bbd21d3d..d9a8f227f310 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1166,6 +1166,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
if (unlikely(ret != 0)) {
WARN(ret == -EMULTIHOP, "Unexpected multihop in swaput - likely driver bug.\n");
+ ttm_resource_free(bo, &evict_mem);
goto out;
}
}
--
2.40.1


2023-06-26 10:51:15

by Christian König

Subject: Re: [PATCH v2 1/4] drm/ttm: Fix ttm_lru_bulk_move_pos_tail()

I've already pushed the version from Teddy to drm-misc-fixes last week.

So no need for that one any more.

Christian.

On 26.06.23 at 11:14, Thomas Hellström wrote:
> The value of pos->first was not updated when the first resource of the
> range was moved. This could lead to errors like the one below.
> Fix this by updating pos->first when needed.
>
> [KASAN report snipped]
>
> Fixes: fee2ede15542 ("drm/ttm: rework bulk move handling v5")
> Cc: "Christian König" <[email protected]>
> Cc: "Christian König" <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: [email protected]
> Cc: <[email protected]> # v5.19+
> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/411
> Signed-off-by: Thomas Hellström <[email protected]>
> ---
> drivers/gpu/drm/ttm/ttm_resource.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
> index 7333f7a87a2f..cb05e0a36576 100644
> --- a/drivers/gpu/drm/ttm/ttm_resource.c
> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
> @@ -86,6 +86,8 @@ static void ttm_lru_bulk_move_pos_tail(struct ttm_lru_bulk_move_pos *pos,
> struct ttm_resource *res)
> {
> if (pos->last != res) {
> + if (pos->first == res)
> + pos->first = list_next_entry(res, lru);
> list_move(&res->lru, &pos->last->lru);
> pos->last = res;
> }


2023-06-26 11:53:07

by Christian König

Subject: Re: [PATCH v2 4/4] drm/ttm: Don't leak a resource on swapout move error

On 26.06.23 at 11:14, Thomas Hellström wrote:
> If moving the bo to system for swapout failed, we were leaking
> a resource. Fix.
>
> Fixes: bfa3357ef9ab ("drm/ttm: allocate resource object instead of embedding it v2")
> Cc: Christian König <[email protected]>
> Cc: "Christian König" <[email protected]>
> Cc: [email protected]
> Cc: <[email protected]> # v5.14+
> Signed-off-by: Thomas Hellström <[email protected]>
> Reviewed-by: Nirmoy Das <[email protected]>
> Reviewed-by: Andi Shyti <[email protected]>

Reviewed-by: Christian König <[email protected]>

> ---
> drivers/gpu/drm/ttm/ttm_bo.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index c0e3bbd21d3d..d9a8f227f310 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1166,6 +1166,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
> ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
> if (unlikely(ret != 0)) {
> WARN(ret == -EMULTIHOP, "Unexpected multihop in swaput - likely driver bug.\n");
> + ttm_resource_free(bo, &evict_mem);
> goto out;
> }
> }


2023-06-26 12:44:58

by Thomas Hellström

Subject: Re: [PATCH v2 4/4] drm/ttm: Don't leak a resource on swapout move error

Hi, Christian,

Will you take a look at 2/4 as well? Will you merge these?

Thanks,

Thomas


On 6/26/23 13:33, Christian König wrote:
> Am 26.06.23 um 11:14 schrieb Thomas Hellström:
>> If moving the bo to system for swapout failed, we were leaking
>> a resource. Fix.
>>
>> Fixes: bfa3357ef9ab ("drm/ttm: allocate resource object instead of embedding it v2")
>> Cc: Christian König <[email protected]>
>> Cc: "Christian König" <[email protected]>
>> Cc: [email protected]
>> Cc: <[email protected]> # v5.14+
>> Signed-off-by: Thomas Hellström <[email protected]>
>> Reviewed-by: Nirmoy Das <[email protected]>
>> Reviewed-by: Andi Shyti <[email protected]>
>
> Reviewed-by: Christian König <[email protected]>
>
>> ---
>>  drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index c0e3bbd21d3d..d9a8f227f310 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -1166,6 +1166,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
>>          ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
>>          if (unlikely(ret != 0)) {
>>              WARN(ret == -EMULTIHOP, "Unexpected multihop in swaput - likely driver bug.\n");
>> +            ttm_resource_free(bo, &evict_mem);
>>              goto out;
>>          }
>>      }
>

2023-06-27 09:43:08

by Thomas Hellström

Subject: Re: [PATCH v2 0/4] drm/ttm: Fixes around resources and bulk moves


On 6/26/23 11:14, Thomas Hellström wrote:
> A couple of ttm fixes for issues that either were hit while developing the
> xe driver or, for the resource leak patches, discovered during code
> inspection.
>
> v2:
> - Avoid a goto in patch 3 (Andi Shyti)
> - Add some Reviewed-by tags
>
> Thomas Hellström (4):
> drm/ttm: Fix ttm_lru_bulk_move_pos_tail()
> drm/ttm: Don't shadow the operation context
> drm/ttm: Don't leak a resource on eviction error
> drm/ttm: Don't leak a resource on swapout move error
>
> drivers/gpu/drm/ttm/ttm_bo.c | 26 +++++++++++++-------------
> drivers/gpu/drm/ttm/ttm_resource.c | 2 ++
> 2 files changed, 15 insertions(+), 13 deletions(-)
>
Pushed 2/4 to drm-misc-next, 3/4 & 4/4 to drm-misc-fixes.

/Thomas