2023-07-13 21:36:01

by André Almeida

Subject: [PATCH v2 4/6] drm/amdgpu: Limit info in coredump for kernel threads

If a kernel thread caused the reset, the information available to be
logged will be limited, so return early in the dump function.

Signed-off-by: André Almeida <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e80670420586..07546781b8b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4988,10 +4988,14 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
 	drm_printf(&p, "kernel: " UTS_RELEASE "\n");
 	drm_printf(&p, "module: " KBUILD_MODNAME "\n");
 	drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
-	if (coredump->reset_task_info.pid)
+	if (coredump->reset_task_info.pid) {
 		drm_printf(&p, "process_name: %s PID: %d\n",
 			   coredump->reset_task_info.process_name,
 			   coredump->reset_task_info.pid);
+	} else {
+		drm_printf(&p, "GPU reset caused by a kernel thread\n");
+		return count - iter.remain;
+	}

 	if (coredump->reset_vram_lost)
 		drm_printf(&p, "VRAM is lost due to GPU reset!\n");
--
2.41.0



2023-07-14 08:07:12

by Christian König

Subject: Re: [PATCH v2 4/6] drm/amdgpu: Limit info in coredump for kernel threads



On 13.07.23 at 23:32, André Almeida wrote:
> If a kernel thread caused the reset, the information available to be
> logged will be limited, so return early in the dump function.

Why? The register values and vram lost state should still be valid.

Christian.

>
> Signed-off-by: André Almeida <[email protected]>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index e80670420586..07546781b8b8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4988,10 +4988,14 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
>  	drm_printf(&p, "kernel: " UTS_RELEASE "\n");
>  	drm_printf(&p, "module: " KBUILD_MODNAME "\n");
>  	drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
> -	if (coredump->reset_task_info.pid)
> +	if (coredump->reset_task_info.pid) {
>  		drm_printf(&p, "process_name: %s PID: %d\n",
>  			   coredump->reset_task_info.process_name,
>  			   coredump->reset_task_info.pid);
> +	} else {
> +		drm_printf(&p, "GPU reset caused by a kernel thread\n");
> +		return count - iter.remain;
> +	}
>
>  	if (coredump->reset_vram_lost)
>  		drm_printf(&p, "VRAM is lost due to GPU reset!\n");


2023-07-14 12:44:00

by André Almeida

Subject: Re: [PATCH v2 4/6] drm/amdgpu: Limit info in coredump for kernel threads

On 14/07/2023 04:52, Christian König wrote:
>
>
> On 13.07.23 at 23:32, André Almeida wrote:
>> If a kernel thread caused the reset, the information available to be
>> logged will be limited, so return early in the dump function.
>
> Why? The register values and vram lost state should still be valid.
>

Fair enough. I was thinking about the newly added information, such as
ring and job, that won't be available for this type of thread. I'll drop
this patch in the next version.

> Christian.
>
>>
>> Signed-off-by: André Almeida <[email protected]>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index e80670420586..07546781b8b8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -4988,10 +4988,14 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
>>       drm_printf(&p, "kernel: " UTS_RELEASE "\n");
>>       drm_printf(&p, "module: " KBUILD_MODNAME "\n");
>>       drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
>> -    if (coredump->reset_task_info.pid)
>> +    if (coredump->reset_task_info.pid) {
>>           drm_printf(&p, "process_name: %s PID: %d\n",
>>                  coredump->reset_task_info.process_name,
>>                  coredump->reset_task_info.pid);
>> +    } else {
>> +        drm_printf(&p, "GPU reset caused by a kernel thread\n");
>> +        return count - iter.remain;
>> +    }
>>       if (coredump->reset_vram_lost)
>>           drm_printf(&p, "VRAM is lost due to GPU reset!\n");
>
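
As a rough illustration of the point raised in review (that the VRAM lost
state is still worth reporting when a kernel thread caused the reset), here
is a minimal user-space sketch. It is not code from this series: the struct
sketch_coredump type and the sketch_format_dump() helper are hypothetical
stand-ins, and snprintf() replaces the driver's drm_printf(). The only
behaviour it models is "note the kernel thread, then keep going" instead of
returning early.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's coredump state; not the real struct. */
struct sketch_coredump {
	int pid;                  /* 0 models a reset caused by a kernel thread */
	const char *process_name; /* only meaningful when pid != 0 */
	bool vram_lost;
};

/*
 * Format the task and VRAM lines into buf (always NUL-terminated when
 * size > 0) and return the number of bytes written, excluding the NUL.
 * Unlike the patch above, the kernel-thread branch does not return early.
 */
size_t sketch_format_dump(const struct sketch_coredump *cd,
			  char *buf, size_t size)
{
	size_t len;
	int n;

	if (size == 0)
		return 0;

	if (cd->pid)
		n = snprintf(buf, size, "process_name: %s PID: %d\n",
			     cd->process_name, cd->pid);
	else
		n = snprintf(buf, size, "GPU reset caused by a kernel thread\n");

	if (n < 0) {
		buf[0] = '\0';
		return 0;
	}
	/* snprintf() reports the untruncated length; clamp to what fits. */
	len = (size_t)n < size ? (size_t)n : size - 1;

	/* Still valid for kernel threads, so report it in both cases. */
	if (cd->vram_lost) {
		n = snprintf(buf + len, size - len,
			     "VRAM is lost due to GPU reset!\n");
		if (n > 0)
			len += (size_t)n < size - len ? (size_t)n
						      : size - len - 1;
	}

	return len;
}
```

With pid set to 0 and vram_lost set to true, the output contains both the
kernel thread notice and the VRAM line, which is the behaviour the early
return would have suppressed.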