LinuxLists.cc - Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device

2024-01-23 01:35:37

Subject: Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4

On Mon, 22 Jan 2024 19:29:41 -0500
"Bhardwaj, Rajneesh" <[email protected]> wrote:

>
> In one of my previous revisions of this patch when I was experimenting,
> I used something like below. Wonder if that could work in your case
> and/or in general.
>
>
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c
> b/drivers/gpu/drm/ttm/ttm_device.c
>
> index 43e27ab77f95..4c3902b94be4 100644
>
> --- a/drivers/gpu/drm/ttm/ttm_device.c
>
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
>
> @@ -195,6 +195,7 @@ int ttm_device_init(struct ttm_device *bdev, struct
> ttm_device_funcs *funcs,
>
> bool use_dma_alloc, bool use_dma32){
>
> struct ttm_global *glob = &ttm_glob;
>
> +bool node_has_cpu = false;
>
> int ret;
>
> if (WARN_ON(vma_manager == NULL))
>
> @@ -213,7 +214,12 @@ int ttm_device_init(struct ttm_device *bdev, struct
> ttm_device_funcs *funcs,
>
> bdev->funcs = funcs;
>
> ttm_sys_man_init(bdev);
>
> -ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc, use_dma32);
>
> +
>
> +node_has_cpu = node_state(dev->numa_node, N_CPU);

Considering that qxl_ttm_init() passes in dev = NULL, the above would blow
up just the same.

-- Steve

>
> +if (node_has_cpu)
>
> +ttm_pool_init(&bdev->pool, dev, dev->numa_node, use_dma_alloc, use_dma32);
>
> +else
>
> +ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc,
>
> +use_dma32);
>
> bdev->vma_manager = vma_manager;
>
> spin_lock_init(&bdev->lru_lock);
>
>
> >
> > -- Steve

2024-01-23 01:36:48

by Bhardwaj, Rajneesh

[permalink] [raw]

Subject: Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4

On 1/22/2024 7:34 PM, Steven Rostedt wrote:
> On Mon, 22 Jan 2024 19:29:41 -0500
> "Bhardwaj, Rajneesh" <[email protected]> wrote:
>
>> In one of my previous revisions of this patch when I was experimenting,
>> I used something like below. Wonder if that could work in your case
>> and/or in general.
>>
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c
>> b/drivers/gpu/drm/ttm/ttm_device.c
>>
>> index 43e27ab77f95..4c3902b94be4 100644
>>
>> --- a/drivers/gpu/drm/ttm/ttm_device.c
>>
>> +++ b/drivers/gpu/drm/ttm/ttm_device.c
>>
>> @@ -195,6 +195,7 @@ int ttm_device_init(struct ttm_device *bdev, struct
>> ttm_device_funcs *funcs,
>>
>> bool use_dma_alloc, bool use_dma32){
>>
>> struct ttm_global *glob = &ttm_glob;
>>
>> +bool node_has_cpu = false;
>>
>> int ret;
>>
>> if (WARN_ON(vma_manager == NULL))
>>
>> @@ -213,7 +214,12 @@ int ttm_device_init(struct ttm_device *bdev, struct
>> ttm_device_funcs *funcs,
>>
>> bdev->funcs = funcs;
>>
>> ttm_sys_man_init(bdev);
>>
>> -ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc, use_dma32);
>>
>> +
>>
>> +node_has_cpu = node_state(dev->numa_node, N_CPU);
> Considering that qxl_ttm_init() passes in dev = NULL, the above would blow
> up just the same.

I agree, I think we need something like you suggested i.e.

+ ttm_pool_init(&bdev->pool, dev, dev ? dev_to_node(dev) : NUMA_NO_NODE,
+ use_dma_alloc, use_dma32);

I am not quite sure if the above node_has_cpu change will be a better
solution in general, along with the NULL pointer check as you suggested.
If you prefer that, then I can send a fix otherwise, your fix looks good
to me.

>
> -- Steve
>
>
>> +if (node_has_cpu)
>>
>> +ttm_pool_init(&bdev->pool, dev, dev->numa_node, use_dma_alloc, use_dma32);
>>
>> +else
>>
>> +ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc,
>>
>> +use_dma32);
>>
>> bdev->vma_manager = vma_manager;
>>
>> spin_lock_init(&bdev->lru_lock);
>>
>>
>>> -- Steve