LinuxLists.cc - [PATCH] drm/amdkfd: Fix potential NULL pointer dereferences

2018-01-10 17:14:46

Subject: [PATCH] drm/amdkfd: Fix potential NULL pointer dereferences

In case kfd_get_process_device_data returns null, there are some
null pointer dereferences in functions kfd_bind_processes_to_device
and kfd_unbind_processes_from_device.

Fix this by null checking pdd before dereferencing it.

Addresses-Coverity-ID: 1463794 ("Dereference null return value")
Addresses-Coverity-ID: 1463772 ("Dereference null return value")
Signed-off-by: Gustavo A. R. Silva <[email protected]>
---
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index a22fb071..29d51d5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -461,6 +461,13 @@ int kfd_bind_processes_to_device(struct kfd_dev *dev)
hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
mutex_lock(&p->mutex);
pdd = kfd_get_process_device_data(dev, p);
+
+ if (!pdd) {
+ pr_err("Process device data doesn't exist\n");
+ mutex_unlock(&p->mutex);
+ return -EFAULT;
+ }
+
if (pdd->bound != PDD_BOUND_SUSPENDED) {
mutex_unlock(&p->mutex);
continue;
@@ -501,6 +508,11 @@ void kfd_unbind_processes_from_device(struct kfd_dev *dev)
mutex_lock(&p->mutex);
pdd = kfd_get_process_device_data(dev, p);

+ if (!pdd) {
+ mutex_unlock(&p->mutex);
+ return;
+ }
+
if (pdd->bound == PDD_BOUND)
pdd->bound = PDD_BOUND_SUSPENDED;
mutex_unlock(&p->mutex);
--
2.7.4

2018-01-10 18:33:28

by Felix Kuehling

[permalink] [raw]

Subject: Re: [PATCH] drm/amdkfd: Fix potential NULL pointer dereferences

Hi Gustavo,

Thanks for catching that. When returning a fault, I think you also need
to srcu_read_unlock(&kfd_processes_srcu, idx).

However, instead of returning an error, I think I'd prefer to skip PDDs
that can't be found with continue statements. That way others would
still suspend and resume successfully. Maybe just print a WARN_ON for
PDDs that aren't found, because that's an unexpected situation,
currently. Maybe in the future it could be normal thing if we ever
support GPU hotplug.

Regards,
Felix

On 2018-01-10 11:50 AM, Gustavo A. R. Silva wrote:
> In case kfd_get_process_device_data returns null, there are some
> null pointer dereferences in functions kfd_bind_processes_to_device
> and kfd_unbind_processes_from_device.
>
> Fix this by null checking pdd before dereferencing it.
>
> Addresses-Coverity-ID: 1463794 ("Dereference null return value")
> Addresses-Coverity-ID: 1463772 ("Dereference null return value")
> Signed-off-by: Gustavo A. R. Silva <[email protected]>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index a22fb071..29d51d5 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -461,6 +461,13 @@ int kfd_bind_processes_to_device(struct kfd_dev *dev)
> hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
> mutex_lock(&p->mutex);
> pdd = kfd_get_process_device_data(dev, p);
> +
> + if (!pdd) {
> + pr_err("Process device data doesn't exist\n");
> + mutex_unlock(&p->mutex);
> + return -EFAULT;
> + }
> +
> if (pdd->bound != PDD_BOUND_SUSPENDED) {
> mutex_unlock(&p->mutex);
> continue;
> @@ -501,6 +508,11 @@ void kfd_unbind_processes_from_device(struct kfd_dev *dev)
> mutex_lock(&p->mutex);
> pdd = kfd_get_process_device_data(dev, p);
>
> + if (!pdd) {
> + mutex_unlock(&p->mutex);
> + return;
> + }
> +
> if (pdd->bound == PDD_BOUND)
> pdd->bound = PDD_BOUND_SUSPENDED;
> mutex_unlock(&p->mutex);

2018-01-10 22:26:12

by Felix Kuehling

[permalink] [raw]

Subject: Re: [PATCH] drm/amdkfd: Fix potential NULL pointer dereferences

Yeah, this looks good to me.

Regards,
Felix

On 2018-01-10 04:58 PM, Gustavo A. R. Silva wrote:
> Hi Felix,
>
> Quoting Felix Kuehling <[email protected]>:
>
>> Hi Gustavo,
>>
>> Thanks for catching that. When returning a fault, I think you also need
>> to srcu_read_unlock(&kfd_processes_srcu, idx).
>>
>> However, instead of returning an error, I think I'd prefer to skip PDDs
>> that can't be found with continue statements. That way others would
>> still suspend and resume successfully. Maybe just print a WARN_ON for
>> PDDs that aren't found, because that's an unexpected situation,
>> currently. Maybe in the future it could be normal thing if we ever
>> support GPU hotplug.
>>
>
> I got it. In that case, what do you think about the following patch
> instead?
>
> index a22fb071..4ff5f0f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -461,7 +461,8 @@ int kfd_bind_processes_to_device(struct kfd_dev *dev)
>         hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
>                 mutex_lock(&p->mutex);
>                 pdd = kfd_get_process_device_data(dev, p);
> -               if (pdd->bound != PDD_BOUND_SUSPENDED) {
> +
> +               if (WARN_ON(!pdd) || pdd->bound != PDD_BOUND_SUSPENDED) {
>                         mutex_unlock(&p->mutex);
>                         continue;
>                 }
> @@ -501,6 +502,11 @@ void kfd_unbind_processes_from_device(struct
> kfd_dev *dev)
>                 mutex_lock(&p->mutex);
>                 pdd = kfd_get_process_device_data(dev, p);
>
> +               if (WARN_ON(!pdd)) {
> +                       mutex_unlock(&p->mutex);
> +                       continue;
> +               }
> +
>                 if (pdd->bound == PDD_BOUND)
>                         pdd->bound = PDD_BOUND_SUSPENDED;
>                 mutex_unlock(&p->mutex);
>
>
> Thank you for the feedback.
> --
> Gustavo
>
>> Regards,
>> Felix
>>
>>
>> On 2018-01-10 11:50 AM, Gustavo A. R. Silva wrote:
>>> In case kfd_get_process_device_data returns null, there are some
>>> null pointer dereferences in functions kfd_bind_processes_to_device
>>> and kfd_unbind_processes_from_device.
>>>
>>> Fix this by null checking pdd before dereferencing it.
>>>
>>> Addresses-Coverity-ID: 1463794 ("Dereference null return value")
>>> Addresses-Coverity-ID: 1463772 ("Dereference null return value")
>>> Signed-off-by: Gustavo A. R. Silva <[email protected]>
>>> ---
>>> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 12 ++++++++++++
>>> 1 file changed, 12 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> index a22fb071..29d51d5 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> @@ -461,6 +461,13 @@ int kfd_bind_processes_to_device(struct kfd_dev
>>> *dev)
>>>      hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
>>>          mutex_lock(&p->mutex);
>>>          pdd = kfd_get_process_device_data(dev, p);
>>> +
>>> +        if (!pdd) {
>>> +            pr_err("Process device data doesn't exist\n");
>>> +            mutex_unlock(&p->mutex);
>>> +            return -EFAULT;
>>> +        }
>>> +
>>>          if (pdd->bound != PDD_BOUND_SUSPENDED) {
>>>              mutex_unlock(&p->mutex);
>>>              continue;
>>> @@ -501,6 +508,11 @@ void kfd_unbind_processes_from_device(struct
>>> kfd_dev *dev)
>>>          mutex_lock(&p->mutex);
>>>          pdd = kfd_get_process_device_data(dev, p);
>>>
>>> +        if (!pdd) {
>>> +            mutex_unlock(&p->mutex);
>>> +            return;
>>> +        }
>>> +
>>>          if (pdd->bound == PDD_BOUND)
>>>              pdd->bound = PDD_BOUND_SUSPENDED;
>>>          mutex_unlock(&p->mutex);
>
>
>
>
>
>

2018-01-10 22:44:48

by Gustavo A. R. Silva

[permalink] [raw]

Subject: Re: [PATCH] drm/amdkfd: Fix potential NULL pointer dereferences

Hi Felix,

Quoting Felix Kuehling <[email protected]>:

> Hi Gustavo,
>
> Thanks for catching that. When returning a fault, I think you also need
> to srcu_read_unlock(&kfd_processes_srcu, idx).
>
> However, instead of returning an error, I think I'd prefer to skip PDDs
> that can't be found with continue statements. That way others would
> still suspend and resume successfully. Maybe just print a WARN_ON for
> PDDs that aren't found, because that's an unexpected situation,
> currently. Maybe in the future it could be normal thing if we ever
> support GPU hotplug.
>

I got it. In that case, what do you think about the following patch instead?

index a22fb071..4ff5f0f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -461,7 +461,8 @@ int kfd_bind_processes_to_device(struct kfd_dev *dev)
hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
mutex_lock(&p->mutex);
pdd = kfd_get_process_device_data(dev, p);
- if (pdd->bound != PDD_BOUND_SUSPENDED) {
+
+ if (WARN_ON(!pdd) || pdd->bound != PDD_BOUND_SUSPENDED) {
mutex_unlock(&p->mutex);
continue;
}
@@ -501,6 +502,11 @@ void kfd_unbind_processes_from_device(struct
kfd_dev *dev)
mutex_lock(&p->mutex);
pdd = kfd_get_process_device_data(dev, p);

+ if (WARN_ON(!pdd)) {
+ mutex_unlock(&p->mutex);
+ continue;
+ }
+
if (pdd->bound == PDD_BOUND)
pdd->bound = PDD_BOUND_SUSPENDED;
mutex_unlock(&p->mutex);

Thank you for the feedback.
--
Gustavo

> Regards,
> Felix
>
>
> On 2018-01-10 11:50 AM, Gustavo A. R. Silva wrote:
>> In case kfd_get_process_device_data returns null, there are some
>> null pointer dereferences in functions kfd_bind_processes_to_device
>> and kfd_unbind_processes_from_device.
>>
>> Fix this by null checking pdd before dereferencing it.
>>
>> Addresses-Coverity-ID: 1463794 ("Dereference null return value")
>> Addresses-Coverity-ID: 1463772 ("Dereference null return value")
>> Signed-off-by: Gustavo A. R. Silva <[email protected]>
>> ---
>> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> index a22fb071..29d51d5 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> @@ -461,6 +461,13 @@ int kfd_bind_processes_to_device(struct kfd_dev *dev)
>> hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
>> mutex_lock(&p->mutex);
>> pdd = kfd_get_process_device_data(dev, p);
>> +
>> + if (!pdd) {
>> + pr_err("Process device data doesn't exist\n");
>> + mutex_unlock(&p->mutex);
>> + return -EFAULT;
>> + }
>> +
>> if (pdd->bound != PDD_BOUND_SUSPENDED) {
>> mutex_unlock(&p->mutex);
>> continue;
>> @@ -501,6 +508,11 @@ void kfd_unbind_processes_from_device(struct
>> kfd_dev *dev)
>> mutex_lock(&p->mutex);
>> pdd = kfd_get_process_device_data(dev, p);
>>
>> + if (!pdd) {
>> + mutex_unlock(&p->mutex);
>> + return;
>> + }
>> +
>> if (pdd->bound == PDD_BOUND)
>> pdd->bound = PDD_BOUND_SUSPENDED;
>> mutex_unlock(&p->mutex);