2018-06-26 14:49:27

by Michel Dänzer

[permalink] [raw]
Subject: [PATCH] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace

From: Michel Dänzer <[email protected]>

Fixes the BUG_ON spuriously triggering under the following
circumstances:

* ttm_eu_reserve_buffers processes a list containing multiple BOs using
the same reservation object, so it calls
reservation_object_reserve_shared with that reservation object once
for each such BO.
* In reservation_object_reserve_shared, old->shared_count ==
old->shared_max - 1, so obj->staged is freed in preparation of an
in-place update.
* ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
once for each of the BOs above, always with the same fence.
* The first call adds the fence in the remaining free slot, after which
old->shared_count == old->shared_max.

In the next call to reservation_object_add_shared_fence, the BUG_ON
triggers. However, nothing bad would happen in
reservation_object_add_shared_inplace, since the fence is already in the
reservation object.

Prevent this by moving the BUG_ON to where an overflow would actually
happen (e.g. if a buggy caller didn't call
reservation_object_reserve_shared before).

Cc: [email protected]
Signed-off-by: Michel Dänzer <[email protected]>
---
drivers/dma-buf/reservation.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 314eb1071cce..532545b9488e 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
if (signaled) {
RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
} else {
+ BUG_ON(fobj->shared_count >= fobj->shared_max);
RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
fobj->shared_count++;
}
@@ -230,10 +231,9 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
old = reservation_object_get_list(obj);
obj->staged = NULL;

- if (!fobj) {
- BUG_ON(old->shared_count >= old->shared_max);
+ if (!fobj)
reservation_object_add_shared_inplace(obj, old, fence);
- } else
+ else
reservation_object_add_shared_replace(obj, old, fobj, fence);
}
EXPORT_SYMBOL(reservation_object_add_shared_fence);
--
2.18.0



2018-06-27 11:58:09

by Chris Wilson

[permalink] [raw]
Subject: Re: [PATCH] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace

Quoting Michel Dänzer (2018-06-26 15:31:47)
> From: Michel Dänzer <[email protected]>
>
> Fixes the BUG_ON spuriously triggering under the following
> circumstances:
>
> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
> the same reservation object, so it calls
> reservation_object_reserve_shared with that reservation object once
> for each such BO.
> * In reservation_object_reserve_shared, old->shared_count ==
> old->shared_max - 1, so obj->staged is freed in preparation of an
> in-place update.
> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
> once for each of the BOs above, always with the same fence.
> * The first call adds the fence in the remaining free slot, after which
> old->shared_count == old->shared_max.
>
> In the next call to reservation_object_add_shared_fence, the BUG_ON
> triggers. However, nothing bad would happen in
> reservation_object_add_shared_inplace, since the fence is already in the
> reservation object.
>
> Prevent this by moving the BUG_ON to where an overflow would actually
> happen (e.g. if a buggy caller didn't call
> reservation_object_reserve_shared before).
>
> Cc: [email protected]
> Signed-off-by: Michel Dänzer <[email protected]>

I've convinced myself (or rather have not found a valid argument
against) that being able to call reserve_shared + add_shared multiple
times for the same fence is an intended part of reservation_object API

I'd double check with Christian though.

Reviewed-by: Chris Wilson <[email protected]>

> drivers/dma-buf/reservation.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 314eb1071cce..532545b9488e 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
> if (signaled) {
> RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
> } else {
> + BUG_ON(fobj->shared_count >= fobj->shared_max);

Personally I would just let kasan detect this and throw away the BUG_ON
or at least move it behind some DMABUF_BUG_ON().
-Chris

2018-06-27 16:07:23

by Michel Dänzer

[permalink] [raw]
Subject: Re: [PATCH] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace

On 2018-06-27 01:50 PM, Chris Wilson wrote:
> Quoting Michel Dänzer (2018-06-26 15:31:47)
>> From: Michel Dänzer <[email protected]>
>>
>> Fixes the BUG_ON spuriously triggering under the following
>> circumstances:
>>
>> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>> the same reservation object, so it calls
>> reservation_object_reserve_shared with that reservation object once
>> for each such BO.
>> * In reservation_object_reserve_shared, old->shared_count ==
>> old->shared_max - 1, so obj->staged is freed in preparation of an
>> in-place update.
>> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>> once for each of the BOs above, always with the same fence.
>> * The first call adds the fence in the remaining free slot, after which
>> old->shared_count == old->shared_max.
>>
>> In the next call to reservation_object_add_shared_fence, the BUG_ON
>> triggers. However, nothing bad would happen in
>> reservation_object_add_shared_inplace, since the fence is already in the
>> reservation object.
>>
>> Prevent this by moving the BUG_ON to where an overflow would actually
>> happen (e.g. if a buggy caller didn't call
>> reservation_object_reserve_shared before).
>>
>> Cc: [email protected]
>> Signed-off-by: Michel Dänzer <[email protected]>
>
> I've convinced myself (or rather have not found a valid argument
> against) that being able to call reserve_shared + add_shared multiple
> times for the same fence is an intended part of reservation_object API
>
> I'd double check with Christian though.

Right, I'm interested in Christian's feedback.


> Reviewed-by: Chris Wilson <[email protected]>

Thanks!


>> drivers/dma-buf/reservation.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
>> index 314eb1071cce..532545b9488e 100644
>> --- a/drivers/dma-buf/reservation.c
>> +++ b/drivers/dma-buf/reservation.c
>> @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>> if (signaled) {
>> RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
>> } else {
>> + BUG_ON(fobj->shared_count >= fobj->shared_max);
>
> Personally I would just let kasan detect this and throw away the BUG_ON
> or at least move it behind some DMABUF_BUG_ON().

Hmm. Normally, I'm not a fan of BUG(_ON) either. But in this case, it's
clear that the caller is buggy, and proceeding to write beyond the end
of the array could have far-reaching consequences. I'm leaving that to
somebody else.


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2018-07-04 08:33:32

by Christian König

[permalink] [raw]
Subject: Re: [PATCH] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace

Am 26.06.2018 um 16:31 schrieb Michel Dänzer:
> From: Michel Dänzer <[email protected]>
>
> Fixes the BUG_ON spuriously triggering under the following
> circumstances:
>
> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
> the same reservation object, so it calls
> reservation_object_reserve_shared with that reservation object once
> for each such BO.
> * In reservation_object_reserve_shared, old->shared_count ==
> old->shared_max - 1, so obj->staged is freed in preparation of an
> in-place update.
> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
> once for each of the BOs above, always with the same fence.
> * The first call adds the fence in the remaining free slot, after which
> old->shared_count == old->shared_max.

Well, the explanation here is not correct. For multiple BOs using the
same reservation object we won't call
reservation_object_add_shared_fence() multiple times because we move
those to the duplicates list in ttm_eu_reserve_buffers().

But this bug can still happen because we call
reservation_object_add_shared_fence() manually with fences for the same
context in a couple of places.

One prominent case which comes to my mind are for the VM BOs during
updates. Another possibility are VRAM BOs which need to be cleared.

>
> In the next call to reservation_object_add_shared_fence, the BUG_ON
> triggers. However, nothing bad would happen in
> reservation_object_add_shared_inplace, since the fence is already in the
> reservation object.
>
> Prevent this by moving the BUG_ON to where an overflow would actually
> happen (e.g. if a buggy caller didn't call
> reservation_object_reserve_shared before).
>
> Cc: [email protected]
> Signed-off-by: Michel Dänzer <[email protected]>

Reviewed-by: Christian König <[email protected]>.

Regards,
Christian.

> ---
> drivers/dma-buf/reservation.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 314eb1071cce..532545b9488e 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
> if (signaled) {
> RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
> } else {
> + BUG_ON(fobj->shared_count >= fobj->shared_max);
> RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
> fobj->shared_count++;
> }
> @@ -230,10 +231,9 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
> old = reservation_object_get_list(obj);
> obj->staged = NULL;
>
> - if (!fobj) {
> - BUG_ON(old->shared_count >= old->shared_max);
> + if (!fobj)
> reservation_object_add_shared_inplace(obj, old, fence);
> - } else
> + else
> reservation_object_add_shared_replace(obj, old, fobj, fence);
> }
> EXPORT_SYMBOL(reservation_object_add_shared_fence);


2018-07-04 09:11:03

by Michel Dänzer

[permalink] [raw]
Subject: Re: [PATCH] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace

On 2018-07-04 10:31 AM, Christian König wrote:
> Am 26.06.2018 um 16:31 schrieb Michel Dänzer:
>> From: Michel Dänzer <[email protected]>
>>
>> Fixes the BUG_ON spuriously triggering under the following
>> circumstances:
>>
>> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>>    the same reservation object, so it calls
>>    reservation_object_reserve_shared with that reservation object once
>>    for each such BO.
>> * In reservation_object_reserve_shared, old->shared_count ==
>>    old->shared_max - 1, so obj->staged is freed in preparation of an
>>    in-place update.
>> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>>    once for each of the BOs above, always with the same fence.
>> * The first call adds the fence in the remaining free slot, after which
>>    old->shared_count == old->shared_max.
>
> Well, the explanation here is not correct. For multiple BOs using the
> same reservation object we won't call
> reservation_object_add_shared_fence() multiple times because we move
> those to the duplicates list in ttm_eu_reserve_buffers().
>
> But this bug can still happen because we call
> reservation_object_add_shared_fence() manually with fences for the same
> context in a couple of places.
>
> One prominent case which comes to my mind are for the VM BOs during
> updates. Another possibility are VRAM BOs which need to be cleared.

Thanks. How about the following:

* ttm_eu_reserve_buffers calls reservation_object_reserve_shared.
* In reservation_object_reserve_shared, shared_count == shared_max - 1,
so obj->staged is freed in preparation of an in-place update.
* ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence,
after which shared_count == shared_max.
* The amdgpu driver also calls reservation_object_add_shared_fence for
the same reservation object, and the BUG_ON triggers.

However, nothing bad would happen in
reservation_object_add_shared_inplace, since all fences use the same
context, so they can only occupy a single slot.

Prevent this by moving the BUG_ON to where an overflow would actually
happen (e.g. if a buggy caller didn't call
reservation_object_reserve_shared before).


Also, I'll add a reference to https://bugs.freedesktop.org/106418 in v2,
as I suspect this fix is necessary under the circumstances described
there as well.


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2018-07-04 09:32:09

by Christian König

[permalink] [raw]
Subject: Re: [PATCH] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace

Am 04.07.2018 um 11:09 schrieb Michel Dänzer:
> On 2018-07-04 10:31 AM, Christian König wrote:
>> Am 26.06.2018 um 16:31 schrieb Michel Dänzer:
>>> From: Michel Dänzer <[email protected]>
>>>
>>> Fixes the BUG_ON spuriously triggering under the following
>>> circumstances:
>>>
>>> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>>>    the same reservation object, so it calls
>>>    reservation_object_reserve_shared with that reservation object once
>>>    for each such BO.
>>> * In reservation_object_reserve_shared, old->shared_count ==
>>>    old->shared_max - 1, so obj->staged is freed in preparation of an
>>>    in-place update.
>>> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>>>    once for each of the BOs above, always with the same fence.
>>> * The first call adds the fence in the remaining free slot, after which
>>>    old->shared_count == old->shared_max.
>> Well, the explanation here is not correct. For multiple BOs using the
>> same reservation object we won't call
>> reservation_object_add_shared_fence() multiple times because we move
>> those to the duplicates list in ttm_eu_reserve_buffers().
>>
>> But this bug can still happen because we call
>> reservation_object_add_shared_fence() manually with fences for the same
>> context in a couple of places.
>>
>> One prominent case which comes to my mind are for the VM BOs during
>> updates. Another possibility are VRAM BOs which need to be cleared.
> Thanks. How about the following:
>
> * ttm_eu_reserve_buffers calls reservation_object_reserve_shared.
> * In reservation_object_reserve_shared, shared_count == shared_max - 1,
> so obj->staged is freed in preparation of an in-place update.
> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence,
> after which shared_count == shared_max.
> * The amdgpu driver also calls reservation_object_add_shared_fence for
> the same reservation object, and the BUG_ON triggers.

I would rather completely drop the reference to the ttm_eu_* functions,
cause those wrappers are completely unrelated to the problem.

Instead let's just note something like the following:

* When reservation_object_reserve_shared is called with shared_count ==
shared_max - 1,
  so obj->staged is freed in preparation of an in-place update.

* Now reservation_object_add_shared_fence is called with the first fence
and after that shared_count == shared_max.

* After that  reservation_object_add_shared_fence can be called with
follow up fences from the same context, but since shared_count ==
shared_max we would run into this BUG_ON.

> However, nothing bad would happen in
> reservation_object_add_shared_inplace, since all fences use the same
> context, so they can only occupy a single slot.
>
> Prevent this by moving the BUG_ON to where an overflow would actually
> happen (e.g. if a buggy caller didn't call
> reservation_object_reserve_shared before).
>
>
> Also, I'll add a reference to https://bugs.freedesktop.org/106418 in v2,
> as I suspect this fix is necessary under the circumstances described
> there as well.

The rest sounds good to me.

Regards,
Christian.

2018-07-04 15:17:27

by Michel Dänzer

[permalink] [raw]
Subject: [PATCH v2] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace

From: Michel Dänzer <[email protected]>

Fixes the BUG_ON spuriously triggering under the following
circumstances:

* reservation_object_reserve_shared is called with shared_count ==
shared_max - 1, so obj->staged is freed in preparation of an in-place
update.

* reservation_object_add_shared_fence is called with the first fence,
after which shared_count == shared_max.

* reservation_object_add_shared_fence is called with a follow-up fence
from the same context.

In the second reservation_object_add_shared_fence call, the BUG_ON
triggers. However, nothing bad would happen in
reservation_object_add_shared_inplace, since both fences are from the
same context, so they only occupy a single slot.

Prevent this by moving the BUG_ON to where an overflow would actually
happen (e.g. if a buggy caller didn't call
reservation_object_reserve_shared before).

v2:
* Fix description of breaking scenario (Christian König)
* Add bugzilla reference

Cc: [email protected]
Bugzilla: https://bugs.freedesktop.org/106418
Reviewed-by: Chris Wilson <[email protected]> # v1
Reviewed-by: Christian König <[email protected]> # v1
Signed-off-by: Michel Dänzer <[email protected]>
---
drivers/dma-buf/reservation.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 314eb1071cce..532545b9488e 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
if (signaled) {
RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
} else {
+ BUG_ON(fobj->shared_count >= fobj->shared_max);
RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
fobj->shared_count++;
}
@@ -230,10 +231,9 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
old = reservation_object_get_list(obj);
obj->staged = NULL;

- if (!fobj) {
- BUG_ON(old->shared_count >= old->shared_max);
+ if (!fobj)
reservation_object_add_shared_inplace(obj, old, fence);
- } else
+ else
reservation_object_add_shared_replace(obj, old, fobj, fence);
}
EXPORT_SYMBOL(reservation_object_add_shared_fence);
--
2.18.0