2022-07-27 13:16:09

by Mauro Carvalho Chehab

Subject: [PATCH v3 0/6] drm/i915: reduce TLB performance regressions

TLB invalidation is an expensive operation, and doing it causes performance
regressions, like:
[424.370996] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!

As reported at:
https://gitlab.freedesktop.org/drm/intel/-/issues/6424

So, reduce the need for it by:
- checking if the engine is awake;
- checking if the engine is not wedged;
- batching operations.
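
The three points above can be sketched as a small userspace model. This is not
the i915 implementation: struct toy_gt, toy_unbind(), toy_invalidate() and
toy_demo() are hypothetical names, and the sequence-number scheme is only one
assumed way of batching, where a single flush covers every unbind that was
recorded before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a GT; none of these names come from i915. */
struct toy_gt {
	bool awake;           /* engine is powered, so its TLB may hold entries */
	bool wedged;          /* hardware was declared unusable */
	uint32_t seqno;       /* bumped on every full invalidation */
	unsigned int flushes; /* number of expensive invalidations performed */
};

/* An unbind records which invalidation it needs: the next one. */
uint32_t toy_unbind(const struct toy_gt *gt)
{
	return gt->seqno + 1;
}

/* Invalidate only when needed. Returns true if an expensive flush ran. */
bool toy_invalidate(struct toy_gt *gt, uint32_t needed)
{
	if (gt->wedged)
		return false; /* nothing will run on a wedged GT */
	if (!gt->awake)
		return false; /* assumed: an idle engine holds no live TLB entries */
	if ((int32_t)(gt->seqno - needed) >= 0)
		return false; /* an earlier flush already covered this request */

	gt->seqno++;
	gt->flushes++;
	return true;
}

/* Three unbinds followed by three release-time invalidation requests. */
unsigned int toy_demo(struct toy_gt *gt)
{
	uint32_t need[3];
	int i;

	assert(gt);
	for (i = 0; i < 3; i++)
		need[i] = toy_unbind(gt);
	for (i = 0; i < 3; i++)
		toy_invalidate(gt, need[i]);

	return gt->flushes;
}
```

In this model, calling toy_demo() on an awake, non-wedged toy_gt performs a
single flush for the three requests; requests made while the GT is asleep or
wedged perform none.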

Additionally, add a workaround for a known hardware issue on some GPUs.

To double-check that this series doesn't introduce any regressions,
I used this new IGT test:

https://patchwork.freedesktop.org/patch/495684/?series=106757&rev=1

Checking the results for 3 different patchsets, on Broadwell:

1) On top of drm-tip (2022y-07m-14d-08h-35m-36), i.e. with the TLB
invalidation and serialization patches:

$ sudo build/tests/gem_exec_tlb|grep Subtest
Subtest close-clear: SUCCESS (10.490s)
Subtest madv-clear: SUCCESS (10.484s)
Subtest u-unmap-clear: SUCCESS (10.527s)
Subtest u-shrink-clear: SUCCESS (10.506s)
Subtest close-dumb: SUCCESS (10.165s)
Subtest madv-dumb: SUCCESS (10.177s)
Subtest u-unmap-dumb: SUCCESS (10.172s)
Subtest u-shrink-dumb: SUCCESS (10.172s)

2) With the new version of the batch TLB invalidation patches from this series:

$ sudo build/tests/gem_exec_tlb|grep Subtest
Subtest close-clear: SUCCESS (10.483s)
Subtest madv-clear: SUCCESS (10.495s)
Subtest u-unmap-clear: SUCCESS (10.545s)
Subtest u-shrink-clear: SUCCESS (10.508s)
Subtest close-dumb: SUCCESS (10.172s)
Subtest madv-dumb: SUCCESS (10.169s)
Subtest u-unmap-dumb: SUCCESS (10.174s)
Subtest u-shrink-dumb: SUCCESS (10.176s)

3) Changing the TLB invalidation routine to do nothing[1]:

$ sudo ~/freedesktop-igt/build/tests/gem_exec_tlb|grep Subtest
(gem_exec_tlb:1958) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
(gem_exec_tlb:1958) CRITICAL: Failed assertion: !sq
(gem_exec_tlb:1958) CRITICAL: Found deadbeef in a new (clear) buffer after 3 tries!
(gem_exec_tlb:1956) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
(gem_exec_tlb:1956) CRITICAL: Failed assertion: !sq
(gem_exec_tlb:1956) CRITICAL: Found deadbeef in a new (clear) buffer after 89 tries!
(gem_exec_tlb:1957) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
(gem_exec_tlb:1957) CRITICAL: Failed assertion: !sq
(gem_exec_tlb:1957) CRITICAL: Found deadbeef in a new (clear) buffer after 256 tries!
(gem_exec_tlb:1960) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
(gem_exec_tlb:1960) CRITICAL: Failed assertion: !sq
(gem_exec_tlb:1960) CRITICAL: Found deadbeef in a new (clear) buffer after 845 tries!
(gem_exec_tlb:1961) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
(gem_exec_tlb:1961) CRITICAL: Failed assertion: !sq
(gem_exec_tlb:1961) CRITICAL: Found deadbeef in a new (clear) buffer after 1138 tries!
(gem_exec_tlb:1954) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
(gem_exec_tlb:1954) CRITICAL: Failed assertion: !sq
(gem_exec_tlb:1954) CRITICAL: Found deadbeef in a new (clear) buffer after 1359 tries!
(gem_exec_tlb:1955) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
(gem_exec_tlb:1955) CRITICAL: Failed assertion: !sq
(gem_exec_tlb:1955) CRITICAL: Found deadbeef in a new (clear) buffer after 1794 tries!
(gem_exec_tlb:1959) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
(gem_exec_tlb:1959) CRITICAL: Failed assertion: !sq
(gem_exec_tlb:1959) CRITICAL: Found deadbeef in a new (clear) buffer after 2139 tries!
Dynamic subtest smem0 failed.
**** DEBUG ****
(gem_exec_tlb:1944) DEBUG: 2M hole:200000 contains poison:6b6b6b6b
(gem_exec_tlb:1944) DEBUG: Running writer for 200000 at 300000 on bcs0
(gem_exec_tlb:1944) DEBUG: Closing hole:200000 on rcs0, sample:deadbeef
(gem_exec_tlb:1944) DEBUG: Rechecking hole:200000, sample:6b6b6b6b
**** END ****
Subtest close-clear: FAIL (10.434s)
Subtest madv-clear: SUCCESS (10.479s)
Subtest u-unmap-clear: SUCCESS (10.512s)

In summary, the test properly detects failures when TLB cache invalidation doesn't happen,
as shown in result (3). It also shows that neither current drm-tip nor drm-tip with this
series applied has TLB cache invalidation issues.
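
To illustrate why a skipped invalidation shows up as "Found deadbeef in a new
(clear) buffer", here is a toy userspace model. It is not how gem_exec_tlb or
the hardware works: struct toy_mmu, toy_gpu_write() and toy_reuse_is_clean()
are hypothetical, and the model assumes one virtual address, two physical
pages and a single cached translation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_POISON 0xdeadbeefu

/* Hypothetical model: two physical pages and one cached translation. */
struct toy_mmu {
	uint32_t page[2]; /* "physical" memory, one word per page */
	int pte;          /* page currently mapped at the one virtual address */
	int tlb;          /* cached translation, -1 when empty */
};

/* A GPU-side write goes through the TLB, refilling it from the PTE on a miss. */
void toy_gpu_write(struct toy_mmu *m, uint32_t val)
{
	if (m->tlb < 0)
		m->tlb = m->pte;
	m->page[m->tlb] = val;
}

/*
 * Rebind the address to a new page, reuse the old page as a cleared buffer,
 * then let a late writer run. Returns true if the reused page stays clear.
 */
bool toy_reuse_is_clean(bool invalidate)
{
	struct toy_mmu m = { .page = { 0, 0 }, .pte = 0, .tlb = -1 };

	toy_gpu_write(&m, 1);          /* warms the TLB: it now points at page 0 */
	m.pte = 1;                     /* the address is rebound to page 1 */
	if (invalidate)
		m.tlb = -1;            /* the step that the HACK patch skips */
	m.page[0] = 0;                 /* page 0 is cleared and handed out again */
	toy_gpu_write(&m, TOY_POISON); /* the writer is still running */

	return m.page[0] == 0;
}
```

With invalidate set, the late write misses in the TLB and lands in page 1, so
the reused page stays clear. Without it, the stale translation sends the write
to page 0, which is the situation result (3) reports.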

[1] I applied this patch on top of drm-tip:

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 68c2b0d8f187..0aefcd7be5e9 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -930,0 +931,3 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
+ // HACK: don't do TLB invalidations!!!
+ return;
+

Regards,
Mauro

Chris Wilson (4):
drm/i915/gt: Ignore TLB invalidations on idle engines
drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
drm/i915/gt: Skip TLB invalidations once wedged
drm/i915/gt: Batch TLB invalidations

Mauro Carvalho Chehab (2):
drm/i915/gt: document with_intel_gt_pm_if_awake()
drm/i915/gt: describe the new tlb parameter at i915_vma_resource

.../gpu/drm/i915/gem/i915_gem_object_types.h | 3 +-
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 25 +++---
drivers/gpu/drm/i915/gt/intel_gt.c | 77 +++++++++++++++----
drivers/gpu/drm/i915/gt/intel_gt.h | 12 ++-
drivers/gpu/drm/i915/gt/intel_gt_pm.h | 11 +++
drivers/gpu/drm/i915/gt/intel_gt_types.h | 18 ++++-
drivers/gpu/drm/i915/gt/intel_ppgtt.c | 8 +-
drivers/gpu/drm/i915/i915_vma.c | 33 ++++++--
drivers/gpu/drm/i915/i915_vma.h | 1 +
drivers/gpu/drm/i915/i915_vma_resource.c | 9 ++-
drivers/gpu/drm/i915/i915_vma_resource.h | 6 +-
11 files changed, 163 insertions(+), 40 deletions(-)

--
2.36.1



2022-07-27 13:16:52

by Mauro Carvalho Chehab

Subject: [PATCH v3 2/6] drm/i915/gt: document with_intel_gt_pm_if_awake()

Add a kernel-doc markup to document this new macro.

Reviewed-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---

To avoid mailbombing a large number of people, only mailing lists were Cc'ed on the cover letter.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/[email protected]/
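
For readers unfamiliar with the idiom, the macro documented here is a for loop
whose body runs at most once. Below is a minimal userspace sketch of the same
pattern. struct toy_dev, toy_get_if_awake(), toy_put(), toy_poke() and
with_toy_dev_if_awake() are hypothetical names, and the put is synchronous
here, unlike the asynchronous put used by the real macro.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device with a wake reference count; not i915 code. */
struct toy_dev {
	bool awake;
	int refs; /* references currently held */
	int puts; /* how many times a reference was dropped */
};

/* Take a reference only if the device is already awake; 0 means asleep. */
int toy_get_if_awake(struct toy_dev *d)
{
	if (!d->awake)
		return 0;
	d->refs++;
	return 1;
}

void toy_put(struct toy_dev *d)
{
	d->refs--;
	d->puts++;
}

/*
 * The body runs at most once: the init clause takes the reference, and the
 * step clause drops it and clears wf so that the loop terminates.
 */
#define with_toy_dev_if_awake(d, wf) \
	for ((wf) = toy_get_if_awake(d); (wf); toy_put(d), (wf) = 0)

/* Returns how many times the body ran (0 or 1). */
int toy_poke(struct toy_dev *d)
{
	int wf, ran = 0;

	with_toy_dev_if_awake(d, wf) {
		assert(d->refs > 0); /* the reference is held inside the body */
		ran++;
	}

	return ran;
}
```

One caveat of this pattern: leaving the body with break or return skips the
step clause of the for loop, so the reference would not be dropped.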

drivers/gpu/drm/i915/gt/intel_gt_pm.h | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index a334787a4939..6c9a46452364 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -55,6 +55,14 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
for (tmp = 1, intel_gt_pm_get(gt); tmp; \
intel_gt_pm_put(gt), tmp = 0)

+/**
+ * with_intel_gt_pm_if_awake - if GT is PM awake, get a reference to prevent
+ * it from going to sleep, run some code and then asynchronously put the
+ * reference away.
+ *
+ * @gt: pointer to the gt
+ * @wf: pointer to a temporary wakeref.
+ */
#define with_intel_gt_pm_if_awake(gt, wf) \
for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)

--
2.36.1

2022-07-27 14:49:08

by Andi Shyti

Subject: Re: [Intel-gfx] [PATCH v3 2/6] drm/i915/gt: document with_intel_gt_pm_if_awake()

Hi Mauro,

> Add a kernel-doc markup to document this new macro.
>
> Reviewed-by: Tvrtko Ursulin <[email protected]>
> Signed-off-by: Mauro Carvalho Chehab <[email protected]>

Reviewed-by: Andi Shyti <[email protected]>

Andi

2022-07-28 13:15:50

by Andi Shyti

Subject: Re: [Intel-gfx] [PATCH v3 0/6] drm/i915: reduce TLB performance regressions

Hi Mauro,

Pushed in drm-intel-gt-next.

Thanks,
Andi

On Wed, Jul 27, 2022 at 02:29:50PM +0200, Mauro Carvalho Chehab wrote:
> [...]