LinuxLists.cc - [PATCH v3 0/3] Move TLB invalidation code for its own file and document it

2022-08-04 07:49:10

Subject: [PATCH v3 0/3] Move TLB invalidation code for its own file and document it

There are more things to be added to TLB invalidation. Before doing that,
move the code to its own file, and add the relevant documentation.

Patch 1 fixes vma_invalidate_tlb() logic to make it update the right var;

Patch 2 only moves the code and do some function renames. No functional
change;

Patch 3 adds documentation for the TLB invalidation algorithm and functions.

---

v3:
- Added a fix for an issue from the last TLB patch series;
- included a better description about the changes on patch 2;
- did some minor fixes at kernel-doc markups;

v2: only patch 2 (kernel-doc) was modified:

- The kernel-doc markups for TLB were added to i915.rst doc;
- Some minor fixes at the texts;
- Use a table instead of a literal block while explaining how the algorithm works.
That should make easier to understand the logic, both in text form and after
its conversion to HTML/PDF;
- Remove mention for GuC, as this depends on a series that will be sent later.

Chris Wilson (1):
drm/i915/gt: Move TLB invalidation to its own file

Mauro Carvalho Chehab (2):
drm/i915: pass a pointer for tlb seqno at vma_invalidate_tlb()
drm/i915/gt: document TLB cache invalidation functions

Documentation/gpu/i915.rst | 7 +
drivers/gpu/drm/i915/Makefile | 1 +
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 4 +-
drivers/gpu/drm/i915/gt/intel_gt.c | 168 +----------------
drivers/gpu/drm/i915/gt/intel_gt.h | 12 --
drivers/gpu/drm/i915/gt/intel_ppgtt.c | 2 +-
drivers/gpu/drm/i915/gt/intel_tlb.c | 208 ++++++++++++++++++++++
drivers/gpu/drm/i915/gt/intel_tlb.h | 128 +++++++++++++
drivers/gpu/drm/i915/i915_vma.c | 7 +-
drivers/gpu/drm/i915/i915_vma.h | 2 +-
10 files changed, 355 insertions(+), 184 deletions(-)
create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.c
create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.h

--
2.37.1

2022-08-04 08:11:38

by Mauro Carvalho Chehab

[permalink] [raw]

Subject: [PATCH v3 3/3] drm/i915/gt: document TLB cache invalidation functions

Add a description for the TLB cache invalidation algorithm and for
the related kAPI functions.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/3] at: https://lore.kernel.org/all/[email protected]/

Documentation/gpu/i915.rst | 7 ++
drivers/gpu/drm/i915/gt/intel_tlb.c | 25 ++++++++
drivers/gpu/drm/i915/gt/intel_tlb.h | 99 +++++++++++++++++++++++++++++
3 files changed, 131 insertions(+)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 4e59db1cfb00..46911fdd79e8 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -58,6 +58,13 @@ Intel GVT-g Host Support(vGPU device model)
.. kernel-doc:: drivers/gpu/drm/i915/intel_gvt.c
:internal:

+TLB cache invalidation
+----------------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.h
+
+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.c
+
Workarounds
-----------

diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
index af8cae979489..16b918ffe824 100644
--- a/drivers/gpu/drm/i915/gt/intel_tlb.c
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
@@ -145,6 +145,18 @@ static void mmio_invalidate_full(struct intel_gt *gt)
intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
}

+/**
+ * intel_gt_invalidate_tlb_full - do full TLB cache invalidation
+ * @gt: GT structure
+ * @seqno: sequence number
+ *
+ * Do a full TLB cache invalidation if the @seqno is bigger than the last
+ * full TLB cache invalidation.
+ *
+ * Note:
+ * The TLB cache invalidation logic depends on GEN-specific registers.
+ * It currently supports MMIO-based TLB flush for GEN8 to GEN12.
+ */
void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
{
intel_wakeref_t wakeref;
@@ -171,12 +183,25 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
}
}

+/**
+ * intel_gt_init_tlb - initialize TLB-specific vars
+ * @gt: GT structure
+ *
+ * TLB cache invalidation logic internally uses some resources that require
+ * initialization. Should be called before doing any TLB cache invalidation.
+ */
void intel_gt_init_tlb(struct intel_gt *gt)
{
mutex_init(&gt->tlb.invalidate_lock);
seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
}

+/**
+ * intel_gt_fini_tlb - free TLB-specific vars
+ * @gt: GT structure
+ *
+ * Frees any resources needed by TLB cache invalidation logic.
+ */
void intel_gt_fini_tlb(struct intel_gt *gt)
{
mutex_destroy(&gt->tlb.invalidate_lock);
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
index 46ce25bf5afe..2838c051f872 100644
--- a/drivers/gpu/drm/i915/gt/intel_tlb.h
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
@@ -11,16 +11,115 @@

#include "intel_gt_types.h"

+/**
+ * DOC: TLB cache invalidation logic
+ *
+ * The way the current algorithm works is that a struct drm_i915_gem_object can
+ * be created on any order. At unbind/evict time, the object is warranted that
+ * it won't be used anymore. So, a sequence number provided by
+ * intel_gt_next_invalidate_tlb_full() is stored on it. This can happen either
+ * at __vma_put_pages() - for VMA sync unbind, or at ppgtt_unbind_vma() - for
+ * VMA async VMA bind.
+ *
+ * At __i915_gem_object_unset_pages(), intel_gt_invalidate_tlb_full() is called,
+ * where it checks if the sequence number of the object was already invalidated
+ * or not. If not, it flushes the TLB and increments the sequence number::
+ *
+ * void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
+ * {
+ * ...
+ * with_intel_gt_pm_if_awake(gt, wakeref) {
+ * mutex_lock(&gt->tlb.invalidate_lock);
+ * if (tlb_seqno_passed(gt, seqno))
+ * goto unlock;
+ *
+ * // Some code to do TLB invalidation
+ * ...
+ *
+ * write_seqcount_invalidate(&gt->tlb.seqno); // increment seqno
+ * mutex_lock(&gt->tlb.invalidate_lock);
+ * }
+ *
+ * So, let's say the current seqno is 2 and 3 new objects were created,
+ * on this order::
+ *
+ * obj1
+ * obj2
+ * obj3
+ *
+ * They can be unbind/evict on a different order. At unbind/evict time,
+ * the mm.tlb will be stamped with the sequence number, using the number
+ * from the last TLB flush, plus 1.
+ *
+ * Different threads may be used on unbind/evict and/or unset pages.
+ * As the logic at intel_gt_invalidate_tlb_full() is protected by a mutex,
+ * for simplicity, let's consider just two threads:
+ *
+ * +-------------------+-------------------------+---------------------------------+
+ * | sequence number | Thread 0 | Thread 1 +
+ * +===================+=========================+=================================+
+ * | seqno=2 | | |
+ * | +-------------------------+---------------------------------+
+ * | | unbind/evict obj3. | |
+ * | | | |
+ * | | obj3.mm.tlb = seqno | 1 | |
+ * | | // obj3.mm.tlb = 3 | |
+ * | +-------------------------+---------------------------------+
+ * | | unbind/evict obj1. | |
+ * | | | |
+ * | | obj1.mm.tlb = seqno | 1 | |
+ * | | // obj1.mm.tlb = 3 | |
+ * | +-------------------------+---------------------------------+
+ * | | | __i915_gem_object_unset_pages() |
+ * | | | called for obj3 => TLB flush |
+ * | | | invalidating both obj1 and obj2.|
+ * | | | |
+ * | | | seqno += 2 |
+ * +-------------------+-------------------------+---------------------------------+
+ * | seqno=4 | | |
+ * | +-------------------------+---------------------------------+
+ * | | unbind/evict obj2. | |
+ * | | | |
+ * | | obj2.mm.tlb = seqno | 1 | |
+ * | | // obj2.mm.tlb = 5 | |
+ * | +-------------------------+---------------------------------+
+ * | | | __i915_gem_object_unset_pages() |
+ * | | | called for obj1, don't flush |
+ * | | | as past flush invalidated obj1. |
+ * | +-------------------------+---------------------------------+
+ * | | | __i915_gem_object_unset_pages() |
+ * | | | called for obj2 => TLB flush. |
+ * | | | invalidating obj2. |
+ * | | | |
+ * | | | seqno += 2 |
+ * +-------------------+-------------------------+---------------------------------+
+ * | seqno=6 | | |
+ * +-------------------+-------------------------+---------------------------------+
+ */
+
void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);

void intel_gt_init_tlb(struct intel_gt *gt);
void intel_gt_fini_tlb(struct intel_gt *gt);

+/**
+ * intel_gt_tlb_seqno - Returns the current TLB invlidation sequence number
+ * @gt: GT structure
+ *
+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe
+ */
static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
{
return seqprop_sequence(&gt->tlb.seqno);
}

+/**
+ * intel_gt_next_invalidate_tlb_full - Returns the next TLB full invalidation
+ * sequence number
+ * @gt: GT structure
+ *
+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe
+ */
static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
{
return intel_gt_tlb_seqno(gt) | 1;
--
2.37.1

2022-08-04 08:16:04

by Mauro Carvalho Chehab

[permalink] [raw]

Subject: [PATCH v3 1/3] drm/i915: pass a pointer for tlb seqno at vma_invalidate_tlb()

WRITE_ONCE() should happen at the original var, not on a local
copy of it.

Fixes: 5d36acb7198b ("drm/i915/gt: Batch TLB invalidations")
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/3] at: https://lore.kernel.org/all/[email protected]/

drivers/gpu/drm/i915/gt/intel_ppgtt.c | 2 +-
drivers/gpu/drm/i915/i915_vma.c | 6 +++---
drivers/gpu/drm/i915/i915_vma.h | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 2da6c82a8bd2..6ee8d1127016 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -211,7 +211,7 @@ void ppgtt_unbind_vma(struct i915_address_space *vm,

vm->clear_range(vm, vma_res->start, vma_res->vma_size);
if (vma_res->tlb)
- vma_invalidate_tlb(vm, *vma_res->tlb);
+ vma_invalidate_tlb(vm, vma_res->tlb);
}

static unsigned long pd_count(u64 size, int shift)
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 84a9ccbc5fc5..260371716490 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -1308,7 +1308,7 @@ I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma)
return err;
}

-void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
+void vma_invalidate_tlb(struct i915_address_space *vm, u32 *tlb)
{
/*
* Before we release the pages that were bound by this vma, we
@@ -1318,7 +1318,7 @@ void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
* the most recent TLB invalidation seqno, and if we have not yet
* flushed the TLBs upon release, perform a full invalidation.
*/
- WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
+ WRITE_ONCE(*tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
}

static void __vma_put_pages(struct i915_vma *vma, unsigned int count)
@@ -1971,7 +1971,7 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
dma_fence_put(unbind_fence);
unbind_fence = NULL;
}
- vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb);
+ vma_invalidate_tlb(vma->vm, &vma->obj->mm.tlb);
}

/*
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 5048eed536da..33a58f605d75 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -213,7 +213,7 @@ bool i915_vma_misplaced(const struct i915_vma *vma,
u64 size, u64 alignment, u64 flags);
void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
void i915_vma_revoke_mmap(struct i915_vma *vma);
-void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb);
+void vma_invalidate_tlb(struct i915_address_space *vm, u32 *tlb);
struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async);
int __i915_vma_unbind(struct i915_vma *vma);
int __must_check i915_vma_unbind(struct i915_vma *vma);
--
2.37.1

2022-08-04 08:21:21

by Mauro Carvalho Chehab

[permalink] [raw]

Subject: [PATCH v3 2/3] drm/i915/gt: Move TLB invalidation to its own file

From: Chris Wilson <[email protected]>

Prepare for supporting more TLB invalidation scenarios by moving
the current MMIO invalidation to its own file.

It also:

- Renames intel_gt_invalidate_tlb() to intel_gt_invalidate_tlb_full()
- Add intel_gt_init_tlb() and intel_gt_fini_tlb() abstracts.

Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Niranjana Vishwanathapura <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Cc: Fei Yang <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/3] at: https://lore.kernel.org/all/[email protected]/

drivers/gpu/drm/i915/Makefile | 1 +
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 4 +-
drivers/gpu/drm/i915/gt/intel_gt.c | 168 +-------------------
drivers/gpu/drm/i915/gt/intel_gt.h | 12 --
drivers/gpu/drm/i915/gt/intel_tlb.c | 183 ++++++++++++++++++++++
drivers/gpu/drm/i915/gt/intel_tlb.h | 29 ++++
drivers/gpu/drm/i915/i915_vma.c | 1 +
7 files changed, 219 insertions(+), 179 deletions(-)
create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.c
create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 522ef9b4aff3..d3df9832d1f7 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -126,6 +126,7 @@ gt-y += \
gt/intel_sseu.o \
gt/intel_sseu_debugfs.o \
gt/intel_timeline.o \
+ gt/intel_tlb.o \
gt/intel_workarounds.o \
gt/shmem_utils.o \
gt/sysfs_engines.o
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 8357dbdcab5c..1cd76cc5d9f3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -7,7 +7,7 @@
#include <drm/drm_cache.h>

#include "gt/intel_gt.h"
-#include "gt/intel_gt_pm.h"
+#include "gt/intel_tlb.h"

#include "i915_drv.h"
#include "i915_gem_object.h"
@@ -199,7 +199,7 @@ static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
if (!obj->mm.tlb)
return;

- intel_gt_invalidate_tlb(gt, obj->mm.tlb);
+ intel_gt_invalidate_tlb_full(gt, obj->mm.tlb);
obj->mm.tlb = 0;
}

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index f435e06125aa..18d82cd620bd 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -11,9 +11,7 @@
#include "pxp/intel_pxp.h"

#include "i915_drv.h"
-#include "i915_perf_oa_regs.h"
#include "intel_context.h"
-#include "intel_engine_pm.h"
#include "intel_engine_regs.h"
#include "intel_ggtt_gmch.h"
#include "intel_gt.h"
@@ -31,6 +29,7 @@
#include "intel_renderstate.h"
#include "intel_rps.h"
#include "intel_gt_sysfs.h"
+#include "intel_tlb.h"
#include "intel_uncore.h"
#include "shmem_utils.h"

@@ -48,8 +47,7 @@ static void __intel_gt_init_early(struct intel_gt *gt)
intel_gt_init_reset(gt);
intel_gt_init_requests(gt);
intel_gt_init_timelines(gt);
- mutex_init(&gt->tlb.invalidate_lock);
- seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
+ intel_gt_init_tlb(gt);
intel_gt_pm_init_early(gt);

intel_uc_init_early(&gt->uc);
@@ -770,7 +768,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
intel_gt_fini_requests(gt);
intel_gt_fini_reset(gt);
intel_gt_fini_timelines(gt);
- mutex_destroy(&gt->tlb.invalidate_lock);
+ intel_gt_fini_tlb(gt);
intel_engines_free(gt);
}
}
@@ -881,163 +879,3 @@ void intel_gt_info_print(const struct intel_gt_info *info,

intel_sseu_dump(&info->sseu, p);
}
-
-struct reg_and_bit {
- i915_reg_t reg;
- u32 bit;
-};
-
-static struct reg_and_bit
-get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
- const i915_reg_t *regs, const unsigned int num)
-{
- const unsigned int class = engine->class;
- struct reg_and_bit rb = { };
-
- if (drm_WARN_ON_ONCE(&engine->i915->drm,
- class >= num || !regs[class].reg))
- return rb;
-
- rb.reg = regs[class];
- if (gen8 && class == VIDEO_DECODE_CLASS)
- rb.reg.reg += 4 * engine->instance; /* GEN8_M2TCR */
- else
- rb.bit = engine->instance;
-
- rb.bit = BIT(rb.bit);
-
- return rb;
-}
-
-static void mmio_invalidate_full(struct intel_gt *gt)
-{
- static const i915_reg_t gen8_regs[] = {
- [RENDER_CLASS] = GEN8_RTCR,
- [VIDEO_DECODE_CLASS] = GEN8_M1TCR, /* , GEN8_M2TCR */
- [VIDEO_ENHANCEMENT_CLASS] = GEN8_VTCR,
- [COPY_ENGINE_CLASS] = GEN8_BTCR,
- };
- static const i915_reg_t gen12_regs[] = {
- [RENDER_CLASS] = GEN12_GFX_TLB_INV_CR,
- [VIDEO_DECODE_CLASS] = GEN12_VD_TLB_INV_CR,
- [VIDEO_ENHANCEMENT_CLASS] = GEN12_VE_TLB_INV_CR,
- [COPY_ENGINE_CLASS] = GEN12_BLT_TLB_INV_CR,
- [COMPUTE_CLASS] = GEN12_COMPCTX_TLB_INV_CR,
- };
- struct drm_i915_private *i915 = gt->i915;
- struct intel_uncore *uncore = gt->uncore;
- struct intel_engine_cs *engine;
- intel_engine_mask_t awake, tmp;
- enum intel_engine_id id;
- const i915_reg_t *regs;
- unsigned int num = 0;
-
- if (GRAPHICS_VER(i915) == 12) {
- regs = gen12_regs;
- num = ARRAY_SIZE(gen12_regs);
- } else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) {
- regs = gen8_regs;
- num = ARRAY_SIZE(gen8_regs);
- } else if (GRAPHICS_VER(i915) < 8) {
- return;
- }
-
- if (drm_WARN_ONCE(&i915->drm, !num,
- "Platform does not implement TLB invalidation!"))
- return;
-
- intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
-
- spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
-
- awake = 0;
- for_each_engine(engine, gt, id) {
- struct reg_and_bit rb;
-
- if (!intel_engine_pm_is_awake(engine))
- continue;
-
- rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
- if (!i915_mmio_reg_offset(rb.reg))
- continue;
-
- intel_uncore_write_fw(uncore, rb.reg, rb.bit);
- awake |= engine->mask;
- }
-
- GT_TRACE(gt, "invalidated engines %08x\n", awake);
-
- /* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
- if (awake &&
- (IS_TIGERLAKE(i915) ||
- IS_DG1(i915) ||
- IS_ROCKETLAKE(i915) ||
- IS_ALDERLAKE_S(i915) ||
- IS_ALDERLAKE_P(i915)))
- intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
-
- spin_unlock_irq(&uncore->lock);
-
- for_each_engine_masked(engine, gt, awake, tmp) {
- struct reg_and_bit rb;
-
- /*
- * HW architecture suggest typical invalidation time at 40us,
- * with pessimistic cases up to 100us and a recommendation to
- * cap at 1ms. We go a bit higher just in case.
- */
- const unsigned int timeout_us = 100;
- const unsigned int timeout_ms = 4;
-
- rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
- if (__intel_wait_for_register_fw(uncore,
- rb.reg, rb.bit, 0,
- timeout_us, timeout_ms,
- NULL))
- drm_err_ratelimited(&gt->i915->drm,
- "%s TLB invalidation did not complete in %ums!\n",
- engine->name, timeout_ms);
- }
-
- /*
- * Use delayed put since a) we mostly expect a flurry of TLB
- * invalidations so it is good to avoid paying the forcewake cost and
- * b) it works around a bug in Icelake which cannot cope with too rapid
- * transitions.
- */
- intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
-}
-
-static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
-{
- u32 cur = intel_gt_tlb_seqno(gt);
-
- /* Only skip if a *full* TLB invalidate barrier has passed */
- return (s32)(cur - ALIGN(seqno, 2)) > 0;
-}
-
-void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
-{
- intel_wakeref_t wakeref;
-
- if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
- return;
-
- if (intel_gt_is_wedged(gt))
- return;
-
- if (tlb_seqno_passed(gt, seqno))
- return;
-
- with_intel_gt_pm_if_awake(gt, wakeref) {
- mutex_lock(&gt->tlb.invalidate_lock);
- if (tlb_seqno_passed(gt, seqno))
- goto unlock;
-
- mmio_invalidate_full(gt);
-
- write_seqcount_invalidate(&gt->tlb.seqno);
-unlock:
- mutex_unlock(&gt->tlb.invalidate_lock);
- }
-}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index 40b06adf509a..b4bba16cdb53 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -101,16 +101,4 @@ void intel_gt_info_print(const struct intel_gt_info *info,

void intel_gt_watchdog_work(struct work_struct *work);

-static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
-{
- return seqprop_sequence(&gt->tlb.seqno);
-}
-
-static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
-{
- return intel_gt_tlb_seqno(gt) | 1;
-}
-
-void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno);
-
#endif /* __INTEL_GT_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
new file mode 100644
index 000000000000..af8cae979489
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
@@ -0,0 +1,183 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include "i915_drv.h"
+#include "i915_perf_oa_regs.h"
+#include "intel_engine_pm.h"
+#include "intel_gt.h"
+#include "intel_gt_pm.h"
+#include "intel_gt_regs.h"
+#include "intel_tlb.h"
+
+struct reg_and_bit {
+ i915_reg_t reg;
+ u32 bit;
+};
+
+static struct reg_and_bit
+get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
+ const i915_reg_t *regs, const unsigned int num)
+{
+ const unsigned int class = engine->class;
+ struct reg_and_bit rb = { };
+
+ if (drm_WARN_ON_ONCE(&engine->i915->drm,
+ class >= num || !regs[class].reg))
+ return rb;
+
+ rb.reg = regs[class];
+ if (gen8 && class == VIDEO_DECODE_CLASS)
+ rb.reg.reg += 4 * engine->instance; /* GEN8_M2TCR */
+ else
+ rb.bit = engine->instance;
+
+ rb.bit = BIT(rb.bit);
+
+ return rb;
+}
+
+static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
+{
+ u32 cur = intel_gt_tlb_seqno(gt);
+
+ /* Only skip if a *full* TLB invalidate barrier has passed */
+ return (s32)(cur - ALIGN(seqno, 2)) > 0;
+}
+
+static void mmio_invalidate_full(struct intel_gt *gt)
+{
+ static const i915_reg_t gen8_regs[] = {
+ [RENDER_CLASS] = GEN8_RTCR,
+ [VIDEO_DECODE_CLASS] = GEN8_M1TCR, /* , GEN8_M2TCR */
+ [VIDEO_ENHANCEMENT_CLASS] = GEN8_VTCR,
+ [COPY_ENGINE_CLASS] = GEN8_BTCR,
+ };
+ static const i915_reg_t gen12_regs[] = {
+ [RENDER_CLASS] = GEN12_GFX_TLB_INV_CR,
+ [VIDEO_DECODE_CLASS] = GEN12_VD_TLB_INV_CR,
+ [VIDEO_ENHANCEMENT_CLASS] = GEN12_VE_TLB_INV_CR,
+ [COPY_ENGINE_CLASS] = GEN12_BLT_TLB_INV_CR,
+ [COMPUTE_CLASS] = GEN12_COMPCTX_TLB_INV_CR,
+ };
+ struct drm_i915_private *i915 = gt->i915;
+ struct intel_uncore *uncore = gt->uncore;
+ struct intel_engine_cs *engine;
+ intel_engine_mask_t awake, tmp;
+ enum intel_engine_id id;
+ const i915_reg_t *regs;
+ unsigned int num = 0;
+
+ if (GRAPHICS_VER(i915) == 12) {
+ regs = gen12_regs;
+ num = ARRAY_SIZE(gen12_regs);
+ } else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) {
+ regs = gen8_regs;
+ num = ARRAY_SIZE(gen8_regs);
+ } else if (GRAPHICS_VER(i915) < 8) {
+ return;
+ }
+
+ if (drm_WARN_ONCE(&i915->drm, !num,
+ "Platform does not implement TLB invalidation!"))
+ return;
+
+ intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
+
+ spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
+
+ awake = 0;
+ for_each_engine(engine, gt, id) {
+ struct reg_and_bit rb;
+
+ if (!intel_engine_pm_is_awake(engine))
+ continue;
+
+ rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
+ if (!i915_mmio_reg_offset(rb.reg))
+ continue;
+
+ intel_uncore_write_fw(uncore, rb.reg, rb.bit);
+ awake |= engine->mask;
+ }
+
+ GT_TRACE(gt, "invalidated engines %08x\n", awake);
+
+ /* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
+ if (awake &&
+ (IS_TIGERLAKE(i915) ||
+ IS_DG1(i915) ||
+ IS_ROCKETLAKE(i915) ||
+ IS_ALDERLAKE_S(i915) ||
+ IS_ALDERLAKE_P(i915)))
+ intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
+
+ spin_unlock_irq(&uncore->lock);
+
+ for_each_engine_masked(engine, gt, awake, tmp) {
+ struct reg_and_bit rb;
+
+ /*
+ * HW architecture suggest typical invalidation time at 40us,
+ * with pessimistic cases up to 100us and a recommendation to
+ * cap at 1ms. We go a bit higher just in case.
+ */
+ const unsigned int timeout_us = 100;
+ const unsigned int timeout_ms = 4;
+
+ rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
+ if (__intel_wait_for_register_fw(uncore,
+ rb.reg, rb.bit, 0,
+ timeout_us, timeout_ms,
+ NULL))
+ drm_err_ratelimited(&gt->i915->drm,
+ "%s TLB invalidation did not complete in %ums!\n",
+ engine->name, timeout_ms);
+ }
+
+ /*
+ * Use delayed put since a) we mostly expect a flurry of TLB
+ * invalidations so it is good to avoid paying the forcewake cost and
+ * b) it works around a bug in Icelake which cannot cope with too rapid
+ * transitions.
+ */
+ intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
+}
+
+void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
+{
+ intel_wakeref_t wakeref;
+
+ if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
+ return;
+
+ if (intel_gt_is_wedged(gt))
+ return;
+
+ if (tlb_seqno_passed(gt, seqno))
+ return;
+
+ with_intel_gt_pm_if_awake(gt, wakeref) {
+ mutex_lock(&gt->tlb.invalidate_lock);
+ if (tlb_seqno_passed(gt, seqno))
+ goto unlock;
+
+ mmio_invalidate_full(gt);
+
+ write_seqcount_invalidate(&gt->tlb.seqno);
+unlock:
+ mutex_unlock(&gt->tlb.invalidate_lock);
+ }
+}
+
+void intel_gt_init_tlb(struct intel_gt *gt)
+{
+ mutex_init(&gt->tlb.invalidate_lock);
+ seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
+}
+
+void intel_gt_fini_tlb(struct intel_gt *gt)
+{
+ mutex_destroy(&gt->tlb.invalidate_lock);
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
new file mode 100644
index 000000000000..46ce25bf5afe
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#ifndef INTEL_TLB_H
+#define INTEL_TLB_H
+
+#include <linux/seqlock.h>
+#include <linux/types.h>
+
+#include "intel_gt_types.h"
+
+void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);
+
+void intel_gt_init_tlb(struct intel_gt *gt);
+void intel_gt_fini_tlb(struct intel_gt *gt);
+
+static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
+{
+ return seqprop_sequence(&gt->tlb.seqno);
+}
+
+static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
+{
+ return intel_gt_tlb_seqno(gt) | 1;
+}
+
+#endif /* INTEL_TLB_H */
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 260371716490..21c6883dbeb2 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -33,6 +33,7 @@
#include "gt/intel_engine_heartbeat.h"
#include "gt/intel_gt.h"
#include "gt/intel_gt_requests.h"
+#include "gt/intel_tlb.h"

#include "i915_drv.h"
#include "i915_gem_evict.h"
--
2.37.1

2022-08-04 08:26:03

by Tvrtko Ursulin

[permalink] [raw]

Subject: Re: [PATCH v3 1/3] drm/i915: pass a pointer for tlb seqno at vma_invalidate_tlb()

On 04/08/2022 08:37, Mauro Carvalho Chehab wrote:
> WRITE_ONCE() should happen at the original var, not on a local
> copy of it.
>
> Fixes: 5d36acb7198b ("drm/i915/gt: Batch TLB invalidations")

Cc: stable I think, since the above one was. So both hit 5.21 (or 6.1)
together.

Regards,

Tvrtko

> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
> ---
>
> To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
> See [PATCH v3 0/3] at: https://lore.kernel.org/all/[email protected]/
>
> drivers/gpu/drm/i915/gt/intel_ppgtt.c | 2 +-
> drivers/gpu/drm/i915/i915_vma.c | 6 +++---
> drivers/gpu/drm/i915/i915_vma.h | 2 +-
> 3 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index 2da6c82a8bd2..6ee8d1127016 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -211,7 +211,7 @@ void ppgtt_unbind_vma(struct i915_address_space *vm,
>
> vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> if (vma_res->tlb)
> - vma_invalidate_tlb(vm, *vma_res->tlb);
> + vma_invalidate_tlb(vm, vma_res->tlb);
> }
>
> static unsigned long pd_count(u64 size, int shift)
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 84a9ccbc5fc5..260371716490 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -1308,7 +1308,7 @@ I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma)
> return err;
> }
>
> -void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
> +void vma_invalidate_tlb(struct i915_address_space *vm, u32 *tlb)
> {
> /*
> * Before we release the pages that were bound by this vma, we
> @@ -1318,7 +1318,7 @@ void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
> * the most recent TLB invalidation seqno, and if we have not yet
> * flushed the TLBs upon release, perform a full invalidation.
> */
> - WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
> + WRITE_ONCE(*tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
> }
>
> static void __vma_put_pages(struct i915_vma *vma, unsigned int count)
> @@ -1971,7 +1971,7 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
> dma_fence_put(unbind_fence);
> unbind_fence = NULL;
> }
> - vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb);
> + vma_invalidate_tlb(vma->vm, &vma->obj->mm.tlb);
> }
>
> /*
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 5048eed536da..33a58f605d75 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -213,7 +213,7 @@ bool i915_vma_misplaced(const struct i915_vma *vma,
> u64 size, u64 alignment, u64 flags);
> void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
> void i915_vma_revoke_mmap(struct i915_vma *vma);
> -void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb);
> +void vma_invalidate_tlb(struct i915_address_space *vm, u32 *tlb);
> struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async);
> int __i915_vma_unbind(struct i915_vma *vma);
> int __must_check i915_vma_unbind(struct i915_vma *vma);

2022-08-04 19:39:16

by Randy Dunlap

[permalink] [raw]

Subject: Re: [PATCH v3 3/3] drm/i915/gt: document TLB cache invalidation functions

Hi Mauro,

On 8/4/22 00:37, Mauro Carvalho Chehab wrote:
> Add a description for the TLB cache invalidation algorithm and for
> the related kAPI functions.
>
> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
> ---
>
> To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
> See [PATCH v3 0/3] at: https://lore.kernel.org/all/[email protected]/
>
> Documentation/gpu/i915.rst | 7 ++
> drivers/gpu/drm/i915/gt/intel_tlb.c | 25 ++++++++
> drivers/gpu/drm/i915/gt/intel_tlb.h | 99 +++++++++++++++++++++++++++++
> 3 files changed, 131 insertions(+)
>

> diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
> index af8cae979489..16b918ffe824 100644
> --- a/drivers/gpu/drm/i915/gt/intel_tlb.c
> +++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
> @@ -145,6 +145,18 @@ static void mmio_invalidate_full(struct intel_gt *gt)
> intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
> }
>
> +/**
> + * intel_gt_invalidate_tlb_full - do full TLB cache invalidation
> + * @gt: GT structure

In multiple places (here and below) it would be nice to know what a
GT structure is. I looked thru multiple C and header files yesterday
and didn't find any comments about it.

Just saying that @gt is a GT structure isn't very helpful, other
than making kernel-doc shut up.

> + * @seqno: sequence number
> + *
> + * Do a full TLB cache invalidation if the @seqno is bigger than the last
> + * full TLB cache invalidation.
> + *
> + * Note:
> + * The TLB cache invalidation logic depends on GEN-specific registers.
> + * It currently supports MMIO-based TLB flush for GEN8 to GEN12.
> + */
> void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
> {
> intel_wakeref_t wakeref;
> @@ -171,12 +183,25 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
> }
> }
>
> +/**
> + * intel_gt_init_tlb - initialize TLB-specific vars
> + * @gt: GT structure
> + *
> + * TLB cache invalidation logic internally uses some resources that require
> + * initialization. Should be called before doing any TLB cache invalidation.
> + */
> void intel_gt_init_tlb(struct intel_gt *gt)
> {
> mutex_init(&gt->tlb.invalidate_lock);
> seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
> }
>
> +/**
> + * intel_gt_fini_tlb - free TLB-specific vars
> + * @gt: GT structure
> + *
> + * Frees any resources needed by TLB cache invalidation logic.
> + */
> void intel_gt_fini_tlb(struct intel_gt *gt)
> {
> mutex_destroy(&gt->tlb.invalidate_lock);
> diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
> index 46ce25bf5afe..2838c051f872 100644
> --- a/drivers/gpu/drm/i915/gt/intel_tlb.h
> +++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
> @@ -11,16 +11,115 @@
>
> #include "intel_gt_types.h"
>
> +/**
> + * DOC: TLB cache invalidation logic
> + *
...
> +
> void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);
>
> void intel_gt_init_tlb(struct intel_gt *gt);
> void intel_gt_fini_tlb(struct intel_gt *gt);
>
> +/**
> + * intel_gt_tlb_seqno - Returns the current TLB invlidation sequence number
> + * @gt: GT structure
> + *
> + * There's no need to lock while calling it, as seqprop_sequence is thread-safe
> + */
> static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
> {
> return seqprop_sequence(&gt->tlb.seqno);
> }
>
> +/**
> + * intel_gt_next_invalidate_tlb_full - Returns the next TLB full invalidation
> + * sequence number
> + * @gt: GT structure
> + *
> + * There's no need to lock while calling it, as seqprop_sequence is thread-safe
> + */
> static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
> {
> return intel_gt_tlb_seqno(gt) | 1;

thanks.

--
~Randy

2022-08-05 09:12:41

by Andi Shyti

[permalink] [raw]

Subject: Re: [PATCH v3 3/3] drm/i915/gt: document TLB cache invalidation functions

Hi Randy,

> > +/**
> > + * intel_gt_invalidate_tlb_full - do full TLB cache invalidation
> > + * @gt: GT structure
>
> In multiple places (here and below) it would be nice to know what a
> GT structure is. I looked thru multiple C and header files yesterday
> and didn't find any comments about it.
>
> Just saying that @gt is a GT structure isn't very helpful, other
> than making kernel-doc shut up.

the 'gt' belongs to the drivers/gpu/drm/i915/gt/ subsystem and
it's widely used a throughout i915.

I think it's inappropriate to describe it just here. On the other
hand I agree that a better documentation is required for the GT
itself where other parts can point to.

Andi

2022-08-05 09:35:24

by Tvrtko Ursulin

[permalink] [raw]

Subject: Re: [PATCH v3 3/3] drm/i915/gt: document TLB cache invalidation functions

On 05/08/2022 10:08, Andi Shyti wrote:
> Hi Randy,
>
>>> +/**
>>> + * intel_gt_invalidate_tlb_full - do full TLB cache invalidation
>>> + * @gt: GT structure
>>
>> In multiple places (here and below) it would be nice to know what a
>> GT structure is. I looked thru multiple C and header files yesterday
>> and didn't find any comments about it.
>>
>> Just saying that @gt is a GT structure isn't very helpful, other
>> than making kernel-doc shut up.
>
> the 'gt' belongs to the drivers/gpu/drm/i915/gt/ subsystem and
> it's widely used a throughout i915.
>
> I think it's inappropriate to describe it just here. On the other
> hand I agree that a better documentation is required for the GT
> itself where other parts can point to.

Yeah agreed there is no point of copy pasting some explanation all over
the place. Could we just do s/GT structure/struct intel_gt/, or "pointer
to struct intel_gt to operate on", would that be good enough?

Regards,

Tvrtko

2022-08-05 10:52:52

by Mauro Carvalho Chehab

[permalink] [raw]

Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/i915/gt: document TLB cache invalidation functions

On Fri, 5 Aug 2022 10:24:25 +0100
Tvrtko Ursulin <[email protected]> wrote:

> On 05/08/2022 10:08, Andi Shyti wrote:
> > Hi Randy,
> >
> >>> +/**
> >>> + * intel_gt_invalidate_tlb_full - do full TLB cache invalidation
> >>> + * @gt: GT structure
> >>
> >> In multiple places (here and below) it would be nice to know what a
> >> GT structure is. I looked thru multiple C and header files yesterday
> >> and didn't find any comments about it.
> >>
> >> Just saying that @gt is a GT structure isn't very helpful, other
> >> than making kernel-doc shut up.
> >
> > the 'gt' belongs to the drivers/gpu/drm/i915/gt/ subsystem and
> > it's widely used a throughout i915.
> >
> > I think it's inappropriate to describe it just here. On the other
> > hand I agree that a better documentation is required for the GT
> > itself where other parts can point to.

GT is actually a well-understood term for GPU developers. It is an alias
for:

https://en.wikipedia.org/wiki/Intel_Graphics_Technology

It is basically the "core" of the GPU, where the engine units sit.

I agree with Andi: terms like this should likely be defined on a glossary
at i915.rst file.

> Yeah agreed there is no point of copy pasting some explanation all over
> the place. Could we just do s/GT structure/struct intel_gt/, or "pointer
> to struct intel_gt to operate on", would that be good enough?

IMO, it won't make any difference. kerneldoc already says that the
parameter has struct intel_gt type on its output:

.. c:function:: void intel_gt_fini_tlb (struct intel_gt *gt)

free TLB-specific vars

**Parameters**

``struct intel_gt *gt``
GT structure

**Description**

Frees any resources needed by TLB cache invalidation logic.

This struct somewhat is similar to struct device. This is a container
struct that has the common data needed to control the GT hardware.

Almost all functions that work with GT needs it. There's not much sense
to describe it everywhere. What makes sense is to have struct intel_gt
documented at intel_gt_types.h, letting the build system to generate
hiperlinks to it.

This is easier said than done...

Regards,
Mauro

2022-08-08 17:09:33

by Andi Shyti

[permalink] [raw]

Subject: Re: [PATCH v3 3/3] drm/i915/gt: document TLB cache invalidation functions

Hi Mauro,

On Thu, Aug 04, 2022 at 09:37:24AM +0200, Mauro Carvalho Chehab wrote:
> Add a description for the TLB cache invalidation algorithm and for
> the related kAPI functions.
>
> Signed-off-by: Mauro Carvalho Chehab <[email protected]>

Reviewed-by: Andi Shyti <[email protected]>

Andi

2022-08-08 17:19:08

by Andi Shyti

[permalink] [raw]

Subject: Re: [PATCH v3 1/3] drm/i915: pass a pointer for tlb seqno at vma_invalidate_tlb()

Hi Mauro,

On Thu, Aug 04, 2022 at 09:37:22AM +0200, Mauro Carvalho Chehab wrote:
> WRITE_ONCE() should happen at the original var, not on a local
> copy of it.
>
> Fixes: 5d36acb7198b ("drm/i915/gt: Batch TLB invalidations")
> Signed-off-by: Mauro Carvalho Chehab <[email protected]>

Reviewed-by: Andi Shyti <[email protected]>

Are you going to send it to the stable mailing list?

Andi

2022-08-08 19:27:22

by Rodrigo Vivi

[permalink] [raw]

Subject: Re: [Intel-gfx] [PATCH v3 1/3] drm/i915: pass a pointer for tlb seqno at vma_invalidate_tlb()

On Mon, Aug 08, 2022 at 06:37:58PM +0200, Andi Shyti wrote:
> Hi Mauro,
>
> On Thu, Aug 04, 2022 at 09:37:22AM +0200, Mauro Carvalho Chehab wrote:
> > WRITE_ONCE() should happen at the original var, not on a local
> > copy of it.
> >
> > Fixes: 5d36acb7198b ("drm/i915/gt: Batch TLB invalidations")
> > Signed-off-by: Mauro Carvalho Chehab <[email protected]>
>
> Reviewed-by: Andi Shyti <[email protected]>

Thanks and pushed...

>
> Are you going to send it to the stable mailing list?

I added cc stable while pushing and the cherry-pick to drm-intel-next-fixes
has the right sha, so I'd assume this would be automagically now.
But yeap, it would be good if Mauro can follow up whenever this gets
to Linus tree and Greg's script start to pop up the heads-up messages.

Thanks,
Rodrigo.

>
> Andi

2022-08-08 23:28:39

by Andi Shyti

[permalink] [raw]

Subject: Re: [Intel-gfx] [PATCH v3 1/3] drm/i915: pass a pointer for tlb seqno at vma_invalidate_tlb()

Hi Rodrigo,

On Mon, Aug 08, 2022 at 03:04:13PM -0400, Rodrigo Vivi wrote:
> On Mon, Aug 08, 2022 at 06:37:58PM +0200, Andi Shyti wrote:
> > Hi Mauro,
> >
> > On Thu, Aug 04, 2022 at 09:37:22AM +0200, Mauro Carvalho Chehab wrote:
> > > WRITE_ONCE() should happen at the original var, not on a local
> > > copy of it.
> > >
> > > Fixes: 5d36acb7198b ("drm/i915/gt: Batch TLB invalidations")
> > > Signed-off-by: Mauro Carvalho Chehab <[email protected]>
> >
> > Reviewed-by: Andi Shyti <[email protected]>
>
> Thanks and pushed...

Thanks!

> >
> > Are you going to send it to the stable mailing list?
>
> I added cc stable while pushing and the cherry-pick to drm-intel-next-fixes
> has the right sha, so I'd assume this would be automagically now.
> But yeap, it would be good if Mauro can follow up whenever this gets
> to Linus tree and Greg's script start to pop up the heads-up messages.

That's what I meant... does Mauro now need to send the e-mail
again for the stable?

I thought there was some suspicion towards e-mails pushed without
being first sent to both stable and upstream mailing lists
because they can get lost or forgotten... maybe I'm wrong.

Andi

> Thanks,
> Rodrigo.
>
> >
> > Andi

2022-08-09 01:03:41

by Rodrigo Vivi

[permalink] [raw]

Subject: Re: [Intel-gfx] [PATCH v3 1/3] drm/i915: pass a pointer for tlb seqno at vma_invalidate_tlb()

On Tue, 2022-08-09 at 01:09 +0200, Andi Shyti wrote:
> Hi Rodrigo,
>
> On Mon, Aug 08, 2022 at 03:04:13PM -0400, Rodrigo Vivi wrote:
> > On Mon, Aug 08, 2022 at 06:37:58PM +0200, Andi Shyti wrote:
> > > Hi Mauro,
> > >
> > > On Thu, Aug 04, 2022 at 09:37:22AM +0200, Mauro Carvalho Chehab
> > > wrote:
> > > > WRITE_ONCE() should happen at the original var, not on a local
> > > > copy of it.
> > > >
> > > > Fixes: 5d36acb7198b ("drm/i915/gt: Batch TLB invalidations")
> > > > Signed-off-by: Mauro Carvalho Chehab <[email protected]>
> > >
> > > Reviewed-by: Andi Shyti <[email protected]>
> >
> > Thanks and pushed...
>
> Thanks!
>
> > >
> > > Are you going to send it to the stable mailing list?
> >
> > I added cc stable while pushing and the cherry-pick to drm-intel-
> > next-fixes
> > has the right sha, so I'd assume this would be automagically now.
> > But yeap, it would be good if Mauro can follow up whenever this
> > gets
> > to Linus tree and Greg's script start to pop up the heads-up
> > messages.
>
> That's what I meant... does Mauro now need to send the e-mail
> again for the stable?
>
> I thought there was some suspicion towards e-mails pushed without
> being first sent to both stable and upstream mailing lists
> because they can get lost or forgotten... maybe I'm wrong.

It doesn't help to send now to stable ml because it can only be merged
there after it reaches the Linus' master tree.
Right now with the right fixes and cc:stable it should be automatically
and he shouldn't worry.
But in case he notices that the first patch got in but the second
didn't then it is when we send the patch directly to the stable ml.

>
> Andi
>
> > Thanks,
> > Rodrigo.
> >
> > >
> > > Andi

2022-08-09 07:07:17

by Mauro Carvalho Chehab

[permalink] [raw]

Subject: Re: [Intel-gfx] [PATCH v3 1/3] drm/i915: pass a pointer for tlb seqno at vma_invalidate_tlb()

On Tue, 9 Aug 2022 00:15:14 +0000
"Vivi, Rodrigo" <[email protected]> wrote:

> On Tue, 2022-08-09 at 01:09 +0200, Andi Shyti wrote:
> > Hi Rodrigo,
> >
> > On Mon, Aug 08, 2022 at 03:04:13PM -0400, Rodrigo Vivi wrote:
> > > On Mon, Aug 08, 2022 at 06:37:58PM +0200, Andi Shyti wrote:
> > > > Hi Mauro,
> > > >
> > > > On Thu, Aug 04, 2022 at 09:37:22AM +0200, Mauro Carvalho Chehab
> > > > wrote:
> > > > > WRITE_ONCE() should happen at the original var, not on a local
> > > > > copy of it.
> > > > >
> > > > > Fixes: 5d36acb7198b ("drm/i915/gt: Batch TLB invalidations")
> > > > > Signed-off-by: Mauro Carvalho Chehab <[email protected]>
> > > >
> > > > Reviewed-by: Andi Shyti <[email protected]>
> > >
> > > Thanks and pushed...
> >
> > Thanks!
> >
> > > >
> > > > Are you going to send it to the stable mailing list?
> > >
> > > I added cc stable while pushing and the cherry-pick to drm-intel-
> > > next-fixes
> > > has the right sha, so I'd assume this would be automagically now.
> > > But yeap, it would be good if Mauro can follow up whenever this
> > > gets
> > > to Linus tree and Greg's script start to pop up the heads-up
> > > messages.
> >
> > That's what I meant... does Mauro now need to send the e-mail
> > again for the stable?
> >
> > I thought there was some suspicion towards e-mails pushed without
> > being first sent to both stable and upstream mailing lists
> > because they can get lost or forgotten... maybe I'm wrong.
>
> It doesn't help to send now to stable ml because it can only be merged
> there after it reaches the Linus' master tree.
> Right now with the right fixes and cc:stable it should be automatically
> and he shouldn't worry.
> But in case he notices that the first patch got in but the second
> didn't then it is when we send the patch directly to the stable ml.

I sent a heads-up to Greg to warn him about the issue. I'll keep my eyes
on the -stable automatic e-mails to double-check that they'll both be
merged altogether.

Thanks!
Mauro

>
>
> >
> > Andi
> >
> > > Thanks,
> > > Rodrigo.
> > >
> > > >
> > > > Andi
>