There are more things to be added to TLB invalidation. Before doing that,
move the code to its own file, and add the relevant documentation.
Patch 1 only moves the code and does some function renames. No functional
change.
Patch 2 adds documentation for the TLB invalidation algorithm and functions.
---
v2: only patch 2 (kernel-doc) was modified:
- The kernel-doc markups for TLB were added to the i915.rst doc;
- Some minor fixes to the text;
- Use a table instead of a literal block while explaining how the algorithm works.
That should make it easier to understand the logic, both in text form and after
its conversion to HTML/PDF;
- Remove the mention of GuC, as this depends on a series that will be sent later.
Chris Wilson (1):
drm/i915/gt: Move TLB invalidation to its own file
Mauro Carvalho Chehab (1):
drm/i915/gt: document TLB cache invalidation functions
Documentation/gpu/i915.rst | 7 +
drivers/gpu/drm/i915/Makefile | 1 +
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 4 +-
drivers/gpu/drm/i915/gt/intel_gt.c | 168 +----------------
drivers/gpu/drm/i915/gt/intel_gt.h | 12 --
drivers/gpu/drm/i915/gt/intel_tlb.c | 208 ++++++++++++++++++++++
drivers/gpu/drm/i915/gt/intel_tlb.h | 130 ++++++++++++++
drivers/gpu/drm/i915/i915_vma.c | 1 +
8 files changed, 352 insertions(+), 179 deletions(-)
create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.c
create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.h
--
2.36.1
Add a description for the TLB cache invalidation algorithm and for
the related kAPI functions.
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover.
See [PATCH v2 0/2] at: https://lore.kernel.org/all/[email protected]/
Documentation/gpu/i915.rst | 7 ++
drivers/gpu/drm/i915/gt/intel_tlb.c | 25 +++++++
drivers/gpu/drm/i915/gt/intel_tlb.h | 101 ++++++++++++++++++++++++++++
3 files changed, 133 insertions(+)
diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 4e59db1cfb00..46911fdd79e8 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -58,6 +58,13 @@ Intel GVT-g Host Support(vGPU device model)
.. kernel-doc:: drivers/gpu/drm/i915/intel_gvt.c
:internal:
+TLB cache invalidation
+----------------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.h
+
+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.c
+
Workarounds
-----------
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
index af8cae979489..4873b7ecc015 100644
--- a/drivers/gpu/drm/i915/gt/intel_tlb.c
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
@@ -145,6 +145,18 @@ static void mmio_invalidate_full(struct intel_gt *gt)
intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
}
+/**
+ * intel_gt_invalidate_tlb_full - do full TLB cache invalidation
+ * @gt: GT structure
+ * @seqno: sequence number
+ *
+ * Do a full TLB cache invalidation if the @seqno is bigger than the last
+ * full TLB cache invalidation.
+ *
+ * Note:
+ * The TLB cache invalidation logic depends on GEN-specific registers.
+ * It currently supports MMIO-based TLB flush for GEN8 to GEN12.
+ */
void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
{
intel_wakeref_t wakeref;
@@ -171,12 +183,25 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
}
}
+/**
+ * intel_gt_init_tlb - initialize TLB-specific vars
+ * @gt: GT structure
+ *
+ * TLB cache invalidation logic internally uses some resources that require
+ * initialization. Should be called before doing any TLB cache invalidation.
+ */
void intel_gt_init_tlb(struct intel_gt *gt)
{
mutex_init(&gt->tlb.invalidate_lock);
seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
}
+/**
+ * intel_gt_fini_tlb - initialize TLB-specific vars
+ * @gt: GT structure
+ *
+ * Frees any resources needed by TLB cache invalidation logic.
+ */
void intel_gt_fini_tlb(struct intel_gt *gt)
{
mutex_destroy(&gt->tlb.invalidate_lock);
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
index 46ce25bf5afe..dca70c33bd61 100644
--- a/drivers/gpu/drm/i915/gt/intel_tlb.h
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
@@ -11,16 +11,117 @@
#include "intel_gt_types.h"
+/**
+ * DOC: TLB cache invalidation logic
+ *
+ * The way the current algorithm works is that a struct drm_i915_gem_object can
+ * be created in any order. At unbind/evict time, the object is guaranteed not
+ * to be used anymore. So, a sequence number provided by
+ * intel_gt_next_invalidate_tlb_full() is stored on it. This can happen either
+ * at __vma_put_pages() - for sync VMA unbind - or at ppgtt_unbind_vma() - for
+ * async VMA unbind.
+ *
+ * At __i915_gem_object_unset_pages(), intel_gt_invalidate_tlb_full() is called,
+ * where it checks if the sequence number of the object was already invalidated
+ * or not. If not, it flushes the TLB and increments the sequence number::
+ *
+ * void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
+ * {
+ * ...
+ * with_intel_gt_pm_if_awake(gt, wakeref) {
+ * mutex_lock(&gt->tlb.invalidate_lock);
+ * if (tlb_seqno_passed(gt, seqno))
+ * goto unlock;
+ *
+ * // Some code to do TLB invalidation
+ * ...
+ *
+ * write_seqcount_invalidate(&gt->tlb.seqno); // increment seqno
+ * unlock:
+ * mutex_unlock(&gt->tlb.invalidate_lock);
+ * }
+ * }
+ *
+ * So, let's say the current seqno is 2 and 3 new objects were created,
+ * in this order::
+ *
+ * obj1
+ * obj2
+ * obj3
+ *
+ * They can be unbound/evicted in a different order. At unbind/evict time,
+ * mm.tlb will be stamped with a sequence number, using the number
+ * from the last TLB flush, plus 1.
+ *
+ * Different threads may be used for unbind/evict and/or unset pages.
+ * As the logic at void intel_gt_invalidate_tlb_full() is protected by a mutex,
+ * for simplicity, let's consider just two threads:
+ *
+ * +-------------------+-------------------------+---------------------------------+
+ * | sequence number   | Thread 0                | Thread 1                        |
+ * +===================+=========================+=================================+
+ * | seqno=2           |                         |                                 |
+ * |                   +-------------------------+---------------------------------+
+ * |                   | unbind/evict obj3.      |                                 |
+ * |                   |                         |                                 |
+ * |                   | obj3.mm.tlb = seqno | 1 |                                 |
+ * |                   | // obj3.mm.tlb = 3      |                                 |
+ * |                   +-------------------------+---------------------------------+
+ * |                   | unbind/evict obj1.      |                                 |
+ * |                   |                         |                                 |
+ * |                   | obj1.mm.tlb = seqno | 1 |                                 |
+ * |                   | // obj1.mm.tlb = 3      |                                 |
+ * |                   +-------------------------+---------------------------------+
+ * |                   |                         | __i915_gem_object_unset_pages() |
+ * |                   |                         | called for obj3 => TLB flush    |
+ * |                   |                         | invalidating both obj1 and obj2.|
+ * |                   |                         |                                 |
+ * |                   |                         | seqno += 2                      |
+ * +-------------------+-------------------------+---------------------------------+
+ * | seqno=4           |                         |                                 |
+ * |                   +-------------------------+---------------------------------+
+ * |                   | unbind/evict obj2.      |                                 |
+ * |                   |                         |                                 |
+ * |                   | obj2.mm.tlb = seqno | 1 |                                 |
+ * |                   | // obj2.mm.tlb = 5      |                                 |
+ * |                   +-------------------------+---------------------------------+
+ * |                   |                         | __i915_gem_object_unset_pages() |
+ * |                   |                         | called for obj1, don't flush    |
+ * |                   |                         | as past flush invalidated obj1. |
+ * |                   +-------------------------+---------------------------------+
+ * |                   |                         | __i915_gem_object_unset_pages() |
+ * |                   |                         | called for obj2 => TLB flush.   |
+ * |                   |                         | invalidating obj2.              |
+ * |                   |                         |                                 |
+ * |                   |                         | seqno += 2                      |
+ * +-------------------+-------------------------+---------------------------------+
+ * | seqno=6           |                         |                                 |
+ * +-------------------+-------------------------+---------------------------------+
+ */
+
void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);
void intel_gt_init_tlb(struct intel_gt *gt);
void intel_gt_fini_tlb(struct intel_gt *gt);
+/**
+ * intel_gt_tlb_seqno - Returns the current TLB invalidation sequence number
+ *
+ * @gt: GT structure
+ *
+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe
+ */
static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
{
return seqprop_sequence(&gt->tlb.seqno);
}
+/**
+ * intel_gt_next_invalidate_tlb_full - Returns the next TLB full invalidation
+ * sequence number
+ *
+ * @gt: GT structure
+ *
+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe
+ */
static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
{
return intel_gt_tlb_seqno(gt) | 1;
--
2.36.1
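As an aside, the odd/even sequence-number scheme documented above is compact
enough to model in isolation. The following is a standalone userspace sketch
(illustration only, not driver code): it mirrors
intel_gt_next_invalidate_tlb_full() and tlb_seqno_passed() (whose body is in
the next patch of this series), with ALIGN2() standing in for the kernel's
ALIGN(x, 2):

#include <stdint.h>
#include <stdio.h>

/* Stand-in for the kernel's ALIGN(x, 2): round up to the next even value. */
#define ALIGN2(x) (((x) + 1u) & ~1u)

/* Mirrors intel_gt_next_invalidate_tlb_full(): the stamp is the odd
 * successor of the (always even) GT counter. */
static uint32_t next_invalidate(uint32_t cur)
{
	return cur | 1;
}

/* Mirrors tlb_seqno_passed(): the signed difference keeps the comparison
 * correct across counter wraparound. */
static int seqno_passed(uint32_t cur, uint32_t stamp)
{
	return (int32_t)(cur - ALIGN2(stamp)) > 0;
}

int main(void)
{
	uint32_t stamp = next_invalidate(2); /* unbind while counter is 2 -> stamp 3 */

	printf("%d\n", seqno_passed(2, stamp));       /* 0: no flush since the stamp */
	printf("%d\n", seqno_passed(6, stamp));       /* 1: a full barrier has passed */
	printf("%d\n", seqno_passed(2, 0xfffffffdu)); /* 1: still correct across wrap */
	return 0;
}

The signed-difference comparison is the usual seqno wraparound idiom (the same
pattern as i915_seqno_passed() uses for request seqnos), which is why the last
call still reports the old stamp as flushed even though the raw counter wrapped.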
From: Chris Wilson <[email protected]>
Prepare for supporting more TLB invalidation scenarios by moving
the current MMIO invalidation to its own file.
Signed-off-by: Chris Wilson <[email protected]>
Cc: Fei Yang <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover.
See [PATCH v2 0/2] at: https://lore.kernel.org/all/[email protected]/
drivers/gpu/drm/i915/Makefile | 1 +
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 4 +-
drivers/gpu/drm/i915/gt/intel_gt.c | 168 +-------------------
drivers/gpu/drm/i915/gt/intel_gt.h | 12 --
drivers/gpu/drm/i915/gt/intel_tlb.c | 183 ++++++++++++++++++++++
drivers/gpu/drm/i915/gt/intel_tlb.h | 29 ++++
drivers/gpu/drm/i915/i915_vma.c | 1 +
7 files changed, 219 insertions(+), 179 deletions(-)
create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.c
create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.h
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 522ef9b4aff3..d3df9832d1f7 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -126,6 +126,7 @@ gt-y += \
gt/intel_sseu.o \
gt/intel_sseu_debugfs.o \
gt/intel_timeline.o \
+ gt/intel_tlb.o \
gt/intel_workarounds.o \
gt/shmem_utils.o \
gt/sysfs_engines.o
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 8357dbdcab5c..1cd76cc5d9f3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -7,7 +7,7 @@
#include <drm/drm_cache.h>
#include "gt/intel_gt.h"
-#include "gt/intel_gt_pm.h"
+#include "gt/intel_tlb.h"
#include "i915_drv.h"
#include "i915_gem_object.h"
@@ -199,7 +199,7 @@ static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
if (!obj->mm.tlb)
return;
- intel_gt_invalidate_tlb(gt, obj->mm.tlb);
+ intel_gt_invalidate_tlb_full(gt, obj->mm.tlb);
obj->mm.tlb = 0;
}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index f435e06125aa..18d82cd620bd 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -11,9 +11,7 @@
#include "pxp/intel_pxp.h"
#include "i915_drv.h"
-#include "i915_perf_oa_regs.h"
#include "intel_context.h"
-#include "intel_engine_pm.h"
#include "intel_engine_regs.h"
#include "intel_ggtt_gmch.h"
#include "intel_gt.h"
@@ -31,6 +29,7 @@
#include "intel_renderstate.h"
#include "intel_rps.h"
#include "intel_gt_sysfs.h"
+#include "intel_tlb.h"
#include "intel_uncore.h"
#include "shmem_utils.h"
@@ -48,8 +47,7 @@ static void __intel_gt_init_early(struct intel_gt *gt)
intel_gt_init_reset(gt);
intel_gt_init_requests(gt);
intel_gt_init_timelines(gt);
- mutex_init(&gt->tlb.invalidate_lock);
- seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
+ intel_gt_init_tlb(gt);
intel_gt_pm_init_early(gt);
intel_uc_init_early(&gt->uc);
@@ -770,7 +768,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
intel_gt_fini_requests(gt);
intel_gt_fini_reset(gt);
intel_gt_fini_timelines(gt);
- mutex_destroy(&gt->tlb.invalidate_lock);
+ intel_gt_fini_tlb(gt);
intel_engines_free(gt);
}
}
@@ -881,163 +879,3 @@ void intel_gt_info_print(const struct intel_gt_info *info,
intel_sseu_dump(&info->sseu, p);
}
-
-struct reg_and_bit {
- i915_reg_t reg;
- u32 bit;
-};
-
-static struct reg_and_bit
-get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
- const i915_reg_t *regs, const unsigned int num)
-{
- const unsigned int class = engine->class;
- struct reg_and_bit rb = { };
-
- if (drm_WARN_ON_ONCE(&engine->i915->drm,
- class >= num || !regs[class].reg))
- return rb;
-
- rb.reg = regs[class];
- if (gen8 && class == VIDEO_DECODE_CLASS)
- rb.reg.reg += 4 * engine->instance; /* GEN8_M2TCR */
- else
- rb.bit = engine->instance;
-
- rb.bit = BIT(rb.bit);
-
- return rb;
-}
-
-static void mmio_invalidate_full(struct intel_gt *gt)
-{
- static const i915_reg_t gen8_regs[] = {
- [RENDER_CLASS] = GEN8_RTCR,
- [VIDEO_DECODE_CLASS] = GEN8_M1TCR, /* , GEN8_M2TCR */
- [VIDEO_ENHANCEMENT_CLASS] = GEN8_VTCR,
- [COPY_ENGINE_CLASS] = GEN8_BTCR,
- };
- static const i915_reg_t gen12_regs[] = {
- [RENDER_CLASS] = GEN12_GFX_TLB_INV_CR,
- [VIDEO_DECODE_CLASS] = GEN12_VD_TLB_INV_CR,
- [VIDEO_ENHANCEMENT_CLASS] = GEN12_VE_TLB_INV_CR,
- [COPY_ENGINE_CLASS] = GEN12_BLT_TLB_INV_CR,
- [COMPUTE_CLASS] = GEN12_COMPCTX_TLB_INV_CR,
- };
- struct drm_i915_private *i915 = gt->i915;
- struct intel_uncore *uncore = gt->uncore;
- struct intel_engine_cs *engine;
- intel_engine_mask_t awake, tmp;
- enum intel_engine_id id;
- const i915_reg_t *regs;
- unsigned int num = 0;
-
- if (GRAPHICS_VER(i915) == 12) {
- regs = gen12_regs;
- num = ARRAY_SIZE(gen12_regs);
- } else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) {
- regs = gen8_regs;
- num = ARRAY_SIZE(gen8_regs);
- } else if (GRAPHICS_VER(i915) < 8) {
- return;
- }
-
- if (drm_WARN_ONCE(&i915->drm, !num,
- "Platform does not implement TLB invalidation!"))
- return;
-
- intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
-
- spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
-
- awake = 0;
- for_each_engine(engine, gt, id) {
- struct reg_and_bit rb;
-
- if (!intel_engine_pm_is_awake(engine))
- continue;
-
- rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
- if (!i915_mmio_reg_offset(rb.reg))
- continue;
-
- intel_uncore_write_fw(uncore, rb.reg, rb.bit);
- awake |= engine->mask;
- }
-
- GT_TRACE(gt, "invalidated engines %08x\n", awake);
-
- /* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
- if (awake &&
- (IS_TIGERLAKE(i915) ||
- IS_DG1(i915) ||
- IS_ROCKETLAKE(i915) ||
- IS_ALDERLAKE_S(i915) ||
- IS_ALDERLAKE_P(i915)))
- intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
-
- spin_unlock_irq(&uncore->lock);
-
- for_each_engine_masked(engine, gt, awake, tmp) {
- struct reg_and_bit rb;
-
- /*
- * HW architecture suggest typical invalidation time at 40us,
- * with pessimistic cases up to 100us and a recommendation to
- * cap at 1ms. We go a bit higher just in case.
- */
- const unsigned int timeout_us = 100;
- const unsigned int timeout_ms = 4;
-
- rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
- if (__intel_wait_for_register_fw(uncore,
- rb.reg, rb.bit, 0,
- timeout_us, timeout_ms,
- NULL))
- drm_err_ratelimited(&gt->i915->drm,
- "%s TLB invalidation did not complete in %ums!\n",
- engine->name, timeout_ms);
- }
-
- /*
- * Use delayed put since a) we mostly expect a flurry of TLB
- * invalidations so it is good to avoid paying the forcewake cost and
- * b) it works around a bug in Icelake which cannot cope with too rapid
- * transitions.
- */
- intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
-}
-
-static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
-{
- u32 cur = intel_gt_tlb_seqno(gt);
-
- /* Only skip if a *full* TLB invalidate barrier has passed */
- return (s32)(cur - ALIGN(seqno, 2)) > 0;
-}
-
-void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
-{
- intel_wakeref_t wakeref;
-
- if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
- return;
-
- if (intel_gt_is_wedged(gt))
- return;
-
- if (tlb_seqno_passed(gt, seqno))
- return;
-
- with_intel_gt_pm_if_awake(gt, wakeref) {
- mutex_lock(&gt->tlb.invalidate_lock);
- if (tlb_seqno_passed(gt, seqno))
- goto unlock;
-
- mmio_invalidate_full(gt);
-
- write_seqcount_invalidate(&gt->tlb.seqno);
-unlock:
- mutex_unlock(&gt->tlb.invalidate_lock);
- }
-}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index 40b06adf509a..b4bba16cdb53 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -101,16 +101,4 @@ void intel_gt_info_print(const struct intel_gt_info *info,
void intel_gt_watchdog_work(struct work_struct *work);
-static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
-{
- return seqprop_sequence(&gt->tlb.seqno);
-}
-
-static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
-{
- return intel_gt_tlb_seqno(gt) | 1;
-}
-
-void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno);
-
#endif /* __INTEL_GT_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
new file mode 100644
index 000000000000..af8cae979489
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
@@ -0,0 +1,183 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include "i915_drv.h"
+#include "i915_perf_oa_regs.h"
+#include "intel_engine_pm.h"
+#include "intel_gt.h"
+#include "intel_gt_pm.h"
+#include "intel_gt_regs.h"
+#include "intel_tlb.h"
+
+struct reg_and_bit {
+ i915_reg_t reg;
+ u32 bit;
+};
+
+static struct reg_and_bit
+get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
+ const i915_reg_t *regs, const unsigned int num)
+{
+ const unsigned int class = engine->class;
+ struct reg_and_bit rb = { };
+
+ if (drm_WARN_ON_ONCE(&engine->i915->drm,
+ class >= num || !regs[class].reg))
+ return rb;
+
+ rb.reg = regs[class];
+ if (gen8 && class == VIDEO_DECODE_CLASS)
+ rb.reg.reg += 4 * engine->instance; /* GEN8_M2TCR */
+ else
+ rb.bit = engine->instance;
+
+ rb.bit = BIT(rb.bit);
+
+ return rb;
+}
+
+static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
+{
+ u32 cur = intel_gt_tlb_seqno(gt);
+
+ /* Only skip if a *full* TLB invalidate barrier has passed */
+ return (s32)(cur - ALIGN(seqno, 2)) > 0;
+}
+
+static void mmio_invalidate_full(struct intel_gt *gt)
+{
+ static const i915_reg_t gen8_regs[] = {
+ [RENDER_CLASS] = GEN8_RTCR,
+ [VIDEO_DECODE_CLASS] = GEN8_M1TCR, /* , GEN8_M2TCR */
+ [VIDEO_ENHANCEMENT_CLASS] = GEN8_VTCR,
+ [COPY_ENGINE_CLASS] = GEN8_BTCR,
+ };
+ static const i915_reg_t gen12_regs[] = {
+ [RENDER_CLASS] = GEN12_GFX_TLB_INV_CR,
+ [VIDEO_DECODE_CLASS] = GEN12_VD_TLB_INV_CR,
+ [VIDEO_ENHANCEMENT_CLASS] = GEN12_VE_TLB_INV_CR,
+ [COPY_ENGINE_CLASS] = GEN12_BLT_TLB_INV_CR,
+ [COMPUTE_CLASS] = GEN12_COMPCTX_TLB_INV_CR,
+ };
+ struct drm_i915_private *i915 = gt->i915;
+ struct intel_uncore *uncore = gt->uncore;
+ struct intel_engine_cs *engine;
+ intel_engine_mask_t awake, tmp;
+ enum intel_engine_id id;
+ const i915_reg_t *regs;
+ unsigned int num = 0;
+
+ if (GRAPHICS_VER(i915) == 12) {
+ regs = gen12_regs;
+ num = ARRAY_SIZE(gen12_regs);
+ } else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) {
+ regs = gen8_regs;
+ num = ARRAY_SIZE(gen8_regs);
+ } else if (GRAPHICS_VER(i915) < 8) {
+ return;
+ }
+
+ if (drm_WARN_ONCE(&i915->drm, !num,
+ "Platform does not implement TLB invalidation!"))
+ return;
+
+ intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
+
+ spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
+
+ awake = 0;
+ for_each_engine(engine, gt, id) {
+ struct reg_and_bit rb;
+
+ if (!intel_engine_pm_is_awake(engine))
+ continue;
+
+ rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
+ if (!i915_mmio_reg_offset(rb.reg))
+ continue;
+
+ intel_uncore_write_fw(uncore, rb.reg, rb.bit);
+ awake |= engine->mask;
+ }
+
+ GT_TRACE(gt, "invalidated engines %08x\n", awake);
+
+ /* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
+ if (awake &&
+ (IS_TIGERLAKE(i915) ||
+ IS_DG1(i915) ||
+ IS_ROCKETLAKE(i915) ||
+ IS_ALDERLAKE_S(i915) ||
+ IS_ALDERLAKE_P(i915)))
+ intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
+
+ spin_unlock_irq(&uncore->lock);
+
+ for_each_engine_masked(engine, gt, awake, tmp) {
+ struct reg_and_bit rb;
+
+ /*
+ * HW architecture suggest typical invalidation time at 40us,
+ * with pessimistic cases up to 100us and a recommendation to
+ * cap at 1ms. We go a bit higher just in case.
+ */
+ const unsigned int timeout_us = 100;
+ const unsigned int timeout_ms = 4;
+
+ rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
+ if (__intel_wait_for_register_fw(uncore,
+ rb.reg, rb.bit, 0,
+ timeout_us, timeout_ms,
+ NULL))
+ drm_err_ratelimited(&gt->i915->drm,
+ "%s TLB invalidation did not complete in %ums!\n",
+ engine->name, timeout_ms);
+ }
+
+ /*
+ * Use delayed put since a) we mostly expect a flurry of TLB
+ * invalidations so it is good to avoid paying the forcewake cost and
+ * b) it works around a bug in Icelake which cannot cope with too rapid
+ * transitions.
+ */
+ intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
+}
+
+void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
+{
+ intel_wakeref_t wakeref;
+
+ if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
+ return;
+
+ if (intel_gt_is_wedged(gt))
+ return;
+
+ if (tlb_seqno_passed(gt, seqno))
+ return;
+
+ with_intel_gt_pm_if_awake(gt, wakeref) {
+ mutex_lock(&gt->tlb.invalidate_lock);
+ if (tlb_seqno_passed(gt, seqno))
+ goto unlock;
+
+ mmio_invalidate_full(gt);
+
+ write_seqcount_invalidate(&gt->tlb.seqno);
+unlock:
+ mutex_unlock(&gt->tlb.invalidate_lock);
+ }
+}
+
+void intel_gt_init_tlb(struct intel_gt *gt)
+{
+ mutex_init(&gt->tlb.invalidate_lock);
+ seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
+}
+
+void intel_gt_fini_tlb(struct intel_gt *gt)
+{
+ mutex_destroy(&gt->tlb.invalidate_lock);
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
new file mode 100644
index 000000000000..46ce25bf5afe
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#ifndef INTEL_TLB_H
+#define INTEL_TLB_H
+
+#include <linux/seqlock.h>
+#include <linux/types.h>
+
+#include "intel_gt_types.h"
+
+void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);
+
+void intel_gt_init_tlb(struct intel_gt *gt);
+void intel_gt_fini_tlb(struct intel_gt *gt);
+
+static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
+{
+ return seqprop_sequence(&gt->tlb.seqno);
+}
+
+static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
+{
+ return intel_gt_tlb_seqno(gt) | 1;
+}
+
+#endif /* INTEL_TLB_H */
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 84a9ccbc5fc5..fe947d1456d5 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -33,6 +33,7 @@
#include "gt/intel_engine_heartbeat.h"
#include "gt/intel_gt.h"
#include "gt/intel_gt_requests.h"
+#include "gt/intel_tlb.h"
#include "i915_drv.h"
#include "i915_gem_evict.h"
--
2.36.1
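One detail of the moved code that is easy to miss: get_reg_and_bit() packs two
addressing schemes into a single helper. Below is a standalone model of that
logic (userspace sketch, not driver code; the register offsets are made-up
placeholders rather than the real GEN8_RTCR/GEN8_M1TCR values):

#include <stdint.h>
#include <stdio.h>

#define BIT(n) (1u << (n))

enum { RENDER_CLASS, VIDEO_DECODE_CLASS, NUM_CLASSES };

struct reg_and_bit {
	uint32_t reg; /* register offset to write */
	uint32_t bit; /* invalidation bit within it */
};

/* Placeholder offsets, standing in for GEN8_RTCR / GEN8_M1TCR. */
static const uint32_t gen8_regs[NUM_CLASSES] = {
	[RENDER_CLASS]       = 0x100,
	[VIDEO_DECODE_CLASS] = 0x104,
};

/* Mirrors the shape of get_reg_and_bit(): gen8 video-decode engines step to
 * the next register (M1TCR -> M2TCR, stride 4) and always use bit 0, while
 * every other engine shares its class register and picks a bit by instance. */
static struct reg_and_bit
get_reg_and_bit(int gen8, unsigned int class, unsigned int instance)
{
	struct reg_and_bit rb = { .reg = gen8_regs[class], .bit = 0 };

	if (gen8 && class == VIDEO_DECODE_CLASS)
		rb.reg += 4 * instance; /* VCS1 lands on the "M2TCR" slot */
	else
		rb.bit = instance;

	rb.bit = BIT(rb.bit);
	return rb;
}

int main(void)
{
	struct reg_and_bit vcs1 = get_reg_and_bit(1, VIDEO_DECODE_CLASS, 1);
	struct reg_and_bit rcs0 = get_reg_and_bit(1, RENDER_CLASS, 0);

	printf("VCS1: reg=0x%x bit=0x%x\n", vcs1.reg, vcs1.bit); /* 0x108, 0x1 */
	printf("RCS0: reg=0x%x bit=0x%x\n", rcs0.reg, rcs0.bit); /* 0x100, 0x1 */
	return 0;
}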
On Fri, Jul 29, 2022 at 09:03:54AM +0200, Mauro Carvalho Chehab wrote:
>From: Chris Wilson <[email protected]>
>
>Prepare for supporting more TLB invalidation scenarios by moving
>the current MMIO invalidation to its own file.
And it looks like it also:
1. Renames intel_gt_invalidate_tlb() to intel_gt_invalidate_tlb_full().
2. Adds the intel_gt_init_tlb() and intel_gt_fini_tlb() abstractions.
Reviewed-by: Niranjana Vishwanathapura <[email protected]>
>
>Signed-off-by: Chris Wilson <[email protected]>
>Cc: Fei Yang <[email protected]>
>Signed-off-by: Mauro Carvalho Chehab <[email protected]>
>---
>
>To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover.
>See [PATCH v2 0/2] at: https://lore.kernel.org/all/[email protected]/
>
> drivers/gpu/drm/i915/Makefile | 1 +
> drivers/gpu/drm/i915/gem/i915_gem_pages.c | 4 +-
> drivers/gpu/drm/i915/gt/intel_gt.c | 168 +-------------------
> drivers/gpu/drm/i915/gt/intel_gt.h | 12 --
> drivers/gpu/drm/i915/gt/intel_tlb.c | 183 ++++++++++++++++++++++
> drivers/gpu/drm/i915/gt/intel_tlb.h | 29 ++++
> drivers/gpu/drm/i915/i915_vma.c | 1 +
> 7 files changed, 219 insertions(+), 179 deletions(-)
> create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.c
> create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.h
>
>diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
>index 522ef9b4aff3..d3df9832d1f7 100644
>--- a/drivers/gpu/drm/i915/Makefile
>+++ b/drivers/gpu/drm/i915/Makefile
>@@ -126,6 +126,7 @@ gt-y += \
> gt/intel_sseu.o \
> gt/intel_sseu_debugfs.o \
> gt/intel_timeline.o \
>+ gt/intel_tlb.o \
> gt/intel_workarounds.o \
> gt/shmem_utils.o \
> gt/sysfs_engines.o
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>index 8357dbdcab5c..1cd76cc5d9f3 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>@@ -7,7 +7,7 @@
> #include <drm/drm_cache.h>
>
> #include "gt/intel_gt.h"
>-#include "gt/intel_gt_pm.h"
>+#include "gt/intel_tlb.h"
>
> #include "i915_drv.h"
> #include "i915_gem_object.h"
>@@ -199,7 +199,7 @@ static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
> if (!obj->mm.tlb)
> return;
>
>- intel_gt_invalidate_tlb(gt, obj->mm.tlb);
>+ intel_gt_invalidate_tlb_full(gt, obj->mm.tlb);
> obj->mm.tlb = 0;
> }
>
>diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
>index f435e06125aa..18d82cd620bd 100644
>--- a/drivers/gpu/drm/i915/gt/intel_gt.c
>+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>@@ -11,9 +11,7 @@
> #include "pxp/intel_pxp.h"
>
> #include "i915_drv.h"
>-#include "i915_perf_oa_regs.h"
> #include "intel_context.h"
>-#include "intel_engine_pm.h"
> #include "intel_engine_regs.h"
> #include "intel_ggtt_gmch.h"
> #include "intel_gt.h"
>@@ -31,6 +29,7 @@
> #include "intel_renderstate.h"
> #include "intel_rps.h"
> #include "intel_gt_sysfs.h"
>+#include "intel_tlb.h"
> #include "intel_uncore.h"
> #include "shmem_utils.h"
>
>@@ -48,8 +47,7 @@ static void __intel_gt_init_early(struct intel_gt *gt)
> intel_gt_init_reset(gt);
> intel_gt_init_requests(gt);
> intel_gt_init_timelines(gt);
>- mutex_init(&gt->tlb.invalidate_lock);
>- seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
>+ intel_gt_init_tlb(gt);
> intel_gt_pm_init_early(gt);
>
> intel_uc_init_early(&gt->uc);
>@@ -770,7 +768,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
> intel_gt_fini_requests(gt);
> intel_gt_fini_reset(gt);
> intel_gt_fini_timelines(gt);
>- mutex_destroy(&gt->tlb.invalidate_lock);
>+ intel_gt_fini_tlb(gt);
> intel_engines_free(gt);
> }
> }
>@@ -881,163 +879,3 @@ void intel_gt_info_print(const struct intel_gt_info *info,
>
> intel_sseu_dump(&info->sseu, p);
> }
>-
>-struct reg_and_bit {
>- i915_reg_t reg;
>- u32 bit;
>-};
>-
>-static struct reg_and_bit
>-get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
>- const i915_reg_t *regs, const unsigned int num)
>-{
>- const unsigned int class = engine->class;
>- struct reg_and_bit rb = { };
>-
>- if (drm_WARN_ON_ONCE(&engine->i915->drm,
>- class >= num || !regs[class].reg))
>- return rb;
>-
>- rb.reg = regs[class];
>- if (gen8 && class == VIDEO_DECODE_CLASS)
>- rb.reg.reg += 4 * engine->instance; /* GEN8_M2TCR */
>- else
>- rb.bit = engine->instance;
>-
>- rb.bit = BIT(rb.bit);
>-
>- return rb;
>-}
>-
>-static void mmio_invalidate_full(struct intel_gt *gt)
>-{
>- static const i915_reg_t gen8_regs[] = {
>- [RENDER_CLASS] = GEN8_RTCR,
>- [VIDEO_DECODE_CLASS] = GEN8_M1TCR, /* , GEN8_M2TCR */
>- [VIDEO_ENHANCEMENT_CLASS] = GEN8_VTCR,
>- [COPY_ENGINE_CLASS] = GEN8_BTCR,
>- };
>- static const i915_reg_t gen12_regs[] = {
>- [RENDER_CLASS] = GEN12_GFX_TLB_INV_CR,
>- [VIDEO_DECODE_CLASS] = GEN12_VD_TLB_INV_CR,
>- [VIDEO_ENHANCEMENT_CLASS] = GEN12_VE_TLB_INV_CR,
>- [COPY_ENGINE_CLASS] = GEN12_BLT_TLB_INV_CR,
>- [COMPUTE_CLASS] = GEN12_COMPCTX_TLB_INV_CR,
>- };
>- struct drm_i915_private *i915 = gt->i915;
>- struct intel_uncore *uncore = gt->uncore;
>- struct intel_engine_cs *engine;
>- intel_engine_mask_t awake, tmp;
>- enum intel_engine_id id;
>- const i915_reg_t *regs;
>- unsigned int num = 0;
>-
>- if (GRAPHICS_VER(i915) == 12) {
>- regs = gen12_regs;
>- num = ARRAY_SIZE(gen12_regs);
>- } else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) {
>- regs = gen8_regs;
>- num = ARRAY_SIZE(gen8_regs);
>- } else if (GRAPHICS_VER(i915) < 8) {
>- return;
>- }
>-
>- if (drm_WARN_ONCE(&i915->drm, !num,
>- "Platform does not implement TLB invalidation!"))
>- return;
>-
>- intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>-
>- spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
>-
>- awake = 0;
>- for_each_engine(engine, gt, id) {
>- struct reg_and_bit rb;
>-
>- if (!intel_engine_pm_is_awake(engine))
>- continue;
>-
>- rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
>- if (!i915_mmio_reg_offset(rb.reg))
>- continue;
>-
>- intel_uncore_write_fw(uncore, rb.reg, rb.bit);
>- awake |= engine->mask;
>- }
>-
>- GT_TRACE(gt, "invalidated engines %08x\n", awake);
>-
>- /* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
>- if (awake &&
>- (IS_TIGERLAKE(i915) ||
>- IS_DG1(i915) ||
>- IS_ROCKETLAKE(i915) ||
>- IS_ALDERLAKE_S(i915) ||
>- IS_ALDERLAKE_P(i915)))
>- intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
>-
>- spin_unlock_irq(&uncore->lock);
>-
>- for_each_engine_masked(engine, gt, awake, tmp) {
>- struct reg_and_bit rb;
>-
>- /*
>- * HW architecture suggest typical invalidation time at 40us,
>- * with pessimistic cases up to 100us and a recommendation to
>- * cap at 1ms. We go a bit higher just in case.
>- */
>- const unsigned int timeout_us = 100;
>- const unsigned int timeout_ms = 4;
>-
>- rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
>- if (__intel_wait_for_register_fw(uncore,
>- rb.reg, rb.bit, 0,
>- timeout_us, timeout_ms,
>- NULL))
>- drm_err_ratelimited(&gt->i915->drm,
>- "%s TLB invalidation did not complete in %ums!\n",
>- engine->name, timeout_ms);
>- }
>-
>- /*
>- * Use delayed put since a) we mostly expect a flurry of TLB
>- * invalidations so it is good to avoid paying the forcewake cost and
>- * b) it works around a bug in Icelake which cannot cope with too rapid
>- * transitions.
>- */
>- intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
>-}
>-
>-static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
>-{
>- u32 cur = intel_gt_tlb_seqno(gt);
>-
>- /* Only skip if a *full* TLB invalidate barrier has passed */
>- return (s32)(cur - ALIGN(seqno, 2)) > 0;
>-}
>-
>-void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
>-{
>- intel_wakeref_t wakeref;
>-
>- if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
>- return;
>-
>- if (intel_gt_is_wedged(gt))
>- return;
>-
>- if (tlb_seqno_passed(gt, seqno))
>- return;
>-
>- with_intel_gt_pm_if_awake(gt, wakeref) {
>- mutex_lock(&gt->tlb.invalidate_lock);
>- if (tlb_seqno_passed(gt, seqno))
>- goto unlock;
>-
>- mmio_invalidate_full(gt);
>-
>- write_seqcount_invalidate(&gt->tlb.seqno);
>-unlock:
>- mutex_unlock(&gt->tlb.invalidate_lock);
>- }
>-}
>diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
>index 40b06adf509a..b4bba16cdb53 100644
>--- a/drivers/gpu/drm/i915/gt/intel_gt.h
>+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
>@@ -101,16 +101,4 @@ void intel_gt_info_print(const struct intel_gt_info *info,
>
> void intel_gt_watchdog_work(struct work_struct *work);
>
>-static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
>-{
>- return seqprop_sequence(&gt->tlb.seqno);
>-}
>-
>-static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
>-{
>- return intel_gt_tlb_seqno(gt) | 1;
>-}
>-
>-void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno);
>-
> #endif /* __INTEL_GT_H__ */
>diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
>new file mode 100644
>index 000000000000..af8cae979489
>--- /dev/null
>+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
>@@ -0,0 +1,183 @@
>+// SPDX-License-Identifier: MIT
>+/*
>+ * Copyright © 2022 Intel Corporation
>+ */
>+
>+#include "i915_drv.h"
>+#include "i915_perf_oa_regs.h"
>+#include "intel_engine_pm.h"
>+#include "intel_gt.h"
>+#include "intel_gt_pm.h"
>+#include "intel_gt_regs.h"
>+#include "intel_tlb.h"
>+
>+struct reg_and_bit {
>+ i915_reg_t reg;
>+ u32 bit;
>+};
>+
>+static struct reg_and_bit
>+get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
>+ const i915_reg_t *regs, const unsigned int num)
>+{
>+ const unsigned int class = engine->class;
>+ struct reg_and_bit rb = { };
>+
>+ if (drm_WARN_ON_ONCE(&engine->i915->drm,
>+ class >= num || !regs[class].reg))
>+ return rb;
>+
>+ rb.reg = regs[class];
>+ if (gen8 && class == VIDEO_DECODE_CLASS)
>+ rb.reg.reg += 4 * engine->instance; /* GEN8_M2TCR */
>+ else
>+ rb.bit = engine->instance;
>+
>+ rb.bit = BIT(rb.bit);
>+
>+ return rb;
>+}
>+
>+static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
>+{
>+ u32 cur = intel_gt_tlb_seqno(gt);
>+
>+ /* Only skip if a *full* TLB invalidate barrier has passed */
>+ return (s32)(cur - ALIGN(seqno, 2)) > 0;
>+}
>+
>+static void mmio_invalidate_full(struct intel_gt *gt)
>+{
>+ static const i915_reg_t gen8_regs[] = {
>+ [RENDER_CLASS] = GEN8_RTCR,
>+ [VIDEO_DECODE_CLASS] = GEN8_M1TCR, /* , GEN8_M2TCR */
>+ [VIDEO_ENHANCEMENT_CLASS] = GEN8_VTCR,
>+ [COPY_ENGINE_CLASS] = GEN8_BTCR,
>+ };
>+ static const i915_reg_t gen12_regs[] = {
>+ [RENDER_CLASS] = GEN12_GFX_TLB_INV_CR,
>+ [VIDEO_DECODE_CLASS] = GEN12_VD_TLB_INV_CR,
>+ [VIDEO_ENHANCEMENT_CLASS] = GEN12_VE_TLB_INV_CR,
>+ [COPY_ENGINE_CLASS] = GEN12_BLT_TLB_INV_CR,
>+ [COMPUTE_CLASS] = GEN12_COMPCTX_TLB_INV_CR,
>+ };
>+ struct drm_i915_private *i915 = gt->i915;
>+ struct intel_uncore *uncore = gt->uncore;
>+ struct intel_engine_cs *engine;
>+ intel_engine_mask_t awake, tmp;
>+ enum intel_engine_id id;
>+ const i915_reg_t *regs;
>+ unsigned int num = 0;
>+
>+ if (GRAPHICS_VER(i915) == 12) {
>+ regs = gen12_regs;
>+ num = ARRAY_SIZE(gen12_regs);
>+ } else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) {
>+ regs = gen8_regs;
>+ num = ARRAY_SIZE(gen8_regs);
>+ } else if (GRAPHICS_VER(i915) < 8) {
>+ return;
>+ }
>+
>+ if (drm_WARN_ONCE(&i915->drm, !num,
>+ "Platform does not implement TLB invalidation!"))
>+ return;
>+
>+ intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>+
>+ spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
>+
>+ awake = 0;
>+ for_each_engine(engine, gt, id) {
>+ struct reg_and_bit rb;
>+
>+ if (!intel_engine_pm_is_awake(engine))
>+ continue;
>+
>+ rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
>+ if (!i915_mmio_reg_offset(rb.reg))
>+ continue;
>+
>+ intel_uncore_write_fw(uncore, rb.reg, rb.bit);
>+ awake |= engine->mask;
>+ }
>+
>+ GT_TRACE(gt, "invalidated engines %08x\n", awake);
>+
>+ /* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
>+ if (awake &&
>+ (IS_TIGERLAKE(i915) ||
>+ IS_DG1(i915) ||
>+ IS_ROCKETLAKE(i915) ||
>+ IS_ALDERLAKE_S(i915) ||
>+ IS_ALDERLAKE_P(i915)))
>+ intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
>+
>+ spin_unlock_irq(&uncore->lock);
>+
>+ for_each_engine_masked(engine, gt, awake, tmp) {
>+ struct reg_and_bit rb;
>+
>+ /*
>+ * HW architecture suggest typical invalidation time at 40us,
>+ * with pessimistic cases up to 100us and a recommendation to
>+ * cap at 1ms. We go a bit higher just in case.
>+ */
>+ const unsigned int timeout_us = 100;
>+ const unsigned int timeout_ms = 4;
>+
>+ rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
>+ if (__intel_wait_for_register_fw(uncore,
>+ rb.reg, rb.bit, 0,
>+ timeout_us, timeout_ms,
>+ NULL))
>+ drm_err_ratelimited(&gt->i915->drm,
>+ "%s TLB invalidation did not complete in %ums!\n",
>+ engine->name, timeout_ms);
>+ }
>+
>+ /*
>+ * Use delayed put since a) we mostly expect a flurry of TLB
>+ * invalidations so it is good to avoid paying the forcewake cost and
>+ * b) it works around a bug in Icelake which cannot cope with too rapid
>+ * transitions.
>+ */
>+ intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
>+}
>+
>+void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
>+{
>+ intel_wakeref_t wakeref;
>+
>+ if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
>+ return;
>+
>+ if (intel_gt_is_wedged(gt))
>+ return;
>+
>+ if (tlb_seqno_passed(gt, seqno))
>+ return;
>+
>+ with_intel_gt_pm_if_awake(gt, wakeref) {
>+ mutex_lock(&gt->tlb.invalidate_lock);
>+ if (tlb_seqno_passed(gt, seqno))
>+ goto unlock;
>+
>+ mmio_invalidate_full(gt);
>+
>+ write_seqcount_invalidate(&gt->tlb.seqno);
>+unlock:
>+ mutex_unlock(&gt->tlb.invalidate_lock);
>+ }
>+}
>+
>+void intel_gt_init_tlb(struct intel_gt *gt)
>+{
>+ mutex_init(&gt->tlb.invalidate_lock);
>+ seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
>+}
>+
>+void intel_gt_fini_tlb(struct intel_gt *gt)
>+{
>+ mutex_destroy(&gt->tlb.invalidate_lock);
>+}
>diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
>new file mode 100644
>index 000000000000..46ce25bf5afe
>--- /dev/null
>+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
>@@ -0,0 +1,29 @@
>+/* SPDX-License-Identifier: MIT */
>+/*
>+ * Copyright © 2022 Intel Corporation
>+ */
>+
>+#ifndef INTEL_TLB_H
>+#define INTEL_TLB_H
>+
>+#include <linux/seqlock.h>
>+#include <linux/types.h>
>+
>+#include "intel_gt_types.h"
>+
>+void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);
>+
>+void intel_gt_init_tlb(struct intel_gt *gt);
>+void intel_gt_fini_tlb(struct intel_gt *gt);
>+
>+static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
>+{
>+ return seqprop_sequence(&gt->tlb.seqno);
>+}
>+
>+static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
>+{
>+ return intel_gt_tlb_seqno(gt) | 1;
>+}
>+
>+#endif /* INTEL_TLB_H */
>diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>index 84a9ccbc5fc5..fe947d1456d5 100644
>--- a/drivers/gpu/drm/i915/i915_vma.c
>+++ b/drivers/gpu/drm/i915/i915_vma.c
>@@ -33,6 +33,7 @@
> #include "gt/intel_engine_heartbeat.h"
> #include "gt/intel_gt.h"
> #include "gt/intel_gt_requests.h"
>+#include "gt/intel_tlb.h"
>
> #include "i915_drv.h"
> #include "i915_gem_evict.h"
>--
>2.36.1
>
On Fri, Jul 29, 2022 at 09:03:55AM +0200, Mauro Carvalho Chehab wrote:
>Add a description for the TLB cache invalidation algorithm and for
>the related kAPI functions.
>
>Signed-off-by: Mauro Carvalho Chehab <[email protected]>
>---
>
>To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover.
>See [PATCH v2 0/2] at: https://lore.kernel.org/all/[email protected]/
>
> Documentation/gpu/i915.rst | 7 ++
> drivers/gpu/drm/i915/gt/intel_tlb.c | 25 +++++++
> drivers/gpu/drm/i915/gt/intel_tlb.h | 101 ++++++++++++++++++++++++++++
> 3 files changed, 133 insertions(+)
>
>diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
>index 4e59db1cfb00..46911fdd79e8 100644
>--- a/Documentation/gpu/i915.rst
>+++ b/Documentation/gpu/i915.rst
>@@ -58,6 +58,13 @@ Intel GVT-g Host Support(vGPU device model)
> .. kernel-doc:: drivers/gpu/drm/i915/intel_gvt.c
> :internal:
>
>+TLB cache invalidation
>+----------------------
>+
>+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.h
>+
>+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.c
>+
> Workarounds
> -----------
>
>diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
>index af8cae979489..4873b7ecc015 100644
>--- a/drivers/gpu/drm/i915/gt/intel_tlb.c
>+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
>@@ -145,6 +145,18 @@ static void mmio_invalidate_full(struct intel_gt *gt)
> intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
> }
>
>+/**
>+ * intel_gt_invalidate_tlb_full - do full TLB cache invalidation
>+ * @gt: GT structure
>+ * @seqno: sequence number
>+ *
>+ * Do a full TLB cache invalidation if the @seqno is bigger than the last
>+ * full TLB cache invalidation.
>+ *
>+ * Note:
>+ * The TLB cache invalidation logic depends on GEN-specific registers.
>+ * It currently supports MMIO-based TLB flush for GEN8 to GEN12.
>+ */
> void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
> {
> intel_wakeref_t wakeref;
>@@ -171,12 +183,25 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
> }
> }
>
>+/**
>+ * intel_gt_init_tlb - initialize TLB-specific vars
>+ * @gt: GT structure
>+ *
>+ * TLB cache invalidation logic internally uses some resources that require
>+ * initialization. Should be called before doing any TLB cache invalidation.
>+ */
> void intel_gt_init_tlb(struct intel_gt *gt)
> {
> mutex_init(&gt->tlb.invalidate_lock);
> seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
> }
>
>+/**
>+ * intel_gt_fini_tlb - initialize TLB-specific vars
Free TLB-specific vars
>+ * @gt: GT structure
>+ *
>+ * Frees any resources needed by TLB cache invalidation logic.
>+ */
> void intel_gt_fini_tlb(struct intel_gt *gt)
> {
> mutex_destroy(&gt->tlb.invalidate_lock);
>diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
>index 46ce25bf5afe..dca70c33bd61 100644
>--- a/drivers/gpu/drm/i915/gt/intel_tlb.h
>+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
>@@ -11,16 +11,117 @@
>
> #include "intel_gt_types.h"
>
>+/**
>+ * DOC: TLB cache invalidation logic
>+ *
>+ * The way the current algorithm works is that a struct drm_i915_gem_object can
>+ * be created in any order. At unbind/evict time, the object is guaranteed not
>+ * to be used anymore. So, a sequence number provided by
>+ * intel_gt_next_invalidate_tlb_full() is stored on it. This can happen either
>+ * at __vma_put_pages() - for sync VMA unbind - or at ppgtt_unbind_vma() - for
>+ * async VMA unbind.
>+ *
>+ * At __i915_gem_object_unset_pages(), intel_gt_invalidate_tlb_full() is called,
>+ * where it checks if the sequence number of the object was already invalidated
>+ * or not. If not, it flushes the TLB and increments the sequence number::
>+ *
>+ * void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
>+ * {
>+ * ...
>+ * with_intel_gt_pm_if_awake(gt, wakeref) {
>+ * mutex_lock(&gt->tlb.invalidate_lock);
>+ * if (tlb_seqno_passed(gt, seqno))
>+ * goto unlock;
>+ *
>+ * // Some code to do TLB invalidation
>+ * ...
>+ *
>+ * write_seqcount_invalidate(&gt->tlb.seqno); // increment seqno
>+ * unlock:
>+ * mutex_unlock(&gt->tlb.invalidate_lock);
>+ * }
>+ * }
>+ *
>+ * So, let's say the current seqno is 2 and 3 new objects were created,
>+ * in this order::
>+ *
>+ * obj1
>+ * obj2
>+ * obj3
>+ *
>+ * They can be unbound/evicted in a different order. At unbind/evict time,
>+ * mm.tlb will be stamped with a sequence number, using the number
>+ * from the last TLB flush, plus 1.
I am trying to get my head around the below function.
void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
{
WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
}
Though we pass obj->mm.tlb for 'tlb' while calling this function,
aren't we writing to the local 'tlb' variable here instead of obj->mm.tlb?
>+ *
>+ * Different threads may be used for unbind/evict and/or unset pages.
>+ * As the logic at void intel_gt_invalidate_tlb_full() is protected by a mutex,
Maybe we can skip 'void' and just keep the function name here.
>+ * for simplicity, let's consider just two threads:
>+ *
>+ * +-------------------+-------------------------+---------------------------------+
>+ * | sequence number | Thread 0 | Thread 1 |
>+ * +===================+=========================+=================================+
>+ * | seqno=2 | | |
>+ * | +-------------------------+---------------------------------+
>+ * | | unbind/evict obj3. | |
>+ * | | | |
>+ * | | obj3.mm.tlb = seqno | 1 | |
>+ * | | // obj3.mm.tlb = 3 | |
>+ * | +-------------------------+---------------------------------+
>+ * | | unbind/evict obj1. | |
>+ * | | | |
>+ * | | obj1.mm.tlb = seqno | 1 | |
>+ * | | // obj1.mm.tlb = 3 | |
>+ * | +-------------------------+---------------------------------+
>+ * | | | __i915_gem_object_unset_pages() |
>+ * | | | called for obj3 => TLB flush |
>+ * | | | invalidating both obj1 and obj2.|
>+ * | | | |
>+ * | | | seqno += 2 |
>+ * +-------------------+-------------------------+---------------------------------+
>+ * | seqno=4 | | |
>+ * | +-------------------------+---------------------------------+
>+ * | | unbind/evict obj2. | |
>+ * | | | |
>+ * | | obj2.mm.tlb = seqno | 1 | |
>+ * | | // obj2.mm.tlb = 5 | |
>+ * | +-------------------------+---------------------------------+
>+ * | | | __i915_gem_object_unset_pages() |
>+ * | | | called for obj1, don't flush |
>+ * | | | as past flush invalidated obj1. |
>+ * | +-------------------------+---------------------------------+
>+ * | | | __i915_gem_object_unset_pages() |
>+ * | | | called for obj2 => TLB flush. |
>+ * | | | invalidating obj2. |
>+ * | | | |
>+ * | | | seqno += 2 |
>+ * +-------------------+-------------------------+---------------------------------+
>+ * | seqno=6 | | |
>+ * +-------------------+-------------------------+---------------------------------+
>+ */
>+
> void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);
>
> void intel_gt_init_tlb(struct intel_gt *gt);
> void intel_gt_fini_tlb(struct intel_gt *gt);
>
>+/**
>+ * intel_gt_tlb_seqno - Returns the current TLB invalidation sequence number
>+ *
Probably this empty comment line needs to be removed before the parameter
description below?
>+ * @gt: GT structure
>+ *
>+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe
>+ */
> static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
> {
> return seqprop_sequence(&gt->tlb.seqno);
> }
>
>+/**
>+ * intel_gt_next_invalidate_tlb_full - Returns the next TLB full invalidation
>+ * sequence number
>+ *
Same here.
-Niranjana
>+ * @gt: GT structure
>+ *
>+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe
>+ */
> static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
> {
> return intel_gt_tlb_seqno(gt) | 1;
>--
>2.36.1
>
Hi Mauro,
On Fri, Jul 29, 2022 at 09:03:54AM +0200, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <[email protected]>
>
> Prepare for supporting more TLB invalidation scenarios by moving
> the current MMIO invalidation to its own file.
>
> Signed-off-by: Chris Wilson <[email protected]>
> Cc: Fei Yang <[email protected]>
> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
I already reviewed this patch... anyway I checked it again and
it's all correct.
Reviewed-by: Andi Shyti <[email protected]>
Thanks,
Andi
On Tue, 2 Aug 2022 15:30:44 -0700
Niranjana Vishwanathapura <[email protected]> wrote:
> On Fri, Jul 29, 2022 at 09:03:55AM +0200, Mauro Carvalho Chehab wrote:
> >Add a description for the TLB cache invalidation algorithm and for
> >the related kAPI functions.
> >
> >Signed-off-by: Mauro Carvalho Chehab <[email protected]>
> >---
> >
> >To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover.
> >See [PATCH v2 0/2] at: https://lore.kernel.org/all/[email protected]/
> >
> > Documentation/gpu/i915.rst | 7 ++
> > drivers/gpu/drm/i915/gt/intel_tlb.c | 25 +++++++
> > drivers/gpu/drm/i915/gt/intel_tlb.h | 101 ++++++++++++++++++++++++++++
> > 3 files changed, 133 insertions(+)
> >
> >diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
> >index 4e59db1cfb00..46911fdd79e8 100644
> >--- a/Documentation/gpu/i915.rst
> >+++ b/Documentation/gpu/i915.rst
> >@@ -58,6 +58,13 @@ Intel GVT-g Host Support(vGPU device model)
> > .. kernel-doc:: drivers/gpu/drm/i915/intel_gvt.c
> > :internal:
> >
> >+TLB cache invalidation
> >+----------------------
> >+
> >+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.h
> >+
> >+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.c
> >+
> > Workarounds
> > -----------
> >
> >diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
> >index af8cae979489..4873b7ecc015 100644
> >--- a/drivers/gpu/drm/i915/gt/intel_tlb.c
> >+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
> >@@ -145,6 +145,18 @@ static void mmio_invalidate_full(struct intel_gt *gt)
> > intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
> > }
> >
> >+/**
> >+ * intel_gt_invalidate_tlb_full - do full TLB cache invalidation
> >+ * @gt: GT structure
> >+ * @seqno: sequence number
> >+ *
> >+ * Do a full TLB cache invalidation if the @seqno is bigger than the last
> >+ * full TLB cache invalidation.
> >+ *
> >+ * Note:
> >+ * The TLB cache invalidation logic depends on GEN-specific registers.
> >+ * It currently supports MMIO-based TLB flush for GEN8 to GEN12.
> >+ */
> > void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
> > {
> > intel_wakeref_t wakeref;
> >@@ -171,12 +183,25 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
> > }
> > }
> >
> >+/**
> >+ * intel_gt_init_tlb - initialize TLB-specific vars
> >+ * @gt: GT structure
> >+ *
> >+ * TLB cache invalidation logic internally uses some resources that require
> >+ * initialization. Should be called before doing any TLB cache invalidation.
> >+ */
> > void intel_gt_init_tlb(struct intel_gt *gt)
> > {
> > mutex_init(&gt->tlb.invalidate_lock);
> > seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
> > }
> >
> >+/**
> >+ * intel_gt_fini_tlb - initialize TLB-specific vars
>
> Free TLB-specific vars
OK.
>
> >+ * @gt: GT structure
> >+ *
> >+ * Frees any resources needed by TLB cache invalidation logic.
> >+ */
> > void intel_gt_fini_tlb(struct intel_gt *gt)
> > {
> > mutex_destroy(&gt->tlb.invalidate_lock);
> >diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
> >index 46ce25bf5afe..dca70c33bd61 100644
> >--- a/drivers/gpu/drm/i915/gt/intel_tlb.h
> >+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
> >@@ -11,16 +11,117 @@
> >
> > #include "intel_gt_types.h"
> >
> >+/**
> >+ * DOC: TLB cache invalidation logic
> >+ *
> >+ * The way the current algorithm works is that a struct drm_i915_gem_object can
> >+ * be created in any order. At unbind/evict time, the object is guaranteed not
> >+ * to be used anymore. So, a sequence number provided by
> >+ * intel_gt_next_invalidate_tlb_full() is stored on it. This can happen either
> >+ * at __vma_put_pages() - for sync VMA unbind - or at ppgtt_unbind_vma() - for
> >+ * async VMA unbind.
> >+ *
> >+ * At __i915_gem_object_unset_pages(), intel_gt_invalidate_tlb_full() is called,
> >+ * where it checks if the sequence number of the object was already invalidated
> >+ * or not. If not, it flushes the TLB and increments the sequence number::
> >+ *
> >+ * void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
> >+ * {
> >+ * ...
> >+ * with_intel_gt_pm_if_awake(gt, wakeref) {
> >+ * mutex_lock(&gt->tlb.invalidate_lock);
> >+ * if (tlb_seqno_passed(gt, seqno))
> >+ * goto unlock;
> >+ *
> >+ * // Some code to do TLB invalidation
> >+ * ...
> >+ *
> >+ * write_seqcount_invalidate(&gt->tlb.seqno); // increment seqno
> >+ * unlock:
> >+ * mutex_unlock(&gt->tlb.invalidate_lock);
> >+ * }
> >+ * }
> >+ *
> >+ * So, let's say the current seqno is 2 and 3 new objects were created,
> >+ * in this order::
> >+ *
> >+ * obj1
> >+ * obj2
> >+ * obj3
> >+ *
> >+ * They can be unbound/evicted in a different order. At unbind/evict time,
> >+ * mm.tlb will be stamped with a sequence number, using the number
> >+ * from the last TLB flush, plus 1.
>
> I am trying to get my head around the below function.
>
> void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
> {
> WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
> }
>
> Though we pass obj->mm.tlb for 'tlb' while calling this function,
> aren't we writing to the local 'tlb' variable here instead of obj->mm.tlb?
It should be passing a pointer. I wrote such a fix after a review,
but somehow it ended up getting lost. I'll send the fix in v3.
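Presumably something along these lines (a sketch of the fix being described,
not the actual v3 patch):

/* Take a pointer so the stamp lands in the caller's storage
 * (e.g. obj->mm.tlb) rather than in a by-value copy that is
 * discarded on return: */
void vma_invalidate_tlb(struct i915_address_space *vm, u32 *tlb)
{
	WRITE_ONCE(*tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
}

with the callers updated to pass &obj->mm.tlb accordingly.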
> >+ *
> >+ * Different threads may be used for unbind/evict and/or unset pages.
> >+ * As the logic at void intel_gt_invalidate_tlb_full() is protected by a mutex,
>
> Maybe we can skip 'void' and just keep the function name here.
Sure.
> >+ * for simplicity, let's consider just two threads:
> >+ *
> >+ * +-------------------+-------------------------+---------------------------------+
> >+ * | sequence number | Thread 0 | Thread 1 |
> >+ * +===================+=========================+=================================+
> >+ * | seqno=2 | | |
> >+ * | +-------------------------+---------------------------------+
> >+ * | | unbind/evict obj3. | |
> >+ * | | | |
> >+ * | | obj3.mm.tlb = seqno | 1 | |
> >+ * | | // obj3.mm.tlb = 3 | |
> >+ * | +-------------------------+---------------------------------+
> >+ * | | unbind/evict obj1. | |
> >+ * | | | |
> >+ * | | obj1.mm.tlb = seqno | 1 | |
> >+ * | | // obj1.mm.tlb = 3 | |
> >+ * | +-------------------------+---------------------------------+
> >+ * | | | __i915_gem_object_unset_pages() |
> >+ * | | | called for obj3 => TLB flush |
> >+ * | | | invalidating both obj1 and obj2.|
> >+ * | | | |
> >+ * | | | seqno += 2 |
> >+ * +-------------------+-------------------------+---------------------------------+
> >+ * | seqno=4 | | |
> >+ * | +-------------------------+---------------------------------+
> >+ * | | unbind/evict obj2. | |
> >+ * | | | |
> >+ * | | obj2.mm.tlb = seqno | 1 | |
> >+ * | | // obj2.mm.tlb = 5 | |
> >+ * | +-------------------------+---------------------------------+
> >+ * | | | __i915_gem_object_unset_pages() |
> >+ * | | | called for obj1, don't flush |
> >+ * | | | as past flush invalidated obj1. |
> >+ * | +-------------------------+---------------------------------+
> >+ * | | | __i915_gem_object_unset_pages() |
> >+ * | | | called for obj2 => TLB flush. |
> >+ * | | | invalidating obj2. |
> >+ * | | | |
> >+ * | | | seqno += 2 |
> >+ * +-------------------+-------------------------+---------------------------------+
> >+ * | seqno=6 | | |
> >+ * +-------------------+-------------------------+---------------------------------+
> >+ */
> >+
> > void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);
> >
> > void intel_gt_init_tlb(struct intel_gt *gt);
> > void intel_gt_fini_tlb(struct intel_gt *gt);
> >
> >+/**
> >+ * intel_gt_tlb_seqno - Returns the current TLB invalidation sequence number
> >+ *
>
> Probably this empty comment line needs to be removed before the parameter
> description below?
Kernel-doc actually accepts both, with or without a blank line. My
personal preference is to place a blank line, because sometimes the
function name plus its description is longer than one line.
So, it is usually clearer to add a blank line than to do
something like this (perfectly valid kernel-doc markup):
/**
* long_function_name_foo - Lorem ipsum dolor sit amet, consectetur
* adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore.
* @bar: some parameter
* ...
But yeah, the kernel-doc documentation example doesn't have a blank
line. So, I'll drop it.
>
> >+ * @gt: GT structure
> >+ *
> >+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe
> >+ */
> > static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
> > {
> > return seqprop_sequence(&gt->tlb.seqno);
> > }
> >
> >+/**
> >+ * intel_gt_next_invalidate_tlb_full - Returns the next TLB full invalidation
> >+ * sequence number
> >+ *
>
> Same here.
>
> -Niranjana
>
> >+ * @gt: GT structure
> >+ *
> >+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe
> >+ */
> > static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
> > {
> > return intel_gt_tlb_seqno(gt) | 1;
> >--
> >2.36.1
> >
Thanks!
Mauro