This patchset tries to reduce the plague of DMAR read errors seen in CI
on ADL*, RPL* and DG2 platforms; see for example [1] (grep for DMAR).
CI usually tolerates these errors, so the scale of the problem
is not readily visible.
To show it, I counted the lines containing DMAR read errors in the dmesg logs
produced by CI for all three versions of the patch; in contrast to v2,
I grepped only for lines containing "PTE Read access".
Below are the counts for the kernel without the patchset vs. the patched one:
v1: 210 vs 0
v2: 201 vs 0
v3: 214 vs 0
Apparently the patchset fixes all common PTE read errors.
Changelog:
v2:
- modified the commit message (I hope the diagnosis is correct),
- added bug checks to ensure scratch is initialized on gen3 platforms.
  CI produces a strange stack trace for it, suggesting scratch[0] is NULL;
  the checks are to be removed after resolving the issue with gen3 platforms.
v3:
- removed the bug checks, replaced them with a gen check.
v4:
- changed the scratch page insertion code to support all platforms,
- added a note to the commit message that there could be more similar issues.
v5:
- converted to a patchset adding nop_clear_range related code,
- re-insert scratch PTEs on resume.
To: Jani Nikula <[email protected]>
To: Joonas Lahtinen <[email protected]>
To: Rodrigo Vivi <[email protected]>
To: Tvrtko Ursulin <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andi Shyti <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Nirmoy Das <[email protected]>
Signed-off-by: Andrzej Hajda <[email protected]>
---
Andrzej Hajda (4):
drm/i915/gt: make nop_clear_range public
drm/i915/display: use nop_clear_range instead of local function
drm/i915/selftests: use nop_clear_range instead of local function
drm/i915: add guard page to ggtt->error_capture
drivers/gpu/drm/i915/display/intel_dpt.c | 7 +-----
drivers/gpu/drm/i915/gt/intel_ggtt.c | 38 ++++++++++++++++++++++++++-----
drivers/gpu/drm/i915/gt/intel_gtt.h | 2 ++
drivers/gpu/drm/i915/selftests/mock_gtt.c | 9 ++------
4 files changed, 37 insertions(+), 19 deletions(-)
---
base-commit: 3cd6c251f39c14df9ab711e3eb56e703b359ff54
change-id: 20230308-guard_error_capture-f3f334eec85f
Best regards,
--
Andrzej Hajda <[email protected]>
Make nop_clear_range public so that it can be used instead of local implementations.
Signed-off-by: Andrzej Hajda <[email protected]>
---
drivers/gpu/drm/i915/gt/intel_ggtt.c | 3 +--
drivers/gpu/drm/i915/gt/intel_gtt.h | 2 ++
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 842e69c7b21e49..b925da42c7cfc4 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -345,8 +345,7 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
ggtt->invalidate(ggtt);
}
-static void nop_clear_range(struct i915_address_space *vm,
- u64 start, u64 length)
+void nop_clear_range(struct i915_address_space *vm, u64 start, u64 length)
{
}
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 5a775310d3fcb5..c15a4892e9f45d 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -672,4 +672,6 @@ static inline struct sgt_dma {
return (struct sgt_dma){ sg, addr, addr + sg_dma_len(sg) };
}
+void nop_clear_range(struct i915_address_space *vm, u64 start, u64 length);
+
#endif
--
2.34.1
Since nop_clear_range is now public, it can be used here instead of the local dpt_clear_range.
Signed-off-by: Andrzej Hajda <[email protected]>
---
drivers/gpu/drm/i915/display/intel_dpt.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c
index ad1a37b515fb1c..eb9d1a6cbfb9dd 100644
--- a/drivers/gpu/drm/i915/display/intel_dpt.c
+++ b/drivers/gpu/drm/i915/display/intel_dpt.c
@@ -73,11 +73,6 @@ static void dpt_insert_entries(struct i915_address_space *vm,
gen8_set_pte(&base[i++], pte_encode | addr);
}
-static void dpt_clear_range(struct i915_address_space *vm,
- u64 start, u64 length)
-{
-}
-
static void dpt_bind_vma(struct i915_address_space *vm,
struct i915_vm_pt_stash *stash,
struct i915_vma_resource *vma_res,
@@ -291,7 +286,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
i915_address_space_init(vm, VM_CLASS_DPT);
vm->insert_page = dpt_insert_page;
- vm->clear_range = dpt_clear_range;
+ vm->clear_range = nop_clear_range;
vm->insert_entries = dpt_insert_entries;
vm->cleanup = dpt_cleanup;
--
2.34.1
Since nop_clear_range is now public, it can be used here instead of the local mock_clear_range.
Signed-off-by: Andrzej Hajda <[email protected]>
---
drivers/gpu/drm/i915/selftests/mock_gtt.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c
index ece97e4faacb97..89119e3970279f 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c
@@ -57,11 +57,6 @@ static void mock_cleanup(struct i915_address_space *vm)
{
}
-static void mock_clear_range(struct i915_address_space *vm,
- u64 start, u64 length)
-{
-}
-
struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
{
struct i915_ppgtt *ppgtt;
@@ -80,7 +75,7 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
ppgtt->vm.alloc_pt_dma = alloc_pt_dma;
ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
- ppgtt->vm.clear_range = mock_clear_range;
+ ppgtt->vm.clear_range = nop_clear_range;
ppgtt->vm.insert_page = mock_insert_page;
ppgtt->vm.insert_entries = mock_insert_entries;
ppgtt->vm.cleanup = mock_cleanup;
@@ -119,7 +114,7 @@ void mock_init_ggtt(struct intel_gt *gt)
ggtt->vm.alloc_pt_dma = alloc_pt_dma;
ggtt->vm.alloc_scratch_dma = alloc_pt_dma;
- ggtt->vm.clear_range = mock_clear_range;
+ ggtt->vm.clear_range = nop_clear_range;
ggtt->vm.insert_page = mock_insert_page;
ggtt->vm.insert_entries = mock_insert_entries;
ggtt->vm.cleanup = mock_cleanup;
--
2.34.1
Write-combining memory allows speculative reads by the CPU.
ggtt->error_capture is WC-mapped to the CPU, so the CPU/MMU can try
to prefetch memory beyond error_capture, i.e. it tries
to read the memory pointed to by the next PTE in the GGTT.
If this PTE points to an invalid address, DMAR errors occur.
This behaviour was observed on ADL and RPL platforms.
To avoid it, add a guard scratch page after error_capture.
This patch fixes the most annoying instance of the issue, in error capture,
but since WC reads are also used in other places, there is a risk that
a similar problem affects them as well.
v2:
- modified the commit message (I hope the diagnosis is correct),
- added bug checks to ensure scratch is initialized on gen3 platforms.
  CI produces a strange stack trace for it, suggesting scratch[0] is NULL;
  the checks are to be removed after resolving the issue with gen3 platforms.
v3:
- removed the bug checks, replaced them with a gen check.
v4:
- changed the scratch page insertion code to support all platforms,
- added a note to the commit message that there could be more similar issues.
v5:
- check for nop_clear_range instead of gen8 (Tvrtko),
- re-insert scratch pages on resume (Tvrtko).
Signed-off-by: Andrzej Hajda <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
---
drivers/gpu/drm/i915/gt/intel_ggtt.c | 35 +++++++++++++++++++++++++++++++----
1 file changed, 31 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index b925da42c7cfc4..8fb700fde85c8f 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -502,6 +502,21 @@ static void cleanup_init_ggtt(struct i915_ggtt *ggtt)
mutex_destroy(&ggtt->error_mutex);
}
+static void
+ggtt_insert_scratch_pages(struct i915_ggtt *ggtt, u64 offset, u64 length)
+{
+ struct i915_address_space *vm = &ggtt->vm;
+
+ if (vm->clear_range != nop_clear_range)
+ return vm->clear_range(vm, offset, length);
+
+ while (length > 0) {
+ vm->insert_page(vm, px_dma(vm->scratch[0]), offset, I915_CACHE_NONE, 0);
+ offset += I915_GTT_PAGE_SIZE;
+ length -= I915_GTT_PAGE_SIZE;
+ }
+}
+
static int init_ggtt(struct i915_ggtt *ggtt)
{
/*
@@ -550,8 +565,12 @@ static int init_ggtt(struct i915_ggtt *ggtt)
* paths, and we trust that 0 will remain reserved. However,
* the only likely reason for failure to insert is a driver
* bug, which we expect to cause other failures...
+ *
+ * Since CPU can perform speculative reads on error capture
+ * (write-combining allows it) add scratch page after error
+ * capture to avoid DMAR errors.
*/
- ggtt->error_capture.size = I915_GTT_PAGE_SIZE;
+ ggtt->error_capture.size = 2 * I915_GTT_PAGE_SIZE;
ggtt->error_capture.color = I915_COLOR_UNEVICTABLE;
if (drm_mm_reserve_node(&ggtt->vm.mm, &ggtt->error_capture))
drm_mm_insert_node_in_range(&ggtt->vm.mm,
@@ -561,11 +580,15 @@ static int init_ggtt(struct i915_ggtt *ggtt)
0, ggtt->mappable_end,
DRM_MM_INSERT_LOW);
}
- if (drm_mm_node_allocated(&ggtt->error_capture))
+ if (drm_mm_node_allocated(&ggtt->error_capture)) {
+ u64 start = ggtt->error_capture.start;
+ u64 size = ggtt->error_capture.size;
+
+ ggtt_insert_scratch_pages(ggtt, start, size);
drm_dbg(&ggtt->vm.i915->drm,
"Reserved GGTT:[%llx, %llx] for use by error capture\n",
- ggtt->error_capture.start,
- ggtt->error_capture.start + ggtt->error_capture.size);
+ start, start + size);
+ }
/*
* The upper portion of the GuC address space has a sizeable hole
@@ -1256,6 +1279,10 @@ void i915_ggtt_resume(struct i915_ggtt *ggtt)
flush = i915_ggtt_resume_vm(&ggtt->vm);
+ if (drm_mm_node_allocated(&ggtt->error_capture))
+ ggtt_insert_scratch_pages(ggtt, ggtt->error_capture.start,
+ ggtt->error_capture.size);
+
ggtt->invalidate(ggtt);
if (flush)
--
2.34.1
Hi Andrzej,
On Wed, Mar 08, 2023 at 04:39:05PM +0100, Andrzej Hajda wrote:
> Since nop_clear_range is visible it can be used here.
>
> Signed-off-by: Andrzej Hajda <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Thanks,
Andi
Hi Andrzej,
On Wed, Mar 08, 2023 at 04:39:04PM +0100, Andrzej Hajda wrote:
> Since nop_clear_range is visible it can be used here.
>
> Signed-off-by: Andrzej Hajda <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Thanks,
Andi
Hi Andrzej,
On Wed, Mar 08, 2023 at 04:39:03PM +0100, Andrzej Hajda wrote:
> Function nop_clear_range can be used instead of local implementations.
>
> Signed-off-by: Andrzej Hajda <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Andi
On 08/03/2023 15:39, Andrzej Hajda wrote:
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index b925da42c7cfc4..8fb700fde85c8f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -502,6 +502,21 @@ static void cleanup_init_ggtt(struct i915_ggtt *ggtt)
> mutex_destroy(&ggtt->error_mutex);
> }
>
> +static void
> +ggtt_insert_scratch_pages(struct i915_ggtt *ggtt, u64 offset, u64 length)
> +{
> + struct i915_address_space *vm = &ggtt->vm;
> +
> + if (vm->clear_range != nop_clear_range)
Hm, I thought we would usually add a prefix for exported symbols, in
this case i915_vm_nop_clear_range; however, I see intel_gtt.h already
exports a bunch of stuff without prefixes, so I guess you could continue
like that by inertia. The conundrum could also have been avoided if you
had left it static (dropping the dpt and mock_gtt patches), but no strong
opinion from me.
> + return vm->clear_range(vm, offset, length);
> +
> + while (length > 0) {
> + vm->insert_page(vm, px_dma(vm->scratch[0]), offset, I915_CACHE_NONE, 0);
> + offset += I915_GTT_PAGE_SIZE;
> + length -= I915_GTT_PAGE_SIZE;
> + }
> +}
> +
> static int init_ggtt(struct i915_ggtt *ggtt)
> {
> /*
> @@ -550,8 +565,12 @@ static int init_ggtt(struct i915_ggtt *ggtt)
> * paths, and we trust that 0 will remain reserved. However,
> * the only likely reason for failure to insert is a driver
> * bug, which we expect to cause other failures...
> + *
> + * Since CPU can perform speculative reads on error capture
> + * (write-combining allows it) add scratch page after error
> + * capture to avoid DMAR errors.
> */
> - ggtt->error_capture.size = I915_GTT_PAGE_SIZE;
> + ggtt->error_capture.size = 2 * I915_GTT_PAGE_SIZE;
> ggtt->error_capture.color = I915_COLOR_UNEVICTABLE;
> if (drm_mm_reserve_node(&ggtt->vm.mm, &ggtt->error_capture))
> drm_mm_insert_node_in_range(&ggtt->vm.mm,
> @@ -561,11 +580,15 @@ static int init_ggtt(struct i915_ggtt *ggtt)
> 0, ggtt->mappable_end,
> DRM_MM_INSERT_LOW);
> }
> - if (drm_mm_node_allocated(&ggtt->error_capture))
> + if (drm_mm_node_allocated(&ggtt->error_capture)) {
> + u64 start = ggtt->error_capture.start;
> + u64 size = ggtt->error_capture.size;
> +
> + ggtt_insert_scratch_pages(ggtt, start, size);
> drm_dbg(&ggtt->vm.i915->drm,
> "Reserved GGTT:[%llx, %llx] for use by error capture\n",
> - ggtt->error_capture.start,
> - ggtt->error_capture.start + ggtt->error_capture.size);
> + start, start + size);
> + }
>
> /*
> * The upper portion of the GuC address space has a sizeable hole
> @@ -1256,6 +1279,10 @@ void i915_ggtt_resume(struct i915_ggtt *ggtt)
>
> flush = i915_ggtt_resume_vm(&ggtt->vm);
>
> + if (drm_mm_node_allocated(&ggtt->error_capture))
> + ggtt_insert_scratch_pages(ggtt, ggtt->error_capture.start,
> + ggtt->error_capture.size);
Maybe it belongs in i915_ggtt_resume_vm since that one deals with PTEs?
Looks like it to me, but ack either way.
Regards,
Tvrtko
> +
> ggtt->invalidate(ggtt);
>
> if (flush)
>
On 09.03.2023 10:08, Tvrtko Ursulin wrote:
>
> On 08/03/2023 15:39, Andrzej Hajda wrote:
>> @@ -1256,6 +1279,10 @@ void i915_ggtt_resume(struct i915_ggtt *ggtt)
>> flush = i915_ggtt_resume_vm(&ggtt->vm);
>> + if (drm_mm_node_allocated(&ggtt->error_capture))
>> + ggtt_insert_scratch_pages(ggtt, ggtt->error_capture.start,
>> + ggtt->error_capture.size);
>
> Maybe it belongs in i915_ggtt_resume_vm since that one deals with
> PTEs? Looks like it to me, but ack either way.
i915_ggtt_resume_vm is called for both ggtt and dpt. Of course I could add
conditionals there checking whether it is the ggtt, but in that case
i915_ggtt_resume seems the more natural candidate.
Regards
Andrzej
On 09/03/2023 09:34, Andrzej Hajda wrote:
>
>
> On 09.03.2023 10:08, Tvrtko Ursulin wrote:
>>
>> On 08/03/2023 15:39, Andrzej Hajda wrote:
>>> @@ -1256,6 +1279,10 @@ void i915_ggtt_resume(struct i915_ggtt *ggtt)
>>> flush = i915_ggtt_resume_vm(&ggtt->vm);
>>> + if (drm_mm_node_allocated(&ggtt->error_capture))
>>> + ggtt_insert_scratch_pages(ggtt, ggtt->error_capture.start,
>>> + ggtt->error_capture.size);
>>
>> Maybe it belongs in i915_ggtt_resume_vm since that one deals with
>> PTEs? Looks like it to me, but ack either way.
>
> i915_ggtt_resume_vm is called for ggtt and dpt. Of course I could add
> conditionals there checking if it is ggtt, but in such situation
> i915_ggtt_resume seems more natural candidate.
"if (drm_mm_node_allocated(&ggtt->error_capture))" check would handle
that automatically, no? i915_ggtt_resume has nothing about PTEs at the
moment..
Regards,
Tvrtko
On 09.03.2023 10:43, Tvrtko Ursulin wrote:
>
> On 09/03/2023 09:34, Andrzej Hajda wrote:
>>
>>
>> On 09.03.2023 10:08, Tvrtko Ursulin wrote:
>>>
>>> On 08/03/2023 15:39, Andrzej Hajda wrote:
>>>> @@ -1256,6 +1279,10 @@ void i915_ggtt_resume(struct i915_ggtt *ggtt)
>>>> flush = i915_ggtt_resume_vm(&ggtt->vm);
>>>> + if (drm_mm_node_allocated(&ggtt->error_capture))
>>>> + ggtt_insert_scratch_pages(ggtt, ggtt->error_capture.start,
>>>> + ggtt->error_capture.size);
>>>
>>> Maybe it belongs in i915_ggtt_resume_vm since that one deals with
>>> PTEs? Looks like it to me, but ack either way.
>>
>> i915_ggtt_resume_vm is called for ggtt and dpt. Of course I could add
>> conditionals there checking if it is ggtt, but in such situation
>> i915_ggtt_resume seems more natural candidate.
>
> "if (drm_mm_node_allocated(&ggtt->error_capture))" check would handle
> that automatically, no? i915_ggtt_resume has nothing about PTEs at the
> moment..
Yes, but since i915_ggtt_resume_vm takes vm as an argument (i.e. it operates
on a generic vm), a downcast will be needed somewhere:
	if (vm->is_ggtt) {
		struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);

		if (drm_mm_node_allocated(&ggtt->error_capture))
			...
	}
In i915_ggtt_resume we get it for free. Moreover, i915_ggtt_resume_vm
(despite its name) seems to handle the parts common to ggtt and dpt, while
i915_ggtt_resume looks specific to ggtt, just as intel_dpt_resume is
specific to dpt.
If this does not convince you, I will update the patch with the above code.
Regards
Andrzej
>
> Regards,
>
> Tvrtko
On 09/03/2023 09:59, Andrzej Hajda wrote:
>
>
> On 09.03.2023 10:43, Tvrtko Ursulin wrote:
>>
>> On 09/03/2023 09:34, Andrzej Hajda wrote:
>>>
>>>
>>> On 09.03.2023 10:08, Tvrtko Ursulin wrote:
>>>>
>>>> On 08/03/2023 15:39, Andrzej Hajda wrote:
>>>>> Write-combining memory allows speculative reads by CPU.
>>>>> ggtt->error_capture is WC mapped to CPU, so CPU/MMU can try
>>>>> to prefetch memory beyond the error_capture, ie it tries
>>>>> to read memory pointed by next PTE in GGTT.
>>>>> If this PTE points to invalid address DMAR errors will occur.
>>>>> This behaviour was observed on ADL and RPL platforms.
>>>>> To avoid it, guard scratch page should be added after error_capture.
>>>>> The patch fixes the most annoying issue with error capture but
>>>>> since WC reads are used also in other places there is a risk similar
>>>>> problem can affect them as well.
>>>>>
>>>>> v2:
>>>>> - modified commit message (I hope the diagnosis is correct),
>>>>> - added bug checks to ensure scratch is initialized on gen3
>>>>> platforms.
>>>>> CI produces strange stacktrace for it suggesting scratch[0] is
>>>>> NULL,
>>>>> to be removed after resolving the issue with gen3 platforms.
>>>>> v3:
>>>>> - removed bug checks, replaced with gen check.
>>>>> v4:
>>>>> - change code for scratch page insertion to support all platforms,
>>>>> - add info in commit message there could be more similar issues
>>>>> v5:
>>>>> - check for nop_clear_range instead of gen8 (Tvrtko),
>>>>> - re-insert scratch pages on resume (Tvrtko)
>>>>>
>>>>> Signed-off-by: Andrzej Hajda <[email protected]>
>>>>> Reviewed-by: Andi Shyti <[email protected]>
>>>>> ---
>>>>> drivers/gpu/drm/i915/gt/intel_ggtt.c | 35 +++++++++++++++++++++++++++++++----
>>>>> 1 file changed, 31 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>>>> index b925da42c7cfc4..8fb700fde85c8f 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>>>> @@ -502,6 +502,21 @@ static void cleanup_init_ggtt(struct i915_ggtt *ggtt)
>>>>> mutex_destroy(&ggtt->error_mutex);
>>>>> }
>>>>> +static void
>>>>> +ggtt_insert_scratch_pages(struct i915_ggtt *ggtt, u64 offset, u64 length)
>>>>> +{
>>>>> + struct i915_address_space *vm = &ggtt->vm;
>>>>> +
>>>>> + if (vm->clear_range != nop_clear_range)
>>>>
>>>> Hm, I thought we would usually add a prefix for exported stuff, like
>>>> i915_vm_nop_clear_range in this case; however, I see intel_gtt.h
>>>> already exports a bunch of stuff without prefixes, so I guess you
>>>> could continue like that by inertia. The conundrum could also have
>>>> been avoided if you had left it static (dropping the dpt and mock_gtt
>>>> patches), but no strong opinion from me.
>>>>
>>>>> + return vm->clear_range(vm, offset, length);
>>>>> +
>>>>> + while (length > 0) {
>>>>> + vm->insert_page(vm, px_dma(vm->scratch[0]), offset, I915_CACHE_NONE, 0);
>>>>> + offset += I915_GTT_PAGE_SIZE;
>>>>> + length -= I915_GTT_PAGE_SIZE;
>>>>> + }
>>>>> +}
>>>>> +
>>>>> static int init_ggtt(struct i915_ggtt *ggtt)
>>>>> {
>>>>> /*
>>>>> @@ -550,8 +565,12 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>>>>> * paths, and we trust that 0 will remain reserved. However,
>>>>> * the only likely reason for failure to insert is a driver
>>>>> * bug, which we expect to cause other failures...
>>>>> + *
>>>>> + * Since CPU can perform speculative reads on error capture
>>>>> + * (write-combining allows it) add scratch page after error
>>>>> + * capture to avoid DMAR errors.
>>>>> */
>>>>> - ggtt->error_capture.size = I915_GTT_PAGE_SIZE;
>>>>> + ggtt->error_capture.size = 2 * I915_GTT_PAGE_SIZE;
>>>>> ggtt->error_capture.color = I915_COLOR_UNEVICTABLE;
>>>>> if (drm_mm_reserve_node(&ggtt->vm.mm, &ggtt->error_capture))
>>>>> drm_mm_insert_node_in_range(&ggtt->vm.mm,
>>>>> @@ -561,11 +580,15 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>>>>> 0, ggtt->mappable_end,
>>>>> DRM_MM_INSERT_LOW);
>>>>> }
>>>>> - if (drm_mm_node_allocated(&ggtt->error_capture))
>>>>> + if (drm_mm_node_allocated(&ggtt->error_capture)) {
>>>>> + u64 start = ggtt->error_capture.start;
>>>>> + u64 size = ggtt->error_capture.size;
>>>>> +
>>>>> + ggtt_insert_scratch_pages(ggtt, start, size);
>>>>> drm_dbg(&ggtt->vm.i915->drm,
>>>>> "Reserved GGTT:[%llx, %llx] for use by error capture\n",
>>>>> - ggtt->error_capture.start,
>>>>> - ggtt->error_capture.start + ggtt->error_capture.size);
>>>>> + start, start + size);
>>>>> + }
>>>>> /*
>>>>> * The upper portion of the GuC address space has a sizeable hole
>>>>> @@ -1256,6 +1279,10 @@ void i915_ggtt_resume(struct i915_ggtt *ggtt)
>>>>> flush = i915_ggtt_resume_vm(&ggtt->vm);
>>>>> + if (drm_mm_node_allocated(&ggtt->error_capture))
>>>>> + ggtt_insert_scratch_pages(ggtt, ggtt->error_capture.start,
>>>>> + ggtt->error_capture.size);
>>>>
>>>> Maybe it belongs in i915_ggtt_resume_vm since that one deals with
>>>> PTEs? Looks like it to me, but ack either way.
>>>
>>> i915_ggtt_resume_vm is called for both ggtt and dpt. Of course I could add
>>> conditionals there checking whether it is ggtt, but in such a situation
>>> i915_ggtt_resume seems the more natural candidate.
>>
>> The "if (drm_mm_node_allocated(&ggtt->error_capture))" check would handle
>> that automatically, no? i915_ggtt_resume has nothing about PTEs at the
>> moment.
>
> Yes, but since i915_ggtt_resume_vm takes vm as an argument (i.e. it operates
> on a generic vm), downcasting will be needed somewhere:
> if (vm->is_ggtt) {
> 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
> 	if (drm_mm_node_allocated(&ggtt->error_capture))
> 		...
> }
>
> In i915_ggtt_resume we get it for free; moreover,
> i915_ggtt_resume_vm (despite its name) seems to handle the stuff common to
> ggtt and dpt, while i915_ggtt_resume looks specific to ggtt, just as
> intel_dpt_resume is specific to dpt.
> If this does not convince you, I will update the patch with the above code.
Right, I see your point; I was misled by the name i915_ggtt_resume_vm,
thinking it meant the function works on i915_ggtt. It's all good then.
Regards,
Tvrtko
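The guard-page idea from the commit message can be sketched in userspace as follows. This is a hypothetical model, not i915 code: the GGTT is a plain array of PTEs, `SCRATCH_DMA` and `INVALID_DMA` are made-up addresses, and `insert_scratch_pages` only mirrors the shape of `ggtt_insert_scratch_pages` from the patch (use `clear_range` when it is not a no-op, otherwise write scratch PTEs page by page). In this model the error capture node spans two pages, but only the first one is later remapped for capturing, so the PTE right after it keeps pointing at scratch even if the CPU reads ahead.

```c
#include <assert.h>
#include <stdint.h>

#define GTT_PAGE_SIZE 4096ULL
#define NUM_PTES 16
#define SCRATCH_DMA 0x1000ULL      /* hypothetical scratch page address */
#define INVALID_DMA 0xdead0000ULL  /* hypothetical stale/invalid entry */

struct vm;
typedef void (*clear_range_fn)(struct vm *vm, uint64_t start, uint64_t length);

struct vm {
	uint64_t pte[NUM_PTES];        /* one entry per GTT page */
	clear_range_fn clear_range;
};

/* A clear_range that does nothing: the case the patch special-cases. */
static void nop_clear_range(struct vm *vm, uint64_t start, uint64_t length)
{
	(void)vm;
	(void)start;
	(void)length;
}

/* In this model, a "real" clear_range points every PTE in the range at scratch. */
static void scratch_clear_range(struct vm *vm, uint64_t start, uint64_t length)
{
	for (uint64_t off = start; off < start + length; off += GTT_PAGE_SIZE)
		vm->pte[off / GTT_PAGE_SIZE] = SCRATCH_DMA;
}

static void insert_page(struct vm *vm, uint64_t dma, uint64_t offset)
{
	assert(offset / GTT_PAGE_SIZE < NUM_PTES);
	vm->pte[offset / GTT_PAGE_SIZE] = dma;
}

/* Make [offset, offset + length) point at scratch, whatever clear_range does. */
static void insert_scratch_pages(struct vm *vm, uint64_t offset, uint64_t length)
{
	if (vm->clear_range != nop_clear_range) {
		vm->clear_range(vm, offset, length);
		return;
	}
	while (length > 0) {
		insert_page(vm, SCRATCH_DMA, offset);
		offset += GTT_PAGE_SIZE;
		length -= GTT_PAGE_SIZE;
	}
}

/* Model garbage left in the PTEs (e.g. at probe time or after resume). */
static void fill_invalid(struct vm *vm, clear_range_fn clear_range)
{
	for (int i = 0; i < NUM_PTES; i++)
		vm->pte[i] = INVALID_DMA;
	vm->clear_range = clear_range;
}
```

Calling `insert_scratch_pages` again after the PTEs have been filled with garbage models the v5 change of re-inserting the scratch pages on resume. The loop assumes `length` is a multiple of the page size; otherwise the unsigned subtraction would wrap.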