2023-04-17 10:44:49

by Fabio M. De Francesco

[permalink] [raw]
Subject: [PATCH v2 0/3] drm/i915: Replace kmap() with kmap_local_page()

kmap() has been deprecated in favor of kmap_local_page().

There are two main problems with kmap(): (1) It comes with an overhead as
mapping space is restricted and protected by a global lock for
synchronization and (2) it also requires global TLB invalidation when the
kmap’s pool wraps and it might block when the mapping space is fully
utilized until a slot becomes available.

With kmap_local_page() the mappings are per thread, CPU local, can take
page faults, and can be called from any context (including interrupts).

The tasks can be preempted and, when they are scheduled to run again, the
kernel virtual addresses are restored and so they are still valid.

Furthermore, kmap_local_page() is faster than kmap() in kernels with
HIGHMEM enabled.

Thread locality implies that the kernel virtual addresses returned by
kmap_local_page() are only valid in the context of the callers. This
constraint is never violated with the conversions in this series,
because the pointers are never handed to other threads, so the local
mappings are allowed and preferred.

Therefore, replace kmap() with kmap_local_page() in drm/i915/,
drm/i915/gem/, drm/i915/gt/.

Suggested-by: Ira Weiny <[email protected]>
Signed-off-by: Fabio M. De Francesco <[email protected]>

v1->v2: Do some changes in the text of the cover letter and in the
commit messages. There are no changes in the code of any of the three
patches.

Fabio M. De Francesco (3):
drm/i915: Replace kmap() with kmap_local_page()
drm/i915/gt: Replace kmap() with kmap_local_page()
drm/i915/gem: Replace kmap() with kmap_local_page()

drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 6 ++----
drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 8 ++++----
drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 ++--
drivers/gpu/drm/i915/gt/shmem_utils.c | 11 ++++-------
drivers/gpu/drm/i915/i915_gem.c | 8 ++++----
5 files changed, 16 insertions(+), 21 deletions(-)

--
2.40.0


2023-04-17 10:45:00

by Fabio M. De Francesco

[permalink] [raw]
Subject: [PATCH v2 1/3] drm/i915: Replace kmap() with kmap_local_page()

kmap() has been deprecated in favor of kmap_local_page().

There are two main problems with kmap(): (1) It comes with an overhead as
mapping space is restricted and protected by a global lock for
synchronization and (2) it also requires global TLB invalidation when the
kmap’s pool wraps and it might block when the mapping space is fully
utilized until a slot becomes available.

With kmap_local_page() the mappings are per thread, CPU local, can take
page faults, and can be called from any context (including interrupts).
It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore,
the tasks can be preempted and, when they are scheduled to run again, the
kernel virtual addresses are restored and are still valid.

Obviously, thread locality implies that the kernel voirtual addresses
are valid only in the context of the callers. kmap_local_page() use in
i915_gem.c doesn't break the above-mentioned constraint, so it should be
preferred to kmap().

Therefore, replace kmap() with kmap_local_page() in i915_gem.c

Suggested-by: Ira Weiny <[email protected]>
Signed-off-by: Fabio M. De Francesco <[email protected]>
---
drivers/gpu/drm/i915/i915_gem.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 35950fa91406..c35248555e42 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -212,14 +212,14 @@ shmem_pread(struct page *page, int offset, int len, char __user *user_data,
char *vaddr;
int ret;

- vaddr = kmap(page);
+ vaddr = kmap_local_page(page);

if (needs_clflush)
drm_clflush_virt_range(vaddr + offset, len);

ret = __copy_to_user(user_data, vaddr + offset, len);

- kunmap(page);
+ kunmap_local(vaddr);

return ret ? -EFAULT : 0;
}
@@ -643,7 +643,7 @@ shmem_pwrite(struct page *page, int offset, int len, char __user *user_data,
char *vaddr;
int ret;

- vaddr = kmap(page);
+ vaddr = kmap_local_page(page);

if (needs_clflush_before)
drm_clflush_virt_range(vaddr + offset, len);
@@ -652,7 +652,7 @@ shmem_pwrite(struct page *page, int offset, int len, char __user *user_data,
if (!ret && needs_clflush_after)
drm_clflush_virt_range(vaddr + offset, len);

- kunmap(page);
+ kunmap_local(vaddr);

return ret ? -EFAULT : 0;
}
--
2.40.0

2023-04-17 10:45:08

by Fabio M. De Francesco

[permalink] [raw]
Subject: [PATCH v2 2/3] drm/i915/gt: Replace kmap() with kmap_local_page()

kmap() s been deprecated in favor of kmap_local_page().

There are two main problems with kmap(): (1) It comes with an overhead as
mapping space is restricted and protected by a global lock for
synchronization and (2) it also requires global TLB invalidation when the
kmap’s pool wraps and it might block when the mapping space is fully
utilized until a slot becomes available.

With kmap_local_page() the mappings are per thread, CPU local, can take
page faults, and can be called from any context (including interrupts).
It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore,
the tasks can be preempted and, when they are scheduled to run again, the
kernel virtual addresses are restored and are still valid.

Obviously, thread locality implies that the kernel virtual addresses are
only valid in the context of the callers. The use of kmap_local_page() in
i915/gt doesn't break the above-mentioned constraint, so it should be
preferred to kmap().

Therefore, replace kmap() with kmap_local_page() in i915/gt.

Suggested-by: Ira Weiny <[email protected]>
Signed-off-by: Fabio M. De Francesco <[email protected]>
---
drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 ++--
drivers/gpu/drm/i915/gt/shmem_utils.c | 11 ++++-------
2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
index 37d0b0fe791d..89295c6921d6 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
@@ -749,7 +749,7 @@ static void swizzle_page(struct page *page)
char *vaddr;
int i;

- vaddr = kmap(page);
+ vaddr = kmap_local_page(page);

for (i = 0; i < PAGE_SIZE; i += 128) {
memcpy(temp, &vaddr[i], 64);
@@ -757,7 +757,7 @@ static void swizzle_page(struct page *page)
memcpy(&vaddr[i + 64], temp, 64);
}

- kunmap(page);
+ kunmap_local(vaddr);
}

/**
diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c
index 449c9ed44382..be809839a241 100644
--- a/drivers/gpu/drm/i915/gt/shmem_utils.c
+++ b/drivers/gpu/drm/i915/gt/shmem_utils.c
@@ -101,22 +101,19 @@ static int __shmem_rw(struct file *file, loff_t off,
unsigned int this =
min_t(size_t, PAGE_SIZE - offset_in_page(off), len);
struct page *page;
- void *vaddr;

page = shmem_read_mapping_page_gfp(file->f_mapping, pfn,
GFP_KERNEL);
if (IS_ERR(page))
return PTR_ERR(page);

- vaddr = kmap(page);
if (write) {
- memcpy(vaddr + offset_in_page(off), ptr, this);
+ memcpy_to_page(page, offset_in_page(off), ptr, this);
set_page_dirty(page);
} else {
- memcpy(ptr, vaddr + offset_in_page(off), this);
+ memcpy_from_page(ptr, page, offset_in_page(off), this);
}
mark_page_accessed(page);
- kunmap(page);
put_page(page);

len -= this;
@@ -143,11 +140,11 @@ int shmem_read_to_iosys_map(struct file *file, loff_t off,
if (IS_ERR(page))
return PTR_ERR(page);

- vaddr = kmap(page);
+ vaddr = kmap_local_page(page);
iosys_map_memcpy_to(map, map_off, vaddr + offset_in_page(off),
this);
mark_page_accessed(page);
- kunmap(page);
+ kunmap_local(vaddr);
put_page(page);

len -= this;
--
2.40.0

2023-04-17 10:45:09

by Fabio M. De Francesco

[permalink] [raw]
Subject: [PATCH v2 3/3] drm/i915/gem: Replace kmap() with kmap_local_page()

kmap() s been deprecated in favor of kmap_local_page().

There are two main problems with kmap(): (1) It comes with an overhead as
mapping space is restricted and protected by a global lock for
synchronization and (2) it also requires global TLB invalidation when the
kmap’s pool wraps and it might block when the mapping space is fully
utilized until a slot becomes available.

With kmap_local_page() the mappings are per thread, CPU local, can take
page faults, and can be called from any context (including interrupts).
It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore,
the tasks can be preempted and, when they are scheduled to run again, the
kernel virtual addresses are restored and are still valid.

Obviously, thread locality implies that the kernel virtual addresses are
only valid in the context of the callers. The kmap_local_page() use in
i915/gem doesn't break the above-mentioned constraint, so it should be
preferred to kmap().

Therefore, replace kmap() with kmap_local_page() in i915/gem and use
memcpy_to_page() where it is possible to avoid the open coding of
mapping + memcpy() + un-mapping.

Suggested-by: Ira Weiny <[email protected]>
Signed-off-by: Fabio M. De Francesco <[email protected]>
---
drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 6 ++----
drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 8 ++++----
2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 37d1efcd3ca6..8856a6409e83 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -657,16 +657,14 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *dev_priv,
do {
unsigned int len = min_t(typeof(size), size, PAGE_SIZE);
struct page *page;
- void *pgdata, *vaddr;
+ void *pgdata;

err = aops->write_begin(file, file->f_mapping, offset, len,
&page, &pgdata);
if (err < 0)
goto fail;

- vaddr = kmap(page);
- memcpy(vaddr, data, len);
- kunmap(page);
+ memcpy_to_page(page, 0, data, len);

err = aops->write_end(file, file->f_mapping, offset, len, len,
page, pgdata);
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 56279908ed30..5fd9e1ee2340 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -155,7 +155,7 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj,
intel_gt_flush_ggtt_writes(to_gt(i915));

p = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT);
- cpu = kmap(p) + offset_in_page(offset);
+ cpu = kmap_local_page(p) + offset_in_page(offset);
drm_clflush_virt_range(cpu, sizeof(*cpu));
if (*cpu != (u32)page) {
pr_err("Partial view for %lu [%u] (offset=%llu, size=%u [%llu, row size %u], fence=%d, tiling=%d, stride=%d) misalignment, expected write to page (%lu + %u [0x%lx]) of 0x%x, found 0x%x\n",
@@ -173,7 +173,7 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj,
}
*cpu = 0;
drm_clflush_virt_range(cpu, sizeof(*cpu));
- kunmap(p);
+ kunmap_local(cpu);

out:
i915_gem_object_lock(obj, NULL);
@@ -251,7 +251,7 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj,
intel_gt_flush_ggtt_writes(to_gt(i915));

p = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT);
- cpu = kmap(p) + offset_in_page(offset);
+ cpu = kmap_local_page(p) + offset_in_page(offset);
drm_clflush_virt_range(cpu, sizeof(*cpu));
if (*cpu != (u32)page) {
pr_err("Partial view for %lu [%u] (offset=%llu, size=%u [%llu, row size %u], fence=%d, tiling=%d, stride=%d) misalignment, expected write to page (%lu + %u [0x%lx]) of 0x%x, found 0x%x\n",
@@ -269,7 +269,7 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj,
}
*cpu = 0;
drm_clflush_virt_range(cpu, sizeof(*cpu));
- kunmap(p);
+ kunmap_local(cpu);
if (err)
return err;

--
2.40.0