Hi Andrew,
this series removes alloc_vm_area, which was left over from the big
vmalloc interface rework. It is a rather arcane interface, basically
the equivalent of get_vm_area + actually faulting in all PTEs in
the allocated area. It was originally added for Xen (which isn't
modular to start with), and then grew users in zsmalloc and i915,
which seem to mostly qualify as abuses of the interface, especially
for i915, as a random driver should not set up PTE bits directly.
Note that the i915 patches apply to the drm-tip branch of the drm-tip
tree, as that tree has recent conflicting commits in the same area.
A git tree is also available here:
git://git.infradead.org/users/hch/misc.git alloc_vm_area
Gitweb:
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/alloc_vm_area
Changes since v1:
- fix a bug in the zsmalloc changes
- fix a bug and rebase to include the recent changes in i915
- add a new vmap flag that allows freeing the page array and pages
  using vfree
- add a vfree documentation update from Matthew
Diffstat:
arch/x86/xen/grant-table.c | 27 ++++--
drivers/gpu/drm/i915/Kconfig | 1
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 131 +++++++++++++-----------------
drivers/gpu/drm/i915/gt/shmem_utils.c | 76 ++++-------------
drivers/xen/xenbus/xenbus_client.c | 30 +++---
include/linux/vmalloc.h | 7 -
mm/Kconfig | 3
mm/memory.c | 16 ++-
mm/nommu.c | 7 -
mm/vmalloc.c | 123 ++++++++++++++--------------
mm/zsmalloc.c | 10 +-
11 files changed, 200 insertions(+), 231 deletions(-)
kmap for !PageHighmem is just a convoluted way to say page_address,
and kunmap is a no-op in that case.
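The equivalence being relied on, as a minimal illustration (not part of
the patch itself):

        if (!PageHighMem(page)) {
                /* kmap() just returns the linear-map address here ... */
                vaddr = kmap(page);     /* == page_address(page) */
                /* ... and kunmap() has nothing to undo. */
                kunmap(page);
        }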
Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index d6eeefab3d018b..6550c0bc824ea2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -162,8 +162,6 @@ static void unmap_object(struct drm_i915_gem_object *obj, void *ptr)
{
if (is_vmalloc_addr(ptr))
vunmap(ptr);
- else
- kunmap(kmap_to_page(ptr));
}
struct sg_table *
@@ -277,11 +275,10 @@ static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
* forever.
*
* So if the page is beyond the 32b boundary, make an explicit
- * vmap. On 64b, this check will be optimised away as we can
- * directly kmap any page on the system.
+ * vmap.
*/
if (!PageHighMem(page))
- return kmap(page);
+ return page_address(page);
}
mem = stack;
--
2.28.0
Besides calling the callback on each page, apply_to_page_range also has
the effect of pre-faulting all PTEs for the range. To support callers
that only need the pre-faulting, make the callback optional.
Based on a patch from Minchan Kim <[email protected]>.
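As a sketch of the intended usage (not part of this patch; "area" and
"size" are placeholder names), a caller that only wants the pre-faulting
side effect can now pass a NULL callback:

        int err;

        /* Pre-fault the kernel page tables covering a reserved area. */
        err = apply_to_page_range(&init_mm, (unsigned long)area->addr,
                                  size, NULL, NULL);
        if (err)
                return err;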
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/memory.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 469af373ae76e1..a60136046d7fcc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2231,13 +2231,15 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
arch_enter_lazy_mmu_mode();
- do {
- if (create || !pte_none(*pte)) {
- err = fn(pte++, addr, data);
- if (err)
- break;
- }
- } while (addr += PAGE_SIZE, addr != end);
+ if (fn) {
+ do {
+ if (create || !pte_none(*pte)) {
+ err = fn(pte++, addr, data);
+ if (err)
+ break;
+ }
+ } while (addr += PAGE_SIZE, addr != end);
+ }
*mask |= PGTBL_PTE_MODIFIED;
arch_leave_lazy_mmu_mode();
--
2.28.0
Add a proper helper to remap PFNs into kernel virtual space so that
drivers don't have to abuse alloc_vm_area and open-coded PTE
manipulation for it.
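For illustration, a hypothetical caller would look roughly like this
(sketch only; io_base and count are placeholders, the PFNs must not be
backed by struct pages, and the pgprot mirrors the i915 usage later in
the series):

        unsigned long *pfns;
        unsigned int i;
        void *vaddr;

        pfns = kvmalloc_array(count, sizeof(*pfns), GFP_KERNEL);
        if (!pfns)
                return NULL;
        for (i = 0; i < count; i++)
                pfns[i] = (io_base >> PAGE_SHIFT) + i;

        vaddr = vmap_pfn(pfns, count, pgprot_writecombine(PAGE_KERNEL_IO));
        kvfree(pfns);   /* vmap_pfn() does not keep a reference to the array */
        return vaddr;   /* undone later with vunmap(vaddr) */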
Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/vmalloc.h | 1 +
mm/Kconfig | 3 +++
mm/vmalloc.c | 45 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 49 insertions(+)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index b899681e3ff9f0..c77efeac242514 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -122,6 +122,7 @@ extern void vfree_atomic(const void *addr);
extern void *vmap(struct page **pages, unsigned int count,
unsigned long flags, pgprot_t prot);
+void *vmap_pfn(unsigned long *pfns, unsigned int count, pgprot_t prot);
extern void vunmap(const void *addr);
extern int remap_vmalloc_range_partial(struct vm_area_struct *vma,
diff --git a/mm/Kconfig b/mm/Kconfig
index 6c974888f86f97..6fa7ba1199eb1e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -815,6 +815,9 @@ config DEVICE_PRIVATE
memory; i.e., memory that is only accessible from the device (or
group of devices). You likely also want to select HMM_MIRROR.
+config VMAP_PFN
+ bool
+
config FRAME_VECTOR
bool
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ffad65f052c3f9..e2a2ded8d93478 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2413,6 +2413,51 @@ void *vmap(struct page **pages, unsigned int count,
}
EXPORT_SYMBOL(vmap);
+#ifdef CONFIG_VMAP_PFN
+struct vmap_pfn_data {
+ unsigned long *pfns;
+ pgprot_t prot;
+ unsigned int idx;
+};
+
+static int vmap_pfn_apply(pte_t *pte, unsigned long addr, void *private)
+{
+ struct vmap_pfn_data *data = private;
+
+ if (WARN_ON_ONCE(pfn_valid(data->pfns[data->idx])))
+ return -EINVAL;
+ *pte = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot));
+ return 0;
+}
+
+/**
+ * vmap_pfn - map an array of PFNs into virtually contiguous space
+ * @pfns: array of PFNs
+ * @count: number of pages to map
+ * @prot: page protection for the mapping
+ *
+ * Maps @count PFNs from @pfns into contiguous kernel virtual space and returns
+ * the start address of the mapping.
+ */
+void *vmap_pfn(unsigned long *pfns, unsigned int count, pgprot_t prot)
+{
+ struct vmap_pfn_data data = { .pfns = pfns, .prot = pgprot_nx(prot) };
+ struct vm_struct *area;
+
+ area = get_vm_area_caller(count * PAGE_SIZE, VM_IOREMAP,
+ __builtin_return_address(0));
+ if (!area)
+ return NULL;
+ if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
+ count * PAGE_SIZE, vmap_pfn_apply, &data)) {
+ free_vm_area(area);
+ return NULL;
+ }
+ return area->addr;
+}
+EXPORT_SYMBOL_GPL(vmap_pfn);
+#endif /* CONFIG_VMAP_PFN */
+
static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
pgprot_t prot, int node)
{
--
2.28.0
Replace the last call to alloc_vm_area with an open-coded version using
an iterator in struct gnttab_vm_area instead of the triple indirection
magic in alloc_vm_area.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/x86/xen/grant-table.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
index 4988e19598c8a5..1e681bf62561a0 100644
--- a/arch/x86/xen/grant-table.c
+++ b/arch/x86/xen/grant-table.c
@@ -25,6 +25,7 @@
static struct gnttab_vm_area {
struct vm_struct *area;
pte_t **ptes;
+ int idx;
} gnttab_shared_vm_area, gnttab_status_vm_area;
int arch_gnttab_map_shared(unsigned long *frames, unsigned long nr_gframes,
@@ -90,19 +91,31 @@ void arch_gnttab_unmap(void *shared, unsigned long nr_gframes)
}
}
+static int gnttab_apply(pte_t *pte, unsigned long addr, void *data)
+{
+ struct gnttab_vm_area *area = data;
+
+ area->ptes[area->idx++] = pte;
+ return 0;
+}
+
static int arch_gnttab_valloc(struct gnttab_vm_area *area, unsigned nr_frames)
{
area->ptes = kmalloc_array(nr_frames, sizeof(*area->ptes), GFP_KERNEL);
if (area->ptes == NULL)
return -ENOMEM;
-
- area->area = alloc_vm_area(PAGE_SIZE * nr_frames, area->ptes);
- if (area->area == NULL) {
- kfree(area->ptes);
- return -ENOMEM;
- }
-
+ area->area = get_vm_area(PAGE_SIZE * nr_frames, VM_IOREMAP);
+ if (!area->area)
+ goto out_free_ptes;
+ if (apply_to_page_range(&init_mm, (unsigned long)area->area->addr,
+ PAGE_SIZE * nr_frames, gnttab_apply, area))
+ goto out_free_vm_area;
return 0;
+out_free_vm_area:
+ free_vm_area(area->area);
+out_free_ptes:
+ kfree(area->ptes);
+ return -ENOMEM;
}
static void arch_gnttab_vfree(struct gnttab_vm_area *area)
--
2.28.0
Add a flag so that vmap takes ownership of the passed-in page array.
When vfree is called on such an allocation it will put one reference
on each page, and free the page array itself.
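The intended calling convention, as a sketch (hypothetical caller, not
taken from an in-tree user):

        struct page **pages;
        void *vaddr;
        unsigned int i;

        pages = kvmalloc_array(count, sizeof(*pages), GFP_KERNEL);
        if (!pages)
                return NULL;
        for (i = 0; i < count; i++)
                pages[i] = alloc_page(GFP_KERNEL);      /* error handling elided */

        vaddr = vmap(pages, count, VM_MAP_PUT_PAGES, PAGE_KERNEL);
        if (!vaddr) {
                /* on failure the caller still owns the array and the pages */
                while (i--)
                        put_page(pages[i]);
                kvfree(pages);
                return NULL;
        }
        /*
         * From here on a single vfree(vaddr) unmaps the area, puts the
         * pages and frees the pages array.
         */
        return vaddr;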
Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/vmalloc.h | 1 +
mm/vmalloc.c | 9 +++++++--
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 0221f852a7e1a3..b899681e3ff9f0 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -24,6 +24,7 @@ struct notifier_block; /* in notifier.h */
#define VM_UNINITIALIZED 0x00000020 /* vm_struct is not fully initialized */
#define VM_NO_GUARD 0x00000040 /* don't add guard page */
#define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */
+#define VM_MAP_PUT_PAGES 0x00000100 /* put pages and free array in vfree */
/*
* VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC.
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 8770260419af06..ffad65f052c3f9 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2377,8 +2377,11 @@ EXPORT_SYMBOL(vunmap);
* @flags: vm_area->flags
* @prot: page protection for the mapping
*
- * Maps @count pages from @pages into contiguous kernel virtual
- * space.
+ * Maps @count pages from @pages into contiguous kernel virtual space.
+ * If @flags contains %VM_MAP_PUT_PAGES the ownership of the pages array itself
+ * (which must be kmalloc or vmalloc memory) and one reference per pages in it
+ * are transferred from the caller to vmap(), and will be freed / dropped when
+ * vfree() is called on the return value.
*
* Return: the address of the area or %NULL on failure
*/
@@ -2404,6 +2407,8 @@ void *vmap(struct page **pages, unsigned int count,
return NULL;
}
+ if (flags & VM_MAP_PUT_PAGES)
+ area->pages = pages;
return area->addr;
}
EXPORT_SYMBOL(vmap);
--
2.28.0
From: "Matthew Wilcox (Oracle)" <[email protected]>
* Document that you can call vfree() on an address returned from vmap()
* Remove the note about the minimum size -- the minimum size of a vmalloc
allocation is one page
* Add a Context: section
* Fix capitalisation
* Reword the prohibition on calling from NMI context to avoid a double
negative
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/vmalloc.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index be4724b916b3e7..8770260419af06 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2321,20 +2321,21 @@ static void __vfree(const void *addr)
}
/**
- * vfree - release memory allocated by vmalloc()
- * @addr: memory base address
+ * vfree - Release memory allocated by vmalloc()
+ * @addr: Memory base address
*
- * Free the virtually continuous memory area starting at @addr, as
- * obtained from vmalloc(), vmalloc_32() or __vmalloc(). If @addr is
- * NULL, no operation is performed.
+ * Free the virtually continuous memory area starting at @addr, as obtained
+ * from one of the vmalloc() family of APIs. This will usually also free the
+ * physical memory underlying the virtual allocation, but that memory is
+ * reference counted, so it will not be freed until the last user goes away.
*
- * Must not be called in NMI context (strictly speaking, only if we don't
- * have CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG, but making the calling
- * conventions for vfree() arch-depenedent would be a really bad idea)
+ * If @addr is NULL, no operation is performed.
*
+ * Context:
* May sleep if called *not* from interrupt context.
- *
- * NOTE: assumes that the object at @addr has a size >= sizeof(llist_node)
+ * Must not be called in NMI context (strictly speaking, it could be
+ * if we have CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG, but making the calling
+ * conventions for vfree() arch-depenedent would be a really bad idea).
*/
void vfree(const void *addr)
{
--
2.28.0
All users are gone now.
Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/vmalloc.h | 5 +----
mm/nommu.c | 7 ------
mm/vmalloc.c | 48 -----------------------------------------
3 files changed, 1 insertion(+), 59 deletions(-)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c77efeac242514..938eaf9517e266 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -169,6 +169,7 @@ extern struct vm_struct *__get_vm_area_caller(unsigned long size,
unsigned long flags,
unsigned long start, unsigned long end,
const void *caller);
+void free_vm_area(struct vm_struct *area);
extern struct vm_struct *remove_vm_area(const void *addr);
extern struct vm_struct *find_vm_area(const void *addr);
@@ -204,10 +205,6 @@ static inline void set_vm_flush_reset_perms(void *addr)
}
#endif
-/* Allocate/destroy a 'vmalloc' VM area. */
-extern struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes);
-extern void free_vm_area(struct vm_struct *area);
-
/* for /dev/kmem */
extern long vread(char *buf, char *addr, unsigned long count);
extern long vwrite(char *buf, char *addr, unsigned long count);
diff --git a/mm/nommu.c b/mm/nommu.c
index 75a327149af127..9272f30e4c4726 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -354,13 +354,6 @@ void vm_unmap_aliases(void)
}
EXPORT_SYMBOL_GPL(vm_unmap_aliases);
-struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
-{
- BUG();
- return NULL;
-}
-EXPORT_SYMBOL_GPL(alloc_vm_area);
-
void free_vm_area(struct vm_struct *area)
{
BUG();
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e2a2ded8d93478..3bc5b832451ef2 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3083,54 +3083,6 @@ int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
}
EXPORT_SYMBOL(remap_vmalloc_range);
-static int f(pte_t *pte, unsigned long addr, void *data)
-{
- pte_t ***p = data;
-
- if (p) {
- *(*p) = pte;
- (*p)++;
- }
- return 0;
-}
-
-/**
- * alloc_vm_area - allocate a range of kernel address space
- * @size: size of the area
- * @ptes: returns the PTEs for the address space
- *
- * Returns: NULL on failure, vm_struct on success
- *
- * This function reserves a range of kernel address space, and
- * allocates pagetables to map that range. No actual mappings
- * are created.
- *
- * If @ptes is non-NULL, pointers to the PTEs (in init_mm)
- * allocated for the VM area are returned.
- */
-struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
-{
- struct vm_struct *area;
-
- area = get_vm_area_caller(size, VM_IOREMAP,
- __builtin_return_address(0));
- if (area == NULL)
- return NULL;
-
- /*
- * This ensures that page tables are constructed for this region
- * of kernel virtual address space and mapped into init_mm.
- */
- if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
- size, f, ptes ? &ptes : NULL)) {
- free_vm_area(area);
- return NULL;
- }
-
- return area;
-}
-EXPORT_SYMBOL_GPL(alloc_vm_area);
-
void free_vm_area(struct vm_struct *area)
{
struct vm_struct *ret;
--
2.28.0
i915_gem_object_map implements fairly low-level vmap functionality in
a driver. Split it into two helpers, one for remapping kernel memory
which can use vmap, and one for I/O memory that uses vmap_pfn.
The only practical difference is that alloc_vm_area prefaults the
vmalloc area PTEs, which doesn't seem to be required here for the
kernel memory case (and could be added to vmap using a flag if actually
required).
Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/gpu/drm/i915/Kconfig | 1 +
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 126 ++++++++++------------
2 files changed, 59 insertions(+), 68 deletions(-)
diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 9afa5c4a6bf006..1e1cb245fca778 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -25,6 +25,7 @@ config DRM_I915
select CRC32
select SND_HDA_I915 if SND_HDA_CORE
select CEC_CORE if CEC_NOTIFIER
+ select VMAP_PFN
help
Choose this option if you have a system that has "Intel Graphics
Media Accelerator" or "HD Graphics" integrated graphics,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 6550c0bc824ea2..b519417667eb4b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -232,34 +232,21 @@ int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
return err;
}
-static inline pte_t iomap_pte(resource_size_t base,
- dma_addr_t offset,
- pgprot_t prot)
-{
- return pte_mkspecial(pfn_pte((base + offset) >> PAGE_SHIFT, prot));
-}
-
/* The 'mapping' part of i915_gem_object_pin_map() below */
-static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
- enum i915_map_type type)
+static void *i915_gem_object_map_page(struct drm_i915_gem_object *obj,
+ enum i915_map_type type)
{
- unsigned long n_pte = obj->base.size >> PAGE_SHIFT;
- struct sg_table *sgt = obj->mm.pages;
- pte_t *stack[32], **mem;
- struct vm_struct *area;
+ unsigned long n_pages = obj->base.size >> PAGE_SHIFT, i;
+ struct page *stack[32], **pages = stack, *page;
+ struct sgt_iter iter;
pgprot_t pgprot;
+ void *vaddr;
- if (!i915_gem_object_has_struct_page(obj) && type != I915_MAP_WC)
- return NULL;
-
- if (GEM_WARN_ON(type == I915_MAP_WC &&
- !static_cpu_has(X86_FEATURE_PAT)))
- return NULL;
-
- /* A single page can always be kmapped */
- if (n_pte == 1 && type == I915_MAP_WB) {
- struct page *page = sg_page(sgt->sgl);
-
+ switch (type) {
+ default:
+ MISSING_CASE(type);
+ fallthrough; /* to use PAGE_KERNEL anyway */
+ case I915_MAP_WB:
/*
* On 32b, highmem using a finite set of indirect PTE (i.e.
* vmap) to provide virtual mappings of the high pages.
@@ -277,30 +264,8 @@ static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
* So if the page is beyond the 32b boundary, make an explicit
* vmap.
*/
- if (!PageHighMem(page))
- return page_address(page);
- }
-
- mem = stack;
- if (n_pte > ARRAY_SIZE(stack)) {
- /* Too big for stack -- allocate temporary array instead */
- mem = kvmalloc_array(n_pte, sizeof(*mem), GFP_KERNEL);
- if (!mem)
- return NULL;
- }
-
- area = alloc_vm_area(obj->base.size, mem);
- if (!area) {
- if (mem != stack)
- kvfree(mem);
- return NULL;
- }
-
- switch (type) {
- default:
- MISSING_CASE(type);
- fallthrough; /* to use PAGE_KERNEL anyway */
- case I915_MAP_WB:
+ if (n_pages == 1 && !PageHighMem(sg_page(obj->mm.pages->sgl)))
+ return page_address(sg_page(obj->mm.pages->sgl));
pgprot = PAGE_KERNEL;
break;
case I915_MAP_WC:
@@ -308,30 +273,49 @@ static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
break;
}
- if (i915_gem_object_has_struct_page(obj)) {
- struct sgt_iter iter;
- struct page *page;
- pte_t **ptes = mem;
+ if (n_pages > ARRAY_SIZE(stack)) {
+ /* Too big for stack -- allocate temporary array instead */
+ pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL);
+ if (!pages)
+ return NULL;
+ }
- for_each_sgt_page(page, iter, sgt)
- **ptes++ = mk_pte(page, pgprot);
- } else {
- resource_size_t iomap;
- struct sgt_iter iter;
- pte_t **ptes = mem;
- dma_addr_t addr;
+ i = 0;
+ for_each_sgt_page(page, iter, obj->mm.pages)
+ pages[i++] = page;
+ vaddr = vmap(pages, n_pages, 0, pgprot);
+ if (pages != stack)
+ kvfree(pages);
+ return vaddr;
+}
- iomap = obj->mm.region->iomap.base;
- iomap -= obj->mm.region->region.start;
+static void *i915_gem_object_map_pfn(struct drm_i915_gem_object *obj,
+ enum i915_map_type type)
+{
+ resource_size_t iomap = obj->mm.region->iomap.base -
+ obj->mm.region->region.start;
+ unsigned long n_pfn = obj->base.size >> PAGE_SHIFT;
+ unsigned long stack[32], *pfns = stack, i;
+ struct sgt_iter iter;
+ dma_addr_t addr;
+ void *vaddr;
+
+ if (type != I915_MAP_WC)
+ return NULL;
- for_each_sgt_daddr(addr, iter, sgt)
- **ptes++ = iomap_pte(iomap, addr, pgprot);
+ if (n_pfn > ARRAY_SIZE(stack)) {
+ /* Too big for stack -- allocate temporary array instead */
+ pfns = kvmalloc_array(n_pfn, sizeof(*pfns), GFP_KERNEL);
+ if (!pfns)
+ return NULL;
}
- if (mem != stack)
- kvfree(mem);
-
- return area->addr;
+ for_each_sgt_daddr(addr, iter, obj->mm.pages)
+ pfns[i++] = (iomap + addr) >> PAGE_SHIFT;
+ vaddr = vmap_pfn(pfns, n_pfn, pgprot_writecombine(PAGE_KERNEL_IO));
+ if (pfns != stack)
+ kvfree(pfns);
+ return vaddr;
}
/* get, pin, and map the pages of the object into kernel space */
@@ -383,7 +367,13 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
}
if (!ptr) {
- ptr = i915_gem_object_map(obj, type);
+ if (GEM_WARN_ON(type == I915_MAP_WC &&
+ !static_cpu_has(X86_FEATURE_PAT)))
+ ptr = NULL;
+ else if (i915_gem_object_has_struct_page(obj))
+ ptr = i915_gem_object_map_page(obj, type);
+ else
+ ptr = i915_gem_object_map_pfn(obj, type);
if (!ptr) {
err = -ENOMEM;
goto err_unpin;
--
2.28.0
Just manually pre-fault the PTEs using apply_to_page_range.
Co-developed-by: Minchan Kim <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/zsmalloc.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index c36fdff9a37131..918c7b019b3d78 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1122,10 +1122,16 @@ static inline int __zs_cpu_up(struct mapping_area *area)
*/
if (area->vm)
return 0;
- area->vm = alloc_vm_area(PAGE_SIZE * 2, NULL);
+ area->vm = get_vm_area(PAGE_SIZE * 2, 0);
if (!area->vm)
return -ENOMEM;
- return 0;
+
+ /*
+ * Populate ptes in advance to avoid pte allocation with GFP_KERNEL
+ * in non-preemtible context of zs_map_object.
+ */
+ return apply_to_page_range(&init_mm, (unsigned long)area->vm->addr,
+ PAGE_SIZE * 2, NULL, NULL);
}
static inline void __zs_cpu_down(struct mapping_area *area)
--
2.28.0
Replacing alloc_vm_area with get_vm_area_caller + apply_to_page_range
allows filling in the phys_addr values directly instead of doing
another loop over all addresses.
Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/xen/xenbus/xenbus_client.c | 30 ++++++++++++++++--------------
1 file changed, 16 insertions(+), 14 deletions(-)
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 2690318ad50f48..fd80e318b99cc7 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -73,16 +73,13 @@ struct map_ring_valloc {
struct xenbus_map_node *node;
/* Why do we need two arrays? See comment of __xenbus_map_ring */
- union {
- unsigned long addrs[XENBUS_MAX_RING_GRANTS];
- pte_t *ptes[XENBUS_MAX_RING_GRANTS];
- };
+ unsigned long addrs[XENBUS_MAX_RING_GRANTS];
phys_addr_t phys_addrs[XENBUS_MAX_RING_GRANTS];
struct gnttab_map_grant_ref map[XENBUS_MAX_RING_GRANTS];
struct gnttab_unmap_grant_ref unmap[XENBUS_MAX_RING_GRANTS];
- unsigned int idx; /* HVM only. */
+ unsigned int idx;
};
static DEFINE_SPINLOCK(xenbus_valloc_lock);
@@ -686,6 +683,14 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
EXPORT_SYMBOL_GPL(xenbus_unmap_ring_vfree);
#ifdef CONFIG_XEN_PV
+static int map_ring_apply(pte_t *pte, unsigned long addr, void *data)
+{
+ struct map_ring_valloc *info = data;
+
+ info->phys_addrs[info->idx++] = arbitrary_virt_to_machine(pte).maddr;
+ return 0;
+}
+
static int xenbus_map_ring_pv(struct xenbus_device *dev,
struct map_ring_valloc *info,
grant_ref_t *gnt_refs,
@@ -694,18 +699,15 @@ static int xenbus_map_ring_pv(struct xenbus_device *dev,
{
struct xenbus_map_node *node = info->node;
struct vm_struct *area;
- int err = GNTST_okay;
- int i;
- bool leaked;
+ bool leaked = false;
+ int err = -ENOMEM;
- area = alloc_vm_area(XEN_PAGE_SIZE * nr_grefs, info->ptes);
+ area = get_vm_area(XEN_PAGE_SIZE * nr_grefs, VM_IOREMAP);
if (!area)
return -ENOMEM;
-
- for (i = 0; i < nr_grefs; i++)
- info->phys_addrs[i] =
- arbitrary_virt_to_machine(info->ptes[i]).maddr;
-
+ if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
+ XEN_PAGE_SIZE * nr_grefs, map_ring_apply, info))
+ goto failed;
err = __xenbus_map_ring(dev, gnt_refs, nr_grefs, node->handles,
info, GNTMAP_host_map | GNTMAP_contains_pte,
&leaked);
--
2.28.0
On 9/24/20 9:58 AM, Christoph Hellwig wrote:
> Replacing alloc_vm_area with get_vm_area_caller + apply_page_range
> allows to fill put the phys_addr values directly instead of doing
> another loop over all addresses.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
-boris
On 9/24/20 9:58 AM, Christoph Hellwig wrote:
> Replace the last call to alloc_vm_area with an open coded version using
> an iterator in struct gnttab_vm_area instead of the triple indirection
> magic in alloc_vm_area.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
On 24/09/2020 14:58, Christoph Hellwig wrote:
> kmap for !PageHighmem is just a convoluted way to say page_address,
> and kunmap is a no-op in that case.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/gpu/drm/i915/gem/i915_gem_pages.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index d6eeefab3d018b..6550c0bc824ea2 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -162,8 +162,6 @@ static void unmap_object(struct drm_i915_gem_object *obj, void *ptr)
> {
> if (is_vmalloc_addr(ptr))
> vunmap(ptr);
> - else
> - kunmap(kmap_to_page(ptr));
> }
>
> struct sg_table *
> @@ -277,11 +275,10 @@ static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
> * forever.
> *
> * So if the page is beyond the 32b boundary, make an explicit
> - * vmap. On 64b, this check will be optimised away as we can
> - * directly kmap any page on the system.
> + * vmap.
> */
> if (!PageHighMem(page))
> - return kmap(page);
> + return page_address(page);
> }
>
> mem = stack;
>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Regards,
Tvrtko
On 24/09/2020 14:58, Christoph Hellwig wrote:
> i915_gem_object_map implements fairly low-level vmap functionality in
> a driver. Split it into two helpers, one for remapping kernel memory
> which can use vmap, and one for I/O memory that uses vmap_pfn.
>
> The only practical difference is that alloc_vm_area prefeaults the
> vmalloc area PTEs, which doesn't seem to be required here for the
> kernel memory case (and could be added to vmap using a flag if actually
> required).
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/gpu/drm/i915/Kconfig | 1 +
> drivers/gpu/drm/i915/gem/i915_gem_pages.c | 126 ++++++++++------------
> 2 files changed, 59 insertions(+), 68 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
> index 9afa5c4a6bf006..1e1cb245fca778 100644
> --- a/drivers/gpu/drm/i915/Kconfig
> +++ b/drivers/gpu/drm/i915/Kconfig
> @@ -25,6 +25,7 @@ config DRM_I915
> select CRC32
> select SND_HDA_I915 if SND_HDA_CORE
> select CEC_CORE if CEC_NOTIFIER
> + select VMAP_PFN
> help
> Choose this option if you have a system that has "Intel Graphics
> Media Accelerator" or "HD Graphics" integrated graphics,
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index 6550c0bc824ea2..b519417667eb4b 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -232,34 +232,21 @@ int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
> return err;
> }
>
> -static inline pte_t iomap_pte(resource_size_t base,
> - dma_addr_t offset,
> - pgprot_t prot)
> -{
> - return pte_mkspecial(pfn_pte((base + offset) >> PAGE_SHIFT, prot));
> -}
> -
> /* The 'mapping' part of i915_gem_object_pin_map() below */
> -static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
> - enum i915_map_type type)
> +static void *i915_gem_object_map_page(struct drm_i915_gem_object *obj,
> + enum i915_map_type type)
> {
> - unsigned long n_pte = obj->base.size >> PAGE_SHIFT;
> - struct sg_table *sgt = obj->mm.pages;
> - pte_t *stack[32], **mem;
> - struct vm_struct *area;
> + unsigned long n_pages = obj->base.size >> PAGE_SHIFT, i;
> + struct page *stack[32], **pages = stack, *page;
> + struct sgt_iter iter;
> pgprot_t pgprot;
> + void *vaddr;
>
> - if (!i915_gem_object_has_struct_page(obj) && type != I915_MAP_WC)
> - return NULL;
> -
> - if (GEM_WARN_ON(type == I915_MAP_WC &&
> - !static_cpu_has(X86_FEATURE_PAT)))
> - return NULL;
> -
> - /* A single page can always be kmapped */
> - if (n_pte == 1 && type == I915_MAP_WB) {
> - struct page *page = sg_page(sgt->sgl);
> -
> + switch (type) {
> + default:
> + MISSING_CASE(type);
> + fallthrough; /* to use PAGE_KERNEL anyway */
> + case I915_MAP_WB:
> /*
> * On 32b, highmem using a finite set of indirect PTE (i.e.
> * vmap) to provide virtual mappings of the high pages.
> @@ -277,30 +264,8 @@ static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
> * So if the page is beyond the 32b boundary, make an explicit
> * vmap.
> */
> - if (!PageHighMem(page))
> - return page_address(page);
> - }
> -
> - mem = stack;
> - if (n_pte > ARRAY_SIZE(stack)) {
> - /* Too big for stack -- allocate temporary array instead */
> - mem = kvmalloc_array(n_pte, sizeof(*mem), GFP_KERNEL);
> - if (!mem)
> - return NULL;
> - }
> -
> - area = alloc_vm_area(obj->base.size, mem);
> - if (!area) {
> - if (mem != stack)
> - kvfree(mem);
> - return NULL;
> - }
> -
> - switch (type) {
> - default:
> - MISSING_CASE(type);
> - fallthrough; /* to use PAGE_KERNEL anyway */
> - case I915_MAP_WB:
> + if (n_pages == 1 && !PageHighMem(sg_page(obj->mm.pages->sgl)))
> + return page_address(sg_page(obj->mm.pages->sgl));
> pgprot = PAGE_KERNEL;
> break;
> case I915_MAP_WC:
> @@ -308,30 +273,49 @@ static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
> break;
> }
>
> - if (i915_gem_object_has_struct_page(obj)) {
> - struct sgt_iter iter;
> - struct page *page;
> - pte_t **ptes = mem;
> + if (n_pages > ARRAY_SIZE(stack)) {
> + /* Too big for stack -- allocate temporary array instead */
> + pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL);
> + if (!pages)
> + return NULL;
> + }
>
> - for_each_sgt_page(page, iter, sgt)
> - **ptes++ = mk_pte(page, pgprot);
> - } else {
> - resource_size_t iomap;
> - struct sgt_iter iter;
> - pte_t **ptes = mem;
> - dma_addr_t addr;
> + i = 0;
> + for_each_sgt_page(page, iter, obj->mm.pages)
> + pages[i++] = page;
> + vaddr = vmap(pages, n_pages, 0, pgprot);
> + if (pages != stack)
> + kvfree(pages);
> + return vaddr;
> +}
>
> - iomap = obj->mm.region->iomap.base;
> - iomap -= obj->mm.region->region.start;
> +static void *i915_gem_object_map_pfn(struct drm_i915_gem_object *obj,
> + enum i915_map_type type)
> +{
> + resource_size_t iomap = obj->mm.region->iomap.base -
> + obj->mm.region->region.start;
> + unsigned long n_pfn = obj->base.size >> PAGE_SHIFT;
> + unsigned long stack[32], *pfns = stack, i;
> + struct sgt_iter iter;
> + dma_addr_t addr;
> + void *vaddr;
> +
> + if (type != I915_MAP_WC)
> + return NULL;
>
> - for_each_sgt_daddr(addr, iter, sgt)
> - **ptes++ = iomap_pte(iomap, addr, pgprot);
> + if (n_pfn > ARRAY_SIZE(stack)) {
> + /* Too big for stack -- allocate temporary array instead */
> + pfns = kvmalloc_array(n_pfn, sizeof(*pfns), GFP_KERNEL);
> + if (!pfns)
> + return NULL;
> }
>
> - if (mem != stack)
> - kvfree(mem);
> -
> - return area->addr;
> + for_each_sgt_daddr(addr, iter, obj->mm.pages)
> + pfns[i++] = (iomap + addr) >> PAGE_SHIFT;
> + vaddr = vmap_pfn(pfns, n_pfn, pgprot_writecombine(PAGE_KERNEL_IO));
> + if (pfns != stack)
> + kvfree(pfns);
> + return vaddr;
> }
>
> /* get, pin, and map the pages of the object into kernel space */
> @@ -383,7 +367,13 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
> }
>
> if (!ptr) {
> - ptr = i915_gem_object_map(obj, type);
> + if (GEM_WARN_ON(type == I915_MAP_WC &&
> + !static_cpu_has(X86_FEATURE_PAT)))
> + ptr = NULL;
> + else if (i915_gem_object_has_struct_page(obj))
> + ptr = i915_gem_object_map_page(obj, type);
> + else
> + ptr = i915_gem_object_map_pfn(obj, type);
> if (!ptr) {
> err = -ENOMEM;
> goto err_unpin;
>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Regards,
Tvrtko
On Thu, 24 Sep 2020 at 14:59, Christoph Hellwig <[email protected]> wrote:
>
> i915_gem_object_map implements fairly low-level vmap functionality in
> a driver. Split it into two helpers, one for remapping kernel memory
> which can use vmap, and one for I/O memory that uses vmap_pfn.
>
> The only practical difference is that alloc_vm_area prefeaults the
> vmalloc area PTEs, which doesn't seem to be required here for the
> kernel memory case (and could be added to vmap using a flag if actually
> required).
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/gpu/drm/i915/Kconfig | 1 +
> drivers/gpu/drm/i915/gem/i915_gem_pages.c | 126 ++++++++++------------
> 2 files changed, 59 insertions(+), 68 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
> index 9afa5c4a6bf006..1e1cb245fca778 100644
> --- a/drivers/gpu/drm/i915/Kconfig
> +++ b/drivers/gpu/drm/i915/Kconfig
> @@ -25,6 +25,7 @@ config DRM_I915
> select CRC32
> select SND_HDA_I915 if SND_HDA_CORE
> select CEC_CORE if CEC_NOTIFIER
> + select VMAP_PFN
> help
> Choose this option if you have a system that has "Intel Graphics
> Media Accelerator" or "HD Graphics" integrated graphics,
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index 6550c0bc824ea2..b519417667eb4b 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -232,34 +232,21 @@ int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
> return err;
> }
>
> -static inline pte_t iomap_pte(resource_size_t base,
> - dma_addr_t offset,
> - pgprot_t prot)
> -{
> - return pte_mkspecial(pfn_pte((base + offset) >> PAGE_SHIFT, prot));
> -}
> -
> /* The 'mapping' part of i915_gem_object_pin_map() below */
> -static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
> - enum i915_map_type type)
> +static void *i915_gem_object_map_page(struct drm_i915_gem_object *obj,
> + enum i915_map_type type)
> {
> - unsigned long n_pte = obj->base.size >> PAGE_SHIFT;
> - struct sg_table *sgt = obj->mm.pages;
> - pte_t *stack[32], **mem;
> - struct vm_struct *area;
> + unsigned long n_pages = obj->base.size >> PAGE_SHIFT, i;
> + struct page *stack[32], **pages = stack, *page;
> + struct sgt_iter iter;
> pgprot_t pgprot;
> + void *vaddr;
>
> - if (!i915_gem_object_has_struct_page(obj) && type != I915_MAP_WC)
> - return NULL;
> -
> - if (GEM_WARN_ON(type == I915_MAP_WC &&
> - !static_cpu_has(X86_FEATURE_PAT)))
> - return NULL;
> -
> - /* A single page can always be kmapped */
> - if (n_pte == 1 && type == I915_MAP_WB) {
> - struct page *page = sg_page(sgt->sgl);
> -
> + switch (type) {
> + default:
> + MISSING_CASE(type);
> + fallthrough; /* to use PAGE_KERNEL anyway */
> + case I915_MAP_WB:
> /*
> * On 32b, highmem using a finite set of indirect PTE (i.e.
> * vmap) to provide virtual mappings of the high pages.
> @@ -277,30 +264,8 @@ static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
> * So if the page is beyond the 32b boundary, make an explicit
> * vmap.
> */
> - if (!PageHighMem(page))
> - return page_address(page);
> - }
> -
> - mem = stack;
> - if (n_pte > ARRAY_SIZE(stack)) {
> - /* Too big for stack -- allocate temporary array instead */
> - mem = kvmalloc_array(n_pte, sizeof(*mem), GFP_KERNEL);
> - if (!mem)
> - return NULL;
> - }
> -
> - area = alloc_vm_area(obj->base.size, mem);
> - if (!area) {
> - if (mem != stack)
> - kvfree(mem);
> - return NULL;
> - }
> -
> - switch (type) {
> - default:
> - MISSING_CASE(type);
> - fallthrough; /* to use PAGE_KERNEL anyway */
> - case I915_MAP_WB:
> + if (n_pages == 1 && !PageHighMem(sg_page(obj->mm.pages->sgl)))
> + return page_address(sg_page(obj->mm.pages->sgl));
> pgprot = PAGE_KERNEL;
> break;
> case I915_MAP_WC:
> @@ -308,30 +273,49 @@ static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
> break;
> }
>
> - if (i915_gem_object_has_struct_page(obj)) {
> - struct sgt_iter iter;
> - struct page *page;
> - pte_t **ptes = mem;
> + if (n_pages > ARRAY_SIZE(stack)) {
> + /* Too big for stack -- allocate temporary array instead */
> + pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL);
> + if (!pages)
> + return NULL;
> + }
>
> - for_each_sgt_page(page, iter, sgt)
> - **ptes++ = mk_pte(page, pgprot);
> - } else {
> - resource_size_t iomap;
> - struct sgt_iter iter;
> - pte_t **ptes = mem;
> - dma_addr_t addr;
> + i = 0;
> + for_each_sgt_page(page, iter, obj->mm.pages)
> + pages[i++] = page;
> + vaddr = vmap(pages, n_pages, 0, pgprot);
> + if (pages != stack)
> + kvfree(pages);
> + return vaddr;
> +}
>
> - iomap = obj->mm.region->iomap.base;
> - iomap -= obj->mm.region->region.start;
> +static void *i915_gem_object_map_pfn(struct drm_i915_gem_object *obj,
> + enum i915_map_type type)
> +{
> + resource_size_t iomap = obj->mm.region->iomap.base -
> + obj->mm.region->region.start;
> + unsigned long n_pfn = obj->base.size >> PAGE_SHIFT;
> + unsigned long stack[32], *pfns = stack, i;
> + struct sgt_iter iter;
> + dma_addr_t addr;
> + void *vaddr;
> +
> + if (type != I915_MAP_WC)
> + return NULL;
>
> - for_each_sgt_daddr(addr, iter, sgt)
> - **ptes++ = iomap_pte(iomap, addr, pgprot);
> + if (n_pfn > ARRAY_SIZE(stack)) {
> + /* Too big for stack -- allocate temporary array instead */
> + pfns = kvmalloc_array(n_pfn, sizeof(*pfns), GFP_KERNEL);
> + if (!pfns)
> + return NULL;
> }
>
> - if (mem != stack)
> - kvfree(mem);
> -
> - return area->addr;
> + for_each_sgt_daddr(addr, iter, obj->mm.pages)
> + pfns[i++] = (iomap + addr) >> PAGE_SHIFT;
Missing the i = 0 fix from Dan?
> + vaddr = vmap_pfn(pfns, n_pfn, pgprot_writecombine(PAGE_KERNEL_IO));
> + if (pfns != stack)
> + kvfree(pfns);
> + return vaddr;
> }
>
> /* get, pin, and map the pages of the object into kernel space */
> @@ -383,7 +367,13 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
> }
>
> if (!ptr) {
> - ptr = i915_gem_object_map(obj, type);
> + if (GEM_WARN_ON(type == I915_MAP_WC &&
> + !static_cpu_has(X86_FEATURE_PAT)))
> + ptr = NULL;
> + else if (i915_gem_object_has_struct_page(obj))
> + ptr = i915_gem_object_map_page(obj, type);
> + else
> + ptr = i915_gem_object_map_pfn(obj, type);
> if (!ptr) {
> err = -ENOMEM;
> goto err_unpin;
> --
> 2.28.0
>
On Fri, Sep 25, 2020 at 03:08:59PM +0100, Matthew Auld wrote:
> > + i = 0;
> > + for_each_sgt_page(page, iter, obj->mm.pages)
> > + pages[i++] = page;
> > + vaddr = vmap(pages, n_pages, 0, pgprot);
> > + if (pages != stack)
> > + kvfree(pages);
> > + return vaddr;
> > +}
> > - return area->addr;
> > + for_each_sgt_daddr(addr, iter, obj->mm.pages)
> > + pfns[i++] = (iomap + addr) >> PAGE_SHIFT;
>
> Missing the i = 0 fix from Dan?
Yeah, looks like I only managed to apply the one in the page based
version above.
i915_gem_object_map implements fairly low-level vmap functionality in
a driver. Split it into two helpers, one for remapping kernel memory
which can use vmap, and one for I/O memory that uses vmap_pfn.
The only practical difference is that alloc_vm_area prefaults the
vmalloc area PTEs, which doesn't seem to be required here for the
kernel memory case (and could be added to vmap using a flag if actually
required).
Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/gpu/drm/i915/Kconfig | 1 +
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 127 ++++++++++------------
2 files changed, 60 insertions(+), 68 deletions(-)
diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 9afa5c4a6bf006..1e1cb245fca778 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -25,6 +25,7 @@ config DRM_I915
select CRC32
select SND_HDA_I915 if SND_HDA_CORE
select CEC_CORE if CEC_NOTIFIER
+ select VMAP_PFN
help
Choose this option if you have a system that has "Intel Graphics
Media Accelerator" or "HD Graphics" integrated graphics,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 6550c0bc824ea2..f60ca6dc911f29 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -232,34 +232,21 @@ int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
return err;
}
-static inline pte_t iomap_pte(resource_size_t base,
- dma_addr_t offset,
- pgprot_t prot)
-{
- return pte_mkspecial(pfn_pte((base + offset) >> PAGE_SHIFT, prot));
-}
-
/* The 'mapping' part of i915_gem_object_pin_map() below */
-static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
- enum i915_map_type type)
+static void *i915_gem_object_map_page(struct drm_i915_gem_object *obj,
+ enum i915_map_type type)
{
- unsigned long n_pte = obj->base.size >> PAGE_SHIFT;
- struct sg_table *sgt = obj->mm.pages;
- pte_t *stack[32], **mem;
- struct vm_struct *area;
+ unsigned long n_pages = obj->base.size >> PAGE_SHIFT, i;
+ struct page *stack[32], **pages = stack, *page;
+ struct sgt_iter iter;
pgprot_t pgprot;
+ void *vaddr;
- if (!i915_gem_object_has_struct_page(obj) && type != I915_MAP_WC)
- return NULL;
-
- if (GEM_WARN_ON(type == I915_MAP_WC &&
- !static_cpu_has(X86_FEATURE_PAT)))
- return NULL;
-
- /* A single page can always be kmapped */
- if (n_pte == 1 && type == I915_MAP_WB) {
- struct page *page = sg_page(sgt->sgl);
-
+ switch (type) {
+ default:
+ MISSING_CASE(type);
+ fallthrough; /* to use PAGE_KERNEL anyway */
+ case I915_MAP_WB:
/*
* On 32b, highmem using a finite set of indirect PTE (i.e.
* vmap) to provide virtual mappings of the high pages.
@@ -277,30 +264,8 @@ static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
* So if the page is beyond the 32b boundary, make an explicit
* vmap.
*/
- if (!PageHighMem(page))
- return page_address(page);
- }
-
- mem = stack;
- if (n_pte > ARRAY_SIZE(stack)) {
- /* Too big for stack -- allocate temporary array instead */
- mem = kvmalloc_array(n_pte, sizeof(*mem), GFP_KERNEL);
- if (!mem)
- return NULL;
- }
-
- area = alloc_vm_area(obj->base.size, mem);
- if (!area) {
- if (mem != stack)
- kvfree(mem);
- return NULL;
- }
-
- switch (type) {
- default:
- MISSING_CASE(type);
- fallthrough; /* to use PAGE_KERNEL anyway */
- case I915_MAP_WB:
+ if (n_pages == 1 && !PageHighMem(sg_page(obj->mm.pages->sgl)))
+ return page_address(sg_page(obj->mm.pages->sgl));
pgprot = PAGE_KERNEL;
break;
case I915_MAP_WC:
@@ -308,30 +273,50 @@ static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
break;
}
- if (i915_gem_object_has_struct_page(obj)) {
- struct sgt_iter iter;
- struct page *page;
- pte_t **ptes = mem;
+ if (n_pages > ARRAY_SIZE(stack)) {
+ /* Too big for stack -- allocate temporary array instead */
+ pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL);
+ if (!pages)
+ return NULL;
+ }
- for_each_sgt_page(page, iter, sgt)
- **ptes++ = mk_pte(page, pgprot);
- } else {
- resource_size_t iomap;
- struct sgt_iter iter;
- pte_t **ptes = mem;
- dma_addr_t addr;
+ i = 0;
+ for_each_sgt_page(page, iter, obj->mm.pages)
+ pages[i++] = page;
+ vaddr = vmap(pages, n_pages, 0, pgprot);
+ if (pages != stack)
+ kvfree(pages);
+ return vaddr;
+}
- iomap = obj->mm.region->iomap.base;
- iomap -= obj->mm.region->region.start;
+static void *i915_gem_object_map_pfn(struct drm_i915_gem_object *obj,
+ enum i915_map_type type)
+{
+ resource_size_t iomap = obj->mm.region->iomap.base -
+ obj->mm.region->region.start;
+ unsigned long n_pfn = obj->base.size >> PAGE_SHIFT;
+ unsigned long stack[32], *pfns = stack, i;
+ struct sgt_iter iter;
+ dma_addr_t addr;
+ void *vaddr;
+
+ if (type != I915_MAP_WC)
+ return NULL;
- for_each_sgt_daddr(addr, iter, sgt)
- **ptes++ = iomap_pte(iomap, addr, pgprot);
+ if (n_pfn > ARRAY_SIZE(stack)) {
+ /* Too big for stack -- allocate temporary array instead */
+ pfns = kvmalloc_array(n_pfn, sizeof(*pfns), GFP_KERNEL);
+ if (!pfns)
+ return NULL;
}
- if (mem != stack)
- kvfree(mem);
-
- return area->addr;
+ i = 0;
+ for_each_sgt_daddr(addr, iter, obj->mm.pages)
+ pfns[i++] = (iomap + addr) >> PAGE_SHIFT;
+ vaddr = vmap_pfn(pfns, n_pfn, pgprot_writecombine(PAGE_KERNEL_IO));
+ if (pfns != stack)
+ kvfree(pfns);
+ return vaddr;
}
/* get, pin, and map the pages of the object into kernel space */
@@ -383,7 +368,13 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
}
if (!ptr) {
- ptr = i915_gem_object_map(obj, type);
+ if (GEM_WARN_ON(type == I915_MAP_WC &&
+ !static_cpu_has(X86_FEATURE_PAT)))
+ ptr = NULL;
+ else if (i915_gem_object_has_struct_page(obj))
+ ptr = i915_gem_object_map_page(obj, type);
+ else
+ ptr = i915_gem_object_map_pfn(obj, type);
if (!ptr) {
err = -ENOMEM;
goto err_unpin;
--
2.28.0
On Thu, 24 Sep 2020 15:58:42 +0200 Christoph Hellwig <[email protected]> wrote:
> this series removes alloc_vm_area, which was left over from the big
> vmalloc interface rework. It is a rather arkane interface, basicaly
> the equivalent of get_vm_area + actually faulting in all PTEs in
> the allocated area. It was originally addeds for Xen (which isn't
> modular to start with), and then grew users in zsmalloc and i915
> which seems to mostly qualify as abuses of the interface, especially
> for i915 as a random driver should not set up PTE bits directly.
>
> Note that the i915 patches apply to the drm-tip branch of the drm-tip
> tree, as that tree has recent conflicting commits in the same area.
Is the drm-tip material in linux-next yet? I'm still seeing a non-trivial
reject in there at present.
On Fri, Sep 25, 2020 at 07:43:49PM -0700, Andrew Morton wrote:
> On Thu, 24 Sep 2020 15:58:42 +0200 Christoph Hellwig <[email protected]> wrote:
>
> > this series removes alloc_vm_area, which was left over from the big
> > vmalloc interface rework. It is a rather arkane interface, basicaly
> > the equivalent of get_vm_area + actually faulting in all PTEs in
> > the allocated area. It was originally addeds for Xen (which isn't
> > modular to start with), and then grew users in zsmalloc and i915
> > which seems to mostly qualify as abuses of the interface, especially
> > for i915 as a random driver should not set up PTE bits directly.
> >
> > Note that the i915 patches apply to the drm-tip branch of the drm-tip
> > tree, as that tree has recent conflicting commits in the same area.
>
> Is the drm-tip material in linux-next yet? I'm still seeing a non-trivial
> reject in there at present.
I assumed it was, but the rejects imply that they aren't. Tvrtko, do you
know the details?
+ Dave and Daniel
+ Stephen
Quoting Christoph Hellwig (2020-09-26 09:29:59)
> On Fri, Sep 25, 2020 at 07:43:49PM -0700, Andrew Morton wrote:
> > On Thu, 24 Sep 2020 15:58:42 +0200 Christoph Hellwig <[email protected]> wrote:
> >
> > > this series removes alloc_vm_area, which was left over from the big
> > > vmalloc interface rework. It is a rather arkane interface, basicaly
> > > the equivalent of get_vm_area + actually faulting in all PTEs in
> > > the allocated area. It was originally addeds for Xen (which isn't
> > > modular to start with), and then grew users in zsmalloc and i915
> > > which seems to mostly qualify as abuses of the interface, especially
> > > for i915 as a random driver should not set up PTE bits directly.
> > >
> > > Note that the i915 patches apply to the drm-tip branch of the drm-tip
> > > tree, as that tree has recent conflicting commits in the same area.
> >
> > Is the drm-tip material in linux-next yet? I'm still seeing a non-trivial
> > reject in there at present.
>
> I assumed it was, but the reject imply that they aren't. Tvrtko, do you
> know the details?
I think we have a gap: after splitting the drm-intel-next pull requests into
two, the drm-intel/for-linux-next branch is now missing material from
drm-intel/drm-intel-gt-next.
I think a simple course of action might be to start including drm-intel-gt-next
in linux-next, which would mean that we should update DIM tooling to add
extra branch "drm-intel/gt-for-linux-next" or so.
Which specific patches are missing in this case?
Regards, Joonas
On Mon, Sep 28, 2020 at 01:13:38PM +0300, Joonas Lahtinen wrote:
> I think we have a gap that after splitting the drm-intel-next pull requests into
> two the drm-intel/for-linux-next branch is now missing material from
> drm-intel/drm-intel-gt-next.
>
> I think a simple course of action might be to start including drm-intel-gt-next
> in linux-next, which would mean that we should update DIM tooling to add
> extra branch "drm-intel/gt-for-linux-next" or so.
>
> Which specific patches are missing in this case?
The two dependencies required by my series not in mainline are:
drm/i915/gem: Avoid implicit vmap for highmem on x86-32
drm/i915/gem: Prevent using pgprot_writecombine() if PAT is not supported
so it has to be one or both of those.
Quoting Christoph Hellwig (2020-09-28 15:37:41)
> On Mon, Sep 28, 2020 at 01:13:38PM +0300, Joonas Lahtinen wrote:
> > I think we have a gap that after splitting the drm-intel-next pull requests into
> > two the drm-intel/for-linux-next branch is now missing material from
> > drm-intel/drm-intel-gt-next.
> >
> > I think a simple course of action might be to start including drm-intel-gt-next
> > in linux-next, which would mean that we should update DIM tooling to add
> > extra branch "drm-intel/gt-for-linux-next" or so.
> >
> > Which specific patches are missing in this case?
>
> The two dependencies required by my series not in mainline are:
>
> drm/i915/gem: Avoid implicit vmap for highmem on x86-32
> drm/i915/gem: Prevent using pgprot_writecombine() if PAT is not supported
>
> so it has to be one or both of those.
Hmm, those are both committed after our last -next pull request, so they
would normally only target the next merge window. drm-next closes the merge
window around -rc5 already.
But, in this specific case those are both Fixes: patches with Cc: stable,
so they should be pulled into drm-intel-next-fixes PR.
Rodrigo, can you cherry-pick those patches to -next-fixes that you send
to Dave?
Regards, Joonas
On Tue, Sep 29, 2020 at 03:43:30PM +0300, Joonas Lahtinen wrote:
> Hmm, those are both committed after our last -next pull request, so they
> would normally only target next merge window. drm-next closes the merge
> window around -rc5 already.
>
> But, in this specific case those are both Fixes: patches with Cc: stable,
> so they should be pulled into drm-intel-next-fixes PR.
>
> Rodrigo, can you cherry-pick those patches to -next-fixes that you send
> to Dave?
They still haven't made it to linux-next. I think for now I'll just
rebase without them again and then you can handle the conflicts for
5.11.
On Wed, Sep 30, 2020 at 4:48 PM Christoph Hellwig <[email protected]> wrote:
>
> On Tue, Sep 29, 2020 at 03:43:30PM +0300, Joonas Lahtinen wrote:
> > Hmm, those are both committed after our last -next pull request, so they
> > would normally only target next merge window. drm-next closes the merge
> > window around -rc5 already.
> >
> > But, in this specific case those are both Fixes: patches with Cc: stable,
> > so they should be pulled into drm-intel-next-fixes PR.
> >
> > Rodrigo, can you cherry-pick those patches to -next-fixes that you send
> > to Dave?
>
> They still haven't made it to linux-next. I think for now I'll just
> rebase without them again and then you can handle the conflicts for
> 5.11.
Yeah after -rc6 drm is frozen for features, so anything that's stuck
in subordinate trees rolls over to the next merge cycle. To avoid
upsetting sfr from linux-next we keep those -next branches out of
linux-next until after -rc1 again. iow, rebasing onto linux-next and
smashing this into 5.10 sounds like the right approach (since everyone
else freezes a bunch later afaik).
Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch