Hi Andrew,
This series removes alloc_vm_area, which was left over from the big
vmalloc interface rework. It is a rather arcane interface, basically
the equivalent of get_vm_area + actually faulting in all PTEs in
the allocated area. It was originally added for Xen (which isn't
modular to start with), and then grew users in zsmalloc and i915,
which mostly seem to qualify as abuses of the interface, especially
in i915, as a random driver should not set up PTE bits directly.
Note that my laptop doesn't seem to actually exercise the new vmap_pfn
path, so careful review from the i915 maintainers is very welcome.
Also I wonder why zsmalloc is even doing the manual allocation of kernel
virtual address space plus mapping into it. IMHO zsmalloc should be
using our normal vm_map_ram / vm_unmap_ram interface instead of being so
special, which would also allow building it as a module again for the
virtual mapping case.
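For illustration, a minimal sketch of what that could look like for the
two pages backing a zspage (the helper names are hypothetical, not
existing zsmalloc code; this assumes the current vm_map_ram signature
without a prot argument):

#include <linux/mm.h>
#include <linux/vmalloc.h>

/* map the two pages of a zspage into contiguous kernel address space */
static void *zs_map_zspage_pages(struct page *pages[2])
{
	return vm_map_ram(pages, 2, NUMA_NO_NODE);
}

static void zs_unmap_zspage_pages(void *vaddr)
{
	/* the count must match what was passed to vm_map_ram */
	vm_unmap_ram(vaddr, 2);
}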
Diffstat:
arch/x86/xen/grant-table.c | 27 +++++---
drivers/gpu/drm/i915/Kconfig | 1
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 101 +++++++++++++-----------------
drivers/gpu/drm/i915/gt/shmem_utils.c | 90 +++++++++++---------------
drivers/xen/xenbus/xenbus_client.c | 30 ++++----
include/linux/vmalloc.h | 6 -
mm/Kconfig | 3
mm/nommu.c | 7 --
mm/vmalloc.c | 93 +++++++++++++--------------
mm/zsmalloc.c | 2
10 files changed, 172 insertions(+), 188 deletions(-)
Add a proper helper to remap PFNs into kernel virtual space so that
drivers don't have to abuse alloc_vm_area and open-code PTE
manipulation for it.
Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/vmalloc.h | 1 +
mm/Kconfig | 3 +++
mm/vmalloc.c | 45 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 49 insertions(+)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 0221f852a7e1a3..8ecd92a947ee0c 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -121,6 +121,7 @@ extern void vfree_atomic(const void *addr);
extern void *vmap(struct page **pages, unsigned int count,
unsigned long flags, pgprot_t prot);
+void *vmap_pfn(unsigned long *pfns, unsigned int count, pgprot_t prot);
extern void vunmap(const void *addr);
extern int remap_vmalloc_range_partial(struct vm_area_struct *vma,
diff --git a/mm/Kconfig b/mm/Kconfig
index 6c974888f86f97..6fa7ba1199eb1e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -815,6 +815,9 @@ config DEVICE_PRIVATE
memory; i.e., memory that is only accessible from the device (or
group of devices). You likely also want to select HMM_MIRROR.
+config VMAP_PFN
+ bool
+
config FRAME_VECTOR
bool
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index be4724b916b3e7..59f2afcf26c312 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2407,6 +2407,51 @@ void *vmap(struct page **pages, unsigned int count,
}
EXPORT_SYMBOL(vmap);
+#ifdef CONFIG_VMAP_PFN
+struct vmap_pfn_data {
+ unsigned long *pfns;
+ pgprot_t prot;
+ unsigned int idx;
+};
+
+static int vmap_pfn_apply(pte_t *pte, unsigned long addr, void *private)
+{
+ struct vmap_pfn_data *data = private;
+
+ if (WARN_ON_ONCE(pfn_valid(data->pfns[data->idx])))
+ return -EINVAL;
+ *pte = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot));
+ return 0;
+}
+
+/**
+ * vmap_pfn - map an array of PFNs into virtually contiguous space
+ * @pfns: array of PFNs
+ * @count: number of pages to map
+ * @prot: page protection for the mapping
+ *
+ * Maps @count PFNs from @pfns into contiguous kernel virtual space and returns
+ * the start address of the mapping.
+ */
+void *vmap_pfn(unsigned long *pfns, unsigned int count, pgprot_t prot)
+{
+ struct vmap_pfn_data data = { .pfns = pfns, .prot = pgprot_nx(prot) };
+ struct vm_struct *area;
+
+ area = get_vm_area_caller(count * PAGE_SIZE, VM_IOREMAP,
+ __builtin_return_address(0));
+ if (!area)
+ return NULL;
+ if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
+ count * PAGE_SIZE, vmap_pfn_apply, &data)) {
+ free_vm_area(area);
+ return NULL;
+ }
+ return area->addr;
+}
+EXPORT_SYMBOL_GPL(vmap_pfn);
+#endif /* CONFIG_VMAP_PFN */
+
static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
pgprot_t prot, int node)
{
--
2.28.0
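As a usage illustration (not part of the patch; a hedged sketch in
which the PFN array and count are hypothetical driver state), a caller
could map I/O memory write-combined like this:

#include <linux/vmalloc.h>

/*
 * Map nr I/O-memory PFNs into kernel virtual space. Note that
 * vmap_pfn() warns and fails on PFNs that have a struct page
 * (pfn_valid()), so this is only for real I/O memory.
 */
static void *map_io_pfns_wc(unsigned long *pfns, unsigned int nr)
{
	return vmap_pfn(pfns, nr, pgprot_writecombine(PAGE_KERNEL));
}

/* teardown is the regular vunmap() on the returned address */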
i915_gem_object_map implements fairly low-level vmap functionality in
a driver. Split it into two helpers, one for remapping kernel memory
which can use vmap, and one for I/O memory that uses vmap_pfn.
The only practical difference is that alloc_vm_area prefaults the
vmalloc area PTEs, which doesn't seem to be required here for the
kernel memory case (and could be added to vmap using a flag if actually
required).
Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/gpu/drm/i915/Kconfig | 1 +
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 101 ++++++++++------------
2 files changed, 47 insertions(+), 55 deletions(-)
diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 9afa5c4a6bf006..1e1cb245fca778 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -25,6 +25,7 @@ config DRM_I915
select CRC32
select SND_HDA_I915 if SND_HDA_CORE
select CEC_CORE if CEC_NOTIFIER
+ select VMAP_PFN
help
Choose this option if you have a system that has "Intel Graphics
Media Accelerator" or "HD Graphics" integrated graphics,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index e8a083743e0927..90029ea83aede9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -234,50 +234,24 @@ int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
return err;
}
-static inline pte_t iomap_pte(resource_size_t base,
- dma_addr_t offset,
- pgprot_t prot)
-{
- return pte_mkspecial(pfn_pte((base + offset) >> PAGE_SHIFT, prot));
-}
-
/* The 'mapping' part of i915_gem_object_pin_map() below */
-static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
+static void *i915_gem_object_map_page(struct drm_i915_gem_object *obj,
enum i915_map_type type)
{
- unsigned long n_pte = obj->base.size >> PAGE_SHIFT;
- struct sg_table *sgt = obj->mm.pages;
- pte_t *stack[32], **mem;
- struct vm_struct *area;
+ unsigned long n_pages = obj->base.size >> PAGE_SHIFT, i = 0;
+ struct page *stack[32], **pages = stack, *page;
+ struct sgt_iter iter;
pgprot_t pgprot;
-
- if (!i915_gem_object_has_struct_page(obj) && type != I915_MAP_WC)
- return NULL;
-
- /* A single page can always be kmapped */
- if (n_pte == 1 && type == I915_MAP_WB)
- return kmap(sg_page(sgt->sgl));
-
- mem = stack;
- if (n_pte > ARRAY_SIZE(stack)) {
- /* Too big for stack -- allocate temporary array instead */
- mem = kvmalloc_array(n_pte, sizeof(*mem), GFP_KERNEL);
- if (!mem)
- return NULL;
- }
-
- area = alloc_vm_area(obj->base.size, mem);
- if (!area) {
- if (mem != stack)
- kvfree(mem);
- return NULL;
- }
+ void *vaddr;
switch (type) {
default:
MISSING_CASE(type);
fallthrough; /* to use PAGE_KERNEL anyway */
case I915_MAP_WB:
+ /* A single page can always be kmapped */
+ if (n_pages == 1)
+ return kmap(sg_page(obj->mm.pages->sgl));
pgprot = PAGE_KERNEL;
break;
case I915_MAP_WC:
@@ -285,30 +259,44 @@ static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
break;
}
- if (i915_gem_object_has_struct_page(obj)) {
- struct sgt_iter iter;
- struct page *page;
- pte_t **ptes = mem;
-
- for_each_sgt_page(page, iter, sgt)
- **ptes++ = mk_pte(page, pgprot);
- } else {
- resource_size_t iomap;
- struct sgt_iter iter;
- pte_t **ptes = mem;
- dma_addr_t addr;
+ if (n_pages > ARRAY_SIZE(stack)) {
+ /* Too big for stack -- allocate temporary array instead */
+ pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL);
+ if (!pages)
+ return NULL;
+ }
- iomap = obj->mm.region->iomap.base;
- iomap -= obj->mm.region->region.start;
+ for_each_sgt_page(page, iter, obj->mm.pages)
+ pages[i++] = page;
+ vaddr = vmap(pages, n_pages, 0, pgprot);
+ if (pages != stack)
+ kvfree(pages);
+ return vaddr;
+}
- for_each_sgt_daddr(addr, iter, sgt)
- **ptes++ = iomap_pte(iomap, addr, pgprot);
+static void *i915_gem_object_map_pfn(struct drm_i915_gem_object *obj)
+{
+ resource_size_t iomap = obj->mm.region->iomap.base -
+ obj->mm.region->region.start;
+ unsigned long n_pfn = obj->base.size >> PAGE_SHIFT;
+ unsigned long stack[32], *pfns = stack, i = 0;
+ struct sgt_iter iter;
+ dma_addr_t addr;
+ void *vaddr;
+
+ if (n_pfn > ARRAY_SIZE(stack)) {
+ /* Too big for stack -- allocate temporary array instead */
+ pfns = kvmalloc_array(n_pfn, sizeof(*pfns), GFP_KERNEL);
+ if (!pfns)
+ return NULL;
}
- if (mem != stack)
- kvfree(mem);
-
- return area->addr;
+ for_each_sgt_daddr(addr, iter, obj->mm.pages)
+ pfns[i++] = (iomap + addr) >> PAGE_SHIFT;
+ vaddr = vmap_pfn(pfns, n_pfn, pgprot_writecombine(PAGE_KERNEL_IO));
+ if (pfns != stack)
+ kvfree(pfns);
+ return vaddr;
}
/* get, pin, and map the pages of the object into kernel space */
@@ -360,7 +348,10 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
}
if (!ptr) {
- ptr = i915_gem_object_map(obj, type);
+ if (i915_gem_object_has_struct_page(obj))
+ ptr = i915_gem_object_map_page(obj, type);
+ else if (type == I915_MAP_WC)
+ ptr = i915_gem_object_map_pfn(obj);
if (!ptr) {
err = -ENOMEM;
goto err_unpin;
--
2.28.0
shmem_pin_map somewhat awkwardly reimplements vmap using
alloc_vm_area and manual pte setup. The only practical difference
is that alloc_vm_area prefaults the vmalloc area PTEs, which doesn't
seem to be required here (and could be added to vmap using a flag
if actually required).
Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/gpu/drm/i915/gt/shmem_utils.c | 90 +++++++++++----------------
1 file changed, 38 insertions(+), 52 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c
index 43c7acbdc79dea..77410091597f19 100644
--- a/drivers/gpu/drm/i915/gt/shmem_utils.c
+++ b/drivers/gpu/drm/i915/gt/shmem_utils.c
@@ -49,80 +49,66 @@ struct file *shmem_create_from_object(struct drm_i915_gem_object *obj)
return file;
}
-static size_t shmem_npte(struct file *file)
+static size_t shmem_npages(struct file *file)
{
return file->f_mapping->host->i_size >> PAGE_SHIFT;
}
-static void __shmem_unpin_map(struct file *file, void *ptr, size_t n_pte)
-{
- unsigned long pfn;
-
- vunmap(ptr);
-
- for (pfn = 0; pfn < n_pte; pfn++) {
- struct page *page;
-
- page = shmem_read_mapping_page_gfp(file->f_mapping, pfn,
- GFP_KERNEL);
- if (!WARN_ON(IS_ERR(page))) {
- put_page(page);
- put_page(page);
- }
- }
-}
-
void *shmem_pin_map(struct file *file)
{
- const size_t n_pte = shmem_npte(file);
- pte_t *stack[32], **ptes, **mem;
- struct vm_struct *area;
- unsigned long pfn;
-
- mem = stack;
- if (n_pte > ARRAY_SIZE(stack)) {
- mem = kvmalloc_array(n_pte, sizeof(*mem), GFP_KERNEL);
- if (!mem)
+ const size_t n_pages = shmem_npages(file);
+ struct page **pages, *stack[32];
+ void *vaddr;
+ long i;
+
+ pages = stack;
+ if (n_pages > ARRAY_SIZE(stack)) {
+ pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL);
+ if (!pages)
return NULL;
}
- area = alloc_vm_area(n_pte << PAGE_SHIFT, mem);
- if (!area) {
- if (mem != stack)
- kvfree(mem);
- return NULL;
- }
-
- ptes = mem;
- for (pfn = 0; pfn < n_pte; pfn++) {
- struct page *page;
-
- page = shmem_read_mapping_page_gfp(file->f_mapping, pfn,
- GFP_KERNEL);
- if (IS_ERR(page))
+ for (i = 0; i < n_pages; i++) {
+ pages[i] = shmem_read_mapping_page_gfp(file->f_mapping, i,
+ GFP_KERNEL);
+ if (IS_ERR(pages[i]))
goto err_page;
-
- **ptes++ = mk_pte(page, PAGE_KERNEL);
}
- if (mem != stack)
- kvfree(mem);
+ vaddr = vmap(pages, n_pages, 0, PAGE_KERNEL);
+ if (!vaddr)
+ goto err_page;
+ if (pages != stack)
+ kvfree(pages);
mapping_set_unevictable(file->f_mapping);
- return area->addr;
+ return vaddr;
err_page:
- if (mem != stack)
- kvfree(mem);
-
- __shmem_unpin_map(file, area->addr, pfn);
+ while (--i >= 0)
+ put_page(pages[i]);
+ if (pages != stack)
+ kvfree(pages);
return NULL;
}
void shmem_unpin_map(struct file *file, void *ptr)
{
+ long i = shmem_npages(file);
+
mapping_clear_unevictable(file->f_mapping);
- __shmem_unpin_map(file, ptr, shmem_npte(file));
+ vunmap(ptr);
+
+ for (i = 0; i < shmem_npages(file); i++) {
+ struct page *page;
+
+ page = shmem_read_mapping_page_gfp(file->f_mapping, i,
+ GFP_KERNEL);
+ if (!WARN_ON(IS_ERR(page))) {
+ put_page(page);
+ put_page(page);
+ }
+ }
}
static int __shmem_rw(struct file *file, loff_t off,
--
2.28.0
Replacing alloc_vm_area with get_vm_area + apply_to_page_range
allows filling in the phys_addrs values directly instead of doing
another loop over all addresses.
Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/xen/xenbus/xenbus_client.c | 30 ++++++++++++++++--------------
1 file changed, 16 insertions(+), 14 deletions(-)
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 2690318ad50f48..fd80e318b99cc7 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -73,16 +73,13 @@ struct map_ring_valloc {
struct xenbus_map_node *node;
/* Why do we need two arrays? See comment of __xenbus_map_ring */
- union {
- unsigned long addrs[XENBUS_MAX_RING_GRANTS];
- pte_t *ptes[XENBUS_MAX_RING_GRANTS];
- };
+ unsigned long addrs[XENBUS_MAX_RING_GRANTS];
phys_addr_t phys_addrs[XENBUS_MAX_RING_GRANTS];
struct gnttab_map_grant_ref map[XENBUS_MAX_RING_GRANTS];
struct gnttab_unmap_grant_ref unmap[XENBUS_MAX_RING_GRANTS];
- unsigned int idx; /* HVM only. */
+ unsigned int idx;
};
static DEFINE_SPINLOCK(xenbus_valloc_lock);
@@ -686,6 +683,14 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
EXPORT_SYMBOL_GPL(xenbus_unmap_ring_vfree);
#ifdef CONFIG_XEN_PV
+static int map_ring_apply(pte_t *pte, unsigned long addr, void *data)
+{
+ struct map_ring_valloc *info = data;
+
+ info->phys_addrs[info->idx++] = arbitrary_virt_to_machine(pte).maddr;
+ return 0;
+}
+
static int xenbus_map_ring_pv(struct xenbus_device *dev,
struct map_ring_valloc *info,
grant_ref_t *gnt_refs,
@@ -694,18 +699,15 @@ static int xenbus_map_ring_pv(struct xenbus_device *dev,
{
struct xenbus_map_node *node = info->node;
struct vm_struct *area;
- int err = GNTST_okay;
- int i;
- bool leaked;
+ bool leaked = false;
+ int err = -ENOMEM;
- area = alloc_vm_area(XEN_PAGE_SIZE * nr_grefs, info->ptes);
+ area = get_vm_area(XEN_PAGE_SIZE * nr_grefs, VM_IOREMAP);
if (!area)
return -ENOMEM;
-
- for (i = 0; i < nr_grefs; i++)
- info->phys_addrs[i] =
- arbitrary_virt_to_machine(info->ptes[i]).maddr;
-
+ if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
+ XEN_PAGE_SIZE * nr_grefs, map_ring_apply, info))
+ goto failed;
err = __xenbus_map_ring(dev, gnt_refs, nr_grefs, node->handles,
info, GNTMAP_host_map | GNTMAP_contains_pte,
&leaked);
--
2.28.0
Open code alloc_vm_area in the last remaining caller.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/x86/xen/grant-table.c | 27 +++++++++++++++------
include/linux/vmalloc.h | 5 +---
mm/nommu.c | 7 ------
mm/vmalloc.c | 48 --------------------------------------
4 files changed, 21 insertions(+), 66 deletions(-)
diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
index 4988e19598c8a5..ccb377c07c651f 100644
--- a/arch/x86/xen/grant-table.c
+++ b/arch/x86/xen/grant-table.c
@@ -90,19 +90,32 @@ void arch_gnttab_unmap(void *shared, unsigned long nr_gframes)
}
}
+static int gnttab_apply(pte_t *pte, unsigned long addr, void *data)
+{
+ pte_t ***p = data;
+
+ **p = pte;
+ (*p)++;
+ return 0;
+}
+
static int arch_gnttab_valloc(struct gnttab_vm_area *area, unsigned nr_frames)
{
area->ptes = kmalloc_array(nr_frames, sizeof(*area->ptes), GFP_KERNEL);
if (area->ptes == NULL)
return -ENOMEM;
-
- area->area = alloc_vm_area(PAGE_SIZE * nr_frames, area->ptes);
- if (area->area == NULL) {
- kfree(area->ptes);
- return -ENOMEM;
- }
-
+ area->area = get_vm_area(PAGE_SIZE * nr_frames, VM_IOREMAP);
+ if (!area->area)
+ goto out_free_ptes;
+ if (apply_to_page_range(&init_mm, (unsigned long)area->area->addr,
+ PAGE_SIZE * nr_frames, gnttab_apply, &area->ptes))
+ goto out_free_vm_area;
return 0;
+out_free_vm_area:
+ free_vm_area(area->area);
+out_free_ptes:
+ kfree(area->ptes);
+ return -ENOMEM;
}
static void arch_gnttab_vfree(struct gnttab_vm_area *area)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 8ecd92a947ee0c..a1a4e2f8163504 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -168,6 +168,7 @@ extern struct vm_struct *__get_vm_area_caller(unsigned long size,
unsigned long flags,
unsigned long start, unsigned long end,
const void *caller);
+void free_vm_area(struct vm_struct *area);
extern struct vm_struct *remove_vm_area(const void *addr);
extern struct vm_struct *find_vm_area(const void *addr);
@@ -203,10 +204,6 @@ static inline void set_vm_flush_reset_perms(void *addr)
}
#endif
-/* Allocate/destroy a 'vmalloc' VM area. */
-extern struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes);
-extern void free_vm_area(struct vm_struct *area);
-
/* for /dev/kmem */
extern long vread(char *buf, char *addr, unsigned long count);
extern long vwrite(char *buf, char *addr, unsigned long count);
diff --git a/mm/nommu.c b/mm/nommu.c
index 75a327149af127..9272f30e4c4726 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -354,13 +354,6 @@ void vm_unmap_aliases(void)
}
EXPORT_SYMBOL_GPL(vm_unmap_aliases);
-struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
-{
- BUG();
- return NULL;
-}
-EXPORT_SYMBOL_GPL(alloc_vm_area);
-
void free_vm_area(struct vm_struct *area)
{
BUG();
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 59f2afcf26c312..9f29147deca580 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3077,54 +3077,6 @@ int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
}
EXPORT_SYMBOL(remap_vmalloc_range);
-static int f(pte_t *pte, unsigned long addr, void *data)
-{
- pte_t ***p = data;
-
- if (p) {
- *(*p) = pte;
- (*p)++;
- }
- return 0;
-}
-
-/**
- * alloc_vm_area - allocate a range of kernel address space
- * @size: size of the area
- * @ptes: returns the PTEs for the address space
- *
- * Returns: NULL on failure, vm_struct on success
- *
- * This function reserves a range of kernel address space, and
- * allocates pagetables to map that range. No actual mappings
- * are created.
- *
- * If @ptes is non-NULL, pointers to the PTEs (in init_mm)
- * allocated for the VM area are returned.
- */
-struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
-{
- struct vm_struct *area;
-
- area = get_vm_area_caller(size, VM_IOREMAP,
- __builtin_return_address(0));
- if (area == NULL)
- return NULL;
-
- /*
- * This ensures that page tables are constructed for this region
- * of kernel virtual address space and mapped into init_mm.
- */
- if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
- size, f, ptes ? &ptes : NULL)) {
- free_vm_area(area);
- return NULL;
- }
-
- return area;
-}
-EXPORT_SYMBOL_GPL(alloc_vm_area);
-
void free_vm_area(struct vm_struct *area)
{
struct vm_struct *ret;
--
2.28.0
On Fri, Sep 18, 2020 at 06:37:21PM +0200, Christoph Hellwig wrote:
> void shmem_unpin_map(struct file *file, void *ptr)
> {
> + long i = shmem_npages(file);
> +
> mapping_clear_unevictable(file->f_mapping);
> - __shmem_unpin_map(file, ptr, shmem_npte(file));
> + vunmap(ptr);
> +
> + for (i = 0; i < shmem_npages(file); i++) {
> + struct page *page;
> +
> + page = shmem_read_mapping_page_gfp(file->f_mapping, i,
> + GFP_KERNEL);
> + if (!WARN_ON(IS_ERR(page))) {
> + put_page(page);
> + put_page(page);
> + }
> + }
> }
This is awkward. I'd like it if we had a vfree() variant which called
put_page() instead of __free_pages(). I'd like it even more if we
used release_pages() instead of our own loop that called put_page().
Perhaps something like this ...
+++ b/mm/vmalloc.c
@@ -2262,7 +2262,7 @@ static void __vunmap(const void *addr, int deallocate_pages)
vm_remove_mappings(area, deallocate_pages);
- if (deallocate_pages) {
+ if (deallocate_pages == 1) {
int i;
for (i = 0; i < area->nr_pages; i++) {
@@ -2271,8 +2271,12 @@ static void __vunmap(const void *addr, int deallocate_pages)
BUG_ON(!page);
__free_pages(page, 0);
}
- atomic_long_sub(area->nr_pages, &nr_vmalloc_pages);
+ } else if (deallocate_pages == 2) {
+ release_pages(area->pages, area->nr_pages);
+ }
+ if (deallocate_pages) {
+ atomic_long_sub(area->nr_pages, &nr_vmalloc_pages);
kvfree(area->pages);
}
@@ -2369,6 +2373,14 @@ void vunmap(const void *addr)
}
EXPORT_SYMBOL(vunmap);
+void vunmap_put_pages(const void *addr)
+{
+ BUG_ON(in_interrupt());
+ might_sleep();
+ if (addr)
+ __vunmap(addr, 2);
+}
+
/**
* vmap - map an array of pages into virtually contiguous space
* @pages: array of page pointers
It obviously still needs kernel-doc and so on. I bet somebody has a better idea for a name.
On 9/18/20 12:37 PM, Christoph Hellwig wrote:
>
> +static int gnttab_apply(pte_t *pte, unsigned long addr, void *data)
> +{
> + pte_t ***p = data;
> +
> + **p = pte;
> + (*p)++;
> + return 0;
> +}
> +
> static int arch_gnttab_valloc(struct gnttab_vm_area *area, unsigned nr_frames)
> {
> area->ptes = kmalloc_array(nr_frames, sizeof(*area->ptes), GFP_KERNEL);
> if (area->ptes == NULL)
> return -ENOMEM;
> -
> - area->area = alloc_vm_area(PAGE_SIZE * nr_frames, area->ptes);
> - if (area->area == NULL) {
> - kfree(area->ptes);
> - return -ENOMEM;
> - }
> -
> + area->area = get_vm_area(PAGE_SIZE * nr_frames, VM_IOREMAP);
> + if (!area->area)
> + goto out_free_ptes;
> + if (apply_to_page_range(&init_mm, (unsigned long)area->area->addr,
> + PAGE_SIZE * nr_frames, gnttab_apply, &area->ptes))
This will end up incrementing area->ptes pointer. So perhaps something like
pte_t **ptes = area->ptes;
if (apply_to_page_range(&init_mm, (unsigned long)area->area->addr,
PAGE_SIZE * nr_frames, gnttab_apply, &ptes)) {
...
}
-boris
On Mon, Sep 21, 2020 at 08:11:57PM +0100, Matthew Wilcox wrote:
> This is awkward. I'd like it if we had a vfree() variant which called
> put_page() instead of __free_pages(). I'd like it even more if we
> used release_pages() instead of our own loop that called put_page().
Note that we don't need a new vfree variant, we can do this manually if
we want, take a look at kernel/dma/remap.c. But I thought this code
intentionally doesn't want to do that to avoid locking in the memory
for the pages array. Maybe the i915 maintainers can clarify.
On 22/09/2020 07:22, Christoph Hellwig wrote:
> On Mon, Sep 21, 2020 at 08:11:57PM +0100, Matthew Wilcox wrote:
>> This is awkward. I'd like it if we had a vfree() variant which called
>> put_page() instead of __free_pages(). I'd like it even more if we
>> used release_pages() instead of our own loop that called put_page().
>
> Note that we don't need a new vfree variant, we can do this manually if
> we want, take a look at kernel/dma/remap.c. But I thought this code
> intentionally doesn't want to do that to avoid locking in the memory
> for the pages array. Maybe the i915 maintainers can clarify.
+ Chris & Matt who were involved with this part of i915.
If I understood this sub-thread correctly, iterating over and freeing
the pages via the vmapped PTEs, so that there is no need for a
shmem_read_mapping_page_gfp loop in shmem_unpin_map, looks plausible to me.
I did not get the reference to kernel/dma/remap.c though, and I am also
not sure how to do the error unwind path in shmem_pin_map, at which
point the allocated vm area hasn't been fully populated yet. Hand-roll
a loop walking the vm area struct in there?
Regards,
Tvrtko
On Tue, Sep 22, 2020 at 08:22:49AM +0200, Christoph Hellwig wrote:
> On Mon, Sep 21, 2020 at 08:11:57PM +0100, Matthew Wilcox wrote:
> > This is awkward. I'd like it if we had a vfree() variant which called
> > put_page() instead of __free_pages(). I'd like it even more if we
> > used release_pages() instead of our own loop that called put_page().
>
> Note that we don't need a new vfree variant, we can do this manually if
> we want, take a look at kernel/dma/remap.c. But I thought this code
> intentionally doesn't want to do that to avoid locking in the memory
> for the pages array. Maybe the i915 maintainers can clarify.
Actually, vfree() will work today; I cc'd you on a documentation update
to make it clear that this is permitted.
From my current experience with the i915 shmem code, I think that the
i915 maintainers are experts at graphics, and are unfamiliar with the MM.
There are a number of places where they do things the hard way.
On Tue, Sep 22, 2020 at 09:23:59AM +0100, Tvrtko Ursulin wrote:
> If I understood this sub-thread correctly, iterating over and freeing
> the pages via the vmapped PTEs, so that there is no need for a
> shmem_read_mapping_page_gfp loop in shmem_unpin_map, looks plausible to me.
>
> I did not get the reference to kernel/dma/remap.c though,
What I mean is the code in dma_common_find_pages, which returns the
page array for freeing.
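Roughly (a paraphrased sketch, not a verbatim copy of
kernel/dma/remap.c), that helper just looks up the vm_struct and hands
back the pages array that was stashed at vmap time:

struct page **dma_common_find_pages(void *cpu_addr)
{
	struct vm_struct *area = find_vm_area(cpu_addr);

	if (!area || area->flags != VM_DMA_COHERENT)
		return NULL;
	return area->pages;
}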
>
> and I am also not sure
> how to do the error unwind path in shmem_pin_map, at which point the
> allocated vm area hasn't been fully populated yet. Hand-roll a loop
> walking the vm area struct in there?
Yes. What I originally did (re-created as I didn't save it) would be
something like this:
---
From 5605e77cda246df6dd7ded99ec22cb3f341ef5d5 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <[email protected]>
Date: Wed, 16 Sep 2020 13:54:04 +0200
Subject: drm/i915: use vmap in shmem_pin_map
shmem_pin_map somewhat awkwardly reimplements vmap using
alloc_vm_area and manual pte setup. The only practical difference
is that alloc_vm_area prefaults the vmalloc area PTEs, which doesn't
seem to be required here (and could be added to vmap using a flag
if actually required).
Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/gpu/drm/i915/gt/shmem_utils.c | 81 +++++++++------------------
1 file changed, 27 insertions(+), 54 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c
index 43c7acbdc79dea..7ec6ba4c1065b2 100644
--- a/drivers/gpu/drm/i915/gt/shmem_utils.c
+++ b/drivers/gpu/drm/i915/gt/shmem_utils.c
@@ -49,80 +49,53 @@ struct file *shmem_create_from_object(struct drm_i915_gem_object *obj)
return file;
}
-static size_t shmem_npte(struct file *file)
+static size_t shmem_npages(struct file *file)
{
return file->f_mapping->host->i_size >> PAGE_SHIFT;
}
-static void __shmem_unpin_map(struct file *file, void *ptr, size_t n_pte)
-{
- unsigned long pfn;
-
- vunmap(ptr);
-
- for (pfn = 0; pfn < n_pte; pfn++) {
- struct page *page;
-
- page = shmem_read_mapping_page_gfp(file->f_mapping, pfn,
- GFP_KERNEL);
- if (!WARN_ON(IS_ERR(page))) {
- put_page(page);
- put_page(page);
- }
- }
-}
-
void *shmem_pin_map(struct file *file)
{
- const size_t n_pte = shmem_npte(file);
- pte_t *stack[32], **ptes, **mem;
- struct vm_struct *area;
- unsigned long pfn;
-
- mem = stack;
- if (n_pte > ARRAY_SIZE(stack)) {
- mem = kvmalloc_array(n_pte, sizeof(*mem), GFP_KERNEL);
- if (!mem)
- return NULL;
- }
+ size_t n_pages = shmem_npages(file);
+ long i;
+ struct page **pages;
+ void *vaddr;
- area = alloc_vm_area(n_pte << PAGE_SHIFT, mem);
- if (!area) {
- if (mem != stack)
- kvfree(mem);
+ pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL);
+ if (!pages)
return NULL;
- }
-
- ptes = mem;
- for (pfn = 0; pfn < n_pte; pfn++) {
- struct page *page;
- page = shmem_read_mapping_page_gfp(file->f_mapping, pfn,
- GFP_KERNEL);
- if (IS_ERR(page))
+ for (i = 0; i < n_pages; i++) {
+ pages[i] = shmem_read_mapping_page_gfp(file->f_mapping, i,
+ GFP_KERNEL);
+ if (IS_ERR(pages[i]))
goto err_page;
-
- **ptes++ = mk_pte(page, PAGE_KERNEL);
}
- if (mem != stack)
- kvfree(mem);
-
+ vaddr = vmap(pages, n_pages, 0, PAGE_KERNEL);
+ if (!vaddr)
+ goto err_page;
mapping_set_unevictable(file->f_mapping);
- return area->addr;
-
+ return vaddr;
err_page:
- if (mem != stack)
- kvfree(mem);
-
- __shmem_unpin_map(file, area->addr, pfn);
+ while (--i >= 0)
+ put_page(pages[i]);
+ kvfree(pages);
return NULL;
}
void shmem_unpin_map(struct file *file, void *ptr)
{
+ struct vm_struct *area = find_vm_area(ptr);
+ size_t i = shmem_npages(file);
+
+ if (WARN_ON_ONCE(!area || !area->pages))
+ return;
+
mapping_clear_unevictable(file->f_mapping);
- __shmem_unpin_map(file, ptr, shmem_npte(file));
+ for (i = 0; i < shmem_npages(file); i++)
+ put_page(area->pages[i]);
+ kvfree(area->pages);
+ vunmap(ptr);
}
static int __shmem_rw(struct file *file, loff_t off,
--
2.28.0
On Tue, Sep 22, 2020 at 12:21:44PM +0100, Matthew Wilcox wrote:
> Actually, vfree() will work today; I cc'd you on a documentation update
> to make it clear that this is permitted.
vfree calls __free_pages, while the i915 and a lot of other code calls
put_page. They are mostly the same, but not quite, and every time I
look into that mess I'm more confused than before.
Can someone in the know write sensible documentation on when to use
__free_page(s) vs put_page?
On Tue, Sep 22, 2020 at 04:39:06PM +0200, Christoph Hellwig wrote:
> On Tue, Sep 22, 2020 at 12:21:44PM +0100, Matthew Wilcox wrote:
> > Actually, vfree() will work today; I cc'd you on a documentation update
> > to make it clear that this is permitted.
>
> vfree calls __free_pages, the i915 and a lot of other code calls
> put_page. They are mostly the same, but not quite and everytime I
> look into that mess I'm more confused than before.
>
> Can someone in the know write sensible documentation on when to use
> __free_page(s) vs put_page?
I started on that, and then I found a bug that's been lurking for 12
years, so that delayed the documentation somewhat. The short answer is
that __free_pages() lets you free non-compound high-order pages while
put_page() can only free order-0 and compound pages.
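A minimal illustration of that distinction (a hedged sketch, assuming
ordinary process context):

#include <linux/gfp.h>
#include <linux/mm.h>

static void example(void)
{
	/* non-compound order-2 page: only __free_pages() knows the order */
	struct page *p = alloc_pages(GFP_KERNEL, 2);

	if (p)
		__free_pages(p, 2);	/* frees all four pages */
	/* put_page(p) here would free only an order-0 page's worth,
	 * leaking the three tail pages */

	/* compound page: the order is recorded in the page itself,
	 * so put_page() on the last reference frees the whole thing */
	p = alloc_pages(GFP_KERNEL | __GFP_COMP, 2);
	if (p)
		put_page(p);
}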
I would really like to overhaul our memory allocation APIs:
	current			new
	__get_free_page(s)	alloc_page(s)
	free_page(s)		free_page(s)
	alloc_page(s)		get_free_page(s)
	__free_pages		put_page_order
Then put_page() and put_page_order() are more obviously friends.
But I cannot imagine a world in which Linus says yes to that upheaval.
He's previously expressed dislike of the get_free_page() family of APIs,
and thinks all those callers should just use kmalloc(). Maybe we can
make that transition happen, now that kmalloc() aligns larger allocations.
On Mon, Sep 21, 2020 at 04:44:10PM -0400, [email protected] wrote:
> This will end up incrementing area->ptes pointer. So perhaps something like
>
>
> pte_t **ptes = area->ptes;
>
> if (apply_to_page_range(&init_mm, (unsigned long)area->area->addr,
> 			PAGE_SIZE * nr_frames, gnttab_apply, &ptes)) {
>
> 	...
Yeah. What do you think of this version? I think it is a little
cleaner and matches what xenbus does. At this point it probably should
be split into a Xen and an alloc_vm_area removal patch, though.
---
From 74d6b797e049f72b5e9f63f14da6321c4209a792 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <[email protected]>
Date: Wed, 16 Sep 2020 16:09:42 +0200
Subject: x86/xen: open code alloc_vm_area in arch_gnttab_valloc
Open code alloc_vm_area in the last remaining caller.
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/x86/xen/grant-table.c | 27 +++++++++++++++------
include/linux/vmalloc.h | 5 +---
mm/nommu.c | 7 ------
mm/vmalloc.c | 48 --------------------------------------
4 files changed, 21 insertions(+), 66 deletions(-)
diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
index 4988e19598c8a5..1e681bf62561a0 100644
--- a/arch/x86/xen/grant-table.c
+++ b/arch/x86/xen/grant-table.c
@@ -25,6 +25,7 @@
static struct gnttab_vm_area {
struct vm_struct *area;
pte_t **ptes;
+ int idx;
} gnttab_shared_vm_area, gnttab_status_vm_area;
int arch_gnttab_map_shared(unsigned long *frames, unsigned long nr_gframes,
@@ -90,19 +91,31 @@ void arch_gnttab_unmap(void *shared, unsigned long nr_gframes)
}
}
+static int gnttab_apply(pte_t *pte, unsigned long addr, void *data)
+{
+ struct gnttab_vm_area *area = data;
+
+ area->ptes[area->idx++] = pte;
+ return 0;
+}
+
static int arch_gnttab_valloc(struct gnttab_vm_area *area, unsigned nr_frames)
{
area->ptes = kmalloc_array(nr_frames, sizeof(*area->ptes), GFP_KERNEL);
if (area->ptes == NULL)
return -ENOMEM;
-
- area->area = alloc_vm_area(PAGE_SIZE * nr_frames, area->ptes);
- if (area->area == NULL) {
- kfree(area->ptes);
- return -ENOMEM;
- }
-
+ area->area = get_vm_area(PAGE_SIZE * nr_frames, VM_IOREMAP);
+ if (!area->area)
+ goto out_free_ptes;
+ if (apply_to_page_range(&init_mm, (unsigned long)area->area->addr,
+ PAGE_SIZE * nr_frames, gnttab_apply, area))
+ goto out_free_vm_area;
return 0;
+out_free_vm_area:
+ free_vm_area(area->area);
+out_free_ptes:
+ kfree(area->ptes);
+ return -ENOMEM;
}
static void arch_gnttab_vfree(struct gnttab_vm_area *area)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 8ecd92a947ee0c..a1a4e2f8163504 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -168,6 +168,7 @@ extern struct vm_struct *__get_vm_area_caller(unsigned long size,
unsigned long flags,
unsigned long start, unsigned long end,
const void *caller);
+void free_vm_area(struct vm_struct *area);
extern struct vm_struct *remove_vm_area(const void *addr);
extern struct vm_struct *find_vm_area(const void *addr);
@@ -203,10 +204,6 @@ static inline void set_vm_flush_reset_perms(void *addr)
}
#endif
-/* Allocate/destroy a 'vmalloc' VM area. */
-extern struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes);
-extern void free_vm_area(struct vm_struct *area);
-
/* for /dev/kmem */
extern long vread(char *buf, char *addr, unsigned long count);
extern long vwrite(char *buf, char *addr, unsigned long count);
diff --git a/mm/nommu.c b/mm/nommu.c
index 75a327149af127..9272f30e4c4726 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -354,13 +354,6 @@ void vm_unmap_aliases(void)
}
EXPORT_SYMBOL_GPL(vm_unmap_aliases);
-struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
-{
- BUG();
- return NULL;
-}
-EXPORT_SYMBOL_GPL(alloc_vm_area);
-
void free_vm_area(struct vm_struct *area)
{
BUG();
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 59f2afcf26c312..9f29147deca580 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3077,54 +3077,6 @@ int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
}
EXPORT_SYMBOL(remap_vmalloc_range);
-static int f(pte_t *pte, unsigned long addr, void *data)
-{
- pte_t ***p = data;
-
- if (p) {
- *(*p) = pte;
- (*p)++;
- }
- return 0;
-}
-
-/**
- * alloc_vm_area - allocate a range of kernel address space
- * @size: size of the area
- * @ptes: returns the PTEs for the address space
- *
- * Returns: NULL on failure, vm_struct on success
- *
- * This function reserves a range of kernel address space, and
- * allocates pagetables to map that range. No actual mappings
- * are created.
- *
- * If @ptes is non-NULL, pointers to the PTEs (in init_mm)
- * allocated for the VM area are returned.
- */
-struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
-{
- struct vm_struct *area;
-
- area = get_vm_area_caller(size, VM_IOREMAP,
- __builtin_return_address(0));
- if (area == NULL)
- return NULL;
-
- /*
- * This ensures that page tables are constructed for this region
- * of kernel virtual address space and mapped into init_mm.
- */
- if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
- size, f, ptes ? &ptes : NULL)) {
- free_vm_area(area);
- return NULL;
- }
-
- return area;
-}
-EXPORT_SYMBOL_GPL(alloc_vm_area);
-
void free_vm_area(struct vm_struct *area)
{
struct vm_struct *ret;
--
2.28.0
On 9/22/20 10:58 AM, Christoph Hellwig wrote:
> On Mon, Sep 21, 2020 at 04:44:10PM -0400, [email protected] wrote:
>> This will end up incrementing area->ptes pointer. So perhaps something like
>>
>>
>> pte_t **ptes = area->ptes;
>>
>> if (apply_to_page_range(&init_mm, (unsigned long)area->area->addr,
>> PAGE_SIZE * nr_frames, gnttab_apply, &ptes)) {
>>
>> ...
> Yeah. What do you think of this version?
Oh yes, this is way better. This now can actually be read without trying to mentally unwind triple pointers. (You probably want to initialize idx to zero before calling apply_to_page_range(), I am not sure it's guaranteed to be zero).
> I think it is a little
> cleaner and matches what xenbus does. At this point it probably should
> > be split into a Xen and an alloc_vm_area removal patch, though.
Right.
-boris
On Tue, Sep 22, 2020 at 11:24:20AM -0400, [email protected] wrote:
>
> On 9/22/20 10:58 AM, Christoph Hellwig wrote:
> > On Mon, Sep 21, 2020 at 04:44:10PM -0400, [email protected] wrote:
> >> This will end up incrementing area->ptes pointer. So perhaps something like
> >>
> >>
> >> pte_t **ptes = area->ptes;
> >>
> >> if (apply_to_page_range(&init_mm, (unsigned long)area->area->addr,
> >> 			PAGE_SIZE * nr_frames, gnttab_apply, &ptes)) {
> >>
> >> 	...
> > Yeah. What do you think of this version?
>
>
> Oh yes, this is way better. This now can actually be read without trying to mentally unwind triple pointers. (You probably want to initialize idx to zero before calling apply_to_page_range(), I am not sure it's guaranteed to be zero).
Both instances are static variables, thus in .bss and initialized.
So unless you insist I don't think we need a manual one.
On 9/22/20 11:27 AM, Christoph Hellwig wrote:
> On Tue, Sep 22, 2020 at 11:24:20AM -0400, [email protected] wrote:
>> On 9/22/20 10:58 AM, Christoph Hellwig wrote:
>>> On Mon, Sep 21, 2020 at 04:44:10PM -0400, [email protected] wrote:
>>>> This will end up incrementing area->ptes pointer. So perhaps something like
>>>>
>>>>
>>>> pte_t **ptes = area->ptes;
>>>>
>>>> if (apply_to_page_range(&init_mm, (unsigned long)area->area->addr,
>>>> PAGE_SIZE * nr_frames, gnttab_apply, &ptes)) {
>>>>
>>>> ...
>>> Yeah. What do you think of this version?
>>
>> Oh yes, this is way better. This now can actually be read without trying to mentally unwind triple pointers. (You probably want to initialize idx to zero before calling apply_to_page_range(), I am not sure it's guaranteed to be zero).
> Both instances are static variables, thus in .bss and initialized.
> So unless you insist I don't think we need a manual one.
Yes, you are right. (I thought perhaps this code could be called more than once but no, it can't).
-boris
On 18/09/2020 17:37, Christoph Hellwig wrote:
> i915_gem_object_map implements fairly low-level vmap functionality in
> a driver. Split it into two helpers, one for remapping kernel memory
> which can use vmap, and one for I/O memory that uses vmap_pfn.
>
> The only practical difference is that alloc_vm_area prefaults the
> vmalloc area PTEs, which doesn't seem to be required here for the
> kernel memory case (and could be added to vmap using a flag if actually
> required).
Patch looks good to me.
Series did not get a CI run from our side because of a different base so
I don't know if you would like to have a run there? If so you would need
to rebase against git://anongit.freedesktop.org/drm-tip drm-tip and you
could even send a series to [email protected],
suppressing cc, to check it out without sending a copy to the real
mailing list.
Regards,
Tvrtko
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/gpu/drm/i915/Kconfig | 1 +
> drivers/gpu/drm/i915/gem/i915_gem_pages.c | 101 ++++++++++------------
> 2 files changed, 47 insertions(+), 55 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
> index 9afa5c4a6bf006..1e1cb245fca778 100644
> --- a/drivers/gpu/drm/i915/Kconfig
> +++ b/drivers/gpu/drm/i915/Kconfig
> @@ -25,6 +25,7 @@ config DRM_I915
> select CRC32
> select SND_HDA_I915 if SND_HDA_CORE
> select CEC_CORE if CEC_NOTIFIER
> + select VMAP_PFN
> help
> Choose this option if you have a system that has "Intel Graphics
> Media Accelerator" or "HD Graphics" integrated graphics,
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index e8a083743e0927..90029ea83aede9 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -234,50 +234,24 @@ int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
> return err;
> }
>
> -static inline pte_t iomap_pte(resource_size_t base,
> - dma_addr_t offset,
> - pgprot_t prot)
> -{
> - return pte_mkspecial(pfn_pte((base + offset) >> PAGE_SHIFT, prot));
> -}
> -
> /* The 'mapping' part of i915_gem_object_pin_map() below */
> -static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
> +static void *i915_gem_object_map_page(struct drm_i915_gem_object *obj,
> enum i915_map_type type)
> {
> - unsigned long n_pte = obj->base.size >> PAGE_SHIFT;
> - struct sg_table *sgt = obj->mm.pages;
> - pte_t *stack[32], **mem;
> - struct vm_struct *area;
> + unsigned long n_pages = obj->base.size >> PAGE_SHIFT, i = 0;
> + struct page *stack[32], **pages = stack, *page;
> + struct sgt_iter iter;
> pgprot_t pgprot;
> -
> - if (!i915_gem_object_has_struct_page(obj) && type != I915_MAP_WC)
> - return NULL;
> -
> - /* A single page can always be kmapped */
> - if (n_pte == 1 && type == I915_MAP_WB)
> - return kmap(sg_page(sgt->sgl));
> -
> - mem = stack;
> - if (n_pte > ARRAY_SIZE(stack)) {
> - /* Too big for stack -- allocate temporary array instead */
> - mem = kvmalloc_array(n_pte, sizeof(*mem), GFP_KERNEL);
> - if (!mem)
> - return NULL;
> - }
> -
> - area = alloc_vm_area(obj->base.size, mem);
> - if (!area) {
> - if (mem != stack)
> - kvfree(mem);
> - return NULL;
> - }
> + void *vaddr;
>
> switch (type) {
> default:
> MISSING_CASE(type);
> fallthrough; /* to use PAGE_KERNEL anyway */
> case I915_MAP_WB:
> + /* A single page can always be kmapped */
> + if (n_pages == 1)
> + return kmap(sg_page(obj->mm.pages->sgl));
> pgprot = PAGE_KERNEL;
> break;
> case I915_MAP_WC:
> @@ -285,30 +259,44 @@ static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
> break;
> }
>
> - if (i915_gem_object_has_struct_page(obj)) {
> - struct sgt_iter iter;
> - struct page *page;
> - pte_t **ptes = mem;
> -
> - for_each_sgt_page(page, iter, sgt)
> - **ptes++ = mk_pte(page, pgprot);
> - } else {
> - resource_size_t iomap;
> - struct sgt_iter iter;
> - pte_t **ptes = mem;
> - dma_addr_t addr;
> + if (n_pages > ARRAY_SIZE(stack)) {
> + /* Too big for stack -- allocate temporary array instead */
> + pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL);
> + if (!pages)
> + return NULL;
> + }
>
> - iomap = obj->mm.region->iomap.base;
> - iomap -= obj->mm.region->region.start;
> + for_each_sgt_page(page, iter, obj->mm.pages)
> + pages[i++] = page;
> + vaddr = vmap(pages, n_pages, 0, pgprot);
> + if (pages != stack)
> + kvfree(pages);
> + return vaddr;
> +}
>
> - for_each_sgt_daddr(addr, iter, sgt)
> - **ptes++ = iomap_pte(iomap, addr, pgprot);
> +static void *i915_gem_object_map_pfn(struct drm_i915_gem_object *obj)
> +{
> + resource_size_t iomap = obj->mm.region->iomap.base -
> + obj->mm.region->region.start;
> + unsigned long n_pfn = obj->base.size >> PAGE_SHIFT;
> + unsigned long stack[32], *pfns = stack, i = 0;
> + struct sgt_iter iter;
> + dma_addr_t addr;
> + void *vaddr;
> +
> + if (n_pfn > ARRAY_SIZE(stack)) {
> + /* Too big for stack -- allocate temporary array instead */
> + pfns = kvmalloc_array(n_pfn, sizeof(*pfns), GFP_KERNEL);
> + if (!pfns)
> + return NULL;
> }
>
> - if (mem != stack)
> - kvfree(mem);
> -
> - return area->addr;
> + for_each_sgt_daddr(addr, iter, obj->mm.pages)
> + pfns[i++] = (iomap + addr) >> PAGE_SHIFT;
> + vaddr = vmap_pfn(pfns, n_pfn, pgprot_writecombine(PAGE_KERNEL_IO));
> + if (pfns != stack)
> + kvfree(pfns);
> + return vaddr;
> }
>
> /* get, pin, and map the pages of the object into kernel space */
> @@ -360,7 +348,10 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
> }
>
> if (!ptr) {
> - ptr = i915_gem_object_map(obj, type);
> + if (i915_gem_object_has_struct_page(obj))
> + ptr = i915_gem_object_map_page(obj, type);
> + else if (type == I915_MAP_WC)
> + ptr = i915_gem_object_map_pfn(obj);
> if (!ptr) {
> err = -ENOMEM;
> goto err_unpin;
>
On Wed, Sep 23, 2020 at 10:52:33AM +0100, Tvrtko Ursulin wrote:
>
> On 18/09/2020 17:37, Christoph Hellwig wrote:
>> i915_gem_object_map implements fairly low-level vmap functionality in
>> a driver. Split it into two helpers, one for remapping kernel memory
>> which can use vmap, and one for I/O memory that uses vmap_pfn.
>>
>> The only practical difference is that alloc_vm_area prefaults the
>> vmalloc area PTEs, which doesn't seem to be required here for the
>> kernel memory case (and could be added to vmap using a flag if actually
>> required).
>
> Patch looks good to me.
>
> Series did not get a CI run from our side because of a different base so I
> don't know if you would like to have a run there? If so you would need to
> rebase against git://anongit.freedesktop.org/drm-tip drm-tip and you could
> even send a series to [email protected], suppressing
> cc, to check it out without sending a copy to the real mailing list.
It doesn't seem like I can post to any freedesktop list, as I always
get rejection messages. But I'll happily prepare a branch if one
of you can feed it into your CI.
On 23/09/2020 14:41, Christoph Hellwig wrote:
> On Wed, Sep 23, 2020 at 10:52:33AM +0100, Tvrtko Ursulin wrote:
>>
>> On 18/09/2020 17:37, Christoph Hellwig wrote:
>>> i915_gem_object_map implements fairly low-level vmap functionality in
>>> a driver. Split it into two helpers, one for remapping kernel memory
>>> which can use vmap, and one for I/O memory that uses vmap_pfn.
>>>
>>> The only practical difference is that alloc_vm_area prefaults the
>>> vmalloc area PTEs, which doesn't seem to be required here for the
>>> kernel memory case (and could be added to vmap using a flag if actually
>>> required).
>>
>> Patch looks good to me.
>>
>> Series did not get a CI run from our side because of a different base so I
>> don't know if you would like to have a run there? If so you would need to
>> rebase against git://anongit.freedesktop.org/drm-tip drm-tip and you could
>> even send a series to [email protected], suppressing
>> cc, to check it out without sending a copy to the real mailing list.
>
> It doesn't seem like I can post to any freedesktop list, as I always
> get rejection messages. But I'll happily prepare a branch if one
> of you can feed it into your CI.
That's fine, just ping me and I will forward it for testing, thanks!
Regards,
Tvrtko
On Wed, Sep 23, 2020 at 02:58:43PM +0100, Tvrtko Ursulin wrote:
>>> Series did not get a CI run from our side because of a different base so I
>>> don't know if you would like to have a run there? If so you would need to
>>> rebase against git://anongit.freedesktop.org/drm-tip drm-tip and you could
>>> even send a series to [email protected], suppressing
>>> cc, to check it out without sending a copy to the real mailing list.
>>
>> It doesn't seem like I can post to any freedesktop list, as I always
>> get rejection messages. But I'll happily prepare a branch if one
>> of you an feed it into your CI.
>
> That's fine, just ping me and I will forward it for testing, thanks!
git://git.infradead.org/users/hch/misc.git i915-vmap-wip
Gitweb:
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/i915-vmap-wip
Note that this includes a new commit to clean up one of the recent
commits in the code.
On 23/09/2020 15:44, Christoph Hellwig wrote:
> On Wed, Sep 23, 2020 at 02:58:43PM +0100, Tvrtko Ursulin wrote:
>>>> Series did not get a CI run from our side because of a different base so I
>>>> don't know if you would like to have a run there? If so you would need to
>>>> rebase against git://anongit.freedesktop.org/drm-tip drm-tip and you could
>>>> even send a series to [email protected], suppressing
>>>> cc, to check it out without sending a copy to the real mailing list.
>>>
>>> It doesn't seem like I can post to any freedesktop list, as I always
>>> get rejection messages. But I'll happily prepare a branch if one
>>> of you an feed it into your CI.
>>
>> That's fine, just ping me and I will forward it for testing, thanks!
>
> git://git.infradead.org/users/hch/misc.git i915-vmap-wip
>
> Gitweb:
>
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/i915-vmap-wip
>
> note that this includes a new commit to clean up one of the recent
> commits in the code.
CI says the series looks good from the i915 perspective (*).
I don't know how you will handle it logistically, but when you have the
final version I am happy to re-read and r-b the i915 patches.
Regards,
Tvrtko
*)
https://patchwork.freedesktop.org/series/82051/
On Thu, Sep 24, 2020 at 01:22:35PM +0100, Tvrtko Ursulin wrote:
> CI says the series looks good from the i915 perspective (*).
>
> I don't know how you will handle it logistically, but when you have the
> final version I am happy to re-read and r-b the i915 patches.
I'll resend the series later today, and will make sure you are on the
Cc list.