v3 -> v4:
* Base on next-20190711
* patch 1: From: Uladzislau Rezki (Sony) <[email protected]> (author)
- https://lkml.org/lkml/2019/7/3/661
* patch 2: Modify the layout of struct vmap_area for readability
v2 -> v3:
* patch 1-4: Abandoned
* patch 5:
- Eliminate "flags" (suggested by Uladzislau Rezki)
- Base on https://lkml.org/lkml/2019/6/6/455
and https://lkml.org/lkml/2019/7/3/661
v1 -> v2:
* patch 3: Rename __find_vmap_area to __search_va_in_busy_tree
instead of __search_va_from_busy_tree.
* patch 5: Add motivation and necessary test data to the commit
message.
* patch 5: Let va->flags use only some low bits of va_start
instead of completely overwriting va_start.
The current implementation of struct vmap_area wastes space.
With this series applied, sizeof(struct vmap_area) is reduced
from 11 words to 8 words.
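The word counts above can be checked outside the kernel with a small sketch. The stand-in definitions of rb_node, list_head and llist_node below are assumptions that only mirror the sizes of the kernel types, and the names vmap_area_old, vmap_area_v4 and WORDS are made up for illustration. It also assumes a target where a pointer and an unsigned long have the same size (for example LP64 Linux).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins, assumed to match the size of the kernel types. */
struct rb_node {
	unsigned long __rb_parent_color;
	struct rb_node *rb_right;
	struct rb_node *rb_left;
};
struct list_head { struct list_head *next, *prev; };
struct llist_node { struct llist_node *next; };
struct vm_struct;	/* opaque here, only a pointer to it is stored */

/* Layout before this series: every member has its own slot. */
struct vmap_area_old {
	unsigned long va_start;
	unsigned long va_end;
	unsigned long subtree_max_size;
	unsigned long flags;
	struct rb_node rb_node;
	struct list_head list;
	struct llist_node purge_list;
	struct vm_struct *vm;
};

/* Layout proposed in v4: members that are never live together overlap. */
struct vmap_area_v4 {
	unsigned long va_start;
	unsigned long va_end;
	union {
		struct {	/* in an address-sorted rbtree and list */
			union {
				struct vm_struct *vm;		/* "busy" */
				unsigned long subtree_max_size;	/* "free" */
			};
			struct rb_node rb_node;
			struct list_head list;
		};
		struct {	/* in the "lazy purge" list */
			struct llist_node purge_list;
		};
	};
};

/* Size of a type in machine words (hypothetical helper macro). */
#define WORDS(type) (sizeof(type) / sizeof(unsigned long))

static_assert(WORDS(struct vmap_area_old) == 11, "old layout is 11 words");
static_assert(WORDS(struct vmap_area_v4) == 8, "v4 layout is 8 words");
```

The checks are compile-time only, so building this file with a C11 compiler is enough to confirm the numbers under the stated assumptions.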
Pengfei Li (1):
mm/vmalloc.c: Modify struct vmap_area to reduce its size
Uladzislau Rezki (Sony) (1):
mm/vmalloc: do not keep unpurged areas in the busy tree
include/linux/vmalloc.h | 40 ++++++++++++++++-----
mm/vmalloc.c | 79 ++++++++++++++++++++++++++++-------------
2 files changed, 86 insertions(+), 33 deletions(-)
--
2.21.0
From: "Uladzislau Rezki (Sony)" <[email protected]>
The busy tree can be quite big: even after an area is freed or
unmapped, it stays there until the "purge" logic removes it.
1) Optimize and reduce the size of the "busy" tree by removing a
node from it as soon as the user triggers a free path. This is
possible because allocation is done using another, augmented
tree.
The vmalloc test driver shows the difference; for example,
"fix_size_alloc_test" is ~11% better compared with the default
configuration:
sudo ./test_vmalloc.sh performance
<default>
Summary: fix_size_alloc_test loops: 1000000 avg: 993985 usec
Summary: full_fit_alloc_test loops: 1000000 avg: 973554 usec
Summary: long_busy_list_alloc_test loops: 1000000 avg: 12617652 usec
<default>
<this patch>
Summary: fix_size_alloc_test loops: 1000000 avg: 882263 usec
Summary: full_fit_alloc_test loops: 1000000 avg: 973407 usec
Summary: long_busy_list_alloc_test loops: 1000000 avg: 12593929 usec
<this patch>
2) Since the busy tree now contains allocated areas only and does
not interfere with lazily freed nodes, introduce a new function,
show_purge_info(), that dumps "unpurged" areas. Its output is
exposed through "/proc/vmallocinfo".
3) Eliminate VM_LAZY_FREE flag.
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
---
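As a rough illustration of point 1) in the description, the following userspace sketch models the new free path: an area leaves the busy set immediately and then waits on a lazy list until the purge runs. It is not the kernel code. The toy_* names are hypothetical, plain singly linked lists stand in for vmap_area_root and vmap_purge_list, and locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy area: a single link is reused for the busy list or the purge list. */
struct toy_va {
	unsigned long va_start, va_end;
	struct toy_va *next;
};

static struct toy_va *busy_head;	/* stands in for vmap_area_root */
static struct toy_va *purge_head;	/* stands in for vmap_purge_list */

static void toy_insert_busy(struct toy_va *va)
{
	va->next = busy_head;
	busy_head = va;
}

/* Remove va from the busy list; returns false if it was not linked. */
static bool toy_unlink_busy(struct toy_va *va)
{
	for (struct toy_va **pp = &busy_head; *pp; pp = &(*pp)->next) {
		if (*pp == va) {
			*pp = va->next;
			va->next = NULL;
			return true;
		}
	}
	return false;
}

/* Free path: drop out of the busy set right away, then queue for purge. */
static void toy_free_noflush(struct toy_va *va)
{
	toy_unlink_busy(va);
	va->next = purge_head;
	purge_head = va;
}

static size_t toy_count(const struct toy_va *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}

/* Purge: drain the lazy list and report how many areas were released. */
static size_t toy_purge(void)
{
	size_t n = toy_count(purge_head);

	purge_head = NULL;
	return n;
}
```

In this model a lookup in the busy set never has to walk over areas that are already freed but not yet purged, which is the effect the patch aims for.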
mm/vmalloc.c | 51 ++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 42 insertions(+), 9 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 4fa8d84599b0..9eb700a2087b 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -329,7 +329,6 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
#define DEBUG_AUGMENT_PROPAGATE_CHECK 0
#define DEBUG_AUGMENT_LOWEST_MATCH_CHECK 0
-#define VM_LAZY_FREE 0x02
#define VM_VM_AREA 0x04
static DEFINE_SPINLOCK(vmap_area_lock);
@@ -541,7 +540,7 @@ link_va(struct vmap_area *va, struct rb_root *root,
static __always_inline void
unlink_va(struct vmap_area *va, struct rb_root *root)
{
- if (WARN_ON(RB_EMPTY_NODE(&va->rb_node)))
+ if (RB_EMPTY_NODE(&va->rb_node))
return;
if (root == &free_vmap_area_root)
@@ -1167,7 +1166,11 @@ EXPORT_SYMBOL_GPL(unregister_vmap_purge_notifier);
static void __free_vmap_area(struct vmap_area *va)
{
/*
- * Remove from the busy tree/list.
+ * In most cases VA is not attached to the tree, but there
+ * are a few exceptions:
+ *
+ * - is linked only in case of pcpu, recovery part;
+ * - if radix_tree_preload gets failed, see new_vmap_block().
*/
unlink_va(va, &vmap_area_root);
@@ -1318,6 +1321,10 @@ static void free_vmap_area_noflush(struct vmap_area *va)
{
unsigned long nr_lazy;
+ spin_lock(&vmap_area_lock);
+ unlink_va(va, &vmap_area_root);
+ spin_unlock(&vmap_area_lock);
+
nr_lazy = atomic_long_add_return((va->va_end - va->va_start) >>
PAGE_SHIFT, &vmap_lazy_nr);
@@ -2137,14 +2144,13 @@ struct vm_struct *remove_vm_area(const void *addr)
might_sleep();
- va = find_vmap_area((unsigned long)addr);
+ spin_lock(&vmap_area_lock);
+ va = __find_vmap_area((unsigned long)addr);
if (va && va->flags & VM_VM_AREA) {
struct vm_struct *vm = va->vm;
- spin_lock(&vmap_area_lock);
va->vm = NULL;
va->flags &= ~VM_VM_AREA;
- va->flags |= VM_LAZY_FREE;
spin_unlock(&vmap_area_lock);
kasan_free_shadow(vm);
@@ -2152,6 +2158,8 @@ struct vm_struct *remove_vm_area(const void *addr)
return vm;
}
+
+ spin_unlock(&vmap_area_lock);
return NULL;
}
@@ -3431,6 +3439,22 @@ static void show_numa_info(struct seq_file *m, struct vm_struct *v)
}
}
+static void show_purge_info(struct seq_file *m)
+{
+ struct llist_node *head;
+ struct vmap_area *va;
+
+ head = READ_ONCE(vmap_purge_list.first);
+ if (head == NULL)
+ return;
+
+ llist_for_each_entry(va, head, purge_list) {
+ seq_printf(m, "0x%pK-0x%pK %7ld unpurged vm_area\n",
+ (void *)va->va_start, (void *)va->va_end,
+ va->va_end - va->va_start);
+ }
+}
+
static int s_show(struct seq_file *m, void *p)
{
struct vmap_area *va;
@@ -3443,10 +3467,9 @@ static int s_show(struct seq_file *m, void *p)
* behalf of vmap area is being tear down or vm_map_ram allocation.
*/
if (!(va->flags & VM_VM_AREA)) {
- seq_printf(m, "0x%pK-0x%pK %7ld %s\n",
+ seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
(void *)va->va_start, (void *)va->va_end,
- va->va_end - va->va_start,
- va->flags & VM_LAZY_FREE ? "unpurged vm_area" : "vm_map_ram");
+ va->va_end - va->va_start);
return 0;
}
@@ -3482,6 +3505,16 @@ static int s_show(struct seq_file *m, void *p)
show_numa_info(m, v);
seq_putc(m, '\n');
+
+ /*
+ * As a final step, dump "unpurged" areas. Note,
+ * that entire "/proc/vmallocinfo" output will not
+ * be address sorted, because the purge list is not
+ * sorted.
+ */
+ if (list_is_last(&va->list, &vmap_area_list))
+ show_purge_info(m);
+
return 0;
}
--
2.21.0
Objective
---------
The current implementation of struct vmap_area wastes space.
After applying this commit, sizeof(struct vmap_area) is
reduced from 11 words to 8 words.
Description
-----------
1) Pack "vm" and "subtree_max_size"
This is no problem because
A) "vm" is only used when vmap_area is in "busy" tree
B) "subtree_max_size" is only used when vmap_area is in
"free" tree
2) Pack "purge_list"
The variable "purge_list" is only used when vmap_area is in
"lazy purge" list. So it can be packed with other variables,
which are only used in rbtree and list sorted by address.
3) Eliminate "flags".
Since only one flag, VM_VM_AREA, is in use, and the same
information can be obtained by checking whether "vm" is NULL,
"flags" can be eliminated.
Signed-off-by: Pengfei Li <[email protected]>
Suggested-by: Uladzislau Rezki (Sony) <[email protected]>
---
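A minimal sketch of point 3), assuming a cut-down structure: the test that used to read (va->flags & VM_VM_AREA) becomes a NULL check on "vm". The struct toy_vmap_area and the helper toy_va_has_vm() are hypothetical and exist only for this illustration; the patch open-codes the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct vm_struct { int placeholder; };	/* stand-in for the kernel type */

/* Cut-down area, keeping only the members needed for the example. */
struct toy_vmap_area {
	unsigned long va_start;
	unsigned long va_end;
	union {
		struct vm_struct *vm;		/* valid while in the "busy" tree */
		unsigned long subtree_max_size;	/* valid while in the "free" tree */
	};
};

/*
 * Hypothetical helper: true if a busy area is backed by a vm_struct.
 * Only meaningful for areas in the "busy" tree, because "vm" shares
 * storage with "subtree_max_size".
 */
static bool toy_va_has_vm(const struct toy_vmap_area *va)
{
	return va->vm != NULL;
}
```

Because the two union members overlap, the check gives a meaningful answer only while the area is in the busy tree, which matches condition A) in the description.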
include/linux/vmalloc.h | 40 +++++++++++++++++++++++++++++++---------
mm/vmalloc.c | 28 +++++++++++++---------------
2 files changed, 44 insertions(+), 24 deletions(-)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 9b21d0047710..6fb377ca9e7a 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -51,15 +51,37 @@ struct vmap_area {
unsigned long va_start;
unsigned long va_end;
- /*
- * Largest available free size in subtree.
- */
- unsigned long subtree_max_size;
- unsigned long flags;
- struct rb_node rb_node; /* address sorted rbtree */
- struct list_head list; /* address sorted list */
- struct llist_node purge_list; /* "lazy purge" list */
- struct vm_struct *vm;
+ union {
+ /* In rbtree and list sorted by address */
+ struct {
+ union {
+ /*
+ * In "busy" rbtree and list.
+ * rbtree root: vmap_area_root
+ * list head: vmap_area_list
+ */
+ struct vm_struct *vm;
+
+ /*
+ * In "free" rbtree and list.
+ * rbtree root: free_vmap_area_root
+ * list head: free_vmap_area_list
+ */
+ unsigned long subtree_max_size;
+ };
+
+ struct rb_node rb_node;
+ struct list_head list;
+ };
+
+ /*
+ * In "lazy purge" list.
+ * llist head: vmap_purge_list
+ */
+ struct {
+ struct llist_node purge_list;
+ };
+ };
};
/*
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 9eb700a2087b..1245d3285a32 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -329,7 +329,6 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
#define DEBUG_AUGMENT_PROPAGATE_CHECK 0
#define DEBUG_AUGMENT_LOWEST_MATCH_CHECK 0
-#define VM_VM_AREA 0x04
static DEFINE_SPINLOCK(vmap_area_lock);
/* Export for kexec only */
@@ -1115,7 +1114,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
va->va_start = addr;
va->va_end = addr + size;
- va->flags = 0;
+ va->vm = NULL;
insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
spin_unlock(&vmap_area_lock);
@@ -1279,7 +1278,9 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
llist_for_each_entry_safe(va, n_va, valist, purge_list) {
unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
- __free_vmap_area(va);
+ merge_or_add_vmap_area(va,
+ &free_vmap_area_root, &free_vmap_area_list);
+
atomic_long_sub(nr, &vmap_lazy_nr);
if (atomic_long_read(&vmap_lazy_nr) < resched_threshold)
@@ -1919,7 +1920,6 @@ void __init vmalloc_init(void)
if (WARN_ON_ONCE(!va))
continue;
- va->flags = VM_VM_AREA;
va->va_start = (unsigned long)tmp->addr;
va->va_end = va->va_start + tmp->size;
va->vm = tmp;
@@ -2017,7 +2017,6 @@ static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va,
vm->size = va->va_end - va->va_start;
vm->caller = caller;
va->vm = vm;
- va->flags |= VM_VM_AREA;
spin_unlock(&vmap_area_lock);
}
@@ -2122,10 +2121,10 @@ struct vm_struct *find_vm_area(const void *addr)
struct vmap_area *va;
va = find_vmap_area((unsigned long)addr);
- if (va && va->flags & VM_VM_AREA)
- return va->vm;
+ if (!va)
+ return NULL;
- return NULL;
+ return va->vm;
}
/**
@@ -2146,11 +2145,10 @@ struct vm_struct *remove_vm_area(const void *addr)
spin_lock(&vmap_area_lock);
va = __find_vmap_area((unsigned long)addr);
- if (va && va->flags & VM_VM_AREA) {
+ if (va && va->vm) {
struct vm_struct *vm = va->vm;
va->vm = NULL;
- va->flags &= ~VM_VM_AREA;
spin_unlock(&vmap_area_lock);
kasan_free_shadow(vm);
@@ -2853,7 +2851,7 @@ long vread(char *buf, char *addr, unsigned long count)
if (!count)
break;
- if (!(va->flags & VM_VM_AREA))
+ if (!va->vm)
continue;
vm = va->vm;
@@ -2933,7 +2931,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
if (!count)
break;
- if (!(va->flags & VM_VM_AREA))
+ if (!va->vm)
continue;
vm = va->vm;
@@ -3463,10 +3461,10 @@ static int s_show(struct seq_file *m, void *p)
va = list_entry(p, struct vmap_area, list);
/*
- * s_show can encounter race with remove_vm_area, !VM_VM_AREA on
- * behalf of vmap area is being tear down or vm_map_ram allocation.
+ * If !va->vm then this vmap_area object is allocated
+ * by vm_map_ram.
*/
- if (!(va->flags & VM_VM_AREA)) {
+ if (!va->vm) {
seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
(void *)va->va_start, (void *)va->va_end,
va->va_end - va->va_start);
--
2.21.0
On Fri, Jul 12, 2019 at 08:02:13PM +0800, Pengfei Li wrote:
> +++ b/include/linux/vmalloc.h
> @@ -51,15 +51,37 @@ struct vmap_area {
> unsigned long va_start;
> unsigned long va_end;
>
> - /*
> - * Largest available free size in subtree.
> - */
> - unsigned long subtree_max_size;
> - unsigned long flags;
> - struct rb_node rb_node; /* address sorted rbtree */
> - struct list_head list; /* address sorted list */
> - struct llist_node purge_list; /* "lazy purge" list */
> - struct vm_struct *vm;
> + union {
> + /* In rbtree and list sorted by address */
> + struct {
> + union {
> + /*
> + * In "busy" rbtree and list.
> + * rbtree root: vmap_area_root
> + * list head: vmap_area_list
> + */
> + struct vm_struct *vm;
> +
> + /*
> + * In "free" rbtree and list.
> + * rbtree root: free_vmap_area_root
> + * list head: free_vmap_area_list
> + */
> + unsigned long subtree_max_size;
> + };
> +
> + struct rb_node rb_node;
> + struct list_head list;
> + };
> +
> + /*
> + * In "lazy purge" list.
> + * llist head: vmap_purge_list
> + */
> + struct {
> + struct llist_node purge_list;
> + };
I don't think you need struct union struct union. Because llist_node
is just a pointer, you can get the same savings with just:
union {
struct llist_node purge_list;
struct vm_struct *vm;
unsigned long subtree_max_size;
};
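For reference, a flat layout along the lines of this suggestion can be checked in userspace. This is a sketch under assumptions: the stand-in types only mirror the sizes of the kernel ones, the member order and the names vmap_area_flat and WORDS are made up, and a pointer is assumed to be as wide as an unsigned long.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins, assumed to match the size of the kernel types. */
struct rb_node {
	unsigned long __rb_parent_color;
	struct rb_node *rb_right;
	struct rb_node *rb_left;
};
struct list_head { struct list_head *next, *prev; };
struct llist_node { struct llist_node *next; };
struct vm_struct;	/* opaque here, only a pointer to it is stored */

/* Flat variant: a single union for the three one-word members. */
struct vmap_area_flat {
	unsigned long va_start;
	unsigned long va_end;
	struct rb_node rb_node;
	struct list_head list;
	union {
		struct llist_node purge_list;	/* "lazy purge" list */
		struct vm_struct *vm;		/* "busy" tree */
		unsigned long subtree_max_size;	/* "free" tree */
	};
};

/* Size of a type in machine words (hypothetical helper macro). */
#define WORDS(type) (sizeof(type) / sizeof(unsigned long))

static_assert(WORDS(struct vmap_area_flat) == 8, "flat layout is 8 words");
```

Under these assumptions the flat union also comes out at 8 words, since rb_node and list take 5 words in total whether or not they overlap with purge_list.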
On Fri, Jul 12, 2019 at 9:49 PM Matthew Wilcox <[email protected]> wrote:
>
> On Fri, Jul 12, 2019 at 08:02:13PM +0800, Pengfei Li wrote:
>
> I don't think you need struct union struct union. Because llist_node
> is just a pointer, you can get the same savings with just:
>
> union {
> struct llist_node purge_list;
> struct vm_struct *vm;
> unsigned long subtree_max_size;
> };
>
Thanks for your comments.
As you said, I did this in v3.
https://patchwork.kernel.org/patch/11031507/
The reason I use struct union struct in v4 is that I want to
express that "in the tree" and "in the purge list" are two
completely isolated cases.
struct vmap_area {
union {
struct { /* Case A: In the tree */
...
};
struct { /* Case B: In the purge list */
...
};
};
};
The "rb_node" and "list" should also not be used when va is in
the purge list.
What do you think of this idea?
On Fri, Jul 12, 2019 at 11:09:00PM +0800, Pengfei Li wrote:
> On Fri, Jul 12, 2019 at 9:49 PM Matthew Wilcox <[email protected]> wrote:
> >
> > On Fri, Jul 12, 2019 at 08:02:13PM +0800, Pengfei Li wrote:
> >
> > I don't think you need struct union struct union. Because llist_node
> > is just a pointer, you can get the same savings with just:
> >
> > union {
> > struct llist_node purge_list;
> > struct vm_struct *vm;
> > unsigned long subtree_max_size;
> > };
> >
>
> Thanks for your comments.
>
> As you said, I did this in v3.
> https://patchwork.kernel.org/patch/11031507/
>
> The reason why I use struct union struct in v4 is that I want to
> express "in the tree" and "in the purge list" are two completely
> isolated cases.
>
I think that is odd. Your v3 was fine to me. All that mess with
struct union struct makes it weird, so having just comments there
is enough, imho.
<snip>
- __free_vmap_area(va);
+ merge_or_add_vmap_area(va,
+ &free_vmap_area_root, &free_vmap_area_list);
+
<snip>
This should not be done in this patch. I can re-spin "mm/vmalloc: do not keep unpurged areas in the busy tree"
and add it there. As a result, we will not need to modify the unlink_va() function.
This patch will then only reduce the size and will not touch other parts.
--
Vlad Rezki
Hi, Vlad
Thanks for the comments from you and Matthew; now I am sure
v3 is enough.
I will follow the next version of your "mm/vmalloc: do not
keep unpurged areas in the busy tree".
Thanks again for your patience with me!
--
Pengfei