2021-03-09 13:59:55

by Topi Miettinen

[permalink] [raw]
Subject: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

Memory mappings inside kernel allocated with vmalloc() are in
predictable order and packed tightly toward the low addresses, except
for per-cpu areas which start from top of the vmalloc area. With
new kernel boot parameter 'randomize_vmalloc=1', the entire area is
used randomly to make the allocations less predictable and harder to
guess for attackers. Also module and BPF code locations get randomized
(within their dedicated and rather small area though) and if
CONFIG_VMAP_STACK is enabled, also kernel thread stack locations.

On 32 bit systems this may cause problems due to increased VM
fragmentation if the address space gets crowded.

On all systems, it will reduce performance and increase memory and
cache usage due to less efficient use of page tables and inability to
merge adjacent VMAs with compatible attributes. On x86_64 with 5 level
page tables, in the worst case, additional page table entries of up to
4 pages are created for each mapping, so with small mappings there's
considerable penalty.

Without randomize_vmalloc=1:
$ grep -v kernel_clone /proc/vmallocinfo
0xffffc90000000000-0xffffc90000009000 36864 irq_init_percpu_irqstack+0x176/0x1c0 vmap
0xffffc90000009000-0xffffc9000000b000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe1000 ioremap
0xffffc9000000c000-0xffffc9000000f000 12288 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe0000 ioremap
0xffffc9000000f000-0xffffc90000011000 8192 hpet_enable+0x31/0x4a4 phys=0x00000000fed00000 ioremap
0xffffc90000011000-0xffffc90000013000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffc90000013000-0xffffc90000015000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffc90000015000-0xffffc90000017000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffc90000021000-0xffffc90000023000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffc90000023000-0xffffc90000025000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x00000000fed00000 ioremap
0xffffc90000025000-0xffffc90000027000 8192 memremap+0x19c/0x280 phys=0x00000000000f5000 ioremap
0xffffc90000031000-0xffffc90000036000 20480 pcpu_create_chunk+0xe8/0x260 pages=4 vmalloc
0xffffc90000043000-0xffffc90000047000 16384 n_tty_open+0x11/0xe0 pages=3 vmalloc
0xffffc90000211000-0xffffc90000232000 135168 crypto_scomp_init_tfm+0xc6/0xf0 pages=32 vmalloc
0xffffc90000232000-0xffffc90000253000 135168 crypto_scomp_init_tfm+0x67/0xf0 pages=32 vmalloc
0xffffc900005a9000-0xffffc900005ba000 69632 pcpu_create_chunk+0x7b/0x260 pages=16 vmalloc
0xffffc900005ba000-0xffffc900005cc000 73728 pcpu_create_chunk+0xb2/0x260 pages=17 vmalloc
0xffffe8ffffc00000-0xffffe8ffffe00000 2097152 pcpu_get_vm_areas+0x0/0x2290 vmalloc

With randomize_vmalloc=1, the allocations are randomized:
$ grep -v kernel_clone /proc/vmallocinfo
0xffffc9759d443000-0xffffc9759d445000 8192 hpet_enable+0x31/0x4a4 phys=0x00000000fed00000 ioremap
0xffffccf1e9f66000-0xffffccf1e9f68000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffcd2fc02a4000-0xffffcd2fc02a6000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffcdaefb898000-0xffffcdaefb89b000 12288 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe0000 ioremap
0xffffcef8074c3000-0xffffcef8074cc000 36864 irq_init_percpu_irqstack+0x176/0x1c0 vmap
0xffffcf725ca2e000-0xffffcf725ca4f000 135168 crypto_scomp_init_tfm+0xc6/0xf0 pages=32 vmalloc
0xffffd0efb25e1000-0xffffd0efb25f2000 69632 pcpu_create_chunk+0x7b/0x260 pages=16 vmalloc
0xffffd27054678000-0xffffd2705467c000 16384 n_tty_open+0x11/0xe0 pages=3 vmalloc
0xffffd2adf716e000-0xffffd2adf7180000 73728 pcpu_create_chunk+0xb2/0x260 pages=17 vmalloc
0xffffd4ba5fb6b000-0xffffd4ba5fb6d000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe1000 ioremap
0xffffded126192000-0xffffded126194000 8192 memremap+0x19c/0x280 phys=0x00000000000f5000 ioremap
0xffffe01a4dbcd000-0xffffe01a4dbcf000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffe4b649952000-0xffffe4b649954000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x00000000fed00000 ioremap
0xffffe71ed592a000-0xffffe71ed592c000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
0xffffe7dc5824f000-0xffffe7dc58270000 135168 crypto_scomp_init_tfm+0x67/0xf0 pages=32 vmalloc
0xffffe8f4f9800000-0xffffe8f4f9a00000 2097152 pcpu_get_vm_areas+0x0/0x2290 vmalloc
0xffffe8f4f9a19000-0xffffe8f4f9a1e000 20480 pcpu_create_chunk+0xe8/0x260 pages=4 vmalloc

With CONFIG_VMAP_STACK, also kernel thread stacks are placed in
vmalloc area and therefore they also get randomized (only one example
line from /proc/vmallocinfo shown for brevity):

unrandomized:
0xffffc90000018000-0xffffc90000021000 36864 kernel_clone+0xf9/0x560 pages=8 vmalloc

randomized:
0xffffcb57611a8000-0xffffcb57611b1000 36864 kernel_clone+0xf9/0x560 pages=8 vmalloc

CC: Andrew Morton <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Jann Horn <[email protected]>
CC: Kees Cook <[email protected]>
CC: Linux API <[email protected]>
CC: Matthew Wilcox <[email protected]>
CC: Mike Rapoport <[email protected]>
CC: Vlad Rezki <[email protected]>
Signed-off-by: Topi Miettinen <[email protected]>
---
v2: retry allocation from other end of vmalloc space in case of
failure (Matthew Wilcox), improve commit message and documentation
v3: randomize also percpu allocations (pcpu_get_vm_areas())
v4: use static branches (Kees Cook) and make the parameter boolean.
---
.../admin-guide/kernel-parameters.txt | 24 ++++++++++
mm/vmalloc.c | 44 +++++++++++++++++--
2 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a10b545c2070..726aec542079 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4024,6 +4024,30 @@

ramdisk_start= [RAM] RAM disk image start address

+ randomize_vmalloc= [KNL] Boolean option to randomize vmalloc()
+ allocations. When enabled, the entire
+ vmalloc() area is used randomly to make the
+ allocations less predictable and harder to
+ guess for attackers. Also module and BPF code
+ locations get randomized (within their
+ dedicated and rather small area though) and if
+ CONFIG_VMAP_STACK is enabled, also kernel
+ thread stack locations.
+
+ On 32 bit systems this may cause problems due
+ to increased VM fragmentation if the address
+ space gets crowded.
+
+ On all systems, it will reduce performance and
+ increase memory and cache usage due to less
+ efficient use of page tables and inability to
+ merge adjacent VMAs with compatible
+ attributes. On x86_64 with 5 level page
+ tables, in the worst case, additional page
+ table entries of up to 4 pages are created for
+ each mapping, so with small mappings there's
+ considerable penalty.
+
random.trust_cpu={on,off}
[KNL] Enable or disable trusting the use of the
CPU's random number generator (if available) to
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e6f352bf0498..b5ecf27dc98e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -34,6 +34,7 @@
#include <linux/bitops.h>
#include <linux/rbtree_augmented.h>
#include <linux/overflow.h>
+#include <linux/random.h>

#include <linux/uaccess.h>
#include <asm/tlbflush.h>
@@ -1089,6 +1090,25 @@ adjust_va_to_fit_type(struct vmap_area *va,
return 0;
}

+static DEFINE_STATIC_KEY_FALSE_RO(randomize_vmalloc);
+
+static int __init set_randomize_vmalloc(char *str)
+{
+ int ret;
+ bool bool_result;
+
+ ret = kstrtobool(str, &bool_result);
+ if (ret)
+ return ret;
+
+ if (bool_result)
+ static_branch_enable(&randomize_vmalloc);
+ else
+ static_branch_disable(&randomize_vmalloc);
+ return 1;
+}
+__setup("randomize_vmalloc=", set_randomize_vmalloc);
+
/*
* Returns a start address of the newly allocated area, if success.
* Otherwise a vend is returned that indicates failure.
@@ -1162,7 +1182,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
int node, gfp_t gfp_mask)
{
struct vmap_area *va, *pva;
- unsigned long addr;
+ unsigned long addr, voffset;
int purged = 0;
int ret;

@@ -1217,11 +1237,24 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
if (pva && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva))
kmem_cache_free(vmap_area_cachep, pva);

+ /* Randomize allocation */
+ if (static_branch_unlikely(&randomize_vmalloc)) {
+ voffset = get_random_long() & (roundup_pow_of_two(vend - vstart) - 1);
+ voffset = PAGE_ALIGN(voffset);
+ if (voffset + size > vend - vstart)
+ voffset = vend - vstart - size;
+ } else
+ voffset = 0;
+
/*
* If an allocation fails, the "vend" address is
* returned. Therefore trigger the overflow path.
*/
- addr = __alloc_vmap_area(size, align, vstart, vend);
+ addr = __alloc_vmap_area(size, align, vstart + voffset, vend);
+
+ if (unlikely(addr == vend) && voffset)
+ /* Retry randomization from other end */
+ addr = __alloc_vmap_area(size, align, vstart, vstart + voffset + size);
spin_unlock(&free_vmap_area_lock);

if (unlikely(addr == vend))
@@ -3258,7 +3291,12 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
start = offsets[area];
end = start + sizes[area];

- va = pvm_find_va_enclose_addr(vmalloc_end);
+ if (static_branch_unlikely(&randomize_vmalloc))
+ va = pvm_find_va_enclose_addr(vmalloc_start +
+ (get_random_long() &
+ (roundup_pow_of_two(vmalloc_end - vmalloc_start) - 1)));
+ else
+ va = pvm_find_va_enclose_addr(vmalloc_end);
base = pvm_determine_end_from_reverse(&va, align) - end;

while (true) {
--
2.30.1


2021-03-14 17:26:19

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

> Memory mappings inside kernel allocated with vmalloc() are in
> predictable order and packed tightly toward the low addresses, except
> for per-cpu areas which start from top of the vmalloc area. With
> new kernel boot parameter 'randomize_vmalloc=1', the entire area is
> used randomly to make the allocations less predictable and harder to
> guess for attackers. Also module and BPF code locations get randomized
> (within their dedicated and rather small area though) and if
> CONFIG_VMAP_STACK is enabled, also kernel thread stack locations.
>
> On 32 bit systems this may cause problems due to increased VM
> fragmentation if the address space gets crowded.
>
> On all systems, it will reduce performance and increase memory and
> cache usage due to less efficient use of page tables and inability to
> merge adjacent VMAs with compatible attributes. On x86_64 with 5 level
> page tables, in the worst case, additional page table entries of up to
> 4 pages are created for each mapping, so with small mappings there's
> considerable penalty.
>
> Without randomize_vmalloc=1:
> $ grep -v kernel_clone /proc/vmallocinfo
> 0xffffc90000000000-0xffffc90000009000 36864 irq_init_percpu_irqstack+0x176/0x1c0 vmap
> 0xffffc90000009000-0xffffc9000000b000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe1000 ioremap
> 0xffffc9000000c000-0xffffc9000000f000 12288 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe0000 ioremap
> 0xffffc9000000f000-0xffffc90000011000 8192 hpet_enable+0x31/0x4a4 phys=0x00000000fed00000 ioremap
> 0xffffc90000011000-0xffffc90000013000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffc90000013000-0xffffc90000015000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffc90000015000-0xffffc90000017000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffc90000021000-0xffffc90000023000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffc90000023000-0xffffc90000025000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x00000000fed00000 ioremap
> 0xffffc90000025000-0xffffc90000027000 8192 memremap+0x19c/0x280 phys=0x00000000000f5000 ioremap
> 0xffffc90000031000-0xffffc90000036000 20480 pcpu_create_chunk+0xe8/0x260 pages=4 vmalloc
> 0xffffc90000043000-0xffffc90000047000 16384 n_tty_open+0x11/0xe0 pages=3 vmalloc
> 0xffffc90000211000-0xffffc90000232000 135168 crypto_scomp_init_tfm+0xc6/0xf0 pages=32 vmalloc
> 0xffffc90000232000-0xffffc90000253000 135168 crypto_scomp_init_tfm+0x67/0xf0 pages=32 vmalloc
> 0xffffc900005a9000-0xffffc900005ba000 69632 pcpu_create_chunk+0x7b/0x260 pages=16 vmalloc
> 0xffffc900005ba000-0xffffc900005cc000 73728 pcpu_create_chunk+0xb2/0x260 pages=17 vmalloc
> 0xffffe8ffffc00000-0xffffe8ffffe00000 2097152 pcpu_get_vm_areas+0x0/0x2290 vmalloc
>
> With randomize_vmalloc=1, the allocations are randomized:
> $ grep -v kernel_clone /proc/vmallocinfo
> 0xffffc9759d443000-0xffffc9759d445000 8192 hpet_enable+0x31/0x4a4 phys=0x00000000fed00000 ioremap
> 0xffffccf1e9f66000-0xffffccf1e9f68000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffcd2fc02a4000-0xffffcd2fc02a6000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffcdaefb898000-0xffffcdaefb89b000 12288 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe0000 ioremap
> 0xffffcef8074c3000-0xffffcef8074cc000 36864 irq_init_percpu_irqstack+0x176/0x1c0 vmap
> 0xffffcf725ca2e000-0xffffcf725ca4f000 135168 crypto_scomp_init_tfm+0xc6/0xf0 pages=32 vmalloc
> 0xffffd0efb25e1000-0xffffd0efb25f2000 69632 pcpu_create_chunk+0x7b/0x260 pages=16 vmalloc
> 0xffffd27054678000-0xffffd2705467c000 16384 n_tty_open+0x11/0xe0 pages=3 vmalloc
> 0xffffd2adf716e000-0xffffd2adf7180000 73728 pcpu_create_chunk+0xb2/0x260 pages=17 vmalloc
> 0xffffd4ba5fb6b000-0xffffd4ba5fb6d000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe1000 ioremap
> 0xffffded126192000-0xffffded126194000 8192 memremap+0x19c/0x280 phys=0x00000000000f5000 ioremap
> 0xffffe01a4dbcd000-0xffffe01a4dbcf000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffe4b649952000-0xffffe4b649954000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x00000000fed00000 ioremap
> 0xffffe71ed592a000-0xffffe71ed592c000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffe7dc5824f000-0xffffe7dc58270000 135168 crypto_scomp_init_tfm+0x67/0xf0 pages=32 vmalloc
> 0xffffe8f4f9800000-0xffffe8f4f9a00000 2097152 pcpu_get_vm_areas+0x0/0x2290 vmalloc
> 0xffffe8f4f9a19000-0xffffe8f4f9a1e000 20480 pcpu_create_chunk+0xe8/0x260 pages=4 vmalloc
>
> With CONFIG_VMAP_STACK, also kernel thread stacks are placed in
> vmalloc area and therefore they also get randomized (only one example
> line from /proc/vmallocinfo shown for brevity):
>
> unrandomized:
> 0xffffc90000018000-0xffffc90000021000 36864 kernel_clone+0xf9/0x560 pages=8 vmalloc
>
> randomized:
> 0xffffcb57611a8000-0xffffcb57611b1000 36864 kernel_clone+0xf9/0x560 pages=8 vmalloc
>
> CC: Andrew Morton <[email protected]>
> CC: Andy Lutomirski <[email protected]>
> CC: Jann Horn <[email protected]>
> CC: Kees Cook <[email protected]>
> CC: Linux API <[email protected]>
> CC: Matthew Wilcox <[email protected]>
> CC: Mike Rapoport <[email protected]>
> CC: Vlad Rezki <[email protected]>
> Signed-off-by: Topi Miettinen <[email protected]>
> ---
> v2: retry allocation from other end of vmalloc space in case of
> failure (Matthew Wilcox), improve commit message and documentation
> v3: randomize also percpu allocations (pcpu_get_vm_areas())
> v4: use static branches (Kees Cook) and make the parameter boolean.
> ---
> .../admin-guide/kernel-parameters.txt | 24 ++++++++++
> mm/vmalloc.c | 44 +++++++++++++++++--
> 2 files changed, 65 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index a10b545c2070..726aec542079 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4024,6 +4024,30 @@
>
> ramdisk_start= [RAM] RAM disk image start address
>
> + randomize_vmalloc= [KNL] Boolean option to randomize vmalloc()
> + allocations. When enabled, the entire
> + vmalloc() area is used randomly to make the
> + allocations less predictable and harder to
> + guess for attackers. Also module and BPF code
> + locations get randomized (within their
> + dedicated and rather small area though) and if
> + CONFIG_VMAP_STACK is enabled, also kernel
> + thread stack locations.
> +
> + On 32 bit systems this may cause problems due
> + to increased VM fragmentation if the address
> + space gets crowded.
>
What kind of problems? Could you please more cpecific? I guess fail
ratio will be increased.

> +
> + On all systems, it will reduce performance and
> + increase memory and cache usage due to less
> + efficient use of page tables and inability to
> + merge adjacent VMAs with compatible
> + attributes. On x86_64 with 5 level page
> + tables, in the worst case, additional page
> + table entries of up to 4 pages are created for
> + each mapping, so with small mappings there's
> + considerable penalty.
Could you please put test results to the commit message? You can run
"test_vmalloc.sh performance" on you system. It will give us some
figures to understand the performance difference.

> +
> random.trust_cpu={on,off}
> [KNL] Enable or disable trusting the use of the
> CPU's random number generator (if available) to
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e6f352bf0498..b5ecf27dc98e 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -34,6 +34,7 @@
> #include <linux/bitops.h>
> #include <linux/rbtree_augmented.h>
> #include <linux/overflow.h>
> +#include <linux/random.h>
>
> #include <linux/uaccess.h>
> #include <asm/tlbflush.h>
> @@ -1089,6 +1090,25 @@ adjust_va_to_fit_type(struct vmap_area *va,
> return 0;
> }
>
> +static DEFINE_STATIC_KEY_FALSE_RO(randomize_vmalloc);
> +
> +static int __init set_randomize_vmalloc(char *str)
> +{
> + int ret;
> + bool bool_result;
> +
> + ret = kstrtobool(str, &bool_result);
> + if (ret)
> + return ret;
> +
> + if (bool_result)
> + static_branch_enable(&randomize_vmalloc);
> + else
> + static_branch_disable(&randomize_vmalloc);
> + return 1;
> +}
> +__setup("randomize_vmalloc=", set_randomize_vmalloc);
> +
> /*
> * Returns a start address of the newly allocated area, if success.
> * Otherwise a vend is returned that indicates failure.
> @@ -1162,7 +1182,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
> int node, gfp_t gfp_mask)
> {
> struct vmap_area *va, *pva;
> - unsigned long addr;
> + unsigned long addr, voffset;
> int purged = 0;
> int ret;
>
> @@ -1217,11 +1237,24 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
> if (pva && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva))
> kmem_cache_free(vmap_area_cachep, pva);
>
> + /* Randomize allocation */
> + if (static_branch_unlikely(&randomize_vmalloc)) {
> + voffset = get_random_long() & (roundup_pow_of_two(vend - vstart) - 1);
> + voffset = PAGE_ALIGN(voffset);
> + if (voffset + size > vend - vstart)
> + voffset = vend - vstart - size;
> + } else
> + voffset = 0;
> +
Could you please wrap that change into a separate function? For example
randomize_voffset_with_range(start, end).

> /*
> * If an allocation fails, the "vend" address is
> * returned. Therefore trigger the overflow path.
> */
> - addr = __alloc_vmap_area(size, align, vstart, vend);
> + addr = __alloc_vmap_area(size, align, vstart + voffset, vend);
> +
> + if (unlikely(addr == vend) && voffset)
> + /* Retry randomization from other end */
> + addr = __alloc_vmap_area(size, align, vstart, vstart + voffset + size);
> spin_unlock(&free_vmap_area_lock);
>
> if (unlikely(addr == vend))
> @@ -3258,7 +3291,12 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
> start = offsets[area];
> end = start + sizes[area];
>
> - va = pvm_find_va_enclose_addr(vmalloc_end);
> + if (static_branch_unlikely(&randomize_vmalloc))
> + va = pvm_find_va_enclose_addr(vmalloc_start +
> + (get_random_long() &
> + (roundup_pow_of_two(vmalloc_end - vmalloc_start) - 1)));
> + else
> + va = pvm_find_va_enclose_addr(vmalloc_end);
> base = pvm_determine_end_from_reverse(&va, align) - end;
As for per-cpu embedded alloator. Even though currently it is part of
the vmalloc space, it is not a vmalloc() allocation. Please do not change
its code. It does alloations by "chunks" where an internal structure
represent special memory layout that is used for actual allocations.


Also, using vmaloc test driver i can trigger a kernel BUG:

<snip>
[ 24.627577] kernel BUG at mm/vmalloc.c:1272!
[ 24.628645] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 24.628684] CPU: 30 PID: 929 Comm: vmalloc_test/0 Tainted: G E 5.11.0-next-20210225-next #484
[ 24.628724] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 24.628757] RIP: 0010:alloc_vmap_area.isra.53+0x593/0xf10
[ 24.628784] Code: 41 5d 41 5e 41 5f c3 0f 0b 0f 0b 48 c7 44 24 10 f0 ff ff ff eb c9 48 8d 5a f0 e9 9c fc ff ff 48 c7 44 24 10 f4 ff ff ff eb b5 <0f> 0b 4c 8d 4b 10 48 39 d0 74 12 48 8b 44 24 18 31 ff 48 89 03 48
[ 24.628853] RSP: 0018:ffffc4296cf67d38 EFLAGS: 00010206
[ 24.628876] RAX: ffffd6db9e61a000 RBX: ffff8ae9c9309440 RCX: 0000000000000001
[ 24.628905] RDX: 0000000080000001 RSI: ffff8ae9c0046be8 RDI: 00000000ffffffff
[ 24.628933] RBP: 0000000000002000 R08: ffff8ae9c13699e8 R09: ffffb98000000000
[ 24.628961] R10: ffffd6db9e61a000 R11: 000000003aa1c801 R12: ffff8ae9c9f0d280
[ 24.628989] R13: 0000008000001fff R14: ffffff8000000000 R15: 0000007fffffffff
[ 24.629019] FS: 0000000000000000(0000) GS:ffff8af8bf580000(0000) knlGS:0000000000000000
[ 24.629051] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 24.629074] CR2: 000055916370aa80 CR3: 00000006bf40a000 CR4: 00000000000006e0
[ 24.629103] Call Trace:
[ 24.629128] ? map_kernel_range_noflush+0x27a/0x360
[ 24.629150] ? kmem_cache_alloc_trace+0x340/0x460
[ 24.629172] __get_vm_area_node.isra.54+0xa7/0x150
[ 24.629195] ? fix_size_alloc_test+0x50/0x50 [test_vmalloc]
[ 24.629221] __vmalloc_node_range+0x64/0x230
[ 24.629241] ? test_func+0xdb/0x1f0 [test_vmalloc]
[ 24.629263] ? fix_size_alloc_test+0x50/0x50 [test_vmalloc]
[ 24.629288] __vmalloc_node+0x3b/0x40
[ 24.629305] ? test_func+0xdb/0x1f0 [test_vmalloc]
[ 24.629326] align_shift_alloc_test+0x39/0x50 [test_vmalloc]
[ 24.629350] test_func+0xdb/0x1f0 [test_vmalloc]
[ 24.629372] ? fix_align_alloc_test+0x50/0x50 [test_vmalloc]
[ 24.629396] kthread+0x13d/0x160
[ 24.629413] ? kthread_park+0x80/0x80
[ 24.629431] ret_from_fork+0x22/0x30
<snip>

--
Vlad Rezki

2021-03-15 09:06:58

by Topi Miettinen

[permalink] [raw]
Subject: Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

On 14.3.2021 19.23, Uladzislau Rezki wrote:
>> Memory mappings inside kernel allocated with vmalloc() are in
>> predictable order and packed tightly toward the low addresses, except
>> for per-cpu areas which start from top of the vmalloc area. With
>> new kernel boot parameter 'randomize_vmalloc=1', the entire area is
>> used randomly to make the allocations less predictable and harder to
>> guess for attackers. Also module and BPF code locations get randomized
>> (within their dedicated and rather small area though) and if
>> CONFIG_VMAP_STACK is enabled, also kernel thread stack locations.
>>
>> On 32 bit systems this may cause problems due to increased VM
>> fragmentation if the address space gets crowded.
>>
>> On all systems, it will reduce performance and increase memory and
>> cache usage due to less efficient use of page tables and inability to
>> merge adjacent VMAs with compatible attributes. On x86_64 with 5 level
>> page tables, in the worst case, additional page table entries of up to
>> 4 pages are created for each mapping, so with small mappings there's
>> considerable penalty.
>>
>> Without randomize_vmalloc=1:
>> $ grep -v kernel_clone /proc/vmallocinfo
>> 0xffffc90000000000-0xffffc90000009000 36864 irq_init_percpu_irqstack+0x176/0x1c0 vmap
>> 0xffffc90000009000-0xffffc9000000b000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe1000 ioremap
>> 0xffffc9000000c000-0xffffc9000000f000 12288 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe0000 ioremap
>> 0xffffc9000000f000-0xffffc90000011000 8192 hpet_enable+0x31/0x4a4 phys=0x00000000fed00000 ioremap
>> 0xffffc90000011000-0xffffc90000013000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
>> 0xffffc90000013000-0xffffc90000015000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
>> 0xffffc90000015000-0xffffc90000017000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
>> 0xffffc90000021000-0xffffc90000023000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
>> 0xffffc90000023000-0xffffc90000025000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x00000000fed00000 ioremap
>> 0xffffc90000025000-0xffffc90000027000 8192 memremap+0x19c/0x280 phys=0x00000000000f5000 ioremap
>> 0xffffc90000031000-0xffffc90000036000 20480 pcpu_create_chunk+0xe8/0x260 pages=4 vmalloc
>> 0xffffc90000043000-0xffffc90000047000 16384 n_tty_open+0x11/0xe0 pages=3 vmalloc
>> 0xffffc90000211000-0xffffc90000232000 135168 crypto_scomp_init_tfm+0xc6/0xf0 pages=32 vmalloc
>> 0xffffc90000232000-0xffffc90000253000 135168 crypto_scomp_init_tfm+0x67/0xf0 pages=32 vmalloc
>> 0xffffc900005a9000-0xffffc900005ba000 69632 pcpu_create_chunk+0x7b/0x260 pages=16 vmalloc
>> 0xffffc900005ba000-0xffffc900005cc000 73728 pcpu_create_chunk+0xb2/0x260 pages=17 vmalloc
>> 0xffffe8ffffc00000-0xffffe8ffffe00000 2097152 pcpu_get_vm_areas+0x0/0x2290 vmalloc
>>
>> With randomize_vmalloc=1, the allocations are randomized:
>> $ grep -v kernel_clone /proc/vmallocinfo
>> 0xffffc9759d443000-0xffffc9759d445000 8192 hpet_enable+0x31/0x4a4 phys=0x00000000fed00000 ioremap
>> 0xffffccf1e9f66000-0xffffccf1e9f68000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
>> 0xffffcd2fc02a4000-0xffffcd2fc02a6000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
>> 0xffffcdaefb898000-0xffffcdaefb89b000 12288 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe0000 ioremap
>> 0xffffcef8074c3000-0xffffcef8074cc000 36864 irq_init_percpu_irqstack+0x176/0x1c0 vmap
>> 0xffffcf725ca2e000-0xffffcf725ca4f000 135168 crypto_scomp_init_tfm+0xc6/0xf0 pages=32 vmalloc
>> 0xffffd0efb25e1000-0xffffd0efb25f2000 69632 pcpu_create_chunk+0x7b/0x260 pages=16 vmalloc
>> 0xffffd27054678000-0xffffd2705467c000 16384 n_tty_open+0x11/0xe0 pages=3 vmalloc
>> 0xffffd2adf716e000-0xffffd2adf7180000 73728 pcpu_create_chunk+0xb2/0x260 pages=17 vmalloc
>> 0xffffd4ba5fb6b000-0xffffd4ba5fb6d000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe1000 ioremap
>> 0xffffded126192000-0xffffded126194000 8192 memremap+0x19c/0x280 phys=0x00000000000f5000 ioremap
>> 0xffffe01a4dbcd000-0xffffe01a4dbcf000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
>> 0xffffe4b649952000-0xffffe4b649954000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x00000000fed00000 ioremap
>> 0xffffe71ed592a000-0xffffe71ed592c000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
>> 0xffffe7dc5824f000-0xffffe7dc58270000 135168 crypto_scomp_init_tfm+0x67/0xf0 pages=32 vmalloc
>> 0xffffe8f4f9800000-0xffffe8f4f9a00000 2097152 pcpu_get_vm_areas+0x0/0x2290 vmalloc
>> 0xffffe8f4f9a19000-0xffffe8f4f9a1e000 20480 pcpu_create_chunk+0xe8/0x260 pages=4 vmalloc
>>
>> With CONFIG_VMAP_STACK, also kernel thread stacks are placed in
>> vmalloc area and therefore they also get randomized (only one example
>> line from /proc/vmallocinfo shown for brevity):
>>
>> unrandomized:
>> 0xffffc90000018000-0xffffc90000021000 36864 kernel_clone+0xf9/0x560 pages=8 vmalloc
>>
>> randomized:
>> 0xffffcb57611a8000-0xffffcb57611b1000 36864 kernel_clone+0xf9/0x560 pages=8 vmalloc
>>
>> CC: Andrew Morton <[email protected]>
>> CC: Andy Lutomirski <[email protected]>
>> CC: Jann Horn <[email protected]>
>> CC: Kees Cook <[email protected]>
>> CC: Linux API <[email protected]>
>> CC: Matthew Wilcox <[email protected]>
>> CC: Mike Rapoport <[email protected]>
>> CC: Vlad Rezki <[email protected]>
>> Signed-off-by: Topi Miettinen <[email protected]>
>> ---
>> v2: retry allocation from other end of vmalloc space in case of
>> failure (Matthew Wilcox), improve commit message and documentation
>> v3: randomize also percpu allocations (pcpu_get_vm_areas())
>> v4: use static branches (Kees Cook) and make the parameter boolean.
>> ---
>> .../admin-guide/kernel-parameters.txt | 24 ++++++++++
>> mm/vmalloc.c | 44 +++++++++++++++++--
>> 2 files changed, 65 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index a10b545c2070..726aec542079 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -4024,6 +4024,30 @@
>>
>> ramdisk_start= [RAM] RAM disk image start address
>>
>> + randomize_vmalloc= [KNL] Boolean option to randomize vmalloc()
>> + allocations. When enabled, the entire
>> + vmalloc() area is used randomly to make the
>> + allocations less predictable and harder to
>> + guess for attackers. Also module and BPF code
>> + locations get randomized (within their
>> + dedicated and rather small area though) and if
>> + CONFIG_VMAP_STACK is enabled, also kernel
>> + thread stack locations.
>> +
>> + On 32 bit systems this may cause problems due
>> + to increased VM fragmentation if the address
>> + space gets crowded.
>>
> What kind of problems? Could you please more cpecific? I guess fail
> ratio will be increased.

Maybe it would be simplest to disable the option entirely for 32 bit
architectures. I think if such a system has lots of RAM (maybe 4GB?), so
that the kernel VM area (1GB?) can be fully utilized, it's possible that
when the area is allocated randomly, the gaps between allocations can
become unusable for further allocations. Perhaps the option could still
work with a system on the low end (64MB?) because any allocations (for
example page tables needed) consume real RAM so the fragmentation does
not become the bottleneck.

>
>> +
>> + On all systems, it will reduce performance and
>> + increase memory and cache usage due to less
>> + efficient use of page tables and inability to
>> + merge adjacent VMAs with compatible
>> + attributes. On x86_64 with 5 level page
>> + tables, in the worst case, additional page
>> + table entries of up to 4 pages are created for
>> + each mapping, so with small mappings there's
>> + considerable penalty.
> Could you please put test results to the commit message? You can run
> "test_vmalloc.sh performance" on you system. It will give us some
> figures to understand the performance difference.

Thanks, I didn't notice this tool.

>
>> +
>> random.trust_cpu={on,off}
>> [KNL] Enable or disable trusting the use of the
>> CPU's random number generator (if available) to
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index e6f352bf0498..b5ecf27dc98e 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -34,6 +34,7 @@
>> #include <linux/bitops.h>
>> #include <linux/rbtree_augmented.h>
>> #include <linux/overflow.h>
>> +#include <linux/random.h>
>>
>> #include <linux/uaccess.h>
>> #include <asm/tlbflush.h>
>> @@ -1089,6 +1090,25 @@ adjust_va_to_fit_type(struct vmap_area *va,
>> return 0;
>> }
>>
>> +static DEFINE_STATIC_KEY_FALSE_RO(randomize_vmalloc);
>> +
>> +static int __init set_randomize_vmalloc(char *str)
>> +{
>> + int ret;
>> + bool bool_result;
>> +
>> + ret = kstrtobool(str, &bool_result);
>> + if (ret)
>> + return ret;
>> +
>> + if (bool_result)
>> + static_branch_enable(&randomize_vmalloc);
>> + else
>> + static_branch_disable(&randomize_vmalloc);
>> + return 1;
>> +}
>> +__setup("randomize_vmalloc=", set_randomize_vmalloc);
>> +
>> /*
>> * Returns a start address of the newly allocated area, if success.
>> * Otherwise a vend is returned that indicates failure.
>> @@ -1162,7 +1182,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>> int node, gfp_t gfp_mask)
>> {
>> struct vmap_area *va, *pva;
>> - unsigned long addr;
>> + unsigned long addr, voffset;
>> int purged = 0;
>> int ret;
>>
>> @@ -1217,11 +1237,24 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>> if (pva && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva))
>> kmem_cache_free(vmap_area_cachep, pva);
>>
>> + /* Randomize allocation */
>> + if (static_branch_unlikely(&randomize_vmalloc)) {
>> + voffset = get_random_long() & (roundup_pow_of_two(vend - vstart) - 1);
>> + voffset = PAGE_ALIGN(voffset);
>> + if (voffset + size > vend - vstart)
>> + voffset = vend - vstart - size;
>> + } else
>> + voffset = 0;
>> +
> Could you please wrap that change into a separate function? For example
> randomize_voffset_with_range(start, end).

OK. I'm relying on compiler noticing that voffset is zero (in some
cases) and then it can skip the addition. Maybe the addition could be
avoided entirely.

>
>> /*
>> * If an allocation fails, the "vend" address is
>> * returned. Therefore trigger the overflow path.
>> */
>> - addr = __alloc_vmap_area(size, align, vstart, vend);
>> + addr = __alloc_vmap_area(size, align, vstart + voffset, vend);
>> +
>> + if (unlikely(addr == vend) && voffset)
>> + /* Retry randomization from other end */
>> + addr = __alloc_vmap_area(size, align, vstart, vstart + voffset + size);
>> spin_unlock(&free_vmap_area_lock);
>>
>> if (unlikely(addr == vend))
>> @@ -3258,7 +3291,12 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>> start = offsets[area];
>> end = start + sizes[area];
>>
>> - va = pvm_find_va_enclose_addr(vmalloc_end);
>> + if (static_branch_unlikely(&randomize_vmalloc))
>> + va = pvm_find_va_enclose_addr(vmalloc_start +
>> + (get_random_long() &
>> + (roundup_pow_of_two(vmalloc_end - vmalloc_start) - 1)));
>> + else
>> + va = pvm_find_va_enclose_addr(vmalloc_end);
>> base = pvm_determine_end_from_reverse(&va, align) - end;
> As for per-cpu embedded alloator. Even though currently it is part of
> the vmalloc space, it is not a vmalloc() allocation. Please do not change
> its code. It does alloations by "chunks" where an internal structure
> represent special memory layout that is used for actual allocations.

What's the problem with that? It seems to me that nothing relies on
specific addresses of the chunks, so it should be possible to randomize
these too. Also the alignment is honored.

>
>
> Also, using vmaloc test driver i can trigger a kernel BUG:
>
> <snip>
> [ 24.627577] kernel BUG at mm/vmalloc.c:1272!
> [ 24.628645] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [ 24.628684] CPU: 30 PID: 929 Comm: vmalloc_test/0 Tainted: G E 5.11.0-next-20210225-next #484
> [ 24.628724] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> [ 24.628757] RIP: 0010:alloc_vmap_area.isra.53+0x593/0xf10
> [ 24.628784] Code: 41 5d 41 5e 41 5f c3 0f 0b 0f 0b 48 c7 44 24 10 f0 ff ff ff eb c9 48 8d 5a f0 e9 9c fc ff ff 48 c7 44 24 10 f4 ff ff ff eb b5 <0f> 0b 4c 8d 4b 10 48 39 d0 74 12 48 8b 44 24 18 31 ff 48 89 03 48
> [ 24.628853] RSP: 0018:ffffc4296cf67d38 EFLAGS: 00010206
> [ 24.628876] RAX: ffffd6db9e61a000 RBX: ffff8ae9c9309440 RCX: 0000000000000001
> [ 24.628905] RDX: 0000000080000001 RSI: ffff8ae9c0046be8 RDI: 00000000ffffffff
> [ 24.628933] RBP: 0000000000002000 R08: ffff8ae9c13699e8 R09: ffffb98000000000
> [ 24.628961] R10: ffffd6db9e61a000 R11: 000000003aa1c801 R12: ffff8ae9c9f0d280
> [ 24.628989] R13: 0000008000001fff R14: ffffff8000000000 R15: 0000007fffffffff
> [ 24.629019] FS: 0000000000000000(0000) GS:ffff8af8bf580000(0000) knlGS:0000000000000000
> [ 24.629051] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 24.629074] CR2: 000055916370aa80 CR3: 00000006bf40a000 CR4: 00000000000006e0
> [ 24.629103] Call Trace:
> [ 24.629128] ? map_kernel_range_noflush+0x27a/0x360
> [ 24.629150] ? kmem_cache_alloc_trace+0x340/0x460
> [ 24.629172] __get_vm_area_node.isra.54+0xa7/0x150
> [ 24.629195] ? fix_size_alloc_test+0x50/0x50 [test_vmalloc]
> [ 24.629221] __vmalloc_node_range+0x64/0x230
> [ 24.629241] ? test_func+0xdb/0x1f0 [test_vmalloc]
> [ 24.629263] ? fix_size_alloc_test+0x50/0x50 [test_vmalloc]
> [ 24.629288] __vmalloc_node+0x3b/0x40
> [ 24.629305] ? test_func+0xdb/0x1f0 [test_vmalloc]
> [ 24.629326] align_shift_alloc_test+0x39/0x50 [test_vmalloc]
> [ 24.629350] test_func+0xdb/0x1f0 [test_vmalloc]
> [ 24.629372] ? fix_align_alloc_test+0x50/0x50 [test_vmalloc]
> [ 24.629396] kthread+0x13d/0x160
> [ 24.629413] ? kthread_park+0x80/0x80
> [ 24.629431] ret_from_fork+0x22/0x30
> <snip>
>
> --
> Vlad Rezki
>

Hopefully I can also reproduce this with test driver. Thanks for testing!

-Topi

2021-03-15 11:52:00

by Topi Miettinen

[permalink] [raw]
Subject: Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

On 14.3.2021 19.23, Uladzislau Rezki wrote:
> Also, using vmaloc test driver i can trigger a kernel BUG:
>
> <snip>
> [ 24.627577] kernel BUG at mm/vmalloc.c:1272!

It seems that most tests indeed fail. Perhaps the vmalloc subsystem
isn't very robust in face of fragmented virtual memory. What could be
done to fix that?

In this patch, I could retry __alloc_vmap_area() with the whole region
after failure of both [random, vend] and [vstart, random] but I'm not
sure that would help much. Worth a try of course.

By the way, some of the tests in test_vmalloc.c don't check for
vmalloc() failure, for example in full_fit_alloc_test().

-Topi

2021-03-15 12:26:12

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

On Mon, Mar 15, 2021 at 11:04:42AM +0200, Topi Miettinen wrote:
> On 14.3.2021 19.23, Uladzislau Rezki wrote:
> > > Memory mappings inside kernel allocated with vmalloc() are in
> > > predictable order and packed tightly toward the low addresses, except
> > > for per-cpu areas which start from top of the vmalloc area. With
> > > new kernel boot parameter 'randomize_vmalloc=1', the entire area is
> > > used randomly to make the allocations less predictable and harder to
> > > guess for attackers. Also module and BPF code locations get randomized
> > > (within their dedicated and rather small area though) and if
> > > CONFIG_VMAP_STACK is enabled, also kernel thread stack locations.
> > >
> > > On 32 bit systems this may cause problems due to increased VM
> > > fragmentation if the address space gets crowded.
> > >
> > > On all systems, it will reduce performance and increase memory and
> > > cache usage due to less efficient use of page tables and inability to
> > > merge adjacent VMAs with compatible attributes. On x86_64 with 5 level
> > > page tables, in the worst case, additional page table entries of up to
> > > 4 pages are created for each mapping, so with small mappings there's
> > > considerable penalty.
> > >
> > > Without randomize_vmalloc=1:
> > > $ grep -v kernel_clone /proc/vmallocinfo
> > > 0xffffc90000000000-0xffffc90000009000 36864 irq_init_percpu_irqstack+0x176/0x1c0 vmap
> > > 0xffffc90000009000-0xffffc9000000b000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe1000 ioremap
> > > 0xffffc9000000c000-0xffffc9000000f000 12288 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe0000 ioremap
> > > 0xffffc9000000f000-0xffffc90000011000 8192 hpet_enable+0x31/0x4a4 phys=0x00000000fed00000 ioremap
> > > 0xffffc90000011000-0xffffc90000013000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> > > 0xffffc90000013000-0xffffc90000015000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> > > 0xffffc90000015000-0xffffc90000017000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> > > 0xffffc90000021000-0xffffc90000023000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> > > 0xffffc90000023000-0xffffc90000025000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x00000000fed00000 ioremap
> > > 0xffffc90000025000-0xffffc90000027000 8192 memremap+0x19c/0x280 phys=0x00000000000f5000 ioremap
> > > 0xffffc90000031000-0xffffc90000036000 20480 pcpu_create_chunk+0xe8/0x260 pages=4 vmalloc
> > > 0xffffc90000043000-0xffffc90000047000 16384 n_tty_open+0x11/0xe0 pages=3 vmalloc
> > > 0xffffc90000211000-0xffffc90000232000 135168 crypto_scomp_init_tfm+0xc6/0xf0 pages=32 vmalloc
> > > 0xffffc90000232000-0xffffc90000253000 135168 crypto_scomp_init_tfm+0x67/0xf0 pages=32 vmalloc
> > > 0xffffc900005a9000-0xffffc900005ba000 69632 pcpu_create_chunk+0x7b/0x260 pages=16 vmalloc
> > > 0xffffc900005ba000-0xffffc900005cc000 73728 pcpu_create_chunk+0xb2/0x260 pages=17 vmalloc
> > > 0xffffe8ffffc00000-0xffffe8ffffe00000 2097152 pcpu_get_vm_areas+0x0/0x2290 vmalloc
> > >
> > > With randomize_vmalloc=1, the allocations are randomized:
> > > $ grep -v kernel_clone /proc/vmallocinfo
> > > 0xffffc9759d443000-0xffffc9759d445000 8192 hpet_enable+0x31/0x4a4 phys=0x00000000fed00000 ioremap
> > > 0xffffccf1e9f66000-0xffffccf1e9f68000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> > > 0xffffcd2fc02a4000-0xffffcd2fc02a6000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> > > 0xffffcdaefb898000-0xffffcdaefb89b000 12288 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe0000 ioremap
> > > 0xffffcef8074c3000-0xffffcef8074cc000 36864 irq_init_percpu_irqstack+0x176/0x1c0 vmap
> > > 0xffffcf725ca2e000-0xffffcf725ca4f000 135168 crypto_scomp_init_tfm+0xc6/0xf0 pages=32 vmalloc
> > > 0xffffd0efb25e1000-0xffffd0efb25f2000 69632 pcpu_create_chunk+0x7b/0x260 pages=16 vmalloc
> > > 0xffffd27054678000-0xffffd2705467c000 16384 n_tty_open+0x11/0xe0 pages=3 vmalloc
> > > 0xffffd2adf716e000-0xffffd2adf7180000 73728 pcpu_create_chunk+0xb2/0x260 pages=17 vmalloc
> > > 0xffffd4ba5fb6b000-0xffffd4ba5fb6d000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe1000 ioremap
> > > 0xffffded126192000-0xffffded126194000 8192 memremap+0x19c/0x280 phys=0x00000000000f5000 ioremap
> > > 0xffffe01a4dbcd000-0xffffe01a4dbcf000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> > > 0xffffe4b649952000-0xffffe4b649954000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x00000000fed00000 ioremap
> > > 0xffffe71ed592a000-0xffffe71ed592c000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> > > 0xffffe7dc5824f000-0xffffe7dc58270000 135168 crypto_scomp_init_tfm+0x67/0xf0 pages=32 vmalloc
> > > 0xffffe8f4f9800000-0xffffe8f4f9a00000 2097152 pcpu_get_vm_areas+0x0/0x2290 vmalloc
> > > 0xffffe8f4f9a19000-0xffffe8f4f9a1e000 20480 pcpu_create_chunk+0xe8/0x260 pages=4 vmalloc
> > >
> > > With CONFIG_VMAP_STACK, also kernel thread stacks are placed in
> > > vmalloc area and therefore they also get randomized (only one example
> > > line from /proc/vmallocinfo shown for brevity):
> > >
> > > unrandomized:
> > > 0xffffc90000018000-0xffffc90000021000 36864 kernel_clone+0xf9/0x560 pages=8 vmalloc
> > >
> > > randomized:
> > > 0xffffcb57611a8000-0xffffcb57611b1000 36864 kernel_clone+0xf9/0x560 pages=8 vmalloc
> > >
> > > CC: Andrew Morton <[email protected]>
> > > CC: Andy Lutomirski <[email protected]>
> > > CC: Jann Horn <[email protected]>
> > > CC: Kees Cook <[email protected]>
> > > CC: Linux API <[email protected]>
> > > CC: Matthew Wilcox <[email protected]>
> > > CC: Mike Rapoport <[email protected]>
> > > CC: Vlad Rezki <[email protected]>
> > > Signed-off-by: Topi Miettinen <[email protected]>
> > > ---
> > > v2: retry allocation from other end of vmalloc space in case of
> > > failure (Matthew Wilcox), improve commit message and documentation
> > > v3: randomize also percpu allocations (pcpu_get_vm_areas())
> > > v4: use static branches (Kees Cook) and make the parameter boolean.
> > > ---
> > > .../admin-guide/kernel-parameters.txt | 24 ++++++++++
> > > mm/vmalloc.c | 44 +++++++++++++++++--
> > > 2 files changed, 65 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > index a10b545c2070..726aec542079 100644
> > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > @@ -4024,6 +4024,30 @@
> > > ramdisk_start= [RAM] RAM disk image start address
> > > + randomize_vmalloc= [KNL] Boolean option to randomize vmalloc()
> > > + allocations. When enabled, the entire
> > > + vmalloc() area is used randomly to make the
> > > + allocations less predictable and harder to
> > > + guess for attackers. Also module and BPF code
> > > + locations get randomized (within their
> > > + dedicated and rather small area though) and if
> > > + CONFIG_VMAP_STACK is enabled, also kernel
> > > + thread stack locations.
> > > +
> > > + On 32 bit systems this may cause problems due
> > > + to increased VM fragmentation if the address
> > > + space gets crowded.
> > >
> > What kind of problems? Could you please more cpecific? I guess fail
> > ratio will be increased.
>
> Maybe it would be simplest to disable the option entirely for 32 bit
> architectures. I think if such a system has lots of RAM (maybe 4GB?), so
> that the kernel VM area (1GB?) can be fully utilized, it's possible that
> when the area is allocated randomly, the gaps between allocations can become
> unusable for further allocations. Perhaps the option could still work with a
> system on the low end (64MB?) because any allocations (for example page
> tables needed) consume real RAM so the fragmentation does not become the
> bottleneck.
>
Probably it make sense to disable it for 32-bit system. I suspect that
due to such randomization a fail ratio will be increased and vmalloc()
will start failing a lot.

> >
> > > +
> > > + On all systems, it will reduce performance and
> > > + increase memory and cache usage due to less
> > > + efficient use of page tables and inability to
> > > + merge adjacent VMAs with compatible
> > > + attributes. On x86_64 with 5 level page
> > > + tables, in the worst case, additional page
> > > + table entries of up to 4 pages are created for
> > > + each mapping, so with small mappings there's
> > > + considerable penalty.
> > Could you please put test results to the commit message? You can run
> > "test_vmalloc.sh performance" on you system. It will give us some
> > figures to understand the performance difference.
>
> Thanks, I didn't notice this tool.
>
You are welcome :)

> >
> > > +
> > > random.trust_cpu={on,off}
> > > [KNL] Enable or disable trusting the use of the
> > > CPU's random number generator (if available) to
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index e6f352bf0498..b5ecf27dc98e 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -34,6 +34,7 @@
> > > #include <linux/bitops.h>
> > > #include <linux/rbtree_augmented.h>
> > > #include <linux/overflow.h>
> > > +#include <linux/random.h>
> > > #include <linux/uaccess.h>
> > > #include <asm/tlbflush.h>
> > > @@ -1089,6 +1090,25 @@ adjust_va_to_fit_type(struct vmap_area *va,
> > > return 0;
> > > }
> > > +static DEFINE_STATIC_KEY_FALSE_RO(randomize_vmalloc);
> > > +
> > > +static int __init set_randomize_vmalloc(char *str)
> > > +{
> > > + int ret;
> > > + bool bool_result;
> > > +
> > > + ret = kstrtobool(str, &bool_result);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + if (bool_result)
> > > + static_branch_enable(&randomize_vmalloc);
> > > + else
> > > + static_branch_disable(&randomize_vmalloc);
> > > + return 1;
> > > +}
> > > +__setup("randomize_vmalloc=", set_randomize_vmalloc);
> > > +
> > > /*
> > > * Returns a start address of the newly allocated area, if success.
> > > * Otherwise a vend is returned that indicates failure.
> > > @@ -1162,7 +1182,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
> > > int node, gfp_t gfp_mask)
> > > {
> > > struct vmap_area *va, *pva;
> > > - unsigned long addr;
> > > + unsigned long addr, voffset;
> > > int purged = 0;
> > > int ret;
> > > @@ -1217,11 +1237,24 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
> > > if (pva && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva))
> > > kmem_cache_free(vmap_area_cachep, pva);
> > > + /* Randomize allocation */
> > > + if (static_branch_unlikely(&randomize_vmalloc)) {
> > > + voffset = get_random_long() & (roundup_pow_of_two(vend - vstart) - 1);
> > > + voffset = PAGE_ALIGN(voffset);
> > > + if (voffset + size > vend - vstart)
> > > + voffset = vend - vstart - size;
> > > + } else
> > > + voffset = 0;
> > > +
> > Could you please wrap that change into a separate function? For example
> > randomize_voffset_with_range(start, end).
>
> OK. I'm relying on compiler noticing that voffset is zero (in some cases)
> and then it can skip the addition. Maybe the addition could be avoided
> entirely.
>
It is just better to place it into separate function to make the code
less clogged. Also, it should be called outside of the lock. There is
no need to calculate and offset under holding free_vmap_area_lock.

> >
> > > /*
> > > * If an allocation fails, the "vend" address is
> > > * returned. Therefore trigger the overflow path.
> > > */
> > > - addr = __alloc_vmap_area(size, align, vstart, vend);
> > > + addr = __alloc_vmap_area(size, align, vstart + voffset, vend);
> > > +
> > > + if (unlikely(addr == vend) && voffset)
> > > + /* Retry randomization from other end */
> > > + addr = __alloc_vmap_area(size, align, vstart, vstart + voffset + size);
> > > spin_unlock(&free_vmap_area_lock);
> > > if (unlikely(addr == vend))
> > > @@ -3258,7 +3291,12 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
> > > start = offsets[area];
> > > end = start + sizes[area];
> > > - va = pvm_find_va_enclose_addr(vmalloc_end);
> > > + if (static_branch_unlikely(&randomize_vmalloc))
> > > + va = pvm_find_va_enclose_addr(vmalloc_start +
> > > + (get_random_long() &
> > > + (roundup_pow_of_two(vmalloc_end - vmalloc_start) - 1)));
> > > + else
> > > + va = pvm_find_va_enclose_addr(vmalloc_end);
> > > base = pvm_determine_end_from_reverse(&va, align) - end;
> > As for per-cpu embedded alloator. Even though currently it is part of
> > the vmalloc space, it is not a vmalloc() allocation. Please do not change
> > its code. It does alloations by "chunks" where an internal structure
> > represent special memory layout that is used for actual allocations.
>
> What's the problem with that? It seems to me that nothing relies on specific
> addresses of the chunks, so it should be possible to randomize these too.
> Also the alignment is honored.
>
My concern are:

- it is not a vmalloc allocator;
- per-cpu allocator allocates chunks, thus it might be it happens only once. It does not allocate it often;
- changing it will likely introduce issues you are not aware of;
- it is not supposed to be interacting with vmalloc allocator. Read the
comment under pcpu_get_vm_areas();

Therefore i propose just not touch it.

> >
> >
> > Also, using vmaloc test driver i can trigger a kernel BUG:
> >
> > <snip>
> > [ 24.627577] kernel BUG at mm/vmalloc.c:1272!
> > [ 24.628645] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > [ 24.628684] CPU: 30 PID: 929 Comm: vmalloc_test/0 Tainted: G E 5.11.0-next-20210225-next #484
> > [ 24.628724] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> > [ 24.628757] RIP: 0010:alloc_vmap_area.isra.53+0x593/0xf10
> > [ 24.628784] Code: 41 5d 41 5e 41 5f c3 0f 0b 0f 0b 48 c7 44 24 10 f0 ff ff ff eb c9 48 8d 5a f0 e9 9c fc ff ff 48 c7 44 24 10 f4 ff ff ff eb b5 <0f> 0b 4c 8d 4b 10 48 39 d0 74 12 48 8b 44 24 18 31 ff 48 89 03 48
> > [ 24.628853] RSP: 0018:ffffc4296cf67d38 EFLAGS: 00010206
> > [ 24.628876] RAX: ffffd6db9e61a000 RBX: ffff8ae9c9309440 RCX: 0000000000000001
> > [ 24.628905] RDX: 0000000080000001 RSI: ffff8ae9c0046be8 RDI: 00000000ffffffff
> > [ 24.628933] RBP: 0000000000002000 R08: ffff8ae9c13699e8 R09: ffffb98000000000
> > [ 24.628961] R10: ffffd6db9e61a000 R11: 000000003aa1c801 R12: ffff8ae9c9f0d280
> > [ 24.628989] R13: 0000008000001fff R14: ffffff8000000000 R15: 0000007fffffffff
> > [ 24.629019] FS: 0000000000000000(0000) GS:ffff8af8bf580000(0000) knlGS:0000000000000000
> > [ 24.629051] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 24.629074] CR2: 000055916370aa80 CR3: 00000006bf40a000 CR4: 00000000000006e0
> > [ 24.629103] Call Trace:
> > [ 24.629128] ? map_kernel_range_noflush+0x27a/0x360
> > [ 24.629150] ? kmem_cache_alloc_trace+0x340/0x460
> > [ 24.629172] __get_vm_area_node.isra.54+0xa7/0x150
> > [ 24.629195] ? fix_size_alloc_test+0x50/0x50 [test_vmalloc]
> > [ 24.629221] __vmalloc_node_range+0x64/0x230
> > [ 24.629241] ? test_func+0xdb/0x1f0 [test_vmalloc]
> > [ 24.629263] ? fix_size_alloc_test+0x50/0x50 [test_vmalloc]
> > [ 24.629288] __vmalloc_node+0x3b/0x40
> > [ 24.629305] ? test_func+0xdb/0x1f0 [test_vmalloc]
> > [ 24.629326] align_shift_alloc_test+0x39/0x50 [test_vmalloc]
> > [ 24.629350] test_func+0xdb/0x1f0 [test_vmalloc]
> > [ 24.629372] ? fix_align_alloc_test+0x50/0x50 [test_vmalloc]
> > [ 24.629396] kthread+0x13d/0x160
> > [ 24.629413] ? kthread_park+0x80/0x80
> > [ 24.629431] ret_from_fork+0x22/0x30
> > <snip>
> >
> > --
> > Vlad Rezki
> >
>
> Hopefully I can also reproduce this with test driver. Thanks for testing!
>
I think you will manage to reproduce it.

--
Vlad Rezki

2021-03-15 15:37:00

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

> On 14.3.2021 19.23, Uladzislau Rezki wrote:
> > Also, using vmaloc test driver i can trigger a kernel BUG:
> >
> > <snip>
> > [ 24.627577] kernel BUG at mm/vmalloc.c:1272!
>
> It seems that most tests indeed fail. Perhaps the vmalloc subsystem isn't
> very robust in face of fragmented virtual memory. What could be done to fix
> that?
>
Your patch is broken in context of checking "vend" when you try to
allocate next time after first attempt. Passed "vend" is different
there comparing what is checked later to figure out if an allocation
failed or not:

<snip>
if (unlikely(addr == vend))
goto overflow;
<snip>

>
> In this patch, I could retry __alloc_vmap_area() with the whole region after
> failure of both [random, vend] and [vstart, random] but I'm not sure that
> would help much. Worth a try of course.
>
There is no need in your second [vstart, random]. If a first bigger range
has not been successful, the smaller one will never be success anyway. The
best way to go here is to repeat with real [vsart:vend], if it still fails
on a real range, then it will not be possible to accomplish an allocation
request with given parameters.

>
> By the way, some of the tests in test_vmalloc.c don't check for vmalloc()
> failure, for example in full_fit_alloc_test().
>
Where?

--
Vlad Rezki

2021-03-15 16:19:39

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

On Mon, Mar 15, 2021 at 01:24:10PM +0100, Uladzislau Rezki wrote:
> On Mon, Mar 15, 2021 at 11:04:42AM +0200, Topi Miettinen wrote:
> > What's the problem with that? It seems to me that nothing relies on specific
> > addresses of the chunks, so it should be possible to randomize these too.
> > Also the alignment is honored.
> >
> My concern are:
>
> - it is not a vmalloc allocator;
> - per-cpu allocator allocates chunks, thus it might be it happens only once. It does not allocate it often;

That's actually the reason to randomize it: if it always ends up in the
same place at every boot, it becomes a stable target for attackers.

> - changing it will likely introduce issues you are not aware of;
> - it is not supposed to be interacting with vmalloc allocator. Read the
> comment under pcpu_get_vm_areas();
>
> Therefore i propose just not touch it.

How about splitting it from this patch instead? Then it can get separate
testing, etc.

--
Kees Cook

2021-03-15 18:37:39

by Topi Miettinen

[permalink] [raw]
Subject: Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

On 15.3.2021 17.35, Uladzislau Rezki wrote:
>> On 14.3.2021 19.23, Uladzislau Rezki wrote:
>>> Also, using vmaloc test driver i can trigger a kernel BUG:
>>>
>>> <snip>
>>> [ 24.627577] kernel BUG at mm/vmalloc.c:1272!
>>
>> It seems that most tests indeed fail. Perhaps the vmalloc subsystem isn't
>> very robust in face of fragmented virtual memory. What could be done to fix
>> that?
>>
> Your patch is broken in context of checking "vend" when you try to
> allocate next time after first attempt. Passed "vend" is different
> there comparing what is checked later to figure out if an allocation
> failed or not:
>
> <snip>
> if (unlikely(addr == vend))
> goto overflow;
> <snip>


Thanks, I'll fix that.

>
>>
>> In this patch, I could retry __alloc_vmap_area() with the whole region after
>> failure of both [random, vend] and [vstart, random] but I'm not sure that
>> would help much. Worth a try of course.
>>
> There is no need in your second [vstart, random]. If a first bigger range
> has not been successful, the smaller one will never be success anyway. The
> best way to go here is to repeat with real [vsart:vend], if it still fails
> on a real range, then it will not be possible to accomplish an allocation
> request with given parameters.
>
>>
>> By the way, some of the tests in test_vmalloc.c don't check for vmalloc()
>> failure, for example in full_fit_alloc_test().
>>
> Where?

Something like this:

diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
index 5cf2fe9aab9e..27e5db9a96b4 100644
--- a/lib/test_vmalloc.c
+++ b/lib/test_vmalloc.c
@@ -182,9 +182,14 @@ static int long_busy_list_alloc_test(void)
if (!ptr)
return rv;

- for (i = 0; i < 15000; i++)
+ for (i = 0; i < 15000; i++) {
ptr[i] = vmalloc(1 * PAGE_SIZE);

+ if (!ptr[i])
+ goto leave;
+ }
+
+
for (i = 0; i < test_loop_count; i++) {
ptr_1 = vmalloc(100 * PAGE_SIZE);
if (!ptr_1)
@@ -236,7 +241,11 @@ static int full_fit_alloc_test(void)

for (i = 0; i < junk_length; i++) {
ptr[i] = vmalloc(1 * PAGE_SIZE);
+ if (!ptr[i])
+ goto error;
junk_ptr[i] = vmalloc(1 * PAGE_SIZE);
+ if (!junk_ptr[i])
+ goto error;
}

for (i = 0; i < junk_length; i++)
@@ -256,8 +265,10 @@ static int full_fit_alloc_test(void)
rv = 0;

error:
- for (i = 0; i < junk_length; i++)
+ for (i = 0; i < junk_length; i++) {
vfree(ptr[i]);
+ vfree(junk_ptr[i]);
+ }

vfree(ptr);
vfree(junk_ptr);

-Topi

2021-03-15 20:25:55

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

On Mon, Mar 15, 2021 at 06:23:37PM +0200, Topi Miettinen wrote:
> On 15.3.2021 17.35, Uladzislau Rezki wrote:
> > > On 14.3.2021 19.23, Uladzislau Rezki wrote:
> > > > Also, using vmaloc test driver i can trigger a kernel BUG:
> > > >
> > > > <snip>
> > > > [ 24.627577] kernel BUG at mm/vmalloc.c:1272!
> > >
> > > It seems that most tests indeed fail. Perhaps the vmalloc subsystem isn't
> > > very robust in face of fragmented virtual memory. What could be done to fix
> > > that?
> > >
> > Your patch is broken in context of checking "vend" when you try to
> > allocate next time after first attempt. Passed "vend" is different
> > there comparing what is checked later to figure out if an allocation
> > failed or not:
> >
> > <snip>
> > if (unlikely(addr == vend))
> > goto overflow;
> > <snip>
>
>
> Thanks, I'll fix that.
>
> >
> > >
> > > In this patch, I could retry __alloc_vmap_area() with the whole region after
> > > failure of both [random, vend] and [vstart, random] but I'm not sure that
> > > would help much. Worth a try of course.
> > >
> > There is no need in your second [vstart, random]. If a first bigger range
> > has not been successful, the smaller one will never be success anyway. The
> > best way to go here is to repeat with real [vsart:vend], if it still fails
> > on a real range, then it will not be possible to accomplish an allocation
> > request with given parameters.
> >
> > >
> > > By the way, some of the tests in test_vmalloc.c don't check for vmalloc()
> > > failure, for example in full_fit_alloc_test().
> > >
> > Where?
>
> Something like this:
>
> diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
> index 5cf2fe9aab9e..27e5db9a96b4 100644
> --- a/lib/test_vmalloc.c
> +++ b/lib/test_vmalloc.c
> @@ -182,9 +182,14 @@ static int long_busy_list_alloc_test(void)
> if (!ptr)
> return rv;
>
> - for (i = 0; i < 15000; i++)
> + for (i = 0; i < 15000; i++) {
> ptr[i] = vmalloc(1 * PAGE_SIZE);
>
> + if (!ptr[i])
> + goto leave;
> + }
> +
>
Hmm. That is for creating a long list of allocated areas before running
a test. For example if one allocation among 15 000 fails, some index will
be set to NULL. Later on after "leave" label vfree() will bypass NULL freeing.

Either we have 15 000 extra elements or 10 000 does not really matter
and is considered as a corner case that is probably never happens. Yes,
you can simulate such precondition, but then a regular vmalloc()s will
likely also fails, thus the final results will be screwed up.

> +
> for (i = 0; i < test_loop_count; i++) {
> ptr_1 = vmalloc(100 * PAGE_SIZE);
> if (!ptr_1)
> @@ -236,7 +241,11 @@ static int full_fit_alloc_test(void)
>
> for (i = 0; i < junk_length; i++) {
> ptr[i] = vmalloc(1 * PAGE_SIZE);
> + if (!ptr[i])
> + goto error;
> junk_ptr[i] = vmalloc(1 * PAGE_SIZE);
> + if (!junk_ptr[i])
> + goto error;
> }
>
> for (i = 0; i < junk_length; i++)
> @@ -256,8 +265,10 @@ static int full_fit_alloc_test(void)
> rv = 0;
>
> error:
> - for (i = 0; i < junk_length; i++)
> + for (i = 0; i < junk_length; i++) {
> vfree(ptr[i]);
> + vfree(junk_ptr[i]);
> + }
>
> vfree(ptr);
> vfree(junk_ptr);
>
Same here.

--
Vlad Rezki

2021-03-15 23:09:41

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

On Mon, Mar 15, 2021 at 09:16:26AM -0700, Kees Cook wrote:
> On Mon, Mar 15, 2021 at 01:24:10PM +0100, Uladzislau Rezki wrote:
> > On Mon, Mar 15, 2021 at 11:04:42AM +0200, Topi Miettinen wrote:
> > > What's the problem with that? It seems to me that nothing relies on specific
> > > addresses of the chunks, so it should be possible to randomize these too.
> > > Also the alignment is honored.
> > >
> > My concern are:
> >
> > - it is not a vmalloc allocator;
> > - per-cpu allocator allocates chunks, thus it might be it happens only once. It does not allocate it often;
>
> That's actually the reason to randomize it: if it always ends up in the
> same place at every boot, it becomes a stable target for attackers.
>
Probably we can randomize a base address only once when pcpu-allocator
allocates a fist chunk during the boot.

> > - changing it will likely introduce issues you are not aware of;
> > - it is not supposed to be interacting with vmalloc allocator. Read the
> > comment under pcpu_get_vm_areas();
> >
> > Therefore i propose just not touch it.
>
> How about splitting it from this patch instead? Then it can get separate
> testing, etc.
>
It should be split as well as tested.

--
Vlad Rezki

2021-03-16 09:03:33

by Topi Miettinen

[permalink] [raw]
Subject: Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

On 15.3.2021 20.02, Uladzislau Rezki wrote:
> On Mon, Mar 15, 2021 at 06:23:37PM +0200, Topi Miettinen wrote:
>> On 15.3.2021 17.35, Uladzislau Rezki wrote:
>>>> On 14.3.2021 19.23, Uladzislau Rezki wrote:
>>>>> Also, using vmaloc test driver i can trigger a kernel BUG:
>>>>>
>>>>> <snip>
>>>>> [ 24.627577] kernel BUG at mm/vmalloc.c:1272!
>>>>
>>>> It seems that most tests indeed fail. Perhaps the vmalloc subsystem isn't
>>>> very robust in face of fragmented virtual memory. What could be done to fix
>>>> that?
>>>>
>>> Your patch is broken in context of checking "vend" when you try to
>>> allocate next time after first attempt. Passed "vend" is different
>>> there comparing what is checked later to figure out if an allocation
>>> failed or not:
>>>
>>> <snip>
>>> if (unlikely(addr == vend))
>>> goto overflow;
>>> <snip>
>>
>>
>> Thanks, I'll fix that.
>>
>>>
>>>>
>>>> In this patch, I could retry __alloc_vmap_area() with the whole region after
>>>> failure of both [random, vend] and [vstart, random] but I'm not sure that
>>>> would help much. Worth a try of course.
>>>>
>>> There is no need in your second [vstart, random]. If a first bigger range
>>> has not been successful, the smaller one will never be success anyway. The
>>> best way to go here is to repeat with real [vsart:vend], if it still fails
>>> on a real range, then it will not be possible to accomplish an allocation
>>> request with given parameters.
>>>
>>>>
>>>> By the way, some of the tests in test_vmalloc.c don't check for vmalloc()
>>>> failure, for example in full_fit_alloc_test().
>>>>
>>> Where?
>>
>> Something like this:
>>
>> diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
>> index 5cf2fe9aab9e..27e5db9a96b4 100644
>> --- a/lib/test_vmalloc.c
>> +++ b/lib/test_vmalloc.c
>> @@ -182,9 +182,14 @@ static int long_busy_list_alloc_test(void)
>> if (!ptr)
>> return rv;
>>
>> - for (i = 0; i < 15000; i++)
>> + for (i = 0; i < 15000; i++) {
>> ptr[i] = vmalloc(1 * PAGE_SIZE);
>>
>> + if (!ptr[i])
>> + goto leave;
>> + }
>> +
>>
> Hmm. That is for creating a long list of allocated areas before running
> a test. For example if one allocation among 15 000 fails, some index will
> be set to NULL. Later on after "leave" label vfree() will bypass NULL freeing.
>
> Either we have 15 000 extra elements or 10 000 does not really matter
> and is considered as a corner case that is probably never happens. Yes,
> you can simulate such precondition, but then a regular vmalloc()s will
> likely also fails, thus the final results will be screwed up.

I'd argue that if the allocations fail, the test should be aborted
immediately since the results are not representative.

-Topi

>
>> +
>> for (i = 0; i < test_loop_count; i++) {
>> ptr_1 = vmalloc(100 * PAGE_SIZE);
>> if (!ptr_1)
>> @@ -236,7 +241,11 @@ static int full_fit_alloc_test(void)
>>
>> for (i = 0; i < junk_length; i++) {
>> ptr[i] = vmalloc(1 * PAGE_SIZE);
>> + if (!ptr[i])
>> + goto error;
>> junk_ptr[i] = vmalloc(1 * PAGE_SIZE);
>> + if (!junk_ptr[i])
>> + goto error;
>> }
>>
>> for (i = 0; i < junk_length; i++)
>> @@ -256,8 +265,10 @@ static int full_fit_alloc_test(void)
>> rv = 0;
>>
>> error:
>> - for (i = 0; i < junk_length; i++)
>> + for (i = 0; i < junk_length; i++) {
>> vfree(ptr[i]);
>> + vfree(junk_ptr[i]);
>> + }
>>
>> vfree(ptr);
>> vfree(junk_ptr);
>>
> Same here.
>
> --
> Vlad Rezki
>

2021-03-16 09:32:29

by Topi Miettinen

[permalink] [raw]
Subject: Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

On 15.3.2021 19.47, Uladzislau Rezki wrote:
> On Mon, Mar 15, 2021 at 09:16:26AM -0700, Kees Cook wrote:
>> On Mon, Mar 15, 2021 at 01:24:10PM +0100, Uladzislau Rezki wrote:
>>> On Mon, Mar 15, 2021 at 11:04:42AM +0200, Topi Miettinen wrote:
>>>> What's the problem with that? It seems to me that nothing relies on specific
>>>> addresses of the chunks, so it should be possible to randomize these too.
>>>> Also the alignment is honored.
>>>>
>>> My concern are:
>>>
>>> - it is not a vmalloc allocator;
>>> - per-cpu allocator allocates chunks, thus it might be it happens only once. It does not allocate it often;
>>
>> That's actually the reason to randomize it: if it always ends up in the
>> same place at every boot, it becomes a stable target for attackers.
>>
> Probably we can randomize a base address only once when pcpu-allocator
> allocates a fist chunk during the boot.
>
>>> - changing it will likely introduce issues you are not aware of;
>>> - it is not supposed to be interacting with vmalloc allocator. Read the
>>> comment under pcpu_get_vm_areas();
>>>
>>> Therefore i propose just not touch it.
>>
>> How about splitting it from this patch instead? Then it can get separate
>> testing, etc.
>>
> It should be split as well as tested.

Would you prefer another kernel option `randomize_percpu_allocator=1`,
or would it be OK to make it a flag in `randomize_vmalloc`, like
`randomize_vmalloc=3`? Maybe the latter would not be compatible with
static branches.

-Topi

>
> --
> Vlad Rezki
>

2021-03-16 15:38:01

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

On Tue, Mar 16, 2021 at 10:01:46AM +0200, Topi Miettinen wrote:
> On 15.3.2021 19.47, Uladzislau Rezki wrote:
> > On Mon, Mar 15, 2021 at 09:16:26AM -0700, Kees Cook wrote:
> > > On Mon, Mar 15, 2021 at 01:24:10PM +0100, Uladzislau Rezki wrote:
> > > > On Mon, Mar 15, 2021 at 11:04:42AM +0200, Topi Miettinen wrote:
> > > > > What's the problem with that? It seems to me that nothing relies on specific
> > > > > addresses of the chunks, so it should be possible to randomize these too.
> > > > > Also the alignment is honored.
> > > > >
> > > > My concern are:
> > > >
> > > > - it is not a vmalloc allocator;
> > > > - per-cpu allocator allocates chunks, thus it might be it happens only once. It does not allocate it often;
> > >
> > > That's actually the reason to randomize it: if it always ends up in the
> > > same place at every boot, it becomes a stable target for attackers.
> > >
> > Probably we can randomize a base address only once when pcpu-allocator
> > allocates a fist chunk during the boot.
> >
> > > > - changing it will likely introduce issues you are not aware of;
> > > > - it is not supposed to be interacting with vmalloc allocator. Read the
> > > > comment under pcpu_get_vm_areas();
> > > >
> > > > Therefore i propose just not touch it.
> > >
> > > How about splitting it from this patch instead? Then it can get separate
> > > testing, etc.
> > >
> > It should be split as well as tested.
>
> Would you prefer another kernel option `randomize_percpu_allocator=1`, or
> would it be OK to make it a flag in `randomize_vmalloc`, like
> `randomize_vmalloc=3`? Maybe the latter would not be compatible with static
> branches.
>
I think it is better to have a separate option, because there are two
different allocators.

--
Vlad Rezki