2021-08-09 09:33:28

by Kefeng Wang

[permalink] [raw]
Subject: [PATCH v3 0/3] arm64: support page mapping percpu first chunk allocator

The percpu embedded first chunk allocator is the first choice, but it
can fail on ARM64, e.g.,
"percpu: max_distance=0x5fcfdc640000 too large for vmalloc space 0x781fefff0000"
"percpu: max_distance=0x600000540000 too large for vmalloc space 0x7dffb7ff0000"
"percpu: max_distance=0x5fff9adb0000 too large for vmalloc space 0x5dffb7ff0000"

after which we can hit "WARNING: CPU: 15 PID: 461 at vmalloc.c:3087 pcpu_get_vm_areas+0x488/0x838",
and the system may even fail to boot.

Let's implement the page mapping percpu first chunk allocator as a fallback
to the embedded allocator to make the system more robust.

Also fix a crash when both NEED_PER_CPU_PAGE_FIRST_CHUNK and KASAN_VMALLOC are enabled.

Tested on ARM64 qemu with cmdline "percpu_alloc=page" based on v5.14-rc5.

v3:
- search for a range that fits instead of always picking the end of the
vmalloc area, as suggested by Catalin.
- use NUMA_NO_NODE to avoid "virt_to_phys used for non-linear address:"
issue in arm64 kasan_populate_early_vm_area_shadow().
- add Acked-by: Marco Elver <[email protected]> to patch v3

v2:
- fix build error when CONFIG_KASAN disabled, found by [email protected]
- drop wrong __weak comment from kasan_populate_early_vm_area_shadow(),
found by Marco Elver <[email protected]>

Kefeng Wang (3):
vmalloc: Choose a better start address in vm_area_register_early()
arm64: Support page mapping percpu first chunk allocator
kasan: arm64: Fix pcpu_page_first_chunk crash with KASAN_VMALLOC

arch/arm64/Kconfig | 4 ++
arch/arm64/mm/kasan_init.c | 16 ++++++++
drivers/base/arch_numa.c | 82 +++++++++++++++++++++++++++++++++-----
include/linux/kasan.h | 6 +++
mm/kasan/init.c | 5 +++
mm/vmalloc.c | 17 +++++---
6 files changed, 115 insertions(+), 15 deletions(-)

--
2.26.2


2021-08-09 09:34:53

by Kefeng Wang

[permalink] [raw]
Subject: [PATCH v3 1/3] vmalloc: Choose a better start address in vm_area_register_early()

Some fixed locations in the vmalloc area are reserved on ARM (see
iotable_init()) and ARM64 (see map_kernel()). pcpu_page_first_chunk()
calls vm_area_register_early(), which chooses VMALLOC_START as the
start address of the vmap area. That address can conflict with the
reserved ranges above and trigger a BUG_ON in vm_area_add_early().

Let's choose the end of an existing address range in vmlist as the
start address instead of VMALLOC_START to avoid the BUG_ON.

Signed-off-by: Kefeng Wang <[email protected]>
---
mm/vmalloc.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d5cd52805149..1e8fe08725b8 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2238,11 +2238,17 @@ void __init vm_area_add_early(struct vm_struct *vm)
*/
void __init vm_area_register_early(struct vm_struct *vm, size_t align)
{
- static size_t vm_init_off __initdata;
- unsigned long addr;
-
- addr = ALIGN(VMALLOC_START + vm_init_off, align);
- vm_init_off = PFN_ALIGN(addr + vm->size) - VMALLOC_START;
+ struct vm_struct *head = vmlist, *curr, *next;
+ unsigned long addr = ALIGN(VMALLOC_START, align);
+
+ while (head != NULL) {
+ next = head->next;
+ curr = head;
+ head = next;
+ addr = ALIGN((unsigned long)curr->addr + curr->size, align);
+ if (next && (unsigned long)next->addr - addr > vm->size)
+ break;
+ }

vm->addr = (void *)addr;

--
2.26.2
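
The gap search described in this patch can be tried outside the kernel. The following is a minimal userspace sketch, not the code from the diff: struct early_area, align_up() and find_early_slot() are hypothetical names, and the list is assumed to be sorted by ascending address, as vmlist is. Unlike the posted loop, the sketch also considers the hole in front of the first range, accepts a hole of exactly the requested size, and asserts that the result stays below an upper bound. Those three points are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vm_struct: one reserved range. */
struct early_area {
	uint64_t addr;              /* start address of the range */
	uint64_t size;              /* length of the range in bytes */
	struct early_area *next;    /* next range, sorted by ascending addr */
};

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Return the first aligned address in [start, end) where `size` bytes fit
 * without overlapping any range in `list`.
 */
static uint64_t find_early_slot(const struct early_area *list,
				uint64_t start, uint64_t end,
				uint64_t size, uint64_t align)
{
	uint64_t addr = align_up(start, align);
	const struct early_area *cur;

	for (cur = list; cur; cur = cur->next) {
		uint64_t cur_end = align_up(cur->addr + cur->size, align);

		/* The hole in front of cur is large enough: stop here. */
		if (cur->addr >= addr && cur->addr - addr >= size)
			break;
		/* Otherwise the next candidate starts right after cur. */
		if (cur_end > addr)
			addr = cur_end;
	}

	/* Userspace analogue of a BUG_ON() bounds check. */
	assert(size <= end && addr <= end - size);
	return addr;
}
```

The explicit `cur->addr >= addr` test avoids an unsigned wrap-around when alignment pushes the candidate address past the start of the following range.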

2021-08-09 09:34:58

by Kefeng Wang

[permalink] [raw]
Subject: [PATCH v3 2/3] arm64: Support page mapping percpu first chunk allocator

The percpu embedded first chunk allocator is the first choice, but it
can fail on ARM64, e.g.,
"percpu: max_distance=0x5fcfdc640000 too large for vmalloc space 0x781fefff0000"
"percpu: max_distance=0x600000540000 too large for vmalloc space 0x7dffb7ff0000"
"percpu: max_distance=0x5fff9adb0000 too large for vmalloc space 0x5dffb7ff0000"

after which we can hit "WARNING: CPU: 15 PID: 461 at vmalloc.c:3087 pcpu_get_vm_areas+0x488/0x838",
and the system may even fail to boot.

Let's implement the page mapping percpu first chunk allocator as a fallback
to the embedded allocator to make the system more robust.

Signed-off-by: Kefeng Wang <[email protected]>
---
arch/arm64/Kconfig | 4 ++
drivers/base/arch_numa.c | 82 +++++++++++++++++++++++++++++++++++-----
2 files changed, 76 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fdcd54d39c1e..39f27e268c38 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1045,6 +1045,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
def_bool y
depends on NUMA

+config NEED_PER_CPU_PAGE_FIRST_CHUNK
+ def_bool y
+ depends on NUMA
+
source "kernel/Kconfig.hz"

config ARCH_SPARSEMEM_ENABLE
diff --git a/drivers/base/arch_numa.c b/drivers/base/arch_numa.c
index 4cc4e117727d..563b2013b75a 100644
--- a/drivers/base/arch_numa.c
+++ b/drivers/base/arch_numa.c
@@ -14,6 +14,7 @@
#include <linux/of.h>

#include <asm/sections.h>
+#include <asm/pgalloc.h>

struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
EXPORT_SYMBOL(node_data);
@@ -168,22 +169,83 @@ static void __init pcpu_fc_free(void *ptr, size_t size)
memblock_free_early(__pa(ptr), size);
}

+#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
+static void __init pcpu_populate_pte(unsigned long addr)
+{
+ pgd_t *pgd = pgd_offset_k(addr);
+ p4d_t *p4d;
+ pud_t *pud;
+ pmd_t *pmd;
+
+ p4d = p4d_offset(pgd, addr);
+ if (p4d_none(*p4d)) {
+ pud_t *new;
+
+ new = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
+ if (!new)
+ goto err_alloc;
+ p4d_populate(&init_mm, p4d, new);
+ }
+
+ pud = pud_offset(p4d, addr);
+ if (pud_none(*pud)) {
+ pmd_t *new;
+
+ new = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
+ if (!new)
+ goto err_alloc;
+ pud_populate(&init_mm, pud, new);
+ }
+
+ pmd = pmd_offset(pud, addr);
+ if (!pmd_present(*pmd)) {
+ pte_t *new;
+
+ new = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
+ if (!new)
+ goto err_alloc;
+ pmd_populate_kernel(&init_mm, pmd, new);
+ }
+
+ return;
+
+err_alloc:
+ panic("%s: Failed to allocate %lu bytes align=%lx from=%lx\n",
+ __func__, PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
+}
+#endif
+
void __init setup_per_cpu_areas(void)
{
unsigned long delta;
unsigned int cpu;
- int rc;
+ int rc = -EINVAL;
+
+ if (pcpu_chosen_fc != PCPU_FC_PAGE) {
+ /*
+ * Always reserve area for module percpu variables. That's
+ * what the legacy allocator did.
+ */
+ rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
+ PERCPU_DYNAMIC_RESERVE, PAGE_SIZE,
+ pcpu_cpu_distance,
+ pcpu_fc_alloc, pcpu_fc_free);
+#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
+ if (rc < 0)
+ pr_warn("PERCPU: %s allocator failed (%d), falling back to page size\n",
+ pcpu_fc_names[pcpu_chosen_fc], rc);
+#endif
+ }

- /*
- * Always reserve area for module percpu variables. That's
- * what the legacy allocator did.
- */
- rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
- PERCPU_DYNAMIC_RESERVE, PAGE_SIZE,
- pcpu_cpu_distance,
- pcpu_fc_alloc, pcpu_fc_free);
+#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
+ if (rc < 0)
+ rc = pcpu_page_first_chunk(PERCPU_MODULE_RESERVE,
+ pcpu_fc_alloc,
+ pcpu_fc_free,
+ pcpu_populate_pte);
+#endif
if (rc < 0)
- panic("Failed to initialize percpu areas.");
+ panic("Failed to initialize percpu areas (err=%d).", rc);

delta = (unsigned long)pcpu_base_addr - (unsigned long)__per_cpu_start;
for_each_possible_cpu(cpu)
--
2.26.2
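
The control flow that this patch gives setup_per_cpu_areas() can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: enum fc_choice, first_chunk_fn, setup_first_chunk() and the stub allocators are hypothetical names, each allocator is reduced to a function returning 0 or a negative errno, and the page allocator is assumed to be always built in (that is, as if CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK were set).

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical mirror of the allocator choice given on the command line. */
enum fc_choice { FC_AUTO, FC_EMBED, FC_PAGE };

/* Each allocator is modelled as a function returning 0 or a negative errno. */
typedef int (*first_chunk_fn)(void);

static int setup_first_chunk(enum fc_choice chosen,
			     first_chunk_fn embed, first_chunk_fn page)
{
	int rc = -EINVAL;

	/* Try the embedded allocator unless "page" was requested explicitly. */
	if (chosen != FC_PAGE) {
		rc = embed();
		if (rc < 0)
			fprintf(stderr,
				"embed allocator failed (%d), falling back to page size\n",
				rc);
	}

	/* Either "page" was requested or the embedded attempt failed. */
	if (rc < 0)
		rc = page();

	return rc;
}

/* Stand-in allocators used only to exercise the control flow. */
static int page_calls;
static int embed_ok(void)   { return 0; }
static int embed_fail(void) { return -ENOMEM; }
static int page_ok(void)    { page_calls++; return 0; }
```

Initialising rc to -EINVAL is what lets a single `rc < 0` test cover both the explicit "percpu_alloc=page" request and the fallback after a failed embedded attempt.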

2021-08-09 10:20:13

by Kefeng Wang

[permalink] [raw]
Subject: [PATCH v3 3/3] kasan: arm64: Fix pcpu_page_first_chunk crash with KASAN_VMALLOC

With KASAN_VMALLOC and NEED_PER_CPU_PAGE_FIRST_CHUNK both enabled, the kernel crashes:

Unable to handle kernel paging request at virtual address ffff7000028f2000
...
swapper pgtable: 64k pages, 48-bit VAs, pgdp=0000000042440000
[ffff7000028f2000] pgd=000000063e7c0003, p4d=000000063e7c0003, pud=000000063e7c0003, pmd=000000063e7b0003, pte=0000000000000000
Internal error: Oops: 96000007 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.0-rc4-00003-gc6e6e28f3f30-dirty #62
Hardware name: linux,dummy-virt (DT)
pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO BTYPE=--)
pc : kasan_check_range+0x90/0x1a0
lr : memcpy+0x88/0xf4
sp : ffff80001378fe20
...
Call trace:
kasan_check_range+0x90/0x1a0
pcpu_page_first_chunk+0x3f0/0x568
setup_per_cpu_areas+0xb8/0x184
start_kernel+0x8c/0x328

The vm area used in vm_area_register_early() has no kasan shadow memory.
Let's add a new kasan_populate_early_vm_area_shadow() function to populate
the vm area shadow memory to fix the issue.

Signed-off-by: Kefeng Wang <[email protected]>
---
arch/arm64/mm/kasan_init.c | 16 ++++++++++++++++
include/linux/kasan.h | 6 ++++++
mm/kasan/init.c | 5 +++++
mm/vmalloc.c | 1 +
4 files changed, 28 insertions(+)

diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index 61b52a92b8b6..5b996ca4d996 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -287,6 +287,22 @@ static void __init kasan_init_depth(void)
init_task.kasan_depth = 0;
}

+#ifdef CONFIG_KASAN_VMALLOC
+void __init kasan_populate_early_vm_area_shadow(void *start, unsigned long size)
+{
+ unsigned long shadow_start, shadow_end;
+
+ if (!is_vmalloc_or_module_addr(start))
+ return;
+
+ shadow_start = (unsigned long)kasan_mem_to_shadow(start);
+ shadow_start = ALIGN_DOWN(shadow_start, PAGE_SIZE);
+ shadow_end = (unsigned long)kasan_mem_to_shadow(start + size);
+ shadow_end = ALIGN(shadow_end, PAGE_SIZE);
+ kasan_map_populate(shadow_start, shadow_end, NUMA_NO_NODE);
+}
+#endif
+
void __init kasan_init(void)
{
kasan_init_shadow();
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index dd874a1ee862..3f8c26d9ef82 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -133,6 +133,8 @@ struct kasan_cache {
bool is_kmalloc;
};

+void kasan_populate_early_vm_area_shadow(void *start, unsigned long size);
+
slab_flags_t __kasan_never_merge(void);
static __always_inline slab_flags_t kasan_never_merge(void)
{
@@ -303,6 +305,10 @@ void kasan_restore_multi_shot(bool enabled);

#else /* CONFIG_KASAN */

+static inline void kasan_populate_early_vm_area_shadow(void *start,
+ unsigned long size)
+{ }
+
static inline slab_flags_t kasan_never_merge(void)
{
return 0;
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index cc64ed6858c6..d39577d088a1 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -279,6 +279,11 @@ int __ref kasan_populate_early_shadow(const void *shadow_start,
return 0;
}

+void __init __weak kasan_populate_early_vm_area_shadow(void *start,
+ unsigned long size)
+{
+}
+
static void kasan_free_pte(pte_t *pte_start, pmd_t *pmd)
{
pte_t *pte;
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 1e8fe08725b8..66a7e1ea2561 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2253,6 +2253,7 @@ void __init vm_area_register_early(struct vm_struct *vm, size_t align)
vm->addr = (void *)addr;

vm_area_add_early(vm);
+ kasan_populate_early_vm_area_shadow(vm->addr, vm->size);
}

static void vmap_init_free_space(void)
--
2.26.2
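
The shadow range arithmetic in kasan_populate_early_vm_area_shadow() can be reproduced in userspace. Generic KASAN maps an address to its shadow byte as (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET, with a scale shift of 3. The sketch below is illustrative only: the offset value is an assumption (it depends on the kernel configuration), the names are hypothetical, and the 64k page size is taken from the oops above. It computes the page-aligned shadow range that would have to be populated for a given vm area.

```c
#include <stdint.h>

#define SHADOW_SCALE_SHIFT 3                      /* 8 bytes of memory per shadow byte */
#define SHADOW_OFFSET      0xdfff800000000000ULL  /* assumed, configuration dependent */
#define SKETCH_PAGE_SIZE   0x10000ULL             /* 64k pages, as in the oops */

/* Page-aligned shadow range [start, end) covering one vm area. */
struct shadow_range {
	uint64_t start;
	uint64_t end;
};

/* Same formula as generic KASAN's kasan_mem_to_shadow(). */
static uint64_t mem_to_shadow(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* Round x down to a multiple of a; a must be a power of two. */
static uint64_t align_down(uint64_t x, uint64_t a)
{
	return x & ~(a - 1);
}

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Widen the shadow of [start, start + size) to whole pages. */
static struct shadow_range early_shadow_range(uint64_t start, uint64_t size)
{
	struct shadow_range r;

	r.start = align_down(mem_to_shadow(start), SKETCH_PAGE_SIZE);
	r.end = align_up(mem_to_shadow(start + size), SKETCH_PAGE_SIZE);
	return r;
}
```

For a hypothetical 64k area at 0xffff800014790000, and under the assumed offset, the first shadow byte is 0xffff7000028f2000, so the range to populate is 0xffff7000028f0000 to 0xffff700002900000. The rounding matters because the shadow of a small area rarely starts or ends on a page boundary.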

2021-08-09 12:38:46

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH v3 3/3] kasan: arm64: Fix pcpu_page_first_chunk crash with KASAN_VMALLOC

On Mon, 9 Aug 2021 at 13:10, Kefeng Wang <[email protected]> wrote:
>
>
> On 2021/8/9 17:37, Kefeng Wang wrote:
> > With KASAN_VMALLOC and NEED_PER_CPU_PAGE_FIRST_CHUNK both enabled, the kernel crashes:
> >
> > Unable to handle kernel paging request at virtual address ffff7000028f2000
> > ...
> > swapper pgtable: 64k pages, 48-bit VAs, pgdp=0000000042440000
> > [ffff7000028f2000] pgd=000000063e7c0003, p4d=000000063e7c0003, pud=000000063e7c0003, pmd=000000063e7b0003, pte=0000000000000000
> > Internal error: Oops: 96000007 [#1] PREEMPT SMP
> > Modules linked in:
> > CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.0-rc4-00003-gc6e6e28f3f30-dirty #62
> > Hardware name: linux,dummy-virt (DT)
> > pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO BTYPE=--)
> > pc : kasan_check_range+0x90/0x1a0
> > lr : memcpy+0x88/0xf4
> > sp : ffff80001378fe20
> > ...
> > Call trace:
> > kasan_check_range+0x90/0x1a0
> > pcpu_page_first_chunk+0x3f0/0x568
> > setup_per_cpu_areas+0xb8/0x184
> > start_kernel+0x8c/0x328
> >
> > The vm area used in vm_area_register_early() has no kasan shadow memory.
> > Let's add a new kasan_populate_early_vm_area_shadow() function to populate
> > the vm area shadow memory to fix the issue.
>
> Should add Acked-by: Marco Elver <[email protected]> [for KASAN parts] ,

My Ack is still valid, thanks for noting.


2021-08-09 13:39:10

by Kefeng Wang

[permalink] [raw]
Subject: Re: [PATCH v3 3/3] kasan: arm64: Fix pcpu_page_first_chunk crash with KASAN_VMALLOC


On 2021/8/9 19:21, Marco Elver wrote:
> On Mon, 9 Aug 2021 at 13:10, Kefeng Wang <[email protected]> wrote:
>>
>> On 2021/8/9 17:37, Kefeng Wang wrote:
>>> With KASAN_VMALLOC and NEED_PER_CPU_PAGE_FIRST_CHUNK both enabled, the kernel crashes:
>>>
>>> Unable to handle kernel paging request at virtual address ffff7000028f2000
>>> ...
>>> swapper pgtable: 64k pages, 48-bit VAs, pgdp=0000000042440000
>>> [ffff7000028f2000] pgd=000000063e7c0003, p4d=000000063e7c0003, pud=000000063e7c0003, pmd=000000063e7b0003, pte=0000000000000000
>>> Internal error: Oops: 96000007 [#1] PREEMPT SMP
>>> Modules linked in:
>>> CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.0-rc4-00003-gc6e6e28f3f30-dirty #62
>>> Hardware name: linux,dummy-virt (DT)
>>> pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO BTYPE=--)
>>> pc : kasan_check_range+0x90/0x1a0
>>> lr : memcpy+0x88/0xf4
>>> sp : ffff80001378fe20
>>> ...
>>> Call trace:
>>> kasan_check_range+0x90/0x1a0
>>> pcpu_page_first_chunk+0x3f0/0x568
>>> setup_per_cpu_areas+0xb8/0x184
>>> start_kernel+0x8c/0x328
>>>
>>> The vm area used in vm_area_register_early() has no kasan shadow memory.
>>> Let's add a new kasan_populate_early_vm_area_shadow() function to populate
>>> the vm area shadow memory to fix the issue.
>> Should add Acked-by: Marco Elver <[email protected]> [for KASAN parts] ,
> My Ack is still valid, thanks for noting.
Thanks, Marco ;)

2021-08-09 14:01:18

by Kefeng Wang

[permalink] [raw]
Subject: Re: [PATCH v3 3/3] kasan: arm64: Fix pcpu_page_first_chunk crash with KASAN_VMALLOC


On 2021/8/9 17:37, Kefeng Wang wrote:
> With KASAN_VMALLOC and NEED_PER_CPU_PAGE_FIRST_CHUNK both enabled, the kernel crashes:
>
> Unable to handle kernel paging request at virtual address ffff7000028f2000
> ...
> swapper pgtable: 64k pages, 48-bit VAs, pgdp=0000000042440000
> [ffff7000028f2000] pgd=000000063e7c0003, p4d=000000063e7c0003, pud=000000063e7c0003, pmd=000000063e7b0003, pte=0000000000000000
> Internal error: Oops: 96000007 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.0-rc4-00003-gc6e6e28f3f30-dirty #62
> Hardware name: linux,dummy-virt (DT)
> pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO BTYPE=--)
> pc : kasan_check_range+0x90/0x1a0
> lr : memcpy+0x88/0xf4
> sp : ffff80001378fe20
> ...
> Call trace:
> kasan_check_range+0x90/0x1a0
> pcpu_page_first_chunk+0x3f0/0x568
> setup_per_cpu_areas+0xb8/0x184
> start_kernel+0x8c/0x328
>
> The vm area used in vm_area_register_early() has no kasan shadow memory.
> Let's add a new kasan_populate_early_vm_area_shadow() function to populate
> the vm area shadow memory to fix the issue.

Should add Acked-by: Marco Elver <[email protected]> [for KASAN parts] ,

missed here :(


2021-08-09 21:25:27

by Andrey Konovalov

[permalink] [raw]
Subject: Re: [PATCH v3 3/3] kasan: arm64: Fix pcpu_page_first_chunk crash with KASAN_VMALLOC

On Mon, Aug 9, 2021 at 11:32 AM Kefeng Wang <[email protected]> wrote:
>
> With KASAN_VMALLOC and NEED_PER_CPU_PAGE_FIRST_CHUNK both enabled, the kernel crashes:
>
> Unable to handle kernel paging request at virtual address ffff7000028f2000
> ...
> swapper pgtable: 64k pages, 48-bit VAs, pgdp=0000000042440000
> [ffff7000028f2000] pgd=000000063e7c0003, p4d=000000063e7c0003, pud=000000063e7c0003, pmd=000000063e7b0003, pte=0000000000000000
> Internal error: Oops: 96000007 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.0-rc4-00003-gc6e6e28f3f30-dirty #62
> Hardware name: linux,dummy-virt (DT)
> pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO BTYPE=--)
> pc : kasan_check_range+0x90/0x1a0
> lr : memcpy+0x88/0xf4
> sp : ffff80001378fe20
> ...
> Call trace:
> kasan_check_range+0x90/0x1a0
> pcpu_page_first_chunk+0x3f0/0x568
> setup_per_cpu_areas+0xb8/0x184
> start_kernel+0x8c/0x328
>
> The vm area used in vm_area_register_early() has no kasan shadow memory.
> Let's add a new kasan_populate_early_vm_area_shadow() function to populate
> the vm area shadow memory to fix the issue.
>

Acked-by: Andrey Konovalov <[email protected]>

for KASAN parts.

Thanks!

2021-08-12 06:09:24

by Kefeng Wang

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] arm64: support page mapping percpu first chunk allocator

Hi Catalin and Will,

drivers/base/arch_numa.c is shared only by riscv and arm64,

and the change in patch 2 won't break riscv.

Could all patches be merged via the arm64 tree? Or are there any new comments?

Many thanks.


2021-08-23 02:30:52

by Kefeng Wang

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] arm64: support page mapping percpu first chunk allocator


On 2021/8/12 14:07, Kefeng Wang wrote:
> Hi Catalin and Will,
>
> drivers/base/arch_numa.c is shared only by riscv and arm64,
>
> and the change in patch 2 won't break riscv.
>
> Could all patches be merged via the arm64 tree? Or are there any new comments?

Kindly ping...

>
> Many thanks.
>

2021-08-25 18:01:36

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] vmalloc: Choose a better start address in vm_area_register_early()

On Mon, Aug 09, 2021 at 05:37:48PM +0800, Kefeng Wang wrote:
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index d5cd52805149..1e8fe08725b8 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2238,11 +2238,17 @@ void __init vm_area_add_early(struct vm_struct *vm)
> */
> void __init vm_area_register_early(struct vm_struct *vm, size_t align)
> {
> - static size_t vm_init_off __initdata;
> - unsigned long addr;
> -
> - addr = ALIGN(VMALLOC_START + vm_init_off, align);
> - vm_init_off = PFN_ALIGN(addr + vm->size) - VMALLOC_START;
> + struct vm_struct *head = vmlist, *curr, *next;
> + unsigned long addr = ALIGN(VMALLOC_START, align);
> +
> + while (head != NULL) {

Nitpick: I'd use the same pattern as in vm_area_add_early(), i.e. a
'for' loop. You might as well insert it directly than calling the add
function and going through the loop again. Not a strong preference
either way.

> + next = head->next;
> + curr = head;
> + head = next;
> + addr = ALIGN((unsigned long)curr->addr + curr->size, align);
> + if (next && (unsigned long)next->addr - addr > vm->size)

Is greater or equal sufficient?

> + break;
> + }
>
> vm->addr = (void *)addr;

Another nitpick: it's very unlikely on a 64-bit architecture but not
impossible on 32-bit to hit VMALLOC_END here. Maybe some BUG_ON.

--
Catalin

2021-08-25 18:10:01

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] arm64: Support page mapping percpu first chunk allocator

On Mon, Aug 09, 2021 at 05:37:49PM +0800, Kefeng Wang wrote:
> The percpu embedded first chunk allocator is the first choice, but it
> can fail on ARM64, e.g.,
> "percpu: max_distance=0x5fcfdc640000 too large for vmalloc space 0x781fefff0000"
> "percpu: max_distance=0x600000540000 too large for vmalloc space 0x7dffb7ff0000"
> "percpu: max_distance=0x5fff9adb0000 too large for vmalloc space 0x5dffb7ff0000"
>
> after which we can hit "WARNING: CPU: 15 PID: 461 at vmalloc.c:3087 pcpu_get_vm_areas+0x488/0x838",
> and the system may even fail to boot.
>
> Let's implement the page mapping percpu first chunk allocator as a fallback
> to the embedded allocator to make the system more robust.
>
> Signed-off-by: Kefeng Wang <[email protected]>

Reviewed-by: Catalin Marinas <[email protected]>

2021-08-25 21:03:40

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] arm64: support page mapping percpu first chunk allocator

On Thu, Aug 12, 2021 at 02:07:36PM +0800, Kefeng Wang wrote:
> drivers/base/arch_numa.c is shared only by riscv and arm64,
>
> and the change in patch 2 won't break riscv.
>
> Could all patches be merged via the arm64 tree? Or are there any new comments?

The series touches drivers/ and mm/ but is missing acks from both Greg and
Andrew (cc'ing them).

I'm also happy for the series to go in via the mm tree in case Andrew
wants to take it.

--
Catalin

2021-08-27 08:39:16

by Kefeng Wang

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] vmalloc: Choose a better start address in vm_area_register_early()


On 2021/8/26 1:59, Catalin Marinas wrote:
> On Mon, Aug 09, 2021 at 05:37:48PM +0800, Kefeng Wang wrote:
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index d5cd52805149..1e8fe08725b8 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -2238,11 +2238,17 @@ void __init vm_area_add_early(struct vm_struct *vm)
>> */
>> void __init vm_area_register_early(struct vm_struct *vm, size_t align)
>> {
>> - static size_t vm_init_off __initdata;
>> - unsigned long addr;
>> -
>> - addr = ALIGN(VMALLOC_START + vm_init_off, align);
>> - vm_init_off = PFN_ALIGN(addr + vm->size) - VMALLOC_START;
>> + struct vm_struct *head = vmlist, *curr, *next;
>> + unsigned long addr = ALIGN(VMALLOC_START, align);
>> +
>> + while (head != NULL) {
> Nitpick: I'd use the same pattern as in vm_area_add_early(), i.e. a
> 'for' loop. You might as well insert it directly than calling the add
> function and going through the loop again. Not a strong preference
> either way.
>
>> + next = head->next;
>> + curr = head;
>> + head = next;
>> + addr = ALIGN((unsigned long)curr->addr + curr->size, align);
>> + if (next && (unsigned long)next->addr - addr > vm->size)
> Is greater or equal sufficient?
>
>> + break;
>> + }
>>
>> vm->addr = (void *)addr;
> Another nitpick: it's very unlikely on a 64-bit architecture but not
> impossible on 32-bit to hit VMALLOC_END here. Maybe some BUG_ON.

Hi Catalin, thanks for your review. I will update this in the next version.

Could you take a look at the following change? Is it OK?

void __init vm_area_register_early(struct vm_struct *vm, size_t align)
{
        struct vm_struct *next, *cur, **p;
        unsigned long addr = ALIGN(VMALLOC_START, align);

        BUG_ON(vmap_initialized);

        for (p = &vmlist; (cur = *p) != NULL, next = cur->next; p = &next) {
                addr = ALIGN((unsigned long)cur->addr + cur->size, align);
                if (next && (unsigned long)next->addr - addr >= vm->size) {
                        p = &next;
                        break;
                }
        }

        BUG_ON(addr > VMALLOC_END - vm->size);
        vm->addr = (void *)addr;
        vm->next = *p;
        *p = vm;
}


>