2024-03-23 14:16:10

by Uladzislau Rezki

Subject: [PATCH 1/1] mm: vmalloc: Bail out early in find_vmap_area() if vmap is not init

During boot, the s390 system triggers "spinlock bad magic" messages
if spinlock debugging is enabled:

[ 0.465445] BUG: spinlock bad magic on CPU#0, swapper/0
[ 0.465490] lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
[ 0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e398669 #1
[ 0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
[ 0.466270] Call Trace:
[ 0.466470] [<00000000011f26c8>] dump_stack_lvl+0x98/0xd8
[ 0.466516] [<00000000001dcc6a>] do_raw_spin_lock+0x8a/0x108
[ 0.466545] [<000000000042146c>] find_vmap_area+0x6c/0x108
[ 0.466572] [<000000000042175a>] find_vm_area+0x22/0x40
[ 0.466597] [<000000000012f152>] __set_memory+0x132/0x150
[ 0.466624] [<0000000001cc0398>] vmem_map_init+0x40/0x118
[ 0.466651] [<0000000001cc0092>] paging_init+0x22/0x68
[ 0.466677] [<0000000001cbbed2>] setup_arch+0x52a/0x708
[ 0.466702] [<0000000001cb6140>] start_kernel+0x80/0x5c8
[ 0.466727] [<0000000000100036>] startup_continue+0x36/0x40

It happens because the system tries to access vmap areas before the
vmalloc initialization has been done:

[ 0.465490] lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
[ 0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e398669 #1
[ 0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
[ 0.466270] Call Trace:
[ 0.466470] dump_stack_lvl (lib/dump_stack.c:117)
[ 0.466516] do_raw_spin_lock (kernel/locking/spinlock_debug.c:87 kernel/locking/spinlock_debug.c:115)
[ 0.466545] find_vmap_area (mm/vmalloc.c:1059 mm/vmalloc.c:2364)
[ 0.466572] find_vm_area (mm/vmalloc.c:3150)
[ 0.466597] __set_memory (arch/s390/mm/pageattr.c:360 arch/s390/mm/pageattr.c:393)
[ 0.466624] vmem_map_init (./arch/s390/include/asm/set_memory.h:55 arch/s390/mm/vmem.c:660)
[ 0.466651] paging_init (arch/s390/mm/init.c:97)
[ 0.466677] setup_arch (arch/s390/kernel/setup.c:972)
[ 0.466702] start_kernel (init/main.c:899)
[ 0.466727] startup_continue (arch/s390/kernel/head64.S:35)
[ 0.466811] INFO: lockdep is turned off.
...
[ 0.718250] vmalloc init - busy lock init 0000000002871860
[ 0.718328] vmalloc init - busy lock init 00000000028731b8

Some background: this worked before because the lock in question was
statically defined and initialized. As of now, the locks and data
structures are initialized in the vmalloc_init() function.

To address this issue, add a check of the "vmap_initialized" variable:
if it is not set, find_vmap_area() bails out on entry and returns NULL.

Fixes: 72210662c5a2 ("mm: vmalloc: offload free_vmap_area_lock lock")
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
---
mm/vmalloc.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 22aa63f4ef63..0d77d171b5d9 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2343,6 +2343,9 @@ struct vmap_area *find_vmap_area(unsigned long addr)
struct vmap_area *va;
int i, j;

+ if (unlikely(!vmap_initialized))
+ return NULL;
+
/*
* An addr_to_node_id(addr) converts an address to a node index
* where a VA is located. If VA spans several zones and passed
--
2.39.2
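Detached from the kernel, the guard the patch introduces can be sketched as a small user-space miniature. The names mirror the patch, but the list-based lookup and the init function are simplifications for illustration, not the real mm/vmalloc.c implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag that vmalloc_init() would set once locks and data structures
 * exist. Before that point, a lookup must not touch the lock at all. */
static bool vmap_initialized;

struct vmap_area {
	unsigned long va_start;
	struct vmap_area *next;
};

static struct vmap_area *vmap_list;	/* stand-in for the per-node trees */

/* Stand-in for vmalloc_init(): in the kernel this initializes the
 * spinlocks and rb-trees, then the flag is set. */
static void vmalloc_init_sketch(void)
{
	vmap_initialized = true;
}

static void insert_area(struct vmap_area *va)
{
	va->next = vmap_list;
	vmap_list = va;
}

/* Mirrors the patched find_vmap_area(): bail out with NULL before the
 * subsystem is initialized instead of taking a lock whose magic is
 * still zero. */
static struct vmap_area *find_vmap_area_sketch(unsigned long addr)
{
	struct vmap_area *va;

	if (!vmap_initialized)
		return NULL;

	for (va = vmap_list; va; va = va->next)
		if (va->va_start == addr)
			return va;

	return NULL;
}
```

With this shape, an early caller such as the s390 __set_memory() path simply sees "no area found" instead of triggering a spinlock debug splat.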



2024-03-23 23:22:34

by Guenter Roeck

Subject: Re: [PATCH 1/1] mm: vmalloc: Bail out early in find_vmap_area() if vmap is not init

On 3/23/24 07:15, Uladzislau Rezki (Sony) wrote:
> During the boot the s390 system triggers "spinlock bad magic" messages
> if the spinlock debugging is enabled:
>
> [ 0.465445] BUG: spinlock bad magic on CPU#0, swapper/0
> [ 0.465490] lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> [ 0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e398669 #1
> [ 0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
> [ 0.466270] Call Trace:
> [ 0.466470] [<00000000011f26c8>] dump_stack_lvl+0x98/0xd8
> [ 0.466516] [<00000000001dcc6a>] do_raw_spin_lock+0x8a/0x108
> [ 0.466545] [<000000000042146c>] find_vmap_area+0x6c/0x108
> [ 0.466572] [<000000000042175a>] find_vm_area+0x22/0x40
> [ 0.466597] [<000000000012f152>] __set_memory+0x132/0x150
> [ 0.466624] [<0000000001cc0398>] vmem_map_init+0x40/0x118
> [ 0.466651] [<0000000001cc0092>] paging_init+0x22/0x68
> [ 0.466677] [<0000000001cbbed2>] setup_arch+0x52a/0x708
> [ 0.466702] [<0000000001cb6140>] start_kernel+0x80/0x5c8
> [ 0.466727] [<0000000000100036>] startup_continue+0x36/0x40
>
> it happens because such system tries to access some vmap areas
> whereas the vmalloc initialization is not even yet done:
>
> [ 0.465490] lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> [ 0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e398669 #1
> [ 0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
> [ 0.466270] Call Trace:
> [ 0.466470] dump_stack_lvl (lib/dump_stack.c:117)
> [ 0.466516] do_raw_spin_lock (kernel/locking/spinlock_debug.c:87 kernel/locking/spinlock_debug.c:115)
> [ 0.466545] find_vmap_area (mm/vmalloc.c:1059 mm/vmalloc.c:2364)
> [ 0.466572] find_vm_area (mm/vmalloc.c:3150)
> [ 0.466597] __set_memory (arch/s390/mm/pageattr.c:360 arch/s390/mm/pageattr.c:393)
> [ 0.466624] vmem_map_init (./arch/s390/include/asm/set_memory.h:55 arch/s390/mm/vmem.c:660)
> [ 0.466651] paging_init (arch/s390/mm/init.c:97)
> [ 0.466677] setup_arch (arch/s390/kernel/setup.c:972)
> [ 0.466702] start_kernel (init/main.c:899)
> [ 0.466727] startup_continue (arch/s390/kernel/head64.S:35)
> [ 0.466811] INFO: lockdep is turned off.
> ...
> [ 0.718250] vmalloc init - busy lock init 0000000002871860
> [ 0.718328] vmalloc init - busy lock init 00000000028731b8
>
> Some background. It worked before because the lock that is in question
> was statically defined and initialized. As of now, the locks and data
> structures are initialized in the vmalloc_init() function.
>
> To address that issue add the check whether the "vmap_initialized"
> variable is set, if not find_vmap_area() bails out on entry returning NULL.
>
> Fixes: 72210662c5a2 ("mm: vmalloc: offload free_vmap_area_lock lock")
> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>

Tested-by: Guenter Roeck <[email protected]>


2024-03-24 02:57:27

by Baoquan He

Subject: Re: [PATCH 1/1] mm: vmalloc: Bail out early in find_vmap_area() if vmap is not init

On 03/23/24 at 03:15pm, Uladzislau Rezki (Sony) wrote:
> During the boot the s390 system triggers "spinlock bad magic" messages
> if the spinlock debugging is enabled:
>
> [ 0.465445] BUG: spinlock bad magic on CPU#0, swapper/0
> [ 0.465490] lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> [ 0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e398669 #1
> [ 0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
> [ 0.466270] Call Trace:
> [ 0.466470] [<00000000011f26c8>] dump_stack_lvl+0x98/0xd8
> [ 0.466516] [<00000000001dcc6a>] do_raw_spin_lock+0x8a/0x108
> [ 0.466545] [<000000000042146c>] find_vmap_area+0x6c/0x108
> [ 0.466572] [<000000000042175a>] find_vm_area+0x22/0x40
> [ 0.466597] [<000000000012f152>] __set_memory+0x132/0x150
> [ 0.466624] [<0000000001cc0398>] vmem_map_init+0x40/0x118
> [ 0.466651] [<0000000001cc0092>] paging_init+0x22/0x68
> [ 0.466677] [<0000000001cbbed2>] setup_arch+0x52a/0x708
> [ 0.466702] [<0000000001cb6140>] start_kernel+0x80/0x5c8
> [ 0.466727] [<0000000000100036>] startup_continue+0x36/0x40
>
> it happens because such system tries to access some vmap areas
> whereas the vmalloc initialization is not even yet done:
>
> [ 0.465490] lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> [ 0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e398669 #1
> [ 0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
> [ 0.466270] Call Trace:
> [ 0.466470] dump_stack_lvl (lib/dump_stack.c:117)
> [ 0.466516] do_raw_spin_lock (kernel/locking/spinlock_debug.c:87 kernel/locking/spinlock_debug.c:115)
> [ 0.466545] find_vmap_area (mm/vmalloc.c:1059 mm/vmalloc.c:2364)
> [ 0.466572] find_vm_area (mm/vmalloc.c:3150)
> [ 0.466597] __set_memory (arch/s390/mm/pageattr.c:360 arch/s390/mm/pageattr.c:393)
> [ 0.466624] vmem_map_init (./arch/s390/include/asm/set_memory.h:55 arch/s390/mm/vmem.c:660)
> [ 0.466651] paging_init (arch/s390/mm/init.c:97)
> [ 0.466677] setup_arch (arch/s390/kernel/setup.c:972)
> [ 0.466702] start_kernel (init/main.c:899)
> [ 0.466727] startup_continue (arch/s390/kernel/head64.S:35)
> [ 0.466811] INFO: lockdep is turned off.
> ...
> [ 0.718250] vmalloc init - busy lock init 0000000002871860
> [ 0.718328] vmalloc init - busy lock init 00000000028731b8
>
> Some background. It worked before because the lock that is in question
> was statically defined and initialized. As of now, the locks and data
> structures are initialized in the vmalloc_init() function.
>
> To address that issue add the check whether the "vmap_initialized"
> variable is set, if not find_vmap_area() bails out on entry returning NULL.
>
> Fixes: 72210662c5a2 ("mm: vmalloc: offload free_vmap_area_lock lock")
> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
> ---
> mm/vmalloc.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 22aa63f4ef63..0d77d171b5d9 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2343,6 +2343,9 @@ struct vmap_area *find_vmap_area(unsigned long addr)
> struct vmap_area *va;
> int i, j;
>
> + if (unlikely(!vmap_initialized))
> + return NULL;
> +
> /*
> * An addr_to_node_id(addr) converts an address to a node index
> * where a VA is located. If VA spans several zones and passed

LGTM,

Reviewed-by: Baoquan He <[email protected]>


2024-03-25 09:18:22

by Christoph Hellwig

Subject: Re: [PATCH 1/1] mm: vmalloc: Bail out early in find_vmap_area() if vmap is not init

I guess this is ok as an urgent band-aid to get s390 booting again,
but calling find_vmap_area() before the vmap area is initialized
seems like an actual issue in the s390 mm init code.

Adding the s390 maintainers to see if they have an idea how this could
be fixed in a better way.


On Sat, Mar 23, 2024 at 03:15:44PM +0100, Uladzislau Rezki (Sony) wrote:
> During the boot the s390 system triggers "spinlock bad magic" messages
> if the spinlock debugging is enabled:
>
> [ 0.465445] BUG: spinlock bad magic on CPU#0, swapper/0
> [ 0.465490] lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> [ 0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e398669 #1
> [ 0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
> [ 0.466270] Call Trace:
> [ 0.466470] [<00000000011f26c8>] dump_stack_lvl+0x98/0xd8
> [ 0.466516] [<00000000001dcc6a>] do_raw_spin_lock+0x8a/0x108
> [ 0.466545] [<000000000042146c>] find_vmap_area+0x6c/0x108
> [ 0.466572] [<000000000042175a>] find_vm_area+0x22/0x40
> [ 0.466597] [<000000000012f152>] __set_memory+0x132/0x150
> [ 0.466624] [<0000000001cc0398>] vmem_map_init+0x40/0x118
> [ 0.466651] [<0000000001cc0092>] paging_init+0x22/0x68
> [ 0.466677] [<0000000001cbbed2>] setup_arch+0x52a/0x708
> [ 0.466702] [<0000000001cb6140>] start_kernel+0x80/0x5c8
> [ 0.466727] [<0000000000100036>] startup_continue+0x36/0x40
>
> it happens because such system tries to access some vmap areas
> whereas the vmalloc initialization is not even yet done:
>
> [ 0.465490] lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> [ 0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e398669 #1
> [ 0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
> [ 0.466270] Call Trace:
> [ 0.466470] dump_stack_lvl (lib/dump_stack.c:117)
> [ 0.466516] do_raw_spin_lock (kernel/locking/spinlock_debug.c:87 kernel/locking/spinlock_debug.c:115)
> [ 0.466545] find_vmap_area (mm/vmalloc.c:1059 mm/vmalloc.c:2364)
> [ 0.466572] find_vm_area (mm/vmalloc.c:3150)
> [ 0.466597] __set_memory (arch/s390/mm/pageattr.c:360 arch/s390/mm/pageattr.c:393)
> [ 0.466624] vmem_map_init (./arch/s390/include/asm/set_memory.h:55 arch/s390/mm/vmem.c:660)
> [ 0.466651] paging_init (arch/s390/mm/init.c:97)
> [ 0.466677] setup_arch (arch/s390/kernel/setup.c:972)
> [ 0.466702] start_kernel (init/main.c:899)
> [ 0.466727] startup_continue (arch/s390/kernel/head64.S:35)
> [ 0.466811] INFO: lockdep is turned off.
> ...
> [ 0.718250] vmalloc init - busy lock init 0000000002871860
> [ 0.718328] vmalloc init - busy lock init 00000000028731b8
>
> Some background. It worked before because the lock that is in question
> was statically defined and initialized. As of now, the locks and data
> structures are initialized in the vmalloc_init() function.
>
> To address that issue add the check whether the "vmap_initialized"
> variable is set, if not find_vmap_area() bails out on entry returning NULL.
>
> Fixes: 72210662c5a2 ("mm: vmalloc: offload free_vmap_area_lock lock")
> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
> ---
> mm/vmalloc.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 22aa63f4ef63..0d77d171b5d9 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2343,6 +2343,9 @@ struct vmap_area *find_vmap_area(unsigned long addr)
> struct vmap_area *va;
> int i, j;
>
> + if (unlikely(!vmap_initialized))
> + return NULL;
> +
> /*
> * An addr_to_node_id(addr) converts an address to a node index
> * where a VA is located. If VA spans several zones and passed
> --
> 2.39.2
>
---end quoted text---

2024-03-25 14:24:35

by Baoquan He

Subject: Re: [PATCH 1/1] mm: vmalloc: Bail out early in find_vmap_area() if vmap is not init

On 03/25/24 at 10:39am, Heiko Carstens wrote:
> On Sun, Mar 24, 2024 at 04:32:00PM -0700, Christoph Hellwig wrote:
> > On Sat, Mar 23, 2024 at 03:15:44PM +0100, Uladzislau Rezki (Sony) wrote:
.....snip
> > I guess this is ok as an urgend bandaid to get s390 booting again,
> > but calling find_vmap_area before the vmap area is initialized
> > seems an actual issue in the s390 mm init code.
> >
> > Adding the s390 maintainers to see if they have and idea how this could
> > get fixed in a better way.
>
> I'm going to push the patch below to the s390 git tree later. This is not a
> piece of art, but I wanted to avoid to externalize vmalloc's vmap_initialized,
> or come up with some s390 specific change_page_attr_alias_early() variant where
> sooner or later nobody remembers what "early" means.
>
> So this seems to be "good enough".
>
> From 0308cd304fa3b01904c6060e2115234101811e48 Mon Sep 17 00:00:00 2001
> From: Heiko Carstens <[email protected]>
> Date: Thu, 21 Mar 2024 09:41:20 +0100
> Subject: [PATCH] s390/mm,pageattr: avoid early calls into vmalloc code
>
> The vmalloc code got changed and doesn't have the global statically
> initialized vmap_area_lock spinlock anymore. This leads to the following
> lockdep splat when find_vm_area() is called before the vmalloc code is
> initialized:
>
> BUG: spinlock bad magic on CPU#0, swapper/0
> lock: single+0x1868/0x1978, .magic: 00000000, .owner: swapper/0, .owner_cpu: 0
>
> CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-11767-g23956900041d #1
> Hardware name: IBM 3931 A01 701 (KVM/Linux)
> Call Trace:
> [<00000000010d840a>] dump_stack_lvl+0xba/0x148
> [<00000000001fdf5c>] do_raw_spin_unlock+0x7c/0xd0
> [<000000000111d848>] _raw_spin_unlock+0x38/0x68
> [<0000000000485830>] find_vmap_area+0xb0/0x108
> [<0000000000485ada>] find_vm_area+0x22/0x40
> [<0000000000132bbc>] __set_memory+0xbc/0x140
> [<0000000001a7f048>] vmem_map_init+0x40/0x158
> [<0000000001a7edc8>] paging_init+0x28/0x80
> [<0000000001a7a6e2>] setup_arch+0x4b2/0x6d8
> [<0000000001a74438>] start_kernel+0x98/0x4b0
> [<0000000000100036>] startup_continue+0x36/0x40
> INFO: lockdep is turned off.
>
> Add a slab_is_available() check to change_page_attr_alias() in order to
> avoid early calls into vmalloc code. slab_is_available() is not exactly
> what is needed, but there is currently no other way to tell if the vmalloc
> code is initialized or not, and there is no reason to expose
> e.g. vmap_initialized from vmalloc to achieve the same.

If so, I would rather add a vmalloc_is_available() helper to achieve the
same. The added code and the code comment will definitely confuse people
and make them dig into why it is there.

>
> The fixes tag does not mean that the referenced commit is broken, but that
> there is a dependency to this commit if the vmalloc commit should be
> backported.
>
> Fixes: d093602919ad ("mm: vmalloc: remove global vmap_area_root rb-tree")
> Signed-off-by: Heiko Carstens <[email protected]>
> ---
> arch/s390/mm/pageattr.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
> index 01bc8fad64d6..b6c6453d66e2 100644
> --- a/arch/s390/mm/pageattr.c
> +++ b/arch/s390/mm/pageattr.c
> @@ -344,6 +344,9 @@ static int change_page_attr_alias(unsigned long addr, unsigned long end,
> struct vm_struct *area;
> int rc = 0;
>
> + /* Avoid early calls into not initialized vmalloc code. */
> + if (!slab_is_available())
> + return 0;
> /*
> * Changes to read-only permissions on kernel VA mappings are also
> * applied to the kernel direct mapping. Execute permissions are
> --
> 2.40.1
>
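For comparison, the vmalloc_is_available() helper Baoquan floats above might look like the sketch below. Both the helper and the caller are hypothetical names for illustration; no such API exists in the kernel:

```c
#include <assert.h>
#include <stdbool.h>

/* In mm/vmalloc.c this flag is private; the hypothetical helper would
 * be the only thing exported. */
static bool vmap_initialized;

/* Hypothetical export, analogous in spirit to slab_is_available(). */
bool vmalloc_is_available(void)
{
	return vmap_initialized;
}

/* Stand-in for vmalloc_init() flipping the flag at the end of setup. */
static void vmalloc_init_sketch(void)
{
	vmap_initialized = true;
}

/* How s390's change_page_attr_alias() could then guard itself
 * explicitly, instead of relying on the slab_is_available() proxy
 * used in Heiko's patch. */
int change_page_attr_alias_sketch(void)
{
	if (!vmalloc_is_available())
		return 0;	/* too early: nothing to do yet */

	/* ... normal find_vm_area()-based processing ... */
	return 1;
}
```

The trade-off debated in the thread is whether exporting such a predicate is worth it, or whether the vmalloc code should simply tolerate early calls on its own.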


2024-03-25 14:49:56

by Heiko Carstens

Subject: Re: [PATCH 1/1] mm: vmalloc: Bail out early in find_vmap_area() if vmap is not init

On Mon, Mar 25, 2024 at 06:09:26PM +0800, Baoquan He wrote:
> On 03/25/24 at 10:39am, Heiko Carstens wrote:
> > On Sun, Mar 24, 2024 at 04:32:00PM -0700, Christoph Hellwig wrote:
> > > On Sat, Mar 23, 2024 at 03:15:44PM +0100, Uladzislau Rezki (Sony) wrote:
> ......snip
> > > I guess this is ok as an urgend bandaid to get s390 booting again,
> > > but calling find_vmap_area before the vmap area is initialized
> > > seems an actual issue in the s390 mm init code.
> > >
> > > Adding the s390 maintainers to see if they have and idea how this could
> > > get fixed in a better way.
> >
> > I'm going to push the patch below to the s390 git tree later. This is not a
> > piece of art, but I wanted to avoid to externalize vmalloc's vmap_initialized,
> > or come up with some s390 specific change_page_attr_alias_early() variant where
> > sooner or later nobody remembers what "early" means.
> >
> > So this seems to be "good enough".
..
> > Add a slab_is_available() check to change_page_attr_alias() in order to
> > avoid early calls into vmalloc code. slab_is_available() is not exactly
> > what is needed, but there is currently no other way to tell if the vmalloc
> > code is initialized or not, and there is no reason to expose
> > e.g. vmap_initialized from vmalloc to achieve the same.
>
> If so, I would rather add a vmalloc_is_available() to achieve the same.
> The added code and the code comment definitely will confuse people and
> make people to dig why.

So after having given this a bit more thought, I think Uladzislau's patch
is probably the best way to address this.

It seems better for the vmalloc code to just do the right thing, regardless
of how early it is called, instead of adding yet another
subsystem_xyz_is_available() call.

Alternatively this could be addressed in s390 code with some sort of
"early" calls, but as already stated, sooner or later nobody would remember
what "early" means, and even if it were remembered: would that restriction
still be valid?

2024-03-25 16:09:20

by Baoquan He

Subject: Re: [PATCH 1/1] mm: vmalloc: Bail out early in find_vmap_area() if vmap is not init

On 03/25/24 at 12:16pm, Heiko Carstens wrote:
> On Mon, Mar 25, 2024 at 06:09:26PM +0800, Baoquan He wrote:
> > On 03/25/24 at 10:39am, Heiko Carstens wrote:
> > > On Sun, Mar 24, 2024 at 04:32:00PM -0700, Christoph Hellwig wrote:
> > > > On Sat, Mar 23, 2024 at 03:15:44PM +0100, Uladzislau Rezki (Sony) wrote:
> > ......snip
> > > > I guess this is ok as an urgend bandaid to get s390 booting again,
> > > > but calling find_vmap_area before the vmap area is initialized
> > > > seems an actual issue in the s390 mm init code.
> > > >
> > > > Adding the s390 maintainers to see if they have and idea how this could
> > > > get fixed in a better way.
> > >
> > > I'm going to push the patch below to the s390 git tree later. This is not a
> > > piece of art, but I wanted to avoid to externalize vmalloc's vmap_initialized,
> > > or come up with some s390 specific change_page_attr_alias_early() variant where
> > > sooner or later nobody remembers what "early" means.
> > >
> > > So this seems to be "good enough".
> ...
> > > Add a slab_is_available() check to change_page_attr_alias() in order to
> > > avoid early calls into vmalloc code. slab_is_available() is not exactly
> > > what is needed, but there is currently no other way to tell if the vmalloc
> > > code is initialized or not, and there is no reason to expose
> > > e.g. vmap_initialized from vmalloc to achieve the same.
> >
> > If so, I would rather add a vmalloc_is_available() to achieve the same.
> > The added code and the code comment definitely will confuse people and
> > make people to dig why.
>
> So after having given this a bit more thought I think Uladzislau's patch is
> probably the best way to address this.
>
> It seems to be better that the vmalloc code would just do the right thing,
> regardless how early it is called, instead of adding yet another
> subsystem_xyz_is_available() call.
>
> Alternatively this could be addressed in s390 code with some sort of
> "early" calls, but as already stated, sooner or later nobody would remember
> what "early" means, and even if that would be remembered: would that
> restriction still be valid?

I agree, with Uladzislau's patch it's better to let the vmalloc code do
the right thing whether it's called early or not.


2024-03-25 16:36:53

by Heiko Carstens

Subject: Re: [PATCH 1/1] mm: vmalloc: Bail out early in find_vmap_area() if vmap is not init

On Sun, Mar 24, 2024 at 04:32:00PM -0700, Christoph Hellwig wrote:
> On Sat, Mar 23, 2024 at 03:15:44PM +0100, Uladzislau Rezki (Sony) wrote:
> > During the boot the s390 system triggers "spinlock bad magic" messages
> > if the spinlock debugging is enabled:
> >
> > [ 0.465445] BUG: spinlock bad magic on CPU#0, swapper/0
> > [ 0.465490] lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> > [ 0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e398669 #1
> > [ 0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
> > [ 0.466270] Call Trace:
> > [ 0.466470] [<00000000011f26c8>] dump_stack_lvl+0x98/0xd8
> > [ 0.466516] [<00000000001dcc6a>] do_raw_spin_lock+0x8a/0x108
> > [ 0.466545] [<000000000042146c>] find_vmap_area+0x6c/0x108
> > [ 0.466572] [<000000000042175a>] find_vm_area+0x22/0x40
> > [ 0.466597] [<000000000012f152>] __set_memory+0x132/0x150
> > [ 0.466624] [<0000000001cc0398>] vmem_map_init+0x40/0x118
> > [ 0.466651] [<0000000001cc0092>] paging_init+0x22/0x68
> > [ 0.466677] [<0000000001cbbed2>] setup_arch+0x52a/0x708
> > [ 0.466702] [<0000000001cb6140>] start_kernel+0x80/0x5c8
> > [ 0.466727] [<0000000000100036>] startup_continue+0x36/0x40
..
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 22aa63f4ef63..0d77d171b5d9 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2343,6 +2343,9 @@ struct vmap_area *find_vmap_area(unsigned long addr)
> > struct vmap_area *va;
> > int i, j;
> >
> > + if (unlikely(!vmap_initialized))
> > + return NULL;
> > +
>
> I guess this is ok as an urgend bandaid to get s390 booting again,
> but calling find_vmap_area before the vmap area is initialized
> seems an actual issue in the s390 mm init code.
>
> Adding the s390 maintainers to see if they have and idea how this could
> get fixed in a better way.

I'm going to push the patch below to the s390 git tree later. This is not a
piece of art, but I wanted to avoid externalizing vmalloc's vmap_initialized,
or coming up with some s390 specific change_page_attr_alias_early() variant
where sooner or later nobody remembers what "early" means.

So this seems to be "good enough".

From 0308cd304fa3b01904c6060e2115234101811e48 Mon Sep 17 00:00:00 2001
From: Heiko Carstens <[email protected]>
Date: Thu, 21 Mar 2024 09:41:20 +0100
Subject: [PATCH] s390/mm,pageattr: avoid early calls into vmalloc code

The vmalloc code has changed and no longer has the global, statically
initialized vmap_area_lock spinlock. This leads to the following
lockdep splat when find_vm_area() is called before the vmalloc code is
initialized:

BUG: spinlock bad magic on CPU#0, swapper/0
lock: single+0x1868/0x1978, .magic: 00000000, .owner: swapper/0, .owner_cpu: 0

CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-11767-g23956900041d #1
Hardware name: IBM 3931 A01 701 (KVM/Linux)
Call Trace:
[<00000000010d840a>] dump_stack_lvl+0xba/0x148
[<00000000001fdf5c>] do_raw_spin_unlock+0x7c/0xd0
[<000000000111d848>] _raw_spin_unlock+0x38/0x68
[<0000000000485830>] find_vmap_area+0xb0/0x108
[<0000000000485ada>] find_vm_area+0x22/0x40
[<0000000000132bbc>] __set_memory+0xbc/0x140
[<0000000001a7f048>] vmem_map_init+0x40/0x158
[<0000000001a7edc8>] paging_init+0x28/0x80
[<0000000001a7a6e2>] setup_arch+0x4b2/0x6d8
[<0000000001a74438>] start_kernel+0x98/0x4b0
[<0000000000100036>] startup_continue+0x36/0x40
INFO: lockdep is turned off.

Add a slab_is_available() check to change_page_attr_alias() in order to
avoid early calls into vmalloc code. slab_is_available() is not exactly
what is needed, but there is currently no other way to tell if the vmalloc
code is initialized or not, and there is no reason to expose
e.g. vmap_initialized from vmalloc to achieve the same.

The fixes tag does not mean that the referenced commit is broken, but that
there is a dependency on this commit if the vmalloc commit should be
backported.

Fixes: d093602919ad ("mm: vmalloc: remove global vmap_area_root rb-tree")
Signed-off-by: Heiko Carstens <[email protected]>
---
arch/s390/mm/pageattr.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
index 01bc8fad64d6..b6c6453d66e2 100644
--- a/arch/s390/mm/pageattr.c
+++ b/arch/s390/mm/pageattr.c
@@ -344,6 +344,9 @@ static int change_page_attr_alias(unsigned long addr, unsigned long end,
struct vm_struct *area;
int rc = 0;

+ /* Avoid early calls into not initialized vmalloc code. */
+ if (!slab_is_available())
+ return 0;
/*
* Changes to read-only permissions on kernel VA mappings are also
* applied to the kernel direct mapping. Execute permissions are
--
2.40.1


2024-03-26 07:49:43

by Heiko Carstens

Subject: Re: [PATCH 1/1] mm: vmalloc: Bail out early in find_vmap_area() if vmap is not init

On Sat, Mar 23, 2024 at 03:15:44PM +0100, Uladzislau Rezki (Sony) wrote:
> During the boot the s390 system triggers "spinlock bad magic" messages
> if the spinlock debugging is enabled:
..
> Some background. It worked before because the lock that is in question
> was statically defined and initialized. As of now, the locks and data
> structures are initialized in the vmalloc_init() function.
>
> To address that issue add the check whether the "vmap_initialized"
> variable is set, if not find_vmap_area() bails out on entry returning NULL.
>
> Fixes: 72210662c5a2 ("mm: vmalloc: offload free_vmap_area_lock lock")
> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
> ---
> mm/vmalloc.c | 3 +++
> 1 file changed, 3 insertions(+)

Thanks for fixing this, and FWIW:
Acked-by: Heiko Carstens <[email protected]>

2024-03-26 09:56:28

by Uladzislau Rezki

Subject: Re: [PATCH 1/1] mm: vmalloc: Bail out early in find_vmap_area() if vmap is not init

On Tue, Mar 26, 2024 at 08:48:46AM +0100, Heiko Carstens wrote:
> On Sat, Mar 23, 2024 at 03:15:44PM +0100, Uladzislau Rezki (Sony) wrote:
> > During the boot the s390 system triggers "spinlock bad magic" messages
> > if the spinlock debugging is enabled:
> ...
> > Some background. It worked before because the lock that is in question
> > was statically defined and initialized. As of now, the locks and data
> > structures are initialized in the vmalloc_init() function.
> >
> > To address that issue add the check whether the "vmap_initialized"
> > variable is set, if not find_vmap_area() bails out on entry returning NULL.
> >
> > Fixes: 72210662c5a2 ("mm: vmalloc: offload free_vmap_area_lock lock")
> > Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
> > ---
> > mm/vmalloc.c | 3 +++
> > 1 file changed, 3 insertions(+)
>
> Thanks for fixing this, and FWIW:
> Acked-by: Heiko Carstens <[email protected]>
>
You are welcome and thank you!

--
Uladzislau Rezki