LinuxLists.cc - [PATCH v2 0/3] mm: Make unregistration of super

2023-06-05 19:08:12

Subject: [PATCH v2 0/3] mm: Make unregistration of super_block shrinker more faster

This patch set introduces a new scheme of shrinker unregistration. It allows to split
the unregistration in two parts: fast and slow. This allows to hide slow part from
a user, so user-visible unregistration becomes fast.

This fixes the -88.8% regression of stress-ng.ramfs.ops_per_sec noticed
by kernel test robot:

https://lore.kernel.org/lkml/[email protected]/

---

Kirill Tkhai (2):
mm: Split unregister_shrinker() in fast and slow part
fs: Use delayed shrinker unregistration

Qi Zheng (1):
mm: vmscan: move shrinker_debugfs_remove() before synchronize_srcu()

fs/super.c | 3 ++-
include/linux/shrinker.h | 4 ++++
mm/vmscan.c | 39 +++++++++++++++++++++++++++++++--------
3 files changed, 37 insertions(+), 9 deletions(-)

--
Signed-off-by: Kirill Tkhai <[email protected]>

2023-06-05 19:09:41

by Kirill Tkhai

[permalink] [raw]

Subject: [PATCH v2 2/3] mm: Split unregister_shrinker() in fast and slow part

This splits unregister_shrinker() in two parts, and this allows
to make the unregistration faster by moving the slow part
in delayed asynchronous work. Note, that the guarantees remain
the same: no do_shrink_slab() calls are possible after the first
part. This will be used in next patch.

Signed-off-by: Kirill Tkhai <[email protected]>
---
include/linux/shrinker.h | 4 ++++
mm/vmscan.c | 35 +++++++++++++++++++++++++++++------
2 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 224293b2dd06..1cc572fa6070 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -4,6 +4,7 @@

#include <linux/atomic.h>
#include <linux/types.h>
+#include <linux/rwsem.h>

/*
* This struct is used to pass information from page reclaim to the shrinkers.
@@ -83,6 +84,7 @@ struct shrinker {
#endif
/* objs pending delete, per node */
atomic_long_t *nr_deferred;
+ struct rw_semaphore rwsem;
};
#define DEFAULT_SEEKS 2 /* A good number if you don't know better. */

@@ -102,6 +104,8 @@ extern void register_shrinker_prepared(struct shrinker *shrinker);
extern int __printf(2, 3) register_shrinker(struct shrinker *shrinker,
const char *fmt, ...);
extern void unregister_shrinker(struct shrinker *shrinker);
+extern void unregister_shrinker_delayed_initiate(struct shrinker *shrinker);
+extern void unregister_shrinker_delayed_finalize(struct shrinker *shrinker);
extern void free_prealloced_shrinker(struct shrinker *shrinker);
extern void synchronize_shrinkers(void);

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a773e97e152e..f24fd58dcc2a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -706,6 +706,7 @@ static int __prealloc_shrinker(struct shrinker *shrinker)
if (!shrinker->nr_deferred)
return -ENOMEM;

+ init_rwsem(&shrinker->rwsem);
return 0;
}

@@ -757,7 +758,9 @@ void register_shrinker_prepared(struct shrinker *shrinker)
{
mutex_lock(&shrinker_mutex);
list_add_tail_rcu(&shrinker->list, &shrinker_list);
+ down_write(&shrinker->rwsem);
shrinker->flags |= SHRINKER_REGISTERED;
+ up_write(&shrinker->rwsem);
shrinker_debugfs_add(shrinker);
mutex_unlock(&shrinker_mutex);
}
@@ -802,7 +805,7 @@ EXPORT_SYMBOL(register_shrinker);
/*
* Remove one
*/
-void unregister_shrinker(struct shrinker *shrinker)
+void unregister_shrinker_delayed_initiate(struct shrinker *shrinker)
{
struct dentry *debugfs_entry;
int debugfs_id;
@@ -812,20 +815,33 @@ void unregister_shrinker(struct shrinker *shrinker)

mutex_lock(&shrinker_mutex);
list_del_rcu(&shrinker->list);
+ down_write(&shrinker->rwsem);
shrinker->flags &= ~SHRINKER_REGISTERED;
+ up_write(&shrinker->rwsem);
if (shrinker->flags & SHRINKER_MEMCG_AWARE)
unregister_memcg_shrinker(shrinker);
debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);
mutex_unlock(&shrinker_mutex);

shrinker_debugfs_remove(debugfs_entry, debugfs_id);
+}
+EXPORT_SYMBOL(unregister_shrinker_delayed_initiate);

+void unregister_shrinker_delayed_finalize(struct shrinker *shrinker)
+{
atomic_inc(&shrinker_srcu_generation);
synchronize_srcu(&shrinker_srcu);

kfree(shrinker->nr_deferred);
shrinker->nr_deferred = NULL;
}
+EXPORT_SYMBOL(unregister_shrinker_delayed_finalize);
+
+void unregister_shrinker(struct shrinker *shrinker)
+{
+ unregister_shrinker_delayed_initiate(shrinker);
+ unregister_shrinker_delayed_finalize(shrinker);
+}
EXPORT_SYMBOL(unregister_shrinker);

/**
@@ -856,9 +872,15 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
: SHRINK_BATCH;
long scanned = 0, next_deferred;

+ if (!down_read_trylock(&shrinker->rwsem))
+ return 0;
+ if (!(shrinker->flags & SHRINKER_REGISTERED))
+ goto unlock;
freeable = shrinker->count_objects(shrinker, shrinkctl);
- if (freeable == 0 || freeable == SHRINK_EMPTY)
- return freeable;
+ if (freeable == 0 || freeable == SHRINK_EMPTY) {
+ freed = freeable;
+ goto unlock;
+ }

/*
* copy the current shrinker scan count into a local variable
@@ -937,6 +959,8 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
new_nr = add_nr_deferred(next_deferred, shrinker, shrinkctl);

trace_mm_shrink_slab_end(shrinker, shrinkctl->nid, freed, nr, new_nr, total_scan);
+unlock:
+ up_read(&shrinker->rwsem);
return freed;
}

@@ -968,9 +992,8 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
struct shrinker *shrinker;

shrinker = idr_find(&shrinker_idr, i);
- if (unlikely(!shrinker || !(shrinker->flags & SHRINKER_REGISTERED))) {
- if (!shrinker)
- clear_bit(i, info->map);
+ if (unlikely(!shrinker)) {
+ clear_bit(i, info->map);
continue;
}

2023-06-05 19:10:29

by Kirill Tkhai

[permalink] [raw]

Subject: [PATCH v2 1/3] mm: vmscan: move shrinker_debugfs_remove() before synchronize_srcu()

From: Qi Zheng <[email protected]>

The debugfs_remove_recursive() will wait for debugfs_file_put()
to return, so there is no need to put it after synchronize_srcu()
to wait for the rcu read-side critical section to exit.

Just move it before synchronize_srcu(), which is also convenient
to put the heavy synchronize_srcu() in the delayed work later.

Signed-off-by: Qi Zheng <[email protected]>
---
mm/vmscan.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index eeca83e28c9b..a773e97e152e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -818,11 +818,11 @@ void unregister_shrinker(struct shrinker *shrinker)
debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);
mutex_unlock(&shrinker_mutex);

+ shrinker_debugfs_remove(debugfs_entry, debugfs_id);
+
atomic_inc(&shrinker_srcu_generation);
synchronize_srcu(&shrinker_srcu);

- shrinker_debugfs_remove(debugfs_entry, debugfs_id);
-
kfree(shrinker->nr_deferred);
shrinker->nr_deferred = NULL;
}

2023-06-05 22:34:12

by Dave Chinner

[permalink] [raw]

Subject: Re: [PATCH v2 0/3] mm: Make unregistration of super_block shrinker more faster

On Mon, Jun 05, 2023 at 10:02:46PM +0300, Kirill Tkhai wrote:
> This patch set introduces a new scheme of shrinker unregistration. It allows to split
> the unregistration in two parts: fast and slow. This allows to hide slow part from
> a user, so user-visible unregistration becomes fast.
>
> This fixes the -88.8% regression of stress-ng.ramfs.ops_per_sec noticed
> by kernel test robot:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> ---
>
> Kirill Tkhai (2):
> mm: Split unregister_shrinker() in fast and slow part
> fs: Use delayed shrinker unregistration

Did you test any filesystem other than ramfs?

Filesystems more complex than ramfs have internal shrinkers, and so
they will still be running the slow synchronize_srcu() - potentially
multiple times! - in every unmount. Both XFS and ext4 have 3
internal shrinker instances per mount, so they will still call
synchronize_srcu() at least 3 times per unmount after this change.

What about any other subsystem that runs a shrinker - do they have
context depedent shrinker instances that get frequently created and
destroyed? They'll need the same treatment.

Seriously, part of changing shrinker infrastructure is doing an
audit of all the shrinker instances to determine how the change will
impact those shrinkers, and if the same structural changes are
needed to those implementations.

I don't see any of this being done - this looks like a "slap a bandaid
over the visible symptom" patch set without any deeper investigation
of the scope of the issue having been gained.

Along with all shrinkers now running under a SRCU critical region
and requiring a machine wide synchronisation point for every
unregister_shrinker() call made, the ability to repeated abort
global shrinker passes via external SRCU expediting, and now an
intricate locking and state dance in do_shrink_slab() vs
unregister_shrinker, I can't say I'm particularly liking any of
this, regardles of the benefits it supposedly provides.

-Dave.
--
Dave Chinner
[email protected]

2023-06-06 00:56:26

by Roman Gushchin

[permalink] [raw]

Subject: Re: [PATCH v2 1/3] mm: vmscan: move shrinker_debugfs_remove() before synchronize_srcu()

On Mon, Jun 05, 2023 at 10:03:02PM +0300, Kirill Tkhai wrote:
> From: Qi Zheng <[email protected]>
>
> The debugfs_remove_recursive() will wait for debugfs_file_put()
> to return, so there is no need to put it after synchronize_srcu()
> to wait for the rcu read-side critical section to exit.
>
> Just move it before synchronize_srcu(), which is also convenient
> to put the heavy synchronize_srcu() in the delayed work later.
>
> Signed-off-by: Qi Zheng <[email protected]>

Acked-by: Roman Gushchin <[email protected]>

2023-06-06 21:20:31

by Kirill Tkhai

[permalink] [raw]

Subject: Re: [PATCH v2 0/3] mm: Make unregistration of super_block shrinker more faster

On 06.06.2023 01:32, Dave Chinner wrote:
> On Mon, Jun 05, 2023 at 10:02:46PM +0300, Kirill Tkhai wrote:
>> This patch set introduces a new scheme of shrinker unregistration. It allows to split
>> the unregistration in two parts: fast and slow. This allows to hide slow part from
>> a user, so user-visible unregistration becomes fast.
>>
>> This fixes the -88.8% regression of stress-ng.ramfs.ops_per_sec noticed
>> by kernel test robot:
>>
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> ---
>>
>> Kirill Tkhai (2):
>> mm: Split unregister_shrinker() in fast and slow part
>> fs: Use delayed shrinker unregistration
>
> Did you test any filesystem other than ramfs?
>
> Filesystems more complex than ramfs have internal shrinkers, and so
> they will still be running the slow synchronize_srcu() - potentially
> multiple times! - in every unmount. Both XFS and ext4 have 3
> internal shrinker instances per mount, so they will still call
> synchronize_srcu() at least 3 times per unmount after this change.
>
> What about any other subsystem that runs a shrinker - do they have
> context depedent shrinker instances that get frequently created and
> destroyed? They'll need the same treatment.

Of course, all of shrinkers should be fixed. This patch set just aims to describe
the idea more wider, because I'm not sure most people read replys to kernel robot reports.

This is my suggestion of way to go. Probably, Qi is right person to ask whether
we're going to extend this and to maintain f95bdb700bc6 in tree.

There is not much time. Unfortunately, kernel test robot reported this significantly late.

> Seriously, part of changing shrinker infrastructure is doing an
> audit of all the shrinker instances to determine how the change will
> impact those shrinkers, and if the same structural changes are
> needed to those implementations.
>
> I don't see any of this being done - this looks like a "slap a bandaid
> over the visible symptom" patch set without any deeper investigation
> of the scope of the issue having been gained.
>
> Along with all shrinkers now running under a SRCU critical region
> and requiring a machine wide synchronisation point for every
> unregister_shrinker() call made, the ability to repeated abort
> global shrinker passes via external SRCU expediting, and now an
> intricate locking and state dance in do_shrink_slab() vs
> unregister_shrinker, I can't say I'm particularly liking any of
> this, regardles of the benefits it supposedly provides.
>
> -Dave.

2023-06-06 22:22:35

by Dave Chinner

[permalink] [raw]

Subject: Re: [PATCH v2 0/3] mm: Make unregistration of super_block shrinker more faster

On Wed, Jun 07, 2023 at 12:06:03AM +0300, Kirill Tkhai wrote:
> On 06.06.2023 01:32, Dave Chinner wrote:
> > On Mon, Jun 05, 2023 at 10:02:46PM +0300, Kirill Tkhai wrote:
> >> This patch set introduces a new scheme of shrinker unregistration. It allows to split
> >> the unregistration in two parts: fast and slow. This allows to hide slow part from
> >> a user, so user-visible unregistration becomes fast.
> >>
> >> This fixes the -88.8% regression of stress-ng.ramfs.ops_per_sec noticed
> >> by kernel test robot:
> >>
> >> https://lore.kernel.org/lkml/[email protected]/
> >>
> >> ---
> >>
> >> Kirill Tkhai (2):
> >> mm: Split unregister_shrinker() in fast and slow part
> >> fs: Use delayed shrinker unregistration
> >
> > Did you test any filesystem other than ramfs?
> >
> > Filesystems more complex than ramfs have internal shrinkers, and so
> > they will still be running the slow synchronize_srcu() - potentially
> > multiple times! - in every unmount. Both XFS and ext4 have 3
> > internal shrinker instances per mount, so they will still call
> > synchronize_srcu() at least 3 times per unmount after this change.
> >
> > What about any other subsystem that runs a shrinker - do they have
> > context depedent shrinker instances that get frequently created and
> > destroyed? They'll need the same treatment.
>
> Of course, all of shrinkers should be fixed. This patch set just aims to describe
> the idea more wider, because I'm not sure most people read replys to kernel robot reports.
>
> This is my suggestion of way to go. Probably, Qi is right person to ask whether
> we're going to extend this and to maintain f95bdb700bc6 in tree.
>
> There is not much time. Unfortunately, kernel test robot reported this significantly late.

And that's why it should be reverted rather than trying to rush to
try to fix it.

I'm kind of tired of finding out about mm reclaim regressions only
when I see patches making naive and/or broken changes to subsystem
shrinker implementations without any real clue about what they are
doing. If people/subsystems who maintain shrinker implementations
were cc'd on the changes to the shrinker implementation, this would
have all been resolved before merging occurred....

Lockless shrinker lists need a heap of supporting changes to be done
first so that they aren't reliant on synchronise_srcu() *at all*. If
the code was properly designed in the first place (i.e. dynamic
shrinker structures freed via call_rcu()), we wouldn't be in rushing
to fix weird regressions right now.

Can we please revert this and start again with a properly throught
out and reveiwed design?

-Dave.
--
Dave Chinner
[email protected]

2023-06-07 03:22:00

by Qi Zheng

[permalink] [raw]

Subject: Re: [PATCH v2 0/3] mm: Make unregistration of super_block shrinker more faster

On 2023/6/7 06:02, Dave Chinner wrote:
> On Wed, Jun 07, 2023 at 12:06:03AM +0300, Kirill Tkhai wrote:
>> On 06.06.2023 01:32, Dave Chinner wrote:
>>> On Mon, Jun 05, 2023 at 10:02:46PM +0300, Kirill Tkhai wrote:
>>>> This patch set introduces a new scheme of shrinker unregistration. It allows to split
>>>> the unregistration in two parts: fast and slow. This allows to hide slow part from
>>>> a user, so user-visible unregistration becomes fast.
>>>>
>>>> This fixes the -88.8% regression of stress-ng.ramfs.ops_per_sec noticed
>>>> by kernel test robot:
>>>>
>>>> https://lore.kernel.org/lkml/[email protected]/
>>>>
>>>> ---
>>>>
>>>> Kirill Tkhai (2):
>>>> mm: Split unregister_shrinker() in fast and slow part
>>>> fs: Use delayed shrinker unregistration
>>>
>>> Did you test any filesystem other than ramfs?
>>>
>>> Filesystems more complex than ramfs have internal shrinkers, and so
>>> they will still be running the slow synchronize_srcu() - potentially
>>> multiple times! - in every unmount. Both XFS and ext4 have 3
>>> internal shrinker instances per mount, so they will still call
>>> synchronize_srcu() at least 3 times per unmount after this change.
>>>
>>> What about any other subsystem that runs a shrinker - do they have
>>> context depedent shrinker instances that get frequently created and
>>> destroyed? They'll need the same treatment.
>>
>> Of course, all of shrinkers should be fixed. This patch set just aims to describe
>> the idea more wider, because I'm not sure most people read replys to kernel robot reports.

Thank you, Kirill.

>>
>> This is my suggestion of way to go. Probably, Qi is right person to ask whether
>> we're going to extend this and to maintain f95bdb700bc6 in tree.
>>
>> There is not much time. Unfortunately, kernel test robot reported this significantly late.
>
> And that's why it should be reverted rather than trying to rush to
> try to fix it.
>
> I'm kind of tired of finding out about mm reclaim regressions only
> when I see patches making naive and/or broken changes to subsystem
> shrinker implementations without any real clue about what they are
> doing. If people/subsystems who maintain shrinker implementations
> were cc'd on the changes to the shrinker implementation, this would
> have all been resolved before merging occurred....
>
> Lockless shrinker lists need a heap of supporting changes to be done
> first so that they aren't reliant on synchronise_srcu() *at all*. If
> the code was properly designed in the first place (i.e. dynamic
> shrinker structures freed via call_rcu()), we wouldn't be in rushing
> to fix weird regressions right now.
>
> Can we please revert this and start again with a properly throught
> out and reveiwed design?

I have no idea on whether to revert this, I follow the final decision of
the community.

From my personal point of view, I think it is worth sacrificing the
speed of unregistration alone compared to the benefits it brings
(lockless shrink, etc).

Of course, it would be better if there is a more perfect solution.
If you have a better idea, it might be better to post the code first for
discussion. Otherwise, I am afraid that if we just revert it, the
problem of shrinker_rwsem will continue for many years.

And hi Dave, I know you're mad that I didn't cc you in the original
patch. Sorry again. How about splitting shrinker-related codes into
the separate files? Then we can add a MAINTAINERS entry to it and add
[email protected] to this entry? So that future people
will not miss to cc fs folks.

Qi.

>
> -Dave.

2023-06-07 05:24:10

by Oliver Sang

[permalink] [raw]

Subject: Re: [PATCH v2 2/3] mm: Split unregister_shrinker() in fast and slow part

Hello,

kernel test robot noticed "INFO:trying_to_register_non-static_key" on:

commit: 107ed33204f77282d67b90f5c37f34c4b1ec9ffb ("[PATCH v2 2/3] mm: Split unregister_shrinker() in fast and slow part")
url: https://github.com/intel-lab-lkp/linux/commits/Kirill-Tkhai/mm-vmscan-move-shrinker_debugfs_remove-before-synchronize_srcu/20230606-030419
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git f8dba31b0a826e691949cd4fdfa5c30defaac8c5
patch link: https://lore.kernel.org/all/[email protected]/
patch subject: [PATCH v2 2/3] mm: Split unregister_shrinker() in fast and slow part

in testcase: boot

compiler: clang-15
test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G

(please refer to attached dmesg/kmsg for entire log/backtrace)

If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]

[ 0.538549][ T0] INFO: trying to register non-static key.
[ 0.539249][ T0] The code is fine but needs lockdep annotation, or maybe
[ 0.539385][ T0] you didn't initialize this object before use?
[ 0.539385][ T0] turning off the locking correctness validator.
[ 0.539385][ T0] CPU: 0 PID: 0 Comm: swapper Not tainted 6.4.0-rc5-00004-g107ed33204f7 #1
[ 0.539385][ T0] Call Trace:
[ 0.539385][ T0] dump_stack_lvl (??:?)
[ 0.539385][ T0] dump_stack (??:?)
[ 0.539385][ T0] assign_lock_key (lockdep.c:?)
[ 0.539385][ T0] register_lock_class (lockdep.c:?)
[ 0.539385][ T0] __lock_acquire (lockdep.c:?)
[ 0.539385][ T0] ? lock_acquire (??:?)
[ 0.539385][ T0] ? register_shrinker_prepared (??:?)
[ 0.539385][ T0] ? __might_resched (??:?)
[ 0.539385][ T0] lock_acquire (??:?)
[ 0.539385][ T0] ? register_shrinker_prepared (??:?)
[ 0.539385][ T0] ? __might_resched (??:?)
[ 0.539385][ T0] down_write (??:?)
[ 0.539385][ T0] ? register_shrinker_prepared (??:?)
[ 0.539385][ T0] register_shrinker_prepared (??:?)
[ 0.539385][ T0] sget_fc (??:?)
[ 0.539385][ T0] ? kill_litter_super (??:?)
[ 0.539385][ T0] ? shmem_reconfigure (shmem.c:?)
[ 0.539385][ T0] get_tree_nodev (??:?)
[ 0.539385][ T0] shmem_get_tree (shmem.c:?)
[ 0.539385][ T0] vfs_get_tree (??:?)
[ 0.539385][ T0] vfs_kern_mount (??:?)
[ 0.539385][ T0] kern_mount (??:?)
[ 0.539385][ T0] shmem_init (??:?)
[ 0.539385][ T0] mnt_init (??:?)
[ 0.539385][ T0] vfs_caches_init (??:?)
[ 0.539385][ T0] start_kernel (??:?)
[ 0.539385][ T0] i386_start_kernel (??:?)
[ 0.539385][ T0] startup_32_smp (??:?)
[ 0.539391][ T0] ------------[ cut here ]------------
[ 0.540097][ T0] DEBUG_RWSEMS_WARN_ON(sem->magic != sem): count = 0x1, magic = 0x0, owner = 0x81d77c00, curr 0x81d77c00, list not empty
[ 0.540405][ T0] WARNING: CPU: 0 PID: 0 at kernel/locking/rwsem.c:1364 up_write (??:?)
[ 0.541389][ T0] Modules linked in:
[ 0.542390][ T0] CPU: 0 PID: 0 Comm: swapper Not tainted 6.4.0-rc5-00004-g107ed33204f7 #1
[ 0.543389][ T0] EIP: up_write (??:?)
[ 0.543920][ T0] Code: ee 8f c6 81 39 c1 74 05 bb 7f 8b c4 81 53 52 ff 74 24 08 57 ff 74 24 14 68 44 a3 cb 81 68 6d 1a ce 81 e8 32 68 fb ff 83 c4 1c <0f> 0b 39 f7 0f 85 a8 fe ff ff e9 a8 fe ff ff 0f 0b e9 db fe ff ff
All code
========
0: ee out %al,(%dx)
1: 8f c6 pop %rsi
3: 81 39 c1 74 05 bb cmpl $0xbb0574c1,(%rcx)
9: 7f 8b jg 0xffffffffffffff96
b: c4 81 53 52 (bad)
f: ff 74 24 08 pushq 0x8(%rsp)
13: 57 push %rdi
14: ff 74 24 14 pushq 0x14(%rsp)
18: 68 44 a3 cb 81 pushq $0xffffffff81cba344
1d: 68 6d 1a ce 81 pushq $0xffffffff81ce1a6d
22: e8 32 68 fb ff callq 0xfffffffffffb6859
27: 83 c4 1c add $0x1c,%esp
2a:* 0f 0b ud2 <-- trapping instruction
2c: 39 f7 cmp %esi,%edi
2e: 0f 85 a8 fe ff ff jne 0xfffffffffffffedc
34: e9 a8 fe ff ff jmpq 0xfffffffffffffee1
39: 0f 0b ud2
3b: e9 db fe ff ff jmpq 0xffffffffffffff1b

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 39 f7 cmp %esi,%edi
4: 0f 85 a8 fe ff ff jne 0xfffffffffffffeb2
a: e9 a8 fe ff ff jmpq 0xfffffffffffffeb7
f: 0f 0b ud2
11: e9 db fe ff ff jmpq 0xfffffffffffffef1
[ 0.544391][ T0] EAX: e6de3575 EBX: 81c48b7f ECX: e6de3575 EDX: 81d67dd0
[ 0.545388][ T0] ESI: 831f4c4c EDI: 00000000 EBP: 81d67ed0 ESP: 81d67ea8
[ 0.546389][ T0] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00210292
[ 0.547388][ T0] CR0: 80050033 CR2: ffd98000 CR3: 02280000 CR4: 00040690
[ 0.548392][ T0] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 0.549388][ T0] DR6: fffe0ff0 DR7: 00000400
[ 0.549947][ T0] Call Trace:
[ 0.550391][ T0] ? show_regs (??:?)
[ 0.550910][ T0] ? up_write (??:?)
[ 0.551389][ T0] ? __warn (??:?)
[ 0.551869][ T0] ? up_write (??:?)
[ 0.552389][ T0] ? up_write (??:?)
[ 0.552921][ T0] ? report_bug (??:?)
[ 0.553390][ T0] ? exc_overflow (??:?)
[ 0.554390][ T0] ? handle_bug (traps.c:?)
[ 0.554923][ T0] ? exc_invalid_op (??:?)
[ 0.555390][ T0] ? handle_exception (init_task.c:?)
[ 0.556022][ T0] ? arch_report_meminfo (??:?)
[ 0.556389][ T0] ? exc_overflow (??:?)
[ 0.557389][ T0] ? up_write (??:?)
[ 0.557927][ T0] ? exc_overflow (??:?)
[ 0.558389][ T0] ? up_write (??:?)
[ 0.558904][ T0] register_shrinker_prepared (??:?)
[ 0.559390][ T0] sget_fc (??:?)
[ 0.559887][ T0] ? kill_litter_super (??:?)
[ 0.560390][ T0] ? shmem_reconfigure (shmem.c:?)
[ 0.561390][ T0] get_tree_nodev (??:?)
[ 0.562390][ T0] shmem_get_tree (shmem.c:?)
[ 0.563390][ T0] vfs_get_tree (??:?)
[ 0.563914][ T0] vfs_kern_mount (??:?)
[ 0.564390][ T0] kern_mount (??:?)
[ 0.564908][ T0] shmem_init (??:?)
[ 0.565389][ T0] mnt_init (??:?)
[ 0.565892][ T0] vfs_caches_init (??:?)
[ 0.566390][ T0] start_kernel (??:?)
[ 0.567390][ T0] i386_start_kernel (??:?)
[ 0.568009][ T0] startup_32_smp (??:?)
[ 0.568392][ T0] irq event stamp: 1613
[ 0.568888][ T0] hardirqs last enabled at (1613): _raw_spin_unlock_irqrestore (??:?)
[ 0.569389][ T0] hardirqs last disabled at (1612): _raw_spin_lock_irqsave (??:?)
[ 0.570389][ T0] softirqs last enabled at (1480): do_softirq_own_stack (??:?)
[ 0.571389][ T0] softirqs last disabled at (1473): do_softirq_own_stack (??:?)
[ 0.572389][ T0] ---[ end trace 0000000000000000 ]---
[ 0.573889][ T0] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[ 0.574391][ T0] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0

To reproduce:

# build kernel
cd linux
cp config-6.4.0-rc5-00004-g107ed33204f7 .config
make HOSTCC=clang-15 CC=clang-15 ARCH=i386 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=clang-15 CC=clang-15 ARCH=i386 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

Attachments:

(No filename) (7.57 kB)
config-6.4.0-rc5-00004-g107ed33204f7 (146.33 kB)
job-script (5.06 kB)
dmesg.xz (32.27 kB)
Download all attachments

2023-06-07 07:59:52

by Yujie Liu

[permalink] [raw]

Subject: Re: [PATCH v2 2/3] mm: Split unregister_shrinker() in fast and slow part

On Wed, Jun 07, 2023 at 12:49:55PM +0800, kernel test robot wrote:
>
> Hello,
>
> kernel test robot noticed "INFO:trying_to_register_non-static_key" on:
>
> commit: 107ed33204f77282d67b90f5c37f34c4b1ec9ffb ("[PATCH v2 2/3] mm: Split unregister_shrinker() in fast and slow part")
> url: https://github.com/intel-lab-lkp/linux/commits/Kirill-Tkhai/mm-vmscan-move-shrinker_debugfs_remove-before-synchronize_srcu/20230606-030419
> base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git f8dba31b0a826e691949cd4fdfa5c30defaac8c5
> patch link: https://lore.kernel.org/all/[email protected]/
> patch subject: [PATCH v2 2/3] mm: Split unregister_shrinker() in fast and slow part
>
> in testcase: boot
>
> compiler: clang-15
> test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
>
> If you fix the issue, kindly add following tag
> | Reported-by: kernel test robot <[email protected]>
> | Closes: https://lore.kernel.org/oe-lkp/[email protected]

Sorry the stacktrace in the report was broken due to a toolchain issue
in the bot. Please check the correct stacktrace as below:

[ 0.538549][ T0] INFO: trying to register non-static key.
[ 0.539249][ T0] The code is fine but needs lockdep annotation, or maybe
[ 0.539385][ T0] you didn't initialize this object before use?
[ 0.539385][ T0] turning off the locking correctness validator.
[ 0.539385][ T0] CPU: 0 PID: 0 Comm: swapper Not tainted 6.4.0-rc5-00004-g107ed33204f7 #1
[ 0.539385][ T0] Call Trace:
[ 0.539385][ T0] dump_stack_lvl (lib/dump_stack.c:?)
[ 0.539385][ T0] dump_stack (lib/dump_stack.c:113)
[ 0.539385][ T0] assign_lock_key (kernel/locking/lockdep.c:?)
[ 0.539385][ T0] register_lock_class (kernel/locking/lockdep.c:?)
[ 0.539385][ T0] __lock_acquire (kernel/locking/lockdep.c:4965)
[ 0.539385][ T0] ? lock_acquire (kernel/locking/lockdep.c:5705)
[ 0.539385][ T0] ? register_shrinker_prepared (mm/vmscan.c:760)
[ 0.539385][ T0] ? __might_resched (kernel/sched/core.c:10115)
[ 0.539385][ T0] lock_acquire (kernel/locking/lockdep.c:5705)
[ 0.539385][ T0] ? register_shrinker_prepared (mm/vmscan.c:762)
[ 0.539385][ T0] ? __might_resched (kernel/sched/core.c:10115)
[ 0.539385][ T0] down_write (kernel/locking/rwsem.c:1573)
[ 0.539385][ T0] ? register_shrinker_prepared (mm/vmscan.c:762)
[ 0.539385][ T0] register_shrinker_prepared (mm/vmscan.c:762)
[ 0.539385][ T0] sget_fc (fs/super.c:616)
[ 0.539385][ T0] ? kill_litter_super (fs/super.c:1121)
[ 0.539385][ T0] ? shmem_reconfigure (mm/shmem.c:3791)
[ 0.539385][ T0] get_tree_nodev (fs/super.c:1144)
[ 0.539385][ T0] shmem_get_tree (mm/shmem.c:3879)
[ 0.539385][ T0] vfs_get_tree (fs/super.c:1511)
[ 0.539385][ T0] vfs_kern_mount (fs/namespace.c:1036 fs/namespace.c:1065)
[ 0.539385][ T0] kern_mount (fs/namespace.c:4455)
[ 0.539385][ T0] shmem_init (mm/shmem.c:4106)
[ 0.539385][ T0] mnt_init (fs/namespace.c:4440)
[ 0.539385][ T0] vfs_caches_init (fs/dcache.c:3355)
[ 0.539385][ T0] start_kernel (init/main.c:1071)
[ 0.539385][ T0] i386_start_kernel (arch/x86/kernel/head32.c:56)
[ 0.539385][ T0] startup_32_smp (arch/x86/kernel/head_32.S:319)
[ 0.539391][ T0] ------------[ cut here ]------------
[ 0.540097][ T0] DEBUG_RWSEMS_WARN_ON(sem->magic != sem): count = 0x1, magic = 0x0, owner = 0x81d77c00, curr 0x81d77c00, list not empty
[ 0.540405][ T0] WARNING: CPU: 0 PID: 0 at kernel/locking/rwsem.c:1364 up_write (kernel/locking/rwsem.c:1364)
[ 0.541389][ T0] Modules linked in:
[ 0.542390][ T0] CPU: 0 PID: 0 Comm: swapper Not tainted 6.4.0-rc5-00004-g107ed33204f7 #1
[ 0.543389][ T0] EIP: up_write (kernel/locking/rwsem.c:1364)
[ 0.543920][ T0] Code: ee 8f c6 81 39 c1 74 05 bb 7f 8b c4 81 53 52 ff 74 24 08 57 ff 74 24 14 68 44 a3 cb 81 68 6d 1a ce 81 e8 32 68 fb ff 83 c4 1c <0f> 0b 39 f7 0f 85 a8 fe ff ff e9 a8 fe ff ff 0f 0b e9 db fe ff ff
All code
========
0: ee out %al,(%dx)
1: 8f c6 pop %rsi
3: 81 39 c1 74 05 bb cmpl $0xbb0574c1,(%rcx)
9: 7f 8b jg 0xffffffffffffff96
b: c4 81 53 52 (bad)
f: ff 74 24 08 push 0x8(%rsp)
13: 57 push %rdi
14: ff 74 24 14 push 0x14(%rsp)
18: 68 44 a3 cb 81 push $0xffffffff81cba344
1d: 68 6d 1a ce 81 push $0xffffffff81ce1a6d
22: e8 32 68 fb ff call 0xfffffffffffb6859
27: 83 c4 1c add $0x1c,%esp
2a:* 0f 0b ud2 <-- trapping instruction
2c: 39 f7 cmp %esi,%edi
2e: 0f 85 a8 fe ff ff jne 0xfffffffffffffedc
34: e9 a8 fe ff ff jmp 0xfffffffffffffee1
39: 0f 0b ud2
3b: e9 db fe ff ff jmp 0xffffffffffffff1b

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 39 f7 cmp %esi,%edi
4: 0f 85 a8 fe ff ff jne 0xfffffffffffffeb2
a: e9 a8 fe ff ff jmp 0xfffffffffffffeb7
f: 0f 0b ud2
11: e9 db fe ff ff jmp 0xfffffffffffffef1
[ 0.544391][ T0] EAX: e6de3575 EBX: 81c48b7f ECX: e6de3575 EDX: 81d67dd0
[ 0.545388][ T0] ESI: 831f4c4c EDI: 00000000 EBP: 81d67ed0 ESP: 81d67ea8
[ 0.546389][ T0] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00210292
[ 0.547388][ T0] CR0: 80050033 CR2: ffd98000 CR3: 02280000 CR4: 00040690
[ 0.548392][ T0] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 0.549388][ T0] DR6: fffe0ff0 DR7: 00000400
[ 0.549947][ T0] Call Trace:
[ 0.550391][ T0] ? show_regs (arch/x86/kernel/dumpstack.c:478)
[ 0.550910][ T0] ? up_write (kernel/locking/rwsem.c:1364)
[ 0.551389][ T0] ? __warn (kernel/panic.c:235 kernel/panic.c:673)
[ 0.551869][ T0] ? up_write (kernel/locking/rwsem.c:1364)
[ 0.552389][ T0] ? up_write (kernel/locking/rwsem.c:1364)
[ 0.552921][ T0] ? report_bug (lib/bug.c:199)
[ 0.553390][ T0] ? exc_overflow (arch/x86/kernel/traps.c:337)
[ 0.554390][ T0] ? handle_bug (arch/x86/kernel/traps.c:324)
[ 0.554923][ T0] ? exc_invalid_op (arch/x86/kernel/traps.c:345)
[ 0.555390][ T0] ? handle_exception (arch/x86/entry/entry_32.S:1076)
[ 0.556022][ T0] ? arch_report_meminfo (arch/x86/mm/pat/set_memory.c:112)
[ 0.556389][ T0] ? exc_overflow (arch/x86/kernel/traps.c:337)
[ 0.557389][ T0] ? up_write (kernel/locking/rwsem.c:1364)
[ 0.557927][ T0] ? exc_overflow (arch/x86/kernel/traps.c:337)
[ 0.558389][ T0] ? up_write (kernel/locking/rwsem.c:1364)
[ 0.558904][ T0] register_shrinker_prepared (mm/vmscan.c:765)
[ 0.559390][ T0] sget_fc (fs/super.c:616)
[ 0.559887][ T0] ? kill_litter_super (fs/super.c:1121)
[ 0.560390][ T0] ? shmem_reconfigure (mm/shmem.c:3791)
[ 0.561390][ T0] get_tree_nodev (fs/super.c:1144)
[ 0.562390][ T0] shmem_get_tree (mm/shmem.c:3879)
[ 0.563390][ T0] vfs_get_tree (fs/super.c:1511)
[ 0.563914][ T0] vfs_kern_mount (fs/namespace.c:1036 fs/namespace.c:1065)
[ 0.564390][ T0] kern_mount (fs/namespace.c:4455)
[ 0.564908][ T0] shmem_init (mm/shmem.c:4106)
[ 0.565389][ T0] mnt_init (fs/namespace.c:4440)
[ 0.565892][ T0] vfs_caches_init (fs/dcache.c:3355)
[ 0.566390][ T0] start_kernel (init/main.c:1071)
[ 0.567390][ T0] i386_start_kernel (arch/x86/kernel/head32.c:56)
[ 0.568009][ T0] startup_32_smp (arch/x86/kernel/head_32.S:319)
[ 0.568392][ T0] irq event stamp: 1613
[ 0.568888][ T0] hardirqs last enabled at (1613): _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:77 arch/x86/include/asm/irqflags.h:135 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
[ 0.569389][ T0] hardirqs last disabled at (1612): _raw_spin_lock_irqsave (arch/x86/include/asm/preempt.h:80 include/linux/spinlock_api_smp.h:109 kernel/locking/spinlock.c:162)
[ 0.570389][ T0] softirqs last enabled at (1480): do_softirq_own_stack (arch/x86/kernel/irq_32.c:57 arch/x86/kernel/irq_32.c:147)
[ 0.571389][ T0] softirqs last disabled at (1473): do_softirq_own_stack (arch/x86/kernel/irq_32.c:57 arch/x86/kernel/irq_32.c:147)
[ 0.572389][ T0] ---[ end trace 0000000000000000 ]---

2023-06-08 22:48:46

by Dave Chinner

[permalink] [raw]

Subject: Re: [PATCH v2 0/3] mm: Make unregistration of super_block shrinker more faster

On Wed, Jun 07, 2023 at 10:51:35AM +0800, Qi Zheng wrote:
> From my personal point of view, I think it is worth sacrificing the
> speed of unregistration alone compared to the benefits it brings
> (lockless shrink, etc).

Nobody is questioning whether this is a worthwhile improvement. The
lockless shrinker instance iteration is definitely a good direction
to move in. The problem is the -process- that has been followed has
lead to a very sub-optimal result.

> Of course, it would be better if there is a more perfect solution.
> If you have a better idea, it might be better to post the code first for
> discussion. Otherwise, I am afraid that if we just revert it, the
> problem of shrinker_rwsem will continue for many years.

No, a revert doesn't mean we don't want the change; a revert means
the way the change was attempted has caused unexpected problems.
We need to go back to the drawing board and work out a better way to
do this.

> And hi Dave, I know you're mad that I didn't cc you in the original
> patch.

No, I'm not mad at you.

If I'm annoyed at anyone, it's the senior mm developers and
maintainers that I'm annoyed at - not informing relevant parties
about modifications to shrinker infrastructure or implementations
has lead to regressions escaping out to user systems multiple times
in the past.

Yet here we are again....

> Sorry again. How about splitting shrinker-related codes into
> the separate files? Then we can add a MAINTAINERS entry to it and add
> [email protected] to this entry? So that future people
> will not miss to cc fs folks.

I don't think that fixes the problem, because the scope if much
wider than fs-devel: look at all the different subsystems
that have a shrinker.

The whole kernel development process has always worked by the rule
that we're changing common infrastructure, all the subsystems using
that infrastructure need to be cc'd on the changes to the
infrastructure they are using. Just cc'ing -fsdevel isn't enough,
we've also got shrinkers in graphics driver infrastructure, *RCU*,
virtio, DM, bcache and various other subsystems.

And I'm betting most of them don't know that significant changes
have been made to how the shrinkers work....

Cheers,

Dave.
--
Dave Chinner
[email protected]