2022-07-08 03:16:52

by Guowei Du

Subject: [PATCH 2/2] erofs: sequence each shrink task

From: duguowei <[email protected]>

Because of 'list_move_tail', if two or more tasks are shrinking
concurrently, each of them may observe a different result.

For example:
After the first round, if the shrink_run_no of an entry equals the
run_no of a task, that task breaks out directly at the beginning of
the next round; if they are not equal, the task keeps shrinking until
it encounters an entry with the same number.

It is difficult to reason about the combined result of all the tasks,
so add a lock that allows only one task to shrink at a time.
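
Roughly, the pre-patch flow of erofs_shrink_scan() is the following
(simplified sketch; the umount_mutex handling is omitted):

	spin_lock(&erofs_sb_list_lock);
	do {
		run_no = ++shrinker_run_no;	/* private to this task */
	} while (run_no == 0);

	p = erofs_sb_list.next;
	while (p != &erofs_sb_list) {
		sbi = list_entry(p, struct erofs_sb_info, list);

		/* stop when we see an entry we have already done */
		if (sbi->shrinker_run_no == run_no)
			break;

		spin_unlock(&erofs_sb_list_lock);	/* others may run here */
		sbi->shrinker_run_no = run_no;
		freed += erofs_shrink_workstation(sbi, nr - freed);
		spin_lock(&erofs_sb_list_lock);

		p = p->next;
		/* move the shrunk entry to the tail of the shared list */
		list_move_tail(&sbi->list, &erofs_sb_list);
	}
	spin_unlock(&erofs_sb_list_lock);

Since run_no is task-local and the list is reordered while
erofs_sb_list_lock is dropped, concurrent tasks can stop early or
revisit entries depending on how their rounds interleave.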

How to test:
task1:
root# echo 3 > /proc/sys/vm/drop_caches
[743071.839051] Call Trace:
[743071.839052] <TASK>
[743071.839054] do_shrink_slab+0x112/0x300
[743071.839058] shrink_slab+0x211/0x2a0
[743071.839060] drop_slab+0x72/0xe0
[743071.839061] drop_caches_sysctl_handler+0x50/0xb0
[743071.839063] proc_sys_call_handler+0x173/0x250
[743071.839066] proc_sys_write+0x13/0x20
[743071.839067] new_sync_write+0x104/0x180
[743071.839070] ? send_command+0xe0/0x270
[743071.839073] vfs_write+0x247/0x2a0
[743071.839074] ksys_write+0xa7/0xe0
[743071.839075] ? fpregs_assert_state_consistent+0x23/0x50
[743071.839078] __x64_sys_write+0x1a/0x20
[743071.839079] do_syscall_64+0x3a/0x80
[743071.839081] entry_SYSCALL_64_after_hwframe+0x46/0xb0

task2:
root# echo 3 > /proc/sys/vm/drop_caches
[743079.843214] Call Trace:
[743079.843214] <TASK>
[743079.843215] do_shrink_slab+0x112/0x300
[743079.843219] shrink_slab+0x211/0x2a0
[743079.843221] drop_slab+0x72/0xe0
[743079.843222] drop_caches_sysctl_handler+0x50/0xb0
[743079.843224] proc_sys_call_handler+0x173/0x250
[743079.843227] proc_sys_write+0x13/0x20
[743079.843228] new_sync_write+0x104/0x180
[743079.843231] ? send_command+0xe0/0x270
[743079.843233] vfs_write+0x247/0x2a0
[743079.843234] ksys_write+0xa7/0xe0
[743079.843235] ? fpregs_assert_state_consistent+0x23/0x50
[743079.843238] __x64_sys_write+0x1a/0x20
[743079.843239] do_syscall_64+0x3a/0x80
[743079.843241] entry_SYSCALL_64_after_hwframe+0x46/0xb0

Signed-off-by: duguowei <[email protected]>
---
fs/erofs/utils.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/erofs/utils.c b/fs/erofs/utils.c
index ec9a1d780dc1..9eca13a7e594 100644
--- a/fs/erofs/utils.c
+++ b/fs/erofs/utils.c
@@ -186,6 +186,8 @@ static unsigned int shrinker_run_no;

/* protects the mounted 'erofs_sb_list' */
static DEFINE_SPINLOCK(erofs_sb_list_lock);
+/* sequence each shrink task */
+static DEFINE_SPINLOCK(erofs_sb_shrink_lock);
static LIST_HEAD(erofs_sb_list);

void erofs_shrinker_register(struct super_block *sb)
@@ -226,13 +228,14 @@ static unsigned long erofs_shrink_scan(struct shrinker *shrink,
struct list_head *p;

unsigned long nr = sc->nr_to_scan;
- unsigned int run_no;
unsigned long freed = 0;

+ spin_lock(&erofs_sb_shrink_lock);
spin_lock(&erofs_sb_list_lock);
- do {
- run_no = ++shrinker_run_no;
- } while (run_no == 0);
+ shrinker_run_no++;
+ /* if it overflows, restart from 1 */
+ if (shrinker_run_no == 0)
+ shrinker_run_no = 1;

/* Iterate over all mounted superblocks and try to shrink them */
p = erofs_sb_list.next;
@@ -243,7 +246,7 @@ static unsigned long erofs_shrink_scan(struct shrinker *shrink,
* We move the ones we do to the end of the list, so we stop
* when we see one we have already done.
*/
- if (sbi->shrinker_run_no == run_no)
+ if (sbi->shrinker_run_no == shrinker_run_no)
break;

if (!mutex_trylock(&sbi->umount_mutex)) {
@@ -252,7 +255,7 @@ static unsigned long erofs_shrink_scan(struct shrinker *shrink,
}

spin_unlock(&erofs_sb_list_lock);
- sbi->shrinker_run_no = run_no;
+ sbi->shrinker_run_no = shrinker_run_no;

freed += erofs_shrink_workstation(sbi, nr - freed);

@@ -271,6 +274,7 @@ static unsigned long erofs_shrink_scan(struct shrinker *shrink,
break;
}
spin_unlock(&erofs_sb_list_lock);
+ spin_unlock(&erofs_sb_shrink_lock);
return freed;
}

--
2.36.1


2022-07-08 03:33:08

by Gao Xiang

Subject: Re: [PATCH 2/2] erofs: sequence each shrink task

Hi Guowei,

On Fri, Jul 08, 2022 at 11:11:55AM +0800, Guowei Du wrote:
> From: duguowei <[email protected]>
>
> Because of 'list_move_tail', if two or more tasks are shrinking
> concurrently, each of them may observe a different result.

Thanks for the patch. Two quick questions:
1) Where is PATCH 1/2?
2) What problem is the current patch trying to resolve...

>
> [...]
>
> + spin_lock(&erofs_sb_shrink_lock);

Btw, we cannot run the whole shrinker under one spin_lock.
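
The scan can take a while, which is one reason the current code drops
erofs_sb_list_lock around erofs_shrink_workstation(). If shrink tasks
really had to be serialized, it would at least need a sleepable lock
that concurrent tasks can back off from instead of spinning on, e.g. a
rough, untested sketch with a hypothetical erofs_shrink_mutex:

	static DEFINE_MUTEX(erofs_shrink_mutex);

	static unsigned long erofs_shrink_scan(struct shrinker *shrink,
					       struct shrink_control *sc)
	{
		unsigned long freed = 0;

		/* another task is already shrinking: back off, don't spin */
		if (!mutex_trylock(&erofs_shrink_mutex))
			return SHRINK_STOP;

		/* ... the existing scan loop would run here, unchanged ... */

		mutex_unlock(&erofs_shrink_mutex);
		return freed;
	}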

Thanks,
Gao Xiang

2022-07-08 05:59:19

by Gao Xiang

Subject: Re: [PATCH 2/2] erofs: sequence each shrink task

Hi,

On Fri, Jul 08, 2022 at 12:49:07PM +0800, guowei du wrote:
> Hi,
> I am sorry, there is only one patch.
>
> If two or more tasks are doing a shrinking job, each task gets a
> different result.
> That is to say, the state each task observes is not consistent from
> run to run, although they all have the same purpose of releasing
> memory.
>
> And if two tasks are shrinking, erofs_sb_list will lose its order,
> because each task moves one sbi to the tail, but the sbi chosen is
> random, so the results are even more complex.

Thanks for the explanation. So it doesn't sound like a real issue but
more like an improvement? If so, you could apply this patch to your
products first, beta-test it for some time, and see if it works well..
I'm more careful about such changes to the shrinker since they could
impact downstream users...

Yes, I know this behavior, but I'm not sure it needs to be treated
this way, because in principle the shrinker can be processed by
multiple tasks; otherwise it could get stuck behind some low-priority
task (I remember this sometimes happens on Android.)

>
> Because of the use of the local variable 'run_no', it took me a long
> time to understand the behavior of each task when they run
> concurrently.
>
> The same issue exists in other filesystems, such as
> fs/ubifs/shrink.c and fs/f2fs/shrink.c.
>
> If scan_objects takes a long time, it will trigger a watchdog;
> shrinking should not be time-consuming work and should be done ASAP.
> So I added a new spin lock to make tasks shrink filesystems
> sequentially; it simply makes all tasks shrink one by one.

Actually, this shrinker is used for managed slots (sorry, it needs
more work to rename 'workgroup' to that name). But one of my ongoing
improvements is to remove pclusters from managed slots immediately if
no compressed buffer is cached, so that managed slots are used only
for in-flight I/Os (to merge decompression requests, including ongoing
deduplication requests) and cached I/O. That way, there will be far
fewer objects than there are now.

>
>
> Thanks very much.

Thank you.

Thanks,
Gao Xiang