2023-09-12 09:41:32

by Wang Jianchao

Subject: [PATCH V2] xfs: use roundup_pow_of_two instead of ffs during xlog_find_tail


In our production environment, we found that mounting a 500M /boot
that was unmounted cleanly takes ~6s. One cause is that
xlog_write_log_records() uses ffs() to decide the buffer size. This
easily produces a lot of small IOs when xlog_clear_stale_blocks()
needs to wrap around the end of the log area and the log head block
is not a power of two. xlog_find_verify_cycle() has the same problem.

The code handles bigger buffers without trouble, so we can replace
ffs() with roundup_pow_of_two() directly to avoid the small,
synchronous IOs.

Changes since V1:
- Also replace the ffs in xlog_find_verify_cycle()

Signed-off-by: Wang Jianchao <[email protected]>
---

fs/xfs/xfs_log_recover.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 82c81d20459d..13b94d2e605b 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -329,7 +329,7 @@ xlog_find_verify_cycle(
 	 * try a smaller size. We need to be able to read at least
 	 * a log sector, or we're out of luck.
 	 */
-	bufblks = 1 << ffs(nbblks);
+	bufblks = roundup_pow_of_two(nbblks);
 	while (bufblks > log->l_logBBsize)
 		bufblks >>= 1;
 	while (!(buffer = xlog_alloc_buffer(log, bufblks))) {
@@ -1528,7 +1528,7 @@ xlog_write_log_records(
 	 * a smaller size. We need to be able to write at least a
 	 * log sector, or we're out of luck.
 	 */
-	bufblks = 1 << ffs(blocks);
+	bufblks = roundup_pow_of_two(blocks);
 	while (bufblks > log->l_logBBsize)
 		bufblks >>= 1;
 	while (!(buffer = xlog_alloc_buffer(log, bufblks))) {
--
2.34.1


2023-09-13 03:35:17

by Wang Jianchao

Subject: Re: [PATCH V2] xfs: use roundup_pow_of_two instead of ffs during xlog_find_tail



On 2023/9/13 06:44, Dave Chinner wrote:
> On Tue, Sep 12, 2023 at 03:20:56PM +0800, Wang Jianchao wrote:
>>
>> In our production environment, we found that mounting a 500M /boot
>> that was unmounted cleanly takes ~6s. One cause is that
>> xlog_write_log_records() uses ffs() to decide the buffer size. This
>> easily produces a lot of small IOs when xlog_clear_stale_blocks()
>> needs to wrap around the end of the log area and the log head block
>> is not a power of two. xlog_find_verify_cycle() has the same problem.
>>
>> The code handles bigger buffers without trouble, so we can replace
>> ffs() with roundup_pow_of_two() directly to avoid the small,
>> synchronous IOs.
>>
>> Changes since V1:
>> - Also replace the ffs in xlog_find_verify_cycle()
>
> Change logs go either below the --- line or in the cover letter,
> not the commit itself.

OK

>
> Other than that, the change looks ok. The use of ffs() was added in
> 2002 simply to make buffers a power-of-2 size. I don't think it had
> anything to do with trying to maximise the actual buffer size at
> all, otherwise it would have made sense to use fls() like
> roundup_pow_of_two() does...
>
> Reviewed-by: Dave Chinner <[email protected]>
>

Thanks
Jianchao

2023-09-13 06:30:48

by Dave Chinner

Subject: Re: [PATCH V2] xfs: use roundup_pow_of_two instead of ffs during xlog_find_tail

On Tue, Sep 12, 2023 at 03:20:56PM +0800, Wang Jianchao wrote:
>
> In our production environment, we found that mounting a 500M /boot
> that was unmounted cleanly takes ~6s. One cause is that
> xlog_write_log_records() uses ffs() to decide the buffer size. This
> easily produces a lot of small IOs when xlog_clear_stale_blocks()
> needs to wrap around the end of the log area and the log head block
> is not a power of two. xlog_find_verify_cycle() has the same problem.
>
> The code handles bigger buffers without trouble, so we can replace
> ffs() with roundup_pow_of_two() directly to avoid the small,
> synchronous IOs.
>
> Changes since V1:
> - Also replace the ffs in xlog_find_verify_cycle()

Change logs go either below the --- line or in the cover letter,
not the commit itself.

Other than that, the change looks ok. The use of ffs() was added in
2002 simply to make buffers a power-of-2 size. I don't think it had
anything to do with trying to maximise the actual buffer size at
all, otherwise it would have made sense to use fls() like
roundup_pow_of_two() does...

Reviewed-by: Dave Chinner <[email protected]>

--
Dave Chinner
[email protected]