Hello,
After updating to kernel 5.13, my ext4 partition is read by ext4lazyinit
for ~20 seconds upon mounting. It does not write anything, only reads
(inspected with iotop), and it does so only on mount and only for a
relatively short amount of time.
My partition is several years old and has been fully initialized long
ago. Mounting with `init_itable=0` did not change anything: ext4lazyinit
does not write anything to begin with.
5.12.15 does not have such behavior. Did I miss a configuration change?
Is that a regression or a new feature?
On Mon, Aug 09, 2021 at 01:13:03AM +0300, ValdikSS wrote:
> Hello,
> After updating to kernel 5.13, my ext4 partition is read for ~20 seconds
> upon mounting by ext4lazyinit. It does not write anything, only reads
> (inspected with iotop), and it does so only on mount and only for a relatively
> short amount of time.
>
> My partition is several years old and has been fully initialized long ago.
> Mounting with `init_itable=0` did not change anything: ext4lazyinit does not
> write anything to begin with.
>
> 5.12.15 does not have such behavior. Did I miss a configuration change? Is
> that a regression or a new feature?
It's a new feature which optimizes block allocations for very large
file systems. The work being done by ext4lazyinit is to read the
block allocation bitmaps so we can cache the buddy bitmaps and how
fragmented (or not) various block groups are, which is used to
optimize the block allocator.
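As a rough illustration of the kind of per-group summary that can be derived from a block bitmap, here is a minimal Python sketch. It is not the kernel code: the function name `summarize_group` and the list-of-bits representation (1 = block in use, 0 = free) are assumptions made for this example only.

```python
def summarize_group(bitmap):
    """Return (free_blocks, free_fragments, largest_free_run) for one group.

    `bitmap` is a hypothetical list of 0/1 values, 1 meaning "block in use".
    """
    free_blocks = 0
    fragments = 0
    largest = 0
    run = 0  # length of the free run currently being scanned
    for bit in bitmap:
        if bit == 0:
            run += 1
            free_blocks += 1
            if run == 1:
                # A new run of free blocks starts here.
                fragments += 1
            largest = max(largest, run)
        else:
            run = 0
    return free_blocks, fragments, largest


# Toy 12-block group with three separate free runs.
print(summarize_group([1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1]))
```

Once a summary like this is cached for a group, an allocator can decide whether the group is worth searching without rereading its bitmap from disk.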
Quoting from the commit description for 196e402adf2e:
With this patchset, following experiment was performed:
Created a highly fragmented disk of size 65TB. The disk had no
contiguous 2M regions. Following command was run consecutively for 3
times:
time dd if=/dev/urandom of=file bs=2M count=10
Here are the results with and without cr 0/1 optimizations introduced
in this patch:
|---------+------------------------------+---------------------------|
|         | Without CR 0/1 Optimizations | With CR 0/1 Optimizations |
|---------+------------------------------+---------------------------|
| 1st run | 5m1.871s                     | 2m47.642s                 |
| 2nd run | 2m28.390s                    | 0m0.611s                  |
| 3rd run | 2m26.530s                    | 0m1.255s                  |
|---------+------------------------------+---------------------------|
The timings are done with a freshly mounted file system; the
prefetched block bitmaps plus the mballoc optimizations cut the time
to allocate a 20 MiB file on a highly fragmented file system nearly in
half on the first run, and by far more on subsequent runs.
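To make the size of the improvement concrete, the timings quoted from the commit description can be converted to seconds and compared. This is a small sketch; the helper names `to_seconds` and `speedups` are made up for this example, and the only data used are the six timings in the table.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style duration such as '5m1.871s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized duration: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# (without CR 0/1 optimizations, with CR 0/1 optimizations)
RESULTS = {
    "1st run": ("5m1.871s", "2m47.642s"),
    "2nd run": ("2m28.390s", "0m0.611s"),
    "3rd run": ("2m26.530s", "0m1.255s"),
}


def speedups(results):
    """Map each run to (time without) / (time with) the optimizations."""
    return {
        run: to_seconds(without) / to_seconds(with_opt)
        for run, (without, with_opt) in results.items()
    }


for run, factor in speedups(RESULTS).items():
    print(f"{run}: {factor:.1f}x faster")
```

The first run improves by a little under 2x, while the second and third runs improve by more than two orders of magnitude.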
Cheers,
- Ted
On 09.08.2021 04:51, Theodore Ts'o wrote:
> It's a new feature which optimizes block allocations for very large
> file systems. The work being done by ext4lazyinit is to read the
> block allocation bitmaps so we can cache the buddy bitmaps and how
> fragmented (or not) various block groups are, which is used to
> optimize the block allocator.
Thanks for the info. To revert to the old behavior, the filesystem should
be mounted with -o no_prefetch_block_bitmaps.
Is it safe to use this option with the new optimizations? Should I expect
only less optimal filesystem speed and no other issues?
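For anyone who wants to confirm that a file system really is mounted with the option, a small sketch that scans `/proc/mounts`-style text follows. That the kernel lists `no_prefetch_block_bitmaps` among the options shown there is an assumption, not something confirmed in this thread; the helper name `mount_has_option`, the device `/dev/sdb1` and the mount point `/mnt/data` are hypothetical.

```python
def mount_has_option(mounts_text, mountpoint, option):
    """Return True if any entry for `mountpoint` lists `option`."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mountpoint, fstype, options, dump, pass
        if len(fields) >= 4 and fields[1] == mountpoint:
            if option in fields[3].split(","):
                return True
    return False


# Hypothetical sample line; on a live system read /proc/mounts instead.
SAMPLE = "/dev/sdb1 /mnt/data ext4 rw,relatime,no_prefetch_block_bitmaps 0 0\n"
print(mount_has_option(SAMPLE, "/mnt/data", "no_prefetch_block_bitmaps"))
```

On a running system the first argument would be the contents of `/proc/mounts`, for example `open("/proc/mounts").read()`.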
On Mon, Aug 9, 2021 at 2:51 AM ValdikSS wrote:
>
> On 09.08.2021 04:51, Theodore Ts'o wrote:
> > It's a new feature which optimizes block allocations for very large
> > file systems. The work being done by ext4lazyinit is to read the
> > block allocation bitmaps so we can cache the buddy bitmaps and how
> > fragmented (or not) various block groups are, which is used to
> > optimize the block allocator.
>
> Thanks for the info. To revert old behavior, the filesystem should be
> mounted with -o no_prefetch_block_bitmaps
>
> Is it safe to use this option with new optimizations? Should I expect
> only less optimal filesystem speed and no other issues?
It is perfectly safe to use "no_prefetch_block_bitmaps" with the new
optimizations. In fact, file system throughput will NOT be affected
either if you mount the file system with this option. The only impact
of this mount option is that the file system can potentially make
sub-optimal decisions for allocation requests in certain scenarios.
For example, let's say the allocator gets a request to allocate 10
contiguous blocks and only the last group in the file system has 10
contiguous blocks. If you mount the file system with
"no_prefetch_block_bitmaps", Ext4 will not have cached the last
group's buddy bitmap, so it might not know that the last group has 10
contiguous blocks available. At this point, Ext4 will satisfy the
request for 10 blocks by allocating fragments instead of allocating a
contiguous region. This might increase the fragmentation levels of the
file system. However, note that this is not a regression. If you were
not using "prefetch_block_bitmaps" before 5.13, then this is the
allocator behavior that you would have seen anyway. So, mounting with
"no_prefetch_block_bitmaps" in your setup would not cause any
regressions whatsoever.
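The 10-block example above can be sketched as a toy model. This is a deliberate simplification and not the mballoc algorithm: the function `pick_group`, the four-group layout, and the idea of a plain dictionary of cached "largest free run" values are all assumptions made for illustration.

```python
def pick_group(cached_largest_free_run, wanted):
    """Return the lowest-numbered group known to hold `wanted` contiguous
    free blocks, or None if no cached summary says so.

    `cached_largest_free_run` maps group index -> largest free run and only
    contains groups whose bitmaps have already been read.
    """
    for group in sorted(cached_largest_free_run):
        if cached_largest_free_run[group] >= wanted:
            return group
    return None


# Hypothetical file system: only the last group (3) has 10 contiguous blocks.
ALL_GROUPS = {0: 3, 1: 4, 2: 2, 3: 10}

prefetched = dict(ALL_GROUPS)   # every bitmap was read shortly after mount
not_prefetched = {0: 3, 1: 4}   # only the groups touched so far are cached

print(pick_group(prefetched, 10))      # finds the last group
print(pick_group(not_prefetched, 10))  # no known candidate
```

In the second case the toy allocator has no group it knows can satisfy the request contiguously, which corresponds to the situation where the request ends up being served from fragments.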
Thanks,
Harshad
On Mon, Aug 09, 2021 at 10:43:08AM +0300, ValdikSS wrote:
> On 09.08.2021 04:51, Theodore Ts'o wrote:
> > It's a new feature which optimizes block allocations for very large
> > file systems. The work being done by ext4lazyinit is to read the
> > block allocation bitmaps so we can cache the buddy bitmaps and how
> > fragmented (or not) various block groups are, which is used to
> > optimize the block allocator.
>
> Thanks for the info. To revert old behavior, the filesystem should be
> mounted with -o no_prefetch_block_bitmaps
>
> Is it safe to use this option with new optimizations? Should I expect only
> less optimal filesystem speed and no other issues?
It's not been tested, but it should be safe in the sense that it shouldn't
lead to any file system corruption or data loss. However, it may
result in non-optimal block placement that might cause more file or
free-space fragmentation than might otherwise be the case. (This was
true even before the latest optimizations, but it's more the case with
the new optimizations.)
Can you say something about why you want to disable the block
allocation prefetch? How is it causing problems for you?
Cheers,
- Ted
On 09.08.2021 21:26, Theodore Ts'o wrote:
>
> It's not been tested, but it should be safe in the sense that it shouldn't
> lead to any file system corruption or data loss. However, it may
> result in non-optimal block placement that might cause more file or
> free-space fragmentation than might otherwise be the case. (This was
> true even before the latest optimizations, but it's more the case with
> the new optimizations.)
>
> Can you say something about why you want to disable the block
> allocation prefetch? How is it causing problems for you?
My old HDDs are now screeching their heads for 20 seconds after mount.
These are secondary disks (internal and external drives) which are not
fragmented, are mostly idle, and have plenty of free space. It's a bit
annoying to hear the sounds and see strange load right after mounting,
so I'd prefer the old behavior.
Just aesthetics, not a technical issue per se.