i started running e4defrag out of curiosity on some large files that i'm
archiving long term. its results seem exceedingly optimistic and i have
a hard time agreeing with it. am i pessimistic?
for example, i have a ~4GB archive:
$ e4defrag -c ./foo.tar.xz
<File>                                         now/best       size/ext
 Total/best extents                            39442/2
 Average size per extent                       93 KB
 Fragmentation score                           34
[0-30 no problem: 31-55 a little bit fragmented: 56- needs defrag]
This file (./foo.tar.xz) does not need defragmentation.
i have a real hard time seeing this file as barely "a little bit fragmented".
shouldn't the fragmentation score be higher?
as a measure of "how fragmented is it really", if i copy the file and then
delete the original, there's a noticeable delay before `rm` finishes.
On Apr 28, 2021, at 11:33 PM, Mike Frysinger <[email protected]> wrote:
> i started running e4defrag out of curiosity on some large files that i'm
> archiving long term. its results seem exceedingly optimistic and i have
> a hard time agreeing with it. am i pessimistic?
> for example, i have a ~4GB archive:
> $ e4defrag -c ./foo.tar.xz
> <File>                                         now/best       size/ext
>  Total/best extents                            39442/2
>  Average size per extent                       93 KB
>  Fragmentation score                           34
> [0-30 no problem: 31-55 a little bit fragmented: 56- needs defrag]
> This file (./foo.tar.xz) does not need defragmentation.
> i have a real hard time seeing this file as barely "a little bit fragmented".
> shouldn't the fragmentation score be higher?
I would tend to agree. A 4GB file with 39k 100KB extents is not great.
On an HDD with 125 IOPS (not counting track buffers and such) this would
take about 300s to read at a whopping 13MB/s. On flash, small writes do
lead to increased wear, but the seeks are free and you may not care.
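As a sanity check on those numbers (a rough sketch; the 125 IOPS and ~4GB figures are the assumed values from above):

```shell
# Back-of-envelope check: one seek per extent on a 125 IOPS HDD.
extents=39442     # from the e4defrag -c output above
iops=125          # assumed HDD seek rate, ignoring track buffers
size_mb=4096      # ~4GB file

secs=$(( extents / iops ))
echo "read time:  ~${secs}s"                    # ~315s
echo "throughput: ~$(( size_mb / secs ))MB/s"   # ~13MB/s
```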
IMHO, anything below 1MB/extent is sub-optimal in terms of IO performance,
and a sign of filesystem fragmentation (or a very poor IO pattern), since
mballoc should try to do allocation in 8MB chunks for large writes.
In many respects, if the extents are large enough, the "cost" of a seek is
hidden by the device bandwidth (e.g. 250 MB/s / 125 seeks/sec = 2MB for
a good HDD today, scale linearly for RAID-5/6), so any extent larger than
this is not limited by seeks. Should 1024 x 4MB extents in a 4GB file be
considered fragmented or not? Definitely 108KB/extent should be.
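A quick sketch of that break-even arithmetic (using the assumed 250 MB/s and 125 seeks/sec figures from above):

```shell
bw_mbs=250    # assumed HDD sequential bandwidth, MB/s
iops=125      # assumed seeks/sec
echo "break-even extent size: $(( bw_mbs / iops ))MB"   # 2MB: larger extents hide seek cost
echo "4GB at 4MB/extent: $(( 4096 / 4 )) extents"       # the 1024-extent case above
```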
However, the "ideal = 2" case is bogus, since extents have a maximum size of
128MB, so you would need at least 32 for a perfect 4GB file. In that respect,
e4defrag is at best a "working prototype"; I don't think many people
use it, and it has not gotten many improvements since it first landed.
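The 32-extent floor follows directly from the 128MB extent-size cap:

```shell
# ext4 extents cover at most 32768 blocks, i.e. 128MB at a 4KB block size,
# so even a perfectly laid out 4GB file needs 4096/128 extents.
echo "minimum extents for a 4GB file: $(( 4096 / 128 ))"   # 32
```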
If you have a better idea for a "fragmentation score" I would be open
to looking at it, doubly so if it comes in the form of a patch.
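For instance, one hypothetical alternative (just a sketch, not e4defrag's actual formula) would score a file by how far its average extent size falls below the ~2MB seek break-even:

```shell
# Hypothetical score, 0 (fine) .. 100 (fully seek-bound); not e4defrag's formula.
frag_score() {
    local size_kb=$1 extents=$2 breakeven_kb=2048
    local avg_kb=$(( size_kb / extents ))
    if [ "$avg_kb" -ge "$breakeven_kb" ]; then
        echo 0
        return
    fi
    echo $(( 100 - 100 * avg_kb / breakeven_kb ))
}

frag_score 4194304 39442   # the ~4GB/39442-extent file above scores 95
frag_score 4194304 32      # 32 x 128MB extents score 0
```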
You could check the actual file layout using "filefrag -v" before/after
running e4defrag to see how the allocation was changed. This would tell
you if it is actually helping or not. I've thought for a while that it
would be useful to add the same "fragmentation score" to filefrag, but
that would be contingent on the score actually making sense.
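As a sketch of what that check looks like, here is one way to pull an extent count and average extent size out of "filefrag -v" output (the sample rows below are made up for illustration, and a 4KB block size is assumed):

```shell
# Sample filefrag -v output; the extent rows here are illustrative only.
sample=' ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      31:      34816..     34847:     32:
   1:       32..      55:      35840..     35863:     24:      34848:
   2:       56..      79:      36864..     36887:     24:      35864: last,eof'

# Count extent rows and average their length (column 6), in 4KB blocks.
echo "$sample" | awk '/^ *[0-9]+:/ { n++; blocks += $6 }
    END { printf "extents: %d, avg: %dKB\n", n, blocks * 4 / n }'
```

Running the same pipeline before and after e4defrag would show whether the extent count actually dropped.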
You can also use "e2freefrag" to check the filesystem as a whole to see
whether the free space is badly fragmented (i.e. most free chunks < 8MB).
In that case, running e4defrag _may_ help you, but it is not "smart" like
the old DOS defrag utilities, since it just rewrites each file separately
instead of having a "plan" for how to defrag the whole filesystem.
> as a measure of "how fragmented is it really", if i copy the file and then
> delete the original, there's a noticeable delay before `rm` finishes.
Yes, that is expected; it would have been totally clear if you ran filefrag
on the file first, since deleting the file has to free every one of those extents.