2016-12-06 13:21:54

by Renaud Mariana

Subject: HUGE slowdown when doing dpkg with ext4 over nbd

Hello,

We have noticed a HUGE slowdown when doing dpkg with ext4 over nbd (qemu-nbd as a server)

dpkg times with default mkfs & mount options are:

- ext4: 10 min
- xfs: 30 s

Disabling extents:
mkfs.ext4 -O ^extent
fixed the problem, i.e. ext4 is then as fast as xfs.

Any ideas ?
Renaud


Details :

xnbd-client --blocksize 4096 qemu-nbd-ip 10000 /dev/nbd0
mkfs.ext4 /dev/nbd0
mount /dev/nbd0 /newroot

wget https://download.elastic.co/kibana/kibana/kibana-4.6.1-amd64.deb
time dpkg -i kibana-4.6.1-amd64.deb

kibana installs many small files (23209 entries) under /opt/kibana


2016-12-06 18:46:37

by Andreas Dilger

Subject: Re: HUGE slowdown when doing dpkg with ext4 over nbd

On Dec 6, 2016, at 6:13 AM, Renaud Mariana <[email protected]> wrote:
>
> Hello,
>
> We have noticed a HUGE slowdown when doing dpkg with ext4 over nbd (qemu-nbd as a server)
>
> dpkg times with default mkfs & mount options are:
>
> - ext4: 10 min
> - xfs: 30 s
>
> Disabling extents:
> mkfs.ext4 -O ^extent
> fixed the problem, i.e. ext4 is then as fast as xfs.
>
> Any ideas ?

Is dpkg using fallocate() (uninitialized extents)? Could you strace the dpkg
process with -fff to catch forked processes and search for fallocate() calls?

fallocate() is the only thing I can think of that would make extents slower
than block-mapped files.
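
A minimal sketch of how that check could be run, assuming strace is installed. The helper name count_fallocate_calls and the log path /tmp/dpkg.strace are made up for illustration; -f, -e trace= and -o are standard strace options.

```shell
# Count fallocate() calls recorded in an strace log.
# The log is assumed to come from something like:
#   strace -f -e trace=fallocate -o /tmp/dpkg.strace dpkg -i kibana-4.6.1-amd64.deb
# (-f follows forked children; /tmp/dpkg.strace is an arbitrary path).
count_fallocate_calls() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps the helper usable under "set -e".
    grep -c 'fallocate(' "$1" || true
}
```

Calling count_fallocate_calls /tmp/dpkg.strace then prints a single number, which can be compared with the number of files the package installs.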

> Details :
>
> xnbd-client --blocksize 4096 qemu-nbd-ip 10000 /dev/nbd0
> mkfs.ext4 /dev/nbd0
> mount /dev/nbd0 /newroot
>
> wget https://download.elastic.co/kibana/kibana/kibana-4.6.1-amd64.deb
> time dpkg -i kibana-4.6.1-amd64.deb

What kernel version are you currently running?

Is this problem new with this kernel (i.e. did it work fine with older kernels)?

> kibana installs many small files (23209 entries) under /opt/kibana

How small are the files?

Cheers, Andreas





2016-12-07 09:53:22

by Renaud Mariana

Subject: Re: HUGE slowdown when doing dpkg with ext4 over nbd


Here are my answers, hope it will help solve this issue, thanks.

Recap:
dpkg kibana on ext4 over an nbd device takes 10 minutes
with xfs it's only 30s.
with ext4 without extents it's only 30s.


kernels :
4.5.7 has this issue, as do older kernels like 4.4.34
The issue also occurs when the nbd client & server run on the same host


How small are the files?
here is the histogram of file sizes : http://pasteboard.co/6HC3nKyk2.png
We can see 5000 files around 512 Bytes.
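
For readers without access to the histogram image, a rough count of small files can be taken with find. This is only a sketch: the function name count_small_files is hypothetical, and it assumes no file name contains a newline.

```shell
# Count regular files under a directory that are smaller than a byte limit.
# Usage: count_small_files DIR MAX_BYTES
count_small_files() {
    # "-size -Nc" matches files whose size is strictly less than N bytes.
    # tr strips the padding that some wc implementations add.
    find "$1" -type f -size "-${2}c" | wc -l | tr -d ' '
}
# Example (path taken from the report): count_small_files /opt/kibana 1024
```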

Is dpkg using fallocate()?
Yes, there are 16044 calls by the same process.
What are these uninitialized extents?


Cheers, Renaud

2016-12-07 16:24:48

by Christoph Hellwig

Subject: Re: HUGE slowdown when doing dpkg with ext4 over nbd

We had a pretty similar issue lately with SCSI, and I suspect it's
something similar. There are two interrelated issues:

 - dpkg uses fallocate where it absolutely shouldn't - it creates
   new files with typically a single write call, so fallocate doesn't
   help anything, but actively hurts.
 - For small enough regions ext4 does not create unwritten extents
   but zeroes data on disk, and whenever zeroing out the data is
   expensive this really shows up. The SCSI case was a device
   that has a horribly slow WRITE_SAME implementation, but for nbd
   we'll just write the zero page repeatedly.

Patching out the stupid fallocate in dpkg will speed ext4 up a lot
(especially for small files) and will probably speed XFS up a tiny
bit as well. But all my mails to the dpkg folks were simply ignored
unfortunately.

2016-12-07 17:59:06

by Andreas Dilger

Subject: Re: HUGE slowdown when doing dpkg with ext4 over nbd

On Dec 7, 2016, at 2:52 AM, Renaud Mariana <[email protected]> wrote:
>
> Here are my answers, hope it will help solve this issue, thanks.
>
> Recap:
> dpkg kibana on ext4 over an nbd device takes 10 minutes
> with xfs it's only 30s.
> with ext4 without extents it's only 30s.
>
>
> kernels :
> 4.5.7 has this issue, as do older kernels like 4.4.34
> The issue also occurs when the nbd client & server run on the same host
>
>
> How small are the files?
> here is the histogram of file sizes : http://pasteboard.co/6HC3nKyk2.png
> We can see 5000 files around 512 Bytes.

Definitely there is no value to use fallocate for 512-byte files, or any
of the files that can be written in a single write() syscall. I'd expect
any reasonable tool to be using a write buffer of at least 2-4MB these
days to get good performance, so writes below the buffer size shouldn't
use fallocate() at all.

> dpkg using fallocate() ?
> Yes, there are 16044 calls by the same process
> what are these uninitialized extents ?

Uninitialized extents are preallocated ranges of a file on disk that will
read back as zero, but are not necessarily zero-filled at allocation time.
For large files that are written randomly (or written slowly and may have
contention from other writers) fallocate() + uninitialized extents will
preallocate the space for the file so that it is (largely) contiguous on
disk and overwrites will not result in random block allocation.

Cheers, Andreas







2016-12-07 18:12:38

by Andreas Dilger

Subject: Re: HUGE slowdown when doing dpkg with ext4 over nbd


It would probably be worthwhile to file a ticket in the Debian BTS:
https://bugs.debian.org/cgi-bin/pkgreport.cgi?dist=unstable;package=dpkg

with your information, so that there is something to submit a patch against.

Cheers, Andreas







2016-12-07 18:16:50

by Andreas Dilger

Subject: Re: HUGE slowdown when doing dpkg with ext4 over nbd

Add debian-dpkg mailing list to CC.



Cheers, Andreas







2016-12-07 18:34:25

by Sven Joachim

Subject: Re: HUGE slowdown when doing dpkg with ext4 over nbd

On 2016-12-07 11:16 -0700, Andreas Dilger wrote:

> Add debian-dpkg mailing list to CC.
>
> On Dec 7, 2016, at 10:58 AM, Andreas Dilger <[email protected]> wrote:
>>
>> On Dec 7, 2016, at 2:52 AM, Renaud Mariana <[email protected]> wrote:
>>>
>> Definitely there is no value to use fallocate for 512-byte files, or any
>> of the files that can be written in a single write() syscall. I'd expect
>> any reasonable tool to be using a write buffer of at least 2-4MB these
>> days to get good performance, so writes below the buffer size shouldn't
>> use fallocate() at all.

It should be noted that the latest dpkg (1.18.15) only uses fallocate
for files which are at least 16 KiB in size[1], so it would be nice if
Renaud could recheck with that version, or cherry-pick the patch into
whatever version he uses.
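
To make the size gate concrete, here is a small sketch of a threshold check. It is not dpkg's actual code: the function name should_preallocate is hypothetical, and the 16 KiB default only mirrors the figure quoted above.

```shell
# Decide whether a file of SIZE bytes is large enough to be worth
# preallocating. The default threshold of 16384 bytes (16 KiB) mirrors
# the limit mentioned above; pass a second argument to try other values.
# Returns 0 (true) when preallocation would be attempted, 1 otherwise.
should_preallocate() {
    size=$1
    threshold=${2:-16384}
    [ "$size" -ge "$threshold" ]
}
# Possible use with fallocate(1) from util-linux:
#   if should_preallocate "$size"; then fallocate -l "$size" "$file"; fi
```

Passing 1048576 as the second argument would model a 1 MiB cut-off instead, which makes it easy to compare thresholds on the same package.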

>>> dpkg using fallocate() ?
>>> Yes, there are 16044 calls by the same process
>>> what are these uninitialized extents ?
>>
>> Uninitialized extents are preallocated ranges of a file on disk that will
>> read back as zero, but are not necessarily zero-filled at allocation time.
>> For large files that are written randomly (or written slowly and may have
>> contention from other writers) fallocate() + uninitialized extents will
>> preallocate the space for the file so that it is (largely) contiguous on
>> disk and overwrites will not result in random block allocation.

Cheers,
Sven


1. https://anonscm.debian.org/cgit/dpkg/dpkg.git/commit/?id=a971ad91437af8880cad4703695dcf12ee45959b

2016-12-07 20:14:22

by Andreas Dilger

Subject: Re: HUGE slowdown when doing dpkg with ext4 over nbd

On Dec 7, 2016, at 11:34 AM, Sven Joachim <[email protected]> wrote:
>
> On 2016-12-07 11:16 -0700, Andreas Dilger wrote:
>
>> Add debian-dpkg mailing list to CC.
>>
>> On Dec 7, 2016, at 10:58 AM, Andreas Dilger <[email protected]> wrote:
>>>
>>> Definitely there is no value to use fallocate for 512-byte files, or any
>>> of the files that can be written in a single write() syscall. I'd expect
>>> any reasonable tool to be using a write buffer of at least 2-4MB these
>>> days to get good performance, so writes below the buffer size shouldn't
>>> use fallocate() at all.
>
> It should be noted that the latest dpkg (1.18.15) only uses fallocate
> for files which are at least 16 KiB in size[1], so it would be nice if
> Renaud could recheck with that version, or cherry-pick the patch into
> whatever version he uses.

Any particular reason you chose 16KB for the fallocate() limit? If the
write() is being submitted in a single call and isn't outrageously large
(i.e. over tens of MB), then there isn't any real benefit from fallocate()
since ext4, XFS, btrfs will already do delayed block allocation based on
the file size so it will always get a contiguous allocation on disk
if there is free space available. I'm assuming that you aren't doing
sync writes when extracting the files from the .deb?

My recommendation would be to avoid fallocate() until the file size is
1MB or more (ext4 will automatically preallocate up to 8MB
actually, but not sure about other filesystems). I was trying to look
through the dpkg code to see what buffer sizes are used for writes, but
that was not very easy to find.

> 1. https://anonscm.debian.org/cgit/dpkg/dpkg.git/commit/?id=a971ad91437af8880cad4703695dcf12ee45959b


Cheers, Andreas







2016-12-08 13:14:07

by Renaud Mariana

Subject: Re: HUGE slowdown when doing dpkg with ext4 over nbd

On Wed, Dec 07, 2016 at 07:34:17PM +0100, Sven Joachim wrote:
> It should be noted that the latest dpkg (1.18.15) only uses fallocate
> for files which are at least 16 KiB in size[1], so it would be nice if
> Renaud could recheck with that version, or cherry-pick the patch into
> whatever version he uses.
>
> 1. https://anonscm.debian.org/cgit/dpkg/dpkg.git/commit/?id=a971ad91437af8880cad4703695dcf12ee45959b

Thanks Sven for this link

This slowdown is related to an old issue:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=824636
but is amplified by the nbd layer (massive zero writes over TCP).

I quote the fix:
Ftrace showed that the delay is caused by sb_issue_zeroout() in
ext4_ext_zeroout() called by ext4_ext_convert_to_initialized(). This
call chain is initially triggered by fallocate().
This can be disabled using the max_zeroout parameter:
echo 0 > /sys/fs/ext4/sdb2/extent_max_zeroout_kb

In my case, echo 3 > /sys/fs/ext4/nbd0/extent_max_zeroout_kb
seems sufficient for dpkg, and maybe for other fallocate-intensive applications?
dpkg time is then reduced to an acceptable 20s.

Are there any recommendations / warnings about this value, extent_max_zeroout_kb = 3?
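
For anyone who wants to experiment with the tunable, a small wrapper like the following may help. It is only a sketch: the function name max_zeroout and the EXT4_SYSFS override are hypothetical (the override exists so the helper can be tried against a scratch directory), while the sysfs path itself is the one quoted above.

```shell
# Read or set ext4's extent_max_zeroout_kb tunable for one device.
# Usage: max_zeroout DEV      -> print the current value
#        max_zeroout DEV KB   -> write KB, then print the value read back
# EXT4_SYSFS is a hypothetical override for testing; the default is the
# real location, /sys/fs/ext4 (writing there needs root).
max_zeroout() {
    knob="${EXT4_SYSFS:-/sys/fs/ext4}/$1/extent_max_zeroout_kb"
    if [ ! -e "$knob" ]; then
        echo "no such tunable: $knob" >&2
        return 1
    fi
    if [ $# -ge 2 ]; then
        echo "$2" > "$knob"
    fi
    cat "$knob"
}
# Example matching the values in this mail: max_zeroout nbd0 3
```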

Cheers Renaud


2016-12-09 01:25:42

by Dave Chinner

Subject: Re: HUGE slowdown when doing dpkg with ext4 over nbd

On Wed, Dec 07, 2016 at 07:34:17PM +0100, Sven Joachim wrote:
> On 2016-12-07 11:16 -0700, Andreas Dilger wrote:
>
> > Add debian-dpkg mailing list to CC.
> >
> > On Dec 7, 2016, at 10:58 AM, Andreas Dilger <[email protected]> wrote:
> >>
> >> Definitely there is no value to use fallocate for 512-byte files, or any
> >> of the files that can be written in a single write() syscall. I'd expect
> >> any reasonable tool to be using a write buffer of at least 2-4MB these
> >> days to get good performance, so writes below the buffer size shouldn't
> >> use fallocate() at all.
>
> It should be noted that the latest dpkg (1.18.15) only uses fallocate
> for files which are at least 16 KiB in size[1], so it would be nice if
> Renaud could recheck with that version, or cherry-pick the patch into
> whatever version he uses.

The fallocate() call should be removed completely. Applications
should not be attempting to control file allocation like this as it
defeats all the optimisations that filesystems use to optimise IO
patterns and minimise filesystem fragmentation (e.g. delayed
allocation).

There is /rarely/ a need for applications to use fallocate() to
manage fragmentation - especially as excessive use of fallocate()
actively harms performance and accelerates filesystem aging.

Unless an application has a specific, repeatable performance problem
due to file fragmentation, it should not be using fallocate() to
allocate file space.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2016-12-09 20:28:15

by Andreas Dilger

Subject: Re: HUGE slowdown when doing dpkg with ext4 over nbd

On Dec 8, 2016, at 6:25 PM, Dave Chinner <[email protected]> wrote:
>
> On Wed, Dec 07, 2016 at 07:34:17PM +0100, Sven Joachim wrote:
>> On 2016-12-07 11:16 -0700, Andreas Dilger wrote:
>>
>>> Add debian-dpkg mailing list to CC.
>>>
>>> On Dec 7, 2016, at 10:58 AM, Andreas Dilger <[email protected]> wrote:
>>>>
>>>> Definitely there is no value to use fallocate for 512-byte files, or any
>>>> of the files that can be written in a single write() syscall. I'd expect
>>>> any reasonable tool to be using a write buffer of at least 2-4MB these
>>>> days to get good performance, so writes below the buffer size shouldn't
>>>> use fallocate() at all.
>>
>> It should be noted that the latest dpkg (1.18.15) only uses fallocate
>> for files which are at least 16 KiB in size[1], so it would be nice if
>> Renaud could recheck with that version, or cherry-pick the patch into
>> whatever version he uses.
>
> The fallocate() call should be removed completely. Applications
> should not be attempting to control file allocation like this as it
> defeats all the optimisations that filesystems use to optimise IO
> patterns and minimise filesystem fragmentation (e.g. delayed
> allocation).
>
> There is /rarely/ a need for applications to use fallocate() to
> manage fragmentation - especially as excessive use of fallocate()
> actively harms performance and accelerates filesystem aging.
>
> Unless an application has a specific, repeatable performance problem
> due to file fragmentation, it should not be using fallocate() to
> allocate file space.

I'm not sure I'd go so far as to say that fallocate() should be removed
completely. Isn't that the best (only) way for an application to tell
the filesystem that it is about to write a file of X size and try to
find a suitable amount of free space for it? Otherwise, if the file
is large and/or written slowly and/or the system has memory pressure
the filesystem (even with delalloc) can't make a good decision about
allocation. However, fallocate() won't really help if the file size
is small (e.g. a few MB) since that can easily fit into RAM and will
be written to disk in a single chunk.

Cheers, Andreas







2016-12-09 21:32:50

by Dave Chinner

Subject: Re: HUGE slowdown when doing dpkg with ext4 over nbd

On Fri, Dec 09, 2016 at 01:28:05PM -0700, Andreas Dilger wrote:
> On Dec 8, 2016, at 6:25 PM, Dave Chinner <[email protected]> wrote:
> >
> > On Wed, Dec 07, 2016 at 07:34:17PM +0100, Sven Joachim wrote:
> >> On 2016-12-07 11:16 -0700, Andreas Dilger wrote:
> >>
> >>> Add debian-dpkg mailing list to CC.
> >>>
> >>> On Dec 7, 2016, at 10:58 AM, Andreas Dilger <[email protected]> wrote:
> >>>>
> >>>> Definitely there is no value to use fallocate for 512-byte files, or any
> >>>> of the files that can be written in a single write() syscall. I'd expect
> >>>> any reasonable tool to be using a write buffer of at least 2-4MB these
> >>>> days to get good performance, so writes below the buffer size shouldn't
> >>>> use fallocate() at all.
> >>
> >> It should be noted that the latest dpkg (1.18.15) only uses fallocate
> >> for files which are at least 16 KiB in size[1], so it would be nice if
> >> Renaud could recheck with that version, or cherry-pick the patch into
> >> whatever version he uses.
> >
> > The fallocate() call should be removed completely. Applications
> > should not be attempting to control file allocation like this as it
> > defeats all the optimisations that filesystems use to optimise IO
> > patterns and minimise filesystem fragmentation (e.g. delayed
> > allocation).
> >
> > There is /rarely/ a need for applications to use fallocate() to
> > manage fragmentation - especially as excessive use of fallocate()
> > actively harms performance and accelerates filesystem aging.
> >
> > Unless an application has a specific, repeatable performance problem
> > due to file fragmentation, it should not be using fallocate() to
> > allocate file space.
>
> I'm not sure I'd go so far as to say that fallocate() should be removed
> completely. Isn't that the best (only) way for an application to tell
> the filesystem that it is about to write a file of X size

That's most definitely not what preallocation is for. Filesystems
optimise the "growing file via sequential writes at EOF" case just
fine - using fallocate for this sort of thing simply defeats all
the writeback optimisations and improvements we've developed over
the past 20 years for this /very common/ workload...

> and try to
> find a suitable amount of free space for it?

fallocate() does give a guarantee that a subsequent write won't
ENOSPC, but "suitable" is very dependent on context. This context
is something applications don't have - they have no idea what allocation
optimisations are required to provide fast, efficient IO, and have
no idea that different filesystems will require /different
optimisations/.

e.g. btrfs will probably also suffer horribly under fallocate usage
like what dpkg is doing, and I can tell you for certain it will make
a mess of XFS filesystems, too....

> Otherwise, if the file
> is large and/or written slowly and/or the system has memory pressure
> the filesystem (even with delalloc) can't make a good decision about
> allocation.

None of which are the case for dpkg. Nor is it the case for /most
applications/. And fallocate actually makes memory pressure
problems worse, because it defeats writeback optimisations to
maximise dirty page cleaning rates...

Preallocation is *not a general purpose tool*. It's for applications
that have performance problems caused by known, repeatable
fragmentation or file layout issues.

> However, fallocate() won't really help if the file size
> is small (e.g. a few MB) since that can easily fit into RAM and will
> be written to disk in a single chunk.

In my experience, the list of "where fallocate is harmful" is quite
a bit larger than the list of "where fallocate is beneficial". This
is just one example of where it's harmful.

-Dave.
--
Dave Chinner
[email protected]