2016-06-15 17:42:17

by Omar Sandoval

Subject: [RFC PATCH] coredump: avoid ext4 auto_da_alloc for core file

From: Omar Sandoval <[email protected]>

Someone at Facebook reported that their coredumps were much faster when
using a pipe helper than when dumping directly to a file, which doesn't
make much sense. It turns out that this difference is because in
do_coredump(), we truncate the core file and thus trigger the ext4
auto_da_alloc heuristic. We can't use O_TRUNC because we might bail out
of do_coredump() in certain conditions, so instead, avoid truncating
when the file is already empty. In cases where we're actually
overwriting a core file, this won't help, but the common case will be
much better.

Signed-off-by: Omar Sandoval <[email protected]>
---
Hi, Al and Ted,

This is probably the wrong solution to the problem I described in the
commit message. Do you guys have any better ideas? Something like
0eab928221ba ("ext4: Don't treat a truncation of a zero-length file as
replace-via-truncate") would also work, but that apparently wasn't
right, as it was reverted in 5534fb5bb35a ("ext4: Fix the alloc on close
after a truncate hueristic").

Thanks.

fs/coredump.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 281b768000e6..9da7357773f0 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -741,8 +741,10 @@ void do_coredump(const siginfo_t *siginfo)
 			goto close_fail;
 		if (!(cprm.file->f_mode & FMODE_CAN_WRITE))
 			goto close_fail;
-		if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
-			goto close_fail;
+		if (i_size_read(file_inode(cprm.file)) != 0) {
+			if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
+				goto close_fail;
+		}
 	}

 	/* get us an unshared descriptor table; almost always a no-op */
--
2.8.3
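
The check in the patch above can be illustrated outside the kernel. The
following is a minimal userspace sketch, not the kernel code:
truncate_if_nonempty() is a hypothetical helper name, and fstat() and
ftruncate() stand in for i_size_read() and do_truncate(). It only shows the
shape of the logic, which is to skip the truncate when the file is already
empty, so that no truncate is issued against a freshly created core file.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical userspace analogue of the patched check in do_coredump():
 * truncate only when the file currently has a non-zero size.
 *
 * Returns 1 if a truncate was issued, 0 if it was skipped because the
 * file was already empty, and -1 on error.
 */
static int truncate_if_nonempty(int fd)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return -1;
	if (st.st_size == 0)
		return 0;	/* already empty: skip the truncate entirely */
	if (ftruncate(fd, 0) != 0)
		return -1;
	return 1;
}
```

As the commit message notes, this only helps the common case where no
previous core file exists; an existing non-empty file is still truncated.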



2016-06-29 18:34:44

by Omar Sandoval

Subject: Re: [RFC PATCH] coredump: avoid ext4 auto_da_alloc for core file

On Wed, Jun 15, 2016 at 10:42:05AM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <[email protected]>
>
> Someone at Facebook reported that their coredumps were much faster when
> using a pipe helper than when dumping directly to a file, which doesn't
> make much sense. It turns out that this difference is because in
> do_coredump(), we truncate the core file and thus trigger the ext4
> auto_da_alloc heuristic. We can't use O_TRUNC because we might bail out
> of do_coredump() in certain conditions, so instead, avoid truncating
> when the file is already empty. In cases where we're actually
> overwriting a core file, this won't help, but the common case will be
> much better.
>
> Signed-off-by: Omar Sandoval <[email protected]>
> ---
> Hi, Al and Ted,
>
> This is probably the wrong solution to the problem I described in the
> commit message. Do you guys have any better ideas? Something like
> 0eab928221ba ("ext4: Don't treat a truncation of a zero-length file as
> replace-via-truncate") would also work, but that apparently wasn't
> right, as it was reverted in 5534fb5bb35a ("ext4: Fix the alloc on close
> after a truncate hueristic").
>
> Thanks.

Ping, any thoughts on this?

> fs/coredump.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 281b768000e6..9da7357773f0 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -741,8 +741,10 @@ void do_coredump(const siginfo_t *siginfo)
> goto close_fail;
> if (!(cprm.file->f_mode & FMODE_CAN_WRITE))
> goto close_fail;
> - if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
> - goto close_fail;
> + if (i_size_read(file_inode(cprm.file)) != 0) {
> + if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
> + goto close_fail;
> + }
> }
>
> /* get us an unshared descriptor table; almost always a no-op */
> --
> 2.8.3
>

--
Omar

2016-07-04 02:24:55

by Theodore Ts'o

Subject: Re: [RFC PATCH] coredump: avoid ext4 auto_da_alloc for core file

On Wed, Jun 29, 2016 at 11:34:44AM -0700, Omar Sandoval wrote:
> > Someone at Facebook reported that their coredumps were much faster when
> > using a pipe helper than when dumping directly to a file, which doesn't
> > make much sense. It turns out that this difference is because in
> > do_coredump(), we truncate the core file and thus trigger the ext4
> > auto_da_alloc heuristic. We can't use O_TRUNC because we might bail out
> > of do_coredump() in certain conditions, so instead, avoid truncating
> > when the file is already empty. In cases where we're actually
> > overwriting a core file, this won't help, but the common case will be
> > much better.
> >
> > Signed-off-by: Omar Sandoval <[email protected]>
> > ---
> > Hi, Al and Ted,
> >
> > This is probably the wrong solution to the problem I described in the
> > commit message. Do you guys have any better ideas? Something like
> > 0eab928221ba ("ext4: Don't treat a truncation of a zero-length file as
> > replace-via-truncate") would also work, but that apparently wasn't
> > right, as it was reverted in 5534fb5bb35a ("ext4: Fix the alloc on close
> > after a truncate hueristic").

Does this fix things for you?

- Ted

From bf21c027d84ded545d2c08fa01fd184d29641458 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <[email protected]>
Date: Sun, 3 Jul 2016 22:20:49 -0400
Subject: [PATCH] ext4: in ext4_setattr(), only call ext4_truncate() if there is data to drop

If there are no blocks associated with the inode (and no inline data),
there's no point calling ext4_truncate(). This avoids setting the
replace-via-truncate heuristic if there is an attempt to truncate a
file which is already zero-length --- which is something that happens
in the core dumping code, in case there is an already existing core
file. In the common case, there is not a previous core file, so by not
enabling the replace-via-truncate heuristic, we can speed up core
dumps.

Reported-by: Omar Sandoval <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
---
fs/ext4/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 44ee5d9..cd757f8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5171,7 +5171,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
 		 * in data=journal mode to make pages freeable.
 		 */
 		truncate_pagecache(inode, inode->i_size);
-		if (shrink)
+		if (shrink && (inode->i_blocks || ext4_has_inline_data(inode)))
 			ext4_truncate(inode);
 		up_write(&EXT4_I(inode)->i_mmap_sem);
 	}
--
2.5.0


2016-07-04 15:11:37

by Theodore Ts'o

Subject: Re: [RFC PATCH] coredump: avoid ext4 auto_da_alloc for core file

On Sun, Jul 03, 2016 at 10:24:55PM -0400, Theodore Ts'o wrote:
> From bf21c027d84ded545d2c08fa01fd184d29641458 Mon Sep 17 00:00:00 2001
> From: Theodore Ts'o <[email protected]>
> Date: Sun, 3 Jul 2016 22:20:49 -0400
> Subject: [PATCH] ext4: in ext4_setattr(), only call ext4_truncate() if there is data to drop
>
> If there are no blocks associated with the inode (and no inline data),
> there's no point calling ext4_truncate(). This avoids setting the
> replace-via-truncate heuristic if there is an attempt to truncate a
> file which is already zero-length --- which is something that happens
> in the core dumping code, in case there is an already existing core
> file. In the common case, there is not a previous core file, so by not
> enabling the replace-via-truncate heuristic, we can speed up core
> dumps.
>
> Reported-by: Omar Sandoval <[email protected]>
> Signed-off-by: Theodore Ts'o <[email protected]>

This patch is buggy; when I tried running regression tests, it failed
early. So you probably want to skip this.

- Ted


2016-07-05 13:42:33

by Josef Bacik

Subject: Re: [RFC PATCH] coredump: avoid ext4 auto_da_alloc for core file

On 06/15/2016 01:42 PM, Omar Sandoval wrote:
> From: Omar Sandoval <[email protected]>
>
> Someone at Facebook reported that their coredumps were much faster when
> using a pipe helper than when dumping directly to a file, which doesn't
> make much sense. It turns out that this difference is because in
> do_coredump(), we truncate the core file and thus trigger the ext4
> auto_da_alloc heuristic. We can't use O_TRUNC because we might bail out
> of do_coredump() in certain conditions, so instead, avoid truncating
> when the file is already empty. In cases where we're actually
> overwriting a core file, this won't help, but the common case will be
> much better.
>
> Signed-off-by: Omar Sandoval <[email protected]>
> ---
> Hi, Al and Ted,
>
> This is probably the wrong solution to the problem I described in the
> commit message. Do you guys have any better ideas? Something like
> 0eab928221ba ("ext4: Don't treat a truncation of a zero-length file as
> replace-via-truncate") would also work, but that apparently wasn't
> right, as it was reverted in 5534fb5bb35a ("ext4: Fix the alloc on close
> after a truncate hueristic").
>
> Thanks.
>
> fs/coredump.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 281b768000e6..9da7357773f0 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -741,8 +741,10 @@ void do_coredump(const siginfo_t *siginfo)
> goto close_fail;
> if (!(cprm.file->f_mode & FMODE_CAN_WRITE))
> goto close_fail;
> - if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
> - goto close_fail;
> + if (i_size_read(file_inode(cprm.file)) != 0) {
> + if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
> + goto close_fail;
> + }
> }
>
> /* get us an unshared descriptor table; almost always a no-op */
>

Omar, this probably breaks the case where we do fallocate(FALLOC_FL_KEEP_SIZE),
the i_size will be 0 but there will be blocks to truncate. Probably want to
check i_blocks or something. Thanks,

Josef
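
The case described above can be sketched in userspace terms: after
fallocate() with FALLOC_FL_KEEP_SIZE, a file can report a size of 0 while
still owning allocated blocks, so a size-only test would miss them. The
helper below is a hypothetical illustration (has_data_to_drop() is not a
kernel function); st_size and st_blocks stand in for i_size and i_blocks.

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical predicate: is there anything a truncate-to-zero would free?
 *
 * A size check alone misses preallocated space, because
 * fallocate(fd, FALLOC_FL_KEEP_SIZE, ...) allocates blocks without
 * changing the reported size. Checking the block count as well covers
 * that case.
 */
static bool has_data_to_drop(const struct stat *st)
{
	return st->st_size != 0 || st->st_blocks != 0;
}
```

A caller would fill in the struct with fstat() and skip the truncate only
when this returns false.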

2016-07-05 14:37:16

by Theodore Ts'o

Subject: Re: [RFC PATCH] coredump: avoid ext4 auto_da_alloc for core file

On Tue, Jul 05, 2016 at 09:42:13AM -0400, Josef Bacik wrote:
> > diff --git a/fs/coredump.c b/fs/coredump.c
> > index 281b768000e6..9da7357773f0 100644
> > --- a/fs/coredump.c
> > +++ b/fs/coredump.c
> > @@ -741,8 +741,10 @@ void do_coredump(const siginfo_t *siginfo)
> > goto close_fail;
> > if (!(cprm.file->f_mode & FMODE_CAN_WRITE))
> > goto close_fail;
> > - if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
> > - goto close_fail;
> > + if (i_size_read(file_inode(cprm.file)) != 0) {
> > + if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
> > + goto close_fail;
> > + }
> > }
> >
> > /* get us an unshared descriptor table; almost always a no-op */
> >
>
> Omar, this probably breaks the case where we do
> fallocate(FALLOC_FL_KEEP_SIZE), the i_size will be 0 but there will be
> blocks to truncate. Probably want to check i_blocks or something. Thanks,

Sure, but this is in the coredump code; do we care there? What are
the odds that someone will have fallocated blocks beyond i_size in a
file named "core"? And if so, it's not like it's going to make the
coredump invalid or non-useful in any way.

- Ted

2016-07-05 15:01:40

by Josef Bacik

Subject: Re: [RFC PATCH] coredump: avoid ext4 auto_da_alloc for core file

On 07/05/2016 10:37 AM, Theodore Ts'o wrote:
> On Tue, Jul 05, 2016 at 09:42:13AM -0400, Josef Bacik wrote:
>>> diff --git a/fs/coredump.c b/fs/coredump.c
>>> index 281b768000e6..9da7357773f0 100644
>>> --- a/fs/coredump.c
>>> +++ b/fs/coredump.c
>>> @@ -741,8 +741,10 @@ void do_coredump(const siginfo_t *siginfo)
>>> goto close_fail;
>>> if (!(cprm.file->f_mode & FMODE_CAN_WRITE))
>>> goto close_fail;
>>> - if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
>>> - goto close_fail;
>>> + if (i_size_read(file_inode(cprm.file)) != 0) {
>>> + if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
>>> + goto close_fail;
>>> + }
>>> }
>>>
>>> /* get us an unshared descriptor table; almost always a no-op */
>>>
>>
>> Omar, this probably breaks the case where we do
>> fallocate(FALLOC_FL_KEEP_SIZE), the i_size will be 0 but there will be
>> blocks to truncate. Probably want to check i_blocks or something. Thanks,
>
> Sure, but this is in the coredump code; do we care there? What are
> the odds that someone will have fallocated blocks beyond i_size in a
> file named "core"? And if so, it's not like it's going to make the
> coredump invalid or non-useful in any way.

Wow I totally didn't notice this was in coredump.c, I thought it was in ext4
code because you said it failed regression tests, which I assumed were your ext4
tests. Ignore me. Thanks,

Josef


2016-07-05 16:57:15

by Theodore Ts'o

Subject: Re: [RFC PATCH] coredump: avoid ext4 auto_da_alloc for core file

On Tue, Jul 05, 2016 at 11:01:40AM -0400, Josef Bacik wrote:
> > > Omar, this probably breaks the case where we do
> > > fallocate(FALLOC_FL_KEEP_SIZE), the i_size will be 0 but there will be
> > > blocks to truncate. Probably want to check i_blocks or something. Thanks,
> >
> > Sure, but this is in the coredump code; do we care there? What are
> > the odds that someone will have fallocated blocks beyond i_size in a
> > file named "core"? And if so, it's not like it's going to make the
> > coredump invalid or non-useful in any way.
>
> Wow I totally didn't notice this was in coredump.c, I thought it was in ext4
> code because you said it failed regression tests, which I assumed were your
> ext4 tests. Ignore me. Thanks,

Yeah, Omar's original patch was something he described as a "hack" to
the coredump code. I actually don't think it's that bad, but it does
make sense to have ext4 not enable the "replace-via-truncate" code
when the truncate is a no-op. It turns out this is a bit tricky,
because the places where we set i_size and where we decide to truncate
beyond i_size are separated. I tried to do something simple but it
didn't quite work right; I'll look into why it didn't work, hopefully
later today.

- Ted