2016-03-02 17:15:12

by Benjamin LaHaise

[permalink] [raw]
Subject: ext4 bug: getdents uninterruptible for 117 seconds

Hi folks,

While working on a bug involving write starvation, the test I was running
managed to trigger some pretty horrific worst case behaviour in ext4. The
filesystem I'm working on is about 4TB in size, and is used for storing a
number of spool files across 100 subdirectories in the filesystem. One of
these subdirectories ended up growing to ~497MB in size. Once all of the
files were removed from these directories, the filesystem was unmounted.
On subsequent mounts of the filesystem, it became apparent that whenever
a specific directory was accessed using ls or find, the kernel would block
in getdents() for north of 117 seconds. It is clear that ext4 is slowly
reading the entire contents of the directory into memory during this time
at a rate of ~4MB/s. This filesystem is being stored on an external 8Gbps
FC SAN comprised of about 8 x 10Krpm spindles.

I've placed a copy of the e2image for the filesystem at
http://www.kvack.org/~bcrl/ext4/ext4-readdir.img.xz . The problematic
directory is broken/1. The relevant snippet of strace output is below.
Thoughts?

-ben

write(1, "/mnt/broken/"..., 34) = 34 <0.000039>
openat(5, "1", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 6 <0.000007>
fcntl(6, F_GETFD) = 0 <0.000004>
fcntl(6, F_SETFD, FD_CLOEXEC) = 0 <0.000004>
fstat(6, {st_mode=S_IFDIR|0755, st_size=520896512, ...}) = 0 <0.000004>
fcntl(6, F_GETFL) = 0x38800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW) <0.000004>
fcntl(6, F_SETFD, FD_CLOEXEC) = 0 <0.000004>
newfstatat(5, "1", {st_mode=S_IFDIR|0755, st_size=520896512, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000006>
fcntl(6, F_DUPFD, 3) = 7 <0.000005>
fcntl(7, F_GETFD) = 0 <0.000005>
fcntl(7, F_SETFD, FD_CLOEXEC) = 0 <0.000004>
getdents(6, /* 2 entries */, 32768) = 48 <117.122463>
getdents(6, /* 0 entries */, 32768) = 0 <0.000005>
close(6) = 0 <0.000004>
--
"Thought is the essence of where you are now."


2016-03-02 21:43:34

by Theodore Ts'o

[permalink] [raw]
Subject: Re: ext4 bug: getdents uninterruptible for 117 seconds

On Wed, Mar 02, 2016 at 12:15:11PM -0500, Benjamin LaHaise wrote:
> Hi folks,
>
> While working on a bug involving write starvation, the test I was running
> managed to trigger some pretty horrific worst case behaviour in ext4. The
> filesystem I'm working on is about 4TB in size, and is used for storing a
> number of spool files across 100 subdirectories in the filesystem. One of
> these subdirectories ended up growing to ~497MB in size. Once all of the
> files were removed from these directories, the filesystem was unmounted.
> On subsequent mounts of the filesystem, it became apparent that whenever
> a specific directory was accessed using ls or find, the kernel would block
> in getdents() for north of 117 seconds. It is clear that ext4 is slowly
> reading the entire contents of the directory into memory during this time
> at a rate of ~4MB/s. This filesystem is being stored on an external 8Gbps
> FC SAN comprised of about 8 x 10Krpm spindles.
>
> I've placed a copy of the e2image for the filesystem at
> http://www.kvack.org/~bcrl/ext4/ext4-readdir.img.xz . The problematic
> directory is broken/1. The relevant snippet of strace output is below.
> Thoughts?

Yes, this is a known problem. Right now we don't have a way of
removing empty directory blocks from a directory. This can be fixed
up by running "e2fsck -fD /dev/sdXX" off-line, but it's not terribly
satisfying.

There are things we could do in theory try to make things better, but
they haven't been implemented yet. In practice they tend to happen
with pathological workloads, but they do happen occasionally in real
life. It's just not something we've had time to address up until now.

- Ted

2016-03-14 17:57:33

by Benjamin LaHaise

[permalink] [raw]
Subject: [PATCH] ext4: make it possible to interrupt initial readdir call

This patch is a follow up to the email "ext4 bug: getdents uninterruptible
for 117 seconds". When doing the initial readdir on an ext4 filesystem
that grew a directory to 497MB (the filesystem image can be downloaded at
http://www.kvack.org/~bcrl/ext4/ext4-readdir.img.xz), the first getdents64()
system call blocked for 117 seconds. Begin to address this issue by making
the ext4 readdir() operation interruptible by fatal signals so that it is
possible to abort a process that is stuck on this operation.

fs/ext4/dir.c | 3 +++
fs/ext4/namei.c | 4 ++++
2 files changed, 7 insertions(+)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 33f5e2a..a3e32e8 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -553,6 +553,9 @@ static int ext4_dx_readdir(struct file *file, struct dir_context *ctx)
info->curr_node = rb_first(&info->root);

while (1) {
+ if (fatal_signal_pending(current))
+ return -ERESTARTSYS;
+
/*
* Fill the rbtree if we have no more entries,
* or the inode has changed since we last read in the
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 48e4b89..8097cd1 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -959,6 +959,10 @@ static int htree_dirblock_to_tree(struct file *dir_file,
bh = ext4_read_dirblock(dir, block, DIRENT);
if (IS_ERR(bh))
return PTR_ERR(bh);
+ if (fatal_signal_pending(current)) {
+ brelse(bh);
+ return -ERESTARTSYS;
+ }

de = (struct ext4_dir_entry_2 *) bh->b_data;
top = (struct ext4_dir_entry_2 *) ((char *) de +
--
"Thought is the essence of where you are now."

2016-04-08 18:18:42

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: [PATCH] ext4: make it possible to interrupt initial readdir call

No response on this patch yet.... Given that it's clearly a bug, and we've
got a test case that reproduces it, why isn't this being applied?

-ben

On Mon, Mar 14, 2016 at 01:57:32PM -0400, Benjamin LaHaise wrote:
> This patch is a follow up to the email "ext4 bug: getdents uninterruptible
> for 117 seconds". When doing the initial readdir on an ext4 filesystem
> that grew a directory to 497MB (the filesystem image can be downloaded at
> http://www.kvack.org/~bcrl/ext4/ext4-readdir.img.xz), the first getdents64()
> system call blocked for 117 seconds. Begin to address this issue by making
> the ext4 readdir() operation interruptible by fatal signals so that it is
> possible to abort a process that is stuck on this operation.
>
> fs/ext4/dir.c | 3 +++
> fs/ext4/namei.c | 4 ++++
> 2 files changed, 7 insertions(+)
>
> diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
> index 33f5e2a..a3e32e8 100644
> --- a/fs/ext4/dir.c
> +++ b/fs/ext4/dir.c
> @@ -553,6 +553,9 @@ static int ext4_dx_readdir(struct file *file, struct dir_context *ctx)
> info->curr_node = rb_first(&info->root);
>
> while (1) {
> + if (fatal_signal_pending(current))
> + return -ERESTARTSYS;
> +
> /*
> * Fill the rbtree if we have no more entries,
> * or the inode has changed since we last read in the
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 48e4b89..8097cd1 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -959,6 +959,10 @@ static int htree_dirblock_to_tree(struct file *dir_file,
> bh = ext4_read_dirblock(dir, block, DIRENT);
> if (IS_ERR(bh))
> return PTR_ERR(bh);
> + if (fatal_signal_pending(current)) {
> + brelse(bh);
> + return -ERESTARTSYS;
> + }
>
> de = (struct ext4_dir_entry_2 *) bh->b_data;
> top = (struct ext4_dir_entry_2 *) ((char *) de +
> --
> "Thought is the essence of where you are now."

--
"Thought is the essence of where you are now."

2016-04-12 01:35:05

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: make it possible to interrupt initial readdir call

On Fri, Apr 08, 2016 at 02:18:41PM -0400, Benjamin LaHaise wrote:
> No response on this patch yet.... Given that it's clearly a bug, and we've
> got a test case that reproduces it, why isn't this being applied?

Sorry, I forgot to ack the your patch. I did apply it and it went to
Linus during the merge window. Unfortunately, it casued a regression,
and Linus reverted it. What went to Linus was changed slightly,
because I was trying to fix another potential case where we could end
up looping for a very long time. Unfortunately I screwed up the
modification patch and the regression tests didn't catch the problem.

- Ted