The following changes since commit 243d50678583100855862bc084b8b307eea67f68:
Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs (2016-03-22 13:11:15 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git tags/ext4_for_linus_stable
for you to fetch changes up to c325a67c72903e1cc30e990a15ce745bda0dbfde:
ext4: ignore quota mount options if the quota feature is enabled (2016-04-03 17:03:37 -0400)
----------------------------------------------------------------
These changes contains a fix for overlayfs interacting with some
(badly behaved) dentry code in various file systems. These have been
reviewed by Al and the respective file system mtinainers and are going
through the ext4 tree for convenience.
This also has a few ext4 encryption bug fixes that were discovered in
Android testing (yes, we will need to get these sync'ed up with the
fs/crypto code; I'll take care of that). It also has some bug fixes
and a change to ignore the legacy quota options to allow for xfstests
regression testing of ext4's internal quota feature and to be more
consistent with how xfs handles this case.
----------------------------------------------------------------
Dan Carpenter (1):
ext4 crypto: fix some error handling
Filipe Manana (1):
btrfs: fix crash/invalid memory access on fsync when using overlayfs
Jan Kara (1):
ext4: retry block allocation for failed DIO and DAX writes
Miklos Szeredi (4):
fs: add file_dentry()
nfs: use file_dentry()
ext4: use dget_parent() in ext4_file_open()
ext4: use file_dentry()
Theodore Ts'o (7):
ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea()
ext4 crypto: don't let data integrity writebacks fail with ENOMEM
ext4 crypto: use dget_parent() in ext4_d_revalidate()
ext4: allow readdir()'s of large empty directories to be interrupted
ext4: add lockdep annotations for i_data_sem
ext4: avoid calling dquot_get_next_id() if quota is not enabled
ext4: ignore quota mount options if the quota feature is enabled
fs/btrfs/file.c | 2 +-
fs/dcache.c | 5 ++++-
fs/ext4/crypto.c | 49 +++++++++++++++++++++++++++++--------------------
fs/ext4/dir.c | 5 +++++
fs/ext4/ext4.h | 29 +++++++++++++++++++++++++++--
fs/ext4/file.c | 12 ++++++++----
fs/ext4/inode.c | 58 ++++++++++++++++++++++++++++------------------------------
fs/ext4/move_extent.c | 11 +++++++++--
fs/ext4/namei.c | 5 +++++
fs/ext4/page-io.c | 14 +++++++++++++-
fs/ext4/readpage.c | 2 +-
fs/ext4/super.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++--------------
fs/ext4/xattr.c | 32 ++++++++++++++++++++++++++++----
fs/nfs/dir.c | 6 +++---
fs/nfs/inode.c | 2 +-
fs/nfs/nfs4file.c | 4 ++--
fs/overlayfs/super.c | 33 +++++++++++++++++++++++++++++++++
include/linux/dcache.h | 10 ++++++++++
include/linux/fs.h | 10 ++++++++++
19 files changed, 264 insertions(+), 86 deletions(-)
Theodore Ts'o wrote:
> The following changes since commit 243d50678583100855862bc084b8b307eea67f68:
>
> Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs (2016-03-22 13:11:15 -0700)
>
n> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git tags/ext4_for_linus_stable
>
> for you to fetch changes up to c325a67c72903e1cc30e990a15ce745bda0dbfde:
>
> ext4: ignore quota mount options if the quota feature is enabled (2016-04-03 17:03:37 -0400)
>
> ----------------------------------------------------------------
> These changes contains a fix for overlayfs interacting with some
> (badly behaved) dentry code in various file systems. These have been
> reviewed by Al and the respective file system mtinainers and are going
> through the ext4 tree for convenience.
>
> This also has a few ext4 encryption bug fixes that were discovered in
> Android testing (yes, we will need to get these sync'ed up with the
> fs/crypto code; I'll take care of that). It also has some bug fixes
> and a change to ignore the legacy quota options to allow for xfstests
> regression testing of ext4's internal quota feature and to be more
> consistent with how xfs handles this case.
>
> ----------------------------------------------------------------
> Dan Carpenter (1):
> ext4 crypto: fix some error handling
>
> Filipe Manana (1):
> btrfs: fix crash/invalid memory access on fsync when using overlayfs
>
> Jan Kara (1):
> ext4: retry block allocation for failed DIO and DAX writes
>
> Miklos Szeredi (4):
> fs: add file_dentry()
> nfs: use file_dentry()
> ext4: use dget_parent() in ext4_file_open()
> ext4: use file_dentry()
>
> Theodore Ts'o (7):
> ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea()
> ext4 crypto: don't let data integrity writebacks fail with ENOMEM
> ext4 crypto: use dget_parent() in ext4_d_revalidate()
> ext4: allow readdir()'s of large empty directories to be interrupted
Ted,
I've been testing 5b5b7fd185e9 (linus/master) and seeing that
interrupted readdir() now returns duplicate dirents.
Reverting commit 1028b55bafb7 ("ext4: allow readdir()'s of large empty
directories to be interrupted") avoids duplicates.
On 5b5b7fd185e9 a SIGPROF to the test program below occasionally causes
"already seen name" error. On older kernels, it runs indefinitely
without error.
/*
mkdir /tmp/foo
cd /tmp/foo
for i in $(seq 0 599); do touch $i; done
/tmp/scanner &
kill -PROF %1
*/
#include <dirent.h>
#include <err.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
static void handler(int unused)
{
}
int main() {
enum { count = 600 };
char seen[count];
struct dirent *ent;
DIR *dir;
struct sigaction sa = {0};
sa.sa_handler = handler;
if (sigemptyset(&sa.sa_mask))
err(1, "sigemptyset");
sa.sa_flags = SA_RESTART;
if (sigaction(SIGPROF, &sa, NULL))
err(1, "sigaction");
for (;;) {
memset(seen, 0, sizeof(seen));
dir = opendir(".");
if (dir == NULL)
err(1, "opendir(.)");
while ((ent = readdir(dir)) != NULL) {
int idx;
if ((strcmp(ent->d_name, ".") == 0) ||
(strcmp(ent->d_name, "..") == 0)) {
continue;
}
idx = atoi(ent->d_name);
if (idx >= count) {
errx(1, "bogus name %s index %d", ent->d_name, idx);
}
if (seen[idx]) {
errx(1, "already seen name %s index %d", ent->d_name, idx);
}
seen[idx] = 1;
}
if (closedir(dir))
err(1, "closedir(.)");
}
}
> ext4: add lockdep annotations for i_data_sem
> ext4: avoid calling dquot_get_next_id() if quota is not enabled
> ext4: ignore quota mount options if the quota feature is enabled
>
> fs/btrfs/file.c | 2 +-
> fs/dcache.c | 5 ++++-
> fs/ext4/crypto.c | 49 +++++++++++++++++++++++++++++--------------------
> fs/ext4/dir.c | 5 +++++
> fs/ext4/ext4.h | 29 +++++++++++++++++++++++++++--
> fs/ext4/file.c | 12 ++++++++----
> fs/ext4/inode.c | 58 ++++++++++++++++++++++++++++------------------------------
> fs/ext4/move_extent.c | 11 +++++++++--
> fs/ext4/namei.c | 5 +++++
> fs/ext4/page-io.c | 14 +++++++++++++-
> fs/ext4/readpage.c | 2 +-
> fs/ext4/super.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++--------------
> fs/ext4/xattr.c | 32 ++++++++++++++++++++++++++++----
> fs/nfs/dir.c | 6 +++---
> fs/nfs/inode.c | 2 +-
> fs/nfs/nfs4file.c | 4 ++--
> fs/overlayfs/super.c | 33 +++++++++++++++++++++++++++++++++
> include/linux/dcache.h | 10 ++++++++++
> include/linux/fs.h | 10 ++++++++++
> 19 files changed, 264 insertions(+), 86 deletions(-)
On Sat, Apr 9, 2016 at 7:23 PM, Greg Thelen <[email protected]> wrote:
>
>
> I've been testing 5b5b7fd185e9 (linus/master) and seeing that
> interrupted readdir() now returns duplicate dirents.
Yes, your test-program recreates this beautifully here. Sadly not
while stracing.
Worth adding as a filesystem test to fsstress?
> Reverting commit 1028b55bafb7 ("ext4: allow readdir()'s of large empty
> directories to be interrupted") avoids duplicates.
Ok, so the the "fatal_signal_pending()" case in ext4_readdir() should
be ok, since we're killing the thread at that point.
So I assume it's the ext4_htree_fill_tree() part of the patch.
What I *think* happens is:
- We are perfectly happy to take a break at "call_filldir()" time (if
that returns an error because the buffer was full or whatever), and at
that point, ctx->pos will point to the last entry that we did *not*
successfully fill in.
- but if we take an exception at ext4_htree_fill_tree() time, we end
the loop, and now "ctx->pos" contains the offset of the previous entry
- the one we *successfully* already copied
- so now, when we exit the getdents, we will save the re-start value
at the entry that we already successfully handled.
So I think that commit is entirely bogus.
I think the right choice might be to
(a) revert that patch (or just change the signal_pending() into
fatal_signal_pending())
(b) to get the latency advantage, do something like this:
diff --git a/fs/readdir.c b/fs/readdir.c
index e69ef3b79787..a2244fe9c003 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -248,6 +248,8 @@ static int filldir64(struct dir_context *ctx,
const char *name, int namlen,
return -EINVAL;
dirent = buf->previous;
if (dirent) {
+ if (signal_pending)
+ return -EINTR;
if (__put_user(offset, &dirent->d_off))
goto efault;
}
which returns that error at a point where we can actually take it
(note that we only do this if we have already filled in at least one
entry: "buf->previous" was non-NULL, so we can return EINTR, because
the caller will actually return the length of the result, never the
EINTR error we're returning).
The above patch is *entirely* untested. It might be pure garbage. But
that commit 1028b55bafb7 is _definitely_ broken, and the above _might_
work.
Caveat patchor.
Linus
On Sun, Apr 10, 2016 at 1:29 PM, Linus Torvalds
<[email protected]> wrote:
>
> (b) to get the latency advantage, do something like this:
>
> + if (signal_pending)
> + return -EINTR;
That should be
if (signal_pending())
return -EINTR;
of course. Missing function call parenthesis..
I'm just testing that people are awake.
Linus
On Sun, Apr 10, 2016 at 1:32 PM, Linus Torvalds
<[email protected]> wrote:
>
> if (signal_pending())
> return -EINTR;
"signal_pending(current)", dammit.
I give up. I'm sure you understood what I meant. I need to go make
myself more coffee. Or something.
Linus
On Sun, Apr 10, 2016 at 1:29 PM, Linus Torvalds
<[email protected]> wrote:
>
> I think the right choice might be to
>
> (a) revert that patch (or just change the signal_pending() into
> fatal_signal_pending())
>
> (b) to get the latency advantage, do something like this:
The attached patch is actually tested and seems to fix the issue.
I do not have a good way to check the latency of signal delivery and I
didn't check if the signal_pending() actually ever triggers, but your
test-case that showed the problem before seems to be fine with it.
Linus
On Sun, Apr 10, 2016 at 1:48 PM, Linus Torvalds
<[email protected]> wrote:
>
> The attached patch is actually tested and seems to fix the issue.
Christ. Now really attached.
I'll just go back to bed, because today is just not working out. Maybe
things will be better tomorrow.
Or maybe I should start drinking, and at least have an excuse.
Linus
Linus,
Thanks for reverting the broken commit. I'll work on a proper fix for
the readdir() issue for the next merge cycle, and make sure it gets
proper testing.
I'll also take care of packaging up Greg's test program and getting it
submitted to xfstests so we have a proper regression test case.
Cheers,
- Ted