2020-12-16 23:34:44

by Vivek Goyal

[permalink] [raw]
Subject: [RFC PATCH 0/3] vfs, overlayfs: Fix syncfs() to return error

Hi,

This is V2 of patches which tries to fix syncfs() for overlayfs to
return error code when sync_filesystem(upper_sb) returns error or
there are writeback errors on upper_sb.

I posted V1 of patch here.

https://lore.kernel.org/linux-fsdevel/[email protected]/

This is just compile tested patch series. Trying to get early feedback
to figure out what direction to move in to fix this issue.

Thanks
Vivek

Vivek Goyal (3):
vfs: add new f_op->syncfs vector
overlayfs: Implement f_op->syncfs() call
overlayfs: Check writeback errors w.r.t upper in ->syncfs()

fs/overlayfs/file.c | 1 +
fs/overlayfs/overlayfs.h | 3 +++
fs/overlayfs/ovl_entry.h | 2 ++
fs/overlayfs/readdir.c | 1 +
fs/overlayfs/super.c | 41 +++++++++++++++++++++++++++++++++++++++-
fs/sync.c | 29 +++++++++++++++++++---------
include/linux/fs.h | 1 +
7 files changed, 68 insertions(+), 10 deletions(-)

--
2.25.4


2020-12-16 23:35:42

by Vivek Goyal

[permalink] [raw]
Subject: [PATCH 1/3] vfs: add new f_op->syncfs vector

Current implementation of __sync_filesystem() ignores the return code
from ->sync_fs(). I am not sure why that's the case. There must have
been some historical reason for this.

Ignoring ->sync_fs() return code is problematic for overlayfs where
it can return error if sync_filesystem() on upper super block failed.
That error will simply be lost and sycnfs(overlay_fd), will get
success (despite the fact it failed).

If we modify existing implementation, there is a concern that it will
lead to user space visible behavior changes and break things. So
instead implement a new file_operations->syncfs() call which will
be called in syncfs() syscall path. Return code from this new
call will be captured. And all the writeback error detection
logic can go in there as well. Only filesystems which implement
this call get affected by this change. Others continue to fallback
to existing mechanism.

To be clear, I mean something like this (draft, untested) patch. You'd
also need to add a new ->syncfs op for overlayfs, and that could just do
a check_and_advance against the upper layer sb's errseq_t after calling
sync_filesystem.

Vivek, fixed couple of minor compile errors in original patch.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/sync.c | 29 ++++++++++++++++++++---------
include/linux/fs.h | 1 +
2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index 1373a610dc78..06caa9758d93 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -155,27 +155,38 @@ void emergency_sync(void)
}
}

+static int generic_syncfs(struct file *file)
+{
+ int ret, ret2;
+ struct super_block *sb = file->f_path.dentry->d_sb;
+
+ down_read(&sb->s_umount);
+ ret = sync_filesystem(sb);
+ up_read(&sb->s_umount);
+
+ ret2 = errseq_check_and_advance(&sb->s_wb_err, &file->f_sb_err);
+
+ return ret ? ret : ret2;
+}
+
/*
* sync a single super
*/
SYSCALL_DEFINE1(syncfs, int, fd)
{
struct fd f = fdget(fd);
- struct super_block *sb;
- int ret, ret2;
+ int ret;

if (!f.file)
return -EBADF;
- sb = f.file->f_path.dentry->d_sb;
-
- down_read(&sb->s_umount);
- ret = sync_filesystem(sb);
- up_read(&sb->s_umount);

- ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
+ if (f.file->f_op->syncfs)
+ ret = f.file->f_op->syncfs(f.file);
+ else
+ ret = generic_syncfs(f.file);

fdput(f);
- return ret ? ret : ret2;
+ return ret;
}

/**
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8667d0cdc71e..6710469b7e33 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1859,6 +1859,7 @@ struct file_operations {
struct file *file_out, loff_t pos_out,
loff_t len, unsigned int remap_flags);
int (*fadvise)(struct file *, loff_t, loff_t, int);
+ int (*syncfs)(struct file *);
} __randomize_layout;

struct inode_operations {
--
2.25.4

2020-12-16 23:35:48

by Vivek Goyal

[permalink] [raw]
Subject: [PATCH 3/3] overlayfs: Check writeback errors w.r.t upper in ->syncfs()

Check for writeback error on overlay super block w.r.t "struct file"
passed in ->syncfs().

As of now real error happens on upper sb. So this patch first propagates
error from upper sb to overlay sb and then checks error w.r.t struct
file passed in.

Jeff, I know you prefer that I should rather file upper file and check
error directly on on upper sb w.r.t this real upper file. While I was
implementing that I thought what if file is on lower (and has not been
copied up yet). In that case shall we not check writeback errors and
return back to user space? That does not sound right though because,
we are not checking for writeback errors on this file. Rather we
are checking for any error on superblock. Upper might have an error
and we should report it to user even if file in question is a lower
file. And that's why I fell back to this approach. But I am open to
change it if there are issues in this method.

Signed-off-by: Vivek Goyal <[email protected]>
---
fs/overlayfs/ovl_entry.h | 2 ++
fs/overlayfs/super.c | 15 ++++++++++++---
2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 1b5a2094df8e..a08fd719ee7b 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -79,6 +79,8 @@ struct ovl_fs {
atomic_long_t last_ino;
/* Whiteout dentry cache */
struct dentry *whiteout;
+ /* Protects multiple sb->s_wb_err update from upper_sb . */
+ spinlock_t errseq_lock;
};

static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index b4d92e6fa5ce..e7bc4492205e 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -291,7 +291,7 @@ int ovl_syncfs(struct file *file)
struct super_block *sb = file->f_path.dentry->d_sb;
struct ovl_fs *ofs = sb->s_fs_info;
struct super_block *upper_sb;
- int ret;
+ int ret, ret2;

ret = 0;
down_read(&sb->s_umount);
@@ -310,10 +310,18 @@ int ovl_syncfs(struct file *file)
ret = sync_filesystem(upper_sb);
up_read(&upper_sb->s_umount);

+ /* Update overlay sb->s_wb_err */
+ if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) {
+ /* Upper sb has errors since last time */
+ spin_lock(&ofs->errseq_lock);
+ errseq_check_and_advance(&upper_sb->s_wb_err, &sb->s_wb_err);
+ spin_unlock(&ofs->errseq_lock);
+ }

+ ret2 = errseq_check_and_advance(&sb->s_wb_err, &file->f_sb_err);
out:
up_read(&sb->s_umount);
- return ret;
+ return ret ? ret : ret2;
}

/**
@@ -1903,6 +1911,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
if (!cred)
goto out_err;

+ spin_lock_init(&ofs->errseq_lock);
/* Is there a reason anyone would want not to share whiteouts? */
ofs->share_whiteout = true;

@@ -1975,7 +1984,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)

sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
-
+ sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err);
}
oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers);
err = PTR_ERR(oe);
--
2.25.4

2020-12-16 23:36:44

by Vivek Goyal

[permalink] [raw]
Subject: [PATCH 2/3] overlayfs: Implement f_op->syncfs() call

Provide an implementation for ->syncfs(). Now if there is an error
returned by sync_filesystem(upper_sb), it will be visible to user
space. Currently in ovl_sync_fs() path, this error is ignored by VFS.

A later patch also adds logic to detect writeback error.

Signed-off-by: Vivek Goyal <[email protected]>
---
fs/overlayfs/file.c | 1 +
fs/overlayfs/overlayfs.h | 3 +++
fs/overlayfs/readdir.c | 1 +
fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
4 files changed, 35 insertions(+)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index efccb7c1f9bc..affc1ba63202 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -806,6 +806,7 @@ const struct file_operations ovl_file_operations = {

.copy_file_range = ovl_copy_file_range,
.remap_file_range = ovl_remap_file_range,
+ .syncfs = ovl_syncfs,
};

int __init ovl_aio_request_cache_init(void)
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index f8880aa2ba0e..1efb13800755 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -520,3 +520,6 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower,

/* export.c */
extern const struct export_operations ovl_export_operations;
+
+/* super.c */
+int ovl_syncfs(struct file *file);
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 01620ebae1bd..e89b450c8f8f 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -975,6 +975,7 @@ const struct file_operations ovl_dir_operations = {
#ifdef CONFIG_COMPAT
.compat_ioctl = ovl_compat_ioctl,
#endif
+ .syncfs = ovl_syncfs,
};

int ovl_check_empty_dir(struct dentry *dentry, struct list_head *list)
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 290983bcfbb3..b4d92e6fa5ce 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -286,6 +286,36 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
return ret;
}

+int ovl_syncfs(struct file *file)
+{
+ struct super_block *sb = file->f_path.dentry->d_sb;
+ struct ovl_fs *ofs = sb->s_fs_info;
+ struct super_block *upper_sb;
+ int ret;
+
+ ret = 0;
+ down_read(&sb->s_umount);
+ if (sb_rdonly(sb))
+ goto out;
+
+ if (!ovl_upper_mnt(ofs))
+ goto out;
+
+ if (!ovl_should_sync(ofs))
+ goto out;
+
+ upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
+
+ down_read(&upper_sb->s_umount);
+ ret = sync_filesystem(upper_sb);
+ up_read(&upper_sb->s_umount);
+
+
+out:
+ up_read(&sb->s_umount);
+ return ret;
+}
+
/**
* ovl_statfs
* @sb: The overlayfs super block
--
2.25.4

2020-12-17 00:52:29

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 1/3] vfs: add new f_op->syncfs vector

[Christoph added to Cc...]
On Wed, Dec 16, 2020 at 06:31:47PM -0500, Vivek Goyal wrote:
> Current implementation of __sync_filesystem() ignores the return code
> from ->sync_fs(). I am not sure why that's the case. There must have
> been some historical reason for this.
>
> Ignoring ->sync_fs() return code is problematic for overlayfs where
> it can return error if sync_filesystem() on upper super block failed.
> That error will simply be lost and sycnfs(overlay_fd), will get
> success (despite the fact it failed).
>
> If we modify existing implementation, there is a concern that it will
> lead to user space visible behavior changes and break things. So
> instead implement a new file_operations->syncfs() call which will
> be called in syncfs() syscall path. Return code from this new
> call will be captured. And all the writeback error detection
> logic can go in there as well. Only filesystems which implement
> this call get affected by this change. Others continue to fallback
> to existing mechanism.

That smells like a massive source of confusion down the road. I'd just
looked through the existing instances; many always return 0, but quite
a few sometimes try to return an error:
fs/btrfs/super.c:2412: .sync_fs = btrfs_sync_fs,
fs/exfat/super.c:204: .sync_fs = exfat_sync_fs,
fs/ext4/super.c:1674: .sync_fs = ext4_sync_fs,
fs/f2fs/super.c:2480: .sync_fs = f2fs_sync_fs,
fs/gfs2/super.c:1600: .sync_fs = gfs2_sync_fs,
fs/hfsplus/super.c:368: .sync_fs = hfsplus_sync_fs,
fs/nilfs2/super.c:689: .sync_fs = nilfs_sync_fs,
fs/ocfs2/super.c:139: .sync_fs = ocfs2_sync_fs,
fs/overlayfs/super.c:399: .sync_fs = ovl_sync_fs,
fs/ubifs/super.c:2052: .sync_fs = ubifs_sync_fs,
is the list of such. There are 4 method callers:
dquot_quota_sync(), dquot_disable(), __sync_filesystem() and
sync_fs_one_sb(). For sync_fs_one_sb() we want to ignore the
return value; for __sync_filesystem() we almost certainly
do *not* - it ends with return __sync_blockdev(sb->s_bdev, wait),
after all. The question for that one is whether we want
__sync_blockdev() called even in case of ->sync_fs() reporting
a failure, and I suspect that it's safer to call it anyway and
return the first error value we'd got. No idea about quota
situation.

2020-12-17 09:59:19

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 1/3] vfs: add new f_op->syncfs vector

On Thu 17-12-20 00:49:35, Al Viro wrote:
> [Christoph added to Cc...]
> On Wed, Dec 16, 2020 at 06:31:47PM -0500, Vivek Goyal wrote:
> > Current implementation of __sync_filesystem() ignores the return code
> > from ->sync_fs(). I am not sure why that's the case. There must have
> > been some historical reason for this.
> >
> > Ignoring ->sync_fs() return code is problematic for overlayfs where
> > it can return error if sync_filesystem() on upper super block failed.
> > That error will simply be lost and sycnfs(overlay_fd), will get
> > success (despite the fact it failed).
> >
> > If we modify existing implementation, there is a concern that it will
> > lead to user space visible behavior changes and break things. So
> > instead implement a new file_operations->syncfs() call which will
> > be called in syncfs() syscall path. Return code from this new
> > call will be captured. And all the writeback error detection
> > logic can go in there as well. Only filesystems which implement
> > this call get affected by this change. Others continue to fallback
> > to existing mechanism.
>
> That smells like a massive source of confusion down the road. I'd just
> looked through the existing instances; many always return 0, but quite
> a few sometimes try to return an error:
> fs/btrfs/super.c:2412: .sync_fs = btrfs_sync_fs,
> fs/exfat/super.c:204: .sync_fs = exfat_sync_fs,
> fs/ext4/super.c:1674: .sync_fs = ext4_sync_fs,
> fs/f2fs/super.c:2480: .sync_fs = f2fs_sync_fs,
> fs/gfs2/super.c:1600: .sync_fs = gfs2_sync_fs,
> fs/hfsplus/super.c:368: .sync_fs = hfsplus_sync_fs,
> fs/nilfs2/super.c:689: .sync_fs = nilfs_sync_fs,
> fs/ocfs2/super.c:139: .sync_fs = ocfs2_sync_fs,
> fs/overlayfs/super.c:399: .sync_fs = ovl_sync_fs,
> fs/ubifs/super.c:2052: .sync_fs = ubifs_sync_fs,
> is the list of such. There are 4 method callers:
> dquot_quota_sync(), dquot_disable(), __sync_filesystem() and
> sync_fs_one_sb(). For sync_fs_one_sb() we want to ignore the
> return value; for __sync_filesystem() we almost certainly
> do *not* - it ends with return __sync_blockdev(sb->s_bdev, wait),
> after all. The question for that one is whether we want
> __sync_blockdev() called even in case of ->sync_fs() reporting
> a failure, and I suspect that it's safer to call it anyway and
> return the first error value we'd got. No idea about quota
> situation.

WRT quota situation: All the ->sync_fs() calls there are due to cache
coherency reasons (we need to get quota changes to disk, then prune quota
files's page cache, and then userspace can read current quota structures
from the disk). We don't want to fail dquot_disable() just because caches
might be incoherent so ignoring ->sync_fs() return value there is fine.
With dquot_quota_sync() it might make some sense to return the error -
that's just a backend for Q_SYNC quotactl(2). OTOH I'm not sure anybody
really cares - Q_SYNC is rarely used.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-12-17 15:04:25

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 1/3] vfs: add new f_op->syncfs vector

On Thu, Dec 17, 2020 at 12:49:35AM +0000, Al Viro wrote:
> [Christoph added to Cc...]
> On Wed, Dec 16, 2020 at 06:31:47PM -0500, Vivek Goyal wrote:
> > Current implementation of __sync_filesystem() ignores the return code
> > from ->sync_fs(). I am not sure why that's the case. There must have
> > been some historical reason for this.
> >
> > Ignoring ->sync_fs() return code is problematic for overlayfs where
> > it can return error if sync_filesystem() on upper super block failed.
> > That error will simply be lost and sycnfs(overlay_fd), will get
> > success (despite the fact it failed).
> >
> > If we modify existing implementation, there is a concern that it will
> > lead to user space visible behavior changes and break things. So
> > instead implement a new file_operations->syncfs() call which will
> > be called in syncfs() syscall path. Return code from this new
> > call will be captured. And all the writeback error detection
> > logic can go in there as well. Only filesystems which implement
> > this call get affected by this change. Others continue to fallback
> > to existing mechanism.
>
> That smells like a massive source of confusion down the road. I'd just
> looked through the existing instances; many always return 0, but quite
> a few sometimes try to return an error:
> fs/btrfs/super.c:2412: .sync_fs = btrfs_sync_fs,
> fs/exfat/super.c:204: .sync_fs = exfat_sync_fs,
> fs/ext4/super.c:1674: .sync_fs = ext4_sync_fs,
> fs/f2fs/super.c:2480: .sync_fs = f2fs_sync_fs,
> fs/gfs2/super.c:1600: .sync_fs = gfs2_sync_fs,
> fs/hfsplus/super.c:368: .sync_fs = hfsplus_sync_fs,
> fs/nilfs2/super.c:689: .sync_fs = nilfs_sync_fs,
> fs/ocfs2/super.c:139: .sync_fs = ocfs2_sync_fs,
> fs/overlayfs/super.c:399: .sync_fs = ovl_sync_fs,
> fs/ubifs/super.c:2052: .sync_fs = ubifs_sync_fs,
> is the list of such. There are 4 method callers:
> dquot_quota_sync(), dquot_disable(), __sync_filesystem() and
> sync_fs_one_sb(). For sync_fs_one_sb() we want to ignore the
> return value; for __sync_filesystem() we almost certainly
> do *not* - it ends with return __sync_blockdev(sb->s_bdev, wait),
> after all. The question for that one is whether we want
> __sync_blockdev() called even in case of ->sync_fs() reporting
> a failure, and I suspect that it's safer to call it anyway and
> return the first error value we'd got.

I posted V1 patch to do exactly above. In __sync_filesystem(), capture
return code from ->sync_fs() but continue to call __sync_blockdev() and
and return error code from ->sync_fs() if there is one otherwise
return error code from __sync_blockdev().

https://lore.kernel.org/linux-fsdevel/[email protected]/

Thanks
Vivek

> No idea about quota situation.
>

2020-12-17 16:19:15

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 1/3] vfs: add new f_op->syncfs vector

On Thu, Dec 17, 2020 at 10:57:28AM +0100, Jan Kara wrote:
> On Thu 17-12-20 00:49:35, Al Viro wrote:
> > [Christoph added to Cc...]
> > On Wed, Dec 16, 2020 at 06:31:47PM -0500, Vivek Goyal wrote:
> > > Current implementation of __sync_filesystem() ignores the return code
> > > from ->sync_fs(). I am not sure why that's the case. There must have
> > > been some historical reason for this.
> > >
> > > Ignoring ->sync_fs() return code is problematic for overlayfs where
> > > it can return error if sync_filesystem() on upper super block failed.
> > > That error will simply be lost and sycnfs(overlay_fd), will get
> > > success (despite the fact it failed).
> > >
> > > If we modify existing implementation, there is a concern that it will
> > > lead to user space visible behavior changes and break things. So
> > > instead implement a new file_operations->syncfs() call which will
> > > be called in syncfs() syscall path. Return code from this new
> > > call will be captured. And all the writeback error detection
> > > logic can go in there as well. Only filesystems which implement
> > > this call get affected by this change. Others continue to fallback
> > > to existing mechanism.
> >
> > That smells like a massive source of confusion down the road. I'd just
> > looked through the existing instances; many always return 0, but quite
> > a few sometimes try to return an error:
> > fs/btrfs/super.c:2412: .sync_fs = btrfs_sync_fs,
> > fs/exfat/super.c:204: .sync_fs = exfat_sync_fs,
> > fs/ext4/super.c:1674: .sync_fs = ext4_sync_fs,
> > fs/f2fs/super.c:2480: .sync_fs = f2fs_sync_fs,
> > fs/gfs2/super.c:1600: .sync_fs = gfs2_sync_fs,
> > fs/hfsplus/super.c:368: .sync_fs = hfsplus_sync_fs,
> > fs/nilfs2/super.c:689: .sync_fs = nilfs_sync_fs,
> > fs/ocfs2/super.c:139: .sync_fs = ocfs2_sync_fs,
> > fs/overlayfs/super.c:399: .sync_fs = ovl_sync_fs,
> > fs/ubifs/super.c:2052: .sync_fs = ubifs_sync_fs,
> > is the list of such. There are 4 method callers:
> > dquot_quota_sync(), dquot_disable(), __sync_filesystem() and
> > sync_fs_one_sb(). For sync_fs_one_sb() we want to ignore the
> > return value; for __sync_filesystem() we almost certainly
> > do *not* - it ends with return __sync_blockdev(sb->s_bdev, wait),
> > after all. The question for that one is whether we want
> > __sync_blockdev() called even in case of ->sync_fs() reporting
> > a failure, and I suspect that it's safer to call it anyway and
> > return the first error value we'd got. No idea about quota
> > situation.
>
> WRT quota situation: All the ->sync_fs() calls there are due to cache
> coherency reasons (we need to get quota changes to disk, then prune quota
> files's page cache, and then userspace can read current quota structures
> from the disk). We don't want to fail dquot_disable() just because caches
> might be incoherent so ignoring ->sync_fs() return value there is fine.
> With dquot_quota_sync() it might make some sense to return the error -
> that's just a backend for Q_SYNC quotactl(2). OTOH I'm not sure anybody
> really cares - Q_SYNC is rarely used.

Thanks Jan. May be I will leave dquot_quota_sync() untouched for now. When
somebody has a need to capture return code from ->sync_fs() there, it
can be easily added.

Vivek

2020-12-17 19:53:02

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 1/3] vfs: add new f_op->syncfs vector

On Thu, 2020-12-17 at 00:49 +0000, Al Viro wrote:
> [Christoph added to Cc...]
> On Wed, Dec 16, 2020 at 06:31:47PM -0500, Vivek Goyal wrote:
> > Current implementation of __sync_filesystem() ignores the return code
> > from ->sync_fs(). I am not sure why that's the case. There must have
> > been some historical reason for this.
> >
> > Ignoring ->sync_fs() return code is problematic for overlayfs where
> > it can return error if sync_filesystem() on upper super block failed.
> > That error will simply be lost and sycnfs(overlay_fd), will get
> > success (despite the fact it failed).
> >
> > If we modify existing implementation, there is a concern that it will
> > lead to user space visible behavior changes and break things. So
> > instead implement a new file_operations->syncfs() call which will
> > be called in syncfs() syscall path. Return code from this new
> > call will be captured. And all the writeback error detection
> > logic can go in there as well. Only filesystems which implement
> > this call get affected by this change. Others continue to fallback
> > to existing mechanism.
>
> That smells like a massive source of confusion down the road. I'd just
> looked through the existing instances; many always return 0, but quite
> a few sometimes try to return an error:
> fs/btrfs/super.c:2412: .sync_fs = btrfs_sync_fs,
> fs/exfat/super.c:204: .sync_fs = exfat_sync_fs,
> fs/ext4/super.c:1674: .sync_fs = ext4_sync_fs,
> fs/f2fs/super.c:2480: .sync_fs = f2fs_sync_fs,
> fs/gfs2/super.c:1600: .sync_fs = gfs2_sync_fs,
> fs/hfsplus/super.c:368: .sync_fs = hfsplus_sync_fs,
> fs/nilfs2/super.c:689: .sync_fs = nilfs_sync_fs,
> fs/ocfs2/super.c:139: .sync_fs = ocfs2_sync_fs,
> fs/overlayfs/super.c:399: .sync_fs = ovl_sync_fs,
> fs/ubifs/super.c:2052: .sync_fs = ubifs_sync_fs,
> is the list of such. There are 4 method callers:
> dquot_quota_sync(), dquot_disable(), __sync_filesystem() and
> sync_fs_one_sb(). For sync_fs_one_sb() we want to ignore the
> return value; for __sync_filesystem() we almost certainly
> do *not* - it ends with return __sync_blockdev(sb->s_bdev, wait),
> after all. The question for that one is whether we want
> __sync_blockdev() called even in case of ->sync_fs() reporting
> a failure, and I suspect that it's safer to call it anyway and
> return the first error value we'd got. No idea about quota
> situation.
>

The only problem there is that makes it a bit difficult to override the
error return to syncfs, which is really what overlayfs wants to be able
to do. Their syncfs syncs out the upper layer, so it makes sense to just
have their file->f_sb_err track the upper layer's sb->s_wb_err.

You can plumb the errors from sync_fs all the way through to the syncfs
syscall, but we can't currently tell whether we're doing the sync_fs op
on behalf of sync(), syncfs() or something else entirely. We need to
ensure that if it does return an error, that it doesn't get dropped on
the floor.

I think it'd be simpler to just add f_op->syncfs and change
s_op->sync_fs to a different name, to lessen the confusion.
s_op->sync_fs sort of makes it look like you're implementing syncfs(2),
but there's a bit more to it than that.

Maybe s_op->sync_filesystem? There are only about 113 instances
"sync_fs" in the tree. Changing the name might also help highlight the
fact that the return code won't be ignored like it used to be.
--
Jeff Layton <[email protected]>

2020-12-17 20:11:15

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 3/3] overlayfs: Check writeback errors w.r.t upper in ->syncfs()

On Wed, Dec 16, 2020 at 06:31:49PM -0500, Vivek Goyal wrote:
> Check for writeback error on overlay super block w.r.t "struct file"
> passed in ->syncfs().
>
> As of now real error happens on upper sb. So this patch first propagates
> error from upper sb to overlay sb and then checks error w.r.t struct
> file passed in.
>
> Jeff, I know you prefer that I should rather file upper file and check
> error directly on on upper sb w.r.t this real upper file. While I was
> implementing that I thought what if file is on lower (and has not been
> copied up yet). In that case shall we not check writeback errors and
> return back to user space? That does not sound right though because,
> we are not checking for writeback errors on this file. Rather we
> are checking for any error on superblock. Upper might have an error
> and we should report it to user even if file in question is a lower
> file. And that's why I fell back to this approach. But I am open to
> change it if there are issues in this method.
>
> Signed-off-by: Vivek Goyal <[email protected]>
> ---
> fs/overlayfs/ovl_entry.h | 2 ++
> fs/overlayfs/super.c | 15 ++++++++++++---
> 2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> index 1b5a2094df8e..a08fd719ee7b 100644
> --- a/fs/overlayfs/ovl_entry.h
> +++ b/fs/overlayfs/ovl_entry.h
> @@ -79,6 +79,8 @@ struct ovl_fs {
> atomic_long_t last_ino;
> /* Whiteout dentry cache */
> struct dentry *whiteout;
> + /* Protects multiple sb->s_wb_err update from upper_sb . */
> + spinlock_t errseq_lock;
> };
>
> static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index b4d92e6fa5ce..e7bc4492205e 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -291,7 +291,7 @@ int ovl_syncfs(struct file *file)
> struct super_block *sb = file->f_path.dentry->d_sb;
> struct ovl_fs *ofs = sb->s_fs_info;
> struct super_block *upper_sb;
> - int ret;
> + int ret, ret2;
>
> ret = 0;
> down_read(&sb->s_umount);
> @@ -310,10 +310,18 @@ int ovl_syncfs(struct file *file)
> ret = sync_filesystem(upper_sb);
> up_read(&upper_sb->s_umount);
>
> + /* Update overlay sb->s_wb_err */
> + if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) {
> + /* Upper sb has errors since last time */
> + spin_lock(&ofs->errseq_lock);
> + errseq_check_and_advance(&upper_sb->s_wb_err, &sb->s_wb_err);
> + spin_unlock(&ofs->errseq_lock);
> + }

So, the problem here is that the resulting value in sb->s_wb_err is
going to end up with the REPORTED flag set (using the naming in my
latest set). So, a later opener of a file on sb->s_wb_err won't see it.

For instance, suppose you call sync() on the box and does the above
check and advance. Then, you open the file and call syncfs() and get
back no error because REPORTED flag was set when you opened. That error
will then be lost.

>
> + ret2 = errseq_check_and_advance(&sb->s_wb_err, &file->f_sb_err);
> out:
> up_read(&sb->s_umount);
> - return ret;
> + return ret ? ret : ret2;
> }
>
> /**
> @@ -1903,6 +1911,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> if (!cred)
> goto out_err;
>
> + spin_lock_init(&ofs->errseq_lock);
> /* Is there a reason anyone would want not to share whiteouts? */
> ofs->share_whiteout = true;
>
> @@ -1975,7 +1984,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>
> sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
> sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> -
> + sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err);

This will mark the error on the upper_sb as REPORTED, and that's not
really that's the case if you're just using it set s_wb_err in the
overlay. You might want to use errseq_peek in this situation.

> }
> oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers);
> err = PTR_ERR(oe);
> --
> 2.25.4
>

2020-12-18 14:47:10

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 3/3] overlayfs: Check writeback errors w.r.t upper in ->syncfs()

On Thu, Dec 17, 2020 at 03:08:56PM -0500, Jeffrey Layton wrote:
> On Wed, Dec 16, 2020 at 06:31:49PM -0500, Vivek Goyal wrote:
> > Check for writeback error on overlay super block w.r.t "struct file"
> > passed in ->syncfs().
> >
> > As of now real error happens on upper sb. So this patch first propagates
> > error from upper sb to overlay sb and then checks error w.r.t struct
> > file passed in.
> >
> > Jeff, I know you prefer that I should rather file upper file and check
> > error directly on on upper sb w.r.t this real upper file. While I was
> > implementing that I thought what if file is on lower (and has not been
> > copied up yet). In that case shall we not check writeback errors and
> > return back to user space? That does not sound right though because,
> > we are not checking for writeback errors on this file. Rather we
> > are checking for any error on superblock. Upper might have an error
> > and we should report it to user even if file in question is a lower
> > file. And that's why I fell back to this approach. But I am open to
> > change it if there are issues in this method.
> >
> > Signed-off-by: Vivek Goyal <[email protected]>
> > ---
> > fs/overlayfs/ovl_entry.h | 2 ++
> > fs/overlayfs/super.c | 15 ++++++++++++---
> > 2 files changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> > index 1b5a2094df8e..a08fd719ee7b 100644
> > --- a/fs/overlayfs/ovl_entry.h
> > +++ b/fs/overlayfs/ovl_entry.h
> > @@ -79,6 +79,8 @@ struct ovl_fs {
> > atomic_long_t last_ino;
> > /* Whiteout dentry cache */
> > struct dentry *whiteout;
> > + /* Protects multiple sb->s_wb_err update from upper_sb . */
> > + spinlock_t errseq_lock;
> > };
> >
> > static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index b4d92e6fa5ce..e7bc4492205e 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
> > @@ -291,7 +291,7 @@ int ovl_syncfs(struct file *file)
> > struct super_block *sb = file->f_path.dentry->d_sb;
> > struct ovl_fs *ofs = sb->s_fs_info;
> > struct super_block *upper_sb;
> > - int ret;
> > + int ret, ret2;
> >
> > ret = 0;
> > down_read(&sb->s_umount);
> > @@ -310,10 +310,18 @@ int ovl_syncfs(struct file *file)
> > ret = sync_filesystem(upper_sb);
> > up_read(&upper_sb->s_umount);
> >
> > + /* Update overlay sb->s_wb_err */
> > + if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) {
> > + /* Upper sb has errors since last time */
> > + spin_lock(&ofs->errseq_lock);
> > + errseq_check_and_advance(&upper_sb->s_wb_err, &sb->s_wb_err);
> > + spin_unlock(&ofs->errseq_lock);
> > + }
>
> So, the problem here is that the resulting value in sb->s_wb_err is
> going to end up with the REPORTED flag set (using the naming in my
> latest set). So, a later opener of a file on sb->s_wb_err won't see it.
>
> For instance, suppose you call sync() on the box and does the above
> check and advance. Then, you open the file and call syncfs() and get
> back no error because REPORTED flag was set when you opened. That error
> will then be lost.

Hi Jeff,

In this patch, I am doing this only in ->syncfs() path and not in
->sync_fs() path. IOW, errseq_check_and_advance() will take place
only if there is a valid "struct file" passed in. That means there
is a consumer of the error and that means it should be fine to
set the sb->s_wb_err as SEEN/REPORTED, right?

If we end up plumbming "struct file" in existing ->sync_fs() routine,
then I will call this only if a non NULL struct file has been
passed in. Otherwise skip this step.

IOW, sync() call will not result in errseq_check_and_advance() instead
a syncfs() call will.

>
> >
> > + ret2 = errseq_check_and_advance(&sb->s_wb_err, &file->f_sb_err);
> > out:
> > up_read(&sb->s_umount);
> > - return ret;
> > + return ret ? ret : ret2;
> > }
> >
> > /**
> > @@ -1903,6 +1911,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> > if (!cred)
> > goto out_err;
> >
> > + spin_lock_init(&ofs->errseq_lock);
> > /* Is there a reason anyone would want not to share whiteouts? */
> > ofs->share_whiteout = true;
> >
> > @@ -1975,7 +1984,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> >
> > sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
> > sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> > -
> > + sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err);
>
> This will mark the error on the upper_sb as REPORTED, and that's not
> really that's the case if you're just using it set s_wb_err in the
> overlay. You might want to use errseq_peek in this situation.

For now I am still looking at existing code and not new code. Because
I belive that new code does not change existing behavior instead
provides additional functionality to allow sampling the error without
marking it seen as well as provide helper to not force seeing an
unseen error.

So current errseq_sample() does not mark error SEEN. And if it is
an unseen error, we will get 0 and be forced to see the error next
time.

One small issue with this is that say upper has unseen error. Now
we mount overlay and save that value in sb->s_wb_err (unseen). Say
a file is opened on upper and error is now seen on upper. But
we still have unseen error cached in overlay and if overlay fd is
now opened, f->f_sb_err will be 0 and it will be forced to see
err on next syncfs().

IOW, despite the fact that overlay fd was opened after upper sb had
been marked seen, it still will see error. I think it probably is
not a big issue.

Vivek

>
> > }
> > oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers);
> > err = PTR_ERR(oe);
> > --
> > 2.25.4
> >
>

2020-12-18 15:05:25

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 3/3] overlayfs: Check writeback errors w.r.t upper in ->syncfs()

On Fri, Dec 18, 2020 at 09:44:18AM -0500, Vivek Goyal wrote:
> On Thu, Dec 17, 2020 at 03:08:56PM -0500, Jeffrey Layton wrote:
> > On Wed, Dec 16, 2020 at 06:31:49PM -0500, Vivek Goyal wrote:
> > > Check for writeback error on overlay super block w.r.t "struct file"
> > > passed in ->syncfs().
> > >
> > > As of now real error happens on upper sb. So this patch first propagates
> > > error from upper sb to overlay sb and then checks error w.r.t struct
> > > file passed in.
> > >
> > > Jeff, I know you prefer that I should rather file upper file and check
> > > error directly on on upper sb w.r.t this real upper file. While I was
> > > implementing that I thought what if file is on lower (and has not been
> > > copied up yet). In that case shall we not check writeback errors and
> > > return back to user space? That does not sound right though because,
> > > we are not checking for writeback errors on this file. Rather we
> > > are checking for any error on superblock. Upper might have an error
> > > and we should report it to user even if file in question is a lower
> > > file. And that's why I fell back to this approach. But I am open to
> > > change it if there are issues in this method.
> > >
> > > Signed-off-by: Vivek Goyal <[email protected]>
> > > ---
> > > fs/overlayfs/ovl_entry.h | 2 ++
> > > fs/overlayfs/super.c | 15 ++++++++++++---
> > > 2 files changed, 14 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> > > index 1b5a2094df8e..a08fd719ee7b 100644
> > > --- a/fs/overlayfs/ovl_entry.h
> > > +++ b/fs/overlayfs/ovl_entry.h
> > > @@ -79,6 +79,8 @@ struct ovl_fs {
> > > atomic_long_t last_ino;
> > > /* Whiteout dentry cache */
> > > struct dentry *whiteout;
> > > + /* Protects multiple sb->s_wb_err update from upper_sb . */
> > > + spinlock_t errseq_lock;
> > > };
> > >
> > > static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
> > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > > index b4d92e6fa5ce..e7bc4492205e 100644
> > > --- a/fs/overlayfs/super.c
> > > +++ b/fs/overlayfs/super.c
> > > @@ -291,7 +291,7 @@ int ovl_syncfs(struct file *file)
> > > struct super_block *sb = file->f_path.dentry->d_sb;
> > > struct ovl_fs *ofs = sb->s_fs_info;
> > > struct super_block *upper_sb;
> > > - int ret;
> > > + int ret, ret2;
> > >
> > > ret = 0;
> > > down_read(&sb->s_umount);
> > > @@ -310,10 +310,18 @@ int ovl_syncfs(struct file *file)
> > > ret = sync_filesystem(upper_sb);
> > > up_read(&upper_sb->s_umount);
> > >
> > > + /* Update overlay sb->s_wb_err */
> > > + if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) {
> > > + /* Upper sb has errors since last time */
> > > + spin_lock(&ofs->errseq_lock);
> > > + errseq_check_and_advance(&upper_sb->s_wb_err, &sb->s_wb_err);
> > > + spin_unlock(&ofs->errseq_lock);
> > > + }
> >
> > So, the problem here is that the resulting value in sb->s_wb_err is
> > going to end up with the REPORTED flag set (using the naming in my
> > latest set). So, a later opener of a file on sb->s_wb_err won't see it.
> >
> > For instance, suppose you call sync() on the box and does the above
> > check and advance. Then, you open the file and call syncfs() and get
> > back no error because REPORTED flag was set when you opened. That error
> > will then be lost.
>
> Hi Jeff,
>
> In this patch, I am doing this only in ->syncfs() path and not in
> ->sync_fs() path. IOW, errseq_check_and_advance() will take place
> only if there is a valid "struct file" passed in. That means there
> is a consumer of the error and that means it should be fine to
> set the sb->s_wb_err as SEEN/REPORTED, right?
>
> If we end up plumbming "struct file" in existing ->sync_fs() routine,
> then I will call this only if a non NULL struct file has been
> passed in. Otherwise skip this step.
>
> IOW, sync() call will not result in errseq_check_and_advance() instead
> a syncfs() call will.
>

It still seems odd and I'm not sure you won't end up with weird corner
cases due to the flag handling. If you're doing this in the new
f_op->syncfs, then why bother with sb->s_wb_err at all? You can just do
this, and avoid the overlayfs sb altogether:

if (errseq_check(&upper_sb->s_wb_err, file->f_sb_err)) {
/* Upper sb has errors since last time */
spin_lock(&file->f_lock);
errseq_check_and_advance(&upper_sb->s_wb_err, &file->f_sb_err);
spin_unlock(&file->f_lock);
}

That's simpler than trying to propagate the error between two
errseq_t's. You would need to sample the upper_sb->s_wb_err at
open time in the overlayfs ->open handler though, to make sure
you're tracking the right one.

> >
> > >
> > > + ret2 = errseq_check_and_advance(&sb->s_wb_err, &file->f_sb_err);
> > > out:
> > > up_read(&sb->s_umount);
> > > - return ret;
> > > + return ret ? ret : ret2;
> > > }
> > >
> > > /**
> > > @@ -1903,6 +1911,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> > > if (!cred)
> > > goto out_err;
> > >
> > > + spin_lock_init(&ofs->errseq_lock);
> > > /* Is there a reason anyone would want not to share whiteouts? */
> > > ofs->share_whiteout = true;
> > >
> > > @@ -1975,7 +1984,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> > >
> > > sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
> > > sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> > > -
> > > + sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err);
> >
> > This will mark the error on the upper_sb as REPORTED, and that's not
> > really that's the case if you're just using it set s_wb_err in the
> > overlay. You might want to use errseq_peek in this situation.
>
> For now I am still looking at existing code and not new code. Because
> I belive that new code does not change existing behavior instead
> provides additional functionality to allow sampling the error without
> marking it seen as well as provide helper to not force seeing an
> unseen error.
>
> So current errseq_sample() does not mark error SEEN. And if it is
> an unseen error, we will get 0 and be forced to see the error next
> time.
>
> One small issue with this is that say upper has unseen error. Now
> we mount overlay and save that value in sb->s_wb_err (unseen). Say
> a file is opened on upper and error is now seen on upper. But
> we still have unseen error cached in overlay and if overlay fd is
> now opened, f->f_sb_err will be 0 and it will be forced to see
> err on next syncfs().
>
> IOW, despite the fact that overlay fd was opened after upper sb had
> been marked seen, it still will see error. I think it probably is
> not a big issue.
>

Good point. I was thinking about the newer code that may mark it
OBSERVED when you sample at open time.

Still, I think working with the overlayfs sb->s_wb_err is just adding
complexity for little benefit. Assuming that writeback errors can only
happen on the upper layer, you're better off avoiding it.

>
> >
> > > }
> > > oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers);
> > > err = PTR_ERR(oe);
> > > --
> > > 2.25.4
> > >
> >
>

2020-12-18 16:33:45

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 3/3] overlayfs: Check writeback errors w.r.t upper in ->syncfs()

On Fri, Dec 18, 2020 at 10:02:58AM -0500, Jeff Layton wrote:
> On Fri, Dec 18, 2020 at 09:44:18AM -0500, Vivek Goyal wrote:
> > On Thu, Dec 17, 2020 at 03:08:56PM -0500, Jeffrey Layton wrote:
> > > On Wed, Dec 16, 2020 at 06:31:49PM -0500, Vivek Goyal wrote:
> > > > Check for writeback error on overlay super block w.r.t "struct file"
> > > > passed in ->syncfs().
> > > >
> > > > As of now real error happens on upper sb. So this patch first propagates
> > > > error from upper sb to overlay sb and then checks error w.r.t struct
> > > > file passed in.
> > > >
> > > > Jeff, I know you prefer that I should rather file upper file and check
> > > > error directly on on upper sb w.r.t this real upper file. While I was
> > > > implementing that I thought what if file is on lower (and has not been
> > > > copied up yet). In that case shall we not check writeback errors and
> > > > return back to user space? That does not sound right though because,
> > > > we are not checking for writeback errors on this file. Rather we
> > > > are checking for any error on superblock. Upper might have an error
> > > > and we should report it to user even if file in question is a lower
> > > > file. And that's why I fell back to this approach. But I am open to
> > > > change it if there are issues in this method.
> > > >
> > > > Signed-off-by: Vivek Goyal <[email protected]>
> > > > ---
> > > > fs/overlayfs/ovl_entry.h | 2 ++
> > > > fs/overlayfs/super.c | 15 ++++++++++++---
> > > > 2 files changed, 14 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> > > > index 1b5a2094df8e..a08fd719ee7b 100644
> > > > --- a/fs/overlayfs/ovl_entry.h
> > > > +++ b/fs/overlayfs/ovl_entry.h
> > > > @@ -79,6 +79,8 @@ struct ovl_fs {
> > > > atomic_long_t last_ino;
> > > > /* Whiteout dentry cache */
> > > > struct dentry *whiteout;
> > > > + /* Protects multiple sb->s_wb_err update from upper_sb . */
> > > > + spinlock_t errseq_lock;
> > > > };
> > > >
> > > > static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
> > > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > > > index b4d92e6fa5ce..e7bc4492205e 100644
> > > > --- a/fs/overlayfs/super.c
> > > > +++ b/fs/overlayfs/super.c
> > > > @@ -291,7 +291,7 @@ int ovl_syncfs(struct file *file)
> > > > struct super_block *sb = file->f_path.dentry->d_sb;
> > > > struct ovl_fs *ofs = sb->s_fs_info;
> > > > struct super_block *upper_sb;
> > > > - int ret;
> > > > + int ret, ret2;
> > > >
> > > > ret = 0;
> > > > down_read(&sb->s_umount);
> > > > @@ -310,10 +310,18 @@ int ovl_syncfs(struct file *file)
> > > > ret = sync_filesystem(upper_sb);
> > > > up_read(&upper_sb->s_umount);
> > > >
> > > > + /* Update overlay sb->s_wb_err */
> > > > + if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) {
> > > > + /* Upper sb has errors since last time */
> > > > + spin_lock(&ofs->errseq_lock);
> > > > + errseq_check_and_advance(&upper_sb->s_wb_err, &sb->s_wb_err);
> > > > + spin_unlock(&ofs->errseq_lock);
> > > > + }
> > >
> > > So, the problem here is that the resulting value in sb->s_wb_err is
> > > going to end up with the REPORTED flag set (using the naming in my
> > > latest set). So, a later opener of a file on sb->s_wb_err won't see it.
> > >
> > > For instance, suppose you call sync() on the box and does the above
> > > check and advance. Then, you open the file and call syncfs() and get
> > > back no error because REPORTED flag was set when you opened. That error
> > > will then be lost.
> >
> > Hi Jeff,
> >
> > In this patch, I am doing this only in ->syncfs() path and not in
> > ->sync_fs() path. IOW, errseq_check_and_advance() will take place
> > only if there is a valid "struct file" passed in. That means there
> > is a consumer of the error and that means it should be fine to
> > set the sb->s_wb_err as SEEN/REPORTED, right?
> >
> > If we end up plumbming "struct file" in existing ->sync_fs() routine,
> > then I will call this only if a non NULL struct file has been
> > passed in. Otherwise skip this step.
> >
> > IOW, sync() call will not result in errseq_check_and_advance() instead
> > a syncfs() call will.
> >
>
> It still seems odd and I'm not sure you won't end up with weird corner
> cases due to the flag handling. If you're doing this in the new
> f_op->syncfs, then why bother with sb->s_wb_err at all? You can just do
> this, and avoid the overlayfs sb altogether:
>
> if (errseq_check(&upper_sb->s_wb_err, file->f_sb_err)) {
> /* Upper sb has errors since last time */
> spin_lock(&file->f_lock);
> errseq_check_and_advance(&upper_sb->s_wb_err, &file->f_sb_err);
> spin_unlock(&file->f_lock);
> }
>
> That's simpler than trying to propagate the error between two
> errseq_t's. You would need to sample the upper_sb->s_wb_err at
> open time in the overlayfs ->open handler though, to make sure
> you're tracking the right one.

IIUC, you are suggesting that when and overlay file is opened (lower or
upper), always install current upper_sb->s_wb_err in f->f_sb_err.
IOW, overide following VFS operations.

f->f_sb_err = file_sample_sb_err(f);

In ovl_open() and ovl_dir_open() with something like.

f->f_sb_err = errseq_sample(upper_sb->s_wb_err);

And then ->sync_fs() or ->syncfs(), can check for new errors w.r.t upper
sb?

if (errseq_check(&upper_sb->s_wb_err, file->f_sb_err)) {
/* Upper sb has errors since last time */
spin_lock(&file->f_lock);
ret = errseq_check_and_advance(&upper_sb->s_wb_err, &file->f_sb_err);
spin_unlock(&file->f_lock);
}

I guess I can try this. But if we don't update ovl_sb->s_wb_err, then
question remains that how to avoid errseq_check_and_advance() call
in SYSCALL(sycnfs). That will do more bad things in this case.

This will lead back to either creating new f_op->syncfs() where fs
is responsible for writeback error checks (and not vfs). Or plumb
"struct file" in exisitng ->sync_fs() and let filesystems do
error checks (instead of VFS). This will be somewhat similar to your old
proposal here.

https://lore.kernel.org/linux-fsdevel/[email protected]/

So advantage of updating ovl_sb->s_wb_err is that it reduces the
churn needed in ->sync_fs() and moving errseq_check_and_advance()
check out of vfs syncfs().

>
> > >
> > > >
> > > > + ret2 = errseq_check_and_advance(&sb->s_wb_err, &file->f_sb_err);
> > > > out:
> > > > up_read(&sb->s_umount);
> > > > - return ret;
> > > > + return ret ? ret : ret2;
> > > > }
> > > >
> > > > /**
> > > > @@ -1903,6 +1911,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> > > > if (!cred)
> > > > goto out_err;
> > > >
> > > > + spin_lock_init(&ofs->errseq_lock);
> > > > /* Is there a reason anyone would want not to share whiteouts? */
> > > > ofs->share_whiteout = true;
> > > >
> > > > @@ -1975,7 +1984,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> > > >
> > > > sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
> > > > sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> > > > -
> > > > + sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err);
> > >
> > > This will mark the error on the upper_sb as REPORTED, and that's not
> > > really that's the case if you're just using it set s_wb_err in the
> > > overlay. You might want to use errseq_peek in this situation.
> >
> > For now I am still looking at existing code and not new code. Because
> > I belive that new code does not change existing behavior instead
> > provides additional functionality to allow sampling the error without
> > marking it seen as well as provide helper to not force seeing an
> > unseen error.
> >
> > So current errseq_sample() does not mark error SEEN. And if it is
> > an unseen error, we will get 0 and be forced to see the error next
> > time.
> >
> > One small issue with this is that say upper has unseen error. Now
> > we mount overlay and save that value in sb->s_wb_err (unseen). Say
> > a file is opened on upper and error is now seen on upper. But
> > we still have unseen error cached in overlay and if overlay fd is
> > now opened, f->f_sb_err will be 0 and it will be forced to see
> > err on next syncfs().
> >
> > IOW, despite the fact that overlay fd was opened after upper sb had
> > been marked seen, it still will see error. I think it probably is
> > not a big issue.
> >
>
> Good point. I was thinking about the newer code that may mark it
> OBSERVED when you sample at open time.
>
> Still, I think working with the overlayfs sb->s_wb_err is just adding
> complexity for little benefit. Assuming that writeback errors can only
> happen on the upper layer, you're better off avoiding it.

If I want to avoid ovl_sb->s_wb_err updation, I will have to move
ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
check in individual filesystems. And it will still not be same. Because
currently after ->sync_fs() call, __sync_blockdev() is called and
then we check for writeback errors. That means, I will have to
move __sync_blockdev() also inside ->sync_fs().

Something like.

fs_sync_fs()
{
ret = do_fs_specific_sync_stuff();
ret2 = __sync_blockdev();
ret3 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
if (ret) {
return ret;
else
return ret2 ? ret2 : ret3;
}

Does not look pretty.

Vivek

2020-12-18 16:59:03

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 3/3] overlayfs: Check writeback errors w.r.t upper in ->syncfs()

On Fri, Dec 18, 2020 at 11:28:19AM -0500, Vivek Goyal wrote:
> On Fri, Dec 18, 2020 at 10:02:58AM -0500, Jeff Layton wrote:
> > On Fri, Dec 18, 2020 at 09:44:18AM -0500, Vivek Goyal wrote:
> > > On Thu, Dec 17, 2020 at 03:08:56PM -0500, Jeffrey Layton wrote:
> > > > On Wed, Dec 16, 2020 at 06:31:49PM -0500, Vivek Goyal wrote:
> > > > > Check for writeback error on overlay super block w.r.t "struct file"
> > > > > passed in ->syncfs().
> > > > >
> > > > > As of now real error happens on upper sb. So this patch first propagates
> > > > > error from upper sb to overlay sb and then checks error w.r.t struct
> > > > > file passed in.
> > > > >
> > > > > Jeff, I know you prefer that I should rather file upper file and check
> > > > > error directly on on upper sb w.r.t this real upper file. While I was
> > > > > implementing that I thought what if file is on lower (and has not been
> > > > > copied up yet). In that case shall we not check writeback errors and
> > > > > return back to user space? That does not sound right though because,
> > > > > we are not checking for writeback errors on this file. Rather we
> > > > > are checking for any error on superblock. Upper might have an error
> > > > > and we should report it to user even if file in question is a lower
> > > > > file. And that's why I fell back to this approach. But I am open to
> > > > > change it if there are issues in this method.
> > > > >
> > > > > Signed-off-by: Vivek Goyal <[email protected]>
> > > > > ---
> > > > > fs/overlayfs/ovl_entry.h | 2 ++
> > > > > fs/overlayfs/super.c | 15 ++++++++++++---
> > > > > 2 files changed, 14 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> > > > > index 1b5a2094df8e..a08fd719ee7b 100644
> > > > > --- a/fs/overlayfs/ovl_entry.h
> > > > > +++ b/fs/overlayfs/ovl_entry.h
> > > > > @@ -79,6 +79,8 @@ struct ovl_fs {
> > > > > atomic_long_t last_ino;
> > > > > /* Whiteout dentry cache */
> > > > > struct dentry *whiteout;
> > > > > + /* Protects multiple sb->s_wb_err update from upper_sb . */
> > > > > + spinlock_t errseq_lock;
> > > > > };
> > > > >
> > > > > static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
> > > > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > > > > index b4d92e6fa5ce..e7bc4492205e 100644
> > > > > --- a/fs/overlayfs/super.c
> > > > > +++ b/fs/overlayfs/super.c
> > > > > @@ -291,7 +291,7 @@ int ovl_syncfs(struct file *file)
> > > > > struct super_block *sb = file->f_path.dentry->d_sb;
> > > > > struct ovl_fs *ofs = sb->s_fs_info;
> > > > > struct super_block *upper_sb;
> > > > > - int ret;
> > > > > + int ret, ret2;
> > > > >
> > > > > ret = 0;
> > > > > down_read(&sb->s_umount);
> > > > > @@ -310,10 +310,18 @@ int ovl_syncfs(struct file *file)
> > > > > ret = sync_filesystem(upper_sb);
> > > > > up_read(&upper_sb->s_umount);
> > > > >
> > > > > + /* Update overlay sb->s_wb_err */
> > > > > + if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) {
> > > > > + /* Upper sb has errors since last time */
> > > > > + spin_lock(&ofs->errseq_lock);
> > > > > + errseq_check_and_advance(&upper_sb->s_wb_err, &sb->s_wb_err);
> > > > > + spin_unlock(&ofs->errseq_lock);
> > > > > + }
> > > >
> > > > So, the problem here is that the resulting value in sb->s_wb_err is
> > > > going to end up with the REPORTED flag set (using the naming in my
> > > > latest set). So, a later opener of a file on sb->s_wb_err won't see it.
> > > >
> > > > For instance, suppose you call sync() on the box and does the above
> > > > check and advance. Then, you open the file and call syncfs() and get
> > > > back no error because REPORTED flag was set when you opened. That error
> > > > will then be lost.
> > >
> > > Hi Jeff,
> > >
> > > In this patch, I am doing this only in ->syncfs() path and not in
> > > ->sync_fs() path. IOW, errseq_check_and_advance() will take place
> > > only if there is a valid "struct file" passed in. That means there
> > > is a consumer of the error and that means it should be fine to
> > > set the sb->s_wb_err as SEEN/REPORTED, right?
> > >
> > > If we end up plumbming "struct file" in existing ->sync_fs() routine,
> > > then I will call this only if a non NULL struct file has been
> > > passed in. Otherwise skip this step.
> > >
> > > IOW, sync() call will not result in errseq_check_and_advance() instead
> > > a syncfs() call will.
> > >
> >
> > It still seems odd and I'm not sure you won't end up with weird corner
> > cases due to the flag handling. If you're doing this in the new
> > f_op->syncfs, then why bother with sb->s_wb_err at all? You can just do
> > this, and avoid the overlayfs sb altogether:
> >
> > if (errseq_check(&upper_sb->s_wb_err, file->f_sb_err)) {
> > /* Upper sb has errors since last time */
> > spin_lock(&file->f_lock);
> > errseq_check_and_advance(&upper_sb->s_wb_err, &file->f_sb_err);
> > spin_unlock(&file->f_lock);
> > }
> >
> > That's simpler than trying to propagate the error between two
> > errseq_t's. You would need to sample the upper_sb->s_wb_err at
> > open time in the overlayfs ->open handler though, to make sure
> > you're tracking the right one.
>
> IIUC, you are suggesting that when and overlay file is opened (lower or
> upper), always install current upper_sb->s_wb_err in f->f_sb_err.
> IOW, overide following VFS operations.
>
> f->f_sb_err = file_sample_sb_err(f);
>
> In ovl_open() and ovl_dir_open() with something like.
>
> f->f_sb_err = errseq_sample(upper_sb->s_wb_err);
>
> And then ->sync_fs() or ->syncfs(), can check for new errors w.r.t upper
> sb?
>
> if (errseq_check(&upper_sb->s_wb_err, file->f_sb_err)) {
> /* Upper sb has errors since last time */
> spin_lock(&file->f_lock);
> ret = errseq_check_and_advance(&upper_sb->s_wb_err, &file->f_sb_err);
> spin_unlock(&file->f_lock);
> }
>
> I guess I can try this. But if we don't update ovl_sb->s_wb_err, then
> question remains that how to avoid errseq_check_and_advance() call
> in SYSCALL(sycnfs). That will do more bad things in this case.
>
> This will lead back to either creating new f_op->syncfs() where fs
> is responsible for writeback error checks (and not vfs). Or plumb
> "struct file" in exisitng ->sync_fs() and let filesystems do
> error checks (instead of VFS). This will be somewhat similar to your old
> proposal here.
>
> https://lore.kernel.org/linux-fsdevel/[email protected]/
>
> So advantage of updating ovl_sb->s_wb_err is that it reduces the
> churn needed in ->sync_fs() and moving errseq_check_and_advance()
> check out of vfs syncfs().
>

Correct.

The patch we're discussing here _does_ add a f_op->syncfs, which is why
I was suggesting to do it that way.

> >
> > > >
> > > > >
> > > > > + ret2 = errseq_check_and_advance(&sb->s_wb_err, &file->f_sb_err);
> > > > > out:
> > > > > up_read(&sb->s_umount);
> > > > > - return ret;
> > > > > + return ret ? ret : ret2;
> > > > > }
> > > > >
> > > > > /**
> > > > > @@ -1903,6 +1911,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> > > > > if (!cred)
> > > > > goto out_err;
> > > > >
> > > > > + spin_lock_init(&ofs->errseq_lock);
> > > > > /* Is there a reason anyone would want not to share whiteouts? */
> > > > > ofs->share_whiteout = true;
> > > > >
> > > > > @@ -1975,7 +1984,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> > > > >
> > > > > sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
> > > > > sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> > > > > -
> > > > > + sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err);
> > > >
> > > > This will mark the error on the upper_sb as REPORTED, and that's not
> > > > really that's the case if you're just using it set s_wb_err in the
> > > > overlay. You might want to use errseq_peek in this situation.
> > >
> > > For now I am still looking at existing code and not new code. Because
> > > I belive that new code does not change existing behavior instead
> > > provides additional functionality to allow sampling the error without
> > > marking it seen as well as provide helper to not force seeing an
> > > unseen error.
> > >
> > > So current errseq_sample() does not mark error SEEN. And if it is
> > > an unseen error, we will get 0 and be forced to see the error next
> > > time.
> > >
> > > One small issue with this is that say upper has unseen error. Now
> > > we mount overlay and save that value in sb->s_wb_err (unseen). Say
> > > a file is opened on upper and error is now seen on upper. But
> > > we still have unseen error cached in overlay and if overlay fd is
> > > now opened, f->f_sb_err will be 0 and it will be forced to see
> > > err on next syncfs().
> > >
> > > IOW, despite the fact that overlay fd was opened after upper sb had
> > > been marked seen, it still will see error. I think it probably is
> > > not a big issue.
> > >
> >
> > Good point. I was thinking about the newer code that may mark it
> > OBSERVED when you sample at open time.
> >
> > Still, I think working with the overlayfs sb->s_wb_err is just adding
> > complexity for little benefit. Assuming that writeback errors can only
> > happen on the upper layer, you're better off avoiding it.
>
> If I want to avoid ovl_sb->s_wb_err updation, I will have to move
> ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
> check in individual filesystems. And it will still not be same. Because
> currently after ->sync_fs() call, __sync_blockdev() is called and
> then we check for writeback errors. That means, I will have to
> move __sync_blockdev() also inside ->sync_fs().
>
> Something like.
>
> fs_sync_fs()
> {
> ret = do_fs_specific_sync_stuff();
> ret2 = __sync_blockdev();
> ret3 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
> if (ret) {
> return ret;
> else
> return ret2 ? ret2 : ret3;
> }
>
> Does not look pretty.
>


If adding a new f_op approach isn't acceptable, then this is quite a bit
more difficult, and you'll need to plumb the error through from
->sync_fs to the syncfs syscall wrapper somehow.

That's the main reason I'm advocating for a new f_op. It's a lot more
straightforward, and I think it'll be less error-prone over the long
haul.

-- Jeff

2020-12-18 20:56:00

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 3/3] overlayfs: Check writeback errors w.r.t upper in ->syncfs()

On Fri, Dec 18 2020, Jeffrey Layton wrote:
>
> The patch we're discussing here _does_ add a f_op->syncfs, which is why
> I was suggesting to do it that way.

I haven't thought through the issues to decide what I think of adding a
new op, but I already know what I think of adding ->syncfs. Don't Do
It. The name is much too easily confused with ->sync_fs.

If you call it ->sync_fs_return_error() it would be MUCH better.

And having said that, the solution becomes obvious. Add a new flag,
either as another bit in 'int wait', or as a new bool.
The new flag would be "return_error" - or whatever is appropriate.

NeilBrown


Attachments:
signature.asc (869.00 B)