From: Chengguang Xu <[email protected]>
Currently, the syncfs(2) syscall on overlayfs just calls sync_filesystem()
on upper_sb, which syncs all dirty inodes in the upper filesystem
regardless of which overlay instance owns them. In the container use
case, where multiple containers share the same underlying upper
filesystem, this has the following shortcomings.
(1) Performance
Synchronization can be heavy because it also syncs inodes that do not
belong to the target overlayfs.
(2) Interference
Unplanned synchronization can hurt the IO performance of unrelated
container processes on other overlayfs instances.
This series implements containerized syncfs for overlayfs so that only
the dirty upper inodes belonging to the specific overlayfs instance are
synced. This reduces the cost of synchronization and avoids seriously
impacting the IO performance of unrelated processes.
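For readers unfamiliar with the entry point, the sketch below shows how a container process would typically reach overlayfs' ->sync_fs from user space: it opens any path on the overlay mount and calls syncfs(2) on the descriptor. The helper name sync_fs_of_path() is made up for illustration and is not part of this series.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of this series): flush the filesystem
 * that contains `path`, e.g. the merged directory of an overlay mount.
 * Returns 0 on success, -1 on failure with errno set.
 */
int sync_fs_of_path(const char *path)
{
	int fd, ret, saved_errno;

	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	/* syncfs(2) syncs only the filesystem that fd refers to. */
	ret = syncfs(fd);

	/* close() may overwrite errno, so preserve the syncfs() result. */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;

	return ret;
}
```

Calling it on the merged directory of one container should, with this series applied, write back only the upper inodes that belong to that overlay instance.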
v1->v2:
- Mark overlayfs' inode dirty itself instead of adding notification
mechanism to vfs inode.
v2->v3:
- Introduce an extra overlayfs syncfs wait list to wait on the target
upper inodes in ->sync_fs.
v3->v4:
- Use wait_sb_inodes() to wait for upper inodes that are being synced.
- Mark the overlay inode dirty only when it has an upper inode and the
VM_SHARED flag is set in ovl_mmap().
- Check the upper i_state after checking the upper mmap state
in ovl_write_inode().
v4->v5:
- Add underlying inode dirtiness check after mnt_drop_write().
- Handle both wait/no-wait mode of syncfs(2) in overlayfs' ->sync_fs().
v5->v6:
- Rebase to latest overlayfs-next tree.
- Mark the overlay inode dirty when it has an upper inode instead of
marking it dirty on modification.
- Trigger dirty page writeback in overlayfs' ->write_inode().
- Set the 'DONTCACHE' flag on overlay inodes.
- Delete overlayfs' ->writepages() and ->evict_inode() operations.
Chengguang Xu (7):
ovl: setup overlayfs' private bdi
ovl: mark overlayfs inode dirty when it has upper
ovl: implement overlayfs' own ->write_inode operation
ovl: set 'DONTCACHE' flag for overlayfs inode
fs: export wait_sb_inodes()
ovl: introduce ovl_sync_upper_blockdev()
ovl: implement containerized syncfs for overlayfs
fs/fs-writeback.c | 3 ++-
fs/overlayfs/inode.c | 5 +++-
fs/overlayfs/super.c | 49 ++++++++++++++++++++++++++++++++-------
fs/overlayfs/util.c | 1 +
include/linux/writeback.h | 1 +
5 files changed, 48 insertions(+), 11 deletions(-)
--
2.27.0
From: Chengguang Xu <[email protected]>
In order to wait for upper inodes that are being synced, we need to
call wait_sb_inodes() in overlayfs' ->sync_fs, so export it.
Signed-off-by: Chengguang Xu <[email protected]>
---
fs/fs-writeback.c | 3 ++-
include/linux/writeback.h | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 81ec192ce067..0438c911241e 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2505,7 +2505,7 @@ EXPORT_SYMBOL(__mark_inode_dirty);
* completed by the time we have gained the lock and waited for all IO that is
* in progress regardless of the order callers are granted the lock.
*/
-static void wait_sb_inodes(struct super_block *sb)
+void wait_sb_inodes(struct super_block *sb)
{
LIST_HEAD(sync_list);
@@ -2589,6 +2589,7 @@ static void wait_sb_inodes(struct super_block *sb)
rcu_read_unlock();
mutex_unlock(&sb->s_sync_lock);
}
+EXPORT_SYMBOL(wait_sb_inodes);
static void __writeback_inodes_sb_nr(struct super_block *sb, unsigned long nr,
enum wb_reason reason, bool skip_if_busy)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index d1f65adf6a26..d7aacd0434cf 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -198,6 +198,7 @@ void wakeup_flusher_threads_bdi(struct backing_dev_info *bdi,
enum wb_reason reason);
void inode_wait_for_writeback(struct inode *inode);
void inode_io_list_del(struct inode *inode);
+void wait_sb_inodes(struct super_block *sb);
/* writeback.h requires fs.h; it, too, is not included from here. */
static inline void wait_on_inode(struct inode *inode)
--
2.27.0
On Mon 22-11-21 11:00:36, Chengguang Xu wrote:
> From: Chengguang Xu <[email protected]>
>
> In order to wait for upper inodes that are being synced, we need to
> call wait_sb_inodes() in overlayfs' ->sync_fs, so export it.
>
> Signed-off-by: Chengguang Xu <[email protected]>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
---- On Monday, 2021-11-22 11:00:31, Chengguang Xu <[email protected]> wrote ----
> From: Chengguang Xu <[email protected]>
>
> Currently, the syncfs(2) syscall on overlayfs just calls sync_filesystem()
> on upper_sb, which syncs all dirty inodes in the upper filesystem
> regardless of which overlay instance owns them. In the container use
> case, where multiple containers share the same underlying upper
> filesystem, this has the following shortcomings.
>
> (1) Performance
> Synchronization can be heavy because it also syncs inodes that do not
> belong to the target overlayfs.
>
> (2) Interference
> Unplanned synchronization can hurt the IO performance of unrelated
> container processes on other overlayfs instances.
>
> This series implements containerized syncfs for overlayfs so that only
> the dirty upper inodes belonging to the specific overlayfs instance are
> synced. This reduces the cost of synchronization and avoids seriously
> impacting the IO performance of unrelated processes.
>
> v1->v2:
> - Mark overlayfs' inode dirty itself instead of adding notification
> mechanism to vfs inode.
>
> v2->v3:
> - Introduce an extra overlayfs syncfs wait list to wait on the target
> upper inodes in ->sync_fs.
>
> v3->v4:
> - Use wait_sb_inodes() to wait for upper inodes that are being synced.
> - Mark the overlay inode dirty only when it has an upper inode and the
> VM_SHARED flag is set in ovl_mmap().
> - Check the upper i_state after checking the upper mmap state
> in ovl_write_inode().
>
> v4->v5:
> - Add underlying inode dirtiness check after mnt_drop_write().
> - Handle both wait/no-wait mode of syncfs(2) in overlayfs' ->sync_fs().
>
> v5->v6:
> - Rebase to latest overlayfs-next tree.
> - Mark the overlay inode dirty when it has an upper inode instead of
> marking it dirty on modification.
> - Trigger dirty page writeback in overlayfs' ->write_inode().
> - Set the 'DONTCACHE' flag on overlay inodes.
> - Delete overlayfs' ->writepages() and ->evict_inode() operations.
Hi Miklos,
Have you had time to look at this v6 series? I think this version addresses
the issues raised in the earlier review feedback, and it passes fstests
(generic/overlay cases).
I also ran long-running stress tests (tar & syncfs & diff, with and without
copy-up) and found no obvious problems.
For syncfs with 1M clean upper inodes, an extra 1.3s was spent waiting on
scheduling.
I expect this 1.3s will not have a significant impact on container instances
in most cases. I also agree with Jack that we can start with this approach and
make improvements afterwards if any real users complain.
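For anyone who wants to reproduce a syncfs latency number on their own setup, one possible user-space sketch is below. It is not the harness used for the 1.3s figure above; the helper name time_syncfs() is made up for illustration, and it simply times a single syncfs(2) call with CLOCK_MONOTONIC.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the test used in this thread): time one
 * syncfs(2) call on the filesystem containing `path`.
 * Returns elapsed seconds, or -1.0 on failure with errno set.
 */
double time_syncfs(const char *path)
{
	struct timespec start, end;
	int fd, ret, saved_errno;

	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	ret = syncfs(fd);
	saved_errno = errno;
	clock_gettime(CLOCK_MONOTONIC, &end);

	close(fd);
	if (ret < 0) {
		errno = saved_errno;
		return -1.0;
	}

	/* Combine the seconds and nanoseconds deltas into seconds. */
	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Running it against the merged directory before and after applying the series, with a large number of clean upper inodes present, would give a rough comparison of the extra wait time.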
Thanks,
Chengguang