2022-03-22 01:39:52

by David Sterba

[permalink] [raw]
Subject: [GIT PULL] Btrfs updates for 5.18

Hi,

this update contains feature updates, performance improvements,
preparatory and core work and some related VFS updates. Please pull,
thanks.

Features:

- encoded read/write ioctls, allows user space to read or write raw data
directly to extents (now compressed, encrypted in the future), will be
used by send/receive v2 where it saves processing time

- zoned mode now works with metadata DUP (the mkfs.btrfs default)

- error message header updates:
- print error state: transaction abort, other error, log tree errors
- print transient filesystem state: remount, device replace, ignored
checksum verifications

- tree-checker: verify the transaction id of the to-be-written dirty
extent buffer

Performance improvements:

- fsync speedups
- directory logging speedups (up to -90% run time)
- avoid logging all directory changes during renames (up to -60% run
time)
- avoid inode logging during rename and link when possible (up to -60%
run time)
- prepare extents to be logged before locking a log tree path
(throughput +7%)
- stop copying old file extents when doing a full fsync ()
- improved logging of old extents after truncate

Core, fixes:

- improved stale device identification by dev_t and not just path (for
devices that are behind other layers like device mapper)

- continued extent tree v2 preparatory work
- disable features that won't work yet
- add wrappers and abstractions for new tree roots

- improved error handling

- add super block write annotations around background block group
reclaim

- fix device scanning messages potentially accessing stale pointer

- cleanups and refactoring

VFS:

- allow reflinks/deduplication from two different mounts of the same
filesystem

- export and add helpers for read/write range verification, for the
encoded ioctls

----------------------------------------------------------------
The following changes since commit 09688c0166e76ce2fb85e86b9d99be8b0084cdf9:

Linux 5.17-rc8 (2022-03-13 13:23:37 -0700)

are available in the Git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.18-tag

for you to fetch changes up to d3e29967079c522ce1c5cab0e9fab2c280b977eb:

btrfs: zoned: put block group after final usage (2022-03-14 13:13:54 +0100)

----------------------------------------------------------------
Anand Jain (6):
btrfs: simplify fs_devices member access in btrfs_init_dev_replace_tgtdev
btrfs: harden identification of a stale device
btrfs: match stale devices by dev_t
btrfs: add device major-minor info in the struct btrfs_device
btrfs: use dev_t to match device in device_matched
btrfs: cleanup temporary variables when finding rotational device status

David Sterba (1):
btrfs: replace BUILD_BUG_ON by static_assert

Dongliang Mu (1):
btrfs: don't access possibly stale fs_info data in device_list_add

Dāvis Mosāns (1):
btrfs: add lzo workspace buffer length constants

Filipe Manana (28):
btrfs: remove write and wait of struct walk_control
btrfs: don't log unnecessary boundary keys when logging directory
btrfs: put initial index value of a directory in a constant
btrfs: stop copying old dir items when logging a directory
btrfs: stop trying to log subdirectories created in past transactions
btrfs: add helper to delete a dir entry from a log tree
btrfs: pass the dentry to btrfs_log_new_name() instead of the inode
btrfs: avoid logging all directory changes during renames
btrfs: stop doing unnecessary log updates during a rename
btrfs: avoid inode logging during rename and link when possible
btrfs: use single variable to track return value at btrfs_log_inode()
btrfs: remove unnecessary leaf free space checks when pushing items
btrfs: avoid unnecessary COW of leaves when deleting items from a leaf
btrfs: avoid unnecessary computation when deleting items from a leaf
btrfs: remove constraint on number of visited leaves when replacing extents
btrfs: remove useless path release in the fast fsync path
btrfs: prepare extents to be logged before locking a log tree path
btrfs: stop checking for NULL return from btrfs_get_extent()
btrfs: fix lost error return value when reading a data page
btrfs: remove no longer used counter when reading data page
btrfs: assert we have a write lock when removing and replacing extent maps
btrfs: stop copying old file extents when doing a full fsync
btrfs: hold on to less memory when logging checksums during full fsync
btrfs: voluntarily relinquish cpu when doing a full fsync
btrfs: reset last_reflink_trans after fsyncing inode
btrfs: fix unexpected error path when reflinking an inline extent
btrfs: deal with unexpected extent type during reflinking
btrfs: add and use helper for unlinking inode during log replay

Jiapeng Chong (2):
btrfs: zoned: remove redundant initialization of to_add
btrfs: scrub: remove redundant initialization of increment

Johannes Thumshirn (5):
btrfs: zoned: make zone activation multi stripe capable
btrfs: zoned: make zone finishing multi stripe capable
btrfs: zoned: prepare for allowing DUP on zoned
btrfs: zoned: allow DUP on meta-data block groups
btrfs: stop checking for NULL return from btrfs_get_extent_fiemap()

Josef Bacik (27):
btrfs: add definition for EXTENT_TREE_V2
btrfs: disable balance for extent tree v2 for now
btrfs: disable device manipulation ioctl's EXTENT_TREE_V2
btrfs: disable qgroups in extent tree v2
btrfs: disable scrub for extent-tree-v2
btrfs: disable snapshot creation/deletion for extent tree v2
btrfs: disable space cache related mount options for extent tree v2
btrfs: tree-checker: don't fail on empty extent roots for extent tree v2
btrfs: abstract out loading the tree root
btrfs: add code to support the block group root
btrfs: add support for multiple global roots
btrfs: make search_csum_tree return 0 if we get -EFBIG
btrfs: handle csum lookup errors properly on reads
btrfs: check correct bio in finish_compressed_bio_read
btrfs: remove the bio argument from finish_compressed_bio_read
btrfs: track compressed bio errors as blk_status_t
btrfs: do not double complete bio on errors during compressed reads
btrfs: do not try to repair bio that has no mirror set
btrfs: do not clean up repair bio if submit fails
btrfs: pass btrfs_fs_info for deleting snapshots and cleaner
btrfs: pass btrfs_fs_info to btrfs_recover_relocation
btrfs: remove the cross file system checks from remap
fs: allow cross-vfsmount reflink/dedupe
btrfs: remove BUG_ON(ret) in alloc_reserved_tree_block
btrfs: add a alloc_reserved_extent helper
btrfs: remove last_ref from the extent freeing code
btrfs: factor out do_free_extent_accounting helper

Minghao Chi (1):
btrfs: send: remove redundant ret variable in fs_path_copy

Naohiro Aota (1):
btrfs: zoned: mark relocation as writing

Niels Dossche (2):
btrfs: extend locking to all space_info members accesses
btrfs: add lockdep_assert_held to need_preemptive_reclaim

Nikolay Borisov (3):
btrfs: move missing device handling in a dedicate function
btrfs: move QUOTA_ENABLED check to rescan_should_stop from btrfs_qgroup_rescan_worker
btrfs: zoned: put block group after final usage

Omar Sandoval (10):
fs: export rw_verify_area()
fs: export variant of generic_write_checks without iov_iter
btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio()
btrfs: add ram_bytes and offset to btrfs_ordered_extent
btrfs: support different disk extent size for delalloc
btrfs: clean up cow_file_range_inline()
btrfs: optionally extend i_size in cow_file_range_inline()
btrfs: add definitions and documentation for encoded I/O ioctls
btrfs: add BTRFS_IOC_ENCODED_READ ioctl
btrfs: add BTRFS_IOC_ENCODED_WRITE

Pankaj Raghav (1):
btrfs: zoned: remove redundant assignment in btrfs_check_zoned_mode

Qu Wenruo (4):
btrfs: populate extent_map::generation when reading from disk
btrfs: unify the error handling pattern for read_tree_block()
btrfs: unify the error handling of btrfs_read_buffer()
btrfs: verify the tranisd of the to-be-written dirty extent buffer

Sahil Kang (2):
btrfs: reuse existing pointers from btrfs_ioctl
btrfs: reuse existing inode from btrfs_ioctl

Sidong Yang (2):
btrfs: qgroup: remove duplicated check in adding qgroup relations
btrfs: qgroup: remove outdated TODO comments

Sweet Tea Dorminy (1):
btrfs: add filesystems state details to error messages

fs/btrfs/backref.c | 7 +-
fs/btrfs/block-group.c | 36 +-
fs/btrfs/block-group.h | 1 +
fs/btrfs/btrfs_inode.h | 42 +-
fs/btrfs/compression.c | 63 +-
fs/btrfs/compression.h | 10 +-
fs/btrfs/ctree.c | 108 ++--
fs/btrfs/ctree.h | 83 ++-
fs/btrfs/delalloc-space.c | 18 +-
fs/btrfs/dev-replace.c | 18 +-
fs/btrfs/disk-io.c | 219 +++++--
fs/btrfs/disk-io.h | 2 +
fs/btrfs/extent-tree.c | 148 +++--
fs/btrfs/extent_io.c | 45 +-
fs/btrfs/extent_map.c | 4 +
fs/btrfs/file-item.c | 76 +--
fs/btrfs/file.c | 79 ++-
fs/btrfs/free-space-tree.c | 2 +
fs/btrfs/inode.c | 1183 +++++++++++++++++++++++++++++--------
fs/btrfs/ioctl.c | 309 ++++++++--
fs/btrfs/lzo.c | 11 +-
fs/btrfs/ordered-data.c | 132 ++---
fs/btrfs/ordered-data.h | 25 +-
fs/btrfs/print-tree.c | 5 +-
fs/btrfs/qgroup.c | 72 ++-
fs/btrfs/reflink.c | 43 +-
fs/btrfs/relocation.c | 11 +-
fs/btrfs/scrub.c | 2 +-
fs/btrfs/send.c | 11 +-
fs/btrfs/send.h | 2 +-
fs/btrfs/space-info.c | 5 +-
fs/btrfs/super.c | 96 ++-
fs/btrfs/sysfs.c | 15 +-
fs/btrfs/tests/extent-map-tests.c | 2 +
fs/btrfs/transaction.c | 19 +-
fs/btrfs/transaction.h | 2 +-
fs/btrfs/tree-checker.c | 35 +-
fs/btrfs/tree-log.c | 982 ++++++++++++++++++------------
fs/btrfs/tree-log.h | 7 +-
fs/btrfs/volumes.c | 147 ++---
fs/btrfs/volumes.h | 7 +-
fs/btrfs/zoned.c | 167 ++++--
fs/internal.h | 5 -
fs/ioctl.c | 4 -
fs/read_write.c | 34 +-
fs/remap_range.c | 7 +-
include/linux/fs.h | 2 +
include/trace/events/btrfs.h | 1 +
include/uapi/linux/btrfs.h | 133 +++++
include/uapi/linux/btrfs_tree.h | 3 +
50 files changed, 3109 insertions(+), 1331 deletions(-)


2022-03-22 03:02:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] Btrfs updates for 5.18

On Mon, Mar 21, 2022 at 2:37 PM David Sterba <[email protected]> wrote:
>
> this update contains feature updates, performance improvements,
> preparatory and core work and some related VFS updates. Please pull,
> thanks.

Hmm. This was marked as spam for me.

Not sure why - the email looks fine, and has proper DKIM and SPF records.

Linus

2022-03-22 22:41:57

by pr-tracker-bot

[permalink] [raw]
Subject: Re: [GIT PULL] Btrfs updates for 5.18

The pull request you sent on Mon, 21 Mar 2022 22:33:04 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.18-tag

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/5191290407668028179f2544a11ae9b57f0bcf07

Thank you!

--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

2022-03-23 09:20:45

by Josef Bacik

[permalink] [raw]
Subject: Re: [GIT PULL] Btrfs updates for 5.18

On Tue, Mar 22, 2022 at 11:23:21AM -0700, Linus Torvalds wrote:
> On Mon, Mar 21, 2022 at 2:37 PM David Sterba <[email protected]> wrote:
> >
> > - allow reflinks/deduplication from two different mounts of the same
> > filesystem
>
> So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies.
>
> In particular, I'm not seeing any commentary about different
> filesystems for this.
>
> There are several filesystems that use that ->remap_file_range()
> operation, so these relaxed rules don't just affect btrfs.
>
> Yes, yes, checking for i_sb matching does seem sensible, but I'd
> *really* have liked some sign that people checked with other
> filesystem maintainers and this is ok for all of them, and they didn't
> make assumptions about "always same mount" rather than "always same
> filesystem".
>

> This affects at least cifs, nfs, overlayfs and ocfs2.

I had a talk with Darrick Wong about this on IRC, and his Reviewed-by is on the
patch. This did surprise nfsd when xfstests started failing, but talking with
Bruce he didn't complain once he understood what was going on. Believe me I
have 0 interest in getting the other maintainers upset with me by sneaking
something by them, I made sure to run it by people first, tho I probably should
have checked with people directly other than Darrick.

>
> Adding fsdevel, and pointing to that
>
> - if (src_file->f_path.mnt != dst_file->f_path.mnt)
> + if (file_inode(src_file)->i_sb != file_inode(dst_file)->i_sb)
>
> change in commit 9f5710bbfd30 ("fs: allow cross-vfsmount reflink/dedupe")
>
> And yes, there was already a comment about "Practically, they only
> need to be on the same file system" from before that matches the new
> behavior, but hey, comments have been known to be wrong in the past
> too.
>
> And yes, I'm also aware that do_clone_file_range() already had that
> exact same i_sb check and it's not new, but since ioctl_file_clone()
> cheched for the mount path, I don't think you could actually reach it
> without being on the same mount.
>
> And while discussing these sanity checks: wouldn't it make sense to
> check that *both* the source file and the destination file support
> that remap_file_range() op, and it's the same op?
>
> Yes, yes, it probably always is in practice, but I could imagine some
> type confusion thing. So wouldn't it be nice to also have something
> like
>
> if (dst_file->f_op != src_file->f_op)
> goto out_drop_write;
>
> in there? I'm thinking "how about dedupe from a directory to a regular
> file" kind of craziness...
>

This more fine-grained checking is handled by generic_remap_file_range_prep() to
make sure we don't try to dedup a directory or pipe or some other nonsense.
Thanks,

Josef

2022-03-23 11:43:07

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [GIT PULL] Btrfs updates for 5.18

On Tue, Mar 22, 2022 at 04:55:17PM -0400, Josef Bacik wrote:
> On Tue, Mar 22, 2022 at 11:23:21AM -0700, Linus Torvalds wrote:
> > On Mon, Mar 21, 2022 at 2:37 PM David Sterba <[email protected]> wrote:
> > >
> > > - allow reflinks/deduplication from two different mounts of the same
> > > filesystem
> >
> > So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies.
> >
> > In particular, I'm not seeing any commentary about different
> > filesystems for this.
> >
> > There are several filesystems that use that ->remap_file_range()
> > operation, so these relaxed rules don't just affect btrfs.
> >
> > Yes, yes, checking for i_sb matching does seem sensible, but I'd
> > *really* have liked some sign that people checked with other
> > filesystem maintainers and this is ok for all of them, and they didn't
> > make assumptions about "always same mount" rather than "always same
> > filesystem".
> >
>
> > This affects at least cifs, nfs, overlayfs and ocfs2.
>
> I had a talk with Darrick Wong about this on IRC, and his Reviewed-by is on the
> patch. This did surprise nfsd when xfstests started failing, but talking with
> Bruce he didn't complain once he understood what was going on.

FWIW, I remember talking about this with Bruce and (probably Anna too)
during a hallway BOF at the last LSFMMBPFBBQ that I went to, which was
2018(?) At the time, I think we resolved that nfs42_remap_file_range
was capable of detecting and dealing with unsupported requests, so a
direct comparison of the ->remap_file_range or ->f_op wasn't necessary
for them.

> Believe me I
> have 0 interest in getting the other maintainers upset with me by sneaking
> something by them, I made sure to run it by people first, tho I probably should
> have checked with people directly other than Darrick.

I /am/ a little curious what Steve French has to say w.r.t CIFS.

AFAICT overlayfs passes the request down to the appropriate fs
under-layer, so its correctness mostly depends on the under-layer's
implementation. But I'll let Amir or someone chime in on that. ;)

As for ocfs2, back when I added support for ->remap_file_range to ocfs2,
cross-mount reflink and dedupe worked fine, or at least as well as
anything works on ocfs2.

(XFS has always supported cross-mount remappings.)

> >
> > Adding fsdevel, and pointing to that
> >
> > - if (src_file->f_path.mnt != dst_file->f_path.mnt)
> > + if (file_inode(src_file)->i_sb != file_inode(dst_file)->i_sb)
> >
> > change in commit 9f5710bbfd30 ("fs: allow cross-vfsmount reflink/dedupe")
> >
> > And yes, there was already a comment about "Practically, they only
> > need to be on the same file system" from before that matches the new
> > behavior, but hey, comments have been known to be wrong in the past
> > too.
> >
> > And yes, I'm also aware that do_clone_file_range() already had that
> > exact same i_sb check and it's not new, but since ioctl_file_clone()
> > cheched for the mount path, I don't think you could actually reach it
> > without being on the same mount.
> >
> > And while discussing these sanity checks: wouldn't it make sense to
> > check that *both* the source file and the destination file support
> > that remap_file_range() op, and it's the same op?
> >
> > Yes, yes, it probably always is in practice, but I could imagine some
> > type confusion thing. So wouldn't it be nice to also have something
> > like
> >
> > if (dst_file->f_op != src_file->f_op)
> > goto out_drop_write;
> >
> > in there? I'm thinking "how about dedupe from a directory to a regular
> > file" kind of craziness...
> >
>
> This more fine-grained checking is handled by generic_remap_file_range_prep() to
> make sure we don't try to dedup a directory or pipe or some other nonsense.

Yes. The VFS only allows remapping between regular files.

--D

> Thanks,
>
> Josef

2022-03-23 18:08:16

by Amir Goldstein

[permalink] [raw]
Subject: Re: [GIT PULL] Btrfs updates for 5.18

On Tue, Mar 22, 2022 at 10:11 PM Linus Torvalds
<[email protected]> wrote:
>
> On Mon, Mar 21, 2022 at 2:37 PM David Sterba <[email protected]> wrote:
> >
> > - allow reflinks/deduplication from two different mounts of the same
> > filesystem
>
> So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies.
>
> In particular, I'm not seeing any commentary about different
> filesystems for this.
>
> There are several filesystems that use that ->remap_file_range()
> operation, so these relaxed rules don't just affect btrfs.
>
> Yes, yes, checking for i_sb matching does seem sensible, but I'd
> *really* have liked some sign that people checked with other
> filesystem maintainers and this is ok for all of them, and they didn't
> make assumptions about "always same mount" rather than "always same
> filesystem".
>
> This affects at least cifs, nfs, overlayfs and ocfs2.

overlayfs shouldn't have a problem with that change.

IIUC, cifs would also gain from this, because clone is implemented
on server side and even two different sb's could technically do
server side duplicate_extents.
Same goes for nfs v4.2.

There was a lot of discussion on these aspects when cross server
(i.e. cross sb) copy was implemented not so long ago.
Relaxing cross-mnt clone is nothing compared to that.


>
> Adding fsdevel, and pointing to that
>
> - if (src_file->f_path.mnt != dst_file->f_path.mnt)
> + if (file_inode(src_file)->i_sb != file_inode(dst_file)->i_sb)
>
> change in commit 9f5710bbfd30 ("fs: allow cross-vfsmount reflink/dedupe")
>
> And yes, there was already a comment about "Practically, they only
> need to be on the same file system" from before that matches the new
> behavior, but hey, comments have been known to be wrong in the past
> too.

As the one who left this comment I can say it is based only on common
sense, similar to the rationale in this recent commit.
If it is any help, overlayfs has been doing cross mnt clones since
913b86e92e1f vfs: allow vfs_clone_file_range() across mount points
I left the comment because I did not need to take responsibility for changing
user behavior at the time, but I do not see any immediate harm from the user
behavior changes now.

>
> And yes, I'm also aware that do_clone_file_range() already had that
> exact same i_sb check and it's not new, but since ioctl_file_clone()
> cheched for the mount path, I don't think you could actually reach it
> without being on the same mount.
>
> And while discussing these sanity checks: wouldn't it make sense to
> check that *both* the source file and the destination file support
> that remap_file_range() op, and it's the same op?
>
> Yes, yes, it probably always is in practice, but I could imagine some
> type confusion thing. So wouldn't it be nice to also have something
> like
>
> if (dst_file->f_op != src_file->f_op)
> goto out_drop_write;
>
> in there? I'm thinking "how about dedupe from a directory to a regular
> file" kind of craziness...

Both S_ISDIR and !S_ISREG cases are already checked for both clone
and dedupe on both files (twice in fact), so at least that is not a concern.

There may be other reasons to worry about, but I can't think of any.

Thanks,
Amir.

2022-03-23 23:43:18

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] Btrfs updates for 5.18

On Tue, Mar 22, 2022 at 1:55 PM Josef Bacik <[email protected]> wrote:
>
> This more fine-grained checking is handled by generic_remap_file_range_prep() to
> make sure we don't try to dedup a directory or pipe or some other nonsense.

Yeah, that does seem to take care of the obvious cases, and requires
that both files be regular files at least.

I'm still not a huge fan of how we use the 'f_op->remap_file_range' of
the source file, without really checking that the destination file is
ok with remap_file_range.

They end up _superficially_ very similar, yes, but I can point to
filesystems that use different f_op's for different files.

And some of those depend on - wait for it - how the filesystem was mounted.

See for example cifs:

if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_BRL)
file->f_op = &cifs_file_direct_nobrl_ops;
else
file->f_op = &cifs_file_direct_ops;

so I'm just thinking "what about doing remap_file_range between two
regular files that act differently - either due to mount options or
other details".

In that cifs example, read_iter and write_iter are different. Yes,
copy/remap_file_range uses the same function pointer, but it still
worries me about copying from a mount to another if there might be
different semantics for IO between them.

I think in this cifs case, the superblock ends up being the same, so
the mnt_cifs_flags end up being the same, and the above is not
actually a real issue. But conceptually I could imagine cases where
that wasn't the case - or even cases like /proc that have
fundamentally different file operations for different files)

Linus

2022-03-25 01:32:42

by Amir Goldstein

[permalink] [raw]
Subject: Re: [GIT PULL] Btrfs updates for 5.18

On Thu, Mar 24, 2022 at 1:35 AM Darrick J. Wong <[email protected]> wrote:
>
> On Tue, Mar 22, 2022 at 04:55:17PM -0400, Josef Bacik wrote:
> > On Tue, Mar 22, 2022 at 11:23:21AM -0700, Linus Torvalds wrote:
> > > On Mon, Mar 21, 2022 at 2:37 PM David Sterba <[email protected]> wrote:
> > > >
> > > > - allow reflinks/deduplication from two different mounts of the same
> > > > filesystem
> > >
> > > So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies.
> > >
> > > In particular, I'm not seeing any commentary about different
> > > filesystems for this.
> > >
> > > There are several filesystems that use that ->remap_file_range()
> > > operation, so these relaxed rules don't just affect btrfs.
> > >
> > > Yes, yes, checking for i_sb matching does seem sensible, but I'd
> > > *really* have liked some sign that people checked with other
> > > filesystem maintainers and this is ok for all of them, and they didn't
> > > make assumptions about "always same mount" rather than "always same
> > > filesystem".
> > >
> >
> > > This affects at least cifs, nfs, overlayfs and ocfs2.
> >
> > I had a talk with Darrick Wong about this on IRC, and his Reviewed-by is on the
> > patch. This did surprise nfsd when xfstests started failing, but talking with
> > Bruce he didn't complain once he understood what was going on.
>
> FWIW, I remember talking about this with Bruce and (probably Anna too)
> during a hallway BOF at the last LSFMMBPFBBQ that I went to, which was
> 2018(?) At the time, I think we resolved that nfs42_remap_file_range
> was capable of detecting and dealing with unsupported requests, so a
> direct comparison of the ->remap_file_range or ->f_op wasn't necessary
> for them.
>
> > Believe me I
> > have 0 interest in getting the other maintainers upset with me by sneaking
> > something by them, I made sure to run it by people first, tho I probably should
> > have checked with people directly other than Darrick.
>
> I /am/ a little curious what Steve French has to say w.r.t CIFS.
>
> AFAICT overlayfs passes the request down to the appropriate fs
> under-layer, so its correctness mostly depends on the under-layer's
> implementation. But I'll let Amir or someone chime in on that. ;)
>

The thing is, overlayfs ALREADY does cross-mnt clone_file_range()
on underlying layers. So if there was a bug with allowing cross mnt
clones on xfs it would have been in the wild for a long time already.

OTOH, overlayfs doesn't support nfs/cifs/ocfs2 as upper fs.

If you mount an overlay with lower and upper layer on the same
xfs/btrfs sb the original mnt of lower path and upper patch is irrelevant.

Overlayfs uses different private mnt per layer anyway, so if the source
file is on lower layer then even if originally the overlay mount was done
with different upper/lower mounts of the same sb, clone via overlayfs
would work.

Allowing cross overlayfs mount clone makes very little difference.

Thanks,
Amir.

2022-03-25 18:00:51

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] Btrfs updates for 5.18

On Mon, Mar 21, 2022 at 2:37 PM David Sterba <[email protected]> wrote:
>
> - allow reflinks/deduplication from two different mounts of the same
> filesystem

So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies.

In particular, I'm not seeing any commentary about different
filesystems for this.

There are several filesystems that use that ->remap_file_range()
operation, so these relaxed rules don't just affect btrfs.

Yes, yes, checking for i_sb matching does seem sensible, but I'd
*really* have liked some sign that people checked with other
filesystem maintainers and this is ok for all of them, and they didn't
make assumptions about "always same mount" rather than "always same
filesystem".

This affects at least cifs, nfs, overlayfs and ocfs2.

Adding fsdevel, and pointing to that

- if (src_file->f_path.mnt != dst_file->f_path.mnt)
+ if (file_inode(src_file)->i_sb != file_inode(dst_file)->i_sb)

change in commit 9f5710bbfd30 ("fs: allow cross-vfsmount reflink/dedupe")

And yes, there was already a comment about "Practically, they only
need to be on the same file system" from before that matches the new
behavior, but hey, comments have been known to be wrong in the past
too.

And yes, I'm also aware that do_clone_file_range() already had that
exact same i_sb check and it's not new, but since ioctl_file_clone()
cheched for the mount path, I don't think you could actually reach it
without being on the same mount.

And while discussing these sanity checks: wouldn't it make sense to
check that *both* the source file and the destination file support
that remap_file_range() op, and it's the same op?

Yes, yes, it probably always is in practice, but I could imagine some
type confusion thing. So wouldn't it be nice to also have something
like

if (dst_file->f_op != src_file->f_op)
goto out_drop_write;

in there? I'm thinking "how about dedupe from a directory to a regular
file" kind of craziness...

Linus