2012-06-20 10:35:14

by Alexander Block

[permalink] [raw]
Subject: [PATCH] Allow cross subvolume reflinks (2nd attempt)

Hello,

This is the second attempt to bring in cross subvolume reflinks into btrfs.
The first attempt was NAKed due to missing vfs mount checks and a clear
description of what btrfs subvolumes are and probably also why cross
subvolume reflinks are ok in the case of btrfs. This version of the patch
comes from David and is in SUSE kernels since a long time, so it is tested
and working. The patch also does proper vfs mount checks, so cross mount
point reflinks are not possible with this patch. It only allows cross
reflinks between two subvolumes which are in the same mount point.

As requested in the older discussion, I'll try to describe what subvolumes
in btrfs are. From the users point of view, a subvolume is just a directory
in the mounted fs with its own inode space. This means, that two inodes
with the same number from different subvolumes are different files.
Parallel to the sub directory form, a subvolume may be mounted in its own
vfs mount point (I'm however not sure if this is already possible or just
planned). Internally, a btrfs subvolume is a separate btree, containing
all the metadata needed to store the fs of the subvolume.
Additionally to the subvolumes, btrfs has snapshots, which are basically
the same as subvolumes, with the difference that they share metadata and
data with the snapshot origin/parent. Everything that applies to
subvolumes also applies to snapshots.

The clone ioctl requires the destination file to be already existing and
opened for writing. This means that we do not touch any vfs semantics
regarding creation/opening of files. The ioctl only copies the extents
information from the source to the destination, doing proper access
checks before. In my opinion, we can compare the clone ioctl to the
sendfile syscall, with the difference that we don't copy data but the
information where the data lies.

Link to the old discussion:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/9864
Only PATCH 2/2 is of interrest now. The discussion also contains some
arguments why cross subvolume reflinks are a good thing, but I think
this should not be the matter of the new discussion.

Alex.

David Sterba (1):
btrfs: allow cross-subvolume file clone

fs/btrfs/ioctl.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

--
1.7.10


2012-06-20 10:35:35

by Alexander Block

[permalink] [raw]
Subject: [PATCH] btrfs: allow cross-subvolume file clone

From: David Sterba <[email protected]>

Lift the EXDEV condition and allow different root trees for files being
cloned, then pass source inode's root when searching for extents.
Cloning is not allowed to cross vfsmounts, ie. when two subvolumes from
one filesystem are mounted separately.

Signed-off-by: David Sterba <[email protected]>
---
fs/btrfs/ioctl.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 00fd8b5..9e89255 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2352,6 +2352,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
goto out_drop_write;
}

+ ret = -EXDEV;
+ if (src_file->f_path.mnt != file->f_path.mnt)
+ goto out_fput;
+
src = src_file->f_dentry->d_inode;

ret = -EINVAL;
@@ -2372,7 +2376,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
goto out_fput;

ret = -EXDEV;
- if (src->i_sb != inode->i_sb || BTRFS_I(src)->root != root)
+ if (src->i_sb != inode->i_sb)
goto out_fput;

ret = -ENOMEM;
@@ -2446,13 +2450,14 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
* note the key will change type as we walk through the
* tree.
*/
- ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+ ret = btrfs_search_slot(NULL, BTRFS_I(src)->root, &key, path,
+ 0, 0);
if (ret < 0)
goto out;

nritems = btrfs_header_nritems(path->nodes[0]);
if (path->slots[0] >= nritems) {
- ret = btrfs_next_leaf(root, path);
+ ret = btrfs_next_leaf(BTRFS_I(src)->root, path);
if (ret < 0)
goto out;
if (ret > 0)
--
1.7.10

2012-06-20 17:24:13

by Goffredo Baroncelli

[permalink] [raw]
Subject: Re: [PATCH] Allow cross subvolume reflinks (2nd attempt)

Hi Alexander,

On 06/20/2012 12:35 PM, Alexander Block wrote:
> The patch also does proper vfs mount checks, so cross mount
> point reflinks are not possible with this patch. It only allows cross
> reflinks between two subvolumes which are in the same mount point.

Thanks for working on that. What happens if two subvolumes of the same
filesystem are mounted on two different places ?

I usually do:
mount -o subvol=__active /dev/sdX /
mount -o subvol=. /dev/sdX /var/btrfs

So two different subvolumes of the same filesystem (the one on /dev/sdX)
are mounted on two different places.

Is it possible to do
cp --reflink /tmp/foo /var/btrfs/tmp/foo2

Thanks
G.Baroncelli

2012-06-20 17:27:17

by Alexander Block

[permalink] [raw]
Subject: Re: [PATCH] Allow cross subvolume reflinks (2nd attempt)

On Wed, Jun 20, 2012 at 7:18 PM, Goffredo Baroncelli <[email protected]> wrote:
> Hi Alexander,
>
> On 06/20/2012 12:35 PM, Alexander Block wrote:
>> The patch also does proper vfs mount checks, so cross mount
>> point reflinks are not possible with this patch. It only allows cross
>> reflinks between two subvolumes which are in the same mount point.
>
> Thanks for working on that. What happens if two subvolumes of the same
> filesystem are mounted on two different places ?
>
> I usually do:
> ?mount -o subvol=__active /dev/sdX /
> ?mount -o subvol=. ? ? ? ?/dev/sdX /var/btrfs
>
> So two different subvolumes of the same filesystem (the one on /dev/sdX)
> are mounted on two different places.
>
> Is it possible to do
> ?cp --reflink /tmp/foo /var/btrfs/tmp/foo2
This will fail with -EXDEV. The patch explicitly forbids reflinks crossing vfs
mount boundaries.
>
> Thanks
> G.Baroncelli
>

2012-06-20 18:07:28

by Calvin Walton

[permalink] [raw]
Subject: Re: [PATCH] Allow cross subvolume reflinks (2nd attempt)

On Wed, 2012-06-20 at 19:27 +0200, Alexander Block wrote:
> On Wed, Jun 20, 2012 at 7:18 PM, Goffredo Baroncelli <[email protected]> wrote:
> > Hi Alexander,
> >
> > On 06/20/2012 12:35 PM, Alexander Block wrote:
> >> The patch also does proper vfs mount checks, so cross mount
> >> point reflinks are not possible with this patch. It only allows cross
> >> reflinks between two subvolumes which are in the same mount point.
> >
> > Thanks for working on that. What happens if two subvolumes of the same
> > filesystem are mounted on two different places ?
> >
> > I usually do:
> > mount -o subvol=__active /dev/sdX /
> > mount -o subvol=. /dev/sdX /var/btrfs
> >
> > So two different subvolumes of the same filesystem (the one on /dev/sdX)
> > are mounted on two different places.
> >
> > Is it possible to do
> > cp --reflink /tmp/foo /var/btrfs/tmp/foo2
> This will fail with -EXDEV. The patch explicitly forbids reflinks crossing vfs
> mount boundaries.

Which means that if I want to do a reflink copy between two separately
mounted subvolumes, I will have to go and do a third mount with a common
subvolume above those two, then do the reflink operation within this
extra mount? This seems slightly impractical.

Is it any extra code to enable doing cross-mount reflinks? Why was the
decision made not to allow them?

Is there some particular opposition to supporting cross-mount operations
on multiple mounts of the same filesystem in general? (I'd love to have
rename() work across bind mounts, for example...)

--
Calvin Walton <[email protected]>

2012-06-20 18:11:29

by Goffredo Baroncelli

[permalink] [raw]
Subject: Re: [PATCH] Allow cross subvolume reflinks (2nd attempt)

On 06/20/2012 08:07 PM, Calvin Walton wrote:
> On Wed, 2012-06-20 at 19:27 +0200, Alexander Block wrote:
>> On Wed, Jun 20, 2012 at 7:18 PM, Goffredo Baroncelli <[email protected]> wrote:
>>> Hi Alexander,
>>>
>>> On 06/20/2012 12:35 PM, Alexander Block wrote:
>>>> The patch also does proper vfs mount checks, so cross mount
>>>> point reflinks are not possible with this patch. It only allows cross
>>>> reflinks between two subvolumes which are in the same mount point.
>>>
>>> Thanks for working on that. What happens if two subvolumes of the same
>>> filesystem are mounted on two different places ?
>>>
>>> I usually do:
>>> mount -o subvol=__active /dev/sdX /
>>> mount -o subvol=. /dev/sdX /var/btrfs
>>>
>>> So two different subvolumes of the same filesystem (the one on /dev/sdX)
>>> are mounted on two different places.
>>>
>>> Is it possible to do
>>> cp --reflink /tmp/foo /var/btrfs/tmp/foo2
>> This will fail with -EXDEV. The patch explicitly forbids reflinks crossing vfs
>> mount boundaries.
>
> Which means that if I want to do a reflink copy between two separately
> mounted subvolumes, I will have to go and do a third mount with a common
> subvolume above those two, then do the reflink operation within this
> extra mount? This seems slightly impractical.
>
> Is it any extra code to enable doing cross-mount reflinks? Why was the
> decision made not to allow them?
>
> Is there some particular opposition to supporting cross-mount operations
> on multiple mounts of the same filesystem in general? (I'd love to have
> rename() work across bind mounts, for example...)

Yes please, could someone explain the reason behind this decision ? May
be there are valid reasons, I am asking only to know which ones ?

2012-06-20 20:05:48

by Alexander Block

[permalink] [raw]
Subject: Re: [PATCH] Allow cross subvolume reflinks (2nd attempt)

On Wed, Jun 20, 2012 at 8:11 PM, Goffredo Baroncelli <[email protected]> wrote:
>
> Yes please, could someone explain the reason behind this decision ? May
> be there are valid reasons, I am asking only to know which ones ?
>
The reason is that at the moment no user visible operations span mount
boundaries
and there were discussion on whether this should be allowed for the
clone ioctl or
not. For the moment, I would already be happy to have cross subvolume reflinks
which do not span mount points, this would already be much more as we have at
the moment.

2012-06-20 20:21:03

by Goffredo Baroncelli

[permalink] [raw]
Subject: Re: [PATCH] Allow cross subvolume reflinks (2nd attempt)

On 06/20/2012 10:05 PM, Alexander Block wrote:
> On Wed, Jun 20, 2012 at 8:11 PM, Goffredo Baroncelli <[email protected]> wrote:
>>
>> Yes please, could someone explain the reason behind this decision ? May
>> be there are valid reasons, I am asking only to know which ones ?
>>
> The reason is that at the moment no user visible operations span mount
> boundaries
> and there were discussion on whether this should be allowed for the
> clone ioctl or
> not.

Yes, I understood the same. But nobody was able to explain why .

For the moment, I would already be happy to have cross subvolume reflinks
> which do not span mount points, this would already be much more as we have at
> the moment.

This is reasonable.

> .
>

2012-06-27 17:22:43

by Marc MERLIN

[permalink] [raw]
Subject: Re: [PATCH] Allow cross subvolume reflinks (2nd attempt)

On Wed, Jun 20, 2012 at 12:35:11PM +0200, Alexander Block wrote:
> Hello,
>
> This is the second attempt to bring in cross subvolume reflinks into btrfs.
> The first attempt was NAKed due to missing vfs mount checks and a clear
> description of what btrfs subvolumes are and probably also why cross
> subvolume reflinks are ok in the case of btrfs. This version of the patch
> comes from David and is in SUSE kernels since a long time, so it is tested
> and working. The patch also does proper vfs mount checks, so cross mount
> point reflinks are not possible with this patch. It only allows cross
> reflinks between two subvolumes which are in the same mount point.

Thank you for bringing this back. This would have saved me a lot of
trouble

I'm not familiar with the code, but here's a big thumbs up from me.

Anyone doing reviews, please consider this :)

Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/