2019-05-30 16:02:01

by Darrick J. Wong

Subject: NFS & CIFS support dedupe now?? Was: Re: [PATCH] generic/517: notrun on NFS due to unaligned dedupe in test

Hi everyone,

Murphy Zhou sent a patch to generic/517 in fstests to fix a dedupe
failure he was seeing on NFS:

On Thu, May 30, 2019 at 05:41:47PM +0800, Murphy Zhou wrote:
> NFSv4.2 could pass _require_scratch_dedupe, since the test offset and
> size are aligned, while generic/517 is performing unaligned dedupe.
> NFS does not support unaligned dedupe now, returns EINVAL.
>
> Signed-off-by: Murphy Zhou <[email protected]>
> ---
> tests/generic/517 | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/tests/generic/517 b/tests/generic/517
> index 601bb24e..23665782 100755
> --- a/tests/generic/517
> +++ b/tests/generic/517
> @@ -30,6 +30,7 @@ _cleanup()
> _supported_fs generic
> _supported_os Linux
> _require_scratch_dedupe
> +$FSTYP == "nfs" && _notrun "NFS can't handle unaligned deduplication"

I was surprised to see a dedupe fix for NFS since (at least to my
knowledge) neither NFS nor CIFS actually supports server-side
deduplication commands, and therefore _require_scratch_dedupe should
have _notrun the test.

Then I looked at fs/nfs/nfs4file.c:

static loff_t nfs42_remap_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, loff_t count,
unsigned int remap_flags)
{
<local variable declarations>

if (remap_flags & ~(REMAP_FILE_DEDUP | REMAP_FILE_ADVISORY))
return -EINVAL;

<check alignment, lock inodes, flush pending writes>

ret = nfs42_proc_clone(src_file, dst_file, src_off, dst_off, count);

The NFS client code will accept REMAP_FILE_DEDUP through remap_flags,
which is how dedupe requests are sent to filesystems nowadays. The nfs
client code does not itself compare the file contents, but it does issue
a CLONE command to the NFS server. The end result, AFAICT, is that a
user program can write 'A's to file1, 'B's to file2, issue a dedup
ioctl to the kernel, and have a block of 'B's mapped into file1. That's
broken behavior, according to the dedup ioctl manpage.

Notice how remap_flags is checked but is not included in the
nfs42_proc_clone call? That's how I conclude that the NFS client cannot
possibly be sending the dedup request to the server.

The same goes for fs/cifs/cifsfs.c:

static loff_t cifs_remap_file_range(struct file *src_file, loff_t off,
struct file *dst_file, loff_t destoff, loff_t len,
unsigned int remap_flags)
{
<local variable declarations>

if (remap_flags & ~(REMAP_FILE_DEDUP | REMAP_FILE_ADVISORY))
return -EINVAL;

<check files, lock inodes, flush pages>

if (target_tcon->ses->server->ops->duplicate_extents)
rc = target_tcon->ses->server->ops->duplicate_extents(xid,
smb_file_src, smb_file_target, off, len, destoff);
else
rc = -EOPNOTSUPP;

Again, remap_flags is checked here but it has no influence over the
->duplicate_extents call.
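
To make the difference between the two operations concrete, here is a
small userspace model (not kernel code; model_clone and model_dedupe are
made-up names, and plain memory buffers stand in for file ranges). A
clone replaces the destination unconditionally, whereas a dedupe is only
allowed to share the range if the contents already match:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Clone: the destination range ends up with the source's bytes,
 * no matter what it held before. */
static void model_clone(const unsigned char *src, unsigned char *dst,
                        size_t len)
{
    assert(src && dst);
    memcpy(dst, src, len);
}

/* Dedupe: compare first, and share only when the two ranges are
 * already identical. Returns 1 if shared, 0 if the contents differ
 * (in which case the destination is left untouched). */
static int model_dedupe(const unsigned char *src, unsigned char *dst,
                        size_t len)
{
    assert(src && dst);
    if (memcmp(src, dst, len) != 0)
        return 0;
    model_clone(src, dst, len);
    return 1;
}
```

Servicing a dedupe request by calling the clone path directly skips the
comparison, which is how a block of 'B's can end up in a file of 'A's.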

Next I got to thinking that when I reworked the clone/dedupe code last
year, I didn't include REMAP_FILE_DEDUP support for cifs or nfs, because
as far as I knew, neither protocol supports a verb for deduplication.
The remap_flags checks were modified to allow REMAP_FILE_DEDUP in
commits ce96e888fe48e (NFS) and b073a08016a10 (CIFS) with this
justification (the cifs commit has a similar message):

"Subject: Fix nfs4.2 return -EINVAL when do dedupe operation

"dedupe_file_range operations is combiled into remap_file_range.
" But in nfs42_remap_file_range, it's skiped for dedupe operations.
" Before this patch:
" # dd if=/dev/zero of=nfs/file bs=1M count=1
" # xfs_io -c "dedupe nfs/file 4k 64k 4k" nfs/file
" XFS_IOC_FILE_EXTENT_SAME: Invalid argument
" After this patch:
" # dd if=/dev/zero of=nfs/file bs=1M count=1
" # xfs_io -c "dedupe nfs/file 4k 64k 4k" nfs/file
" deduped 4096/4096 bytes at offset 65536
" 4 KiB, 1 ops; 0.0046 sec (865.988 KiB/sec and 216.4971 ops/sec)"

This sort of looks like monkeypatching to make an error message go away.
One could argue that this ought to return EOPNOTSUPP instead of EINVAL,
and maybe that's what should've happened.
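
As a rough sketch of that alternative (a userspace model only; the
MODEL_* constants and model_check_remap_flags are invented stand-ins,
not the kernel's definitions, and which errno is appropriate is exactly
the open question), a flag check that refuses dedupe explicitly could
look like this:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel's remap flag bits. */
#define MODEL_REMAP_DEDUP    (1u << 0)
#define MODEL_REMAP_ADVISORY (1u << 1)

static_assert((MODEL_REMAP_DEDUP & MODEL_REMAP_ADVISORY) == 0,
              "flag bits must not overlap");

/* Returns 0 if the request may proceed as a plain clone, or a
 * negative errno if it must be refused. */
static int model_check_remap_flags(unsigned int remap_flags)
{
    /* No way to ask the server to compare contents: refuse dedupe. */
    if (remap_flags & MODEL_REMAP_DEDUP)
        return -EOPNOTSUPP;

    /* Anything other than advisory flags is unknown to us. */
    if (remap_flags & ~MODEL_REMAP_ADVISORY)
        return -EINVAL;

    return 0;
}
```

Either way, the point is that a dedupe request never reaches the clone
call.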

So, uh, do NFS and CIFS both support server-side dedupe now, or are
these patches just plain wrong?

No, they're just wrong, because I can corrupt files like so on NFS:

$ rm -rf urk moo
$ xfs_io -f -c "pwrite -S 0x58 0 31048" urk
wrote 31048/31048 bytes at offset 0
30 KiB, 8 ops; 0.0000 sec (569.417 MiB/sec and 153846.1538 ops/sec)
$ xfs_io -f -c "pwrite -S 0x59 0 31048" moo
wrote 31048/31048 bytes at offset 0
30 KiB, 8 ops; 0.0001 sec (177.303 MiB/sec and 47904.1916 ops/sec)
$ md5sum urk moo
37d3713e5f9c4fe0f8a1f813b27cb284 urk
a5b6f953f27aa17e42450ff4674fa2df moo
$ xfs_io -c "dedupe urk 0 0 4096" moo
deduped 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0.0012 sec (3.054 MiB/sec and 781.8608 ops/sec)
$ md5sum urk moo
37d3713e5f9c4fe0f8a1f813b27cb284 urk
2c992d70131c489da954f1d96d8c456e moo
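
For anyone who wants to check this without xfs_io, the same request can
be issued from a small C helper via the FIDEDUPERANGE ioctl. This is a
minimal sketch: try_dedupe is a made-up name, it handles a single
destination only, and it assumes a Linux system whose <linux/fs.h>
provides FIDEDUPERANGE.

```c
#include <assert.h>
#include <errno.h>
#include <linux/fs.h>   /* FIDEDUPERANGE, struct file_dedupe_range */
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Returns 1 if the kernel reported the ranges identical and deduped
 * them, 0 if it reported that the contents differ, or a negative
 * errno on failure. */
static int try_dedupe(int src_fd, uint64_t src_off,
                      int dst_fd, uint64_t dst_off, uint64_t len)
{
    struct file_dedupe_range *arg;
    int ret;

    assert(src_fd >= 0 && dst_fd >= 0);

    /* One header plus one per-destination info record. */
    arg = calloc(1, sizeof(*arg) + sizeof(arg->info[0]));
    if (!arg)
        return -ENOMEM;

    arg->src_offset = src_off;
    arg->src_length = len;
    arg->dest_count = 1;
    arg->info[0].dest_fd = dst_fd;
    arg->info[0].dest_offset = dst_off;

    if (ioctl(src_fd, FIDEDUPERANGE, arg) < 0)
        ret = -errno;                    /* whole call failed */
    else if (arg->info[0].status < 0)
        ret = arg->info[0].status;       /* per-destination errno */
    else
        ret = (arg->info[0].status == FILE_DEDUPE_RANGE_SAME);

    free(arg);
    return ret;
}
```

With urk as the source and moo as the destination, a filesystem with
working dedupe should report 0 (contents differ) or an error here, and
moo's checksum should stay the same.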

(Not sure about cifs, since I don't have a Windows Server handy)

I'm not an expert in CIFS or NFS, so I'm asking: does either support
dedupe or is this a kernel bug?

--D


2019-05-31 10:49:12

by Aurélien Aptel

Subject: Re: NFS & CIFS support dedupe now?? Was: Re: [PATCH] generic/517: notrun on NFS due to unaligned dedupe in test

"Darrick J. Wong" <[email protected]> writes:
> (Not sure about cifs, since I don't have a Windows Server handy)
>
> I'm not an expert in CIFS or NFS, so I'm asking: do either support
> dedupe or is this a kernel bug?

AFAIK, the SMB protocol has two ioctls for server-side copies:
- FSCTL_SRV_COPYCHUNK [1], generic
- FSCTL_DUPLICATE_EXTENTS_TO_FILE [2], only supported on Windows' "new"
CoW filesystem, ReFS

Cheers,

1:https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-smb2/cd0162e4-7650-4293-8a2a-d696923203ef
2:https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-smb2/4f08d2f8-bd17-4181-9cec-54c4f6a1b439
--
Aurélien Aptel / SUSE Labs Samba Team
GPG: 1839 CB5F 9F5B FB9B AA97 8C99 03C8 A49B 521B D5D3
SUSE Linux GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg)

2019-05-31 13:29:07

by Tom Talpey

Subject: RE: NFS & CIFS support dedupe now?? Was: Re: [PATCH] generic/517: notrun on NFS due to unaligned dedupe in test

> -----Original Message-----
> From: [email protected] <[email protected]> On
> Behalf Of Aurélien Aptel
> Sent: Friday, May 31, 2019 6:49 AM
> To: Darrick J. Wong <[email protected]>; [email protected];
> [email protected]; [email protected];
> [email protected]
> Cc: [email protected]; Murphy Zhou <[email protected]>; linux-
> [email protected]; [email protected]; linux-fsdevel <linux-
> [email protected]>
> Subject: Re: NFS & CIFS support dedupe now?? Was: Re: [PATCH] generic/517:
> notrun on NFS due to unaligned dedupe in test
>
> "Darrick J. Wong" <[email protected]> writes:
> > (Not sure about cifs, since I don't have a Windows Server handy)
> >
> > I'm not an expert in CIFS or NFS, so I'm asking: do either support
> > dedupe or is this a kernel bug?
>
> AFAIK, the SMB protocol has 2 ioctl to do server side copies:
> - FSCTL_SRV_COPYCHUNK [1] generic
> - FSCTL_DUPLICATE_EXTENTS_TO_FILE [2], only supported on windows "new"
> CoW filesystem ReFS

Windows also supports the T10 copy offload, when the backend storage (e.g. a SAN) supports it.

There is no explicit support for dedup in SMB; that is considered a backend storage function and is not surfaced in the protocol. There are, however, some attributes relevant to dedup which are passed through.

Tom.

2019-05-31 15:25:23

by Olga Kornievskaia

Subject: Re: NFS & CIFS support dedupe now?? Was: Re: [PATCH] generic/517: notrun on NFS due to unaligned dedupe in test

On Thu, May 30, 2019 at 12:02 PM Darrick J. Wong
<[email protected]> wrote:
>
> Hi everyone,
>
> Murphy Zhou sent a patch to generic/517 in fstests to fix a dedupe
> failure he was seeing on NFS:
>
> On Thu, May 30, 2019 at 05:41:47PM +0800, Murphy Zhou wrote:
> > NFSv4.2 could pass _require_scratch_dedupe, since the test offset and
> > size are aligned, while generic/517 is performing unaligned dedupe.
> > NFS does not support unaligned dedupe now, returns EINVAL.
> >
> > Signed-off-by: Murphy Zhou <[email protected]>
> > ---
> > tests/generic/517 | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/tests/generic/517 b/tests/generic/517
> > index 601bb24e..23665782 100755
> > --- a/tests/generic/517
> > +++ b/tests/generic/517
> > @@ -30,6 +30,7 @@ _cleanup()
> > _supported_fs generic
> > _supported_os Linux
> > _require_scratch_dedupe
> > +$FSTYP == "nfs" && _notrun "NFS can't handle unaligned deduplication"
>
> I was surprised to see a dedupe fix for NFS since (at least to my
> knowledge) neither of these two network filesystems actually support
> server-side deduplication commands, and therefore the
> _require_scratch_dedupe should have _notrun the test.
>
> Then I looked at fs/nfs/nfs4file.c:
>
> static loff_t nfs42_remap_file_range(struct file *src_file, loff_t src_off,
> struct file *dst_file, loff_t dst_off, loff_t count,
> unsigned int remap_flags)
> {
> <local variable declarations>
>
> if (remap_flags & ~(REMAP_FILE_DEDUP | REMAP_FILE_ADVISORY))
> return -EINVAL;
>
> <check alignment, lock inodes, flush pending writes>
>
> ret = nfs42_proc_clone(src_file, dst_file, src_off, dst_off, count);
>
> The NFS client code will accept REMAP_FILE_DEDUP through remap_flags,
> which is how dedupe requests are sent to filesystems nowadays. The nfs
> client code does not itself compare the file contents, but it does issue
> a CLONE command to the NFS server. The end result, AFAICT, is that a
> user program can write 'A's to file1, 'B's to file2, issue a dedup
> ioctl to the kernel, and have a block of 'B's mapped into file1. That's
> broken behavior, according to the dedup ioctl manpage.
>
> Notice how remap_flags is checked but is not included in the
> nfs42_proc_clone call? That's how I conclude that the NFS client cannot
> possibly be sending the dedup request to the server.
>
> The same goes for fs/cifs/cifsfs.c:
>
> static loff_t cifs_remap_file_range(struct file *src_file, loff_t off,
> struct file *dst_file, loff_t destoff, loff_t len,
> unsigned int remap_flags)
> {
> <local variable declarations>
>
> if (remap_flags & ~(REMAP_FILE_DEDUP | REMAP_FILE_ADVISORY))
> return -EINVAL;
>
> <check files, lock inodes, flush pages>
>
> if (target_tcon->ses->server->ops->duplicate_extents)
> rc = target_tcon->ses->server->ops->duplicate_extents(xid,
> smb_file_src, smb_file_target, off, len, destoff);
> else
> rc = -EOPNOTSUPP;
>
> Again, remap_flags is checked here but it has no influence over the
> ->duplicate_extents call.
>
> Next I got to thinking that when I reworked the clone/dedupe code last
> year, I didn't include REMAP_FILE_DEDUP support for cifs or nfs, because
> as far as I knew, neither protocol supports a verb for deduplication.
> The remap_flags checks were modified to allow REMAP_FILE_DEDUP in
> commits ce96e888fe48e (NFS) and b073a08016a10 (CIFS) with this
> justification (the cifs commit has a similar message):
>
> "Subject: Fix nfs4.2 return -EINVAL when do dedupe operation
>
> "dedupe_file_range operations is combiled into remap_file_range.
> " But in nfs42_remap_file_range, it's skiped for dedupe operations.
> " Before this patch:
> " # dd if=/dev/zero of=nfs/file bs=1M count=1
> " # xfs_io -c "dedupe nfs/file 4k 64k 4k" nfs/file
> " XFS_IOC_FILE_EXTENT_SAME: Invalid argument
> " After this patch:
> " # dd if=/dev/zero of=nfs/file bs=1M count=1
> " # xfs_io -c "dedupe nfs/file 4k 64k 4k" nfs/file
> " deduped 4096/4096 bytes at offset 65536
> " 4 KiB, 1 ops; 0.0046 sec (865.988 KiB/sec and 216.4971 ops/sec)"
>
> This sort of looks like monkeypatching to make an error message go away.
> One could argue that this ought to return EOPNOTSUPP instead of EINVAL,
> and maybe that's what should've happened.
>
> So, uh, do NFS and CIFS both support server-side dedupe now, or are
> these patches just plain wrong?
>
> No, they're just wrong, because I can corrupt files like so on NFS:
>
> $ rm -rf urk moo
> $ xfs_io -f -c "pwrite -S 0x58 0 31048" urk
> wrote 31048/31048 bytes at offset 0
> 30 KiB, 8 ops; 0.0000 sec (569.417 MiB/sec and 153846.1538 ops/sec)
> $ xfs_io -f -c "pwrite -S 0x59 0 31048" moo
> wrote 31048/31048 bytes at offset 0
> 30 KiB, 8 ops; 0.0001 sec (177.303 MiB/sec and 47904.1916 ops/sec)
> $ md5sum urk moo
> 37d3713e5f9c4fe0f8a1f813b27cb284 urk
> a5b6f953f27aa17e42450ff4674fa2df moo
> $ xfs_io -c "dedupe urk 0 0 4096" moo
> deduped 4096/4096 bytes at offset 0
> 4 KiB, 1 ops; 0.0012 sec (3.054 MiB/sec and 781.8608 ops/sec)
> $ md5sum urk moo
> 37d3713e5f9c4fe0f8a1f813b27cb284 urk
> 2c992d70131c489da954f1d96d8c456e moo
>
> (Not sure about cifs, since I don't have a Windows Server handy)
>
> I'm not an expert in CIFS or NFS, so I'm asking: do either support
> dedupe or is this a kernel bug?

NFS does not support dedupe and only supports cloning (whole) files.

2019-05-31 15:37:10

by Trond Myklebust

Subject: Re: NFS & CIFS support dedupe now?? Was: Re: [PATCH] generic/517: notrun on NFS due to unaligned dedupe in test

On Fri, 31 May 2019 at 11:25, Olga Kornievskaia <[email protected]> wrote:
>
> On Thu, May 30, 2019 at 12:02 PM Darrick J. Wong
> <[email protected]> wrote:
> >
> > Hi everyone,
> >
> > Murphy Zhou sent a patch to generic/517 in fstests to fix a dedupe
> > failure he was seeing on NFS:
> >
> > On Thu, May 30, 2019 at 05:41:47PM +0800, Murphy Zhou wrote:
> > > NFSv4.2 could pass _require_scratch_dedupe, since the test offset and
> > > size are aligned, while generic/517 is performing unaligned dedupe.
> > > NFS does not support unaligned dedupe now, returns EINVAL.
> > >
> > > Signed-off-by: Murphy Zhou <[email protected]>
> > > ---
> > > tests/generic/517 | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/tests/generic/517 b/tests/generic/517
> > > index 601bb24e..23665782 100755
> > > --- a/tests/generic/517
> > > +++ b/tests/generic/517
> > > @@ -30,6 +30,7 @@ _cleanup()
> > > _supported_fs generic
> > > _supported_os Linux
> > > _require_scratch_dedupe
> > > +$FSTYP == "nfs" && _notrun "NFS can't handle unaligned deduplication"
> >
> > I was surprised to see a dedupe fix for NFS since (at least to my
> > knowledge) neither of these two network filesystems actually support
> > server-side deduplication commands, and therefore the
> > _require_scratch_dedupe should have _notrun the test.
> >
> > Then I looked at fs/nfs/nfs4file.c:
> >
> > static loff_t nfs42_remap_file_range(struct file *src_file, loff_t src_off,
> > struct file *dst_file, loff_t dst_off, loff_t count,
> > unsigned int remap_flags)
> > {
> > <local variable declarations>
> >
> > if (remap_flags & ~(REMAP_FILE_DEDUP | REMAP_FILE_ADVISORY))
> > return -EINVAL;
> >
> > <check alignment, lock inodes, flush pending writes>
> >
> > ret = nfs42_proc_clone(src_file, dst_file, src_off, dst_off, count);
> >
> > The NFS client code will accept REMAP_FILE_DEDUP through remap_flags,
> > which is how dedupe requests are sent to filesystems nowadays. The nfs
> > client code does not itself compare the file contents, but it does issue
> > a CLONE command to the NFS server. The end result, AFAICT, is that a
> > user program can write 'A's to file1, 'B's to file2, issue a dedup
> > ioctl to the kernel, and have a block of 'B's mapped into file1. That's
> > broken behavior, according to the dedup ioctl manpage.
> >
> > Notice how remap_flags is checked but is not included in the
> > nfs42_proc_clone call? That's how I conclude that the NFS client cannot
> > possibly be sending the dedup request to the server.
> >
> > The same goes for fs/cifs/cifsfs.c:
> >
> > static loff_t cifs_remap_file_range(struct file *src_file, loff_t off,
> > struct file *dst_file, loff_t destoff, loff_t len,
> > unsigned int remap_flags)
> > {
> > <local variable declarations>
> >
> > if (remap_flags & ~(REMAP_FILE_DEDUP | REMAP_FILE_ADVISORY))
> > return -EINVAL;
> >
> > <check files, lock inodes, flush pages>
> >
> > if (target_tcon->ses->server->ops->duplicate_extents)
> > rc = target_tcon->ses->server->ops->duplicate_extents(xid,
> > smb_file_src, smb_file_target, off, len, destoff);
> > else
> > rc = -EOPNOTSUPP;
> >
> > Again, remap_flags is checked here but it has no influence over the
> > ->duplicate_extents call.
> >
> > Next I got to thinking that when I reworked the clone/dedupe code last
> > year, I didn't include REMAP_FILE_DEDUP support for cifs or nfs, because
> > as far as I knew, neither protocol supports a verb for deduplication.
> > The remap_flags checks were modified to allow REMAP_FILE_DEDUP in
> > commits ce96e888fe48e (NFS) and b073a08016a10 (CIFS) with this
> > justification (the cifs commit has a similar message):
> >
> > "Subject: Fix nfs4.2 return -EINVAL when do dedupe operation
> >
> > "dedupe_file_range operations is combiled into remap_file_range.
> > " But in nfs42_remap_file_range, it's skiped for dedupe operations.
> > " Before this patch:
> > " # dd if=/dev/zero of=nfs/file bs=1M count=1
> > " # xfs_io -c "dedupe nfs/file 4k 64k 4k" nfs/file
> > " XFS_IOC_FILE_EXTENT_SAME: Invalid argument
> > " After this patch:
> > " # dd if=/dev/zero of=nfs/file bs=1M count=1
> > " # xfs_io -c "dedupe nfs/file 4k 64k 4k" nfs/file
> > " deduped 4096/4096 bytes at offset 65536
> > " 4 KiB, 1 ops; 0.0046 sec (865.988 KiB/sec and 216.4971 ops/sec)"
> >
> > This sort of looks like monkeypatching to make an error message go away.
> > One could argue that this ought to return EOPNOTSUPP instead of EINVAL,
> > and maybe that's what should've happened.
> >
> > So, uh, do NFS and CIFS both support server-side dedupe now, or are
> > these patches just plain wrong?
> >
> > No, they're just wrong, because I can corrupt files like so on NFS:
> >
> > $ rm -rf urk moo
> > $ xfs_io -f -c "pwrite -S 0x58 0 31048" urk
> > wrote 31048/31048 bytes at offset 0
> > 30 KiB, 8 ops; 0.0000 sec (569.417 MiB/sec and 153846.1538 ops/sec)
> > $ xfs_io -f -c "pwrite -S 0x59 0 31048" moo
> > wrote 31048/31048 bytes at offset 0
> > 30 KiB, 8 ops; 0.0001 sec (177.303 MiB/sec and 47904.1916 ops/sec)
> > $ md5sum urk moo
> > 37d3713e5f9c4fe0f8a1f813b27cb284 urk
> > a5b6f953f27aa17e42450ff4674fa2df moo
> > $ xfs_io -c "dedupe urk 0 0 4096" moo
> > deduped 4096/4096 bytes at offset 0
> > 4 KiB, 1 ops; 0.0012 sec (3.054 MiB/sec and 781.8608 ops/sec)
> > $ md5sum urk moo
> > 37d3713e5f9c4fe0f8a1f813b27cb284 urk
> > 2c992d70131c489da954f1d96d8c456e moo
> >
> > (Not sure about cifs, since I don't have a Windows Server handy)
> >
> > I'm not an expert in CIFS or NFS, so I'm asking: do either support
> > dedupe or is this a kernel bug?
>
> NFS does not support dedupe and only supports cloning (whole) files.

That is not quite true. It does support range based cloning, and can
even support cloning parts of a file onto itself (as long as the
source and target ranges do not overlap). However it does not support
the kind of conditional cloning that I understand from Darrick is
needed for dedup.
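
As an illustration of the non-overlap condition for cloning within a
single file (a userspace sketch only; ranges_overlap is a made-up helper
and not the check the NFS client or server actually performs):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [src_off, src_off + len) and [dst_off, dst_off + len)
 * share at least one byte. Comparing the distance between the two
 * offsets against len avoids overflow in off + len. */
static bool ranges_overlap(uint64_t src_off, uint64_t dst_off,
                           uint64_t len)
{
    if (len == 0)
        return false;
    if (src_off <= dst_off)
        return dst_off - src_off < len;
    return src_off - dst_off < len;
}
```

A same-file clone would only be attempted when this returns false for
the requested source and target ranges.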

Cheers
Trond