2011-12-14 19:22:13

by Ric Wheeler

Subject: copy offload support in Linux - new system call needed?


Back at LinuxCon Prague, we talked about the new NFS and SCSI commands that let
us offload copy operations to a storage device (like an NFS server or storage
array).

This got new life in the virtual machine world where you might want to clone
bulky guest files or ranges of blocks and was driven through the standards
bodies by vmware, microsoft and some of the major storage vendors. Windows8 has
this functionality fully coded and integrated in the GUI, I assume vmware also
uses it and there are some vendors who announced support at the SNIA SDC conference.

We had an active thread a couple of years back that came out of the reflink work
and, at the time, there seemed to be moderately positive support for adding a
new system call that would fit this use case (Joel Becker's copyfile()).

Can we resurrect this effort? Is copyfile() still a good way to go, or should we
look at other hooks?

Thanks!

Ric





2011-12-15 14:59:08

by Myklebust, Trond

Subject: Re: copy offload support in Linux - new system call needed?

On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote:
> On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> > On 12/14/2011 02:27 PM, Al Viro wrote:
> > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> > >
> > >>We had an active thread a couple of years back that came out of the
> > >>reflink work and, at the time, there seemed to be moderately
> > >>positive support for adding a new system call that would fit this
> > >>use case (Joel Becker's copyfile()).
> > >>
> > >>Can we resurrect this effort? Is copyfile() still a good way to go,
> > >>or should we look at other hooks?
> > >copyfile(2) is probably a good way to go, provided that we do _not_
> > >go baroque as it had happened the last time syscall had been discussed.
> > >
> > >IOW, to hell with progress reports, etc. - just a fastpath kind of
> > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> > >If it works - fine, if not - caller has to be ready to deal with handling
> > >cross-device case anyway.
> >
> > I think that this approach makes a lot of sense. Most of the
> > devices/targets that support the copy offload, will do it in very
> > reasonable amounts of time.
>
> The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
> one operation:
>
> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2
>
> Perhaps we should ask for separate operations for the two cases. (Or at
> least a "please don't bother if this is going to take 8 hours" flag....)

How would the server know? I suggest we deal with this by adding an
ioctl() to allow the application to poll for progress: I'm assuming now
that we don't expect more than 1 copyfile() system call at a time per
file descriptor...

Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com
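
To make the polling model above concrete, here is a minimal userspace sketch. The FICOPYPROGRESS ioctl and struct copy_progress are invented names for illustration only; no such interface exists in the kernel:

    #include <stdio.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/ioctl.h>

    /* Invented for illustration: a per-fd progress query for an in-flight copy. */
    struct copy_progress {
        uint64_t bytes_copied;          /* completed so far */
        uint64_t bytes_total;           /* length of the requested copy */
    };

    #define FICOPYPROGRESS _IOR('f', 0x42, struct copy_progress)   /* hypothetical */

    /* Poll the copy that was started on dst_fd by another thread or process. */
    static int report_progress(int dst_fd)
    {
        struct copy_progress cp;

        if (ioctl(dst_fd, FICOPYPROGRESS, &cp) < 0)
            return -1;                  /* no copy in flight, or unsupported */

        printf("copied %llu of %llu bytes\n",
               (unsigned long long)cp.bytes_copied,
               (unsigned long long)cp.bytes_total);
        return 0;
    }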


2011-12-16 08:00:31

by Joel Becker

Subject: Re: copy offload support in Linux - new system call needed?

On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> On 12/14/2011 02:27 PM, Al Viro wrote:
> >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> >
> >>We had an active thread a couple of years back that came out of the
> >>reflink work and, at the time, there seemed to be moderately
> >>positive support for adding a new system call that would fit this
> >>use case (Joel Becker's copyfile()).
> >>
> >>Can we resurrect this effort? Is copyfile() still a good way to go,
> >>or should we look at other hooks?
> >copyfile(2) is probably a good way to go, provided that we do _not_
> >go baroque as it had happened the last time syscall had been discussed.
> >
> >IOW, to hell with progress reports, etc. - just a fastpath kind of
> >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> >If it works - fine, if not - caller has to be ready to deal with handling
> >cross-device case anyway.
>
> I think that this approach makes a lot of sense. Most of the
> devices/targets that support the copy offload, will do it in very
> reasonable amounts of time.
>
> Let me see if I can dig up some of the presentations from the NetApp
> guys who presented overviews or the specifications from the IETF and
> T10....

Whee! I've been down the rabbit hole, but I've promised myself
to get the updated patch out soon. I know that Trond et al are probably
wondering what happened to the patch. More soon.

Joel

--

Life's Little Instruction Book #207

"Swing for the fence."

http://www.jlbec.org/
[email protected]

2011-12-14 20:30:17

by Ric Wheeler

Subject: Re: copy offload support in Linux - new system call needed?

On 12/14/2011 02:59 PM, Jeremy Allison wrote:
> On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
>> Back at LinuxCon Prague, we talked about the new NFS and SCSI
>> commands that let us offload copy operations to a storage device
>> (like an NFS server or storage array).
>>
>> This got new life in the virtual machine world where you might want
>> to clone bulky guest files or ranges of blocks and was driven
>> through the standards bodies by vmware, microsoft and some of the
>> major storage vendors. Windows8 has this functionality fully coded
>> and integrated in the GUI, I assume vmware also uses it and there
>> are some vendors who announced support at the SNIA SDC conference.
>>
>> We had an active thread a couple of years back that came out of the
>> reflink work and, at the time, there seemed to be moderately
>> positive support for adding a new system call that would fit this
>> use case (Joel Becker's copyfile()).
>>
>> Can we resurrect this effort? Is copyfile() still a good way to go,
>> or should we look at other hooks?
> Windows uses a COPYCHUNK call, which specifies the
> following parameters:
>
> Definition of a copy "chunk":
>
> hyper source_off;
> hyper target_off;
> uint32 length;
>
> and an array of these chunks which is passed
> into their kernel.
>
> This is what we have to implement in Samba.
>
> Jeremy.

This is a public pointer to the draft NFS proposal:

http://tools.ietf.org/id/draft-lentini-nfsv4-server-side-copy-06.txt

The T10 site has some click through that I was not too happy about agreeing to.
NetApp (Fred Knight) had some nice presentations that he presented about how
SCSI does this in two different ways...

Ric




2011-12-15 16:53:39

by Myklebust, Trond

Subject: RE: copy offload support in Linux - new system call needed?

On Thu, 2011-12-15 at 11:40 -0500, Loke, Chetan wrote:
> > >
> > > Why not support something like the async-iocb?
> >
> > You could, but that would tie copyfile() to the aio interface which was
> > one of the things that I believe Al was opposed to when we discussed
> > this at LSF/MM-2010.
> >
>
> virtualization vendors who support this offload do it at a layer above the guest-OS(Intra-LUN(tm) locking or whatever fancy locking). So I think 'copyfile' is going to be appealing to application-developers more than the hypervisor-vendors.

The application is thin provisioning, not the 'cp' command. When
virtualisation vendors do support this, it will mainly be as part of
their image management toolkits, not the hypervisor.

> So let's think about it from end-users perspective:
> Won't everyone replicate code to check - 'Am I done'? It will just make application folks write more (ugly)code. Because you would then have to maintain another queue/etc to check for this operation.

'Am I done' is easy: copyfile() returns with the number of bytes that
have been copied.

'Is my copyfile() syscall making progress' is the question that needs
answering.

> We can just support full-copy. Partial copies can be returned as failure.

Then you have to check the entire range on error instead of just
resuming the copy from where it stopped.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com
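
A small sketch of the resume-from-partial-copy behaviour argued for above, assuming a copyfile() that returns the number of bytes actually copied (the syscall does not exist; the prototype is assumed):

    #include <sys/types.h>

    /* Assumed prototype: returns the number of bytes copied, or -1 on error. */
    extern ssize_t copyfile(int src_fd, int dst_fd, off_t off, size_t len);

    /* Resume from where the last attempt stopped instead of rechecking the range. */
    static int copy_with_resume(int src_fd, int dst_fd, off_t off, size_t len)
    {
        while (len > 0) {
            ssize_t n = copyfile(src_fd, dst_fd, off, len);

            if (n < 0)
                return -1;      /* hard failure */
            if (n == 0)
                break;          /* no forward progress, give up */
            off += n;           /* partial copy: continue after it */
            len -= n;
        }
        return len == 0 ? 0 : -1;
    }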


2011-12-15 17:27:57

by Loke, Chetan

Subject: RE: copy offload support in Linux - new system call needed?

> > virtualization vendors who support this offload do it at a layer
> above the guest-OS(Intra-LUN(tm) locking or whatever fancy locking). So
> I think 'copyfile' is going to be appealing to application-developers
> more than the hypervisor-vendors.
>
> The application is thin provisioning, not the 'cp' command. When

thin-provisioning is one use-case. There are quite a few use-cases of 'copyfile' depending on your business-logic and the type of appliance you sell.

> virtualisation vendors do support this, it will mainly be as part of
> their image management toolkits, not the hypervisor.
>

Toolkits? May not be true. The toolkit might need to talk to some hypervisor-component to ensure LUN-locking etc on the target. So this is not entirely isolated as you might think. There is some integration. As an example(just to prove the point) - Have you ever seen anyone not use vsphere-client on VMware for copying VM templates?

> > So let's think about it from end-users perspective:
> > Won't everyone replicate code to check - 'Am I done'? It will just
> make application folks write more (ugly)code. Because you would then
> have to maintain another queue/etc to check for this operation.
>
> 'Am I done' is easy: copyfile() returns with the number of bytes that
> have been copied.
>
> 'Is my copyfile() syscall making progress' is the question that needs
> answering.
>

Understood. But as a user, we don't know what 'am I done' is going to report.

'am I done' can return:
1)ACK[copy done] - simplistic case.
2)IN-progress.
3)NACK[copy failed(with status values) or copy partially completed]

And if you are using the copy-VM use-case then very few VMs are under 4GBs. So we will hit 2) above more frequently than 1) and 3).

> > We can just support full-copy. Partial copies can be returned as
> failure.
>
> Then you have to check the entire range on error instead of just
> resuming the copy from where it stopped.
>

Why not restart? What if the LUN was implementing thin-provisioning and now it ran out-of-space after partially copying your data.
So why not restart the copy? If the target doesn't support auto-extend, someone(storage-admin etc) would have to step-in and manage that LUN.
You might as-well restart the copy in this case.


Chetan Loke

2011-12-19 12:38:25

by Hannes Reinecke

Subject: Re: copy offload support in Linux - new system call needed?

On 12/14/2011 09:30 PM, Ric Wheeler wrote:
> On 12/14/2011 02:59 PM, Jeremy Allison wrote:
>> On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
>>> Back at LinuxCon Prague, we talked about the new NFS and SCSI
>>> commands that let us offload copy operations to a storage device
>>> (like an NFS server or storage array).
>>>
>>> This got new life in the virtual machine world where you might want
>>> to clone bulky guest files or ranges of blocks and was driven
>>> through the standards bodies by vmware, microsoft and some of the
>>> major storage vendors. Windows8 has this functionality fully coded
>>> and integrated in the GUI, I assume vmware also uses it and there
>>> are some vendors who announced support at the SNIA SDC conference.
>>>
>>> We had an active thread a couple of years back that came out of the
>>> reflink work and, at the time, there seemed to be moderately
>>> positive support for adding a new system call that would fit this
>>> use case (Joel Becker's copyfile()).
>>>
>>> Can we resurrect this effort? Is copyfile() still a good way to go,
>>> or should we look at other hooks?
>> Windows uses a COPYCHUNK call, which specifies the
>> following parameters:
>>
>> Definition of a copy "chunk":
>>
>> hyper source_off;
>> hyper target_off;
>> uint32 length;
>>
>> and an array of these chunks which is passed
>> into their kernel.
>>
>> This is what we have to implement in Samba.
>>
>> Jeremy.
>
> This is a public pointer to the draft NFS proposal:
>
> http://tools.ietf.org/id/draft-lentini-nfsv4-server-side-copy-06.txt
>
> The T10 site has some click through that I was not too happy about
> agreeing to. NetApp (Fred Knight) had some nice presentations that
> he presented about how SCSI does this in two different ways...
>

Yes, the 'XCOPY Lite' mechanism.

With that the whole copy process is broken into two steps:
- Create a reference to the requested blocks
- Use that reference to request the operation

The neat thing with that is that there might be some delay between
those steps, effectively creating a snapshot in time.

An additional bonus is that one doesn't have to create those
over-complicated source and target descriptors, but rather have the
array create one for you.

So all-in-all nice and easy to use. With the slight disadvantage
that no-one implements it. Yet.

Hence we might be wanting to use the old-style EXTENDED COPY after
all ...

However, both approaches have in common that an opaque 'identifier'
is used to identify any currently running copy process.

So when designing this interface we should keep in mind that we
would need to store this identifier somewhere. And as loath as I am to
admit it, the async-I/O mechanism would fit the bill far better than
a single copyfile() call ...

Which could be easily implemented on top of the Async I/O call, btw.

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
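
A sketch of the two-step flow Hannes outlines above. populate_token() and copy_using_token() are hypothetical stand-ins for the "create a reference" and "use that reference" operations, and the opaque token type is invented for illustration; the point is only that the identifier must be kept somewhere between the two calls:

    #include <stdint.h>

    /* Invented opaque identifier returned by the array for the source blocks. */
    struct copy_token {
        uint8_t opaque[512];
    };

    /* Hypothetical stand-ins for the two offload steps; neither exists. */
    extern int populate_token(int src_fd, uint64_t off, uint64_t len,
                              struct copy_token *tok);
    extern int copy_using_token(int dst_fd, uint64_t off,
                                const struct copy_token *tok);

    static int offload_copy(int src_fd, int dst_fd, uint64_t off, uint64_t len)
    {
        struct copy_token tok;
        int ret;

        /* Step 1: create a point-in-time reference to the source blocks. */
        ret = populate_token(src_fd, off, len, &tok);
        if (ret < 0)
            return ret;

        /*
         * Some delay is allowed here; the token effectively snapshots the
         * source range at the time of step 1.
         */

        /* Step 2: hand the reference back and let the array do the copy. */
        return copy_using_token(dst_fd, off, &tok);
    }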

2011-12-15 17:25:40

by Myklebust, Trond

Subject: Re: copy offload support in Linux - new system call needed?

On Thu, 2011-12-15 at 12:18 -0500, Ric Wheeler wrote:
> What I would like to see is a way to make sure that we can interrupt any long
> running command & also make sure that our timeouts (for SCSI specifically) are
> not too aggressive.

The draft NFSv4.2 protocol contains features to make interruption
possible, so as far as the NFS client is concerned, that should be
doable. I can't answer for CIFS or SCSI...

Cheers
Trond

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-12-15 15:52:37

by Chris Mason

Subject: Re: copy offload support in Linux - new system call needed?

On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote:
> On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote:
> > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> > > On 12/14/2011 02:27 PM, Al Viro wrote:
> > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> > > >
> > > >>We had an active thread a couple of years back that came out of the
> > > >>reflink work and, at the time, there seemed to be moderately
> > > >>positive support for adding a new system call that would fit this
> > > >>use case (Joel Becker's copyfile()).
> > > >>
> > > >>Can we resurrect this effort? Is copyfile() still a good way to go,
> > > >>or should we look at other hooks?
> > > >copyfile(2) is probably a good way to go, provided that we do _not_
> > > >go baroque as it had happened the last time syscall had been discussed.
> > > >
> > > >IOW, to hell with progress reports, etc. - just a fastpath kind of
> > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> > > >If it works - fine, if not - caller has to be ready to deal with handling
> > > >cross-device case anyway.
> > >
> > > I think that this approach makes a lot of sense. Most of the
> > > devices/targets that support the copy offload, will do it in very
> > > reasonable amounts of time.
> >
> > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
> > one operation:
> >
> > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2
> >
> > Perhaps we should ask for separate operations for the two cases. (Or at
> > least a "please don't bother if this is going to take 8 hours" flag....)
>
> How would the server know? I suggest we deal with this by adding an
> ioctl() to allow the application to poll for progress: I'm assuming now
> that we don't expect more than 1 copyfile() system call at a time per
> file descriptor...

If we're using this to copy VM image files, I could easily imagine
wanting to clone multiple copies of the VM in parallel.

-chris


2011-12-14 19:59:34

by Jeremy Allison

Subject: Re: copy offload support in Linux - new system call needed?

On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
>
> Back at LinuxCon Prague, we talked about the new NFS and SCSI
> commands that let us offload copy operations to a storage device
> (like an NFS server or storage array).
>
> This got new life in the virtual machine world where you might want
> to clone bulky guest files or ranges of blocks and was driven
> through the standards bodies by vmware, microsoft and some of the
> major storage vendors. Windows8 has this functionality fully coded
> and integrated in the GUI, I assume vmware also uses it and there
> are some vendors who announced support at the SNIA SDC conference.
>
> We had an active thread a couple of years back that came out of the
> reflink work and, at the time, there seemed to be moderately
> positive support for adding a new system call that would fit this
> use case (Joel Becker's copyfile()).
>
> Can we resurrect this effort? Is copyfile() still a good way to go,
> or should we look at other hooks?

Windows uses a COPYCHUNK call, which specifies the
following parameters:

Definition of a copy "chunk":

hyper source_off;
hyper target_off;
uint32 length;

and an array of these chunks which is passed
into their kernel.

This is what we have to implement in Samba.

Jeremy.
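
A rough C rendering of the chunk array Jeremy describes, mapping the "hyper" fields to 64-bit integers; the layout here is illustrative, not the actual SMB2 wire format that Samba marshals:

    #include <stdint.h>
    #include <stdlib.h>

    /* One COPYCHUNK element: copy 'length' bytes from source_off to target_off. */
    struct copychunk {
        uint64_t source_off;    /* "hyper" in the description above */
        uint64_t target_off;
        uint32_t length;
    };

    /* A request is simply an array of such chunks handed down in one call. */
    struct copychunk_request {
        uint32_t chunk_count;
        struct copychunk *chunks;
    };

    static struct copychunk_request *build_request(uint32_t count)
    {
        struct copychunk_request *req = calloc(1, sizeof(*req));

        if (!req)
            return NULL;
        req->chunks = calloc(count, sizeof(*req->chunks));
        if (!req->chunks) {
            free(req);
            return NULL;
        }
        req->chunk_count = count;
        return req;
    }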

2011-12-19 23:02:30

by Dave Chinner

Subject: Re: copy offload support in Linux - new system call needed?

On Mon, Dec 19, 2011 at 02:19:43PM -0800, H. Peter Anvin wrote:
> On 12/14/2011 11:59 AM, Jeremy Allison wrote:
> >>
> >> Can we resurrect this effort? Is copyfile() still a good way to go,
> >> or should we look at other hooks?
> >
> > Windows uses a COPYCHUNK call, which specifies the
> > following parameters:
> >
> > Definition of a copy "chunk":
> >
> > hyper source_off;
> > hyper target_off;
> > uint32 length;
> >
> > and an array of these chunks which is passed
> > into their kernel.
> >
> > This is what we have to implement in Samba.
> >
>
> Could we do this by (re-)allowing sendfile() between two files?

That was my immediate thought, but sendfile has plumbing that is
page cache based and we require completely different infrastructure
and semantics for an array offload.

e.g. for an array offload, we have to flush the source file page
cache first so that the data being copied is known to be on disk,
then invalidate the destination page cache if overwriting or extend
and pre-allocate blocks if not. Then we have to map both files and
hand that off to the array.

Then there's a whole bunch of tricky questions about what the state
of the destination file should look like while the copy is in
progress, whether the source file should be allowed to change (e.g.
it can't be truncated and have blocks freed and then reused by other
files half way through the copy offload operation), and so on.

sendfile() has well known, fixed semantics that we can't change to
suit what is needed for an offload operation that could potentially
take hours to complete. Hence I think a new syscall is the way to
go....

Cheers,

Dave.
--
Dave Chinner
[email protected]
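
A userspace approximation of the preparation steps Dave lists, assuming for the sake of illustration that the offload were driven from a user process; a kernel implementation would do the equivalent internally before handing the mapped ranges to the array:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    static int prepare_for_offload(int src_fd, int dst_fd, off_t off, off_t len)
    {
        /* Make sure the source data being copied is actually on disk. */
        if (fsync(src_fd) < 0)
            return -1;

        /* Pre-allocate the destination blocks if we are not overwriting. */
        if (fallocate(dst_fd, 0, off, len) < 0)
            return -1;

        /* Drop any cached destination pages the offload will overwrite. */
        if (posix_fadvise(dst_fd, off, len, POSIX_FADV_DONTNEED) != 0)
            return -1;

        return 0;       /* the ranges can now be handed to the offload */
    }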

2011-12-15 16:38:52

by Myklebust, Trond

Subject: Re: copy offload support in Linux - new system call needed?

On Thu, 2011-12-15 at 11:16 -0500, Jeff Layton wrote:
> On Thu, 15 Dec 2011 11:06:16 -0500
> Trond Myklebust <[email protected]> wrote:
>
> > On Thu, 2011-12-15 at 11:03 -0500, Jeff Layton wrote:
> > > On Thu, 15 Dec 2011 10:52:13 -0500
> > > Chris Mason <[email protected]> wrote:
> > >
> > > > On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote:
> > > > > On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote:
> > > > > > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> > > > > > > On 12/14/2011 02:27 PM, Al Viro wrote:
> > > > > > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> > > > > > > >
> > > > > > > >>We had an active thread a couple of years back that came out of the
> > > > > > > >>reflink work and, at the time, there seemed to be moderately
> > > > > > > >>positive support for adding a new system call that would fit this
> > > > > > > >>use case (Joel Becker's copyfile()).
> > > > > > > >>
> > > > > > > >>Can we resurrect this effort? Is copyfile() still a good way to go,
> > > > > > > >>or should we look at other hooks?
> > > > > > > >copyfile(2) is probably a good way to go, provided that we do _not_
> > > > > > > >go baroque as it had happened the last time syscall had been discussed.
> > > > > > > >
> > > > > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of
> > > > > > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> > > > > > > >If it works - fine, if not - caller has to be ready to deal with handling
> > > > > > > >cross-device case anyway.
> > > > > > >
> > > > > > > I think that this approach makes a lot of sense. Most of the
> > > > > > > devices/targets that support the copy offload, will do it in very
> > > > > > > reasonable amounts of time.
> > > > > >
> > > > > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
> > > > > > one operation:
> > > > > >
> > > > > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2
> > > > > >
> > > > > > Perhaps we should ask for separate operations for the two cases. (Or at
> > > > > > least a "please don't bother if this is going to take 8 hours" flag....)
> > > > >
> > > > > How would the server know? I suggest we deal with this by adding an
> > > > > ioctl() to allow the application to poll for progress: I'm assuming now
> > > > > that we don't expect more than 1 copyfile() system call at a time per
> > > > > file descriptor...
> > > >
> > > > If we're using this to copy VM image files, I could easily imagine
> > > > wanting to clone multiple copies of the VM in parallel.
> > > >
> > > > -chris
> > > >
> > >
> > > Not really a problem is it? Just dup() the fd before you issue the
> > > copyfile()? Or even simpler, just do periodic stat() on the destination
> > > file if you want a progress report.
> > >
> > > Regardless, I like the simple approach that Al is suggesting here.
> >
> > Periodic stat() isn't good enough if you are copying subranges of a
> > file. Part of the application here (as I understood it) is to initialise
> > specific disk volumes on existing VM images when doing thin
> > provisioning. In that case, the reported image size won't ever change...
> >
>
> If they were sparse files then st_blocks would presumably change, but
> that's not necessarily going to be the case. So, ok stat() is out for
> this...
>
> What's the use-case for these sorts of progress reports anyway?
> Progress meters in GUI apps?

Mainly... If you are copying several GB worth of data, you expect it to
take some time, but you'd like to know that the server hasn't just
crashed or something...

> Either way, I think adding as simple an interface as possible to begin
> with makes sense. If you want to add progress reports or other
> doohickeys later, then that can be done in a separate set of patches...

Agreed. ...and doing it as an ioctl allows for that. I just want to make
sure someone else here doesn't have a use case that might blow that idea
out of the water...

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-12-15 17:44:37

by J. Bruce Fields

Subject: Re: copy offload support in Linux - new system call needed?

On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote:
> On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote:
> > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> > > On 12/14/2011 02:27 PM, Al Viro wrote:
> > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> > > >
> > > >>We had an active thread a couple of years back that came out of the
> > > >>reflink work and, at the time, there seemed to be moderately
> > > >>positive support for adding a new system call that would fit this
> > > >>use case (Joel Becker's copyfile()).
> > > >>
> > > >>Can we resurrect this effort? Is copyfile() still a good way to go,
> > > >>or should we look at other hooks?
> > > >copyfile(2) is probably a good way to go, provided that we do _not_
> > > >go baroque as it had happened the last time syscall had been discussed.
> > > >
> > > >IOW, to hell with progress reports, etc. - just a fastpath kind of
> > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> > > >If it works - fine, if not - caller has to be ready to deal with handling
> > > >cross-device case anyway.
> > >
> > > I think that this approach makes a lot of sense. Most of the
> > > devices/targets that support the copy offload, will do it in very
> > > reasonable amounts of time.
> >
> > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
> > one operation:
> >
> > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2
> >
> > Perhaps we should ask for separate operations for the two cases. (Or at
> > least a "please don't bother if this is going to take 8 hours" flag....)
>
> How would the server know?

Sorry, "8 hours" was a joke--no, you can't require the server to predict
whether an operation will take more or less than some precise duration.

I'm assuming the "fast" case that Al's proposing we do as a first step
covers CoW operations? (So O(1) or close to it, users typically won't be
asking for progress reports, operation may be atomic (with no
partial-failure case), ...)

--b.

2011-12-19 22:20:44

by H. Peter Anvin

Subject: Re: copy offload support in Linux - new system call needed?

On 12/14/2011 11:59 AM, Jeremy Allison wrote:
>>
>> Can we resurrect this effort? Is copyfile() still a good way to go,
>> or should we look at other hooks?
>
> Windows uses a COPYCHUNK call, which specifies the
> following parameters:
>
> Definition of a copy "chunk":
>
> hyper source_off;
> hyper target_off;
> uint32 length;
>
> and an array of these chunks which is passed
> into their kernel.
>
> This is what we have to implement in Samba.
>

Could we do this by (re-)allowing sendfile() between two files?

-hpa


2011-12-15 17:58:44

by Ric Wheeler

Subject: Re: copy offload support in Linux - new system call needed?

On 12/15/2011 12:31 PM, Loke, Chetan wrote:
>> I think that hypervisor vendors will be very interested in this feature
>> which
>> would explain why vmware was active in drafting both the NFS and T10
> Specs are the only way to convince storage-target-vendors ;). Otherwise target-stack will need to implement multiple custom-CDB-handlers for different front-end APIs(which is ugly).
>
>
> Chetan

Hi Chetan,

I should post from my "Red Hat" email to make this less confusing for you - I
know that this is in fact interesting to vendors :)

Thanks!

Ric


2011-12-14 22:27:30

by J. Bruce Fields

Subject: Re: copy offload support in Linux - new system call needed?

On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> On 12/14/2011 02:27 PM, Al Viro wrote:
> >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> >
> >>We had an active thread a couple of years back that came out of the
> >>reflink work and, at the time, there seemed to be moderately
> >>positive support for adding a new system call that would fit this
> >>use case (Joel Becker's copyfile()).
> >>
> >>Can we resurrect this effort? Is copyfile() still a good way to go,
> >>or should we look at other hooks?
> >copyfile(2) is probably a good way to go, provided that we do _not_
> >go baroque as it had happened the last time syscall had been discussed.
> >
> >IOW, to hell with progress reports, etc. - just a fastpath kind of
> >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> >If it works - fine, if not - caller has to be ready to deal with handling
> >cross-device case anyway.
>
> I think that this approach makes a lot of sense. Most of the
> devices/targets that support the copy offload, will do it in very
> reasonable amounts of time.

The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
one operation:

http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2

Perhaps we should ask for separate operations for the two cases. (Or at
least a "please don't bother if this is going to take 8 hours" flag....)

--b.

>
> Let me see if I can dig up some of the presentations from the NetApp
> guys who presented overviews or the specifications from the IETF and
> T10....
>
> Ric
>

2011-12-19 22:34:10

by Jeremy Allison

Subject: Re: copy offload support in Linux - new system call needed?

On Mon, Dec 19, 2011 at 02:19:43PM -0800, H. Peter Anvin wrote:
> On 12/14/2011 11:59 AM, Jeremy Allison wrote:
> >>
> >> Can we resurrect this effort? Is copyfile() still a good way to go,
> >> or should we look at other hooks?
> >
> > Windows uses a COPYCHUNK call, which specifies the
> > following parameters:
> >
> > Definition of a copy "chunk":
> >
> > hyper source_off;
> > hyper target_off;
> > uint32 length;
> >
> > and an array of these chunks which is passed
> > into their kernel.
> >
> > This is what we have to implement in Samba.
> >
>
> Could we do this by (re-)allowing sendfile() between two files?

Oooh - nice idea! Yes, having a completely symmetric sendfile
which allows socket -> file, file -> socket, socket -> socket,
file -> file would be a great idea (IMHO).

Jeremy.
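
For reference, sendfile(2) accepts a regular file as out_fd again on kernels since 2.6.33 (it was socket-only for much of 2.6), so a plain, non-offloaded page-cache file-to-file copy can already be written; the symmetric variant discussed above would generalise this:

    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Page-cache file-to-file copy via sendfile(2); needs Linux >= 2.6.33. */
    static int copy_via_sendfile(const char *src, const char *dst)
    {
        int in_fd = open(src, O_RDONLY);
        int out_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        struct stat st;
        off_t off = 0;
        int ret = -1;

        if (in_fd < 0 || out_fd < 0 || fstat(in_fd, &st) < 0)
            goto out;
        while (off < st.st_size) {
            ssize_t n = sendfile(out_fd, in_fd, &off, st.st_size - off);

            if (n <= 0)
                goto out;       /* error, or no forward progress */
        }
        ret = 0;
    out:
        if (in_fd >= 0)
            close(in_fd);
        if (out_fd >= 0)
            close(out_fd);
        return ret;
    }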

2011-12-15 16:00:37

by Myklebust, Trond

Subject: Re: copy offload support in Linux - new system call needed?

On Thu, 2011-12-15 at 10:52 -0500, Chris Mason wrote:
> On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote:
> > On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote:
> > > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> > > > On 12/14/2011 02:27 PM, Al Viro wrote:
> > > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> > > > >
> > > > >>We had an active thread a couple of years back that came out of the
> > > > >>reflink work and, at the time, there seemed to be moderately
> > > > >>positive support for adding a new system call that would fit this
> > > > >>use case (Joel Becker's copyfile()).
> > > > >>
> > > > >>Can we resurrect this effort? Is copyfile() still a good way to go,
> > > > >>or should we look at other hooks?
> > > > >copyfile(2) is probably a good way to go, provided that we do _not_
> > > > >go baroque as it had happened the last time syscall had been discussed.
> > > > >
> > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of
> > > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> > > > >If it works - fine, if not - caller has to be ready to deal with handling
> > > > >cross-device case anyway.
> > > >
> > > > I think that this approach makes a lot of sense. Most of the
> > > > devices/targets that support the copy offload, will do it in very
> > > > reasonable amounts of time.
> > >
> > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
> > > one operation:
> > >
> > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2
> > >
> > > Perhaps we should ask for separate operations for the two cases. (Or at
> > > least a "please don't bother if this is going to take 8 hours" flag....)
> >
> > How would the server know? I suggest we deal with this by adding an
> > ioctl() to allow the application to poll for progress: I'm assuming now
> > that we don't expect more than 1 copyfile() system call at a time per
> > file descriptor...
>
> If we're using this to copy VM image files, I could easily imagine
> wanting to clone multiple copies of the VM in parallel.

Sure, but in that case, your target file descriptors will differ, right?

Cheers
Trond

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-12-15 16:11:41

by Myklebust, Trond

Subject: RE: copy offload support in Linux - new system call needed?

On Thu, 2011-12-15 at 11:08 -0500, Loke, Chetan wrote:
> > How would the server know? I suggest we deal with this by adding an
> > ioctl() to allow the application to poll for progress: I'm assuming now
>
> Why not support something like the async-iocb?

You could, but that would tie copyfile() to the aio interface which was
one of the things that I believe Al was opposed to when we discussed
this at LSF/MM-2010.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-12-15 16:15:33

by Loke, Chetan

Subject: RE: copy offload support in Linux - new system call needed?

> How would the server know? I suggest we deal with this by adding an
> ioctl() to allow the application to poll for progress: I'm assuming now

Why not support something like the async-iocb?

2011-12-14 19:42:43

by Ric Wheeler

Subject: Re: copy offload support in Linux - new system call needed?

On 12/14/2011 02:27 PM, Al Viro wrote:
> On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
>
>> We had an active thread a couple of years back that came out of the
>> reflink work and, at the time, there seemed to be moderately
>> positive support for adding a new system call that would fit this
>> use case (Joel Becker's copyfile()).
>>
>> Can we resurrect this effort? Is copyfile() still a good way to go,
>> or should we look at other hooks?
> copyfile(2) is probably a good way to go, provided that we do _not_
> go baroque as it had happened the last time syscall had been discussed.
>
> IOW, to hell with progress reports, etc. - just a fastpath kind of
> thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> If it works - fine, if not - caller has to be ready to deal with handling
> cross-device case anyway.

I think that this approach makes a lot of sense. Most of the devices/targets
that support the copy offload, will do it in very reasonable amounts of time.

Let me see if I can dig up some of the presentations from the NetApp guys who
presented overviews or the specifications from the IETF and T10....

Ric


2011-12-14 19:27:43

by Al Viro

Subject: Re: copy offload support in Linux - new system call needed?

On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:

> We had an active thread a couple of years back that came out of the
> reflink work and, at the time, there seemed to be moderately
> positive support for adding a new system call that would fit this
> use case (Joel Becker's copyfile()).
>
> Can we resurrect this effort? Is copyfile() still a good way to go,
> or should we look at other hooks?

copyfile(2) is probably a good way to go, provided that we do _not_
go baroque as it had happened the last time syscall had been discussed.

IOW, to hell with progress reports, etc. - just a fastpath kind of
thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
If it works - fine, if not - caller has to be ready to deal with handling
cross-device case anyway.
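
A userspace sketch of the fastpath-plus-fallback relationship described above, with copyfile() as an assumed prototype (not an existing syscall) and an ordinary read/write loop as the cp(1)-style fallback:

    #include <unistd.h>
    #include <sys/types.h>

    /* Assumed prototype for the proposed syscall; it does not exist yet. */
    extern ssize_t copyfile(int src_fd, int dst_fd, off_t off, size_t len);

    static int copy_range(int src_fd, int dst_fd, off_t off, size_t len)
    {
        char buf[65536];

        /* Fastpath: let the storage do the copy if it can. */
        if (copyfile(src_fd, dst_fd, off, len) == (ssize_t)len)
            return 0;

        /* Like cp(1) after a failed rename(2): fall back and do it by hand. */
        if (lseek(src_fd, off, SEEK_SET) < 0 || lseek(dst_fd, off, SEEK_SET) < 0)
            return -1;
        while (len > 0) {
            ssize_t n = read(src_fd, buf, len < sizeof(buf) ? len : sizeof(buf));

            if (n <= 0)
                return -1;      /* EOF or error before the range was covered */
            if (write(dst_fd, buf, n) != n)
                return -1;      /* short writes ignored for brevity */
            len -= n;
        }
        return 0;
    }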

2011-12-15 17:20:58

by Ric Wheeler

Subject: Re: copy offload support in Linux - new system call needed?

On 12/15/2011 11:53 AM, Trond Myklebust wrote:
> On Thu, 2011-12-15 at 11:40 -0500, Loke, Chetan wrote:
>>>> Why not support something like the async-iocb?
>>> You could, but that would tie copyfile() to the aio interface which was
>>> one of the things that I believe Al was opposed to when we discussed
>>> this at LSF/MM-2010.
>>>
>> virtualization vendors who support this offload do it at a layer above the guest-OS(Intra-LUN(tm) locking or whatever fancy locking). So I think 'copyfile' is going to be appealing to application-developers more than the hypervisor-vendors.
> The application is thin provisioning, not the 'cp' command. When
> virtualisation vendors do support this, it will mainly be as part of
> their image management toolkits, not the hypervisor.

I think that hypervisor vendors will be very interested in this feature which
would explain why vmware was active in drafting both the NFS and T10 specs. Not
to mention those of us who use KVM or XEN :)

As Trond mentions, we might have this in the management tool chain or other
places in the stack.

>
>> So let's think about it from end-users perspective:
>> Won't everyone replicate code to check - 'Am I done'? It will just make application folks write more (ugly)code. Because you would then have to maintain another queue/etc to check for this operation.
> 'Am I done' is easy: copyfile() returns with the number of bytes that
> have been copied.
>
> 'Is my copyfile() syscall making progress' is the question that needs
> answering.
>
>> We can just support full-copy. Partial copies can be returned as failure.
> Then you have to check the entire range on error instead of just
> resuming the copy from where it stopped.
>

I also like simple first. I am not too certain about the need for polling
(especially given how little we have done historically to take advantage of the
notifications, water marks, etc in things like thin provisioning :)).

On the other hand, I also don't object to having the ability to poll (through
the ioctl or whatever) if others find that useful.

What I would like to see is a way to make sure that we can interrupt any long
running command & also make sure that our timeouts (for SCSI specifically) are
not too aggressive.

Ric



2011-12-19 23:30:37

by H. Peter Anvin

Subject: Re: copy offload support in Linux - new system call needed?

On 12/19/2011 02:57 PM, Dave Chinner wrote:
>
> That was my immediate thought, but sendfile has plumbing that is
> page cache based and we require completely different infrastructure
> and semantics for an array offload.
>

The plumbing is internal to the kernel and doesn't mean we have to use
the same VFS methods.

> e.g. for an array offload, we have to flush the source file page
> cache first so that the data being copied is known to be on disk,
> then invalidate the destination page cache if overwriting or extend
> and pre-allocate blocks if not. Then we have to map both files and
> hand that off to the array.
>
> Then there's a whole bunch of tricky questions about what the state
> of the destination file should look like while the copy is in
> progress, whether the source file should be allowed to change (e.g.
> it can't be truncated and have blocks freed and then reused by other
> files half way through the copy offload operation), and so on.
>
> sendfile() has well known, fixed semantics that we can't change to
> suit what is needed for an offload operation that could potentially
> take hours to complete. Hence I think a new syscall is the way to
> go....

Perhaps what we need first is an explicit enumeration of the semantics
you're looking for.

-hpa


2011-12-15 16:02:31

by Jeff Layton

Subject: Re: copy offload support in Linux - new system call needed?

On Thu, 15 Dec 2011 10:52:13 -0500
Chris Mason <[email protected]> wrote:

> On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote:
> > On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote:
> > > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> > > > On 12/14/2011 02:27 PM, Al Viro wrote:
> > > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> > > > >
> > > > >>We had an active thread a couple of years back that came out of the
> > > > >>reflink work and, at the time, there seemed to be moderately
> > > > >>positive support for adding a new system call that would fit this
> > > > >>use case (Joel Becker's copyfile()).
> > > > >>
> > > > >>Can we resurrect this effort? Is copyfile() still a good way to go,
> > > > >>or should we look at other hooks?
> > > > >copyfile(2) is probably a good way to go, provided that we do _not_
> > > > >go baroque as it had happened the last time syscall had been discussed.
> > > > >
> > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of
> > > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> > > > >If it works - fine, if not - caller has to be ready to deal with handling
> > > > >cross-device case anyway.
> > > >
> > > > I think that this approach makes a lot of sense. Most of the
> > > > devices/targets that support the copy offload, will do it in very
> > > > reasonable amounts of time.
> > >
> > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
> > > one operation:
> > >
> > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2
> > >
> > > Perhaps we should ask for separate operations for the two cases. (Or at
> > > least a "please don't bother if this is going to take 8 hours" flag....)
> >
> > How would the server know? I suggest we deal with this by adding an
> > ioctl() to allow the application to poll for progress: I'm assuming now
> > that we don't expect more than 1 copyfile() system call at a time per
> > file descriptor...
>
> If we're using this to copy VM image files, I could easily imagine
> wanting to clone multiple copies of the VM in parallel.
>
> -chris
>

Not really a problem is it? Just dup() the fd before you issue the
copyfile()? Or even simpler, just do periodic stat() on the destination
file if you want a progress report.

Regardless, I like the simple approach that Al is suggesting here.
--
Jeff Layton <[email protected]>
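
A sketch of the point about parallel clones: each clone simply gets its own destination descriptor (and, if driven from separate threads or processes, its own dup() of the source), so a per-descriptor interface is not a real constraint. copyfile() is the assumed prototype from earlier in the thread:

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/types.h>

    /* Assumed prototype from earlier in the thread. */
    extern ssize_t copyfile(int src_fd, int dst_fd, off_t off, size_t len);

    /* Clone one source image into several destinations, one descriptor per copy;
     * each iteration could equally be issued from its own thread. */
    static int clone_images(const char *src, char *const dsts[], int nclones,
                            size_t image_len)
    {
        int src_fd = open(src, O_RDONLY);
        int i, ret = 0;

        if (src_fd < 0)
            return -1;
        for (i = 0; i < nclones; i++) {
            int dst_fd = open(dsts[i], O_WRONLY | O_CREAT, 0644);

            if (dst_fd < 0 || copyfile(src_fd, dst_fd, 0, image_len) < 0)
                ret = -1;
            if (dst_fd >= 0)
                close(dst_fd);
        }
        close(src_fd);
        return ret;
    }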

2011-12-15 17:31:46

by Loke, Chetan

Subject: RE: copy offload support in Linux - new system call needed?

>
> I think that hypervisor vendors will be very interested in this feature
> which
> would explain why vmware was active in drafting both the NFS and T10

Specs are the only way to convince storage-target-vendors ;). Otherwise target-stack will need to implement multiple custom-CDB-handlers for different front-end APIs(which is ugly).


Chetan

2011-12-15 16:28:09

by Jeff Layton

Subject: Re: copy offload support in Linux - new system call needed?

On Thu, 15 Dec 2011 11:06:16 -0500
Trond Myklebust <[email protected]> wrote:

> On Thu, 2011-12-15 at 11:03 -0500, Jeff Layton wrote:
> > On Thu, 15 Dec 2011 10:52:13 -0500
> > Chris Mason <[email protected]> wrote:
> >
> > > On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote:
> > > > On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote:
> > > > > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> > > > > > On 12/14/2011 02:27 PM, Al Viro wrote:
> > > > > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> > > > > > >
> > > > > > >>We had an active thread a couple of years back that came out of the
> > > > > > >>reflink work and, at the time, there seemed to be moderately
> > > > > > >>positive support for adding a new system call that would fit this
> > > > > > >>use case (Joel Becker's copyfile()).
> > > > > > >>
> > > > > > >>Can we resurrect this effort? Is copyfile() still a good way to go,
> > > > > > >>or should we look at other hooks?
> > > > > > >copyfile(2) is probably a good way to go, provided that we do _not_
> > > > > > >go baroque as it had happened the last time syscall had been discussed.
> > > > > > >
> > > > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of
> > > > > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> > > > > > >If it works - fine, if not - caller has to be ready to deal with handling
> > > > > > >cross-device case anyway.
> > > > > >
> > > > > > I think that this approach makes a lot of sense. Most of the
> > > > > > devices/targets that support the copy offload, will do it in very
> > > > > > reasonable amounts of time.
> > > > >
> > > > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
> > > > > one operation:
> > > > >
> > > > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2
> > > > >
> > > > > Perhaps we should ask for separate operations for the two cases. (Or at
> > > > > least a "please don't bother if this is going to take 8 hours" flag....)
> > > >
> > > > How would the server know? I suggest we deal with this by adding an
> > > > ioctl() to allow the application to poll for progress: I'm assuming now
> > > > that we don't expect more than 1 copyfile() system call at a time per
> > > > file descriptor...
> > >
> > > If we're using this to copy VM image files, I could easily imagine
> > > wanting to clone multiple copies of the VM in parallel.
> > >
> > > -chris
> > >
> >
> > Not really a problem is it? Just dup() the fd before you issue the
> > copyfile()? Or even simpler, just do periodic stat() on the destination
> > file if you want a progress report.
> >
> > Regardless, I like the simple approach that Al is suggesting here.
>
> Periodic stat() isn't good enough if you are copying subranges of a
> file. Part of the application here (as I understood it) is to initialise
> specific disk volumes on existing VM images when doing thin
> provisioning. In that case, the reported image size won't ever change...
>

If they were sparse files then st_blocks would presumably change, but
that's not necessarily going to be the case. So, ok stat() is out for
this...

What's the use-case for these sorts of progress reports anyway?
Progress meters in GUI apps?

Either way, I think adding as simple an interface as possible to begin
with makes sense. If you want to add progress reports or other
doohickeys later, then that can be done in a separate set of patches...

--
Jeff Layton <[email protected]>

2011-12-15 16:40:58

by Loke, Chetan

Subject: RE: copy offload support in Linux - new system call needed?

> >
> > Why not support something like the async-iocb?
>
> You could, but that would tie copyfile() to the aio interface which was
> one of the things that I believe Al was opposed to when we discussed
> this at LSF/MM-2010.
>

virtualization vendors who support this offload do it at a layer above the guest-OS(Intra-LUN(tm) locking or whatever fancy locking). So I think 'copyfile' is going to be appealing to application-developers more than the hypervisor-vendors.

So let's think about it from end-users perspective:
Won't everyone replicate code to check - 'Am I done'? It will just make application folks write more (ugly)code. Because you would then have to maintain another queue/etc to check for this operation.

We can just support full-copy. Partial copies can be returned as failure.

2011-12-15 16:06:19

by Myklebust, Trond

Subject: Re: copy offload support in Linux - new system call needed?

On Thu, 2011-12-15 at 11:03 -0500, Jeff Layton wrote:
> On Thu, 15 Dec 2011 10:52:13 -0500
> Chris Mason <[email protected]> wrote:
>
> > On Thu, Dec 15, 2011 at 09:59:00AM -0500, Trond Myklebust wrote:
> > > On Wed, 2011-12-14 at 17:27 -0500, J. Bruce Fields wrote:
> > > > On Wed, Dec 14, 2011 at 02:42:38PM -0500, Ric Wheeler wrote:
> > > > > On 12/14/2011 02:27 PM, Al Viro wrote:
> > > > > >On Wed, Dec 14, 2011 at 02:22:07PM -0500, Ric Wheeler wrote:
> > > > > >
> > > > > >>We had an active thread a couple of years back that came out of the
> > > > > >>reflink work and, at the time, there seemed to be moderately
> > > > > >>positive support for adding a new system call that would fit this
> > > > > >>use case (Joel Becker's copyfile()).
> > > > > >>
> > > > > >>Can we resurrect this effort? Is copyfile() still a good way to go,
> > > > > >>or should we look at other hooks?
> > > > > >copyfile(2) is probably a good way to go, provided that we do _not_
> > > > > >go baroque as it had happened the last time syscall had been discussed.
> > > > > >
> > > > > >IOW, to hell with progress reports, etc. - just a fastpath kind of
> > > > > >thing, in the same kind of relationship to cp(1) as rename(2) is to mv(1).
> > > > > >If it works - fine, if not - caller has to be ready to deal with handling
> > > > > >cross-device case anyway.
> > > > >
> > > > > I think that this approach makes a lot of sense. Most of the
> > > > > devices/targets that support the copy offload, will do it in very
> > > > > reasonable amounts of time.
> > > >
> > > > The current NFSv4.2 draft rolls both the "fast" and "slow" cases into
> > > > one operation:
> > > >
> > > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-06#section-2
> > > >
> > > > Perhaps we should ask for separate operations for the two cases. (Or at
> > > > least a "please don't bother if this is going to take 8 hours" flag....)
> > >
> > > How would the server know? I suggest we deal with this by adding an
> > > ioctl() to allow the application to poll for progress: I'm assuming now
> > > that we don't expect more than 1 copyfile() system call at a time per
> > > file descriptor...
> >
> > If we're using this to copy VM image files, I could easily imagine
> > wanting to clone multiple copies of the VM in parallel.
> >
> > -chris
> >
>
> Not really a problem is it? Just dup() the fd before you issue the
> copyfile()? Or even simpler, just do periodic stat() on the destination
> file if you want a progress report.
>
> Regardless, I like the simple approach that Al is suggesting here.

Periodic stat() isn't good enough if you are copying subranges of a
file. Part of the application here (as I understood it) is to initialise
specific disk volumes on existing VM images when doing thin
provisioning. In that case, the reported image size won't ever change...

Cheers
Trond

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com