2013-03-30 19:49:36

by Pavel Machek

Subject: Re: New copyfile system call - discuss before LSF?

Hi!

> > I thought the first thing people would ask for is to atomically create a
> > new file and copy the old file into it (at least on local file systems).
> > The idea is that nothing should see an empty destination file, either
> > by race or by crash. (This feature would perhaps be described as a
> > pony, but it should be implementable.)
>
> Having already wasted many week trying to implement your pony, I would
> consider it about as possible as winning the lottery three times in a
> row. It clearly is in theory and yet,...

Hmm, really? AFAICT it would be simple to provide open_deleted_file("directory")
syscall. You'd open_deleted_file(), copy source file into it, then
fsync(), then link it into filesystem.

That should have atomicity properties reflected.
Pavel
(who has too many (*)
ponies around)
(*) 1 is sometimes too many when we talk about big mammals.
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
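
The sequence Pavel describes (open a nameless file, copy the source into it, fsync(), then link it into the filesystem) can be sketched with interfaces that appeared in Linux after this thread. The O_TMPFILE open flag (Linux 3.11) stands in here for the proposed open_deleted_file(), and linkat() on /proc/self/fd/N with AT_SYMLINK_FOLLOW gives the file its name. This is a minimal sketch under those assumptions, not something anyone in the thread implemented: atomic_copy() and copy_fd() are hypothetical names, and O_TMPFILE only works on filesystems that support it.

```c
#define _GNU_SOURCE             /* for O_TMPFILE */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Copy everything from src_fd to dst_fd with plain read()/write(). */
static int copy_fd(int src_fd, int dst_fd)
{
    char buf[65536];
    ssize_t n;

    while ((n = read(src_fd, buf, sizeof(buf))) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(dst_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
    return 0;
}

/*
 * Hypothetical helper: copy src to dst_path so that dst_path only ever
 * appears with its full contents. dst_dir must be the directory that
 * will hold dst_path. Fails with EEXIST if dst_path already exists.
 * Returns 0 on success, -1 with errno set on failure.
 */
int atomic_copy(const char *src, const char *dst_dir, const char *dst_path)
{
    char proc_path[64];
    int src_fd, tmp_fd, saved, rc = -1;

    src_fd = open(src, O_RDONLY);
    if (src_fd < 0)
        return -1;

    /* Step 1: a file with no name, created inside the target directory. */
    tmp_fd = open(dst_dir, O_TMPFILE | O_WRONLY, 0644);
    if (tmp_fd < 0) {
        saved = errno;
        close(src_fd);
        errno = saved;
        return -1;
    }

    /* Steps 2-4: copy, flush the data, then give the file its name. */
    snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", tmp_fd);
    if (copy_fd(src_fd, tmp_fd) == 0 &&
        fsync(tmp_fd) == 0 &&
        linkat(AT_FDCWD, proc_path, AT_FDCWD, dst_path,
               AT_SYMLINK_FOLLOW) == 0)
        rc = 0;

    /* If anything failed, closing tmp_fd discards the nameless file. */
    saved = errno;
    close(tmp_fd);
    close(src_fd);
    errno = saved;
    return rc;
}
```

A call such as atomic_copy("a.txt", ".", "b.txt") leaves either no b.txt at all or a complete one. linkat() refuses to replace an existing b.txt, so the sketch does not cover overwriting, and it does not fsync() the directory, so the new name itself may not survive a crash right after the call.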


2013-03-30 20:08:47

by Andreas Dilger

Subject: Re: New copyfile system call - discuss before LSF?

On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
> Hmm, really? AFAICT it would be simple to provide an
> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> copy source file into it, then fsync(), then link it into filesystem.
>
> That should have atomicity properties reflected.

Actually, the open_deleted_file() syscall is quite useful for many
different things all by itself. Lots of applications need to create
temporary files that are unlinked at application failure (without a
race if app crashes after creating the file, but before unlinking).
It also avoids exposing temporary files into the namespace if other
applications are accessing the directory.

We've added a library routine that does this for Lustre in a hackish
way (magical filename created in target directory) for being able to
migrate files between data servers, HSM, defragmentation, rsync, etc.

Cheers, Andreas
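
The race Andreas mentions is easiest to see in the usual way of getting an anonymous temporary file today: create it under a name, then unlink it. A minimal sketch, assuming a hypothetical helper name classic_anon_tmpfile():

```c
#include <stdlib.h>
#include <unistd.h>

/*
 * Classic anonymous temp file: create a named file, then unlink it.
 * tmpl must be a writable buffer ending in "XXXXXX", as mkstemp()
 * requires. Returns an open descriptor, or -1 on error.
 */
int classic_anon_tmpfile(char *tmpl)
{
    int fd = mkstemp(tmpl);
    if (fd < 0)
        return -1;

    /*
     * Window: between mkstemp() and unlink() the file is visible in the
     * directory. A crash here leaves a stray file behind, and other
     * processes listing the directory can see (and open) it.
     */

    if (unlink(tmpl) < 0) {
        close(fd);
        return -1;
    }
    return fd;      /* the file now lives only as long as fd is open */
}
```

An open_deleted_file() style call would close that window by never giving the file a name in the first place.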




2013-03-30 21:45:12

by Pavel Machek

Subject: Re: New copyfile system call - discuss before LSF?

On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
> > Hmm, really? AFAICT it would be simple to provide an
> > open_deleted_file("directory") syscall. You'd open_deleted_file(),
> > copy source file into it, then fsync(), then link it into filesystem.
> >
> > That should have atomicity properties reflected.
>
> Actually, the open_deleted_file() syscall is quite useful for many
> different things all by itself. Lots of applications need to create
> temporary files that are unlinked at application failure (without a
> race if app crashes after creating the file, but before unlinking).
> It also avoids exposing temporary files into the namespace if other
> applications are accessing the directory.

Hmm. open_deleted_file() will still need to get a directory... so it
will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
be acceptable interface?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-03-30 21:57:10

by Myklebust, Trond

Subject: Re: New copyfile system call - discuss before LSF?


On Mar 30, 2013, at 5:45 PM, Pavel Machek <[email protected]>
wrote:

> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>> Hmm, really? AFAICT it would be simple to provide an
>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>>> copy source file into it, then fsync(), then link it into filesystem.
>>>
>>> That should have atomicity properties reflected.
>>
>> Actually, the open_deleted_file() syscall is quite useful for many
>> different things all by itself. Lots of applications need to create
>> temporary files that are unlinked at application failure (without a
>> race if app crashes after creating the file, but before unlinking).
>> It also avoids exposing temporary files into the namespace if other
>> applications are accessing the directory.
>
> Hmm. open_deleted_file() will still need to get a directory... so it
> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> be acceptable interface?
> Pavel

...and what's the big plan to make this work on anything other than ext4 and btrfs?

Cheers,
Trond

2013-03-30 22:41:19

by Andy Lutomirski

Subject: Re: New copyfile system call - discuss before LSF?

On Sat, Mar 30, 2013 at 12:49 PM, Pavel Machek <[email protected]> wrote:
> Hi!
>
>> > I thought the first thing people would ask for is to atomically create a
>> > new file and copy the old file into it (at least on local file systems).
>> > The idea is that nothing should see an empty destination file, either
>> > by race or by crash. (This feature would perhaps be described as a
>> > pony, but it should be implementable.)
>>
>> Having already wasted many week trying to implement your pony, I would
>> consider it about as possible as winning the lottery three times in a
>> row. It clearly is in theory and yet,...
>
> Hmm, really? AFAICT it would be simple to provide open_deleted_file("directory")
> syscall. You'd open_deleted_file(), copy source file into it, then
> fsync(), then link it into filesystem.

Isn't linking a deleted file back into the filesystem explicitly
forbidden? I'm pretty sure that linking from /proc/fd/whatever
doesn't work. (I've often wanted a flink system call that takes a
file descriptor and links it somewhere. If it came with an option to
control whether it would overwrite an existing file, even better.)

--Andy

2013-03-30 23:22:37

by Ric Wheeler

Subject: Re: New copyfile system call - discuss before LSF?

On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
> On Mar 30, 2013, at 5:45 PM, Pavel Machek <[email protected]>
> wrote:
>
>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>>> Hmm, really? AFAICT it would be simple to provide an
>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>>>> copy source file into it, then fsync(), then link it into filesystem.
>>>>
>>>> That should have atomicity properties reflected.
>>> Actually, the open_deleted_file() syscall is quite useful for many
>>> different things all by itself. Lots of applications need to create
>>> temporary files that are unlinked at application failure (without a
>>> race if app crashes after creating the file, but before unlinking).
>>> It also avoids exposing temporary files into the namespace if other
>>> applications are accessing the directory.
>> Hmm. open_deleted_file() will still need to get a directory... so it
>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>> be acceptable interface?
>> Pavel
> ...and what's the big plan to make this work on anything other than ext4 and btrfs?
>
> Cheers,
> Trond

I know that change can be a good thing, but are we really solving a pressing
problem given that application developers have dealt with open/rename as the way
to get "atomic" file creation for several decades now?

Regards,

Ric

2013-03-31 02:53:13

by Andreas Dilger

Subject: Re: New copyfile system call - discuss before LSF?

On 2013-03-30, at 16:21, Ric Wheeler <[email protected]> wrote:

> On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
>> On Mar 30, 2013, at 5:45 PM, Pavel Machek <[email protected]>
>> wrote:
>>
>>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>>>> Hmm, really? AFAICT it would be simple to provide an
>>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>>>>> copy source file into it, then fsync(), then link it into filesystem.
>>>>>
>>>>> That should have atomicity properties reflected.
>>>> Actually, the open_deleted_file() syscall is quite useful for many
>>>> different things all by itself. Lots of applications need to create
>>>> temporary files that are unlinked at application failure (without a
>>>> race if app crashes after creating the file, but before unlinking).
>>>> It also avoids exposing temporary files into the namespace if other
>>>> applications are accessing the directory.
>>> Hmm. open_deleted_file() will still need to get a directory... so it
>>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>>> be acceptable interface?
>>> Pavel
>> ...and what's the big plan to make this work on anything other than ext4 and btrfs?
>>
>> Cheers,
>> Trond
>
> I know that change can be a good thing, but are we really solving a pressing problem given that application developers have dealt with open/rename as the way to get "atomic" file creation for several decades now ?

Using open()+rename() has side effects:
- changes ctime/mtime on parent directory
- leaves temporary file in path during creation
- leaves temporary file in namespace during operations, and after crash

Cheers, Andreas

2013-03-31 03:52:43

by Myklebust, Trond

Subject: Re: New copyfile system call - discuss before LSF?

On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
> On 2013-03-30, at 16:21, Ric Wheeler <[email protected]> wrote:
>
> > On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
> >> On Mar 30, 2013, at 5:45 PM, Pavel Machek <[email protected]>
> >> wrote:
> >>
> >>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
> >>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
> >>>>> Hmm, really? AFAICT it would be simple to provide an
> >>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> >>>>> copy source file into it, then fsync(), then link it into filesystem.
> >>>>>
> >>>>> That should have atomicity properties reflected.
> >>>> Actually, the open_deleted_file() syscall is quite useful for many
> >>>> different things all by itself. Lots of applications need to create
> >>>> temporary files that are unlinked at application failure (without a
> >>>> race if app crashes after creating the file, but before unlinking).
> >>>> It also avoids exposing temporary files into the namespace if other
> >>>> applications are accessing the directory.
> >>> Hmm. open_deleted_file() will still need to get a directory... so it
> >>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> >>> be acceptable interface?
> >>> Pavel
> >> ...and what's the big plan to make this work on anything other than ext4 and btrfs?
> >>
> >> Cheers,
> >> Trond
> >
> > I know that change can be a good thing, but are we really solving a pressing problem given that application developers have dealt with open/rename as the way to get "atomic" file creation for several decades now ?
>
> Using open()+rename() has side effects:
> - changes ctime/mtime on parent directory
> - leaves temporary file in path during creation
> - leaves temporary file in namespace during operations, and after crash

So what is the actual problem that is being solved? Yes, the above may
be disadvantages, but none of them have proven to be show-stoppers so
far.

So far, I've seen no justification for Andy's atomicity requirement
other than "it would be nice if...". That's not enough IMO...


--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2013-03-31 04:18:51

by Andy Lutomirski

Subject: Re: New copyfile system call - discuss before LSF?

On Sat, Mar 30, 2013 at 8:52 PM, Myklebust, Trond
<[email protected]> wrote:
> On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
>> On 2013-03-30, at 16:21, Ric Wheeler <[email protected]> wrote:
>>
>> > On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
>> >> On Mar 30, 2013, at 5:45 PM, Pavel Machek <[email protected]>
>> >> wrote:
>> >>
>> >>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>> >>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>> >>>>> Hmm, really? AFAICT it would be simple to provide an
>> >>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>> >>>>> copy source file into it, then fsync(), then link it into filesystem.
>> >>>>>
>> >>>>> That should have atomicity properties reflected.
>> >>>> Actually, the open_deleted_file() syscall is quite useful for many
>> >>>> different things all by itself. Lots of applications need to create
>> >>>> temporary files that are unlinked at application failure (without a
>> >>>> race if app crashes after creating the file, but before unlinking).
>> >>>> It also avoids exposing temporary files into the namespace if other
>> >>>> applications are accessing the directory.
>> >>> Hmm. open_deleted_file() will still need to get a directory... so it
>> >>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>> >>> be acceptable interface?
>> >>> Pavel
>> >> ...and what's the big plan to make this work on anything other than ext4 and btrfs?
>> >>
>> >> Cheers,
>> >> Trond
>> >
>> > I know that change can be a good thing, but are we really solving a pressing problem given that application developers have dealt with open/rename as the way to get "atomic" file creation for several decades now ?
>>
>> Using open()+rename() has side effects:
>> - changes ctime/mtime on parent directory
>> - leaves temporary file in path during creation
>> - leaves temporary file in namespace during operations, and after crash
>
> So what is the actual problem that is being solved? Yes, the above may
> be disadvantages, but none of them have proven to be show-stoppers so
> far.
>
> So far, I've seen no justification for Andy's atomicity requirement
> other than "it would be nice if...". That's not enough IMO...

ISTM vpsendfile (or whatever it's called) plus a way to create deleted
files plus a way to relink deleted files gives atomic copies. Perhaps
this is less efficient than would be ideal for OCFS2, though.

--Andy

2013-03-31 04:37:04

by Myklebust, Trond

Subject: Re: New copyfile system call - discuss before LSF?

On Sat, 2013-03-30 at 21:18 -0700, Andy Lutomirski wrote:
> On Sat, Mar 30, 2013 at 8:52 PM, Myklebust, Trond
> <[email protected]> wrote:
> > On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
> >> On 2013-03-30, at 16:21, Ric Wheeler <[email protected]> wrote:
> >>
> >> > On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
> >> >> On Mar 30, 2013, at 5:45 PM, Pavel Machek <[email protected]>
> >> >> wrote:
> >> >>
> >> >>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
> >> >>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
> >> >>>>> Hmm, really? AFAICT it would be simple to provide an
> >> >>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> >> >>>>> copy source file into it, then fsync(), then link it into filesystem.
> >> >>>>>
> >> >>>>> That should have atomicity properties reflected.
> >> >>>> Actually, the open_deleted_file() syscall is quite useful for many
> >> >>>> different things all by itself. Lots of applications need to create
> >> >>>> temporary files that are unlinked at application failure (without a
> >> >>>> race if app crashes after creating the file, but before unlinking).
> >> >>>> It also avoids exposing temporary files into the namespace if other
> >> >>>> applications are accessing the directory.
> >> >>> Hmm. open_deleted_file() will still need to get a directory... so it
> >> >>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> >> >>> be acceptable interface?
> >> >>> Pavel
> >> >> ...and what's the big plan to make this work on anything other than ext4 and btrfs?
> >> >>
> >> >> Cheers,
> >> >> Trond
> >> >
> >> > I know that change can be a good thing, but are we really solving a pressing problem given that application developers have dealt with open/rename as the way to get "atomic" file creation for several decades now ?
> >>
> >> Using open()+rename() has side effects:
> >> - changes ctime/mtime on parent directory
> >> - leaves temporary file in path during creation
> >> - leaves temporary file in namespace during operations, and after crash
> >
> > So what is the actual problem that is being solved? Yes, the above may
> > be disadvantages, but none of them have proven to be show-stoppers so
> > far.
> >
> > So far, I've seen no justification for Andy's atomicity requirement
> > other than "it would be nice if...". That's not enough IMO...
>
> ISTM vpsendfile (or whatever it's called) plus a way to create deleted
> files plus a way to relink deleted files gives atomic copies. Perhaps
> this is less efficient than would be ideal for OCFS2, though.

What real-life problem does the atomicity requirement solve? None of our
customers have ever asked for it. They don't care...

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2013-03-31 04:45:25

by Myklebust, Trond

Subject: Re: New copyfile system call - discuss before LSF?

On Sun, 2013-03-31 at 00:36 -0400, Trond Myklebust wrote:
> On Sat, 2013-03-30 at 21:18 -0700, Andy Lutomirski wrote:
> > On Sat, Mar 30, 2013 at 8:52 PM, Myklebust, Trond
> > <[email protected]> wrote:
> > > On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
> > >> On 2013-03-30, at 16:21, Ric Wheeler <[email protected]> wrote:
> > >>
> > >> > On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
> > >> >> On Mar 30, 2013, at 5:45 PM, Pavel Machek <[email protected]>
> > >> >> wrote:
> > >> >>
> > >> >>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
> > >> >>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
> > >> >>>>> Hmm, really? AFAICT it would be simple to provide an
> > >> >>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> > >> >>>>> copy source file into it, then fsync(), then link it into filesystem.
> > >> >>>>>
> > >> >>>>> That should have atomicity properties reflected.
> > >> >>>> Actually, the open_deleted_file() syscall is quite useful for many
> > >> >>>> different things all by itself. Lots of applications need to create
> > >> >>>> temporary files that are unlinked at application failure (without a
> > >> >>>> race if app crashes after creating the file, but before unlinking).
> > >> >>>> It also avoids exposing temporary files into the namespace if other
> > >> >>>> applications are accessing the directory.
> > >> >>> Hmm. open_deleted_file() will still need to get a directory... so it
> > >> >>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > >> >>> be acceptable interface?
> > >> >>> Pavel
> > >> >> ...and what's the big plan to make this work on anything other than ext4 and btrfs?
> > >> >>
> > >> >> Cheers,
> > >> >> Trond
> > >> >
> > >> > I know that change can be a good thing, but are we really solving a pressing problem given that application developers have dealt with open/rename as the way to get "atomic" file creation for several decades now ?
> > >>
> > >> Using open()+rename() has side effects:
> > >> - changes ctime/mtime on parent directory
> > >> - leaves temporary file in path during creation
> > >> - leaves temporary file in namespace during operations, and after crash
> > >
> > > So what is the actual problem that is being solved? Yes, the above may
> > > be disadvantages, but none of them have proven to be show-stoppers so
> > > far.
> > >
> > > So far, I've seen no justification for Andy's atomicity requirement
> > > other than "it would be nice if...". That's not enough IMO...
> >
> > ISTM vpsendfile (or whatever it's called) plus a way to create deleted
> > files plus a way to relink deleted files gives atomic copies. Perhaps
> > this is less efficient than would be ideal for OCFS2, though.
>
> What real-life problem does the atomicity requirement solve? None of our
> customers have ever asked for it. They don't care...
>
BTW: before you do answer, please note that the current NFSv4.2 solution
_does_ allow you to lock the file before you copy.

IOW: the same atomicity rules apply to offloaded copy as apply to
standard copy: there is no requirement anywhere to apply stronger
semantics. Surprisingly enough, that works for most people...

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com
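
The "lock the file, then do a standard copy" semantics Trond refers to can be illustrated on a local file with POSIX advisory locks. This is only a sketch of the idea, not the NFSv4.2 offloaded copy itself: locked_copy() is a hypothetical helper, and fcntl() locks only protect against other processes that also take them.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: take a shared (read) lock on the whole source
 * file, then do an ordinary read()/write() copy into dst.
 * Returns 0 on success, -1 on error.
 */
int locked_copy(const char *src, const char *dst)
{
    struct flock lk = {
        .l_type = F_RDLCK,      /* shared lock: other readers allowed */
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,             /* 0 means "to end of file" */
    };
    char buf[65536];
    ssize_t n;
    int src_fd, dst_fd, rc = -1;

    src_fd = open(src, O_RDONLY);
    if (src_fd < 0)
        return -1;

    /* Blocks until no cooperating process holds a write lock. */
    if (fcntl(src_fd, F_SETLKW, &lk) < 0)
        goto out_src;

    dst_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dst_fd < 0)
        goto out_src;

    for (;;) {
        n = read(src_fd, buf, sizeof(buf));
        if (n == 0) {           /* end of file: copy complete */
            rc = 0;
            break;
        }
        if (n < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(dst_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                goto out_dst;
            }
            off += w;
        }
    }

out_dst:
    if (close(dst_fd) < 0)
        rc = -1;
out_src:
    close(src_fd);              /* closing also drops the fcntl() lock */
    return rc;
}
```

The lock keeps cooperating writers away from the source while it is read, but the destination is still visible, and possibly incomplete, while the copy runs. That is the weaker guarantee Trond argues is enough for most users.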

2013-03-31 05:50:44

by Andreas Dilger

Subject: Re: New copyfile system call - discuss before LSF?

On 2013-03-30, at 14:45, Pavel Machek <[email protected]> wrote:
> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>> Hmm, really? AFAICT it would be simple to provide an
>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>>> copy source file into it, then fsync(), then link it into filesystem.
>>>
>>> That should have atomicity properties reflected.
>>
>> Actually, the open_deleted_file() syscall is quite useful for many
>> different things all by itself. Lots of applications need to create
>> temporary files that are unlinked at application failure (without a
>> race if app crashes after creating the file, but before unlinking).
>> It also avoids exposing temporary files into the namespace if other
>> applications are accessing the directory.
>
> Hmm. open_deleted_file() will still need to get a directory... so it
> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> be acceptable interface?

Yes, that would be reasonable, and/or possibly openat(fd, NULL, AT_FDCWD|AT_UNLINKED)?

Cheers, Andreas

2013-03-31 07:36:12

by Pavel Machek

Subject: Re: New copyfile system call - discuss before LSF?

Hi!

> >>> Hmm, really? AFAICT it would be simple to provide an
> >>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> >>> copy source file into it, then fsync(), then link it into filesystem.
> >>>
> >>> That should have atomicity properties reflected.
> >>
> >> Actually, the open_deleted_file() syscall is quite useful for many
> >> different things all by itself. Lots of applications need to create
> >> temporary files that are unlinked at application failure (without a
> >> race if app crashes after creating the file, but before unlinking).
> >> It also avoids exposing temporary files into the namespace if other
> >> applications are accessing the directory.
> >
> > Hmm. open_deleted_file() will still need to get a directory... so it
> > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > be acceptable interface?
>
> ...and what's the big plan to make this work on anything other than ext4 and btrfs?

Deleted but open files are from original unix, so it should work on
anything unixy (minix, ext, ext2, ...).
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-03-31 08:25:53

by Pavel Machek

Subject: Re: New copyfile system call - discuss before LSF?

Hi!
On Sat 2013-03-30 22:38:35, AEDilger Gmail wrote:
> On 2013-03-30, at 14:45, Pavel Machek <[email protected]> wrote:
> > On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
> >> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
> >>> Hmm, really? AFAICT it would be simple to provide an
> >>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> >>> copy source file into it, then fsync(), then link it into filesystem.
> >>>
> >>> That should have atomicity properties reflected.
> >>
> >> Actually, the open_deleted_file() syscall is quite useful for many
> >> different things all by itself. Lots of applications need to create
> >> temporary files that are unlinked at application failure (without a
> >> race if app crashes after creating the file, but before unlinking).
> >> It also avoids exposing temporary files into the namespace if other
> >> applications are accessing the directory.
> >
> > Hmm. open_deleted_file() will still need to get a directory... so it
> > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > be acceptable interface?
>
> Yes, that would be reasonable, and/or possibly openat(fd, NULL, AT_FDCWD|AT_UNLINKED)?

openat() is better interface for this, I'd say.

BTW... I don't think this has to be done at the same time as splice()
[or how it ends up being called] changes...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-03-31 11:48:42

by Pádraig Brady

Subject: Re: New copyfile system call - discuss before LSF?

On 03/30/2013 08:08 PM, Andreas Dilger wrote:
> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>> Hmm, really? AFAICT it would be simple to provide an
>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>> copy source file into it, then fsync(), then link it into filesystem.
>>
>> That should have atomicity properties reflected.
>
> Actually, the open_deleted_file() syscall is quite useful for many
> different things all by itself. Lots of applications need to create
> temporary files that are unlinked at application failure (without a
> race if app crashes after creating the file, but before unlinking).
> It also avoids exposing temporary files into the namespace if other
> applications are accessing the directory.
>
> We've added a library routine that does this for Lustre in a hackish
> way (magical filename created in target directory) for being able to
> migrate files between data servers, HSM, defragmentation, rsync, etc.
>
> Cheers, Andreas

This reminds me of the flink() discussion:
http://marc.info/?l=linux-kernel&m=104965452917349

Also kinda related is the exchangedata() OSX system call to
"atomically exchange data between two files"

thanks,
Pádraig.

2013-03-31 18:27:38

by Myklebust, Trond

Subject: Re: New copyfile system call - discuss before LSF?

On Sun, 2013-03-31 at 09:36 +0200, Pavel Machek wrote:
> Hi!
>
> > >>> Hmm, really? AFAICT it would be simple to provide an
> > >>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> > >>> copy source file into it, then fsync(), then link it into filesystem.
> > >>>
> > >>> That should have atomicity properties reflected.
> > >>
> > >> Actually, the open_deleted_file() syscall is quite useful for many
> > >> different things all by itself. Lots of applications need to create
> > >> temporary files that are unlinked at application failure (without a
> > >> race if app crashes after creating the file, but before unlinking).
> > >> It also avoids exposing temporary files into the namespace if other
> > >> applications are accessing the directory.
> > >
> > > Hmm. open_deleted_file() will still need to get a directory... so it
> > > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > > be acceptable interface?
> >
> > ...and what's the big plan to make this work on anything other than ext4 and btrfs?
>
> Deleted but open files are from original unix, so it should work on
> anything unixy (minix, ext, ext2, ...).
> Pavel

minix, ext, ext2... are not under active development and haven't been
for more than a decade.

Take a look at how many actively used filesystems out there that have
some variant of sillyrename(), and explain what you want to do in those
cases.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2013-03-31 18:32:41

by Pavel Machek

Subject: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?


> > > > Hmm. open_deleted_file() will still need to get a directory... so it
> > > > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > > > be acceptable interface?
> > >
> > > ...and what's the big plan to make this work on anything other than ext4 and btrfs?
> >
> > Deleted but open files are from original unix, so it should work on
> > anything unixy (minix, ext, ext2, ...).
>
> minix, ext, ext2... are not under active development and haven't been
> for more than a decade.
>
> Take a look at how many actively used filesystems out there that have
> some variant of sillyrename(), and explain what you want to do in those
> cases.

Well. Yes, there are non-unix filesystems around. You have to deal
with silly files on them, and this will not be different.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-03-31 18:44:58

by Myklebust, Trond

Subject: Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?

On Sun, 2013-03-31 at 20:32 +0200, Pavel Machek wrote:
> > > > > Hmm. open_deleted_file() will still need to get a directory... so it
> > > > > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > > > > be acceptable interface?
> > > >
> > > > ...and what's the big plan to make this work on anything other than ext4 and btrfs?
> > >
> > > Deleted but open files are from original unix, so it should work on
> > > anything unixy (minix, ext, ext2, ...).
> >
> > minix, ext, ext2... are not under active development and haven't been
> > for more than a decade.
> >
> > Take a look at how many actively used filesystems out there that have
> > some variant of sillyrename(), and explain what you want to do in those
> > cases.
>
> Well. Yes, there are non-unix filesystems around. You have to deal
> with silly files on them, and this will not be different.

So this would be a local POSIX filesystem only solution to a problem
that has yet to be formulated?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2013-03-31 22:50:27

by Pavel Machek

Subject: Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?

On Sun 2013-03-31 18:44:53, Myklebust, Trond wrote:
> On Sun, 2013-03-31 at 20:32 +0200, Pavel Machek wrote:
> > > > > > Hmm. open_deleted_file() will still need to get a directory... so it
> > > > > > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > > > > > be acceptable interface?
> > > > >
> > > > > ...and what's the big plan to make this work on anything other than ext4 and btrfs?
> > > >
> > > > Deleted but open files are from original unix, so it should work on
> > > > anything unixy (minix, ext, ext2, ...).
> > >
> > > minix, ext, ext2... are not under active development and haven't been
> > > for more than a decade.
> > >
> > > Take a look at how many actively used filesystems out there that have
> > > some variant of sillyrename(), and explain what you want to do in those
> > > cases.
> >
> > Well. Yes, there are non-unix filesystems around. You have to deal
> > with silly files on them, and this will not be different.
>
> So this would be a local POSIX filesystem only solution to a problem
> that has yet to be formulated?

Problem is that the classical "create temp file then delete it" is racy. See the
archives. That is a useful & common operation.

Problem is "atomically create file at target location with guaranteed
right content". That's also in the archives. Looks useful if someone
does rsync from your directory.

Non-POSIX filesystems have problems handling deleted files, but that
was always the case. That's one of the reasons they are seldom used
for root filesystems.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-03-31 23:15:36

by Ric Wheeler

Subject: Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?

On 03/31/2013 06:50 PM, Pavel Machek wrote:
> On Sun 2013-03-31 18:44:53, Myklebust, Trond wrote:
>> On Sun, 2013-03-31 at 20:32 +0200, Pavel Machek wrote:
>>>>>>> Hmm. open_deleted_file() will still need to get a directory... so it
>>>>>>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>>>>>>> be acceptable interface?
>>>>>> ...and what's the big plan to make this work on anything other than ext4 and btrfs?
>>>>> Deleted but open files are from original unix, so it should work on
>>>>> anything unixy (minix, ext, ext2, ...).
>>>> minix, ext, ext2... are not under active development and haven't been
>>>> for more than a decade.
>>>>
>>>> Take a look at how many actively used filesystems out there that have
>>>> some variant of sillyrename(), and explain what you want to do in those
>>>> cases.
>>> Well. Yes, there are non-unix filesystems around. You have to deal
>>> with silly files on them, and this will not be different.
>> So this would be a local POSIX filesystem only solution to a problem
>> that has yet to be formulated?
> Problem is "classical create temp file then delete it" is racy. See the
> archives. That is useful & common operation.

Which race are you concerned with exactly?

User wants to test for a file with name "foo.txt"

* create "foo.txt~" (or whatever)
* write contents into "foo.txt~"
* rename "foo.txt~" to "foo.txt"

Until rename is done, the file does not exist and is not complete. You will
potentially have a garbage file to clean up if the program (or system) crashes,
but that is not racy in a classic sense, right?

This is more of a garbage clean up issue?
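
For reference, the create/write/rename sequence above can be sketched in a few lines. This is only an illustration, assuming a POSIX filesystem where rename() within one directory is atomic; the helper name atomic_write and the way the temporary name is built are made up for this example and do not come from any patch in this thread.

```python
import os
import tempfile


def atomic_write(path, data):
    """Hypothetical helper: readers of `path` see the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so rename() stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(
        dir=directory, prefix=os.path.basename(path), suffix="~"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the contents out before the name changes
        os.replace(tmp_path, path)  # rename(2) under the hood
    except BaseException:
        # Best-effort cleanup. A crash (rather than an exception) still leaves
        # the "~" file behind, which is the garbage issue discussed here.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "foo.txt")
    atomic_write(target, b"hello\n")
    with open(target, "rb") as f:
        print(f.read())
```

Note that the temporary name is visible in the directory until the rename, so anything scanning the directory (rsync, for instance) can still pick it up.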

Regards,

Ric

>
> Problem is "atomically create file at target location with guaranteed
> right content". That's also in the archives. Looks useful if someone
> does rsync from your directory.
>
> Non-POSIX filesystems have problems handling deleted files, but that
> was always the case. That's one of the reasons they are seldom used
> for root filesystems.
>
> Pavel

2013-03-31 23:18:57

by Pavel Machek

Subject: Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?

Hi!

> >>>>Take a look at how many actively used filesystems out there that have
> >>>>some variant of sillyrename(), and explain what you want to do in those
> >>>>cases.
> >>>Well. Yes, there are non-unix filesystems around. You have to deal
> >>>with silly files on them, and this will not be different.
> >>So this would be a local POSIX filesystem only solution to a problem
> >>that has yet to be formulated?
> >Problem is "classical create temp file then delete it" is racy. See the
> >archives. That is useful & common operation.
>
> Which race are you concerned with exactly?
>
> User wants to test for a file with name "foo.txt"
>
> * create "foo.txt~" (or whatever)
> * write contents into "foo.txt~"
> * rename "foo.txt~" to "foo.txt"
>
> Until rename is done, the file does not exist and is not complete.
> You will potentially have a garbage file to clean up if the program
> (or system) crashes, but that is not racy in a classic sense, right?

Well. If people rsync from you, they will start fetching an incomplete
foo.txt~. Plus the garbage issue.

> This is more of a garbage clean up issue?

That too. Plus, sometimes you want a temporary "file" that is already
deleted. Terminals use one for history, etc.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-03-31 23:29:44

by Ric Wheeler

Subject: Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?

On 03/31/2013 07:18 PM, Pavel Machek wrote:
> Hi!
>
>>>>>> Take a look at how many actively used filesystems out there that have
>>>>>> some variant of sillyrename(), and explain what you want to do in those
>>>>>> cases.
>>>>> Well. Yes, there are non-unix filesystems around. You have to deal
>>>>> with silly files on them, and this will not be different.
>>>> So this would be a local POSIX filesystem only solution to a problem
>>>> that has yet to be formulated?
>>> Problem is "classical create temp file then delete it" is racy. See the
>>> archives. That is useful & common operation.
>> Which race are you concerned with exactly?
>>
>> User wants to test for a file with name "foo.txt"
>>
>> * create "foo.txt~" (or whatever)
>> * write contents into "foo.txt~"
>> * rename "foo.txt~" to "foo.txt"
>>
>> Until rename is done, the file does not exist and is not complete.
>> You will potentially have a garbage file to clean up if the program
>> (or system) crashes, but that is not racy in a classic sense, right?
> Well. If people rsync from you, they will start fetching incomplete
> foo.txt~. Plus the garbage issue.

That is not racy, just garbage (not trying to be pedantic, just trying to
understand). I can see that the "~" file is annoying, but we have dealt with it
for a *long* time :)

Until it has the right name (on either the source or target system for rsync),
it is not the file you are looking for.
>
>> This is more of a garbage clean up issue?
> Also. Plus sometimes you want temporary "file" that is
> deleted. Terminals use it for history, etc...

There you would have a race: you can of course create a file, unlink it, and
still write to it, but you would have a potential empty-file issue?

Ric

2013-03-31 23:41:37

by Pavel Machek

Subject: Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?

Hi!

> >>User wants to test for a file with name "foo.txt"
> >>
> >>* create "foo.txt~" (or whatever)
> >>* write contents into "foo.txt~"
> >>* rename "foo.txt~" to "foo.txt"
> >>
> >>Until rename is done, the file does not exist and is not complete.
> >>You will potentially have a garbage file to clean up if the program
> >>(or system) crashes, but that is not racy in a classic sense, right?
> >Well. If people rsync from you, they will start fetching incomplete
> >foo.txt~. Plus the garbage issue.
>
> That is not racy, just garbage (not trying to be pedantic, just
> trying to understand). I can see that the "~" file is annoying, but
> we have dealt with it for a *long* time :)

Ok, so let's keep it at "~" is annoying :-).

[But... I was wrong. openat(..., AT_UNLINKED) is not enough to solve
this: we do not have flink(), and it is not easily possible to link a
deleted file "back to life" from /proc/self/fd:

pavel@amd:/tmp$ > delme
pavel@amd:/tmp$ bash 3< delme &
[2] 32667
[2]+ Stopped bash 3< delme
pavel@amd:/tmp$ fg
bash 3< delme
pavel@amd:/tmp$ ls -al delme
-rw-r--r-- 1 pavel pavel 0 Apr 1 01:36 delme
pavel@amd:/tmp$ ls -al /proc/self/fd/3
lr-x------ 1 pavel pavel 64 Apr 1 01:37 /proc/self/fd/3 -> /tmp/delme
pavel@amd:/tmp$ rm delme
pavel@amd:/tmp$ ls -al /proc/self/fd/3
lr-x------ 1 pavel pavel 64 Apr 1 01:37 /proc/self/fd/3 -> /tmp/delme (deleted)
pavel@amd:/tmp$ ln /proc/self/fd/3 delme2
ln: creating hard link `delme2' => `/proc/self/fd/3': Invalid cross-device link
]

> >>This is more of a garbage clean up issue?
> >Also. Plus sometimes you want temporary "file" that is
> >deleted. Terminals use it for history, etc...
>
> There you would have a race, you can create a file and unlink it of
> course and still write to it, but you would have a potential empty
> file issue?

Yes. openat(..., AT_UNLINKED) solves that -- you'll no longer get
those files. (Not sure they'd be always empty. How do you ensure rm
hits the disk? fsync() on parent directory? Sounds expensive.)
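
A rough sketch of the create-then-unlink pattern under discussion, including the fsync() on the parent directory asked about above. It assumes Linux and a local filesystem; open_unlinked is a hypothetical helper name. It only emulates in two steps what the proposed openat(..., AT_UNLINKED) would do in one, so the race window is still there.

```python
import os
import tempfile


def open_unlinked(directory):
    """Hypothetical helper: create a file in `directory`, unlink it, return the open fd."""
    fd, path = tempfile.mkstemp(dir=directory)
    # Race window: a crash between mkstemp() and unlink() leaves the name behind.
    os.unlink(path)
    # Sync the parent directory so the removal of the name itself reaches the disk.
    dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return fd


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    fd = open_unlinked(workdir)
    os.write(fd, b"scratch data")   # the nameless file is still writable
    print(os.fstat(fd).st_nlink)    # link count of the open, unlinked file
    print(os.listdir(workdir))      # directory listing after the unlink
    os.close(fd)                    # storage is released on the last close
```

The extra directory fsync() is the cost being questioned above; without it, the unlink may not be durable if the system goes down right afterwards.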
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-04-01 15:49:20

by J. Bruce Fields

Subject: Re: New copyfile system call - discuss before LSF?

On Sun, Mar 31, 2013 at 04:36:59AM +0000, Myklebust, Trond wrote:
> On Sat, 2013-03-30 at 21:18 -0700, Andy Lutomirski wrote:
> > > On Sat, Mar 30, 2013 at 8:52 PM, Myklebust, Trond wrote:
> > > On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
> > >> On 2013-03-30, at 16:21, Ric Wheeler wrote:
> > >>
> > >> > On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
> > >> >> On Mar 30, 2013, at 5:45 PM, Pavel Machek wrote:
> > >> >>
> > >> >>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
> > >> >>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
> > >> >>>>> Hmm, really? AFAICT it would be simple to provide an
> > >> >>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> > >> >>>>> copy source file into it, then fsync(), then link it into filesystem.
> > >> >>>>>
> > >> >>>>> That should have atomicity properties reflected.
> > >> >>>> Actually, the open_deleted_file() syscall is quite useful for many
> > >> >>>> different things all by itself. Lots of applications need to create
> > >> >>>> temporary files that are unlinked at application failure (without a
> > >> >>>> race if app crashes after creating the file, but before unlinking).
> > >> >>>> It also avoids exposing temporary files into the namespace if other
> > >> >>>> applications are accessing the directory.
> > >> >>> Hmm. open_deleted_file() will still need to get a directory... so it
> > >> >>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > >> >>> be acceptable interface?
> > >> >>> Pavel
> > >> >> ...and what's the big plan to make this work on anything other than ext4 and btrfs?
> > >> >>
> > >> >> Cheers,
> > >> >> Trond
> > >> >
> > >> > I know that change can be a good thing, but are we really solving a pressing problem given that application developers have dealt with open/rename as the way to get "atomic" file creation for several decades now?
> > >>
> > >> Using open()+rename() has side effects:
> > >> - changes ctime/mtime on parent directory
> > >> - leaves temporary file in path during creation
> > >> - leaves temporary file in namespace during operations, and after crash
> > >
> > > So what is the actual problem that is being solved? Yes, the above may
> > > be disadvantages, but none of them have proven to be show-stoppers so
> > > far.
> > >
> > > So far, I've seen no justification for Andy's atomicity requirement
> > > other than "it would be nice if...". That's not enough IMO...
> >
> > ISTM vpsendfile (or whatever it's called) plus a way to create deleted
> > files plus a way to relink deleted files gives atomic copies. Perhaps
> > this is less efficient than would be ideal for OCFS2, though.
>
> What real-life problem does the atomicity requirement solve?

I've occasionally wondered whether something like that would help an NFS
server implement an atomic NFSv4 open (which can acquire share locks and set
attributes): create an anonymous file, get the locks and set the
attributes, then link it in only once all that's succeeded.

I don't know if that actually works--among other problems, I'm not sure
how you'd implement O_CREAT and O_EXCL. Probably it would make more
sense just to add a new open system call that does what we want. (If we
decide we even care that much about perfect atomicity for v4 open
semantics that few clients actually use.)

--b.