2020-01-14 16:35:20

by David Howells

Subject: Making linkat() able to overwrite the target

With my rewrite of fscache and cachefiles:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter

when a file gets invalidated by the server - and, under some circumstances,
modified locally - I have the cache create a temporary file with vfs_tmpfile()
that I'd like to just link into place over the old one - but I can't because
vfs_link() doesn't allow you to do that. Instead I have to either unlink the
old one and then link the new one in or create it elsewhere and rename across.
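The first workaround described above (unlink the old file, then link the new one in) can be approximated from userspace, with `O_TMPFILE` standing in for the in-kernel `vfs_tmpfile()`. This is a minimal sketch, not cachefiles code: `replace_by_unlink_link()` is a hypothetical helper, and it assumes Linux with `/proc` mounted and a filesystem that supports `O_TMPFILE`. It links the tmpfile using the `/proc/self/fd` recipe from the `open(2)` man page and marks the window in which the cache file is absent.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Close fd without letting close() clobber the errno we want to report. */
static int close_and_fail(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Hypothetical helper: write `data` into an unnamed O_TMPFILE inode created
 * in `dir`, then publish it as `target` by unlinking the old name and
 * linking the new inode in. Returns 0 on success, -1 with errno set. */
static int replace_by_unlink_link(const char *dir, const char *target,
                                  const char *data)
{
    char procpath[64];
    size_t len = strlen(data);

    int fd = open(dir, O_TMPFILE | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, data, len);
    if (n != (ssize_t)len) {
        if (n >= 0)
            errno = EIO;        /* short write: report a generic I/O error */
        return close_and_fail(fd);
    }

    /* Step 1: drop the old name; ENOENT just means nothing to replace. */
    if (unlink(target) < 0 && errno != ENOENT)
        return close_and_fail(fd);

    /* Window: until linkat() below succeeds, lookups of `target` fail with
     * ENOENT. If linkat() fails, the old file is already gone. */

    /* Step 2: give the tmpfile a name via its /proc/self/fd magic link
     * (the AT_EMPTY_PATH alternative has traditionally needed
     * CAP_DAC_READ_SEARCH). */
    snprintf(procpath, sizeof procpath, "/proc/self/fd/%d", fd);
    if (linkat(AT_FDCWD, procpath, AT_FDCWD, target, AT_SYMLINK_FOLLOW) < 0)
        return close_and_fail(fd);

    close(fd);
    return 0;
}
```

The two steps are separate system calls, so nothing makes them atomic with respect to other lookups; that gap is what the proposal below tries to remove.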

Would it be possible to make linkat() take a flag, say AT_LINK_REPLACE, that
causes the target to be replaced and not give EEXIST? Or make it so that
rename() can take a tmpfile as the source and replace the target with that. I
presume that, either way, this would require journal changes on ext4, xfs and
btrfs.
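For reference, the behaviour the proposed flag would change is easy to observe: `linkat()` with no flags never overwrites, so linking onto an existing name fails with `EEXIST` and leaves the old file untouched. `AT_LINK_REPLACE` is only a proposal in this thread, so the sketch below does not use it; `try_link_over()` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to hard-link `src` to `dst` with a plain
 * linkat(). Returns 0 on success or the errno value on failure; EEXIST
 * means `dst` already exists and was left as it was. */
static int try_link_over(const char *src, const char *dst)
{
    if (linkat(AT_FDCWD, src, AT_FDCWD, dst, 0) == 0)
        return 0;
    return errno;
}
```

A replace-capable flag would make the `EEXIST` case succeed instead, which is why it needs extra VFS checks on what the existing target is.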

Thanks,
David


2020-01-14 17:04:04

by Al Viro

Subject: Re: Making linkat() able to overwrite the target

On Tue, Jan 14, 2020 at 04:34:25PM +0000, David Howells wrote:
> With my rewrite of fscache and cachefiles:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
>
> when a file gets invalidated by the server - and, under some circumstances,
> modified locally - I have the cache create a temporary file with vfs_tmpfile()
> that I'd like to just link into place over the old one - but I can't because
> vfs_link() doesn't allow you to do that. Instead I have to either unlink the
> old one and then link the new one in or create it elsewhere and rename across.
>
> Would it be possible to make linkat() take a flag, say AT_LINK_REPLACE, that
> causes the target to be replaced and not give EEXIST? Or make it so that
> rename() can take a tmpfile as the source and replace the target with that. I
> presume that, either way, this would require journal changes on ext4, xfs and
> btrfs.

Umm... I don't like the idea of linkat() doing that - you suddenly get new
fun cases to think about (what should happen when the target is a mountpoint,
for starters?) _and_ you would have to add a magical flag to vfs_link() so
that it would know which tests to do. As for rename... How would that
work? AT_EMPTY_PATH for source? What happens if two threads do that
at the same time? Should that case be always "create a new link, even
if you've got it by plain lookup somewhere"? Worse, suppose you do that
to given tmpfile; what should happen to /proc/self/fd/* link to it? Should
it point to new location, or...?
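The mountpoint question can be made concrete with a userspace approximation of the kind of check a replacing link would need. This is a sketch under stated assumptions, not a proposed VFS rule: `target_is_replaceable()` is a hypothetical name, and it detects mounts the traditional way, by comparing `st_dev` with the parent directory's. It cannot see `S_AUTOMOUNT` inodes, misses bind mounts from the same filesystem, and is racy against concurrent mounts, which is why such checks would really have to live in the kernel.

```c
#define _GNU_SOURCE
#include <libgen.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical pre-check: true only if `path` exists, is not a directory,
 * and appears to have nothing mounted on it (same st_dev as its parent). */
static bool target_is_replaceable(const char *path)
{
    struct stat st, parent;
    char buf[PATH_MAX];

    /* Path walk still crosses a mount on the last component, so a mounted
     * file shows the mounted inode's st_dev here. */
    if (lstat(path, &st) < 0 || S_ISDIR(st.st_mode))
        return false;

    /* dirname() may modify its argument, so work on a copy. */
    if (strlen(path) >= sizeof buf)
        return false;
    strcpy(buf, path);
    if (stat(dirname(buf), &parent) < 0)
        return false;

    /* A different st_dev means another filesystem is mounted here. */
    return st.st_dev == parent.st_dev;
}
```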

2020-01-14 19:38:05

by Miklos Szeredi

Subject: Re: Making linkat() able to overwrite the target

On Tue, Jan 14, 2020 at 7:06 PM David Howells <[email protected]> wrote:
>
> Al Viro <[email protected]> wrote:
>
> > > Would it be possible to make linkat() take a flag, say AT_LINK_REPLACE,
> > > that causes the target to be replaced and not give EEXIST? Or make it so
> > > that rename() can take a tmpfile as the source and replace the target with
> > > that. I presume that, either way, this would require journal changes on
> > > ext4, xfs and btrfs.
> >
> > Umm... I don't like the idea of linkat() doing that - you suddenly get new
> > fun cases to think about (what should happen when the target is a mountpoint,
> > for starters?
>
> Don't allow it onto directories, S_AUTOMOUNT-marked inodes or anything that's
> got something mounted on it.
>
> > ) _and_ you would have to add a magical flag to vfs_link() so
> > that it would know which tests to do.
>
> Yes, I suggested AT_LINK_REPLACE as said magical flag.
>
> > As for rename...
>
> Yeah - with further thought, rename() doesn't really work as an interface,
> particularly if a link has already been made.
>
> Do you have an alternative suggestion? There are two things I want to avoid:
>
> (1) Doing unlink-link or unlink-create as that leaves a window where the
> cache file is absent.
>
> (2) Creating replacement files in a temporary directory and renaming from
> there over the top of the target file as the temp dir would then be a
> bottleneck that spends a lot of time locked for creations and renames.

Create multiple sub-temp-dirs and use them in rotation.

I think there was a report for overlayfs of the same bottleneck
(copy-up uses a temp dir, but now only for non-regular files). No one
has gotten around to implementing this idea yet.
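The rotating side-directory idea can be sketched in userspace: stage each replacement in one of several side directories, chosen round-robin, then `rename()` it over the target. `rename()` replaces an existing name atomically, so lookups never see a gap. The layout (`<cache>/.side0` to `<cache>/.side3`), `NR_SIDE_DIRS`, `make_side_dirs()` and `replace_via_side_dir()` are all hypothetical. Spreading creations over several directories is meant to reduce contention on any one directory's lock, though how much it helps depends on the workload. `rename()` only works within one filesystem, so the side directories must live alongside the target.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define NR_SIDE_DIRS 4  /* hypothetical: number of staging directories */

/* Create <cache>/.side0 .. .side<N-1>, tolerating ones that already exist. */
static int make_side_dirs(const char *cache)
{
    char path[4096];
    for (int i = 0; i < NR_SIDE_DIRS; i++) {
        snprintf(path, sizeof path, "%s/.side%d", cache, i);
        if (mkdir(path, 0700) < 0 && errno != EEXIST)
            return -1;
    }
    return 0;
}

/* Stage `data` as `name` in the next side directory, then rename() it over
 * `target`. `name` must be unique among in-flight replacements. Not
 * thread-safe: the round-robin counter is a plain static. */
static int replace_via_side_dir(const char *cache, const char *target,
                                const char *name, const char *data)
{
    static unsigned next;
    char tmp[4096];
    snprintf(tmp, sizeof tmp, "%s/.side%u/%s", cache,
             next++ % NR_SIDE_DIRS, name);

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    size_t len = strlen(data);
    ssize_t n = write(fd, data, len);
    int err = (n == (ssize_t)len) ? 0 : (n < 0 ? errno : EIO);
    if (close(fd) < 0 && err == 0)
        err = errno;
    if (err) {
        unlink(tmp);            /* don't leave a half-written staging file */
        errno = err;
        return -1;
    }

    /* Atomic replacement: `target` names either the old or the new file. */
    return rename(tmp, target);
}
```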

Thanks,
Miklos

2020-01-15 08:36:49

by Christoph Hellwig

Subject: Re: Making linkat() able to overwrite the target

On Tue, Jan 14, 2020 at 04:34:25PM +0000, David Howells wrote:
>
> when a file gets invalidated by the server - and, under some circumstances,
> modified locally - I have the cache create a temporary file with vfs_tmpfile()
> that I'd like to just link into place over the old one - but I can't because
> vfs_link() doesn't allow you to do that. Instead I have to either unlink the
> old one and then link the new one in or create it elsewhere and rename across.
>
> Would it be possible to make linkat() take a flag, say AT_LINK_REPLACE, that
> causes the target to be replaced and not give EEXIST? Or make it so that
> rename() can take a tmpfile as the source and replace the target with that. I
> presume that, either way, this would require journal changes on ext4, xfs and
> btrfs.

This sounds like a very useful primitive, and from the low-level XFS
point of view should be very easy to implement and will not require any
on-disk changes. I can't really think of any good userspace interface but
a new syscall, though.

2020-01-17 03:53:10

by Colin Walters

Subject: Re: Making linkat() able to overwrite the target

On Tue, Jan 14, 2020, at 1:06 PM, David Howells wrote:

> Yes, I suggested AT_LINK_REPLACE as said magical flag.

This came up before, right?

https://lore.kernel.org/linux-fsdevel/[email protected]/

2020-01-17 09:58:05

by Amir Goldstein

Subject: Re: Making linkat() able to overwrite the target

On Fri, Jan 17, 2020 at 5:52 AM Colin Walters <[email protected]> wrote:
>
> On Tue, Jan 14, 2020, at 1:06 PM, David Howells wrote:
>
> > Yes, I suggested AT_LINK_REPLACE as said magical flag.
>
> This came up before, right?
>
> https://lore.kernel.org/linux-fsdevel/[email protected]/

David,

This sounds like a good topic to be discussed at LSF/MM (hint hint)

Thanks,
Amir.

2020-01-17 11:44:04

by David Howells

Subject: Re: Making linkat() able to overwrite the target

Hi Omar,

Do you still have your AT_REPLACE patches? You said that you'd post a v4
series, though I don't see it. I could make use of such a feature in
cachefiles inside the kernel. For my original question, see:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter

And do you have ext4 support for it?

Colin Walters <[email protected]> wrote:

> On Tue, Jan 14, 2020, at 1:06 PM, David Howells wrote:
>
> > Yes, I suggested AT_LINK_REPLACE as said magical flag.
>
> This came up before, right?
>
> https://lore.kernel.org/linux-fsdevel/[email protected]/

David

2020-01-17 16:22:25

by Omar Sandoval

Subject: Re: Making linkat() able to overwrite the target

On Fri, Jan 17, 2020 at 11:42:55AM +0000, David Howells wrote:
> Hi Omar,
>
> Do you still have your AT_REPLACE patches? You said that you'd post a v4
> series, though I don't see it. I could make use of such a feature in
> cachefiles inside the kernel. For my original question, see:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
>
> And do you have ext4 support for it?

Hi,

Yes I still have those patches lying around and I'd be happy to dust
them off and resend them. I don't have ext4 support. I'd be willing to
take a stab at ext4 once Al is happy with the VFS part unless someone
more familiar with ext4 wants to contribute that support.

Thanks for reviving interest in this!

2020-01-17 16:39:40

by David Howells

Subject: Re: Making linkat() able to overwrite the target

Omar Sandoval <[email protected]> wrote:

> Yes I still have those patches lying around and I'd be happy to dust
> them off and resend them.

That would be great if you could. I could use them here:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter

I'm performing invalidation by creating a vfs_tmpfile() and then replacing the
on-disk file whilst letting ops resume on the temporary file. Replacing the
on-disk file currently, however, involves unlinking the old one before I can
link in a new one - which leaves a window in which nothing is there. I could
use one or more side dirs in which to create new files and rename them over,
but that has potential lock bottleneck issues - and is particularly fun if an
entire volume is invalidated (e.g. AFS vos release).

David