With my rewrite of fscache and cachefiles:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
when a file gets invalidated by the server - and, under some circumstances,
modified locally - I have the cache create a temporary file with vfs_tmpfile()
that I'd like to just link into place over the old one - but I can't because
vfs_link() doesn't allow you to do that. Instead I have to either unlink the
old one and then link the new one in or create it elsewhere and rename across.
Would it be possible to make linkat() take a flag, say AT_LINK_REPLACE, that
causes the target to be replaced and not give EEXIST? Or make it so that
rename() can take a tmpfile as the source and replace the target with that. I
presume that, either way, this would require journal changes on ext4, xfs and
btrfs.
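
For reference, the "create it elsewhere and rename across" workaround can be
sketched from userspace roughly as below. This is only an illustration under
stated assumptions, not the cachefiles code (which works in-kernel with
vfs_tmpfile() and vfs_link()): the helper name replace_via_tmpfile() and the
".<name>.new" staging name are made up for the example, /proc is assumed to be
mounted, and the filesystem must support O_TMPFILE. It first tries a plain
linkat(), which fails with EEXIST when the target already exists (the case a
flag like AT_LINK_REPLACE would cover), and then falls back to linking under a
staging name and renaming over the target.

```c
#define _GNU_SOURCE             /* for O_TMPFILE */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Replace dir/name with a new file holding data[0..len).
 * Returns 0 on success, -1 with errno set on failure.
 * The staging name ".<name>.new" is a hypothetical convention for this sketch.
 */
static int replace_via_tmpfile(const char *dir, const char *name,
                               const void *data, size_t len)
{
    char target[4096], staging[4096], procpath[64];
    int fd, saved, n;
    ssize_t w;

    n = snprintf(target, sizeof(target), "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof(target)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    n = snprintf(staging, sizeof(staging), "%s/.%s.new", dir, name);
    if (n < 0 || (size_t)n >= sizeof(staging)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* Unnamed file in the same directory, hence on the same filesystem. */
    fd = open(dir, O_TMPFILE | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    w = write(fd, data, len);
    if (w < 0)
        goto fail;
    if ((size_t)w != len) {
        errno = EIO;            /* short write: treat as an error here */
        goto fail;
    }

    /* Give the unnamed file a name via its /proc/self/fd symlink. */
    snprintf(procpath, sizeof(procpath), "/proc/self/fd/%d", fd);
    if (linkat(AT_FDCWD, procpath, AT_FDCWD, target, AT_SYMLINK_FOLLOW) == 0) {
        close(fd);
        return 0;               /* target did not exist yet */
    }
    if (errno != EEXIST)
        goto fail;

    /* linkat() will not replace an existing name: link elsewhere, rename. */
    unlink(staging);            /* drop any leftover from an earlier attempt */
    if (linkat(AT_FDCWD, procpath, AT_FDCWD, staging, AT_SYMLINK_FOLLOW) != 0)
        goto fail;
    if (rename(staging, target) != 0) {
        saved = errno;
        unlink(staging);
        errno = saved;
        goto fail;
    }
    close(fd);
    return 0;

fail:
    saved = errno;
    close(fd);
    errno = saved;
    return -1;
}
```

Calling replace_via_tmpfile(dir, "obj", buf, len) twice exercises both paths:
the first call links directly, the second hits EEXIST and goes through the
staging name. The target is never absent, but the extra link and rename are
the steps a replacing link would make unnecessary.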
Thanks,
David
On Tue, Jan 14, 2020 at 04:34:25PM +0000, David Howells wrote:
> With my rewrite of fscache and cachefiles:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
>
> when a file gets invalidated by the server - and, under some circumstances,
> modified locally - I have the cache create a temporary file with vfs_tmpfile()
> that I'd like to just link into place over the old one - but I can't because
> vfs_link() doesn't allow you to do that. Instead I have to either unlink the
> old one and then link the new one in or create it elsewhere and rename across.
>
> Would it be possible to make linkat() take a flag, say AT_LINK_REPLACE, that
> causes the target to be replaced and not give EEXIST? Or make it so that
> rename() can take a tmpfile as the source and replace the target with that. I
> presume that, either way, this would require journal changes on ext4, xfs and
> btrfs.
Umm... I don't like the idea of linkat() doing that - you suddenly get new
fun cases to think about (what should happen when the target is a mountpoint,
for starters?) _and_ you would have to add a magical flag to vfs_link() so
that it would know which tests to do. As for rename... How would that
work? AT_EMPTY_PATH for source? What happens if two threads do that
at the same time? Should that case be always "create a new link, even
if you've got it by plain lookup somewhere"? Worse, suppose you do that
to given tmpfile; what should happen to /proc/self/fd/* link to it? Should
it point to new location, or...?
On Tue, Jan 14, 2020 at 7:06 PM David Howells wrote:
>
> Al Viro wrote:
>
> > > Would it be possible to make linkat() take a flag, say AT_LINK_REPLACE,
> > > that causes the target to be replaced and not give EEXIST? Or make it so
> > > that rename() can take a tmpfile as the source and replace the target with
> > > that. I presume that, either way, this would require journal changes on
> > > ext4, xfs and btrfs.
> >
> > Umm... I don't like the idea of linkat() doing that - you suddenly get new
> > fun cases to think about (what should happen when the target is a mountpoint,
> > for starters?
>
> Don't allow it onto directories, S_AUTOMOUNT-marked inodes or anything that's
> got something mounted on it.
>
> > ) _and_ you would have to add a magical flag to vfs_link() so
> > that it would know which tests to do.
>
> Yes, I suggested AT_LINK_REPLACE as said magical flag.
>
> > As for rename...
>
> Yeah - with further thought, rename() doesn't really work as an interface,
> particularly if a link has already been made.
>
> Do you have an alternative suggestion? There are two things I want to avoid:
>
> (1) Doing unlink-link or unlink-create as that leaves a window where the
> cache file is absent.
>
> (2) Creating replacement files in a temporary directory and renaming from
> there over the top of the target file as the temp dir would then be a
> bottleneck that spends a lot of time locked for creations and renames.
Create multiple sub-temp-dirs and use them alternately.
I think there was a report for overlayfs with the same bottleneck
(copy up uses a temp dir, but now only for non-regular files). I haven't
gotten around to implementing this idea yet.
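
To make the idea concrete, here is a rough userspace sketch. It is an
illustration only, not overlayfs or cachefiles code: the names struct staging,
staging_init() and staging_replace(), the tmpN/ directory layout and the count
of four directories are all assumptions made for the example. Each replacement
picks the next staging directory in round-robin order, so concurrent creations
are spread over several directory locks instead of one. The rename still has to
lock both the chosen staging directory and the target directory.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdatomic.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#define NR_STAGING_DIRS 4       /* hypothetical; size to expected parallelism */

struct staging {
    const char *root;           /* holds the targets and the tmpN/ subdirs */
    atomic_uint next;           /* round-robin cursor, safe across threads */
};

/* Create root/tmp0 .. root/tmpN-1 if they do not exist yet. */
static int staging_init(struct staging *s, const char *root)
{
    char path[4096];

    s->root = root;
    atomic_init(&s->next, 0);
    for (unsigned int i = 0; i < NR_STAGING_DIRS; i++) {
        int n = snprintf(path, sizeof(path), "%s/tmp%u", root, i);
        if (n < 0 || (size_t)n >= sizeof(path)) {
            errno = ENAMETOOLONG;
            return -1;
        }
        if (mkdir(path, 0700) != 0 && errno != EEXIST)
            return -1;
    }
    return 0;
}

/*
 * Write data to a file in the next staging dir, then rename it over
 * root/name. Returns the staging slot used (>= 0), or -1 with errno set.
 */
static int staging_replace(struct staging *s, const char *name,
                           const void *data, size_t len)
{
    char tmp[4096], target[4096];
    unsigned int slot = atomic_fetch_add(&s->next, 1) % NR_STAGING_DIRS;
    int fd, saved, n;
    ssize_t w;

    n = snprintf(tmp, sizeof(tmp), "%s/tmp%u/%s", s->root, slot, name);
    if (n < 0 || (size_t)n >= sizeof(tmp)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    n = snprintf(target, sizeof(target), "%s/%s", s->root, name);
    if (n < 0 || (size_t)n >= sizeof(target)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* O_TRUNC reuses a leftover file from an interrupted earlier attempt. */
    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    w = write(fd, data, len);
    if (w < 0 || (size_t)w != len) {
        saved = (w < 0) ? errno : EIO;
        close(fd);
        unlink(tmp);
        errno = saved;
        return -1;
    }
    if (close(fd) != 0 || rename(tmp, target) != 0) {
        saved = errno;
        unlink(tmp);
        errno = saved;
        return -1;
    }
    return (int)slot;
}
```

After staging_init(&s, root), successive staging_replace() calls use tmp0,
tmp1, tmp2, ... in turn. Two callers replacing the same name at once can still
collide if they land on the same slot, so a real user would want a unique
staging name per replacement.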
Thanks,
Miklos
On Tue, Jan 14, 2020 at 04:34:25PM +0000, David Howells wrote:
>
> when a file gets invalidated by the server - and, under some circumstances,
> modified locally - I have the cache create a temporary file with vfs_tmpfile()
> that I'd like to just link into place over the old one - but I can't because
> vfs_link() doesn't allow you to do that. Instead I have to either unlink the
> old one and then link the new one in or create it elsewhere and rename across.
>
> Would it be possible to make linkat() take a flag, say AT_LINK_REPLACE, that
> causes the target to be replaced and not give EEXIST? Or make it so that
> rename() can take a tmpfile as the source and replace the target with that. I
> presume that, either way, this would require journal changes on ext4, xfs and
> btrfs.
This sounds like a very useful primitive, and from the low-level XFS
point of view should be very easy to implement and will not require any
on-disk changes. I can't really think of any good userspace interface but
a new syscall, though.
On Tue, Jan 14, 2020, at 1:06 PM, David Howells wrote:
> Yes, I suggested AT_LINK_REPLACE as said magical flag.
This came up before right?
https://lore.kernel.org/linux-fsdevel/[email protected]/
On Fri, Jan 17, 2020 at 5:52 AM Colin Walters wrote:
>
> On Tue, Jan 14, 2020, at 1:06 PM, David Howells wrote:
>
> > Yes, I suggested AT_LINK_REPLACE as said magical flag.
>
> This came up before right?
>
> https://lore.kernel.org/linux-fsdevel/[email protected]/
David,
This sounds like a good topic to be discussed at LSF/MM (hint hint)
Thanks,
Amir.
Hi Omar,
Do you still have your AT_REPLACE patches? You said that you'd post a v4
series, though I don't see it. I could make use of such a feature in
cachefiles inside the kernel. For my original question, see:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
And do you have ext4 support for it?
Colin Walters wrote:
> On Tue, Jan 14, 2020, at 1:06 PM, David Howells wrote:
>
> > Yes, I suggested AT_LINK_REPLACE as said magical flag.
>
> This came up before right?
>
> https://lore.kernel.org/linux-fsdevel/[email protected]/
David
On Fri, Jan 17, 2020 at 11:42:55AM +0000, David Howells wrote:
> Hi Omar,
>
> Do you still have your AT_REPLACE patches? You said that you'd post a v4
> series, though I don't see it. I could make use of such a feature in
> cachefiles inside the kernel. For my original question, see:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
>
> And do you have ext4 support for it?
Hi,
Yes I still have those patches lying around and I'd be happy to dust
them off and resend them. I don't have ext4 support. I'd be willing to
take a stab at ext4 once Al is happy with the VFS part unless someone
more familiar with ext4 wants to contribute that support.
Thanks for reviving interest in this!
Omar Sandoval wrote:
> Yes I still have those patches lying around and I'd be happy to dust
> them off and resend them.
That would be great if you could. I could use them here:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
I'm performing invalidation by creating a vfs_tmpfile() and then replacing the
on-disk file whilst letting ops resume on the temporary file. Replacing the
on-disk file currently, however, involves unlinking the old one before I can
link in a new one - which leaves a window in which nothing is there. I could
use one or more side dirs in which to create new files and rename them over,
but that has potential lock bottleneck issues - and is particularly fun if an
entire volume is invalidated (e.g. AFS vos release).
David