Hi,
I recently ran into a situation where a virtiofs client began
encountering EBADF after the client / guest system had an OOM. After
reproducing the issue and debugging, the problem is caused by a
virtiofsd submount having the nodeid of its root dentry fogotten. This
occurs because it borrows the reference for this dentry from the parent
that is passed into the function.
In this particular case, the submount had been bind mounted into a
container's mount namespace. The reference count on the original parent
dentry was 0, making it eligible for eviction. However, because this
dentry was also the last reference the fuse client knew it had, it sent
a forget message to the server. This caused all future references to
the FUSE node-id from virtiofsd perspective to become invalid.
Subsequent attempts to use the node-id for operations against the
submount's root received an EBADF from the server.
This pair of patches modifies the virtiofs submount code to perform a
lookup on the nodeid that forms the root of the submount. The patch
before this pulls the revalidate lookup code into a helper function that
can be used both in revalidate and submount superblock fill.
Tested via:
- fstests for virtiofs
- fstests for fuse (against passthrough_ll)
- manual testing to watch how refcounts change between client and server
in response to filesytem access, umount, and eviction by the shrinker.
This resend has rebased against the latest tip of fuse/for-next and
massaged the commit messages in the patches, but hasn't made any
functional modifications since the original v2.
There's also been an issue opened with the project that uses this
functionality. More details on that can be found at [1].
Changes since v1:
- Cleanups to pacify test robot
Changes since RFC:
- Modified fuse_fill_super_submount to always fail if dentry cannot be
revalidated. (Feedback from Bernd Schubert)
- Fixed up an edge case where looked up but subsequently declared
invalid dentries were not correctly tracking nlookup. (Error was
introduced in my RFC).
Thanks,
-K
[1] https://github.com/kata-containers/kata-containers/issues/8040
Krister Johansen (2):
fuse: revalidate: move lookup into a separate function
fuse: ensure that submounts lookup their parent
fs/fuse/dir.c | 85 +++++++++++++++++++++++++++++++++---------------
fs/fuse/fuse_i.h | 6 ++++
fs/fuse/inode.c | 43 ++++++++++++++++++++----
3 files changed, 101 insertions(+), 33 deletions(-)
--
2.25.1
Move the lookup parts of fuse_dentry_revalidate into a common function.
This function will be used elsewhere in a separate commit. In the
meantime, the new function fuse_dentry_revalidate_lookup is responsible
for just the lookup and validation portions of the revalidate dance.
The fuse_dentry_revalidate function retains the responsibility for
invalidating and mutating any state associated with the origial
fuse_inode and dentry.
The rationale for this refactoring is to allow a lookup to be triggered
as part of creating a submount. The submount code lives in another
module. A subsequent commit will utilize the common function that is
created by this commit.
Signed-off-by: Krister Johansen <[email protected]>
---
fs/fuse/dir.c | 85 +++++++++++++++++++++++++++++++++++----------------
1 file changed, 58 insertions(+), 27 deletions(-)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index d707e6987da9..5e01946d7531 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -183,6 +183,57 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
args->out_args[0].value = outarg;
}
+static int fuse_dentry_revalidate_lookup(struct fuse_mount *fm,
+ struct dentry *entry,
+ struct inode *inode,
+ struct fuse_entry_out *outarg,
+ bool *lookedup)
+{
+ struct dentry *parent;
+ struct fuse_forget_link *forget;
+ FUSE_ARGS(args);
+ int ret;
+
+ forget = fuse_alloc_forget();
+ ret = -ENOMEM;
+ if (!forget)
+ goto out;
+
+ parent = dget_parent(entry);
+ fuse_lookup_init(fm->fc, &args, get_node_id(d_inode(parent)),
+ &entry->d_name, outarg);
+ ret = fuse_simple_request(fm, &args);
+ dput(parent);
+
+ /* Zero nodeid is same as -ENOENT */
+ if (!ret && !outarg->nodeid)
+ ret = -ENOENT;
+ if (!ret) {
+ if (outarg->nodeid != get_node_id(inode) ||
+ (bool) IS_AUTOMOUNT(inode) != (bool) (outarg->attr.flags & FUSE_ATTR_SUBMOUNT)) {
+ fuse_queue_forget(fm->fc, forget,
+ outarg->nodeid, 1);
+ goto invalid;
+ }
+ *lookedup = true;
+ }
+ kfree(forget);
+ if (ret == -ENOMEM || ret == -EINTR)
+ goto out;
+ if (ret || fuse_invalid_attr(&outarg->attr) ||
+ fuse_stale_inode(inode, outarg->generation, &outarg->attr)) {
+ goto invalid;
+ }
+
+ ret = 1;
+out:
+ return ret;
+
+invalid:
+ ret = 0;
+ goto out;
+}
+
/*
* Check whether the dentry is still valid
*
@@ -206,9 +257,8 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
(flags & (LOOKUP_EXCL | LOOKUP_REVAL | LOOKUP_RENAME_TARGET))) {
struct fuse_entry_out outarg;
- FUSE_ARGS(args);
- struct fuse_forget_link *forget;
u64 attr_version;
+ bool lookedup = false;
/* For negative dentries, always do a fresh lookup */
if (!inode)
@@ -220,38 +270,19 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
fm = get_fuse_mount(inode);
- forget = fuse_alloc_forget();
- ret = -ENOMEM;
- if (!forget)
- goto out;
-
attr_version = fuse_get_attr_version(fm->fc);
- parent = dget_parent(entry);
- fuse_lookup_init(fm->fc, &args, get_node_id(d_inode(parent)),
- &entry->d_name, &outarg);
- ret = fuse_simple_request(fm, &args);
- dput(parent);
- /* Zero nodeid is same as -ENOENT */
- if (!ret && !outarg.nodeid)
- ret = -ENOENT;
- if (!ret) {
+ ret = fuse_dentry_revalidate_lookup(fm, entry, inode, &outarg,
+ &lookedup);
+ if (ret == -ENOMEM || ret == -EINTR)
+ goto out;
+ if (lookedup) {
fi = get_fuse_inode(inode);
- if (outarg.nodeid != get_node_id(inode) ||
- (bool) IS_AUTOMOUNT(inode) != (bool) (outarg.attr.flags & FUSE_ATTR_SUBMOUNT)) {
- fuse_queue_forget(fm->fc, forget,
- outarg.nodeid, 1);
- goto invalid;
- }
spin_lock(&fi->lock);
fi->nlookup++;
spin_unlock(&fi->lock);
}
- kfree(forget);
- if (ret == -ENOMEM || ret == -EINTR)
- goto out;
- if (ret || fuse_invalid_attr(&outarg.attr) ||
- fuse_stale_inode(inode, outarg.generation, &outarg.attr))
+ if (ret <= 0)
goto invalid;
forget_all_cached_acls(inode);
--
2.25.1
On 10/2/23 17:24, Krister Johansen wrote:
> Hi,
> I recently ran into a situation where a virtiofs client began
> encountering EBADF after the client / guest system had an OOM. After
> reproducing the issue and debugging, the problem is caused by a
> virtiofsd submount having the nodeid of its root dentry fogotten. This
> occurs because it borrows the reference for this dentry from the parent
> that is passed into the function.
Sorry, I didn't forget you, just didn't manage to review the 2nd version
yet. Will definitely do this week.
Please also note that there will be merge conflicts with atomic open
patches from Dharmendra/me. Although probably not too difficult to resolve.
Thanks,
Bernd
On Tue, Oct 03, 2023 at 12:18:42AM +0200, Bernd Schubert wrote:
>
>
> On 10/2/23 17:24, Krister Johansen wrote:
> > Hi,
> > I recently ran into a situation where a virtiofs client began
> > encountering EBADF after the client / guest system had an OOM. After
> > reproducing the issue and debugging, the problem is caused by a
> > virtiofsd submount having the nodeid of its root dentry fogotten. This
> > occurs because it borrows the reference for this dentry from the parent
> > that is passed into the function.
>
>
> Sorry, I didn't forget you, just didn't manage to review the 2nd version
> yet. Will definitely do this week.
Thanks; I appreciate the feedback you've provided so far.
> Please also note that there will be merge conflicts with atomic open patches
> from Dharmendra/me. Although probably not too difficult to resolve.
Sure. I'm happy to reparent, resolve those conflicts, re-test, and send
another revision when we're ready. I suspect there are going to be
additional changes requested on the v2. With that in mind, I'll hold
off for the moment unless it is going to cause headaches for you.
For the atomic-open-revalidate changes: should I be working from what's
on the list? This is the most recent patchset I see:
https://lore.kernel.org/linux-fsdevel/[email protected]/
I found a 6.5 relative tree of yours on GitHub by following the libfuse
pull request, but nothing that seemed in sync with fuse/for-next.
Thanks,
-K
On 10/3/23 18:48, Krister Johansen wrote:
> On Tue, Oct 03, 2023 at 12:18:42AM +0200, Bernd Schubert wrote:
>>
>>
>> On 10/2/23 17:24, Krister Johansen wrote:
>>> Hi,
>>> I recently ran into a situation where a virtiofs client began
>>> encountering EBADF after the client / guest system had an OOM. After
>>> reproducing the issue and debugging, the problem is caused by a
>>> virtiofsd submount having the nodeid of its root dentry fogotten. This
>>> occurs because it borrows the reference for this dentry from the parent
>>> that is passed into the function.
>>
>>
>> Sorry, I didn't forget you, just didn't manage to review the 2nd version
>> yet. Will definitely do this week.
>
> Thanks; I appreciate the feedback you've provided so far.
>
>> Please also note that there will be merge conflicts with atomic open patches
>> from Dharmendra/me. Although probably not too difficult to resolve.
>
> Sure. I'm happy to reparent, resolve those conflicts, re-test, and send
> another revision when we're ready. I suspect there are going to be
> additional changes requested on the v2. With that in mind, I'll hold
> off for the moment unless it is going to cause headaches for you.
I certainly also didn't mean that you should check for merge conflicts,
it was more an annotation that it might come up - depending on the merge
order. Please don't stop to do improvements, resolving merge conflicts
shouldn't be difficult.
I'm going to add you to the atomic open patch series to keep you
updated, if you don't mind.
>
> For the atomic-open-revalidate changes: should I be working from what's
> on the list? This is the most recent patchset I see:
>
> https://lore.kernel.org/linux-fsdevel/[email protected]/
>
> I found a 6.5 relative tree of yours on GitHub by following the libfuse
> pull request, but nothing that seemed in sync with fuse/for-next.
I don't think there are conflicts with fuse-next right now, but I can
check.
Thanks,
Bernd
On Wed, Oct 04, 2023 at 12:54:49AM +0200, Bernd Schubert wrote:
>
>
> On 10/3/23 18:48, Krister Johansen wrote:
> > On Tue, Oct 03, 2023 at 12:18:42AM +0200, Bernd Schubert wrote:
> > >
> > >
> > > On 10/2/23 17:24, Krister Johansen wrote:
> > > > Hi,
> > > > I recently ran into a situation where a virtiofs client began
> > > > encountering EBADF after the client / guest system had an OOM. After
> > > > reproducing the issue and debugging, the problem is caused by a
> > > > virtiofsd submount having the nodeid of its root dentry fogotten. This
> > > > occurs because it borrows the reference for this dentry from the parent
> > > > that is passed into the function.
> > >
> > > Please also note that there will be merge conflicts with atomic open patches
> > > from Dharmendra/me. Although probably not too difficult to resolve.
> >
> > Sure. I'm happy to reparent, resolve those conflicts, re-test, and send
> > another revision when we're ready. I suspect there are going to be
> > additional changes requested on the v2. With that in mind, I'll hold
> > off for the moment unless it is going to cause headaches for you.
>
> I certainly also didn't mean that you should check for merge conflicts, it
> was more an annotation that it might come up - depending on the merge order.
> Please don't stop to do improvements, resolving merge conflicts shouldn't be
> difficult.
> I'm going to add you to the atomic open patch series to keep you updated, if
> you don't mind.
Thanks, no objections from me. I'm willing to help with any conflict
resolution or retesting tasks, if anything turns out to be non-trivial.
My goal is to get these patches to the state where they're acceptable.
I'm happy to make additional changes, or work against a different
branch.
-K