Since v6.3-rc1 commit 5a8bee63b1 ("fuse: in fuse_flush only wait if
someone wants the return code") `fput()` is called asynchronously if a
file is closed as part of a process exiting, i.e., if there was no
explicit `close()` before exit.
If the file was open for writing, also `put_write_access()` is called
asynchronously as part of the async `fput()`.
If that newly written file is an executable, attempting to `execve()`
the new file can fail with `ETXTBSY` if it's called after the writer
process exited but before the async `fput()` has run.
I've confirmed that this issue is absent in v6.2 and reverting
5a8bee63b1 on top of v6.4.10 fixes the regression.
#regzbot introduced: 5a8bee63b1
Jürg
On Mon, 14 Aug 2023 at 08:03, Jürg Billeter <[email protected]> wrote:
>
> Since v6.3-rc1 commit 5a8bee63b1 ("fuse: in fuse_flush only wait if
> someone wants the return code") `fput()` is called asynchronously if a
> file is closed as part of a process exiting, i.e., if there was no
> explicit `close()` before exit.
>
> If the file was open for writing, also `put_write_access()` is called
> asynchronously as part of the async `fput()`.
>
> If that newly written file is an executable, attempting to `execve()`
> the new file can fail with `ETXTBSY` if it's called after the writer
> process exited but before the async `fput()` has run.
Thanks for the report.
At this point, I think it would be best to revert the original patch,
since only v6.4 has it.
The original fix was already a workaround, and I don't see a clear
path forward in this direction. We need to see if there's better
direction.
Ideas?
Thanks,
Miklos
On 8/14/23 13:02, Miklos Szeredi wrote:
> On Mon, 14 Aug 2023 at 08:03, Jürg Billeter <[email protected]> wrote:
>>
>> Since v6.3-rc1 commit 5a8bee63b1 ("fuse: in fuse_flush only wait if
>> someone wants the return code") `fput()` is called asynchronously if a
>> file is closed as part of a process exiting, i.e., if there was no
>> explicit `close()` before exit.
>>
>> If the file was open for writing, also `put_write_access()` is called
>> asynchronously as part of the async `fput()`.
>>
>> If that newly written file is an executable, attempting to `execve()`
>> the new file can fail with `ETXTBSY` if it's called after the writer
>> process exited but before the async `fput()` has run.
>
> Thanks for the report.
>
> At this point, I think it would be best to revert the original patch,
> since only v6.4 has it.
>
> The original fix was already a workaround, and I don't see a clear
> path forward in this direction. We need to see if there's better
> direction.
>
> Ideas?
Is there a good reason to flush O_RDONLY?
fuse: Avoid flush for O_RDONLY
From: Bernd Schubert <[email protected]>
A file opened in read-only moded does not have data to be
flushed, so no need to send flush at all.
This also mitigates -EBUSY for executables, which is due to
async flush with commit 5a8bee63b1.
Fixes: 5a8bee63b1 (unless executable opened in rw)
Signed-off-by: Bernd Schubert <[email protected]>
index 89d97f6188e0..e058a6af6751 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -545,7 +545,8 @@ static int fuse_flush(struct file *file, fl_owner_t id)
if (fuse_is_bad(inode))
return -EIO;
- if (ff->open_flags & FOPEN_NOFLUSH && !fm->fc->writeback_cache)
+ if ((ff->open_flags & FOPEN_NOFLUSH && !fm->fc->writeback_cache) ||
+ ((file->f_flags & O_ACCMODE) == O_RDONLY))
return 0;
fa = kzalloc(sizeof(*fa), GFP_KERNEL);
On Mon, 2023-08-14 at 14:28 +0200, Miklos Szeredi wrote:
> On Mon, 14 Aug 2023 at 14:07, Bernd Schubert
> > fuse: Avoid flush for O_RDONLY
> >
> > From: Bernd Schubert <[email protected]>
> >
> > A file opened in read-only moded does not have data to be
> > flushed, so no need to send flush at all.
> >
> > This also mitigates -EBUSY for executables, which is due to
> > async flush with commit 5a8bee63b1.
>
> Does it? If I read the bug report correctly, it's the write case that
> causes EBUSY.
Indeed, I see this when trying to execute a file after a process wrote
to (created) that file. As far as I can tell, `ETXTBSY` can't happen on
exec without the file being opened for writing and thus, I don't see
this patch mitigating this bug.
Jürg
On Mon, 14 Aug 2023 at 14:07, Bernd Schubert <[email protected]> wrote:
>
>
>
> On 8/14/23 13:02, Miklos Szeredi wrote:
> > On Mon, 14 Aug 2023 at 08:03, Jürg Billeter <[email protected]> wrote:
> >>
> >> Since v6.3-rc1 commit 5a8bee63b1 ("fuse: in fuse_flush only wait if
> >> someone wants the return code") `fput()` is called asynchronously if a
> >> file is closed as part of a process exiting, i.e., if there was no
> >> explicit `close()` before exit.
> >>
> >> If the file was open for writing, also `put_write_access()` is called
> >> asynchronously as part of the async `fput()`.
> >>
> >> If that newly written file is an executable, attempting to `execve()`
> >> the new file can fail with `ETXTBSY` if it's called after the writer
> >> process exited but before the async `fput()` has run.
> >
> > Thanks for the report.
> >
> > At this point, I think it would be best to revert the original patch,
> > since only v6.4 has it.
> >
> > The original fix was already a workaround, and I don't see a clear
> > path forward in this direction. We need to see if there's better
> > direction.
> >
> > Ideas?
>
> Is there a good reason to flush O_RDONLY?
->flush() is somewhat of a misnomer, it's the callback for explicit
close(2) and also for implicit close by exit(2).
The reason it's called flush is that nfs/fuse use it to ensure
close-to-open cache consistency, which means flushing dirty data on
close.
On fuse it also used to unlock remote posix locks, and we really know
what other "creative" uses were found for this request. So while we
could turn off sending FLUSH for read-only case (and use SETLK/F_UNLCK
for the remote lock case) but first let's see a use case. Sending
FLUSH can already be disabled by returning ENOSYS.
>
> fuse: Avoid flush for O_RDONLY
>
> From: Bernd Schubert <[email protected]>
>
> A file opened in read-only moded does not have data to be
> flushed, so no need to send flush at all.
>
> This also mitigates -EBUSY for executables, which is due to
> async flush with commit 5a8bee63b1.
Does it? If I read the bug report correctly, it's the write case that
causes EBUSY.
Thanks,
Miklos
On 8/14/23 14:38, Jürg Billeter wrote:
> On Mon, 2023-08-14 at 14:28 +0200, Miklos Szeredi wrote:
>> On Mon, 14 Aug 2023 at 14:07, Bernd Schubert
>>> fuse: Avoid flush for O_RDONLY
>>>
>>> From: Bernd Schubert <[email protected]>
>>>
>>> A file opened in read-only moded does not have data to be
>>> flushed, so no need to send flush at all.
>>>
>>> This also mitigates -EBUSY for executables, which is due to
>>> async flush with commit 5a8bee63b1.
>>
>> Does it? If I read the bug report correctly, it's the write case that
>> causes EBUSY.
>
> Indeed, I see this when trying to execute a file after a process wrote
> to (created) that file. As far as I can tell, `ETXTBSY` can't happen on
> exec without the file being opened for writing and thus, I don't see
> this patch mitigating this bug.
Sorry, my fault, I should have read your message/report more carefully.
Bernd
On Mon, Aug 14, 2023 at 01:02:35PM +0200, Miklos Szeredi wrote:
> On Mon, 14 Aug 2023 at 08:03, J?rg Billeter <[email protected]> wrote:
> >
> > Since v6.3-rc1 commit 5a8bee63b1 ("fuse: in fuse_flush only wait if
> > someone wants the return code") `fput()` is called asynchronously if a
> > file is closed as part of a process exiting, i.e., if there was no
> > explicit `close()` before exit.
> >
> > If the file was open for writing, also `put_write_access()` is called
> > asynchronously as part of the async `fput()`.
> >
> > If that newly written file is an executable, attempting to `execve()`
> > the new file can fail with `ETXTBSY` if it's called after the writer
> > process exited but before the async `fput()` has run.
>
> Thanks for the report.
>
> At this point, I think it would be best to revert the original patch,
> since only v6.4 has it.
I agree.
> The original fix was already a workaround, and I don't see a clear
> path forward in this direction. We need to see if there's better
> direction.
>
> Ideas?
It seems like we really do need to wait here. I guess that means we
need some kind of exit-proof wait?
Tycho
On Mon, 14 Aug 2023 at 16:00, Tycho Andersen <[email protected]> wrote:
> It seems like we really do need to wait here. I guess that means we
> need some kind of exit-proof wait?
Could you please recap the original problem?
Thanks,
Miklos
On 8/14/23 16:00, Tycho Andersen wrote:
> On Mon, Aug 14, 2023 at 01:02:35PM +0200, Miklos Szeredi wrote:
>> On Mon, 14 Aug 2023 at 08:03, Jürg Billeter <[email protected]> wrote:
>>>
>>> Since v6.3-rc1 commit 5a8bee63b1 ("fuse: in fuse_flush only wait if
>>> someone wants the return code") `fput()` is called asynchronously if a
>>> file is closed as part of a process exiting, i.e., if there was no
>>> explicit `close()` before exit.
>>>
>>> If the file was open for writing, also `put_write_access()` is called
>>> asynchronously as part of the async `fput()`.
>>>
>>> If that newly written file is an executable, attempting to `execve()`
>>> the new file can fail with `ETXTBSY` if it's called after the writer
>>> process exited but before the async `fput()` has run.
>>
>> Thanks for the report.
>>
>> At this point, I think it would be best to revert the original patch,
>> since only v6.4 has it.
>
> I agree.
>
>> The original fix was already a workaround, and I don't see a clear
>> path forward in this direction. We need to see if there's better
>> direction.
>>
>> Ideas?
>
> It seems like we really do need to wait here. I guess that means we
> need some kind of exit-proof wait?
I'm not sure how hackish it is, if fuse_flush gets converted to
queue_work() and with a new work-queue in struct fuse_inode. That
work_queue could be flushed through a new inode operation from
do_open_execat.
Bernd
On Mon, Aug 14, 2023 at 04:35:56PM +0200, Miklos Szeredi wrote:
> On Mon, 14 Aug 2023 at 16:00, Tycho Andersen <[email protected]> wrote:
>
> > It seems like we really do need to wait here. I guess that means we
> > need some kind of exit-proof wait?
>
> Could you please recap the original problem?
Sure, the symptom is a deadlock, something like:
# cat /proc/1528591/stack
[<0>] do_wait+0x156/0x2f0
[<0>] kernel_wait4+0x8d/0x140
[<0>] zap_pid_ns_processes+0x104/0x180
[<0>] do_exit+0xa41/0xb80
[<0>] do_group_exit+0x3a/0xa0
[<0>] __x64_sys_exit_group+0x14/0x20
[<0>] do_syscall_64+0x37/0xb0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
which is stuck waiting for:
# cat /proc/1544574/stack
[<0>] request_wait_answer+0x12f/0x210
[<0>] fuse_simple_request+0x109/0x2c0
[<0>] fuse_flush+0x16f/0x1b0
[<0>] filp_close+0x27/0x70
[<0>] put_files_struct+0x6b/0xc0
[<0>] do_exit+0x360/0xb80
[<0>] do_group_exit+0x3a/0xa0
[<0>] get_signal+0x140/0x870
[<0>] arch_do_signal_or_restart+0xae/0x7c0
[<0>] exit_to_user_mode_prepare+0x10f/0x1c0
[<0>] syscall_exit_to_user_mode+0x26/0x40
[<0>] do_syscall_64+0x46/0xb0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
I have a reproducer here:
https://github.com/tych0/kernel-utils/blob/master/fuse2/Makefile#L7
The problem is that the second thread has called do_exit() ->
exit_signals(), but then tries to request_wait_answer() which uses the
core wait primitives that no longer get woken up from signals due to
the code in exit_signals(). So when we try to exit the pid ns, the
whole cleanup hangs.
It seems we really do need to wait in do_exit(), otherwise we get
the behavior described in this regression...
Tycho
On Mon, Aug 21, 2023 at 05:31:48PM +0200, Miklos Szeredi wrote:
(Apologies for the delay, I have been away without cell signal for
some time.)
> > I think the idea is that they're saving snapshots of their own threads
> > to the fs for debugging purposes.
>
> This seems a fairly special situation. Have they (whoever they may
> be) thought about fixing this in their server?
Sorry, "we" here is some internal team that works for my employer
Netflix. We can't use imap clients on our corporate e-mails, whee.
> > Whether this is a sane thing to do or not, it doesn't seem like it
> > should deadlock pid ns destruction.
>
> True. So the suggested solution is to allow wait_event_killable() to
> return if a terminal signal is pending in the exiting state and only
> in that case turn the flush into a background request? That would
> still allow for regressions like the one reported, but that would be
> much less likely to happen in real life. Okay, I said this for the
> original solution as well, so this may turn out to be wrong as well.
I wonder if there's room here for a completion that doesn't use the
wait primitives. Something like an atomic + queuing in task_work()
would both fix this bug and not exhibit this regression, IIUC.
> Anyway, I'd prefer if this was fixed in the server code, as it looks
> fairly special and adding complexity to the kernel for this case might
> not be justifiable. But I'm also open to suggestions on fixing this
> in the kernel in a not too complex manner.
I don't think this is specific to the server-accessing-its-own-file
case. My reproducer uses that because I didn't quite understand the
bug fully at the time. I believe that *any* task that is killed with
an inflight fuse request will exhibit this. We have seen this fairly
rarely on another fuse fs we use throughout the fleet:
https://github.com/lxc/lxcfs and it doesn't really do anything
strange, and is mounted from the host's pid ns.
Tycho