2023-09-08 18:03:48

by John Stoffel

Subject: Re: [PATCH] fix writing to the filesystem after unmount

>>>>> "Christian" == Christian Brauner <[email protected]> writes:

>> Well, currently you click some "Eject / safely remove / whatever" button
>> and then you get a "wait" dialog until everything is done after which
>> you're told the stick is safe to remove. What I imagine is that the "wait"
>> dialog needs to be there while there are any (or exclusive at minimum) openers
>> of the device. Not until umount(2) syscall has returned. And yes, the

> Agreed. umount(2) doesn't give guarantees about a filesystem being
> really gone once it has returned. And it really shouldn't. There's
> too many cases where that doesn't work and it's not a commitment we
> should make.

So how the heck is someone supposed to know, from userspace, that a
filesystem is unmounted? Just wearing my SysAdmin hat, this strikes
me as really potentially painful and annoying. But then again, so are
bind mounts from a lot of views too.

Don't people remember how bad it can be when you are trying to
shut down a system and it hangs because a remote NFS server is down
and not responding? And your client system hangs completely?

> And there are ways to wait until superblock shutdown that I've
> mentioned before in other places where it somehow really
> matters. inotify's IN_UNMOUNT will notify about superblock
> shutdown. IOW, when it really hits generic_shutdown_super() which
> can only be hit after unfreezing as that requires active references.

Can we maybe invert this discussion and think about it from the
userspace side of things? How does the user(space) mount/unmount
devices cleanly and reliably?

> So this really can be used to wait for a filesystem to go away across
> all namespaces, and across filesystem freezing and it's available to
> unprivileged users. Simple example:

> # shell 1
> sudo mount -t xfs /dev/sda /mnt
> sudo mount --bind /mnt /opt
> inotifywait -e unmount /mnt

> # shell 2
> sudo umount /opt # nothing happens in shell 1
> sudo umount /mnt # shell 1 gets woken
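
For scripting, the two-shell example above can be folded into one helper. This is only a sketch: it assumes inotifywait from inotify-tools is installed, and the function name wait_for_unmount and the 30 second default timeout are made up for illustration.

```sh
#!/bin/sh
# Block until the filesystem mounted at "$1" is shut down, or until
# "$2" seconds have passed (default 30, an arbitrary choice).
wait_for_unmount() {
    mnt=$1
    timeout=${2:-30}
    command -v inotifywait >/dev/null 2>&1 || {
        echo "inotifywait not found (install inotify-tools)" >&2
        return 3
    }
    # inotifywait exit status: 0 = event seen, 1 = error, 2 = timed out.
    inotifywait -q -t "$timeout" -e unmount "$mnt" >/dev/null
}

# Possible usage: start the watcher first, then unmount, then collect
# the watcher's exit status.
#   wait_for_unmount /mnt 60 &
#   sudo umount /mnt
#   wait $!
```

The watch has to be in place before umount runs, and backgrounding the helper leaves a small window where it may not be yet, so a careful caller would confirm the watch is established before unmounting.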

So what makes this *useful* to anyone? Why doesn't the bind mount
A) lock /mnt into place, and B) give you some way of seeing that
there's a bind mount in place?

>> corner-cases. So does the current behavior, I agree, but improving
>> situation for one usecase while breaking another usecase isn't really a way
>> forward...

> Agreed.

>> Well, the filesystem (struct superblock to be exact) is invisible
>> in /proc/mounts (or whatever), that is true. But it is still very
>> much associated with that block device and if you do 'mount
>> <device> <mntpoint>', you'll get it back. But yes, the filesystem
>> will not go away

Then should it be unmountable in the first place? I mean yes, we
always need a way to force an unmount no matter what, even if that
breaks some other process on the system, but for regular use,
shouldn't it just give back an error like:

/mnt in use by bind mount /opt

or something like that? Give the poor sysadmin some information on
what's going on here.
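
Until umount itself reports something like this, a rough userspace approximation is to read /proc/self/mountinfo and look for other mounts with the same major:minor device number. A minimal sketch, with assumptions: the function name check_bind_mounts is hypothetical, a shared device number is treated as "same filesystem" (which also matches the same device mounted twice, not only bind mounts), and only the caller's own mount namespace is visible.

```sh
#!/bin/sh
# Report other mounts that share a device number with the mount at "$1".
# Reads mountinfo-format text from "$2" (default: /proc/self/mountinfo).
# Returns 0 if "$1" is the only mount of that device, 1 if there are
# others, 2 if "$1" is not a mount point at all.
check_bind_mounts() {
    target=$1
    info=${2:-/proc/self/mountinfo}
    awk -v target="$target" '
        # mountinfo field 3 is major:minor, field 5 is the mount point
        { dev[NR] = $3; mnt[NR] = $5 }
        $5 == target { want = $3 }
        END {
            if (want == "") {
                print target " is not a mount point"
                exit 2
            }
            for (i = 1; i <= NR; i++)
                if (dev[i] == want && mnt[i] != target) {
                    print target " in use by bind mount " mnt[i]
                    busy = 1
                }
            exit (busy ? 1 : 0)
        }
    ' "$info"
}
```

Something like `check_bind_mounts /mnt && sudo umount /mnt` would then refuse in the /mnt and /opt case from the example earlier in the thread. It cannot see mounts pinned in other mount namespaces.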

> And now we at least have an api to detect that case and refuse to reuse
> the superblock.

>> until all references to it are dropped and you cannot easily find
>> who holds those references and how to get rid of them.

ding ding ding!!!! I don't want to have to run 'lsof' or something
like that.

> Namespaces make this even messier. You have no easy way of knowing
> whether the filesystem isn't pinned somewhere else through an
> explicit bind-mount or when it was copied during mount namespace
> creation.

This is the biggest downside of namespaces and bind mounts in my
mind. The lack of traceability back to the source.
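
A partial workaround for the traceability problem is to scan every process's mountinfo for the device, since each /proc/<pid>/mountinfo reflects that process's mount namespace. A sketch only: who_mounts_dev is a hypothetical name, and the second argument exists just so the function can be pointed at a fake proc tree. It will likely need root to read every process, and it misses references that no process's mount table shows, such as lazily detached mounts or namespaces kept alive only by a file descriptor.

```sh
#!/bin/sh
# Print "PID MOUNTPOINT" for every process whose mount namespace has a
# mount of device "$1" (major:minor, e.g. 8:0).
# "$2" is the proc root to scan (default: /proc).
who_mounts_dev() {
    dev=$1
    proc=${2:-/proc}
    for f in "$proc"/[0-9]*/mountinfo; do
        [ -r "$f" ] || continue          # unreadable or no match: skip
        pid=${f%/mountinfo}              # strip trailing /mountinfo
        pid=${pid##*/}                   # keep only the PID component
        # Processes can exit mid-scan, so read errors are ignored.
        awk -v dev="$dev" -v pid="$pid" \
            '$3 == dev { print pid, $5 }' "$f" 2>/dev/null
    done
}
```

Every process in the same namespace prints the same mount points, so the output is noisy; it is meant as a starting point for finding who still pins the filesystem, not as a complete answer.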