On Tue, May 21, 2024 at 03:46:06PM +0200, Christian Brauner wrote:
> On Mon, May 20, 2024 at 05:35:49PM -0400, Aleksa Sarai wrote:
> > Now that we have stabilised the unique 64-bit mount ID interface in
> > statx, we can now provide a race-free way for name_to_handle_at(2) to
> > provide a file handle and corresponding mount without needing to worry
> > about racing with /proc/mountinfo parsing.
> >
> > As with AT_HANDLE_FID, AT_HANDLE_UNIQUE_MNT_ID reuses a statx AT_* bit
> > that doesn't make sense for name_to_handle_at(2).
> >
> > Signed-off-by: Aleksa Sarai <[email protected]>
> > ---
>
> So I think overall this is probably fine (famous last words). If it's
> just about being able to retrieve the new mount id without having to
> take the hit of another statx system call it's indeed a bit much to
> add a revised system call for this. Although I did say earlier that I
> wouldn't rule that out.
>
> But if we did that, then it'd be a long discussion on the form of the new
> system call and the information it exposes.
>
> For example, I lack the grey hair needed to understand why
> name_to_handle_at() returns a mount id at all. The pitch in commit
> 990d6c2d7aee ("vfs: Add name to file handle conversion support") is that
> the (old) mount id can be used to "lookup file system specific
> information [...] in /proc/<pid>/mountinfo".
>
> Granted, that's doable but it'll mean a lot of careful checking to avoid
> races for mount id recycling because they're not even allocated
> cyclically. With lots of containers it becomes even more of an issue. So
> it's doubtful whether exposing the mount id through name_to_handle_at()
> would be something that we'd still do.
>
> So really, if this is just about a use-case where you want to spare the
> additional system call for statx() and you need the mnt_id then
> overloading is probably ok.
>
> But it remains an unpleasant thing to look at.

And I'd like an ok from Jeff and Amir if we're going to try this. :)
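Here is roughly what Christian's trade-off looks like today: an extra statx() call made only to get the unique mount ID. This is a minimal sketch. It assumes Linux 6.8 or later for STATX_MNT_ID_UNIQUE (the value is copied from the 6.8 UAPI header in case older headers lack it) and glibc's name_to_handle_at() wrapper. The helper handle_and_unique_mnt_id() is hypothetical and not part of the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>       /* open, name_to_handle_at, struct file_handle */
#include <linux/stat.h>  /* struct statx, STATX_* (kernel UAPI header) */
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h> /* SYS_statx */
#include <unistd.h>      /* close, syscall */

#ifndef STATX_MNT_ID_UNIQUE
#define STATX_MNT_ID_UNIQUE 0x00004000U /* value from the Linux 6.8 UAPI header */
#endif

/*
 * Hypothetical helper: fetch a file handle for @path and the unique 64-bit
 * mount ID of the mount it lives on. Returns 0 on success, or -1 with errno
 * set. On success the caller owns *out_fh and must free() it.
 */
static int handle_and_unique_mnt_id(const char *path,
                                    struct file_handle **out_fh,
                                    uint64_t *out_mnt_id)
{
    struct file_handle *fh = NULL;
    struct statx stx;
    int legacy_mnt_id; /* old 32-bit ID: can be recycled, so it is ignored */
    int saved_errno;

    /* Resolve the path once; both system calls below operate on this fd. */
    int fd = open(path, O_PATH | O_CLOEXEC);
    if (fd < 0)
        return -1;

    fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    if (fh == NULL)
        goto fail;
    fh->handle_bytes = MAX_HANDLE_SZ;

    /* System call 1: the file handle (plus the legacy mount ID). */
    if (name_to_handle_at(fd, "", fh, &legacy_mnt_id, AT_EMPTY_PATH) < 0)
        goto fail;

    /* System call 2: the extra statx() that the proposed flag would save. */
    if (syscall(SYS_statx, fd, "", AT_EMPTY_PATH, STATX_MNT_ID_UNIQUE, &stx) < 0)
        goto fail;
    if (!(stx.stx_mask & STATX_MNT_ID_UNIQUE)) {
        errno = EOPNOTSUPP; /* kernel older than 6.8 ignored the request */
        goto fail;
    }

    close(fd);
    *out_fh = fh;
    *out_mnt_id = stx.stx_mnt_id;
    return 0;

fail:
    saved_errno = errno;
    free(fh);
    close(fd);
    errno = saved_errno;
    return -1;
}
```

The path is opened once with O_PATH, and both calls pass AT_EMPTY_PATH. As a result, both calls refer to the same mount even if something is mounted on or unmounted from the path between them. The flag under discussion would let name_to_handle_at(2) return the 64-bit ID directly, which would make the statx() call unnecessary.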
On Tue, 2024-05-21 at 16:11 +0200, Christian Brauner wrote:
> On Tue, May 21, 2024 at 03:46:06PM +0200, Christian Brauner wrote:
> > [...]
>
> And I'd like an ok from Jeff and Amir if we're going to try this. :)

I don't have strong feelings about it other than "it looks sort of
ugly", so I'm OK with doing this.

I suspect we will eventually need name_to_handle_at2, or something
similar, as it seems like we're starting to grow some new use-cases for
filehandles, and hitting the limits of the old syscall. I don't have a
good feel for what that should look like though, so I'm happy to put
that off for a while.
--
Jeff Layton <[email protected]>
On Tue, May 21, 2024 at 5:27 PM Jeff Layton <[email protected]> wrote:
>
> On Tue, 2024-05-21 at 16:11 +0200, Christian Brauner wrote:
> > [...]
> > And I'd like an ok from Jeff and Amir if we're going to try this. :)
>
> I don't have strong feelings about it other than "it looks sort of
> ugly", so I'm OK with doing this.
>
> I suspect we will eventually need name_to_handle_at2, or something
> similar, as it seems like we're starting to grow some new use-cases for
> filehandles, and hitting the limits of the old syscall. I don't have a
> good feel for what that should look like though, so I'm happy to put
> that off for a while.

I'm ok with it, but we cannot possibly allow it without any bikeshedding...

Please call it AT_HANDLE_MNT_ID_UNIQUE to align with
STATX_MNT_ID_UNIQUE

and as I wrote, I do not like overloading the AT_*_SYNC flags,
and since there is no other obvious candidate to overload,
I think that it is best to at least declare in a comment that

/* 0x00ff flags are reserved for per-syscall flags */

and use one of those bits for AT_HANDLE_MNT_ID_UNIQUE.
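The following compile-time sketch makes the proposed convention concrete. It assumes that the low byte (0x00ff) is reserved for per-syscall flags and uses 0x80 for the new handle flag, which is the value offered later in this thread. AT_PER_SYSCALL_MASK and AT_HANDLE_MNT_ID_UNIQUE_SKETCH are hypothetical names, not kernel definitions. Only the generic flags come from <fcntl.h>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h> /* AT_SYMLINK_NOFOLLOW, AT_EMPTY_PATH, ... */

/* Hypothetical name for the range proposed for per-syscall flags. */
#define AT_PER_SYSCALL_MASK 0x00ffU

/* Hypothetical stand-in for the new flag, using the 0x80 bit. */
#define AT_HANDLE_MNT_ID_UNIQUE_SKETCH 0x80U

/* A partial list of generic AT_* flags shared by many *at() syscalls. */
#define AT_GENERIC_FLAGS_SKETCH                                          \
    ((unsigned int)(AT_SYMLINK_NOFOLLOW | AT_REMOVEDIR |                 \
                    AT_SYMLINK_FOLLOW | AT_NO_AUTOMOUNT | AT_EMPTY_PATH))

/* Compile-time checks that encode the proposed rule. */
_Static_assert((AT_HANDLE_MNT_ID_UNIQUE_SKETCH & ~AT_PER_SYSCALL_MASK) == 0,
               "per-syscall flags must live in the reserved 0x00ff range");
_Static_assert((AT_GENERIC_FLAGS_SKETCH & AT_PER_SYSCALL_MASK) == 0,
               "generic AT_* flags must stay out of the reserved range");

/* Returns nonzero if @flag is nonzero and lies entirely in the reserved range. */
static int is_per_syscall_at_flag(unsigned int flag)
{
    return flag != 0 && (flag & ~AT_PER_SYSCALL_MASK) == 0;
}
```

The generic list is deliberately partial. AT_STATX_FORCE_SYNC (0x2000), AT_STATX_DONT_SYNC (0x4000) and AT_RECURSIVE (0x8000) also sit above the low byte.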

It does not matter whether we decide to unify the AT_ flags
namespace with the RENAME_ flags namespace or not.

The fact that there is a syscall named renameat2() with a flags
argument means that someone is bound to pass an AT_ flag
to this syscall sooner or later, so the least we can do is try to
delay the day that this will not result in EINVAL.
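The EINVAL behaviour described here can be observed directly. renameat2(2) accepts only the RENAME_* bits (RENAME_NOREPLACE, RENAME_EXCHANGE, RENAME_WHITEOUT) and rejects any other bit before it looks up either path. This is a minimal sketch that assumes glibc 2.28 or later for the renameat2() wrapper. renameat2_errno() and the placeholder file names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h> /* AT_FDCWD, AT_SYMLINK_NOFOLLOW, AT_EMPTY_PATH */
#include <stdio.h> /* renameat2, RENAME_* (glibc 2.28+) */

/*
 * Hypothetical helper: call renameat2(2) on two placeholder names with
 * @flags and return the resulting errno, or 0 if the call succeeded.
 * The kernel checks the flag bits before it looks up either path, so the
 * placeholders do not need to exist for the EINVAL case.
 */
static int renameat2_errno(unsigned int flags)
{
    if (renameat2(AT_FDCWD, "renameat2-placeholder-src",
                  AT_FDCWD, "renameat2-placeholder-dst", flags) < 0)
        return errno;
    return 0;
}
```

Every generic AT_* flag currently lives at 0x100 or above, so a stray one passed here fails with EINVAL. A generic AT_* flag allocated in the low byte could instead alias one of the RENAME_* bits (0x1, 0x2, 0x4).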

Thanks,
Amir.

P.S.: As I mentioned to Jeff at LSFMM, I have a patch in my tree
to add AT_HANDLE_CONNECTABLE, but I have not yet
decided whether it is upstream-worthy.
On 2024-05-21, Amir Goldstein <[email protected]> wrote:
> On Tue, May 21, 2024 at 5:27 PM Jeff Layton <[email protected]> wrote:
> > [...]
>
> I'm ok with it, but we cannot possibly allow it without any bikeshedding...
>
> Please call it AT_HANDLE_MNT_ID_UNIQUE to align with
> STATX_MNT_ID_UNIQUE
>
> and as I wrote, I do not like overloading the AT_*_SYNC flags,
> and since there is no other obvious candidate to overload,
> I think that it is best to at least declare in a comment that
>
> /* 0x00ff flags are reserved for per-syscall flags */
>
> and use one of those bits for AT_HANDLE_MNT_ID_UNIQUE.

I can switch the flag to use 0x80, but given there are already
exceptions to that rule, it seems unlikely that this is going to be a
strong guarantee going forward. I will add a comment though.

Note that this means we will only have 15 generic AT_* flags
remaining.
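Here is one way to reproduce the "15 remaining" figure. It assumes that the flags argument is a signed int, so bit 31 is avoided, and it counts the generic AT_* bits already in use as of mid-2024 (0x100 through 0x8000). This is arithmetic only, and the constant names are hypothetical.

```c
#include <assert.h>

/* Generic AT_* bits in use as of mid-2024 (values from the kernel UAPI):
 * AT_SYMLINK_NOFOLLOW 0x100, AT_REMOVEDIR 0x200, AT_SYMLINK_FOLLOW 0x400,
 * AT_NO_AUTOMOUNT 0x800, AT_EMPTY_PATH 0x1000, AT_STATX_FORCE_SYNC 0x2000,
 * AT_STATX_DONT_SYNC 0x4000, AT_RECURSIVE 0x8000. */
#define GENERIC_AT_USED 0xff00U

/* The low byte proposed as reserved for per-syscall flags. */
#define PER_SYSCALL_RESERVED 0x00ffU

/* Assumption: flags is a signed int, so bit 31 is left alone. */
#define USABLE_FLAG_BITS 0x7fffffffU

/* Count set bits; each loop iteration clears the lowest set bit. */
static int count_set_bits(unsigned int v)
{
    int n = 0;
    while (v != 0) {
        v &= v - 1;
        n++;
    }
    return n;
}

/* Bits still free for future generic AT_* flags under these assumptions. */
static int free_generic_at_bits(void)
{
    return count_set_bits(USABLE_FLAG_BITS &
                          ~(GENERIC_AT_USED | PER_SYSCALL_RESERVED));
}
```

Under these assumptions the free mask is 0x7fff0000: 31 usable bits, minus 8 reserved, minus 8 in use, which leaves 15.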

> It does not matter whether we decide to unify the AT_ flags
> namespace with the RENAME_ flags namespace or not.
>
> The fact that there is a syscall named renameat2() with a flags
> argument means that someone is bound to pass an AT_ flag
> to this syscall sooner or later, so the least we can do is try to
> delay the day that this will not result in EINVAL.

While there is a risk this could happen, a user could in theory also
incorrectly pass AT_* flags to open(). While ergonomics is important, I think
that most users generally read the docs when figuring out how to use
flags for syscalls (mainly because we don't have a unified flag
namespace for all syscalls), so I don't think this is a huge problem.
(But I'm sure I was part of making this problem worse with RESOLVE_*
flags.)
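The open() comparison is easy to demonstrate, because open(2) does not reject misplaced bits. This is a minimal sketch. It assumes an architecture that uses the asm-generic O_* values, such as x86-64, where AT_SYMLINK_NOFOLLOW (0x100) has the same value as O_NOCTTY. open_with_stray_at_flag() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>  /* open, O_*, AT_SYMLINK_NOFOLLOW */
#include <unistd.h> /* close */

/*
 * Hypothetical helper that makes the mistake on purpose: an AT_* flag is
 * OR'ed into the open(2) flags. open(2) does not reject it; with the
 * asm-generic values (e.g. x86-64), 0x100 is O_NOCTTY, so the call simply
 * becomes O_RDONLY | O_NOCTTY. Returns the fd or -1.
 */
static int open_with_stray_at_flag(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC | AT_SYMLINK_NOFOLLOW);
}
```

By contrast, openat2(2) returns EINVAL when its open_how flags contain unknown bits.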

> Thanks,
> Amir.
>
> P.S.: As I mentioned to Jeff at LSFMM, I have a patch in my tree
> to add AT_HANDLE_CONNECTABLE, but I have not yet
> decided whether it is upstream-worthy.
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>