2024-05-07 16:49:36

by Linus Torvalds

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

On Tue, 7 May 2024 at 04:03, Daniel Vetter <[email protected]> wrote:
>
> It's really annoying that on some distros/builds we don't have that, and
> for gpu driver stack reasons we _really_ need to know whether a fd is the
> same as another, due to some messy uniqueness requirements on buffer
> objects various drivers have.

It's sad that such a simple thing would require two other horrid
models (EPOLL or KCMP).

There'[s a reason that KCMP is a config option - *some* of that is
horrible code - but the "compare file descriptors for equality" is not
that reason.

Note that KCMP really is a broken mess. It's also a potential security
hole, even for the simple things, because of how it ends up comparing
kernel pointers (ie it doesn't just say "same file descriptor", it
gives an ordering of them, so you can use KCMP to sort things in
kernel space).

And yes, it orders them after obfuscating the pointer, but it's still
not something I would consider sane as a baseline interface. It was
designed for checkpoint-restore, it's the wrong thing to use for some
"are these file descriptors the same".

The same argument goes for using EPOLL for that. Disgusting hack.

Just what are the requirements for the GPU stack? Is one of the file
descriptors "trusted", IOW, you know what kind it is?

Because dammit, it's *so* easy to do. You could just add a core DRM
ioctl for it. Literally just

struct fd f1 = fdget(fd1);
struct fd f2 = fdget(fd2);
int same;

same = f1.file && f1.file == f2.file;
fdput(fd1);
fdput(fd2);
return same;

where the only question is if you also woudl want to deal with O_PATH
fd's, in which case the "fdget()" would be "fdget_raw()".

Honestly, adding some DRM ioctl for this sounds hacky, but it sounds
less hacky than relying on EPOLL or KCMP.

I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl
too, if this is possibly a more common thing. and not just DRM wants
it.

Would something like that work for you?

Linus


2024-05-07 17:45:24

by Christian König

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

Am 07.05.24 um 18:46 schrieb Linus Torvalds:
> On Tue, 7 May 2024 at 04:03, Daniel Vetter <[email protected]> wrote:
>> It's really annoying that on some distros/builds we don't have that, and
>> for gpu driver stack reasons we _really_ need to know whether a fd is the
>> same as another, due to some messy uniqueness requirements on buffer
>> objects various drivers have.
> It's sad that such a simple thing would require two other horrid
> models (EPOLL or KCMP).
>
> There'[s a reason that KCMP is a config option - *some* of that is
> horrible code - but the "compare file descriptors for equality" is not
> that reason.
>
> Note that KCMP really is a broken mess. It's also a potential security
> hole, even for the simple things, because of how it ends up comparing
> kernel pointers (ie it doesn't just say "same file descriptor", it
> gives an ordering of them, so you can use KCMP to sort things in
> kernel space).
>
> And yes, it orders them after obfuscating the pointer, but it's still
> not something I would consider sane as a baseline interface. It was
> designed for checkpoint-restore, it's the wrong thing to use for some
> "are these file descriptors the same".
>
> The same argument goes for using EPOLL for that. Disgusting hack.
>
> Just what are the requirements for the GPU stack? Is one of the file
> descriptors "trusted", IOW, you know what kind it is?
>
> Because dammit, it's *so* easy to do. You could just add a core DRM
> ioctl for it. Literally just
>
> struct fd f1 = fdget(fd1);
> struct fd f2 = fdget(fd2);
> int same;
>
> same = f1.file && f1.file == f2.file;
> fdput(fd1);
> fdput(fd2);
> return same;
>
> where the only question is if you also woudl want to deal with O_PATH
> fd's, in which case the "fdget()" would be "fdget_raw()".
>
> Honestly, adding some DRM ioctl for this sounds hacky, but it sounds
> less hacky than relying on EPOLL or KCMP.
>
> I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl
> too, if this is possibly a more common thing. and not just DRM wants
> it.
>
> Would something like that work for you?

Well the generic approach yes, the DRM specific one maybe. IIRC we need
to be able to compare both DRM as well as DMA-buf file descriptors.

The basic problem userspace tries to solve is that drivers might get the
same fd through two different code paths.

For example application using OpenGL/Vulkan for rendering and VA-API for
video decoding/encoding at the same time.

Both APIs get a fd which identifies the device to use. It can be the
same, but it doesn't have to.

If it's the same device driver connection (or in kernel speak underlying
struct file) then you can optimize away importing and exporting of
buffers for example.

Additional to that it makes cgroup accounting much easier because you
don't count things twice because they are shared etc...

Regards,
Christian.

>
> Linus


2024-05-07 19:06:31

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

On Tue, May 07, 2024 at 09:46:31AM -0700, Linus Torvalds wrote:
> On Tue, 7 May 2024 at 04:03, Daniel Vetter <[email protected]> wrote:
> >
> > It's really annoying that on some distros/builds we don't have that, and
> > for gpu driver stack reasons we _really_ need to know whether a fd is the
> > same as another, due to some messy uniqueness requirements on buffer
> > objects various drivers have.
>
> It's sad that such a simple thing would require two other horrid
> models (EPOLL or KCMP).
>
> There'[s a reason that KCMP is a config option - *some* of that is
> horrible code - but the "compare file descriptors for equality" is not
> that reason.
>
> Note that KCMP really is a broken mess. It's also a potential security
> hole, even for the simple things, because of how it ends up comparing
> kernel pointers (ie it doesn't just say "same file descriptor", it
> gives an ordering of them, so you can use KCMP to sort things in
> kernel space).
>
> And yes, it orders them after obfuscating the pointer, but it's still
> not something I would consider sane as a baseline interface. It was
> designed for checkpoint-restore, it's the wrong thing to use for some
> "are these file descriptors the same".
>
> The same argument goes for using EPOLL for that. Disgusting hack.
>
> Just what are the requirements for the GPU stack? Is one of the file
> descriptors "trusted", IOW, you know what kind it is?
>
> Because dammit, it's *so* easy to do. You could just add a core DRM
> ioctl for it. Literally just
>
> struct fd f1 = fdget(fd1);
> struct fd f2 = fdget(fd2);
> int same;
>
> same = f1.file && f1.file == f2.file;
> fdput(fd1);
> fdput(fd2);
> return same;
>
> where the only question is if you also woudl want to deal with O_PATH
> fd's, in which case the "fdget()" would be "fdget_raw()".
>
> Honestly, adding some DRM ioctl for this sounds hacky, but it sounds
> less hacky than relying on EPOLL or KCMP.

Well, in slightly more code (because it's part of the "import this
dma-buf/dma-fence/whatever fd into a driver object" ioctl) this is what we
do.

The issue is that there's generic userspace (like compositors) that sees
these things fly by and would also like to know whether the other side
they receive them from is doing nasty stuff/buggy/evil. And they don't
have access to the device drm fd (since those are a handful of layers away
behind the opengl/vulkan userspace drivers even if the compositor could get
at them, and in some cases not even that).

So if we do this in drm we'd essentially have to create a special
drm_compare_files chardev, put the ioctl there and then tell everyone to
make that thing world-accessible.

Which is just too close to a real syscall that it's offensive, and hey
kcmp does what we want already (but unfortunately also way more). So we
rejected adding that to drm. But we did think about it.

> I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl
> too, if this is possibly a more common thing. and not just DRM wants
> it.
>
> Would something like that work for you?

Yes.

Adding Simon and Pekka as two of the usual suspects for this kind of
stuff. Also example code (the int return value is just so that callers know
when kcmp isn't available, they all only care about equality):

https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/util/os_file.c#L239
-Sima

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2024-05-07 19:07:45

by Linus Torvalds

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

On Tue, 7 May 2024 at 11:04, Daniel Vetter <[email protected]> wrote:
>
> On Tue, May 07, 2024 at 09:46:31AM -0700, Linus Torvalds wrote:
>
> > I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl
> > too, if this is possibly a more common thing. and not just DRM wants
> > it.
> >
> > Would something like that work for you?
>
> Yes.
>
> Adding Simon and Pekka as two of the usual suspects for this kind of
> stuff. Also example code (the int return value is just so that callers know
> when kcmp isn't available, they all only care about equality):
>
> https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/util/os_file.c#L239

That example thing shows that we shouldn't make it a FISAME ioctl - we
should make it a fcntl() instead, and it would just be a companion to
F_DUPFD.

Doesn't that strike everybody as a *much* cleaner interface? I think
F_ISDUP would work very naturally indeed with F_DUPFD.

Yes? No?

Linus

2024-05-08 05:55:24

by Christian König

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

Am 07.05.24 um 21:07 schrieb Linus Torvalds:
> On Tue, 7 May 2024 at 11:04, Daniel Vetter <[email protected]> wrote:
>> On Tue, May 07, 2024 at 09:46:31AM -0700, Linus Torvalds wrote:
>>
>>> I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl
>>> too, if this is possibly a more common thing. and not just DRM wants
>>> it.
>>>
>>> Would something like that work for you?
>> Yes.
>>
>> Adding Simon and Pekka as two of the usual suspects for this kind of
>> stuff. Also example code (the int return value is just so that callers know
>> when kcmp isn't available, they all only care about equality):
>>
>> https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/util/os_file.c#L239
> That example thing shows that we shouldn't make it a FISAME ioctl - we
> should make it a fcntl() instead, and it would just be a companion to
> F_DUPFD.
>
> Doesn't that strike everybody as a *much* cleaner interface? I think
> F_ISDUP would work very naturally indeed with F_DUPFD.
>
> Yes? No?

Sounds absolutely sane to me.

Christian.

>
> Linus


2024-05-08 08:00:23

by Michel Dänzer

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

On 2024-05-07 19:45, Christian König wrote:
> Am 07.05.24 um 18:46 schrieb Linus Torvalds:
>>
>> Just what are the requirements for the GPU stack? Is one of the file
>> descriptors "trusted", IOW, you know what kind it is?
>>
>> Because dammit, it's *so* easy to do. You could just add a core DRM
>> ioctl for it. Literally just
>>
>>          struct fd f1 = fdget(fd1);
>>          struct fd f2 = fdget(fd2);
>>          int same;
>>
>>          same = f1.file && f1.file == f2.file;
>>          fdput(fd1);
>>          fdput(fd2);
>>          return same;
>>
>> where the only question is if you also woudl want to deal with O_PATH
>> fd's, in which case the "fdget()" would be "fdget_raw()".
>>
>> Honestly, adding some DRM ioctl for this sounds hacky, but it sounds
>> less hacky than relying on EPOLL or KCMP.
>>
>> I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl
>> too, if this is possibly a more common thing. and not just DRM wants
>> it.
>>
>> Would something like that work for you?
>
> Well the generic approach yes, the DRM specific one maybe. IIRC we need to be able to compare both DRM as well as DMA-buf file descriptors.
>
> The basic problem userspace tries to solve is that drivers might get the same fd through two different code paths.
>
> For example application using OpenGL/Vulkan for rendering and VA-API for video decoding/encoding at the same time.
>
> Both APIs get a fd which identifies the device to use. It can be the same, but it doesn't have to.
>
> If it's the same device driver connection (or in kernel speak underlying struct file) then you can optimize away importing and exporting of buffers for example.

It's not just about optimization. Mesa needs to know this for correct tracking of GEM handles. If it guesses incorrectly, there can be misbehaviour.


--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and Xwayland developer


2024-05-08 08:15:53

by Christian Brauner

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

On Tue, May 07, 2024 at 12:07:10PM -0700, Linus Torvalds wrote:
> On Tue, 7 May 2024 at 11:04, Daniel Vetter <[email protected]> wrote:
> >
> > On Tue, May 07, 2024 at 09:46:31AM -0700, Linus Torvalds wrote:
> >
> > > I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl
> > > too, if this is possibly a more common thing. and not just DRM wants
> > > it.
> > >
> > > Would something like that work for you?
> >
> > Yes.
> >
> > Adding Simon and Pekka as two of the usual suspects for this kind of
> > stuff. Also example code (the int return value is just so that callers know
> > when kcmp isn't available, they all only care about equality):
> >
> > https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/util/os_file.c#L239
>
> That example thing shows that we shouldn't make it a FISAME ioctl - we
> should make it a fcntl() instead, and it would just be a companion to
> F_DUPFD.
>
> Doesn't that strike everybody as a *much* cleaner interface? I think

+1
See
https://github.com/systemd/systemd/blob/a4f0e0da3573a10bc5404142be8799418760b1d1/src/basic/fd-util.c#L517
that's another heavy user of this kind of functionality.

2024-05-08 08:32:47

by Christian Brauner

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

On Tue, May 07, 2024 at 07:45:02PM +0200, Christian König wrote:
> Am 07.05.24 um 18:46 schrieb Linus Torvalds:
> > On Tue, 7 May 2024 at 04:03, Daniel Vetter <[email protected]> wrote:
> > > It's really annoying that on some distros/builds we don't have that, and
> > > for gpu driver stack reasons we _really_ need to know whether a fd is the
> > > same as another, due to some messy uniqueness requirements on buffer
> > > objects various drivers have.
> > It's sad that such a simple thing would require two other horrid
> > models (EPOLL or KCMP).
> >
> > There'[s a reason that KCMP is a config option - *some* of that is
> > horrible code - but the "compare file descriptors for equality" is not
> > that reason.
> >
> > Note that KCMP really is a broken mess. It's also a potential security
> > hole, even for the simple things, because of how it ends up comparing
> > kernel pointers (ie it doesn't just say "same file descriptor", it
> > gives an ordering of them, so you can use KCMP to sort things in
> > kernel space).
> >
> > And yes, it orders them after obfuscating the pointer, but it's still
> > not something I would consider sane as a baseline interface. It was
> > designed for checkpoint-restore, it's the wrong thing to use for some
> > "are these file descriptors the same".
> >
> > The same argument goes for using EPOLL for that. Disgusting hack.
> >
> > Just what are the requirements for the GPU stack? Is one of the file
> > descriptors "trusted", IOW, you know what kind it is?
> >
> > Because dammit, it's *so* easy to do. You could just add a core DRM
> > ioctl for it. Literally just
> >
> > struct fd f1 = fdget(fd1);
> > struct fd f2 = fdget(fd2);
> > int same;
> >
> > same = f1.file && f1.file == f2.file;
> > fdput(fd1);
> > fdput(fd2);
> > return same;
> >
> > where the only question is if you also woudl want to deal with O_PATH
> > fd's, in which case the "fdget()" would be "fdget_raw()".
> >
> > Honestly, adding some DRM ioctl for this sounds hacky, but it sounds
> > less hacky than relying on EPOLL or KCMP.
> >
> > I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl
> > too, if this is possibly a more common thing. and not just DRM wants
> > it.
> >
> > Would something like that work for you?
>
> Well the generic approach yes, the DRM specific one maybe. IIRC we need to
> be able to compare both DRM as well as DMA-buf file descriptors.
>
> The basic problem userspace tries to solve is that drivers might get the
> same fd through two different code paths.
>
> For example application using OpenGL/Vulkan for rendering and VA-API for
> video decoding/encoding at the same time.
>
> Both APIs get a fd which identifies the device to use. It can be the same,
> but it doesn't have to.
>
> If it's the same device driver connection (or in kernel speak underlying
> struct file) then you can optimize away importing and exporting of buffers
> for example.
>
> Additional to that it makes cgroup accounting much easier because you don't
> count things twice because they are shared etc...

One thing to keep in mind is that a generic VFS level comparing function
will only catch the obvious case where you have dup() equivalency as
outlined above by Linus. That's what most people are interested in and
that could easily replace most kcmp() use-cases for comparing fds.

But, of course there's the case where you have two file descriptors
referring to two different files that reference the same underlying
object (usually stashed in file->private_data).

For most cases that problem can ofc be solved by comparing the
underlying inode. But that doesn't work for drivers using the generic
anonymous inode infrastructure because it uses the same inode for
everything or for cases where the same underlying object can even be
represented by different inodes.

So for such cases a driver specific ioctl() to compare two fds will
be needed in addition to the generic helper.

2024-05-08 09:11:20

by Christian König

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

Am 08.05.24 um 10:23 schrieb Christian Brauner:
> On Tue, May 07, 2024 at 07:45:02PM +0200, Christian König wrote:
>> Am 07.05.24 um 18:46 schrieb Linus Torvalds:
>>> On Tue, 7 May 2024 at 04:03, Daniel Vetter <[email protected]> wrote:
>>>> It's really annoying that on some distros/builds we don't have that, and
>>>> for gpu driver stack reasons we _really_ need to know whether a fd is the
>>>> same as another, due to some messy uniqueness requirements on buffer
>>>> objects various drivers have.
>>> It's sad that such a simple thing would require two other horrid
>>> models (EPOLL or KCMP).
>>>
>>> There'[s a reason that KCMP is a config option - *some* of that is
>>> horrible code - but the "compare file descriptors for equality" is not
>>> that reason.
>>>
>>> Note that KCMP really is a broken mess. It's also a potential security
>>> hole, even for the simple things, because of how it ends up comparing
>>> kernel pointers (ie it doesn't just say "same file descriptor", it
>>> gives an ordering of them, so you can use KCMP to sort things in
>>> kernel space).
>>>
>>> And yes, it orders them after obfuscating the pointer, but it's still
>>> not something I would consider sane as a baseline interface. It was
>>> designed for checkpoint-restore, it's the wrong thing to use for some
>>> "are these file descriptors the same".
>>>
>>> The same argument goes for using EPOLL for that. Disgusting hack.
>>>
>>> Just what are the requirements for the GPU stack? Is one of the file
>>> descriptors "trusted", IOW, you know what kind it is?
>>>
>>> Because dammit, it's *so* easy to do. You could just add a core DRM
>>> ioctl for it. Literally just
>>>
>>> struct fd f1 = fdget(fd1);
>>> struct fd f2 = fdget(fd2);
>>> int same;
>>>
>>> same = f1.file && f1.file == f2.file;
>>> fdput(fd1);
>>> fdput(fd2);
>>> return same;
>>>
>>> where the only question is if you also woudl want to deal with O_PATH
>>> fd's, in which case the "fdget()" would be "fdget_raw()".
>>>
>>> Honestly, adding some DRM ioctl for this sounds hacky, but it sounds
>>> less hacky than relying on EPOLL or KCMP.
>>>
>>> I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl
>>> too, if this is possibly a more common thing. and not just DRM wants
>>> it.
>>>
>>> Would something like that work for you?
>> Well the generic approach yes, the DRM specific one maybe. IIRC we need to
>> be able to compare both DRM as well as DMA-buf file descriptors.
>>
>> The basic problem userspace tries to solve is that drivers might get the
>> same fd through two different code paths.
>>
>> For example application using OpenGL/Vulkan for rendering and VA-API for
>> video decoding/encoding at the same time.
>>
>> Both APIs get a fd which identifies the device to use. It can be the same,
>> but it doesn't have to.
>>
>> If it's the same device driver connection (or in kernel speak underlying
>> struct file) then you can optimize away importing and exporting of buffers
>> for example.
>>
>> Additional to that it makes cgroup accounting much easier because you don't
>> count things twice because they are shared etc...
> One thing to keep in mind is that a generic VFS level comparing function
> will only catch the obvious case where you have dup() equivalency as
> outlined above by Linus. That's what most people are interested in and
> that could easily replace most kcmp() use-cases for comparing fds.
>
> But, of course there's the case where you have two file descriptors
> referring to two different files that reference the same underlying
> object (usually stashed in file->private_data).
>
> For most cases that problem can ofc be solved by comparing the
> underlying inode. But that doesn't work for drivers using the generic
> anonymous inode infrastructure because it uses the same inode for
> everything or for cases where the same underlying object can even be
> represented by different inodes.
>
> So for such cases a driver specific ioctl() to compare two fds will
> be needed in addition to the generic helper.

At least for the DRM we already have some solution for that, see
drmGetPrimaryDeviceNameFromFd() for an example.

Basically the file->private_data is still something different, but we
use this to figure out if we have two file descriptors (with individual
struct files underneath) pointing to the same hw driver.

This is important if you need to know if just importing/exporting of
DMA-buf handles between the two file descriptors is enough or if you
deal with two different hw devices and need to do stuff like format
conversion etc...

Regards,
Christian.

2024-05-08 11:02:04

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

On Wed, May 08, 2024 at 07:55:08AM +0200, Christian K?nig wrote:
> Am 07.05.24 um 21:07 schrieb Linus Torvalds:
> > On Tue, 7 May 2024 at 11:04, Daniel Vetter <[email protected]> wrote:
> > > On Tue, May 07, 2024 at 09:46:31AM -0700, Linus Torvalds wrote:
> > >
> > > > I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl
> > > > too, if this is possibly a more common thing. and not just DRM wants
> > > > it.
> > > >
> > > > Would something like that work for you?
> > > Yes.
> > >
> > > Adding Simon and Pekka as two of the usual suspects for this kind of
> > > stuff. Also example code (the int return value is just so that callers know
> > > when kcmp isn't available, they all only care about equality):
> > >
> > > https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/util/os_file.c#L239
> > That example thing shows that we shouldn't make it a FISAME ioctl - we
> > should make it a fcntl() instead, and it would just be a companion to
> > F_DUPFD.
> >
> > Doesn't that strike everybody as a *much* cleaner interface? I think
> > F_ISDUP would work very naturally indeed with F_DUPFD.
> >
> > Yes? No?
>
> Sounds absolutely sane to me.

Yeah fcntl(fd1, F_ISDUP, fd2); sounds extremely reasonable to me too.

Aside, after some irc discussions I paged a few more of the relevant info
back in, and at least for dma-buf we kinda sorted this out by going away
from the singleton inode in this patch: ed63bb1d1f84 ("dma-buf: give each
buffer a full-fledged inode")

It's uapi now so we can't ever undo that, but with hindsight just the
F_ISDUP is really what we wanted. Because we have no need for that inode
aside from the unique inode number that's only used to compare dma-buf fd
for sameness, e.g.

https://gitlab.freedesktop.org/wlroots/wlroots/-/blob/master/render/vulkan/texture.c#L490

The one question I have is whether this could lead to some exploit tools,
because at least the android conformance test suite verifies that kcmp
isn't available to apps (which is where we need it, because even with all
the binder-based isolation gpu userspace still all run in the application
process due to performance reasons, any ipc at all is just too much).

Otoh if we just add this to drm fd as an ioctl somewhere, then it will
also be available to every android app because they all do need the gpu
for rendering. So going with the full generic fcntl is probably best.
-Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2024-05-08 16:27:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

On Tue, 7 May 2024 at 12:07, Linus Torvalds
<[email protected]> wrote:
>
> That example thing shows that we shouldn't make it a FISAME ioctl - we
> should make it a fcntl() instead, and it would just be a companion to
> F_DUPFD.
>
> Doesn't that strike everybody as a *much* cleaner interface? I think
> F_ISDUP would work very naturally indeed with F_DUPFD.

So since we already have two versions of F_DUPFD (the other being
F_DUPFD_CLOEXEC) I decided that the best thing to do is to just extend
on that existing naming pattern, and called it F_DUPFD_QUERY instead.

I'm not married to the name, so if somebody hates it, feel free to
argue otherwise.

But with that, the suggested patch would end up looking something like
the attached (I also re-ordered the existing "F_LINUX_SPECIFIC_BASE"
users, since one of them was out of numerical order).

This really feels like a very natural thing, and yes, the 'same_fd()'
function in systemd that Christian also pointed at could use this very
naturally.

Also note that I obviously haven't tested this. Because obviously this
is trivially correct and cannot possibly have any bugs. Right? RIGHT?

And yes, I did check - despite the odd jump in numbers, we've never
had anything between F_NOTIFY (+2) and F_CANCELLK (+5).

We added F_SETLEASE (+0) , F_GETLEASE (+1) and F_NOTIFY (+2) in
2.4.0-test9 (roughly October 2000, I didn't dig deeper).

And then back in 2007 we suddenly jumped to F_CANCELLK (+5) in commit
9b9d2ab4154a ("locks: add lock cancel command"). I don't know why 3/4
were shunned.

After that we had 22d2b35b200f ("F_DUPFD_CLOEXEC implementation") add
F_DUPFD_CLOEXEC (+6).

I'd have loved to put F_DUPFD_QUERY next to it, but +5 and +7 are both used.

Linus


Attachments:
patch.diff (2.32 kB)

2024-05-08 17:16:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

On Wed, 8 May 2024 at 09:19, Linus Torvalds
<[email protected]> wrote:
>
> So since we already have two versions of F_DUPFD (the other being
> F_DUPFD_CLOEXEC) I decided that the best thing to do is to just extend
> on that existing naming pattern, and called it F_DUPFD_QUERY instead.
>
> I'm not married to the name, so if somebody hates it, feel free to
> argue otherwise.

Side note: with this patch, doing

ret = fcntl(fd1, F_DUPFD_QUERY, fd2);

will result in:

-1 (EBADF): 'fd1' is not a valid file descriptor
-1 (EINVAL): old kernel that doesn't support F_DUPFD_QUERY
0: fd2 does not refer to the same file as fd1
1: fd2 is the same 'struct file' as fd1

and it might be worth noting a couple of things here:

(a) fd2 being an invalid file descriptor does not cause EBADF, it
just causes "does not match".

(b) we *could* use more bits for more equality

IOW, it would possibly make sense to extend the 0/1 result to be

- bit #0: same file pointer
- bit #1: same path
- bit #2: same dentry
- bit #3: same inode

which are all different levels of "sameness".

Does anybody care? Do we want to extend on this "sameness"? I'm not
convinced, but it might be a good idea to document this as a possibly
future extension, ie *if* what you care about is "same file pointer",
maybe you should make sure to only look at bit #0.

Linus

2024-05-09 11:39:29

by Christian Brauner

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

On Wed, May 08, 2024 at 10:14:44AM -0700, Linus Torvalds wrote:
> On Wed, 8 May 2024 at 09:19, Linus Torvalds
> <[email protected]> wrote:
> >
> > So since we already have two versions of F_DUPFD (the other being
> > F_DUPFD_CLOEXEC) I decided that the best thing to do is to just extend
> > on that existing naming pattern, and called it F_DUPFD_QUERY instead.
> >
> > I'm not married to the name, so if somebody hates it, feel free to
> > argue otherwise.
>
> Side note: with this patch, doing
>
> ret = fcntl(fd1, F_DUPFD_QUERY, fd2);
>
> will result in:
>
> -1 (EBADF): 'fd1' is not a valid file descriptor
> -1 (EINVAL): old kernel that doesn't support F_DUPFD_QUERY
> 0: fd2 does not refer to the same file as fd1
> 1: fd2 is the same 'struct file' as fd1
>
> and it might be worth noting a couple of things here:
>
> (a) fd2 being an invalid file descriptor does not cause EBADF, it
> just causes "does not match".
>
> (b) we *could* use more bits for more equality
>
> IOW, it would possibly make sense to extend the 0/1 result to be
>
> - bit #0: same file pointer
> - bit #1: same path
> - bit #2: same dentry
> - bit #3: same inode
>
> which are all different levels of "sameness".

Not worth it without someone explaining in detail why imho. First pass
should be to try and replace kcmp() in scenarios where it's obviously
not needed or overkill.

I've added a CLASS(fd_raw) in a preliminary patch since we'll need that
anyway which means that your comparison patch becomes even simpler imho.
I've also added a selftest patch:

https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.misc

?

2024-05-09 15:48:48

by Linus Torvalds

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

On Thu, 9 May 2024 at 04:39, Christian Brauner <[email protected]> wrote:
>
> Not worth it without someone explaining in detail why imho. First pass
> should be to try and replace kcmp() in scenarios where it's obviously
> not needed or overkill.

Ack.

> I've added a CLASS(fd_raw) in a preliminary patch since we'll need that
> anyway which means that your comparison patch becomes even simpler imho.
> I've also added a selftest patch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.misc

LGTM.

Maybe worth adding an explicit test for "open same file, but two
separate opens, F_DUPFD_QUERY returns 0? Just to clarify the "it's not
testing the file on the filesystem for equality, but the file pointer
itself".

Linus

2024-05-10 06:34:38

by Christian Brauner

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

On Thu, May 09, 2024 at 08:48:20AM -0700, Linus Torvalds wrote:
> On Thu, 9 May 2024 at 04:39, Christian Brauner <[email protected]> wrote:
> >
> > Not worth it without someone explaining in detail why imho. First pass
> > should be to try and replace kcmp() in scenarios where it's obviously
> > not needed or overkill.
>
> Ack.
>
> > I've added a CLASS(fd_raw) in a preliminary patch since we'll need that
> > anyway which means that your comparison patch becomes even simpler imho.
> > I've also added a selftest patch:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.misc
>
> LGTM.
>
> Maybe worth adding an explicit test for "open same file, but two
> separate opens, F_DUPFD_QUERY returns 0? Just to clarify the "it's not
> testing the file on the filesystem for equality, but the file pointer
> itself".

Yep, good point. Added now.