2021-02-05 18:40:46

by Kees Cook

Subject: Re: [PATCH] kernel: Expose SYS_kcmp by default

On Fri, Feb 05, 2021 at 04:37:52PM +0000, Chris Wilson wrote:
> Userspace has discovered the functionality offered by SYS_kcmp and has
> started to depend upon it. In particular, Mesa uses SYS_kcmp for
> os_same_file_description() in order to identify when two fd (e.g. device
> or dmabuf) point to the same struct file. Since they depend on it for
> core functionality, lift SYS_kcmp out of the non-default
> CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
>
> Signed-off-by: Chris Wilson <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Will Drewry <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Dave Airlie <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Lucas Stach <[email protected]>
> ---
> init/Kconfig | 11 +++++++++++
> kernel/Makefile | 2 +-
> tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +-
> 3 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index b77c60f8b963..f62fca13ac5b 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1194,6 +1194,7 @@ endif # NAMESPACES
> config CHECKPOINT_RESTORE
> bool "Checkpoint/restore support"
> select PROC_CHILDREN
> + select KCMP
> default n
> help
> Enables additional kernel features in a sake of checkpoint/restore.
> @@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
> config ARCH_HAS_MEMBARRIER_SYNC_CORE
> bool
>
> +config KCMP
> + bool "Enable kcmp() system call" if EXPERT
> + default y

I would expect this to be not default-y, especially if
CHECKPOINT_RESTORE does a "select" on it.

This is a really powerful syscall, but it is bounded by ptrace access
controls, and uses pointer address obfuscation, so it may be okay to
expose this. As it is, at least Ubuntu already has
CONFIG_CHECKPOINT_RESTORE, so really, there's probably not much
difference on exposure.

So, if you drop the "default y", I'm fine with this.

-Kees

> + help
> + Enable the file descriptor comparison system call. It provides
> + user-space with the ability to compare two fd to see if they
> + point to the same file, and check other attributes.
> +
> + If unsure, say Y.
> +
> config RSEQ
> bool "Enable rseq() system call" if EXPERT
> default y
> diff --git a/kernel/Makefile b/kernel/Makefile
> index aa7368c7eabf..320f1f3941b7 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -51,7 +51,7 @@ obj-y += livepatch/
> obj-y += dma/
> obj-y += entry/
>
> -obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o
> +obj-$(CONFIG_KCMP) += kcmp.o
> obj-$(CONFIG_FREEZER) += freezer.o
> obj-$(CONFIG_PROFILING) += profile.o
> obj-$(CONFIG_STACKTRACE) += stacktrace.o
> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
> index 26c72f2b61b1..1b6c7d33c4ff 100644
> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
> @@ -315,7 +315,7 @@ TEST(kcmp)
> ret = __filecmp(getpid(), getpid(), 1, 1);
> EXPECT_EQ(ret, 0);
> if (ret != 0 && errno == ENOSYS)
> - SKIP(return, "Kernel does not support kcmp() (missing CONFIG_CHECKPOINT_RESTORE?)");
> + SKIP(return, "Kernel does not support kcmp() (missing CONFIG_KCMP?)");
> }
>
> TEST(mode_strict_support)
> --
> 2.20.1
>

--
Kees Cook


2021-02-05 21:13:44

by Daniel Vetter

Subject: Re: [PATCH] kernel: Expose SYS_kcmp by default

On Fri, Feb 5, 2021 at 7:37 PM Kees Cook <[email protected]> wrote:
>
> On Fri, Feb 05, 2021 at 04:37:52PM +0000, Chris Wilson wrote:
> > Userspace has discovered the functionality offered by SYS_kcmp and has
> > started to depend upon it. In particular, Mesa uses SYS_kcmp for
> > os_same_file_description() in order to identify when two fd (e.g. device
> > or dmabuf) point to the same struct file. Since they depend on it for
> > core functionality, lift SYS_kcmp out of the non-default
> > CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
> >
> > Signed-off-by: Chris Wilson <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Andy Lutomirski <[email protected]>
> > Cc: Will Drewry <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: Dave Airlie <[email protected]>
> > Cc: Daniel Vetter <[email protected]>
> > Cc: Lucas Stach <[email protected]>
> > ---
> > init/Kconfig | 11 +++++++++++
> > kernel/Makefile | 2 +-
> > tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +-
> > 3 files changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/init/Kconfig b/init/Kconfig
> > index b77c60f8b963..f62fca13ac5b 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1194,6 +1194,7 @@ endif # NAMESPACES
> > config CHECKPOINT_RESTORE
> > bool "Checkpoint/restore support"
> > select PROC_CHILDREN
> > + select KCMP
> > default n
> > help
> > Enables additional kernel features in a sake of checkpoint/restore.
> > @@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
> > config ARCH_HAS_MEMBARRIER_SYNC_CORE
> > bool
> >
> > +config KCMP
> > + bool "Enable kcmp() system call" if EXPERT
> > + default y
>
> I would expect this to be not default-y, especially if
> CHECKPOINT_RESTORE does a "select" on it.
>
> This is a really powerful syscall, but it is bounded by ptrace access
> controls, and uses pointer address obfuscation, so it may be okay to
> expose this. As it is, at least Ubuntu already has
> CONFIG_CHECKPOINT_RESTORE, so really, there's probably not much
> difference on exposure.
>
> So, if you drop the "default y", I'm fine with this.

It was maybe stupid, but our userspace started relying on fd
comparison through sys_kcmp. So for better or worse, if you want to
run the mesa3d gl/vk stacks, you need this. Was maybe not the brightest
idea, but since enough distros had this enabled by default, it
wasn't really discovered, and now we're shipping this everywhere.

Ofc we can leave the default n, but the select if CONFIG_DRM is
unfortunately needed I think. For that part:

Acked-by: Daniel Vetter <[email protected]>

Also adding Dave Airlie for his take.
-Daniel

>
> -Kees
>
> > + help
> > + Enable the file descriptor comparison system call. It provides
> > + user-space with the ability to compare two fd to see if they
> > + point to the same file, and check other attributes.
> > +
> > + If unsure, say Y.
> > +
> > config RSEQ
> > bool "Enable rseq() system call" if EXPERT
> > default y
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index aa7368c7eabf..320f1f3941b7 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -51,7 +51,7 @@ obj-y += livepatch/
> > obj-y += dma/
> > obj-y += entry/
> >
> > -obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o
> > +obj-$(CONFIG_KCMP) += kcmp.o
> > obj-$(CONFIG_FREEZER) += freezer.o
> > obj-$(CONFIG_PROFILING) += profile.o
> > obj-$(CONFIG_STACKTRACE) += stacktrace.o
> > diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
> > index 26c72f2b61b1..1b6c7d33c4ff 100644
> > --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
> > +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
> > @@ -315,7 +315,7 @@ TEST(kcmp)
> > ret = __filecmp(getpid(), getpid(), 1, 1);
> > EXPECT_EQ(ret, 0);
> > if (ret != 0 && errno == ENOSYS)
> > - SKIP(return, "Kernel does not support kcmp() (missing CONFIG_CHECKPOINT_RESTORE?)");
> > + SKIP(return, "Kernel does not support kcmp() (missing CONFIG_KCMP?)");
> > }
> >
> > TEST(mode_strict_support)
> > --
> > 2.20.1
> >
>
> --
> Kees Cook



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-02-08 12:14:49

by Michel Dänzer

Subject: Re: [PATCH] kernel: Expose SYS_kcmp by default

On 2021-02-05 9:53 p.m., Daniel Vetter wrote:
> On Fri, Feb 5, 2021 at 7:37 PM Kees Cook <[email protected]> wrote:
>>
>> On Fri, Feb 05, 2021 at 04:37:52PM +0000, Chris Wilson wrote:
>>> Userspace has discovered the functionality offered by SYS_kcmp and has
>>> started to depend upon it. In particular, Mesa uses SYS_kcmp for
>>> os_same_file_description() in order to identify when two fd (e.g. device
>>> or dmabuf) point to the same struct file. Since they depend on it for
>>> core functionality, lift SYS_kcmp out of the non-default
>>> CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
>>>
>>> Signed-off-by: Chris Wilson <[email protected]>
>>> Cc: Kees Cook <[email protected]>
>>> Cc: Andy Lutomirski <[email protected]>
>>> Cc: Will Drewry <[email protected]>
>>> Cc: Andrew Morton <[email protected]>
>>> Cc: Dave Airlie <[email protected]>
>>> Cc: Daniel Vetter <[email protected]>
>>> Cc: Lucas Stach <[email protected]>
>>> ---
>>> init/Kconfig | 11 +++++++++++
>>> kernel/Makefile | 2 +-
>>> tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +-
>>> 3 files changed, 13 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/init/Kconfig b/init/Kconfig
>>> index b77c60f8b963..f62fca13ac5b 100644
>>> --- a/init/Kconfig
>>> +++ b/init/Kconfig
>>> @@ -1194,6 +1194,7 @@ endif # NAMESPACES
>>> config CHECKPOINT_RESTORE
>>> bool "Checkpoint/restore support"
>>> select PROC_CHILDREN
>>> + select KCMP
>>> default n
>>> help
>>> Enables additional kernel features in a sake of checkpoint/restore.
>>> @@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
>>> config ARCH_HAS_MEMBARRIER_SYNC_CORE
>>> bool
>>>
>>> +config KCMP
>>> + bool "Enable kcmp() system call" if EXPERT
>>> + default y
>>
>> I would expect this to be not default-y, especially if
>> CHECKPOINT_RESTORE does a "select" on it.
>>
>> This is a really powerful syscall, but it is bounded by ptrace access
>> controls, and uses pointer address obfuscation, so it may be okay to
>> expose this. As it is, at least Ubuntu already has
>> CONFIG_CHECKPOINT_RESTORE, so really, there's probably not much
>> difference on exposure.
>>
>> So, if you drop the "default y", I'm fine with this.
>
> It was maybe stupid, but our userspace started relying on fd
> comparison through sys_kcmp. So for better or worse, if you want to
> run the mesa3d gl/vk stacks, you need this.

That's overstating things somewhat. The vast majority of applications
will work fine regardless (as they did before Mesa started using this
functionality). Only some special ones will run into issues, because the
user-space drivers incorrectly assume two file descriptors reference
different descriptions.
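
For readers following along, here is a minimal sketch of the kind of check
being discussed. It is not Mesa's actual os_same_file_description(); the
function name same_file_description() and its return convention are
assumptions made for illustration. It calls kcmp(2) with KCMP_FILE and
reports "unknown" when the syscall is unavailable (for example ENOSYS on a
kernel built without it), which is the case where a caller has to guess.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <linux/kcmp.h>

/*
 * Hypothetical helper (illustration only).
 * Returns  0 if fd1 and fd2 refer to the same open file description,
 *          1 if they refer to different ones,
 *         -1 if the kernel cannot tell us (e.g. kcmp() fails with ENOSYS).
 */
static int same_file_description(int fd1, int fd2)
{
	pid_t pid = getpid();
	long ret;

	assert(fd1 >= 0 && fd2 >= 0);

	/* The same descriptor number trivially names the same description. */
	if (fd1 == fd2)
		return 0;

	/* kcmp() returns 0 for "equal", a positive value for "not equal". */
	ret = syscall(SYS_kcmp, pid, pid, KCMP_FILE, fd1, fd2);
	if (ret < 0)
		return -1;

	return ret == 0 ? 0 : 1;
}
```

With kcmp() available, an fd and its dup() should compare as the same
description, while two separate open() calls on the same path should not.
When the helper returns -1, the caller has no answer and must pick a
conservative default.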


> Was maybe not the brightest idea, but since enough distros had this
> enabled by default,

Right, that (and the above) is why I considered it fair game to use.
What should I have done instead? (TBH I was surprised that this
functionality isn't generally available)

> it wasn't really discovered, and now we're
> shipping this everywhere.

You're making it sound like this snuck in secretly somehow, which is not
true of course.


> Ofc we can leave the default n, but the select if CONFIG_DRM is
> unfortunately needed I think.

Per above, not sure this is really true.


--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and X developer

2021-02-08 13:13:50

by Michel Dänzer

Subject: Re: [PATCH] kernel: Expose SYS_kcmp by default

On 2021-02-08 12:49 p.m., Michel Dänzer wrote:
> On 2021-02-05 9:53 p.m., Daniel Vetter wrote:
>> On Fri, Feb 5, 2021 at 7:37 PM Kees Cook <[email protected]> wrote:
>>>
>>> On Fri, Feb 05, 2021 at 04:37:52PM +0000, Chris Wilson wrote:
>>>> Userspace has discovered the functionality offered by SYS_kcmp and has
>>>> started to depend upon it. In particular, Mesa uses SYS_kcmp for
>>>> os_same_file_description() in order to identify when two fd (e.g.
>>>> device
>>>> or dmabuf) point to the same struct file. Since they depend on it for
>>>> core functionality, lift SYS_kcmp out of the non-default
>>>> CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
>>>>
>>>> Signed-off-by: Chris Wilson <[email protected]>
>>>> Cc: Kees Cook <[email protected]>
>>>> Cc: Andy Lutomirski <[email protected]>
>>>> Cc: Will Drewry <[email protected]>
>>>> Cc: Andrew Morton <[email protected]>
>>>> Cc: Dave Airlie <[email protected]>
>>>> Cc: Daniel Vetter <[email protected]>
>>>> Cc: Lucas Stach <[email protected]>
>>>> ---
>>>>   init/Kconfig                                  | 11 +++++++++++
>>>>   kernel/Makefile                               |  2 +-
>>>>   tools/testing/selftests/seccomp/seccomp_bpf.c |  2 +-
>>>>   3 files changed, 13 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/init/Kconfig b/init/Kconfig
>>>> index b77c60f8b963..f62fca13ac5b 100644
>>>> --- a/init/Kconfig
>>>> +++ b/init/Kconfig
>>>> @@ -1194,6 +1194,7 @@ endif # NAMESPACES
>>>>   config CHECKPOINT_RESTORE
>>>>        bool "Checkpoint/restore support"
>>>>        select PROC_CHILDREN
>>>> +     select KCMP
>>>>        default n
>>>>        help
>>>>          Enables additional kernel features in a sake of
>>>> checkpoint/restore.
>>>> @@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
>>>>   config ARCH_HAS_MEMBARRIER_SYNC_CORE
>>>>        bool
>>>>
>>>> +config KCMP
>>>> +     bool "Enable kcmp() system call" if EXPERT
>>>> +     default y
>>>
>>> I would expect this to be not default-y, especially if
>>> CHECKPOINT_RESTORE does a "select" on it.
>>>
>>> This is a really powerful syscall, but it is bounded by ptrace access
>>> controls, and uses pointer address obfuscation, so it may be okay to
>>> expose this. As it is, at least Ubuntu already has
>>> CONFIG_CHECKPOINT_RESTORE, so really, there's probably not much
>>> difference on exposure.
>>>
>>> So, if you drop the "default y", I'm fine with this.
>>
>> It was maybe stupid, but our userspace started relying on fd
>> comparison through sys_kcmp. So for better or worse, if you want to
>> run the mesa3d gl/vk stacks, you need this.
>
> That's overstating things somewhat. The vast majority of applications
> will work fine regardless (as they did before Mesa started using this
> functionality). Only some special ones will run into issues, because the
> user-space drivers incorrectly assume two file descriptors reference
> different descriptions.
>
>
>> Was maybe not the brightest idea, but since enough distros had this
>> enabled by default,
>
> Right, that (and the above) is why I considered it fair game to use.
> What should I have done instead? (TBH I was surprised that this
> functionality isn't generally available)

In that spirit, an alternative might be to make KCMP_FILE available
unconditionally, and the rest of SYS_kcmp only with CHECKPOINT_RESTORE
as before. (Or maybe other parts of SYS_kcmp are generally useful as well?)


--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and X developer

2021-02-08 13:38:53

by Daniel Vetter

Subject: Re: [PATCH] kernel: Expose SYS_kcmp by default

On Mon, Feb 8, 2021 at 12:49 PM Michel Dänzer <[email protected]> wrote:
>
> On 2021-02-05 9:53 p.m., Daniel Vetter wrote:
> > On Fri, Feb 5, 2021 at 7:37 PM Kees Cook <[email protected]> wrote:
> >>
> >> On Fri, Feb 05, 2021 at 04:37:52PM +0000, Chris Wilson wrote:
> >>> Userspace has discovered the functionality offered by SYS_kcmp and has
> >>> started to depend upon it. In particular, Mesa uses SYS_kcmp for
> >>> os_same_file_description() in order to identify when two fd (e.g. device
> >>> or dmabuf) point to the same struct file. Since they depend on it for
> >>> core functionality, lift SYS_kcmp out of the non-default
> >>> CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
> >>>
> >>> Signed-off-by: Chris Wilson <[email protected]>
> >>> Cc: Kees Cook <[email protected]>
> >>> Cc: Andy Lutomirski <[email protected]>
> >>> Cc: Will Drewry <[email protected]>
> >>> Cc: Andrew Morton <[email protected]>
> >>> Cc: Dave Airlie <[email protected]>
> >>> Cc: Daniel Vetter <[email protected]>
> >>> Cc: Lucas Stach <[email protected]>
> >>> ---
> >>> init/Kconfig | 11 +++++++++++
> >>> kernel/Makefile | 2 +-
> >>> tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +-
> >>> 3 files changed, 13 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/init/Kconfig b/init/Kconfig
> >>> index b77c60f8b963..f62fca13ac5b 100644
> >>> --- a/init/Kconfig
> >>> +++ b/init/Kconfig
> >>> @@ -1194,6 +1194,7 @@ endif # NAMESPACES
> >>> config CHECKPOINT_RESTORE
> >>> bool "Checkpoint/restore support"
> >>> select PROC_CHILDREN
> >>> + select KCMP
> >>> default n
> >>> help
> >>> Enables additional kernel features in a sake of checkpoint/restore.
> >>> @@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
> >>> config ARCH_HAS_MEMBARRIER_SYNC_CORE
> >>> bool
> >>>
> >>> +config KCMP
> >>> + bool "Enable kcmp() system call" if EXPERT
> >>> + default y
> >>
> >> I would expect this to be not default-y, especially if
> >> CHECKPOINT_RESTORE does a "select" on it.
> >>
> >> This is a really powerful syscall, but it is bounded by ptrace access
> >> controls, and uses pointer address obfuscation, so it may be okay to
> >> expose this. As it is, at least Ubuntu already has
> >> CONFIG_CHECKPOINT_RESTORE, so really, there's probably not much
> >> difference on exposure.
> >>
> >> So, if you drop the "default y", I'm fine with this.
> >
> > It was maybe stupid, but our userspace started relying on fd
> > comparison through sys_kcmp. So for better or worse, if you want to
> > run the mesa3d gl/vk stacks, you need this.
>
> That's overstating things somewhat. The vast majority of applications
> will work fine regardless (as they did before Mesa started using this
> functionality). Only some special ones will run into issues, because the
> user-space drivers incorrectly assume two file descriptors reference
> different descriptions.
>
>
> > Was maybe not the brightest idea, but since enough distros had this
> > enabled by default,
>
> Right, that (and the above) is why I considered it fair game to use.
> What should I have done instead? (TBH I was surprised that this
> functionality isn't generally available)

Yeah that one is fine, but I thought we've discussed (irc or
something) more uses for de-duping dma-buf and stuff like that. But
quick grep says that hasn't landed yet, so I got a bit confused (or
just dreamt). Looking at this again I'm kinda surprised the drmfd
de-duping blows up on normal linux distros, but I guess it can all
happen.

> > it wasn't really discovered, and now we're
> > shipping this everywhere.
>
> You're making it sound like this snuck in secretly somehow, which is not
> true of course.
>
>
> > Ofc we can leave the default n, but the select if CONFIG_DRM is
> > unfortunately needed I think.
>
> Per above, not sure this is really true.

We seem to be going boom on linux distros now, maybe userspace got
more creative in abusing stuff? The entire thing is small enough that
imo we don't really have to care, e.g. we also unconditionally select
dma-buf, despite that on most systems there's only 1 gpu, and you're
never going to end up with a buffer sharing case that needs any of
that code (aside from the "here's an fd" part).

But I guess we can limit to just KCMP_FILE like you suggest in another
reply. Just feels a bit like overkill.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-02-08 13:51:52

by Michel Dänzer

Subject: Re: [PATCH] kernel: Expose SYS_kcmp by default

On 2021-02-08 2:34 p.m., Daniel Vetter wrote:
> On Mon, Feb 8, 2021 at 12:49 PM Michel Dänzer <[email protected]> wrote:
>>
>> On 2021-02-05 9:53 p.m., Daniel Vetter wrote:
>>> On Fri, Feb 5, 2021 at 7:37 PM Kees Cook <[email protected]> wrote:
>>>>
>>>> On Fri, Feb 05, 2021 at 04:37:52PM +0000, Chris Wilson wrote:
>>>>> Userspace has discovered the functionality offered by SYS_kcmp and has
>>>>> started to depend upon it. In particular, Mesa uses SYS_kcmp for
>>>>> os_same_file_description() in order to identify when two fd (e.g. device
>>>>> or dmabuf) point to the same struct file. Since they depend on it for
>>>>> core functionality, lift SYS_kcmp out of the non-default
>>>>> CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
>>>>>
>>>>> Signed-off-by: Chris Wilson <[email protected]>
>>>>> Cc: Kees Cook <[email protected]>
>>>>> Cc: Andy Lutomirski <[email protected]>
>>>>> Cc: Will Drewry <[email protected]>
>>>>> Cc: Andrew Morton <[email protected]>
>>>>> Cc: Dave Airlie <[email protected]>
>>>>> Cc: Daniel Vetter <[email protected]>
>>>>> Cc: Lucas Stach <[email protected]>
>>>>> ---
>>>>> init/Kconfig | 11 +++++++++++
>>>>> kernel/Makefile | 2 +-
>>>>> tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +-
>>>>> 3 files changed, 13 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/init/Kconfig b/init/Kconfig
>>>>> index b77c60f8b963..f62fca13ac5b 100644
>>>>> --- a/init/Kconfig
>>>>> +++ b/init/Kconfig
>>>>> @@ -1194,6 +1194,7 @@ endif # NAMESPACES
>>>>> config CHECKPOINT_RESTORE
>>>>> bool "Checkpoint/restore support"
>>>>> select PROC_CHILDREN
>>>>> + select KCMP
>>>>> default n
>>>>> help
>>>>> Enables additional kernel features in a sake of checkpoint/restore.
>>>>> @@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
>>>>> config ARCH_HAS_MEMBARRIER_SYNC_CORE
>>>>> bool
>>>>>
>>>>> +config KCMP
>>>>> + bool "Enable kcmp() system call" if EXPERT
>>>>> + default y
>>>>
>>>> I would expect this to be not default-y, especially if
>>>> CHECKPOINT_RESTORE does a "select" on it.
>>>>
>>>> This is a really powerful syscall, but it is bounded by ptrace access
>>>> controls, and uses pointer address obfuscation, so it may be okay to
>>>> expose this. As it is, at least Ubuntu already has
>>>> CONFIG_CHECKPOINT_RESTORE, so really, there's probably not much
>>>> difference on exposure.
>>>>
>>>> So, if you drop the "default y", I'm fine with this.
>>>
>>> It was maybe stupid, but our userspace started relying on fd
>>> comparison through sys_kcmp. So for better or worse, if you want to
>>> run the mesa3d gl/vk stacks, you need this.
>>
>> That's overstating things somewhat. The vast majority of applications
>> will work fine regardless (as they did before Mesa started using this
>> functionality). Only some special ones will run into issues, because the
>> user-space drivers incorrectly assume two file descriptors reference
>> different descriptions.
>>
>>
>>> Was maybe not the brightest idea, but since enough distros had this
>>> enabled by default,
>>
>> Right, that (and the above) is why I considered it fair game to use.
>> What should I have done instead? (TBH I was surprised that this
>> functionality isn't generally available)
>
> Yeah that one is fine, but I thought we've discussed (irc or
> something) more uses for de-duping dma-buf and stuff like that. But
> quick grep says that hasn't landed yet, so I got a bit confused (or
> just dreamt). Looking at this again I'm kinda surprised the drmfd
> de-duping blows up on normal linux distros, but I guess it can all
> happen.

One example: GEM handle name-spaces are per file description. If
user-space incorrectly assumes two DRM fds are independent, when they
actually reference the same file description, closing a GEM handle with
one file descriptor will make it unusable with the other file descriptor
as well.
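
The same effect can be seen without any DRM device. The sketch below is
only an analogy, not GEM code: it uses the file offset as a stand-in for
per-file-description state, and the helper name offset_seen_after_seek()
is made up for illustration. A seek through one descriptor is visible
through a dup()ed descriptor, because both share one file description,
but not through a descriptor obtained from a separate open().

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

/*
 * Hypothetical helper (illustration only).
 * Seeks to `pos` through descriptor `actor`, then returns the current
 * offset as observed through descriptor `observer`, or (off_t)-1 if
 * either lseek() call fails.
 */
static off_t offset_seen_after_seek(int actor, int observer, off_t pos)
{
	assert(actor >= 0 && observer >= 0);

	if (lseek(actor, pos, SEEK_SET) == (off_t)-1)
		return (off_t)-1;

	/* A zero-length relative seek reports the offset without moving it. */
	return lseek(observer, 0, SEEK_CUR);
}
```

If b = dup(a), a seek through a moves the offset seen through b. If c
comes from an independent open() of the same regular file, its offset
stays where it was. GEM handles behave like the first case whenever two
DRM fds turn out to share a file description.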


>>> Ofc we can leave the default n, but the select if CONFIG_DRM is
>>> unfortunately needed I think.
>>
>> Per above, not sure this is really true.
>
> We seem to be going boom on linux distros now, maybe userspace got
> more creative in abusing stuff?

I don't know what you're referring to. I've only seen maybe two or three
reports from people who didn't enable CHECKPOINT_RESTORE in their
self-built kernels.


> The entire thing is small enough that imo we don't really have to care,
> e.g. we also unconditionally select dma-buf, despite that on most
> systems there's only 1 gpu, and you're never going to end up with a
> buffer sharing case that needs any of that code (aside from the
> "here's an fd" part).
>
> But I guess we can limit to just KCMP_FILE like you suggest in another
> reply. Just feels a bit like overkill.

Making KCMP_FILE gated by DRM makes as little sense to me as by
CHECKPOINT_RESTORE.


--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and X developer