2020-07-10 07:59:35

by Rasmus Villemoes

[permalink] [raw]
Subject: [PATCH] kcmp: add separate Kconfig symbol for kcmp syscall

The ability to check open file descriptions for equality (without
resorting to unreliable fstat() and fcntl(F_GETFL) comparisons) can be
useful outside of the checkpoint/restore use case - for example,
systemd uses kcmp() to deduplicate the per-service file descriptor
store.

Make it possible to have the kcmp() syscall without the full
CONFIG_CHECKPOINT_RESTORE.

Signed-off-by: Rasmus Villemoes <[email protected]>
---
I deliberately drop the ifdef in the eventpoll.h header rather than
replace with KCMP_SYSCALL; it's harmless to declare a function that
isn't defined anywhere.

fs/eventpoll.c | 4 ++--
include/linux/eventpoll.h | 2 --
init/Kconfig | 11 +++++++++++
kernel/Makefile | 2 +-
4 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 12eebcdea9c8..b0313ce2df73 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1064,7 +1064,7 @@ static struct epitem *ep_find(struct eventpoll *ep, struct file *file, int fd)
return epir;
}

-#ifdef CONFIG_CHECKPOINT_RESTORE
+#ifdef CONFIG_KCMP_SYSCALL
static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long toff)
{
struct rb_node *rbp;
@@ -1106,7 +1106,7 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd,

return file_raw;
}
-#endif /* CONFIG_CHECKPOINT_RESTORE */
+#endif /* CONFIG_KCMP_SYSCALL */

/**
* Adds a new entry to the tail of the list in a lockless way, i.e.
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 8f000fada5a4..aa799295e373 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -18,9 +18,7 @@ struct file;

#ifdef CONFIG_EPOLL

-#ifdef CONFIG_CHECKPOINT_RESTORE
struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long toff);
-#endif

/* Used to initialize the epoll bits inside the "struct file" */
static inline void eventpoll_init_file(struct file *file)
diff --git a/init/Kconfig b/init/Kconfig
index 0498af567f70..95e9486d4217 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1158,9 +1158,20 @@ config NET_NS

endif # NAMESPACES

+config KCMP_SYSCALL
+ bool "kcmp system call"
+ help
+ Enable the kcmp system call, which allows one to determine
+ whether to tasks share various kernel resources, for example
+ whether they share address space, or if two file descriptors
+ refer to the same open file description.
+
+ If unsure, say N.
+
config CHECKPOINT_RESTORE
bool "Checkpoint/restore support"
select PROC_CHILDREN
+ select KCMP_SYSCALL
default n
help
Enables additional kernel features in a sake of checkpoint/restore.
diff --git a/kernel/Makefile b/kernel/Makefile
index f3218bc5ec69..3daedba2146a 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -49,7 +49,7 @@ obj-y += rcu/
obj-y += livepatch/
obj-y += dma/

-obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o
+obj-$(CONFIG_KCMP_SYSCALL) += kcmp.o
obj-$(CONFIG_FREEZER) += freezer.o
obj-$(CONFIG_PROFILING) += profile.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
--
2.23.0


2020-07-10 08:33:01

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH] kcmp: add separate Kconfig symbol for kcmp syscall

On Fri, Jul 10, 2020 at 09:56:31AM +0200, Rasmus Villemoes wrote:
> The ability to check open file descriptions for equality (without
> resorting to unreliable fstat() and fcntl(F_GETFL) comparisons) can be
> useful outside of the checkpoint/restore use case - for example,
> systemd uses kcmp() to deduplicate the per-service file descriptor
> store.
>
> Make it possible to have the kcmp() syscall without the full
> CONFIG_CHECKPOINT_RESTORE.
>
> Signed-off-by: Rasmus Villemoes <[email protected]>
> ---
> I deliberately drop the ifdef in the eventpoll.h header rather than
> replace with KCMP_SYSCALL; it's harmless to declare a function that
> isn't defined anywhere.

Could you please point why setting #fidef KCMP_SYSCALL in eventpoll.h
is not suitable? Still the overall idea is fine for me, thanks!

Reviewed-by: Cyrill Gorcunov <[email protected]>

2020-07-10 09:05:57

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [PATCH] kcmp: add separate Kconfig symbol for kcmp syscall

On 10/07/2020 10.30, Cyrill Gorcunov wrote:
> On Fri, Jul 10, 2020 at 09:56:31AM +0200, Rasmus Villemoes wrote:
>> The ability to check open file descriptions for equality (without
>> resorting to unreliable fstat() and fcntl(F_GETFL) comparisons) can be
>> useful outside of the checkpoint/restore use case - for example,
>> systemd uses kcmp() to deduplicate the per-service file descriptor
>> store.
>>
>> Make it possible to have the kcmp() syscall without the full
>> CONFIG_CHECKPOINT_RESTORE.
>>
>> Signed-off-by: Rasmus Villemoes <[email protected]>
>> ---
>> I deliberately drop the ifdef in the eventpoll.h header rather than
>> replace with KCMP_SYSCALL; it's harmless to declare a function that
>> isn't defined anywhere.
>
> Could you please point why setting #fidef KCMP_SYSCALL in eventpoll.h
> is not suitable?

It's just from a general "avoid ifdef clutter if possible" POV. The
conditional declaration of the function doesn't really serve any purpose.

> Still the overall idea is fine for me, thanks!>
> Reviewed-by: Cyrill Gorcunov <[email protected]>

Thank you.

Rasmus


2020-07-10 09:38:38

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH] kcmp: add separate Kconfig symbol for kcmp syscall

On Fri, Jul 10, 2020 at 11:05:11AM +0200, Rasmus Villemoes wrote:
> >> I deliberately drop the ifdef in the eventpoll.h header rather than
> >> replace with KCMP_SYSCALL; it's harmless to declare a function that
> >> isn't defined anywhere.
> >
> > Could you please point why setting #fidef KCMP_SYSCALL in eventpoll.h
> > is not suitable?
>
> It's just from a general "avoid ifdef clutter if possible" POV. The
> conditional declaration of the function doesn't really serve any
> purpose.

OK, thanks for explanation.

2020-07-10 15:51:51

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH] kcmp: add separate Kconfig symbol for kcmp syscall

On 7/10/20 12:56 AM, Rasmus Villemoes wrote:
> diff --git a/init/Kconfig b/init/Kconfig
> index 0498af567f70..95e9486d4217 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1158,9 +1158,20 @@ config NET_NS
>
> endif # NAMESPACES
>
> +config KCMP_SYSCALL
> + bool "kcmp system call"
> + help
> + Enable the kcmp system call, which allows one to determine
> + whether to tasks share various kernel resources, for example

Either s/to/two/ or s/to//
?

> + whether they share address space, or if two file descriptors
> + refer to the same open file description.
> +
> + If unsure, say N.
> +
> config CHECKPOINT_RESTORE
> bool "Checkpoint/restore support"
> select PROC_CHILDREN
> + select KCMP_SYSCALL
> default n
> help
> Enables additional kernel features in a sake of checkpoint/restore.


--
~Randy

2020-07-10 15:58:41

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] kcmp: add separate Kconfig symbol for kcmp syscall

On Fri, Jul 10, 2020 at 09:56:31AM +0200, Rasmus Villemoes wrote:
> The ability to check open file descriptions for equality (without
> resorting to unreliable fstat() and fcntl(F_GETFL) comparisons) can be
> useful outside of the checkpoint/restore use case - for example,
> systemd uses kcmp() to deduplicate the per-service file descriptor
> store.
>
> Make it possible to have the kcmp() syscall without the full
> CONFIG_CHECKPOINT_RESTORE.

If systemd is using it, is it even worth making it conditional any more?
Maybe for CONFIG_EXPERT builds, it could be de-selectable.

2020-10-13 19:48:40

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [PATCH] kcmp: add separate Kconfig symbol for kcmp syscall

On 10/07/2020 17.57, Matthew Wilcox wrote:
> On Fri, Jul 10, 2020 at 09:56:31AM +0200, Rasmus Villemoes wrote:
>> The ability to check open file descriptions for equality (without
>> resorting to unreliable fstat() and fcntl(F_GETFL) comparisons) can be
>> useful outside of the checkpoint/restore use case - for example,
>> systemd uses kcmp() to deduplicate the per-service file descriptor
>> store.
>>
>> Make it possible to have the kcmp() syscall without the full
>> CONFIG_CHECKPOINT_RESTORE.
>
> If systemd is using it, is it even worth making it conditional any more?
> Maybe for CONFIG_EXPERT builds, it could be de-selectable.
>

[hm, I dropped the ball, sorry for the necromancy]

Well, first, I don't want to change any defaults here, if that is to be
done, it should be a separate patch.

Second, yes, systemd uses it for the de-duplication, and for that reason
recommends CONFIG_CHECKPOINT_RESTORE (at least, according to their
README) - but I'm not aware of any daemons that actually make use of
systemd's file descriptor store, so it's not really something essential
to every systemd-based system out there. It would be nice if systemd
could change its recommendation to just CONFIG_KCMP_SYSCALL.

But it's also useful for others, e.g. I have some code that wants to
temporarily replace stdin/stdout/stderr with some other file
descriptors, but needs to preserve the '0 is a dup of 1, or not' state
(i.e., is the same struct file) - that cannot reliably be determined
from fstat()/lseek(SEEK_CUR)/F_GETFL or whatever else one could throw at
an fd.

Rasmus