LinuxLists.cc - [PATCHSET 0/2] PF_IO

2021-03-20 15:40:38

Subject: [PATCHSET 0/2] PF_IO_WORKER signal tweaks

Hi,

Been trying to ensure that we do the right thing wrt signals and
PF_IO_WORKER threads, and I think there are two cases we need to handle
explicitly:

1) Just don't allow signals to them in general. We do mask everything
as blocked, outside of SIGKILL, so things like wants_signal() will
never return true for them. But it's still possible to send them a
signal via (ultimately) group_send_sig_info(). This will then deliver
the signal to the original io_uring owning task, and that seems a bit
unexpected. So just don't allow them in general.

2) STOP is done a bit differently, and we should not allow that either.

Outside of that, I've been looking at same_thread_group(). This will
currently return true for an io_uring task and it's IO workers, since
they do share ->signal. From looking at the kernel users of this, that
actually seems OK for the cases I checked. One is accounting related,
which we obviously want, and others are related to permissions between
tasks. FWIW, I ran with the below and didn't observe any ill effects,
but I'd like someone to actually think about and verify that PF_IO_WORKER
same_thread_group() usage is sane.

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3f6a0fcaa10c..a580bc0f8aa3 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -667,10 +667,17 @@ static inline bool thread_group_leader(struct task_struct *p)
return p->exit_signal >= 0;
}

+static inline
+bool same_thread_group_account(struct task_struct *p1, struct task_struct *p2)
+{
+ return p1->signal == p2->signal
+}
+
static inline
bool same_thread_group(struct task_struct *p1, struct task_struct *p2)
{
- return p1->signal == p2->signal;
+ return same_thread_group_account(p1, p2) &&
+ !((p1->flags | p2->flags) & PF_IO_WORKER);
}

static inline struct task_struct *next_thread(const struct task_struct *p)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5f611658eeab..625110cacc2a 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -307,7 +307,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
* those pending times and rely only on values updated on tick or
* other scheduler action.
*/
- if (same_thread_group(current, tsk))
+ if (same_thread_group_account(current, tsk))
(void) task_sched_runtime(current);

rcu_read_lock();

2021-03-20 15:41:13

by Jens Axboe

[permalink] [raw]

Subject: [PATCH 2/2] signal: don't allow STOP on PF_IO_WORKER threads

Just like we don't allow normal signals to IO threads, don't deliver a
STOP to a task that has PF_IO_WORKER set. The IO threads don't take
signals in general, and have no means of flushing out a stop either.

Reported-by: Stefan Metzmacher <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
---
kernel/signal.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/kernel/signal.c b/kernel/signal.c
index 730ecd3d6faf..b113bf647fb4 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2349,6 +2349,10 @@ static bool do_signal_stop(int signr)

t = current;
while_each_thread(current, t) {
+ /* don't STOP PF_IO_WORKER threads */
+ if (t->flags & PF_IO_WORKER)
+ continue;
+
/*
* Setting state to TASK_STOPPED for a group
* stop is always done with the siglock held,
--
2.31.0

2021-03-20 15:42:57

by Jens Axboe

[permalink] [raw]

Subject: [PATCH 1/2] signal: don't allow sending any signals to PF_IO_WORKER threads

They don't take signals individually, and even if they share signals with
the parent task, don't allow them to be delivered through the worker
thread.

Reported-by: Stefan Metzmacher <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
---
kernel/signal.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/signal.c b/kernel/signal.c
index ba4d1ef39a9e..730ecd3d6faf 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -833,6 +833,9 @@ static int check_kill_permission(int sig, struct kernel_siginfo *info,

if (!valid_signal(sig))
return -EINVAL;
+ /* PF_IO_WORKER threads don't take any signals */
+ if (t->flags & PF_IO_WORKER)
+ return -EPERM;

if (!si_fromuser(info))
return 0;
--
2.31.0

2021-03-20 16:21:31

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [PATCH 1/2] signal: don't allow sending any signals to PF_IO_WORKER threads

Jens Axboe <[email protected]> writes:

> They don't take signals individually, and even if they share signals with
> the parent task, don't allow them to be delivered through the worker
> thread.

This is silly I know, but why do we care?

The creds should be reasonably in-sync with the rest of the threads.

There are other threads that will receive the signal, especially when
you worry about group_send_sig_info. Which signal sending code paths
are actually a problem.

> Reported-by: Stefan Metzmacher <[email protected]>
> Signed-off-by: Jens Axboe <[email protected]>
> ---
> kernel/signal.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index ba4d1ef39a9e..730ecd3d6faf 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -833,6 +833,9 @@ static int check_kill_permission(int sig, struct kernel_siginfo *info,
>
> if (!valid_signal(sig))
> return -EINVAL;
> + /* PF_IO_WORKER threads don't take any signals */
> + if (t->flags & PF_IO_WORKER)
> + return -EPERM;
>
> if (!si_fromuser(info))
> return 0;

2021-03-20 16:26:53

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [PATCH 2/2] signal: don't allow STOP on PF_IO_WORKER threads

Jens Axboe <[email protected]> writes:

> Just like we don't allow normal signals to IO threads, don't deliver a
> STOP to a task that has PF_IO_WORKER set. The IO threads don't take
> signals in general, and have no means of flushing out a stop either.

At first glance this seems safe. This is before we count all of the
threads, and it needs to be a non io_uring thread.

Further we can change this decision later if necessary, as this is not
really exposed to userspace.

But please tell me why is it not the right thing to have the io_uring
helper threads stop? Why is it ok for that process to go on consuming
cpu resources in a non-stoppable way.

Eric

> Reported-by: Stefan Metzmacher <[email protected]>
> Signed-off-by: Jens Axboe <[email protected]>
> ---
> kernel/signal.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 730ecd3d6faf..b113bf647fb4 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2349,6 +2349,10 @@ static bool do_signal_stop(int signr)
>
> t = current;
> while_each_thread(current, t) {
> + /* don't STOP PF_IO_WORKER threads */
> + if (t->flags & PF_IO_WORKER)
> + continue;
> +
> /*
> * Setting state to TASK_STOPPED for a group
> * stop is always done with the siglock held,

2021-03-20 16:29:14

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [PATCHSET 0/2] PF_IO_WORKER signal tweaks

Jens Axboe <[email protected]> writes:

> Hi,
>
> Been trying to ensure that we do the right thing wrt signals and
> PF_IO_WORKER threads, and I think there are two cases we need to handle
> explicitly:
>
> 1) Just don't allow signals to them in general. We do mask everything
> as blocked, outside of SIGKILL, so things like wants_signal() will
> never return true for them. But it's still possible to send them a
> signal via (ultimately) group_send_sig_info(). This will then deliver
> the signal to the original io_uring owning task, and that seems a bit
> unexpected. So just don't allow them in general.
>
> 2) STOP is done a bit differently, and we should not allow that either.
>
> Outside of that, I've been looking at same_thread_group(). This will
> currently return true for an io_uring task and it's IO workers, since
> they do share ->signal. From looking at the kernel users of this, that
> actually seems OK for the cases I checked. One is accounting related,
> which we obviously want, and others are related to permissions between
> tasks. FWIW, I ran with the below and didn't observe any ill effects,
> but I'd like someone to actually think about and verify that PF_IO_WORKER
> same_thread_group() usage is sane.

They are helper threads running in kernel space. Allowing the ordinary
threads not to block. Why would same_thread_group be a problem?

I don't hate this set of patches. But I also don't see much
explanation why the changes are the right thing semantically.

That makes me uneasy. Because especially the SIGSTOP changes feels like
it is the wrong thing semantically. The group_send_sig_info change
simply feels like it is unnecessary.

Like we are maybe playing whak-a-mole with symptoms rather than make
certain these are the right fixes for the symptoms.

Eric

> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index 3f6a0fcaa10c..a580bc0f8aa3 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -667,10 +667,17 @@ static inline bool thread_group_leader(struct task_struct *p)
> return p->exit_signal >= 0;
> }
>
> +static inline
> +bool same_thread_group_account(struct task_struct *p1, struct task_struct *p2)
> +{
> + return p1->signal == p2->signal
> +}
> +
> static inline
> bool same_thread_group(struct task_struct *p1, struct task_struct *p2)
> {
> - return p1->signal == p2->signal;
> + return same_thread_group_account(p1, p2) &&
> + !((p1->flags | p2->flags) & PF_IO_WORKER);
> }
>
> static inline struct task_struct *next_thread(const struct task_struct *p)
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 5f611658eeab..625110cacc2a 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -307,7 +307,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
> * those pending times and rely only on values updated on tick or
> * other scheduler action.
> */
> - if (same_thread_group(current, tsk))
> + if (same_thread_group_account(current, tsk))
> (void) task_sched_runtime(current);
>
> rcu_read_lock();

2021-03-20 17:07:55

by kernel test robot

[permalink] [raw]

Subject: Re: [PATCHSET 0/2] PF_IO_WORKER signal tweaks

Attachments:

(No filename) (3.91 kB)
.config.gz (26.73 kB)
Download all attachments

2021-03-20 17:10:39

by kernel test robot

[permalink] [raw]

Subject: Re: [PATCHSET 0/2] PF_IO_WORKER signal tweaks

Hi Jens,

I love your patch! Yet something to improve:

[auto build test ERROR on linux/master]
[also build test ERROR on tip/sched/core linus/master v5.12-rc3 next-20210319]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Jens-Axboe/PF_IO_WORKER-signal-tweaks/20210320-234247
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git a74e6a014c9d4d4161061f770c9b4f98372ac778
config: s390-randconfig-r014-20210321 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 436c6c9c20cc522c92a923440a5fc509c342a7db)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install s390 cross compiling tool for clang build
# apt-get install binutils-s390x-linux-gnu
# https://github.com/0day-ci/linux/commit/05c70f370b93f3bf555e63293d43a82aab2fcdf3
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Jens-Axboe/PF_IO_WORKER-signal-tweaks/20210320-234247
git checkout 05c70f370b93f3bf555e63293d43a82aab2fcdf3
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=s390

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

In file included from arch/s390/kernel/asm-offsets.c:11:
In file included from include/linux/kvm_host.h:15:
In file included from include/linux/mm.h:707:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
>> include/linux/sched/signal.h:673:33: error: expected ';' after return statement
return p1->signal == p2->signal
^
;
In file included from arch/s390/kernel/asm-offsets.c:11:
In file included from include/linux/kvm_host.h:33:
In file included from include/linux/kvm_para.h:5:
In file included from include/uapi/linux/kvm_para.h:36:
In file included from arch/s390/include/asm/kvm_para.h:25:
In file included from arch/s390/include/asm/diag.h:12:
In file included from include/linux/if_ether.h:19:
In file included from include/linux/skbuff.h:31:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:9:
In file included from arch/s390/include/asm/io.h:80:
include/asm-generic/io.h:464:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __raw_readb(PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:36:59: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
^
include/uapi/linux/swab.h:105:32: note: expanded from macro '__swab16'
(__builtin_constant_p((__u16)(x)) ? \
^
In file included from arch/s390/kernel/asm-offsets.c:11:
In file included from include/linux/kvm_host.h:33:
In file included from include/linux/kvm_para.h:5:
In file included from include/uapi/linux/kvm_para.h:36:
In file included from arch/s390/include/asm/kvm_para.h:25:
In file included from arch/s390/include/asm/diag.h:12:
In file included from include/linux/if_ether.h:19:
In file included from include/linux/skbuff.h:31:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:9:
In file included from arch/s390/include/asm/io.h:80:
include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:36:59: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
^
include/uapi/linux/swab.h:106:21: note: expanded from macro '__swab16'
___constant_swab16(x) : \
^
include/uapi/linux/swab.h:15:12: note: expanded from macro '___constant_swab16'
(((__u16)(x) & (__u16)0x00ffU) << 8) | \
^
In file included from arch/s390/kernel/asm-offsets.c:11:
In file included from include/linux/kvm_host.h:33:
In file included from include/linux/kvm_para.h:5:
In file included from include/uapi/linux/kvm_para.h:36:
In file included from arch/s390/include/asm/kvm_para.h:25:
In file included from arch/s390/include/asm/diag.h:12:
In file included from include/linux/if_ether.h:19:
In file included from include/linux/skbuff.h:31:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:9:
In file included from arch/s390/include/asm/io.h:80:
include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:36:59: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
^
include/uapi/linux/swab.h:106:21: note: expanded from macro '__swab16'
___constant_swab16(x) : \
^
include/uapi/linux/swab.h:16:12: note: expanded from macro '___constant_swab16'
(((__u16)(x) & (__u16)0xff00U) >> 8)))
^
In file included from arch/s390/kernel/asm-offsets.c:11:
In file included from include/linux/kvm_host.h:33:
In file included from include/linux/kvm_para.h:5:
In file included from include/uapi/linux/kvm_para.h:36:
In file included from arch/s390/include/asm/kvm_para.h:25:
In file included from arch/s390/include/asm/diag.h:12:
In file included from include/linux/if_ether.h:19:
In file included from include/linux/skbuff.h:31:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:9:
In file included from arch/s390/include/asm/io.h:80:
include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:36:59: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
^
include/uapi/linux/swab.h:107:12: note: expanded from macro '__swab16'
__fswab16(x))
^
In file included from arch/s390/kernel/asm-offsets.c:11:
In file included from include/linux/kvm_host.h:33:
In file included from include/linux/kvm_para.h:5:
In file included from include/uapi/linux/kvm_para.h:36:
In file included from arch/s390/include/asm/kvm_para.h:25:
In file included from arch/s390/include/asm/diag.h:12:
In file included from include/linux/if_ether.h:19:
In file included from include/linux/skbuff.h:31:
--
In file included from arch/s390/kernel/asm-offsets.c:11:
In file included from include/linux/kvm_host.h:15:
In file included from include/linux/mm.h:707:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
>> include/linux/sched/signal.h:673:33: error: expected ';' after return statement
return p1->signal == p2->signal
^
;
In file included from arch/s390/kernel/asm-offsets.c:11:
In file included from include/linux/kvm_host.h:33:
In file included from include/linux/kvm_para.h:5:
In file included from include/uapi/linux/kvm_para.h:36:
In file included from arch/s390/include/asm/kvm_para.h:25:
In file included from arch/s390/include/asm/diag.h:12:
In file included from include/linux/if_ether.h:19:
In file included from include/linux/skbuff.h:31:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:9:
In file included from arch/s390/include/asm/io.h:80:
include/asm-generic/io.h:464:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __raw_readb(PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:36:59: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
^
include/uapi/linux/swab.h:105:32: note: expanded from macro '__swab16'
(__builtin_constant_p((__u16)(x)) ? \
^
In file included from arch/s390/kernel/asm-offsets.c:11:
In file included from include/linux/kvm_host.h:33:
In file included from include/linux/kvm_para.h:5:
In file included from include/uapi/linux/kvm_para.h:36:
In file included from arch/s390/include/asm/kvm_para.h:25:
In file included from arch/s390/include/asm/diag.h:12:
In file included from include/linux/if_ether.h:19:
In file included from include/linux/skbuff.h:31:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:9:
In file included from arch/s390/include/asm/io.h:80:
include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:36:59: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
^
include/uapi/linux/swab.h:106:21: note: expanded from macro '__swab16'
___constant_swab16(x) : \
^
include/uapi/linux/swab.h:15:12: note: expanded from macro '___constant_swab16'
(((__u16)(x) & (__u16)0x00ffU) << 8) | \
^
In file included from arch/s390/kernel/asm-offsets.c:11:
In file included from include/linux/kvm_host.h:33:
In file included from include/linux/kvm_para.h:5:
In file included from include/uapi/linux/kvm_para.h:36:
In file included from arch/s390/include/asm/kvm_para.h:25:
In file included from arch/s390/include/asm/diag.h:12:
In file included from include/linux/if_ether.h:19:
In file included from include/linux/skbuff.h:31:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:9:
In file included from arch/s390/include/asm/io.h:80:
include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:36:59: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
^
include/uapi/linux/swab.h:106:21: note: expanded from macro '__swab16'
___constant_swab16(x) : \
^
include/uapi/linux/swab.h:16:12: note: expanded from macro '___constant_swab16'
(((__u16)(x) & (__u16)0xff00U) >> 8)))
^
In file included from arch/s390/kernel/asm-offsets.c:11:
In file included from include/linux/kvm_host.h:33:
In file included from include/linux/kvm_para.h:5:
In file included from include/uapi/linux/kvm_para.h:36:
In file included from arch/s390/include/asm/kvm_para.h:25:
In file included from arch/s390/include/asm/diag.h:12:
In file included from include/linux/if_ether.h:19:
In file included from include/linux/skbuff.h:31:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:9:
In file included from arch/s390/include/asm/io.h:80:
include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:36:59: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
^
include/uapi/linux/swab.h:107:12: note: expanded from macro '__swab16'
__fswab16(x))
^
In file included from arch/s390/kernel/asm-offsets.c:11:
In file included from include/linux/kvm_host.h:33:
In file included from include/linux/kvm_para.h:5:
In file included from include/uapi/linux/kvm_para.h:36:
In file included from arch/s390/include/asm/kvm_para.h:25:
In file included from arch/s390/include/asm/diag.h:12:
In file included from include/linux/if_ether.h:19:
In file included from include/linux/skbuff.h:31:

vim +673 include/linux/sched/signal.h

669
670 static inline
671 bool same_thread_group_account(struct task_struct *p1, struct task_struct *p2)
672 {
> 673 return p1->signal == p2->signal
674 }
675

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]

Attachments:

(No filename) (15.41 kB)
.config.gz (15.48 kB)
Download all attachments

2021-03-20 17:54:18

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCHSET 0/2] PF_IO_WORKER signal tweaks

On Sat, Mar 20, 2021 at 9:27 AM Eric W. Biederman <[email protected]> wrote:
>
> That makes me uneasy. Because especially the SIGSTOP changes feels like
> it is the wrong thing semantically. The group_send_sig_info change
> simply feels like it is unnecessary.

SIGSTOP handling is fundamentally done at signal handling time, and
signal handling is fundamentally done at "return to user space" time.

End result: you cannot send kernel threads any signals at all, unless
it _explicitly_ handles them manually. SIGSTOP isn't different from
user space delivery of an "actual" signal handler in this respect.

And practically speaking, the only signal a kernel thread generally
can handle is SIGKILL (and exit on it).

Now, to make matters actually more confusing for SIGSTOP, it's a
two-phase operation - initiated from that usual "about to return to
user space with signals pending" logic (which doesn't happen for
kernel threads, including IO threads), _and_ then it has that magic
accounting for when to notify the parent about stoppage (which has
some across-thread handling).

I really think IO threads need to not participate, because they simply
cannot handle signals in any sane manner.

You should think of the IO threads as fully kernel threads that just
share VM and fs with the user thing.

In fact, it might be a good idea to disassociate them from the thread
group entirely when they are created, so that none of
"for_each_thread()" or "next_thread()" logic ever finds them.

Maybe the right thing to do is to add a new case to that whole thread
group initialization code in copy_process() - something like this fake
and intentionally whitespace-damaged pseudo-patch:

diff --git a/kernel/fork.c b/kernel/fork.c
index 54cc905e5fe0..b87abe3a9ac6 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2296,7 +2296,7 @@ static __latent_entropy struct task_struct
*copy_process(
attach_pid(p, PIDTYPE_PGID);
attach_pid(p, PIDTYPE_SID);
__this_cpu_inc(process_counts);
- } else {
+ } else if (!IOTHREAD) {
current->signal->nr_threads++;
atomic_inc(&current->signal->live);
refcount_inc(&current->signal->sigcnt);

so that the IO thread isn't considered "really" part of the existing
signal state. Alternatively, make it not use
CLONE_SIGHAND|CLONE_THREAD at all, but that would make it
unnecessarily allocate its own signal state, so that's "cleaner" but
not great either.

Hmm.

Linus

2021-03-20 17:58:45

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH 1/2] signal: don't allow sending any signals to PF_IO_WORKER threads

On Sat, Mar 20, 2021 at 9:19 AM Eric W. Biederman <[email protected]> wrote:
>
> The creds should be reasonably in-sync with the rest of the threads.

It's not about credentials (despite the -EPERM).

It's about the fact that kernel threads cannot handle signals, and
then get caught in endless loops of "if (sigpending()) return
-EAGAIN".

For a normal user thread, that "return -EAGAIN" (or whatever) will end
up returning an error to user space - and before it does that, it will
go through the "oh, returning to user space, so handle signal" path.
Which will clear sigpending etc.

A thread that never returns to user space fundamentally cannot handle
this. The sigpending() stays on forever, the signal never gets
handled, the thread can't do anything.

So delivering a signal to a kernel thread fundamentally cannot work
(although we do have some threads that explicitly see "oh, if I was
killed, I will exit" - think things like in-kernel nfsd etc).

Linus

2021-03-20 19:14:12

by kernel test robot

[permalink] [raw]

Subject: Re: [PATCHSET 0/2] PF_IO_WORKER signal tweaks

Hi Jens,

I love your patch! Yet something to improve:

[auto build test ERROR on linux/master]
[also build test ERROR on tip/sched/core linus/master v5.12-rc3 next-20210319]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Jens-Axboe/PF_IO_WORKER-signal-tweaks/20210320-234247
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git a74e6a014c9d4d4161061f770c9b4f98372ac778
config: riscv-randconfig-r024-20210321 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 436c6c9c20cc522c92a923440a5fc509c342a7db)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install riscv cross compiling tool for clang build
# apt-get install binutils-riscv64-linux-gnu
# https://github.com/0day-ci/linux/commit/05c70f370b93f3bf555e63293d43a82aab2fcdf3
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Jens-Axboe/PF_IO_WORKER-signal-tweaks/20210320-234247
git checkout 05c70f370b93f3bf555e63293d43a82aab2fcdf3
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All error/warnings (new ones prefixed by >>):

In file included from drivers/ata/ahci.c:21:
In file included from include/linux/pci.h:1456:
In file included from include/linux/dmapool.h:14:
In file included from include/linux/scatterlist.h:8:
In file included from include/linux/mm.h:707:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
>> include/linux/sched/signal.h:673:33: error: expected ';' after return statement
return p1->signal == p2->signal
^
;
drivers/ata/ahci.c:104:2: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
AHCI_SHT("ahci"),
^~~~~~~~~~~~~~~~
drivers/ata/ahci.h:387:16: note: expanded from macro 'AHCI_SHT'
.can_queue = AHCI_MAX_CMDS, \
^~~~~~~~~~~~~
drivers/ata/ahci.c:104:2: note: previous initialization is here
AHCI_SHT("ahci"),
^~~~~~~~~~~~~~~~
drivers/ata/ahci.h:386:2: note: expanded from macro 'AHCI_SHT'
ATA_NCQ_SHT(drv_name), \
^~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1418:2: note: expanded from macro 'ATA_NCQ_SHT'
__ATA_BASE_SHT(drv_name), \
^~~~~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1400:16: note: expanded from macro '__ATA_BASE_SHT'
.can_queue = ATA_DEF_QUEUE, \
^~~~~~~~~~~~~
drivers/ata/ahci.c:104:2: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
AHCI_SHT("ahci"),
^~~~~~~~~~~~~~~~
drivers/ata/ahci.h:391:17: note: expanded from macro 'AHCI_SHT'
.sdev_attrs = ahci_sdev_attrs
^~~~~~~~~~~~~~~
drivers/ata/ahci.c:104:2: note: previous initialization is here
AHCI_SHT("ahci"),
^~~~~~~~~~~~~~~~
drivers/ata/ahci.h:386:2: note: expanded from macro 'AHCI_SHT'
ATA_NCQ_SHT(drv_name), \
^~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1419:17: note: expanded from macro 'ATA_NCQ_SHT'
.sdev_attrs = ata_ncq_sdev_attrs, \
^~~~~~~~~~~~~~~~~~
2 warnings and 1 error generated.
--
In file included from drivers/ata/ahci_xgene.c:19:
In file included from include/linux/phy/phy.h:17:
In file included from include/linux/regulator/consumer.h:35:
In file included from include/linux/suspend.h:5:
In file included from include/linux/swap.h:9:
In file included from include/linux/memcontrol.h:13:
In file included from include/linux/cgroup.h:17:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
>> include/linux/sched/signal.h:673:33: error: expected ';' after return statement
return p1->signal == p2->signal
^
;
drivers/ata/ahci_xgene.c:718:2: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
AHCI_SHT(DRV_NAME),
^~~~~~~~~~~~~~~~~~
drivers/ata/ahci.h:387:16: note: expanded from macro 'AHCI_SHT'
.can_queue = AHCI_MAX_CMDS, \
^~~~~~~~~~~~~
drivers/ata/ahci_xgene.c:718:2: note: previous initialization is here
AHCI_SHT(DRV_NAME),
^~~~~~~~~~~~~~~~~~
drivers/ata/ahci.h:386:2: note: expanded from macro 'AHCI_SHT'
ATA_NCQ_SHT(drv_name), \
^~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1418:2: note: expanded from macro 'ATA_NCQ_SHT'
__ATA_BASE_SHT(drv_name), \
^~~~~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1400:16: note: expanded from macro '__ATA_BASE_SHT'
.can_queue = ATA_DEF_QUEUE, \
^~~~~~~~~~~~~
drivers/ata/ahci_xgene.c:718:2: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
AHCI_SHT(DRV_NAME),
^~~~~~~~~~~~~~~~~~
drivers/ata/ahci.h:391:17: note: expanded from macro 'AHCI_SHT'
.sdev_attrs = ahci_sdev_attrs
^~~~~~~~~~~~~~~
drivers/ata/ahci_xgene.c:718:2: note: previous initialization is here
AHCI_SHT(DRV_NAME),
^~~~~~~~~~~~~~~~~~
drivers/ata/ahci.h:386:2: note: expanded from macro 'AHCI_SHT'
ATA_NCQ_SHT(drv_name), \
^~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1419:17: note: expanded from macro 'ATA_NCQ_SHT'
.sdev_attrs = ata_ncq_sdev_attrs, \
^~~~~~~~~~~~~~~~~~
2 warnings and 1 error generated.
--
In file included from drivers/ata/libata-core.c:30:
In file included from include/linux/pci.h:1456:
In file included from include/linux/dmapool.h:14:
In file included from include/linux/scatterlist.h:8:
In file included from include/linux/mm.h:707:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
>> include/linux/sched/signal.h:673:33: error: expected ';' after return statement
return p1->signal == p2->signal
^
;
1 error generated.
--
In file included from drivers/ata/acard-ahci.c:22:
In file included from include/linux/pci.h:1456:
In file included from include/linux/dmapool.h:14:
In file included from include/linux/scatterlist.h:8:
In file included from include/linux/mm.h:707:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
>> include/linux/sched/signal.h:673:33: error: expected ';' after return statement
return p1->signal == p2->signal
^
;
drivers/ata/acard-ahci.c:70:2: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
AHCI_SHT("acard-ahci"),
^~~~~~~~~~~~~~~~~~~~~~
drivers/ata/ahci.h:387:16: note: expanded from macro 'AHCI_SHT'
.can_queue = AHCI_MAX_CMDS, \
^~~~~~~~~~~~~
drivers/ata/acard-ahci.c:70:2: note: previous initialization is here
AHCI_SHT("acard-ahci"),
^~~~~~~~~~~~~~~~~~~~~~
drivers/ata/ahci.h:386:2: note: expanded from macro 'AHCI_SHT'
ATA_NCQ_SHT(drv_name), \
^~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1418:2: note: expanded from macro 'ATA_NCQ_SHT'
__ATA_BASE_SHT(drv_name), \
^~~~~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1400:16: note: expanded from macro '__ATA_BASE_SHT'
.can_queue = ATA_DEF_QUEUE, \
^~~~~~~~~~~~~
drivers/ata/acard-ahci.c:70:2: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
AHCI_SHT("acard-ahci"),
^~~~~~~~~~~~~~~~~~~~~~
drivers/ata/ahci.h:391:17: note: expanded from macro 'AHCI_SHT'
.sdev_attrs = ahci_sdev_attrs
^~~~~~~~~~~~~~~
drivers/ata/acard-ahci.c:70:2: note: previous initialization is here
AHCI_SHT("acard-ahci"),
^~~~~~~~~~~~~~~~~~~~~~
drivers/ata/ahci.h:386:2: note: expanded from macro 'AHCI_SHT'
ATA_NCQ_SHT(drv_name), \
^~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1419:17: note: expanded from macro 'ATA_NCQ_SHT'
.sdev_attrs = ata_ncq_sdev_attrs, \
^~~~~~~~~~~~~~~~~~
2 warnings and 1 error generated.
--
In file included from drivers/ata/ahci_platform.c:17:
In file included from include/linux/libata.h:16:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:8:
In file included from include/linux/mm.h:707:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
>> include/linux/sched/signal.h:673:33: error: expected ';' after return statement
return p1->signal == p2->signal
^
;
drivers/ata/ahci_platform.c:40:2: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
AHCI_SHT(DRV_NAME),
^~~~~~~~~~~~~~~~~~
drivers/ata/ahci.h:387:16: note: expanded from macro 'AHCI_SHT'
.can_queue = AHCI_MAX_CMDS, \
^~~~~~~~~~~~~
drivers/ata/ahci_platform.c:40:2: note: previous initialization is here
AHCI_SHT(DRV_NAME),
^~~~~~~~~~~~~~~~~~
drivers/ata/ahci.h:386:2: note: expanded from macro 'AHCI_SHT'
ATA_NCQ_SHT(drv_name), \
^~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1418:2: note: expanded from macro 'ATA_NCQ_SHT'
__ATA_BASE_SHT(drv_name), \
^~~~~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1400:16: note: expanded from macro '__ATA_BASE_SHT'
.can_queue = ATA_DEF_QUEUE, \
^~~~~~~~~~~~~
drivers/ata/ahci_platform.c:40:2: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
AHCI_SHT(DRV_NAME),
^~~~~~~~~~~~~~~~~~
drivers/ata/ahci.h:391:17: note: expanded from macro 'AHCI_SHT'
.sdev_attrs = ahci_sdev_attrs
^~~~~~~~~~~~~~~
drivers/ata/ahci_platform.c:40:2: note: previous initialization is here
AHCI_SHT(DRV_NAME),
^~~~~~~~~~~~~~~~~~
drivers/ata/ahci.h:386:2: note: expanded from macro 'AHCI_SHT'
ATA_NCQ_SHT(drv_name), \
^~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1419:17: note: expanded from macro 'ATA_NCQ_SHT'
.sdev_attrs = ata_ncq_sdev_attrs, \
^~~~~~~~~~~~~~~~~~
2 warnings and 1 error generated.
--
In file included from drivers/ata/libahci_platform.c:19:
In file included from include/linux/libata.h:16:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:8:
In file included from include/linux/mm.h:707:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
>> include/linux/sched/signal.h:673:33: error: expected ';' after return statement
return p1->signal == p2->signal
^
;
>> drivers/ata/libahci_platform.c:644:42: warning: shift count >= width of type [-Wshift-count-overflow]
rc = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(64));
^~~~~~~~~~~~~~~~
include/linux/dma-mapping.h:76:54: note: expanded from macro 'DMA_BIT_MASK'
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
^ ~~~
1 warning and 1 error generated.
--
In file included from drivers/ata/sata_sil24.c:13:
In file included from include/linux/pci.h:1456:
In file included from include/linux/dmapool.h:14:
In file included from include/linux/scatterlist.h:8:
In file included from include/linux/mm.h:707:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
>> include/linux/sched/signal.h:673:33: error: expected ';' after return statement
return p1->signal == p2->signal
^
;
drivers/ata/sata_sil24.c:378:16: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
.can_queue = SIL24_MAX_CMDS,
^~~~~~~~~~~~~~
drivers/ata/sata_sil24.c:377:2: note: previous initialization is here
ATA_NCQ_SHT(DRV_NAME),
^~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1418:2: note: expanded from macro 'ATA_NCQ_SHT'
__ATA_BASE_SHT(drv_name), \
^~~~~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1400:16: note: expanded from macro '__ATA_BASE_SHT'
.can_queue = ATA_DEF_QUEUE, \
^~~~~~~~~~~~~
drivers/ata/sata_sil24.c:381:22: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
.tag_alloc_policy = BLK_TAG_ALLOC_FIFO,
^~~~~~~~~~~~~~~~~~
include/linux/blkdev.h:298:28: note: expanded from macro 'BLK_TAG_ALLOC_FIFO'
#define BLK_TAG_ALLOC_FIFO 0 /* allocate starting from 0 */
^
drivers/ata/sata_sil24.c:377:2: note: previous initialization is here
ATA_NCQ_SHT(DRV_NAME),
^~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1418:2: note: expanded from macro 'ATA_NCQ_SHT'
__ATA_BASE_SHT(drv_name), \
^~~~~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1401:22: note: expanded from macro '__ATA_BASE_SHT'
.tag_alloc_policy = BLK_TAG_ALLOC_RR, \
^~~~~~~~~~~~~~~~
include/linux/blkdev.h:299:26: note: expanded from macro 'BLK_TAG_ALLOC_RR'
#define BLK_TAG_ALLOC_RR 1 /* allocate starting from last allocated tag */
^
>> drivers/ata/sata_sil24.c:1306:45: warning: shift count >= width of type [-Wshift-count-overflow]
rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
^~~~~~~~~~~~~~~~
include/linux/dma-mapping.h:76:54: note: expanded from macro 'DMA_BIT_MASK'
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
^ ~~~
3 warnings and 1 error generated.
--
In file included from drivers/ata/ahci_ceva.c:11:
In file included from include/linux/libata.h:16:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:8:
In file included from include/linux/mm.h:707:
In file included from include/linux/huge_mm.h:8:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
>> include/linux/sched/signal.h:673:33: error: expected ';' after return statement
return p1->signal == p2->signal
^
;
drivers/ata/ahci_ceva.c:187:2: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
AHCI_SHT(DRV_NAME),
^~~~~~~~~~~~~~~~~~
drivers/ata/ahci.h:387:16: note: expanded from macro 'AHCI_SHT'
.can_queue = AHCI_MAX_CMDS, \
^~~~~~~~~~~~~
drivers/ata/ahci_ceva.c:187:2: note: previous initialization is here
AHCI_SHT(DRV_NAME),
^~~~~~~~~~~~~~~~~~
drivers/ata/ahci.h:386:2: note: expanded from macro 'AHCI_SHT'
ATA_NCQ_SHT(drv_name), \
^~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1418:2: note: expanded from macro 'ATA_NCQ_SHT'
__ATA_BASE_SHT(drv_name), \
^~~~~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1400:16: note: expanded from macro '__ATA_BASE_SHT'
.can_queue = ATA_DEF_QUEUE, \
^~~~~~~~~~~~~
drivers/ata/ahci_ceva.c:187:2: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
AHCI_SHT(DRV_NAME),
^~~~~~~~~~~~~~~~~~
drivers/ata/ahci.h:391:17: note: expanded from macro 'AHCI_SHT'
.sdev_attrs = ahci_sdev_attrs
^~~~~~~~~~~~~~~
drivers/ata/ahci_ceva.c:187:2: note: previous initialization is here
AHCI_SHT(DRV_NAME),
^~~~~~~~~~~~~~~~~~
drivers/ata/ahci.h:386:2: note: expanded from macro 'AHCI_SHT'
ATA_NCQ_SHT(drv_name), \
^~~~~~~~~~~~~~~~~~~~~
include/linux/libata.h:1419:17: note: expanded from macro 'ATA_NCQ_SHT'
.sdev_attrs = ata_ncq_sdev_attrs, \
^~~~~~~~~~~~~~~~~~
2 warnings and 1 error generated.
..

vim +673 include/linux/sched/signal.h

669
670 static inline
671 bool same_thread_group_account(struct task_struct *p1, struct task_struct *p2)
672 {
> 673 return p1->signal == p2->signal
674 }
675

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]

Attachments:

(No filename) (20.28 kB)
.config.gz (32.01 kB)
Download all attachments

2021-03-20 19:20:47

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCHSET 0/2] PF_IO_WORKER signal tweaks

On Sat, Mar 20, 2021 at 10:51 AM Linus Torvalds
<[email protected]> wrote:
>
> Alternatively, make it not use
> CLONE_SIGHAND|CLONE_THREAD at all, but that would make it
> unnecessarily allocate its own signal state, so that's "cleaner" but
> not great either.

Thinking some more about that, it would be problematic for things like
the resource counters too. They'd be much better shared.

Not adding it to the thread list etc might be clever, but feels a bit too scary.

So on the whole I think Jens' minor patches to just not have IO helper
threads accept signals are probably the right thing to do.

Linus

2021-03-20 21:41:33

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [PATCH 1/2] signal: don't allow sending any signals to PF_IO_WORKER threads

Linus Torvalds <[email protected]> writes:

> On Sat, Mar 20, 2021 at 9:19 AM Eric W. Biederman <[email protected]> wrote:
>>
>> The creds should be reasonably in-sync with the rest of the threads.
>
> It's not about credentials (despite the -EPERM).
>
> It's about the fact that kernel threads cannot handle signals, and
> then get caught in endless loops of "if (sigpending()) return
> -EAGAIN".
>
> For a normal user thread, that "return -EAGAIN" (or whatever) will end
> up returning an error to user space - and before it does that, it will
> go through the "oh, returning to user space, so handle signal" path.
> Which will clear sigpending etc.
>
> A thread that never returns to user space fundamentally cannot handle
> this. The sigpending() stays on forever, the signal never gets
> handled, the thread can't do anything.
>
> So delivering a signal to a kernel thread fundamentally cannot work
> (although we do have some threads that explicitly see "oh, if I was
> killed, I will exit" - think things like in-kernel nfsd etc).

I agree that getting a kernel thread to receive a signal is quite
tricky. But that is not what the patch affects.

The patch covers the case when instead of specifying the pid of the
process to kill(2) someone specifies the tid of a thread. Which implies
that type is PIDTYPE_TGID, and in turn the signal is being placed on the
t->signal->shared_pending queue. Not the thread specific t->pending
queue.

So my question is since the signal is delivered to the process as a
whole why do we care if someone specifies the tid of a kernel thread,
rather than the tid of a userspace thread?

Eric

2021-03-20 22:11:22

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [PATCHSET 0/2] PF_IO_WORKER signal tweaks

Added criu because I just realized that io_uring (which can open files
from an io worker thread) looks to require some special handling for
stopping and freezing processes. If not in the SIGSTOP case in the
related cgroup freezer case.

Linus Torvalds <[email protected]> writes:

> On Sat, Mar 20, 2021 at 10:51 AM Linus Torvalds
> <[email protected]> wrote:
>>
>> Alternatively, make it not use
>> CLONE_SIGHAND|CLONE_THREAD at all, but that would make it
>> unnecessarily allocate its own signal state, so that's "cleaner" but
>> not great either.
>
> Thinking some more about that, it would be problematic for things like
> the resource counters too. They'd be much better shared.
>
> Not adding it to the thread list etc might be clever, but feels a bit too scary.
>
> So on the whole I think Jens' minor patches to just not have IO helper
> threads accept signals are probably the right thing to do.

The way I see it we have two options:

1) Don't ask PF_IO_WORKERs to stop do_signal_stop and in
task_join_group_stop.

The easiest comprehensive implementation looks like just
updating task_set_jobctl_pending to treat PF_IO_WORKER
as it treats PF_EXITING.

2) Have the main loop of the kernel thread test for JOBCTL_STOP_PENDING
and call into do_signal_stop.

It is a wee bit trickier to modify the io_workers to stop, but it does
not look prohibitively difficult.

All of the work performed by the io worker is work scheduled via
io_uring by the process being stopped.

- Is the amount of work performed by the io worker thread sufficiently
negligible that we don't care?

- Or is the amount of work performed by the io worker so great that it
becomes a way for an errant process to escape SIGSTOP?

As the code is all intermingled with the cgroup_freezer. I am also
wondering creating checkpoints needs additional stopping guarantees.

To solve the issue that SIGSTOP is simply broken right now I am totally
fine with something like:

diff --git a/kernel/signal.c b/kernel/signal.c
index ba4d1ef39a9e..cb9acdfb32fa 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));

- if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
+ if (unlikely(fatal_signal_pending(task) ||
+ (task->flags & (PF_EXITING | PF_IO_WORKER))))
return false;

if (mask & JOBCTL_STOP_SIGMASK)

Which just keeps from creating unstoppable processes today. I am just
not convinced that is what we want as a long term solution.

Eric

2021-03-20 22:54:05

by Jens Axboe

[permalink] [raw]

Subject: Re: [PATCH 1/2] signal: don't allow sending any signals to PF_IO_WORKER threads

On 3/20/21 3:38 PM, Eric W. Biederman wrote:
> Linus Torvalds <[email protected]> writes:
>
>> On Sat, Mar 20, 2021 at 9:19 AM Eric W. Biederman <[email protected]> wrote:
>>>
>>> The creds should be reasonably in-sync with the rest of the threads.
>>
>> It's not about credentials (despite the -EPERM).
>>
>> It's about the fact that kernel threads cannot handle signals, and
>> then get caught in endless loops of "if (sigpending()) return
>> -EAGAIN".
>>
>> For a normal user thread, that "return -EAGAIN" (or whatever) will end
>> up returning an error to user space - and before it does that, it will
>> go through the "oh, returning to user space, so handle signal" path.
>> Which will clear sigpending etc.
>>
>> A thread that never returns to user space fundamentally cannot handle
>> this. The sigpending() stays on forever, the signal never gets
>> handled, the thread can't do anything.
>>
>> So delivering a signal to a kernel thread fundamentally cannot work
>> (although we do have some threads that explicitly see "oh, if I was
>> killed, I will exit" - think things like in-kernel nfsd etc).
>
> I agree that getting a kernel thread to receive a signal is quite
> tricky. But that is not what the patch affects.
>
> The patch covers the case when instead of specifying the pid of the
> process to kill(2) someone specifies the tid of a thread. Which implies
> that type is PIDTYPE_TGID, and in turn the signal is being placed on the
> t->signal->shared_pending queue. Not the thread specific t->pending
> queue.
>
> So my question is since the signal is delivered to the process as a
> whole why do we care if someone specifies the tid of a kernel thread,
> rather than the tid of a userspace thread?

Right, that's what this first patch does, and in all honesty, it's not
required like the 2/2 patch is. I do think it makes it more consistent,
though - the threads don't take signals, period. Allowing delivery from
eg kill(2) and then pass it to the owning task of the io_uring is
somewhat counterintuitive, and differs from earlier kernels where there
was no relationsship between that owning task and the async worker
thread.

That's why I think the patch DOES make sense. These threads may share a
personality with the owning task, but I don't think we should be able to
manipulate them from userspace at all. That includes SIGSTOP, of course,
but also regular signals.

Hence I do think we should do something like this.

--
Jens Axboe

2021-03-20 22:56:48

by Jens Axboe

[permalink] [raw]

Subject: Re: [PATCHSET 0/2] PF_IO_WORKER signal tweaks

On 3/20/21 4:08 PM, Eric W. Biederman wrote:
>
> Added criu because I just realized that io_uring (which can open files
> from an io worker thread) looks to require some special handling for
> stopping and freezing processes. If not in the SIGSTOP case in the
> related cgroup freezer case.
>
> Linus Torvalds <[email protected]> writes:
>
>> On Sat, Mar 20, 2021 at 10:51 AM Linus Torvalds
>> <[email protected]> wrote:
>>>
>>> Alternatively, make it not use
>>> CLONE_SIGHAND|CLONE_THREAD at all, but that would make it
>>> unnecessarily allocate its own signal state, so that's "cleaner" but
>>> not great either.
>>
>> Thinking some more about that, it would be problematic for things like
>> the resource counters too. They'd be much better shared.
>>
>> Not adding it to the thread list etc might be clever, but feels a bit too scary.
>>
>> So on the whole I think Jens' minor patches to just not have IO helper
>> threads accept signals are probably the right thing to do.
>
> The way I see it we have two options:
>
> 1) Don't ask PF_IO_WORKERs to stop do_signal_stop and in
> task_join_group_stop.
>
> The easiest comprehensive implementation looks like just
> updating task_set_jobctl_pending to treat PF_IO_WORKER
> as it treats PF_EXITING.
>
> 2) Have the main loop of the kernel thread test for JOBCTL_STOP_PENDING
> and call into do_signal_stop.
>
> It is a wee bit trickier to modify the io_workers to stop, but it does
> not look prohibitively difficult.
>
> All of the work performed by the io worker is work scheduled via
> io_uring by the process being stopped.
>
> - Is the amount of work performed by the io worker thread sufficiently
> negligible that we don't care?
>
> - Or is the amount of work performed by the io worker so great that it
> becomes a way for an errant process to escape SIGSTOP?
>
> As the code is all intermingled with the cgroup_freezer. I am also
> wondering creating checkpoints needs additional stopping guarantees.

The work done is the same a syscall, basically. So it could be long
running and essentially not doing anything (eg read from a socket, no
data is there), or it's pretty short lived (eg read from a file, just
waiting on DMA).

This is outside of my domain of expertise, which is exactly why I added
you and Linus to make some calls on what the best approach here would
be. My two patches obviously go route #1 in terms of STOP. And fwiw,
I tested this:

> To solve the issue that SIGSTOP is simply broken right now I am totally
> fine with something like:
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index ba4d1ef39a9e..cb9acdfb32fa 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
> JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
> BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
>
> - if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
> + if (unlikely(fatal_signal_pending(task) ||
> + (task->flags & (PF_EXITING | PF_IO_WORKER))))
> return false;
>
> if (mask & JOBCTL_STOP_SIGMASK)

and can confirm it works fine for me with 2/2 reverted and this applied
instead.

> Which just keeps from creating unstoppable processes today. I am just
> not convinced that is what we want as a long term solution.

How about we go with either my 2/2 or yours above to at least ensure we
don't leave workers looping as schedule() is a nop with sigpending? If
there's a longer timeline concern that "evading" SIGSTOP is a concern, I
have absolutely no qualms with making the IO threads participate. But
since it seems conceptually simple but with potentially lurking minor
issues, probably not the ideal approach for right now.

--
Jens Axboe

2021-03-20 22:58:32

by Jens Axboe

[permalink] [raw]

Subject: Re: [PATCHSET 0/2] PF_IO_WORKER signal tweaks

On 3/20/21 1:18 PM, Linus Torvalds wrote:
> On Sat, Mar 20, 2021 at 10:51 AM Linus Torvalds
> <[email protected]> wrote:
>>
>> Alternatively, make it not use
>> CLONE_SIGHAND|CLONE_THREAD at all, but that would make it
>> unnecessarily allocate its own signal state, so that's "cleaner" but
>> not great either.
>
> Thinking some more about that, it would be problematic for things like
> the resource counters too. They'd be much better shared.
>
> Not adding it to the thread list etc might be clever, but feels a bit too scary.

That would be my immediate concern - it might very well be the right long
term solution, but I'd be wary of doing it upfront and having weird fallout
due to it.

> So on the whole I think Jens' minor patches to just not have IO helper
> threads accept signals are probably the right thing to do.

I do think we should just go with those two - they are simple and
straight forward. I'm also totally fine replacing 2/2 with Eric's
variant if he prefers that, I've confirmed that it works fine for me as
well.

--
Jens Axboe

2021-03-21 16:05:43

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [PATCH 1/2] signal: don't allow sending any signals to PF_IO_WORKER threads

Jens Axboe <[email protected]> writes:

> On 3/20/21 3:38 PM, Eric W. Biederman wrote:
>> Linus Torvalds <[email protected]> writes:
>>
>>> On Sat, Mar 20, 2021 at 9:19 AM Eric W. Biederman <[email protected]> wrote:
>>>>
>>>> The creds should be reasonably in-sync with the rest of the threads.
>>>
>>> It's not about credentials (despite the -EPERM).
>>>
>>> It's about the fact that kernel threads cannot handle signals, and
>>> then get caught in endless loops of "if (sigpending()) return
>>> -EAGAIN".
>>>
>>> For a normal user thread, that "return -EAGAIN" (or whatever) will end
>>> up returning an error to user space - and before it does that, it will
>>> go through the "oh, returning to user space, so handle signal" path.
>>> Which will clear sigpending etc.
>>>
>>> A thread that never returns to user space fundamentally cannot handle
>>> this. The sigpending() stays on forever, the signal never gets
>>> handled, the thread can't do anything.
>>>
>>> So delivering a signal to a kernel thread fundamentally cannot work
>>> (although we do have some threads that explicitly see "oh, if I was
>>> killed, I will exit" - think things like in-kernel nfsd etc).
>>
>> I agree that getting a kernel thread to receive a signal is quite
>> tricky. But that is not what the patch affects.
>>
>> The patch covers the case when instead of specifying the pid of the
>> process to kill(2) someone specifies the tid of a thread. Which implies
>> that type is PIDTYPE_TGID, and in turn the signal is being placed on the
>> t->signal->shared_pending queue. Not the thread specific t->pending
>> queue.
>>
>> So my question is since the signal is delivered to the process as a
>> whole why do we care if someone specifies the tid of a kernel thread,
>> rather than the tid of a userspace thread?
>
> Right, that's what this first patch does, and in all honesty, it's not
> required like the 2/2 patch is. I do think it makes it more consistent,
> though - the threads don't take signals, period. Allowing delivery from
> eg kill(2) and then pass it to the owning task of the io_uring is
> somewhat counterintuitive, and differs from earlier kernels where there
> was no relationsship between that owning task and the async worker
> thread.
>
> That's why I think the patch DOES make sense. These threads may share a
> personality with the owning task, but I don't think we should be able to
> manipulate them from userspace at all. That includes SIGSTOP, of course,
> but also regular signals.
>
> Hence I do think we should do something like this.

I agree about signals. Especially because being able to use kill(2)
with the tid of thread is a linuxism and a backwards compatibility thing
from before we had CLONE_THREAD.

I think for kill(2) we should just return -ESRCH.

Thank you for providing the reasoning that is what I really saw missing
in the patches. The why. And software is difficult to maintain without
the why.

Eric

2021-03-21 16:11:20

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [PATCHSET 0/2] PF_IO_WORKER signal tweaks

Jens Axboe <[email protected]> writes:

> On 3/20/21 4:08 PM, Eric W. Biederman wrote:
>>
>> Added criu because I just realized that io_uring (which can open files
>> from an io worker thread) looks to require some special handling for
>> stopping and freezing processes. If not in the SIGSTOP case in the
>> related cgroup freezer case.
>>
>> Linus Torvalds <[email protected]> writes:
>>
>>> On Sat, Mar 20, 2021 at 10:51 AM Linus Torvalds
>>> <[email protected]> wrote:
>>>>
>>>> Alternatively, make it not use
>>>> CLONE_SIGHAND|CLONE_THREAD at all, but that would make it
>>>> unnecessarily allocate its own signal state, so that's "cleaner" but
>>>> not great either.
>>>
>>> Thinking some more about that, it would be problematic for things like
>>> the resource counters too. They'd be much better shared.
>>>
>>> Not adding it to the thread list etc might be clever, but feels a bit too scary.
>>>
>>> So on the whole I think Jens' minor patches to just not have IO helper
>>> threads accept signals are probably the right thing to do.
>>
>> The way I see it we have two options:
>>
>> 1) Don't ask PF_IO_WORKERs to stop do_signal_stop and in
>> task_join_group_stop.
>>
>> The easiest comprehensive implementation looks like just
>> updating task_set_jobctl_pending to treat PF_IO_WORKER
>> as it treats PF_EXITING.
>>
>> 2) Have the main loop of the kernel thread test for JOBCTL_STOP_PENDING
>> and call into do_signal_stop.
>>
>> It is a wee bit trickier to modify the io_workers to stop, but it does
>> not look prohibitively difficult.
>>
>> All of the work performed by the io worker is work scheduled via
>> io_uring by the process being stopped.
>>
>> - Is the amount of work performed by the io worker thread sufficiently
>> negligible that we don't care?
>>
>> - Or is the amount of work performed by the io worker so great that it
>> becomes a way for an errant process to escape SIGSTOP?
>>
>> As the code is all intermingled with the cgroup_freezer. I am also
>> wondering creating checkpoints needs additional stopping guarantees.
>
> The work done is the same a syscall, basically. So it could be long
> running and essentially not doing anything (eg read from a socket, no
> data is there), or it's pretty short lived (eg read from a file, just
> waiting on DMA).
>
> This is outside of my domain of expertise, which is exactly why I added
> you and Linus to make some calls on what the best approach here would
> be. My two patches obviously go route #1 in terms of STOP. And fwiw,
> I tested this:
>
>> To solve the issue that SIGSTOP is simply broken right now I am totally
>> fine with something like:
>>
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index ba4d1ef39a9e..cb9acdfb32fa 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
>> JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
>> BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
>>
>> - if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
>> + if (unlikely(fatal_signal_pending(task) ||
>> + (task->flags & (PF_EXITING | PF_IO_WORKER))))
>> return false;
>>
>> if (mask & JOBCTL_STOP_SIGMASK)
>
> and can confirm it works fine for me with 2/2 reverted and this applied
> instead.
>
>> Which just keeps from creating unstoppable processes today. I am just
>> not convinced that is what we want as a long term solution.
>
> How about we go with either my 2/2 or yours above to at least ensure we
> don't leave workers looping as schedule() is a nop with sigpending? If
> there's a longer timeline concern that "evading" SIGSTOP is a concern, I
> have absolutely no qualms with making the IO threads participate. But
> since it seems conceptually simple but with potentially lurking minor
> issues, probably not the ideal approach for right now.

Here is the signoff for mine.

Signed-off-by: "Eric W. Biederman" <[email protected]>

Yours misses the joining of group stop during fork. So we better use
mine.

As far as I can see that fixes the outstanding bugs.

Jens can you make a proper patch out of it and send it to Linus for
-rc4? I unfortunately have other commitments and this is all I can do
for today.

Eric

2021-03-21 16:23:04

by Jens Axboe

[permalink] [raw]

Subject: Re: [PATCHSET 0/2] PF_IO_WORKER signal tweaks

On 3/21/21 9:18 AM, Eric W. Biederman wrote:
> Jens Axboe <[email protected]> writes:
>
>> On 3/20/21 4:08 PM, Eric W. Biederman wrote:
>>>
>>> Added criu because I just realized that io_uring (which can open files
>>> from an io worker thread) looks to require some special handling for
>>> stopping and freezing processes. If not in the SIGSTOP case in the
>>> related cgroup freezer case.
>>>
>>> Linus Torvalds <[email protected]> writes:
>>>
>>>> On Sat, Mar 20, 2021 at 10:51 AM Linus Torvalds
>>>> <[email protected]> wrote:
>>>>>
>>>>> Alternatively, make it not use
>>>>> CLONE_SIGHAND|CLONE_THREAD at all, but that would make it
>>>>> unnecessarily allocate its own signal state, so that's "cleaner" but
>>>>> not great either.
>>>>
>>>> Thinking some more about that, it would be problematic for things like
>>>> the resource counters too. They'd be much better shared.
>>>>
>>>> Not adding it to the thread list etc might be clever, but feels a bit too scary.
>>>>
>>>> So on the whole I think Jens' minor patches to just not have IO helper
>>>> threads accept signals are probably the right thing to do.
>>>
>>> The way I see it we have two options:
>>>
>>> 1) Don't ask PF_IO_WORKERs to stop do_signal_stop and in
>>> task_join_group_stop.
>>>
>>> The easiest comprehensive implementation looks like just
>>> updating task_set_jobctl_pending to treat PF_IO_WORKER
>>> as it treats PF_EXITING.
>>>
>>> 2) Have the main loop of the kernel thread test for JOBCTL_STOP_PENDING
>>> and call into do_signal_stop.
>>>
>>> It is a wee bit trickier to modify the io_workers to stop, but it does
>>> not look prohibitively difficult.
>>>
>>> All of the work performed by the io worker is work scheduled via
>>> io_uring by the process being stopped.
>>>
>>> - Is the amount of work performed by the io worker thread sufficiently
>>> negligible that we don't care?
>>>
>>> - Or is the amount of work performed by the io worker so great that it
>>> becomes a way for an errant process to escape SIGSTOP?
>>>
>>> As the code is all intermingled with the cgroup_freezer. I am also
>>> wondering creating checkpoints needs additional stopping guarantees.
>>
>> The work done is the same a syscall, basically. So it could be long
>> running and essentially not doing anything (eg read from a socket, no
>> data is there), or it's pretty short lived (eg read from a file, just
>> waiting on DMA).
>>
>> This is outside of my domain of expertise, which is exactly why I added
>> you and Linus to make some calls on what the best approach here would
>> be. My two patches obviously go route #1 in terms of STOP. And fwiw,
>> I tested this:
>>
>>> To solve the issue that SIGSTOP is simply broken right now I am totally
>>> fine with something like:
>>>
>>> diff --git a/kernel/signal.c b/kernel/signal.c
>>> index ba4d1ef39a9e..cb9acdfb32fa 100644
>>> --- a/kernel/signal.c
>>> +++ b/kernel/signal.c
>>> @@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
>>> JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
>>> BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
>>>
>>> - if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
>>> + if (unlikely(fatal_signal_pending(task) ||
>>> + (task->flags & (PF_EXITING | PF_IO_WORKER))))
>>> return false;
>>>
>>> if (mask & JOBCTL_STOP_SIGMASK)
>>
>> and can confirm it works fine for me with 2/2 reverted and this applied
>> instead.
>>
>>> Which just keeps from creating unstoppable processes today. I am just
>>> not convinced that is what we want as a long term solution.
>>
>> How about we go with either my 2/2 or yours above to at least ensure we
>> don't leave workers looping as schedule() is a nop with sigpending? If
>> there's a longer timeline concern that "evading" SIGSTOP is a concern, I
>> have absolutely no qualms with making the IO threads participate. But
>> since it seems conceptually simple but with potentially lurking minor
>> issues, probably not the ideal approach for right now.
>
>
> Here is the signoff for mine.
>
> Signed-off-by: "Eric W. Biederman" <[email protected]>
>
> Yours misses the joining of group stop during fork. So we better use
> mine.

I've updated it and attributed it to you, here is is:

https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.12&id=4db4b1a0d1779dc159f7b87feb97030ec0b12597

> As far as I can see that fixes the outstanding bugs.

Great!

> Jens can you make a proper patch out of it and send it to Linus for
> -rc4? I unfortunately have other commitments and this is all I can do
> for today.

Will do - I'm going to sanity run the current branch and do a followup
pull request for Linus once I've verified everything is still sane.

--
Jens Axboe

2021-03-21 16:23:44

by Jens Axboe

[permalink] [raw]

Subject: Re: [PATCH 1/2] signal: don't allow sending any signals to PF_IO_WORKER threads

On 3/21/21 8:54 AM, Eric W. Biederman wrote:
> Jens Axboe <[email protected]> writes:
>
>> On 3/20/21 3:38 PM, Eric W. Biederman wrote:
>>> Linus Torvalds <[email protected]> writes:
>>>
>>>> On Sat, Mar 20, 2021 at 9:19 AM Eric W. Biederman <[email protected]> wrote:
>>>>>
>>>>> The creds should be reasonably in-sync with the rest of the threads.
>>>>
>>>> It's not about credentials (despite the -EPERM).
>>>>
>>>> It's about the fact that kernel threads cannot handle signals, and
>>>> then get caught in endless loops of "if (sigpending()) return
>>>> -EAGAIN".
>>>>
>>>> For a normal user thread, that "return -EAGAIN" (or whatever) will end
>>>> up returning an error to user space - and before it does that, it will
>>>> go through the "oh, returning to user space, so handle signal" path.
>>>> Which will clear sigpending etc.
>>>>
>>>> A thread that never returns to user space fundamentally cannot handle
>>>> this. The sigpending() stays on forever, the signal never gets
>>>> handled, the thread can't do anything.
>>>>
>>>> So delivering a signal to a kernel thread fundamentally cannot work
>>>> (although we do have some threads that explicitly see "oh, if I was
>>>> killed, I will exit" - think things like in-kernel nfsd etc).
>>>
>>> I agree that getting a kernel thread to receive a signal is quite
>>> tricky. But that is not what the patch affects.
>>>
>>> The patch covers the case when instead of specifying the pid of the
>>> process to kill(2) someone specifies the tid of a thread. Which implies
>>> that type is PIDTYPE_TGID, and in turn the signal is being placed on the
>>> t->signal->shared_pending queue. Not the thread specific t->pending
>>> queue.
>>>
>>> So my question is since the signal is delivered to the process as a
>>> whole why do we care if someone specifies the tid of a kernel thread,
>>> rather than the tid of a userspace thread?
>>
>> Right, that's what this first patch does, and in all honesty, it's not
>> required like the 2/2 patch is. I do think it makes it more consistent,
>> though - the threads don't take signals, period. Allowing delivery from
>> eg kill(2) and then pass it to the owning task of the io_uring is
>> somewhat counterintuitive, and differs from earlier kernels where there
>> was no relationsship between that owning task and the async worker
>> thread.
>>
>> That's why I think the patch DOES make sense. These threads may share a
>> personality with the owning task, but I don't think we should be able to
>> manipulate them from userspace at all. That includes SIGSTOP, of course,
>> but also regular signals.
>>
>> Hence I do think we should do something like this.
>
> I agree about signals. Especially because being able to use kill(2)
> with the tid of thread is a linuxism and a backwards compatibility thing
> from before we had CLONE_THREAD.
>
> I think for kill(2) we should just return -ESRCH.
>
> Thank you for providing the reasoning that is what I really saw missing
> in the patches. The why. And software is difficult to maintain without
> the why.

Thanks Eric, I'll change that patch to -ESRCH and augment the commit
message a bit.

--
Jens Axboe

2021-03-22 16:09:41

by Oleg Nesterov

[permalink] [raw]

Subject: Re: [PATCHSET 0/2] PF_IO_WORKER signal tweaks

On 03/20, Jens Axboe wrote:
>
> Hi,
>
> Been trying to ensure that we do the right thing wrt signals and
> PF_IO_WORKER threads

OMG. Am I understand correctly? create_io_thread() can create a sub-
thread of userspace process which acts as a kernel thread?

Looks like this is the recent feature I wasn't aware... Can't really
comment right now, just some random and possibly wrong notes.

> 1) Just don't allow signals to them in general. We do mask everything
> as blocked, outside of SIGKILL, so things like wants_signal() will
> never return true for them.

This only means that signal_wake_up() won't be called. But the signal
will be queued if sent via tkill/etc, I don't think this is what we want?

A PF_IO_WORKER thread should ignore the signals. But it seems that the
PF_IO_WORKER check in sig_task_ignored() makes no sense and can't help.
I don't think PF_IO_WORKER && SIG_KTHREAD_KERNEL is possible.

Not to mention that sig_ignored() won't even call sig_task_ignored(),
it will return false exactly because the signal is blocked.

Confused.

Plus the the setting of tsk->blocked in create_io_thread() looks racy,
signal_pending() can be already true. And in fact it can't really help,
calculate_sigpending() can set TIF_SIGPENDING after wake_up_new_task()
anyway.

And why does create_io_thread() use lower_32_bits() ? This looks very
confusing. This

.exit_signal = (lower_32_bits(flags) & CSIGNAL);

too. Firstly, the rhs is always zero, secondly it is ignored because
of CLONE_THREAD.

ptrace_attach() checks PF_IO_WORKER too. Yes, but 'gdb -p' will try
to attach to every thread /proc/pid/tasks, so it will probably just
hang?

Oleg.

2021-03-22 16:17:27

by Oleg Nesterov

[permalink] [raw]

Subject: Re: [PATCH 2/2] signal: don't allow STOP on PF_IO_WORKER threads

On 03/20, Jens Axboe wrote:
>
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2349,6 +2349,10 @@ static bool do_signal_stop(int signr)
>
> t = current;
> while_each_thread(current, t) {
> + /* don't STOP PF_IO_WORKER threads */
> + if (t->flags & PF_IO_WORKER)
> + continue;
> +

This is not enough. At least task_join_group_stop() should check
PF_IO_WORKER too.

Oleg.

2021-03-22 16:22:21

by Oleg Nesterov

[permalink] [raw]

Subject: Re: [PATCH 2/2] signal: don't allow STOP on PF_IO_WORKER threads

On 03/20, Eric W. Biederman wrote:
>
> But please tell me why is it not the right thing to have the io_uring
> helper threads stop? Why is it ok for that process to go on consuming
> cpu resources in a non-stoppable way.

Yes, I have the same question ;)

Oleg.