Hi there,
I’m still trying to piece together a reproducible test that triggers
this, but I wanted to post in case someone goes “hmmm... change X
might have done this”.
Basically, something’s broken (or at least, has changed enough to
cause problems in user space) in epoll since 5.0. It’s still broken in
5.1-rc5.
It doesn’t happen 100% of the time. It’s sort of hard to pin down but
I’ve observed the following:
* nginx not accepting connections under load
* A java app which uses netty / NIO having strange writability
semantics on channels, which confuses netty / java enough to not
properly flush written data on the socket.
I went and tested these Linux kernels:
4.20.17
4.19.32
4.14.111
And the issue(s) do not show up there.
I’m still actively chasing this up, and will report back — I haven’t
touched kernel code in 15 years so I’m a little rusty. :)
Regards,
Omar
Omar Kilani <[email protected]> wrote:
> Hi there,
>
> I’m still trying to piece together a reproducible test that triggers
> this, but I wanted to post in case someone goes “hmmm... change X
> might have done this”.
Maybe Davidlohr knows, since he's responsible for most of the
epoll changes in 5.0.
> Basically, something’s broken (or at least, has changed enough to
> cause problems in user space) in epoll since 5.0. It’s still broken in
> 5.1-rc5.
>
> It doesn’t happen 100% of the time. It’s sort of hard to pin down but
> I’ve observed the following:
>
> * nginx not accepting connections under load
> * A java app which uses netty / NIO having strange writability
> semantics on channels, which confuses netty / java enough to not
> properly flush written data on the socket.
>
> I went and tested these Linux kernels:
>
> 4.20.17
> 4.19.32
> 4.14.111
>
> And the issue(s) do not show up there.
>
> I’m still actively chasing this up, and will report back — I haven’t
> touched kernel code in 15 years so I’m a little rusty. :)
>
> Regards,
> Omar
On Wed, 24 Apr 2019, Davidlohr Bueso wrote:
>On Wed, 24 Apr 2019, Eric Wong wrote:
>
>>Omar Kilani <[email protected]> wrote:
>>>Hi there,
>>>
>>>I???m still trying to piece together a reproducible test that triggers
>>>this, but I wanted to post in case someone goes ???hmmm... change X
>>>might have done this???.
>>
>>Maybe Davidlohr knows, since he's responsible for most of the
>>epoll changes in 5.0.
>
>Not really, I have not been made aware of any issues until now.
>
>>
>>>Basically, something???s broken (or at least, has changed enough to
>>>cause problems in user space) in epoll since 5.0. It???s still broken in
>>>5.1-rc5.
>>>
>>>It doesn???t happen 100% of the time. It???s sort of hard to pin down but
>>>I???ve observed the following:
>>>
>>>* nginx not accepting connections under load
>>>* A java app which uses netty / NIO having strange writability
>>>semantics on channels, which confuses netty / java enough to not
>>>properly flush written data on the socket.
Off the top of my head, could the following be responsible?
c5a282e9635 (fs/epoll: reduce the scope of wq lock in epoll_wait())
On Wed, 24 Apr 2019, Eric Wong wrote:
>Omar Kilani <[email protected]> wrote:
>> Hi there,
>>
>> I???m still trying to piece together a reproducible test that triggers
>> this, but I wanted to post in case someone goes ???hmmm... change X
>> might have done this???.
>
>Maybe Davidlohr knows, since he's responsible for most of the
>epoll changes in 5.0.
Not really, I have not been made aware of any issues until now.
>
>> Basically, something???s broken (or at least, has changed enough to
>> cause problems in user space) in epoll since 5.0. It???s still broken in
>> 5.1-rc5.
>>
>> It doesn???t happen 100% of the time. It???s sort of hard to pin down but
>> I???ve observed the following:
>>
>> * nginx not accepting connections under load
>> * A java app which uses netty / NIO having strange writability
>> semantics on channels, which confuses netty / java enough to not
>> properly flush written data on the socket.
>>
>> I went and tested these Linux kernels:
>>
>> 4.20.17
>> 4.19.32
>> 4.14.111
>>
>> And the issue(s) do not show up there.
>>
>> I???m still actively chasing this up, and will report back ??? I haven???t
>> touched kernel code in 15 years so I???m a little rusty. :)
A bisection and/or workload that triggers the issue would be great to
see what's going on.
Thanks,
Davidlohr
Eric Wong <[email protected]> wrote:
> Omar Kilani <[email protected]> wrote:
> > Hi there,
> >
> > I’m still trying to piece together a reproducible test that triggers
> > this, but I wanted to post in case someone goes “hmmm... change X
> > might have done this”.
>
> Maybe Davidlohr knows, since he's responsible for most of the
> epoll changes in 5.0.
Well, I am not sure if I am hitting the same problem Omar is
hitting. But I did find an epoll_pwait regression in 5.0:
epoll_pwait seems unresponsive to SIGURG in my
heavily-parallelized use case[1] on 5.0.9. I bisected it to
commit 854a6ed56839a40f6b5d02a2962f48841482eec4
("signal: Add restore_user_sigmask()")
Just reverting the fs/eventpoll.c change in 854a6ed56 seems
enough to fix the non-responsive epoll_pwait for me. I have not
looked deeply into this, but perhaps the signal_pending check in
restore_user_sigmask is racy w.r.t. epoll. It is been a while
since I have looked at kernel stuff, myself.
Anyways, this revert works; but I'm not 100% sure why...
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index a5d219d920e7..151739d76801 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2247,7 +2247,20 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
error = do_epoll_wait(epfd, events, maxevents, timeout);
- restore_user_sigmask(sigmask, &sigsaved);
+ /*
+ * If we changed the signal mask, we need to restore the original one.
+ * In case we've got a signal while waiting, we do not restore the
+ * signal mask yet, and we allow do_signal() to deliver the signal on
+ * the way back to userspace, before the signal mask is restored.
+ */
+ if (sigmask) {
+ if (error == -EINTR) {
+ memcpy(¤t->saved_sigmask, &sigsaved,
+ sizeof(sigsaved));
+ set_restore_sigmask();
+ } else
+ set_current_blocked(&sigsaved);
+ }
return error;
}
@@ -2272,7 +2285,20 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
err = do_epoll_wait(epfd, events, maxevents, timeout);
- restore_user_sigmask(sigmask, &sigsaved);
+ /*
+ * If we changed the signal mask, we need to restore the original one.
+ * In case we've got a signal while waiting, we do not restore the
+ * signal mask yet, and we allow do_signal() to deliver the signal on
+ * the way back to userspace, before the signal mask is restored.
+ */
+ if (sigmask) {
+ if (err == -EINTR) {
+ memcpy(¤t->saved_sigmask, &sigsaved,
+ sizeof(sigsaved));
+ set_restore_sigmask();
+ } else
+ set_current_blocked(&sigsaved);
+ }
return err;
}
Comments and/or a proper fix would be greatly appreciated.
[1] my test case is running the cmogstored 1.7.0 test suite
in amd64 Debian stable environment.
test/mgmt_auto_adjust would get stuck and time-out after 60s
on vanilla v5.0.9
tgz: https://bogomips.org/cmogstored/files/cmogstored-1.7.0.tar.gz
# Standard autotools install, N=32 or some high-ish number
./configure
make -j$N
make check -j$N
# OR git clone https://bogomips.org/cmogstored.git
So, requoting the rest of Omar's original report, here; since
I am not sure if his use case involves epoll_pwait like mine does:
> Omar Kilani <[email protected]> wrote:
> > Basically, something’s broken (or at least, has changed enough to
> > cause problems in user space) in epoll since 5.0. It’s still broken in
> > 5.1-rc5.
> >
> > It doesn’t happen 100% of the time. It’s sort of hard to pin down but
> > I’ve observed the following:
> >
> > * nginx not accepting connections under load
> > * A java app which uses netty / NIO having strange writability
> > semantics on channels, which confuses netty / java enough to not
> > properly flush written data on the socket.
> >
> > I went and tested these Linux kernels:
> >
> > 4.20.17
> > 4.19.32
> > 4.14.111
> >
> > And the issue(s) do not show up there.
> >
> > I’m still actively chasing this up, and will report back — I haven’t
> > touched kernel code in 15 years so I’m a little rusty. :)
> >
> > Regards,
> > Omar
I tried to replicate the failure on qemu.
I do not see the failure with N=32.
Does it work for N < 32?
Does any other signal work?
Are there any other architectures that fail?
Could you help me figure out how to run just the one test that is failing?
-Deepa
Deepa Dinamani <[email protected]> wrote:
> I tried to replicate the failure on qemu.
> I do not see the failure with N=32.
> Does it work for N < 32?
Depends on number of cores you have; I have 4 cores, 8 threads
with HT; so I needed to have a lot of load on the machine to get
it to fail (it takes about 1 minute).
cmogstored is intended to run on machines that were already
saturated in CPU/memory from other processes, but not HDD I/O
bandwidth.
> Does any other signal work?
SIGCONT does, via:
perl -i -p -e 's/SIGURG/SIGCONT/g' `git ls-files`
> Are there any other architectures that fail?
I don't have other arches (well, 32-bit x86, but I've never
really tried cmogstored on that, even).
> Could you help me figure out how to run just the one test that is failing?
Just running one test won't trigger since it needs a busy
machine; but:
make test/mgmt_auto_adjust.log
(and "rm make test/mgmt_auto_adjust.log" if you want to rerun)
Thanks for looking into this. Fwiw, cmogstored uses epoll in
strange and uncommon ways which has led to kernel bugfixes
in the past.
On Sun, 28 Apr 2019, Eric Wong wrote:
>Just running one test won't trigger since it needs a busy
>machine; but:
>
> make test/mgmt_auto_adjust.log
> (and "rm make test/mgmt_auto_adjust.log" if you want to rerun)
fyi no luck reproducing on both either a large (280) or small (4 cpu)
machine, I'm running it along with a kernel build overcommiting cpus x2.
Is there any workload in particular you are using to stress the box?
Thanks,
Davidlohr
Davidlohr Bueso <[email protected]> wrote:
> On Sun, 28 Apr 2019, Eric Wong wrote:
>
> > Just running one test won't trigger since it needs a busy
> > machine; but:
> >
> > make test/mgmt_auto_adjust.log
> > (and "rm make test/mgmt_auto_adjust.log" if you want to rerun)
>
> fyi no luck reproducing on both either a large (280) or small (4 cpu)
> machine, I'm running it along with a kernel build overcommiting cpus x2.
Thanks for taking a look.
> Is there any workload in particular you are using to stress the box?
Just the "make -j$N check" I mentioned up-thread which spawns a
lot of tests in parallel. For the small 4 CPU machine,
-j32 or -j16 ought to be reproducing the problem.
I'll try to set aside some time this week to get familiar with
kernel internals w.r.t. signal handling.
I was also not able to reproduce this.
Arnd and I were talking about this today morning. Here is something
Arnd noticed:
If there was a signal after do_epoll_wait(), we never were not
entering the if (err = -EINTR) at all before. But, now we do.
We could try with the below patch:
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4a0e98d87fcc..5cfb800cf598 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2330,7 +2330,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
epoll_event __user *, events,
error = do_epoll_wait(epfd, events, maxevents, timeout);
- restore_user_sigmask(sigmask, &sigsaved);
+ restore_user_sigmask(sigmask, &sigsaved, error == -EITNR);
return error;
}
diff --git a/kernel/signal.c b/kernel/signal.c
index 3a9e41197d46..4a8f96f5c1c0 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2849,7 +2849,7 @@ EXPORT_SYMBOL(set_compat_user_sigmask);
* This is useful for syscalls such as ppoll, pselect, io_pgetevents and
* epoll_pwait where a new sigmask is passed in from userland for the syscalls.
*/
-void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
+void restore_user_sigmask(const void __user *usigmask, sigset_t
*sigsaved, int sig_pending)
{
if (!usigmask)
@@ -2859,7 +2859,7 @@ void restore_user_sigmask(const void __user
*usigmask, sigset_t *sigsaved)
* Restoring sigmask here can lead to delivering signals that the above
* syscalls are intended to block because of the sigmask passed in.
*/
- if (signal_pending(current)) {
+ if (sig_pending) {
current->saved_sigmask = *sigsaved;
set_restore_sigmask();
If this works that means we know what is busted.
I'm not sure what the hang in the userspace is about. Is it because
the syscall did not return an error or the particular signal was
blocked etc.
There are also a few timing differences also. But, can we try this first?
-Deepa
Deepa Dinamani <[email protected]> wrote:
> I was also not able to reproduce this.
> Arnd and I were talking about this today morning. Here is something
> Arnd noticed:
>
> If there was a signal after do_epoll_wait(), we never were not
> entering the if (err = -EINTR) at all before.
I'm not sure which `if' statement you're talking about, but ok...
> But, now we do.
> We could try with the below patch:
Wasn't close to applying or being buildable, but I put a
working version together (below).
epoll_pwait wakes up as expected, now :>
> If this works that means we know what is busted.
OK, good to know...
> I'm not sure what the hang in the userspace is about. Is it because
> the syscall did not return an error or the particular signal was
> blocked etc.
Uh, ok; that's less comforting.
> There are also a few timing differences also. But, can we try this first?
Anyways, the following patch works and builds cleanly for me
(didn't test AIO, but everything else seems good)
Thanks!
---------8<-------
Subject: [PATCH] test fix from Deepa
TBD
---
fs/aio.c | 8 ++++----
fs/eventpoll.c | 4 ++--
fs/select.c | 12 ++++++------
include/linux/signal.h | 2 +-
kernel/signal.c | 6 ++++--
5 files changed, 17 insertions(+), 15 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 3d9669d011b9..d54513ca11b6 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2146,7 +2146,7 @@ SYSCALL_DEFINE6(io_pgetevents,
return ret;
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
- restore_user_sigmask(ksig.sigmask, &sigsaved);
+ restore_user_sigmask(ksig.sigmask, &sigsaved, -1);
if (signal_pending(current) && !ret)
ret = -ERESTARTNOHAND;
@@ -2180,7 +2180,7 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
return ret;
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
- restore_user_sigmask(ksig.sigmask, &sigsaved);
+ restore_user_sigmask(ksig.sigmask, &sigsaved, -1);
if (signal_pending(current) && !ret)
ret = -ERESTARTNOHAND;
@@ -2244,7 +2244,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
return ret;
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
- restore_user_sigmask(ksig.sigmask, &sigsaved);
+ restore_user_sigmask(ksig.sigmask, &sigsaved, -1);
if (signal_pending(current) && !ret)
ret = -ERESTARTNOHAND;
@@ -2277,7 +2277,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64,
return ret;
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
- restore_user_sigmask(ksig.sigmask, &sigsaved);
+ restore_user_sigmask(ksig.sigmask, &sigsaved, -1);
if (signal_pending(current) && !ret)
ret = -ERESTARTNOHAND;
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index a5d219d920e7..bd84ec54a8fb 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2247,7 +2247,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
error = do_epoll_wait(epfd, events, maxevents, timeout);
- restore_user_sigmask(sigmask, &sigsaved);
+ restore_user_sigmask(sigmask, &sigsaved, error == -EINTR);
return error;
}
@@ -2272,7 +2272,7 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
err = do_epoll_wait(epfd, events, maxevents, timeout);
- restore_user_sigmask(sigmask, &sigsaved);
+ restore_user_sigmask(sigmask, &sigsaved, err == -EINTR);
return err;
}
diff --git a/fs/select.c b/fs/select.c
index d0f35dbc0e8f..04720e5856ed 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -760,7 +760,7 @@ static long do_pselect(int n, fd_set __user *inp, fd_set __user *outp,
ret = core_sys_select(n, inp, outp, exp, to);
ret = poll_select_copy_remaining(&end_time, tsp, type, ret);
- restore_user_sigmask(sigmask, &sigsaved);
+ restore_user_sigmask(sigmask, &sigsaved, -1);
return ret;
}
@@ -1106,7 +1106,7 @@ SYSCALL_DEFINE5(ppoll, struct pollfd __user *, ufds, unsigned int, nfds,
ret = do_sys_poll(ufds, nfds, to);
- restore_user_sigmask(sigmask, &sigsaved);
+ restore_user_sigmask(sigmask, &sigsaved, -1);
/* We can restart this syscall, usually */
if (ret == -EINTR)
@@ -1142,7 +1142,7 @@ SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds, unsigned int, nfds,
ret = do_sys_poll(ufds, nfds, to);
- restore_user_sigmask(sigmask, &sigsaved);
+ restore_user_sigmask(sigmask, &sigsaved, -1);
/* We can restart this syscall, usually */
if (ret == -EINTR)
@@ -1352,7 +1352,7 @@ static long do_compat_pselect(int n, compat_ulong_t __user *inp,
ret = compat_core_sys_select(n, inp, outp, exp, to);
ret = poll_select_copy_remaining(&end_time, tsp, type, ret);
- restore_user_sigmask(sigmask, &sigsaved);
+ restore_user_sigmask(sigmask, &sigsaved, -1);
return ret;
}
@@ -1425,7 +1425,7 @@ COMPAT_SYSCALL_DEFINE5(ppoll, struct pollfd __user *, ufds,
ret = do_sys_poll(ufds, nfds, to);
- restore_user_sigmask(sigmask, &sigsaved);
+ restore_user_sigmask(sigmask, &sigsaved, -1);
/* We can restart this syscall, usually */
if (ret == -EINTR)
@@ -1461,7 +1461,7 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time64, struct pollfd __user *, ufds,
ret = do_sys_poll(ufds, nfds, to);
- restore_user_sigmask(sigmask, &sigsaved);
+ restore_user_sigmask(sigmask, &sigsaved, -1);
/* We can restart this syscall, usually */
if (ret == -EINTR)
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 9702016734b1..b55804ae2021 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -276,7 +276,7 @@ extern int sigprocmask(int, sigset_t *, sigset_t *);
extern int set_user_sigmask(const sigset_t __user *usigmask, sigset_t *set,
sigset_t *oldset, size_t sigsetsize);
extern void restore_user_sigmask(const void __user *usigmask,
- sigset_t *sigsaved);
+ sigset_t *sigsaved, int sig_pending);
extern void set_current_blocked(sigset_t *);
extern void __set_current_blocked(const sigset_t *);
extern int show_unhandled_signals;
diff --git a/kernel/signal.c b/kernel/signal.c
index 57b7771e20d7..cc827e6c4bea 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2843,11 +2843,13 @@ EXPORT_SYMBOL(set_compat_user_sigmask);
* usigmask: sigmask passed in from userland.
* sigsaved: saved sigmask when the syscall started and changed the sigmask to
* usigmask.
+ * sig_pending: "> 0" if a signal is pending, "< 0" to check signal_pending
*
* This is useful for syscalls such as ppoll, pselect, io_pgetevents and
* epoll_pwait where a new sigmask is passed in from userland for the syscalls.
*/
-void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
+void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved,
+ int sig_pending)
{
if (!usigmask)
@@ -2857,7 +2859,7 @@ void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
* Restoring sigmask here can lead to delivering signals that the above
* syscalls are intended to block because of the sigmask passed in.
*/
- if (signal_pending(current)) {
+ if (sig_pending > 0 || (sig_pending < 0 && signal_pending(current))) {
current->saved_sigmask = *sigsaved;
set_restore_sigmask();
return;
--
Eric Wong <[email protected]> wrote:
> Deepa Dinamani <[email protected]> wrote:
> > I'm not sure what the hang in the userspace is about. Is it because
> > the syscall did not return an error or the particular signal was
> > blocked etc.
>
> Uh, ok; that's less comforting.
Nevermind, I think I understand everything, now. epoll_pwait
never set errno without our patch.
cmogstored does this in notify.c:
/* wait_intr calls epoll_pwait: */
mfd = mog_idleq_wait_intr(mog_notify_queue, timeout);
if (mfd)
notify_queue_step(mfd);
else if (errno == EINTR) /* EINTR fails to be noticed */
note_run();
Eric Wong <[email protected]> wrote:
> (didn't test AIO, but everything else seems good)
"seems" != "is"
Now that I understand the fix for epoll, the fs/select.c changes
would hit the same problem and not return -EINTR when it should.
I'll let you guys decide how to fix this, but there's definitely
a problem when "(errno == EINTR)" comparisons in userspace
stop working.
Thanks for trying the fix.
So here is my analysis:
Let's start with epoll_pwait:
ep_poll() is what checks for signal_pending() and is responsible for
setting errno to -EINTR when there is a signal.
So if a signal is received after ep_poll(), it is never noticed by the
syscall during execution.
Moreover, the original code before
854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add
restore_user_sigmask()"), had the following call flow:
error = do_epoll_wait(epfd, events, maxevents, timeout);
**** Here error = 0 if the signal is received after ep_poll().
- /*
- * If we changed the signal mask, we need to restore the original one.
- * In case we've got a signal while waiting, we do not restore the
- * signal mask yet, and we allow do_signal() to deliver the signal on
- * the way back to userspace, before the signal mask is restored.
- */
- if (sigmask) {
- if (error == -EINTR) {
- memcpy(¤t->saved_sigmask, &sigsaved,
- sizeof(sigsaved));
- set_restore_sigmask();
- } else
**** Execution reaches this else statement and the sigmask is restored
directly, ignoring the newly generated signal. The signal is never
handled.
- set_current_blocked(&sigsaved);
- }
In the current execution flow:
error = do_epoll_wait(epfd, events, maxevents, timeout);
**** error is still 0 as ep_poll() did not detect the signal.
restore_user_sigmask(sigmask, &sigsaved, error == -EITNR);
void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
{
if (!usigmask)
return;
/*
* When signals are pending, do not restore them here.
* Restoring sigmask here can lead to delivering signals that the above
* syscalls are intended to block because of the sigmask passed in.
*/
if (signal_pending(current)) {
**** execution path reaches here and do_signal() actually delivers the
signal to userspace. But the errno is not set. So the userspace fails
to notice it.
current->saved_sigmask = *sigsaved;
set_restore_sigmask();
return;
}
/*
* This is needed because the fast syscall return path does not restore
* saved_sigmask when signals are not pending.
*/
set_current_blocked(sigsaved);
}
For other syscalls in the same commit:
sys_io_pgetevents() does not seem to have this problem as we are still
checking signal_pending() here.
sys_pselect6() seems to have a similar problem. The changes to
sys_pselect6() also impact sys_select() as the changes are in the
common code path.
So the 854a6ed56839a40f6 seems to be better than the original code in
that it detects the signal. But, the problem is that it doesn't
communicate it to the userspace.
So a patch like below solves the problem. This is incomplete. I'll
verify and send you a proper fix you can test soon. This is just for
the sake of discussion:
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4a0e98d87fcc..63a387329c3d 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2317,7 +2317,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
epoll_event __user *, events,
int, maxevents, int, timeout, const sigset_t __user *, sigmask,
size_t, sigsetsize)
{
- int error;
+ int error, signal_detected;
sigset_t ksigmask, sigsaved;
/*
@@ -2330,7 +2330,10 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
epoll_event __user *, events,
error = do_epoll_wait(epfd, events, maxevents, timeout);
- restore_user_sigmask(sigmask, &sigsaved);
+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
+
+ if (signal_detected && !error)
+ return -EITNR;
return error;
}
@@ -2342,7 +2345,7 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
const compat_sigset_t __user *, sigmask,
compat_size_t, sigsetsize)
{
- long err;
+ long err, signal_detected;
sigset_t ksigmask, sigsaved;
/*
@@ -2355,7 +2358,10 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
err = do_epoll_wait(epfd, events, maxevents, timeout);
- restore_user_sigmask(sigmask, &sigsaved);
+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
+
+ if (signal_detected && !err)
+ return -EITNR;
return err;
}
diff --git a/kernel/signal.c b/kernel/signal.c
index 3a9e41197d46..c76ab2a52ebf 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2849,11 +2849,11 @@ EXPORT_SYMBOL(set_compat_user_sigmask);
* This is useful for syscalls such as ppoll, pselect, io_pgetevents and
* epoll_pwait where a new sigmask is passed in from userland for the syscalls.
*/
-void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
+int restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
{
if (!usigmask)
- return;
+ return 0;
/*
* When signals are pending, do not restore them here.
* Restoring sigmask here can lead to delivering signals that the above
@@ -2862,7 +2862,7 @@ void restore_user_sigmask(const void __user
*usigmask, sigset_t *sigsaved)
if (signal_pending(current)) {
current->saved_sigmask = *sigsaved;
set_restore_sigmask();
- return;
+ return 0;
}
-Deepa
Deepa Dinamani <[email protected]> wrote:
> So here is my analysis:
<snip everything I agree with>
> So the 854a6ed56839a40f6 seems to be better than the original code in
> that it detects the signal.
OTOH, does matter to anybody that a signal is detected slightly
sooner than it would've been, otherwise?
> But, the problem is that it doesn't
> communicate it to the userspace.
Yup, that's a big problem :)
> So a patch like below solves the problem. This is incomplete. I'll
> verify and send you a proper fix you can test soon. This is just for
> the sake of discussion:
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 4a0e98d87fcc..63a387329c3d 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -2317,7 +2317,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
> epoll_event __user *, events,
> int, maxevents, int, timeout, const sigset_t __user *, sigmask,
> size_t, sigsetsize)
> {
> - int error;
> + int error, signal_detected;
> sigset_t ksigmask, sigsaved;
>
> /*
> @@ -2330,7 +2330,10 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
> epoll_event __user *, events,
>
> error = do_epoll_wait(epfd, events, maxevents, timeout);
>
> - restore_user_sigmask(sigmask, &sigsaved);
> + signal_detected = restore_user_sigmask(sigmask, &sigsaved);
> +
> + if (signal_detected && !error)
> + return -EITNR;
>
> return error;
Looks like a reasonable API.
> @@ -2862,7 +2862,7 @@ void restore_user_sigmask(const void __user
> *usigmask, sigset_t *sigsaved)
> if (signal_pending(current)) {
> current->saved_sigmask = *sigsaved;
> set_restore_sigmask();
> - return;
> + return 0;
Shouldn't that "return 1" if a signal is pending?
On Wed, May 1, 2019 at 1:48 PM Eric Wong <[email protected]> wrote:
>
> Deepa Dinamani <[email protected]> wrote:
> > So here is my analysis:
>
> <snip everything I agree with>
>
> > So the 854a6ed56839a40f6 seems to be better than the original code in
> > that it detects the signal.
>
> OTOH, does matter to anybody that a signal is detected slightly
> sooner than it would've been, otherwise?
The original code drops the signal altogether. This is because it
overwrites the current's sigmask with the provided
one(set_current_blocked()). If a signal bit was set, it is lost
forever. It does not detect it sooner. The check for pending signal is
sooner and not just before the syscall returns.
This is what the patch in discussion does: check for signals just
before returning.
>
> > But, the problem is that it doesn't
> > communicate it to the userspace.
>
> Yup, that's a big problem :)
>
> > So a patch like below solves the problem. This is incomplete. I'll
> > verify and send you a proper fix you can test soon. This is just for
> > the sake of discussion:
> >
> > diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> > index 4a0e98d87fcc..63a387329c3d 100644
> > --- a/fs/eventpoll.c
> > +++ b/fs/eventpoll.c
> > @@ -2317,7 +2317,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
> > epoll_event __user *, events,
> > int, maxevents, int, timeout, const sigset_t __user *, sigmask,
> > size_t, sigsetsize)
> > {
> > - int error;
> > + int error, signal_detected;
> > sigset_t ksigmask, sigsaved;
> >
> > /*
> > @@ -2330,7 +2330,10 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
> > epoll_event __user *, events,
> >
> > error = do_epoll_wait(epfd, events, maxevents, timeout);
> >
> > - restore_user_sigmask(sigmask, &sigsaved);
> > + signal_detected = restore_user_sigmask(sigmask, &sigsaved);
> > +
> > + if (signal_detected && !error)
> > + return -EITNR;
> >
> > return error;
>
> Looks like a reasonable API.
>
> > @@ -2862,7 +2862,7 @@ void restore_user_sigmask(const void __user
> > *usigmask, sigset_t *sigsaved)
> > if (signal_pending(current)) {
> > current->saved_sigmask = *sigsaved;
> > set_restore_sigmask();
> > - return;
> > + return 0;
>
> Shouldn't that "return 1" if a signal is pending?
Yep, I meant this to be 1.
-Deepa
Eric,
Can you please help test this?
If this solves your problem, I can post the fix.
Thanks,
- Deepa
---------8<-------
Subject: [PATCH] signal: Adjust error codes according to restore_user_sigmask()
For all the syscalls that receive a sigmask from the userland,
the user sigmask is to be in effect through the syscall execution.
At the end of syscall, sigmask of the current process is restored
to what it was before the switch over to user sigmask.
But, for this to be true in practice, the sigmask should be restored
only at the the point we change the saved_sigmask. Anything before
that loses signals. And, anything after is just pointless as the
signal is already lost by restoring the sigmask.
The issue was detected because of a regression caused by 854a6ed56839a.
The patch moved the signal_pending() check closer to restoring of the
user sigmask. But, it failed to update the error code accordingly.
Detailed issue discussion permalink:
https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/
Note that the patch returns interrupted errors (EINTR, ERESTARTNOHAND, etc)
only when there is no other error. If there is a signal and an error like
EINVAL, the syscalls return -EINVAL rather than the interrupted error codes.
Reported-by: Omar Kilani <[email protected]>
Reported-by: Eric Wong <[email protected]>
Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add
restore_user_sigmask()")
Signed-off-by: Deepa Dinamani <[email protected]>
---
fs/aio.c | 24 ++++++++++++------------
fs/eventpoll.c | 14 ++++++++++----
fs/io_uring.c | 9 ++++++---
fs/select.c | 37 +++++++++++++++++++++----------------
include/linux/signal.h | 2 +-
kernel/signal.c | 8 +++++---
6 files changed, 55 insertions(+), 39 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 38b741aef0bf..7de2f7573d55 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2133,7 +2133,7 @@ SYSCALL_DEFINE6(io_pgetevents,
struct __aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 ts;
- int ret;
+ int ret, signal_detected;
if (timeout && unlikely(get_timespec64(&ts, timeout)))
return -EFAULT;
@@ -2146,8 +2146,8 @@ SYSCALL_DEFINE6(io_pgetevents,
return ret;
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
- restore_user_sigmask(ksig.sigmask, &sigsaved);
- if (signal_pending(current) && !ret)
+ signal_detected = restore_user_sigmask(ksig.sigmask, &sigsaved);
+ if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
return ret;
@@ -2166,7 +2166,7 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
struct __aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 ts;
- int ret;
+ int ret, signal_detected;
if (timeout && unlikely(get_old_timespec32(&ts, timeout)))
return -EFAULT;
@@ -2180,8 +2180,8 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
return ret;
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
- restore_user_sigmask(ksig.sigmask, &sigsaved);
- if (signal_pending(current) && !ret)
+ signal_detected = restore_user_sigmask(ksig.sigmask, &sigsaved);
+ if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
return ret;
@@ -2231,7 +2231,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
struct __compat_aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 t;
- int ret;
+ int ret, signal_detected;
if (timeout && get_old_timespec32(&t, timeout))
return -EFAULT;
@@ -2244,8 +2244,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
return ret;
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
- restore_user_sigmask(ksig.sigmask, &sigsaved);
- if (signal_pending(current) && !ret)
+ signal_detected = restore_user_sigmask(ksig.sigmask, &sigsaved);
+ if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
return ret;
@@ -2264,7 +2264,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64,
struct __compat_aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 t;
- int ret;
+ int ret, signal_detected;
if (timeout && get_timespec64(&t, timeout))
return -EFAULT;
@@ -2277,8 +2277,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64,
return ret;
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
- restore_user_sigmask(ksig.sigmask, &sigsaved);
- if (signal_pending(current) && !ret)
+ signal_detected = restore_user_sigmask(ksig.sigmask, &sigsaved);
+ if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
return ret;
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4a0e98d87fcc..fe5a0724b417 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2317,7 +2317,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
epoll_event __user *, events,
int, maxevents, int, timeout, const sigset_t __user *, sigmask,
size_t, sigsetsize)
{
- int error;
+ int error, signal_detected;
sigset_t ksigmask, sigsaved;
/*
@@ -2330,7 +2330,10 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
epoll_event __user *, events,
error = do_epoll_wait(epfd, events, maxevents, timeout);
- restore_user_sigmask(sigmask, &sigsaved);
+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
+
+ if (signal_detected && !error)
+ error = -EINTR;
return error;
}
@@ -2342,7 +2345,7 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
const compat_sigset_t __user *, sigmask,
compat_size_t, sigsetsize)
{
- long err;
+ long err, signal_detected;
sigset_t ksigmask, sigsaved;
/*
@@ -2355,7 +2358,10 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
err = do_epoll_wait(epfd, events, maxevents, timeout);
- restore_user_sigmask(sigmask, &sigsaved);
+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
+
+ if (signal_detected && !err)
+ err = -EINTR;
return err;
}
diff --git a/fs/io_uring.c b/fs/io_uring.c
index e2bd77da5e21..e107e299c3f3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1965,7 +1965,7 @@ static int io_cqring_wait(struct io_ring_ctx
*ctx, int min_events,
struct io_cq_ring *ring = ctx->cq_ring;
sigset_t ksigmask, sigsaved;
DEFINE_WAIT(wait);
- int ret;
+ int ret, signal_detected;
/* See comment at the top of this file */
smp_rmb();
@@ -1996,8 +1996,11 @@ static int io_cqring_wait(struct io_ring_ctx
*ctx, int min_events,
finish_wait(&ctx->wait, &wait);
- if (sig)
- restore_user_sigmask(sig, &sigsaved);
+ if (sig) {
+ signal_detected = restore_user_sigmask(sig, &sigsaved);
+ if (signal_detected && !ret)
+ ret = -EINTR;
+ }
return READ_ONCE(ring->r.head) == READ_ONCE(ring->r.tail) ? ret : 0;
}
diff --git a/fs/select.c b/fs/select.c
index 6cbc9ff56ba0..348dbe5e6dd0 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -732,7 +732,7 @@ static long do_pselect(int n, fd_set __user *inp,
fd_set __user *outp,
{
sigset_t ksigmask, sigsaved;
struct timespec64 ts, end_time, *to = NULL;
- int ret;
+ int ret, signal_detected;;
if (tsp) {
switch (type) {
@@ -760,7 +760,9 @@ static long do_pselect(int n, fd_set __user *inp,
fd_set __user *outp,
ret = core_sys_select(n, inp, outp, exp, to);
ret = poll_select_copy_remaining(&end_time, tsp, type, ret);
- restore_user_sigmask(sigmask, &sigsaved);
+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
+ if (signal_detected && !ret)
+ ret = -EINTR;
return ret;
}
@@ -1089,7 +1091,7 @@ SYSCALL_DEFINE5(ppoll, struct pollfd __user *,
ufds, unsigned int, nfds,
{
sigset_t ksigmask, sigsaved;
struct timespec64 ts, end_time, *to = NULL;
- int ret;
+ int ret, signal_detected;
if (tsp) {
if (get_timespec64(&ts, tsp))
@@ -1106,10 +1108,10 @@ SYSCALL_DEFINE5(ppoll, struct pollfd __user *,
ufds, unsigned int, nfds,
ret = do_sys_poll(ufds, nfds, to);
- restore_user_sigmask(sigmask, &sigsaved);
+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
/* We can restart this syscall, usually */
- if (ret == -EINTR)
+ if (ret == -EINTR || (signal_detected && !ret))
ret = -ERESTARTNOHAND;
ret = poll_select_copy_remaining(&end_time, tsp, PT_TIMESPEC, ret);
@@ -1125,7 +1127,7 @@ SYSCALL_DEFINE5(ppoll_time32, struct pollfd
__user *, ufds, unsigned int, nfds,
{
sigset_t ksigmask, sigsaved;
struct timespec64 ts, end_time, *to = NULL;
- int ret;
+ int ret, signal_detected;
if (tsp) {
if (get_old_timespec32(&ts, tsp))
@@ -1142,10 +1144,10 @@ SYSCALL_DEFINE5(ppoll_time32, struct pollfd
__user *, ufds, unsigned int, nfds,
ret = do_sys_poll(ufds, nfds, to);
- restore_user_sigmask(sigmask, &sigsaved);
+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
/* We can restart this syscall, usually */
- if (ret == -EINTR)
+ if (ret == -EINTR || (signal_detected && !ret))
ret = -ERESTARTNOHAND;
ret = poll_select_copy_remaining(&end_time, tsp, PT_OLD_TIMESPEC, ret);
@@ -1324,7 +1326,7 @@ static long do_compat_pselect(int n,
compat_ulong_t __user *inp,
{
sigset_t ksigmask, sigsaved;
struct timespec64 ts, end_time, *to = NULL;
- int ret;
+ int ret, signal_detected;
if (tsp) {
switch (type) {
@@ -1352,7 +1354,10 @@ static long do_compat_pselect(int n,
compat_ulong_t __user *inp,
ret = compat_core_sys_select(n, inp, outp, exp, to);
ret = poll_select_copy_remaining(&end_time, tsp, type, ret);
- restore_user_sigmask(sigmask, &sigsaved);
+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
+
+ if (signal_detected && !ret)
+ ret = -EINTR;
return ret;
}
@@ -1408,7 +1413,7 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time32, struct
pollfd __user *, ufds,
{
sigset_t ksigmask, sigsaved;
struct timespec64 ts, end_time, *to = NULL;
- int ret;
+ int ret, signal_detected;
if (tsp) {
if (get_old_timespec32(&ts, tsp))
@@ -1425,10 +1430,10 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time32, struct
pollfd __user *, ufds,
ret = do_sys_poll(ufds, nfds, to);
- restore_user_sigmask(sigmask, &sigsaved);
+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
/* We can restart this syscall, usually */
- if (ret == -EINTR)
+ if (ret == -EINTR || (signal_detected && !ret))
ret = -ERESTARTNOHAND;
ret = poll_select_copy_remaining(&end_time, tsp, PT_OLD_TIMESPEC, ret);
@@ -1444,7 +1449,7 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time64, struct
pollfd __user *, ufds,
{
sigset_t ksigmask, sigsaved;
struct timespec64 ts, end_time, *to = NULL;
- int ret;
+ int ret, signal_detected;
if (tsp) {
if (get_timespec64(&ts, tsp))
@@ -1461,10 +1466,10 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time64, struct
pollfd __user *, ufds,
ret = do_sys_poll(ufds, nfds, to);
- restore_user_sigmask(sigmask, &sigsaved);
+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
/* We can restart this syscall, usually */
- if (ret == -EINTR)
+ if (ret == -EINTR || (signal_detected && !ret))
ret = -ERESTARTNOHAND;
ret = poll_select_copy_remaining(&end_time, tsp, PT_TIMESPEC, ret);
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 9702016734b1..1d36e8629edf 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -275,7 +275,7 @@ extern int __group_send_sig_info(int, struct
kernel_siginfo *, struct task_struc
extern int sigprocmask(int, sigset_t *, sigset_t *);
extern int set_user_sigmask(const sigset_t __user *usigmask, sigset_t *set,
sigset_t *oldset, size_t sigsetsize);
-extern void restore_user_sigmask(const void __user *usigmask,
+extern int restore_user_sigmask(const void __user *usigmask,
sigset_t *sigsaved);
extern void set_current_blocked(sigset_t *);
extern void __set_current_blocked(const sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 3a9e41197d46..4f8b49a7b058 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2845,15 +2845,16 @@ EXPORT_SYMBOL(set_compat_user_sigmask);
* usigmask: sigmask passed in from userland.
* sigsaved: saved sigmask when the syscall started and changed the sigmask to
* usigmask.
+ * returns 1 in case a pending signal is detected.
*
* This is useful for syscalls such as ppoll, pselect, io_pgetevents and
* epoll_pwait where a new sigmask is passed in from userland for the syscalls.
*/
-void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
+int restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
{
if (!usigmask)
- return;
+ return 0;
/*
* When signals are pending, do not restore them here.
* Restoring sigmask here can lead to delivering signals that the above
@@ -2862,7 +2863,7 @@ void restore_user_sigmask(const void __user
*usigmask, sigset_t *sigsaved)
if (signal_pending(current)) {
current->saved_sigmask = *sigsaved;
set_restore_sigmask();
- return;
+ return 1;
}
/*
@@ -2870,6 +2871,7 @@ void restore_user_sigmask(const void __user
*usigmask, sigset_t *sigsaved)
* saved_sigmask when signals are not pending.
*/
set_current_blocked(sigsaved);
+ return 0;
}
EXPORT_SYMBOL(restore_user_sigmask);
--
2.21.0.593.g511ec345e18-goog
Deepa Dinamani <[email protected]> wrote:
> Eric,
> Can you please help test this?
Nope, that was _really_ badly whitespace-damaged.
(C'mon, it's not like you're new to this)
On Thu, 02 May 2019, Deepa Dinamani wrote:
>Reported-by: Omar Kilani <[email protected]>
Do we actually know if this was the issue Omar was hitting?
Thanks,
Davidlohr
For all the syscalls that receive a sigmask from the userland,
the user sigmask is to be in effect through the syscall execution.
At the end of syscall, sigmask of the current process is restored
to what it was before the switch over to user sigmask.
But, for this to be true in practice, the sigmask should be restored
only at the the point we change the saved_sigmask. Anything before
that loses signals. And, anything after is just pointless as the
signal is already lost by restoring the sigmask.
The issue was detected because of a regression caused by 854a6ed56839a.
The patch moved the signal_pending() check closer to restoring of the
user sigmask. But, it failed to update the error code accordingly.
Detailed issue discussion permalink:
https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/
Note that the patch returns interrupted errors (EINTR, ERESTARTNOHAND,
etc) only when there is no other error. If there is a signal and an error
like EINVAL, the syscalls return -EINVAL rather than the interrupted
error codes.
Reported-by: Eric Wong <[email protected]>
Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()")
Signed-off-by: Deepa Dinamani <[email protected]>
---
Sorry, I was trying a new setup at work. I should have tested it.
My bad, I've checked this one.
I've removed the questionable reported-by, since we're not sure if
it is actually the same issue.
fs/aio.c | 24 ++++++++++++------------
fs/eventpoll.c | 14 ++++++++++----
fs/io_uring.c | 9 ++++++---
fs/select.c | 37 +++++++++++++++++++++----------------
include/linux/signal.h | 2 +-
kernel/signal.c | 8 +++++---
6 files changed, 55 insertions(+), 39 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 38b741aef0bf..7de2f7573d55 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2133,7 +2133,7 @@ SYSCALL_DEFINE6(io_pgetevents,
struct __aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 ts;
- int ret;
+ int ret, signal_detected;
if (timeout && unlikely(get_timespec64(&ts, timeout)))
return -EFAULT;
@@ -2146,8 +2146,8 @@ SYSCALL_DEFINE6(io_pgetevents,
return ret;
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
- restore_user_sigmask(ksig.sigmask, &sigsaved);
- if (signal_pending(current) && !ret)
+ signal_detected = restore_user_sigmask(ksig.sigmask, &sigsaved);
+ if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
return ret;
@@ -2166,7 +2166,7 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
struct __aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 ts;
- int ret;
+ int ret, signal_detected;
if (timeout && unlikely(get_old_timespec32(&ts, timeout)))
return -EFAULT;
@@ -2180,8 +2180,8 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
return ret;
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
- restore_user_sigmask(ksig.sigmask, &sigsaved);
- if (signal_pending(current) && !ret)
+ signal_detected = restore_user_sigmask(ksig.sigmask, &sigsaved);
+ if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
return ret;
@@ -2231,7 +2231,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
struct __compat_aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 t;
- int ret;
+ int ret, signal_detected;
if (timeout && get_old_timespec32(&t, timeout))
return -EFAULT;
@@ -2244,8 +2244,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
return ret;
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
- restore_user_sigmask(ksig.sigmask, &sigsaved);
- if (signal_pending(current) && !ret)
+ signal_detected = restore_user_sigmask(ksig.sigmask, &sigsaved);
+ if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
return ret;
@@ -2264,7 +2264,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64,
struct __compat_aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 t;
- int ret;
+ int ret, signal_detected;
if (timeout && get_timespec64(&t, timeout))
return -EFAULT;
@@ -2277,8 +2277,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64,
return ret;
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
- restore_user_sigmask(ksig.sigmask, &sigsaved);
- if (signal_pending(current) && !ret)
+ signal_detected = restore_user_sigmask(ksig.sigmask, &sigsaved);
+ if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
return ret;
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4a0e98d87fcc..fe5a0724b417 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2317,7 +2317,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
int, maxevents, int, timeout, const sigset_t __user *, sigmask,
size_t, sigsetsize)
{
- int error;
+ int error, signal_detected;
sigset_t ksigmask, sigsaved;
/*
@@ -2330,7 +2330,10 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
error = do_epoll_wait(epfd, events, maxevents, timeout);
- restore_user_sigmask(sigmask, &sigsaved);
+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
+
+ if (signal_detected && !error)
+ error = -EINTR;
return error;
}
@@ -2342,7 +2345,7 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
const compat_sigset_t __user *, sigmask,
compat_size_t, sigsetsize)
{
- long err;
+ long err, signal_detected;
sigset_t ksigmask, sigsaved;
/*
@@ -2355,7 +2358,10 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
err = do_epoll_wait(epfd, events, maxevents, timeout);
- restore_user_sigmask(sigmask, &sigsaved);
+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
+
+ if (signal_detected && !err)
+ err = -EINTR;
return err;
}
diff --git a/fs/io_uring.c b/fs/io_uring.c
index e2bd77da5e21..e107e299c3f3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1965,7 +1965,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
struct io_cq_ring *ring = ctx->cq_ring;
sigset_t ksigmask, sigsaved;
DEFINE_WAIT(wait);
- int ret;
+ int ret, signal_detected;
/* See comment at the top of this file */
smp_rmb();
@@ -1996,8 +1996,11 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
finish_wait(&ctx->wait, &wait);
- if (sig)
- restore_user_sigmask(sig, &sigsaved);
+ if (sig) {
+ signal_detected = restore_user_sigmask(sig, &sigsaved);
+ if (signal_detected && !ret)
+ ret = -EINTR;
+ }
return READ_ONCE(ring->r.head) == READ_ONCE(ring->r.tail) ? ret : 0;
}
diff --git a/fs/select.c b/fs/select.c
index 6cbc9ff56ba0..348dbe5e6dd0 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -732,7 +732,7 @@ static long do_pselect(int n, fd_set __user *inp, fd_set __user *outp,
{
sigset_t ksigmask, sigsaved;
struct timespec64 ts, end_time, *to = NULL;
- int ret;
+ int ret, signal_detected;
if (tsp) {
switch (type) {
@@ -760,7 +760,9 @@ static long do_pselect(int n, fd_set __user *inp, fd_set __user *outp,
ret = core_sys_select(n, inp, outp, exp, to);
ret = poll_select_copy_remaining(&end_time, tsp, type, ret);
- restore_user_sigmask(sigmask, &sigsaved);
+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
+ if (signal_detected && !ret)
+ ret = -EINTR;
return ret;
}
@@ -1089,7 +1091,7 @@ SYSCALL_DEFINE5(ppoll, struct pollfd __user *, ufds, unsigned int, nfds,
{
sigset_t ksigmask, sigsaved;
struct timespec64 ts, end_time, *to = NULL;
- int ret;
+ int ret, signal_detected;
if (tsp) {
if (get_timespec64(&ts, tsp))
@@ -1106,10 +1108,10 @@ SYSCALL_DEFINE5(ppoll, struct pollfd __user *, ufds, unsigned int, nfds,
ret = do_sys_poll(ufds, nfds, to);
- restore_user_sigmask(sigmask, &sigsaved);
+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
/* We can restart this syscall, usually */
- if (ret == -EINTR)
+ if (ret == -EINTR || (signal_detected && !ret))
ret = -ERESTARTNOHAND;
ret = poll_select_copy_remaining(&end_time, tsp, PT_TIMESPEC, ret);
@@ -1125,7 +1127,7 @@ SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds, unsigned int, nfds,
{
sigset_t ksigmask, sigsaved;
struct timespec64 ts, end_time, *to = NULL;
- int ret;
+ int ret, signal_detected;
if (tsp) {
if (get_old_timespec32(&ts, tsp))
@@ -1142,10 +1144,10 @@ SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds, unsigned int, nfds,
ret = do_sys_poll(ufds, nfds, to);
- restore_user_sigmask(sigmask, &sigsaved);
+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
/* We can restart this syscall, usually */
- if (ret == -EINTR)
+ if (ret == -EINTR || (signal_detected && !ret))
ret = -ERESTARTNOHAND;
ret = poll_select_copy_remaining(&end_time, tsp, PT_OLD_TIMESPEC, ret);
@@ -1324,7 +1326,7 @@ static long do_compat_pselect(int n, compat_ulong_t __user *inp,
{
sigset_t ksigmask, sigsaved;
struct timespec64 ts, end_time, *to = NULL;
- int ret;
+ int ret, signal_detected;
if (tsp) {
switch (type) {
@@ -1352,7 +1354,10 @@ static long do_compat_pselect(int n, compat_ulong_t __user *inp,
ret = compat_core_sys_select(n, inp, outp, exp, to);
ret = poll_select_copy_remaining(&end_time, tsp, type, ret);
- restore_user_sigmask(sigmask, &sigsaved);
+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
+
+ if (signal_detected && !ret)
+ ret = -EINTR;
return ret;
}
@@ -1408,7 +1413,7 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds,
{
sigset_t ksigmask, sigsaved;
struct timespec64 ts, end_time, *to = NULL;
- int ret;
+ int ret, signal_detected;
if (tsp) {
if (get_old_timespec32(&ts, tsp))
@@ -1425,10 +1430,10 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds,
ret = do_sys_poll(ufds, nfds, to);
- restore_user_sigmask(sigmask, &sigsaved);
+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
/* We can restart this syscall, usually */
- if (ret == -EINTR)
+ if (ret == -EINTR || (signal_detected && !ret))
ret = -ERESTARTNOHAND;
ret = poll_select_copy_remaining(&end_time, tsp, PT_OLD_TIMESPEC, ret);
@@ -1444,7 +1449,7 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time64, struct pollfd __user *, ufds,
{
sigset_t ksigmask, sigsaved;
struct timespec64 ts, end_time, *to = NULL;
- int ret;
+ int ret, signal_detected;
if (tsp) {
if (get_timespec64(&ts, tsp))
@@ -1461,10 +1466,10 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time64, struct pollfd __user *, ufds,
ret = do_sys_poll(ufds, nfds, to);
- restore_user_sigmask(sigmask, &sigsaved);
+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
/* We can restart this syscall, usually */
- if (ret == -EINTR)
+ if (ret == -EINTR || (signal_detected && !ret))
ret = -ERESTARTNOHAND;
ret = poll_select_copy_remaining(&end_time, tsp, PT_TIMESPEC, ret);
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 9702016734b1..1d36e8629edf 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -275,7 +275,7 @@ extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struc
extern int sigprocmask(int, sigset_t *, sigset_t *);
extern int set_user_sigmask(const sigset_t __user *usigmask, sigset_t *set,
sigset_t *oldset, size_t sigsetsize);
-extern void restore_user_sigmask(const void __user *usigmask,
+extern int restore_user_sigmask(const void __user *usigmask,
sigset_t *sigsaved);
extern void set_current_blocked(sigset_t *);
extern void __set_current_blocked(const sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 3a9e41197d46..4f8b49a7b058 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2845,15 +2845,16 @@ EXPORT_SYMBOL(set_compat_user_sigmask);
* usigmask: sigmask passed in from userland.
* sigsaved: saved sigmask when the syscall started and changed the sigmask to
* usigmask.
+ * returns 1 in case a pending signal is detected.
*
* This is useful for syscalls such as ppoll, pselect, io_pgetevents and
* epoll_pwait where a new sigmask is passed in from userland for the syscalls.
*/
-void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
+int restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
{
if (!usigmask)
- return;
+ return 0;
/*
* When signals are pending, do not restore them here.
* Restoring sigmask here can lead to delivering signals that the above
@@ -2862,7 +2863,7 @@ void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
if (signal_pending(current)) {
current->saved_sigmask = *sigsaved;
set_restore_sigmask();
- return;
+ return 1;
}
/*
@@ -2870,6 +2871,7 @@ void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
* saved_sigmask when signals are not pending.
*/
set_current_blocked(sigsaved);
+ return 0;
}
EXPORT_SYMBOL(restore_user_sigmask);
--
2.21.0.593.g511ec345e18-goog
Deepa Dinamani <[email protected]> wrote:
> Sorry, I was trying a new setup at work. I should have tested it.
> My bad, I've checked this one.
Thanks. This is good w.r.t. epoll_pwait and ppoll when applied
to 5.0.11 (no fs/io_uring.c).
Can't think of anything which uses pselect or aio on my system;
but it looks right to me.
> I've removed the questionable reported-by, since we're not sure if
> it is actually the same issue.
Yes, I hope Omar can test this, too.
Thanks again, all!
On Thu, May 2, 2019 at 11:34 PM Eric Wong <[email protected]> wrote:
>
> Deepa Dinamani <[email protected]> wrote:
> > Sorry, I was trying a new setup at work. I should have tested it.
> > My bad, I've checked this one.
>
> Thanks. This is good w.r.t. epoll_pwait and ppoll when applied
> to 5.0.11 (no fs/io_uring.c).
Thanks. Al, would you be picking up this patch? I can resend it.
> Can't think of anything which uses pselect or aio on my system;
> but it looks right to me.
>
> > I've removed the questionable reported-by, since we're not sure if
> > it is actually the same issue.
>
> Yes, I hope Omar can test this, too.
Omar, would you be able to test this?
Thanks,
Deepa
This patch could also be routed via akpm, adding him.
On Thu, 02 May 2019, Deepa Dinamani wrote:
>For all the syscalls that receive a sigmask from the userland,
>the user sigmask is to be in effect through the syscall execution.
>At the end of syscall, sigmask of the current process is restored
>to what it was before the switch over to user sigmask.
>But, for this to be true in practice, the sigmask should be restored
>only at the the point we change the saved_sigmask. Anything before
>that loses signals. And, anything after is just pointless as the
>signal is already lost by restoring the sigmask.
>
>The issue was detected because of a regression caused by 854a6ed56839a.
>The patch moved the signal_pending() check closer to restoring of the
>user sigmask. But, it failed to update the error code accordingly.
>
>Detailed issue discussion permalink:
>https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/
>
>Note that the patch returns interrupted errors (EINTR, ERESTARTNOHAND,
>etc) only when there is no other error. If there is a signal and an error
>like EINVAL, the syscalls return -EINVAL rather than the interrupted
>error codes.
Thanks for doing this; I've reviewed the epoll bits (along with the overall
idea) and it seems like a sane alternative to reverting the offending patch.
Feel free to add:
Reviewed-by: Davidlohr Bueso <[email protected]>
A small nit, I think we should be a bit more verbose about the return semantics
of restore_user_sigmask()... see at the end.
>
>Reported-by: Eric Wong <[email protected]>
>Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()")
>Signed-off-by: Deepa Dinamani <[email protected]>
>---
>Sorry, I was trying a new setup at work. I should have tested it.
>My bad, I've checked this one.
>I've removed the questionable reported-by, since we're not sure if
>it is actually the same issue.
>
> fs/aio.c | 24 ++++++++++++------------
> fs/eventpoll.c | 14 ++++++++++----
> fs/io_uring.c | 9 ++++++---
> fs/select.c | 37 +++++++++++++++++++++----------------
> include/linux/signal.h | 2 +-
> kernel/signal.c | 8 +++++---
> 6 files changed, 55 insertions(+), 39 deletions(-)
>
>diff --git a/fs/aio.c b/fs/aio.c
>index 38b741aef0bf..7de2f7573d55 100644
>--- a/fs/aio.c
>+++ b/fs/aio.c
>@@ -2133,7 +2133,7 @@ SYSCALL_DEFINE6(io_pgetevents,
> struct __aio_sigset ksig = { NULL, };
> sigset_t ksigmask, sigsaved;
> struct timespec64 ts;
>- int ret;
>+ int ret, signal_detected;
>
> if (timeout && unlikely(get_timespec64(&ts, timeout)))
> return -EFAULT;
>@@ -2146,8 +2146,8 @@ SYSCALL_DEFINE6(io_pgetevents,
> return ret;
>
> ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
>- restore_user_sigmask(ksig.sigmask, &sigsaved);
>- if (signal_pending(current) && !ret)
>+ signal_detected = restore_user_sigmask(ksig.sigmask, &sigsaved);
>+ if (signal_detected && !ret)
> ret = -ERESTARTNOHAND;
>
> return ret;
>@@ -2166,7 +2166,7 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
> struct __aio_sigset ksig = { NULL, };
> sigset_t ksigmask, sigsaved;
> struct timespec64 ts;
>- int ret;
>+ int ret, signal_detected;
>
> if (timeout && unlikely(get_old_timespec32(&ts, timeout)))
> return -EFAULT;
>@@ -2180,8 +2180,8 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
> return ret;
>
> ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
>- restore_user_sigmask(ksig.sigmask, &sigsaved);
>- if (signal_pending(current) && !ret)
>+ signal_detected = restore_user_sigmask(ksig.sigmask, &sigsaved);
>+ if (signal_detected && !ret)
> ret = -ERESTARTNOHAND;
>
> return ret;
>@@ -2231,7 +2231,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
> struct __compat_aio_sigset ksig = { NULL, };
> sigset_t ksigmask, sigsaved;
> struct timespec64 t;
>- int ret;
>+ int ret, signal_detected;
>
> if (timeout && get_old_timespec32(&t, timeout))
> return -EFAULT;
>@@ -2244,8 +2244,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
> return ret;
>
> ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
>- restore_user_sigmask(ksig.sigmask, &sigsaved);
>- if (signal_pending(current) && !ret)
>+ signal_detected = restore_user_sigmask(ksig.sigmask, &sigsaved);
>+ if (signal_detected && !ret)
> ret = -ERESTARTNOHAND;
>
> return ret;
>@@ -2264,7 +2264,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64,
> struct __compat_aio_sigset ksig = { NULL, };
> sigset_t ksigmask, sigsaved;
> struct timespec64 t;
>- int ret;
>+ int ret, signal_detected;
>
> if (timeout && get_timespec64(&t, timeout))
> return -EFAULT;
>@@ -2277,8 +2277,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64,
> return ret;
>
> ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
>- restore_user_sigmask(ksig.sigmask, &sigsaved);
>- if (signal_pending(current) && !ret)
>+ signal_detected = restore_user_sigmask(ksig.sigmask, &sigsaved);
>+ if (signal_detected && !ret)
> ret = -ERESTARTNOHAND;
>
> return ret;
>diff --git a/fs/eventpoll.c b/fs/eventpoll.c
>index 4a0e98d87fcc..fe5a0724b417 100644
>--- a/fs/eventpoll.c
>+++ b/fs/eventpoll.c
>@@ -2317,7 +2317,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
> int, maxevents, int, timeout, const sigset_t __user *, sigmask,
> size_t, sigsetsize)
> {
>- int error;
>+ int error, signal_detected;
> sigset_t ksigmask, sigsaved;
>
> /*
>@@ -2330,7 +2330,10 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
>
> error = do_epoll_wait(epfd, events, maxevents, timeout);
>
>- restore_user_sigmask(sigmask, &sigsaved);
>+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
>+
>+ if (signal_detected && !error)
>+ error = -EINTR;
>
> return error;
> }
>@@ -2342,7 +2345,7 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
> const compat_sigset_t __user *, sigmask,
> compat_size_t, sigsetsize)
> {
>- long err;
>+ long err, signal_detected;
> sigset_t ksigmask, sigsaved;
>
> /*
>@@ -2355,7 +2358,10 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
>
> err = do_epoll_wait(epfd, events, maxevents, timeout);
>
>- restore_user_sigmask(sigmask, &sigsaved);
>+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
>+
>+ if (signal_detected && !err)
>+ err = -EINTR;
>
> return err;
> }
>diff --git a/fs/io_uring.c b/fs/io_uring.c
>index e2bd77da5e21..e107e299c3f3 100644
>--- a/fs/io_uring.c
>+++ b/fs/io_uring.c
>@@ -1965,7 +1965,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
> struct io_cq_ring *ring = ctx->cq_ring;
> sigset_t ksigmask, sigsaved;
> DEFINE_WAIT(wait);
>- int ret;
>+ int ret, signal_detected;
>
> /* See comment at the top of this file */
> smp_rmb();
>@@ -1996,8 +1996,11 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
>
> finish_wait(&ctx->wait, &wait);
>
>- if (sig)
>- restore_user_sigmask(sig, &sigsaved);
>+ if (sig) {
>+ signal_detected = restore_user_sigmask(sig, &sigsaved);
>+ if (signal_detected && !ret)
>+ ret = -EINTR;
>+ }
>
> return READ_ONCE(ring->r.head) == READ_ONCE(ring->r.tail) ? ret : 0;
> }
>diff --git a/fs/select.c b/fs/select.c
>index 6cbc9ff56ba0..348dbe5e6dd0 100644
>--- a/fs/select.c
>+++ b/fs/select.c
>@@ -732,7 +732,7 @@ static long do_pselect(int n, fd_set __user *inp, fd_set __user *outp,
> {
> sigset_t ksigmask, sigsaved;
> struct timespec64 ts, end_time, *to = NULL;
>- int ret;
>+ int ret, signal_detected;
>
> if (tsp) {
> switch (type) {
>@@ -760,7 +760,9 @@ static long do_pselect(int n, fd_set __user *inp, fd_set __user *outp,
> ret = core_sys_select(n, inp, outp, exp, to);
> ret = poll_select_copy_remaining(&end_time, tsp, type, ret);
>
>- restore_user_sigmask(sigmask, &sigsaved);
>+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
>+ if (signal_detected && !ret)
>+ ret = -EINTR;
>
> return ret;
> }
>@@ -1089,7 +1091,7 @@ SYSCALL_DEFINE5(ppoll, struct pollfd __user *, ufds, unsigned int, nfds,
> {
> sigset_t ksigmask, sigsaved;
> struct timespec64 ts, end_time, *to = NULL;
>- int ret;
>+ int ret, signal_detected;
>
> if (tsp) {
> if (get_timespec64(&ts, tsp))
>@@ -1106,10 +1108,10 @@ SYSCALL_DEFINE5(ppoll, struct pollfd __user *, ufds, unsigned int, nfds,
>
> ret = do_sys_poll(ufds, nfds, to);
>
>- restore_user_sigmask(sigmask, &sigsaved);
>+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
>
> /* We can restart this syscall, usually */
>- if (ret == -EINTR)
>+ if (ret == -EINTR || (signal_detected && !ret))
> ret = -ERESTARTNOHAND;
>
> ret = poll_select_copy_remaining(&end_time, tsp, PT_TIMESPEC, ret);
>@@ -1125,7 +1127,7 @@ SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds, unsigned int, nfds,
> {
> sigset_t ksigmask, sigsaved;
> struct timespec64 ts, end_time, *to = NULL;
>- int ret;
>+ int ret, signal_detected;
>
> if (tsp) {
> if (get_old_timespec32(&ts, tsp))
>@@ -1142,10 +1144,10 @@ SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds, unsigned int, nfds,
>
> ret = do_sys_poll(ufds, nfds, to);
>
>- restore_user_sigmask(sigmask, &sigsaved);
>+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
>
> /* We can restart this syscall, usually */
>- if (ret == -EINTR)
>+ if (ret == -EINTR || (signal_detected && !ret))
> ret = -ERESTARTNOHAND;
>
> ret = poll_select_copy_remaining(&end_time, tsp, PT_OLD_TIMESPEC, ret);
>@@ -1324,7 +1326,7 @@ static long do_compat_pselect(int n, compat_ulong_t __user *inp,
> {
> sigset_t ksigmask, sigsaved;
> struct timespec64 ts, end_time, *to = NULL;
>- int ret;
>+ int ret, signal_detected;
>
> if (tsp) {
> switch (type) {
>@@ -1352,7 +1354,10 @@ static long do_compat_pselect(int n, compat_ulong_t __user *inp,
> ret = compat_core_sys_select(n, inp, outp, exp, to);
> ret = poll_select_copy_remaining(&end_time, tsp, type, ret);
>
>- restore_user_sigmask(sigmask, &sigsaved);
>+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
>+
>+ if (signal_detected && !ret)
>+ ret = -EINTR;
>
> return ret;
> }
>@@ -1408,7 +1413,7 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds,
> {
> sigset_t ksigmask, sigsaved;
> struct timespec64 ts, end_time, *to = NULL;
>- int ret;
>+ int ret, signal_detected;
>
> if (tsp) {
> if (get_old_timespec32(&ts, tsp))
>@@ -1425,10 +1430,10 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds,
>
> ret = do_sys_poll(ufds, nfds, to);
>
>- restore_user_sigmask(sigmask, &sigsaved);
>+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
>
> /* We can restart this syscall, usually */
>- if (ret == -EINTR)
>+ if (ret == -EINTR || (signal_detected && !ret))
> ret = -ERESTARTNOHAND;
>
> ret = poll_select_copy_remaining(&end_time, tsp, PT_OLD_TIMESPEC, ret);
>@@ -1444,7 +1449,7 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time64, struct pollfd __user *, ufds,
> {
> sigset_t ksigmask, sigsaved;
> struct timespec64 ts, end_time, *to = NULL;
>- int ret;
>+ int ret, signal_detected;
>
> if (tsp) {
> if (get_timespec64(&ts, tsp))
>@@ -1461,10 +1466,10 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time64, struct pollfd __user *, ufds,
>
> ret = do_sys_poll(ufds, nfds, to);
>
>- restore_user_sigmask(sigmask, &sigsaved);
>+ signal_detected = restore_user_sigmask(sigmask, &sigsaved);
>
> /* We can restart this syscall, usually */
>- if (ret == -EINTR)
>+ if (ret == -EINTR || (signal_detected && !ret))
> ret = -ERESTARTNOHAND;
>
> ret = poll_select_copy_remaining(&end_time, tsp, PT_TIMESPEC, ret);
>diff --git a/include/linux/signal.h b/include/linux/signal.h
>index 9702016734b1..1d36e8629edf 100644
>--- a/include/linux/signal.h
>+++ b/include/linux/signal.h
>@@ -275,7 +275,7 @@ extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struc
> extern int sigprocmask(int, sigset_t *, sigset_t *);
> extern int set_user_sigmask(const sigset_t __user *usigmask, sigset_t *set,
> sigset_t *oldset, size_t sigsetsize);
>-extern void restore_user_sigmask(const void __user *usigmask,
>+extern int restore_user_sigmask(const void __user *usigmask,
> sigset_t *sigsaved);
> extern void set_current_blocked(sigset_t *);
> extern void __set_current_blocked(const sigset_t *);
>diff --git a/kernel/signal.c b/kernel/signal.c
>index 3a9e41197d46..4f8b49a7b058 100644
>--- a/kernel/signal.c
>+++ b/kernel/signal.c
>@@ -2845,15 +2845,16 @@ EXPORT_SYMBOL(set_compat_user_sigmask);
> * usigmask: sigmask passed in from userland.
> * sigsaved: saved sigmask when the syscall started and changed the sigmask to
> * usigmask.
>+ * returns 1 in case a pending signal is detected.
How about:
"
Callers must carefully coordinate between signal_pending() checks between the
actual system call and restore_user_sigmask() - for which a new pending signal
may come in between calls and therefore must consider this for returning a proper
error code.
Returns 1 in case a signal pending is detected, otherwise 0.
"
> *
> * This is useful for syscalls such as ppoll, pselect, io_pgetevents and
> * epoll_pwait where a new sigmask is passed in from userland for the syscalls.
> */
>-void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
>+int restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
> {
>
> if (!usigmask)
>- return;
>+ return 0;
> /*
> * When signals are pending, do not restore them here.
> * Restoring sigmask here can lead to delivering signals that the above
>@@ -2862,7 +2863,7 @@ void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
> if (signal_pending(current)) {
> current->saved_sigmask = *sigsaved;
> set_restore_sigmask();
>- return;
>+ return 1;
> }
>
> /*
>@@ -2870,6 +2871,7 @@ void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
> * saved_sigmask when signals are not pending.
> */
> set_current_blocked(sigsaved);
>+ return 0;
> }
> EXPORT_SYMBOL(restore_user_sigmask);
>
>--
>2.21.0.593.g511ec345e18-goog
>
The original patch was merged through the tip tree. Adding tglx just in case.
I will post the revised patch to everyone on this thread.
> >For all the syscalls that receive a sigmask from the userland,
> >the user sigmask is to be in effect through the syscall execution.
> >At the end of syscall, sigmask of the current process is restored
> >to what it was before the switch over to user sigmask.
> >But, for this to be true in practice, the sigmask should be restored
> >only at the the point we change the saved_sigmask. Anything before
> >that loses signals. And, anything after is just pointless as the
> >signal is already lost by restoring the sigmask.
> >
> >The issue was detected because of a regression caused by 854a6ed56839a.
> >The patch moved the signal_pending() check closer to restoring of the
> >user sigmask. But, it failed to update the error code accordingly.
> >
> >Detailed issue discussion permalink:
> >https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/
> >
> >Note that the patch returns interrupted errors (EINTR, ERESTARTNOHAND,
> >etc) only when there is no other error. If there is a signal and an error
> >like EINVAL, the syscalls return -EINVAL rather than the interrupted
> >error codes.
>
> Thanks for doing this; I've reviewed the epoll bits (along with the overall
> idea) and it seems like a sane alternative to reverting the offending patch.
Sorry maybe the description wasn't clear. What I actually am saying is
that all these syscalls were dropping signals before and
854a6ed56839a4 actually did things right by making sure they did not
do so.
But, there was a bug in that it did not communicate to userspace when
the error code was not already set.
However, we could still argue that the check and flipping of the mask
isn't atomic and there is still a way this can theoretically happen.
But, this will also mean that these syscalls will slow down further.
But, they are already expected to be slow so maybe it doesn't matter.
I will note this down in the commit text.
I don't think reverting was an alternative. 854a6ed56839a4 exposed a
bug that was already there.
> Feel free to add:
>
> Reviewed-by: Davidlohr Bueso <[email protected]>
>
> A small nit, I think we should be a bit more verbose about the return semantics
> of restore_user_sigmask()... see at the end.
>
> >
> >Reported-by: Eric Wong <[email protected]>
> >Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()")
> >Signed-off-by: Deepa Dinamani <[email protected]>
> >--- a/kernel/signal.c
> >+++ b/kernel/signal.c
> >@@ -2845,15 +2845,16 @@ EXPORT_SYMBOL(set_compat_user_sigmask);
> > * usigmask: sigmask passed in from userland.
> > * sigsaved: saved sigmask when the syscall started and changed the sigmask to
> > * usigmask.
> >+ * returns 1 in case a pending signal is detected.
>
> How about:
>
> "
> Callers must carefully coordinate between signal_pending() checks between the
> actual system call and restore_user_sigmask() - for which a new pending signal
> may come in between calls and therefore must consider this for returning a proper
> error code.
>
> Returns 1 in case a signal pending is detected, otherwise 0.
Ok, I will add more verbiage here.
Thanks,
Deepa