2021-02-20 09:10:28

by Sargun Dhillon

[permalink] [raw]
Subject: [RFC PATCH 0/3] Seccomp non-preemptible notifier

This patchset addresses a race condition we've dealt with recently with
seccomp. Specifically programs interrupting syscalls while they're in
progress. This was exacerbated by Golang's recent adoption of "async
preemption", in which they try to interrupt any syscall that's been
running for more than 10ms during GC. During certain syscalls, it's
non-trivial to write them in a reetrant manner in userspace (mount).

This has a couple semantic changes, and relaxes a check on seccomp_data.
I can deal with these, but this was a first cut. I also expect that the
patch would be squashed down, but it's split out for easier review.

Sargun Dhillon (3):
seccomp: Refactor notification handler to prepare for new semantics
seccomp: Add wait_killable semantic to seccomp user notifier
selftests/seccomp: Add test for wait killable notifier

include/uapi/linux/seccomp.h | 10 +++
kernel/seccomp.c | 63 +++++++++++++------
tools/testing/selftests/seccomp/seccomp_bpf.c | 60 ++++++++++++++++++
3 files changed, 114 insertions(+), 19 deletions(-)

--
2.25.1


2021-02-20 09:12:13

by Sargun Dhillon

[permalink] [raw]
Subject: [RFC PATCH 3/3] selftests/seccomp: Add test for wait killable notifier

This adds a test for the positive case of the wait killable notifier,
in testing that when the feature is activated the process acts as
expected -- in not terminating on a non-fatal signal, and instead
queueing it up. There is already a test case for normal handlers
and preemption.

Signed-off-by: Sargun Dhillon <[email protected]>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 60 +++++++++++++++++++
1 file changed, 60 insertions(+)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 26c72f2b61b1..a8ef4558d673 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -4139,6 +4139,66 @@ TEST(user_notification_addfd_rlimit)
close(memfd);
}

+TEST(user_notification_signal_wait_killable)
+{
+ pid_t pid;
+ long ret;
+ int status, listener, sk_pair[2];
+ struct seccomp_notif req = {
+ .flags = SECCOMP_USER_NOTIF_FLAG_WAIT_KILLABLE,
+ };
+ struct seccomp_notif_resp resp = {};
+ char c;
+
+ ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+ ASSERT_EQ(0, ret) {
+ TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+ }
+
+ ASSERT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0);
+ ASSERT_EQ(fcntl(sk_pair[0], F_SETFL, O_NONBLOCK), 0);
+
+ listener = user_notif_syscall(__NR_gettid,
+ SECCOMP_FILTER_FLAG_NEW_LISTENER);
+ ASSERT_GE(listener, 0);
+
+ pid = fork();
+ ASSERT_GE(pid, 0);
+
+ if (pid == 0) {
+ close(sk_pair[0]);
+ handled = sk_pair[1];
+ if (signal(SIGUSR1, signal_handler) == SIG_ERR) {
+ perror("signal");
+ exit(1);
+ }
+
+ ret = syscall(__NR_gettid);
+ exit(!(ret == 42));
+ }
+ close(sk_pair[1]);
+
+ EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+ EXPECT_EQ(kill(pid, SIGUSR1), 0);
+ /* Make sure we didn't get a signal */
+ EXPECT_EQ(read(sk_pair[0], &c, 1), -1);
+ /* Make sure the notification is still alive */
+ EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ID_VALID, &req.id), 0);
+
+ resp.id = req.id;
+ resp.error = 0;
+ resp.val = 42;
+
+ EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+ EXPECT_EQ(waitpid(pid, &status, 0), pid);
+ EXPECT_EQ(true, WIFEXITED(status));
+ EXPECT_EQ(0, WEXITSTATUS(status));
+ /* Check we eventually received the signal */
+ EXPECT_EQ(read(sk_pair[0], &c, 1), 1);
+}
+
+
/*
* TODO:
* - expand NNP testing
--
2.25.1

2021-02-20 09:13:24

by Sargun Dhillon

[permalink] [raw]
Subject: [RFC PATCH 1/3] seccomp: Refactor notification handler to prepare for new semantics

This refactors the user notification code to have a do / while loop around
the completion condition. This has a small change in semantic, in that
previously we ignored addfd calls upon wakeup if the notification had been
responded to, but instead with the new change we check for an outstanding
addfd calls prior to returning to userspace.

Signed-off-by: Sargun Dhillon <[email protected]>
---
kernel/seccomp.c | 30 ++++++++++++++++--------------
1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 952dc1c90229..b48fb0a29455 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1098,28 +1098,30 @@ static int seccomp_do_user_notification(int this_syscall,

up(&match->notif->request);
wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);
- mutex_unlock(&match->notify_lock);

/*
* This is where we wait for a reply from userspace.
*/
-wait:
- err = wait_for_completion_interruptible(&n.ready);
- mutex_lock(&match->notify_lock);
- if (err == 0) {
- /* Check if we were woken up by a addfd message */
+ do {
+ mutex_unlock(&match->notify_lock);
+ err = wait_for_completion_interruptible(&n.ready);
+ mutex_lock(&match->notify_lock);
+ if (err != 0)
+ goto interrupted;
+
addfd = list_first_entry_or_null(&n.addfd,
struct seccomp_kaddfd, list);
- if (addfd && n.state != SECCOMP_NOTIFY_REPLIED) {
+ /* Check if we were woken up by a addfd message */
+ if (addfd)
seccomp_handle_addfd(addfd);
- mutex_unlock(&match->notify_lock);
- goto wait;
- }
- ret = n.val;
- err = n.error;
- flags = n.flags;
- }

+ } while (n.state != SECCOMP_NOTIFY_REPLIED);
+
+ ret = n.val;
+ err = n.error;
+ flags = n.flags;
+
+interrupted:
/* If there were any pending addfd calls, clear them out */
list_for_each_entry_safe(addfd, tmp, &n.addfd, list) {
/* The process went away before we got a chance to handle it */
--
2.25.1