Date: Fri, 2 Nov 2018 07:50:11 -0600
From: Tycho Andersen
To: Oleg Nesterov
Cc: Kees Cook, Andy Lutomirski, "Eric W . Biederman", "Serge E . Hallyn", Christian Brauner, Tyler Hicks, Akihiro Suda, Aleksa Sarai, linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org, linux-api@vger.kernel.org
Subject: Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace
Message-ID: <20181102135011.GK2180@cisco>
References: <20181029224031.29809-1-tycho@tycho.ws>
 <20181029224031.29809-2-tycho@tycho.ws>
 <20181030143235.GA3385@redhat.com>
 <20181030153231.GB7343@cisco>
 <20181101144804.GD23232@redhat.com>
 <20181101203328.GI2180@cisco>
 <20181102112903.GB12360@redhat.com>
In-Reply-To: <20181102112903.GB12360@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Nov 02, 2018 at 12:29:03PM +0100, Oleg Nesterov wrote:
> On 11/01, Tycho Andersen wrote:
> >
> > On Thu, Nov 01, 2018 at 03:48:05PM +0100, Oleg Nesterov wrote:
> > > > >
> > > > > But my main concern is that either way wait_for_completion_killable() allows
> > > > > to trivially create a process which doesn't react to SIGSTOP, not good...
> > > > >
> > > > > Note also that this can happen if, say, both the tracer and tracee run in the
> > > > > same process group and SIGSTOP is sent to their pgid, if the tracer gets the
> > > > > signal first the tracee won't stop.
> > > > >
> > > > > Of freezer. try_to_freeze_tasks() can fail if it freezes the tracer before
> > > > > it does SECCOMP_IOCTL_NOTIF_SEND.
> > > >
> > > > I think in general the way this is intended to be used these things
> > > > wouldn't happen.
> > >
> > > Why?
> >
> > The intent is to run the tracer on the host and have it trace
> > containers, which would live in a different freezer cgroup, process
> > group, etc.
>
> I didn't mean the freezer cgroup, suspend can fail, it does the "global" freeze.
> Nevermind.
>
> > > Yes I think it would be nice to avoid wait_for_completion_killable().
> > >
> > > So please help me to understand the problem. Once again, why can not
> > > seccomp_do_user_notification() use wait_for_completion_interruptible() only?
> > >
> > > This is called before the task actually starts the syscall, so
> > > -ERESTARTNOINTR if signal_pending() can't hurt.
> >
> > The idea was that when the tracee gets a signal, it notifies the
> > tracer exactly once, and then waits for the tracer to decide what to
> > do. So if we use another wait_for_completion_interruptible(), doesn't
> > it just get re-woken immediately because the signal is still pending?
>
> Hmm. I meant that we should use a single wait_for_completion_interruptible().

Yes, but if we can use a second _interruptible(), then we can avoid the
SIGSTOP issue and still perhaps preserve the SIGNALED bit, if we decide
it's worth it at the conclusion of this thread.
> > > Now lets suppose seccomp_do_user_notification() simply does
> > >
> > > 	err = wait_for_completion_interruptible(&n.ready);
> > >
> > > 	if (err < 0 && state != SECCOMP_NOTIFY_REPLIED) {
> > > 		syscall_set_return_value(ERESTARTNOINTR);
> > > 		list_del(&n.list);
> > > 		return -1;
> > > 	}
> > >
> > > (I am ignoring the locking/etc). Now the obvious problem is that the listener
> > > doing SECCOMP_IOCTL_NOTIF_SEND can't distinguish -ENOENT from the case when the
> > > tracee was killed, yes?
> > >
> > > Is it that important?
> >
> > The answer to this question depends on how we want the listener to be
> > able to react. For example, if the listener is in the middle of doing
> > a mount() on behalf of the task and it gets a signal and we return
> > immediately, the listener will complete the mount(), try to respond
> > with success and get -ENOENT.
>
> Yes. Should we undo the mount if the tracee is killed?
>
> > If the task handles the signal and
> > restarts the mount(), it'll happen twice unless the listener undoes
> > it when it sees the -ENOENT.
>
> Yes. But note that we know that if the same tracee sends another notification
> it must be the same syscall.
>
> So. If the listener needs to undo mount when the tracee is killed, it should
> undo it if it was interrupted too.
>
> If no, the listener can simply "ignore" the next notification and do
> SECCOMP_IOCTL_NOTIF_SEND(val = 0, error = 0).
>
> I see no real difference...

Well, doesn't it seem like a hack? What if the tracee really makes the
same syscall twice? How do we tell the difference between one that was
restarted and a real second call?

> > If we send another notification with the
> > SIGNALED flag, the listener has a better picture of what's going on,
> > which might be nice.
>
> Yes, but this returns us to my my question: Is it that important?
>
> What exactly the listener can do if it gets SECCOMP_NOTIF_FLAG_SIGNALED?
>
> Undo the mount? No. This doesn't differ from the case when the tracee gets
> the non-fatal signal right after SECCOMP_NOTIFY_REPLIED.

I guess that in seccomp_do_user_notification(), even if the wait is
interrupted by a signal (err < 0), we should check whether the state is
REPLIED. If it is, we can use the error and val from the reply and
complete the syscall as normal, since the listener really did handle it.

Tycho