From: Andy Lutomirski
Date: Fri, 16 Mar 2018 00:46:55 +0000
Subject: Re: [RFC 0/3] seccomp trap to userspace
To: Tycho Andersen
Cc: Andy Lutomirski, "Serge E. Hallyn", Christian Brauner, LKML,
    Linux Containers, Kees Cook, Oleg Nesterov, "Eric W . Biederman",
    Tyler Hicks, Akihiro Suda, Alexei Starovoitov
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 15, 2018 at 5:35 PM, Tycho Andersen wrote:
> Hi Andy,
>
> On Thu, Mar 15, 2018 at 05:11:32PM +0000, Andy Lutomirski wrote:
>> On Thu, Mar 15, 2018 at 5:05 PM, Serge E. Hallyn wrote:
>> > Hm, synchronously - that brings to mind a thought... I should re-look at
>> > Tycho's patches first, but, if I'm in a container, start some syscall that
>> > gets trapped to userspace, then I hit ctrl-c. I'd like to be able to have
>> > the handler be interrupted and have it return -EINTR. Is that going to
>> > be possible with the synchronous approach?
>>
>> I think so, but it should be possible with the classic async approach
>> too. The main issue is the difference between a classic filter like
>> this (pseudocode):
>>
>> if (nr == SYS_mount) return TRAP_TO_USERSPACE;
>>
>> and the eBPF variant:
>>
>> if (nr == SYS_mount) trap_to_userspace();
>
> Sargun started a private design discussion thread that I don't think
> you were on, but Alexei said something to the effect of "eBPF programs
> will never wait on userspace", so I'm not sure we can do something
> like this in an eBPF program. I'm cc-ing him here again to confirm,
> but I doubt things have changed.
>
>> I admit that it's still not 100% clear to me that the latter is
>> genuinely more useful than the former.
>>
>> The case where I think the synchronous function call is a huge win is
>> this one:
>>
>> if (nr == SYS_mount) {
>>   log("Someone called mount with args %lx\n", ...);
>>   return RET_KILL;
>> }
>>
>> The idea being that the log message wouldn't show up in the kernel log
>> -- it would get sent to the listener socket belonging to whoever
>> created the filter, and that process could then go and log it
>> properly. This would work perfectly in containers and in totally
>> unprivileged applications like Chromium.
>
> The current implementation can't do exactly this, but you could do:
>
> if (nr == SYS_mount) {
>   log(...);
>   kill(pid, SIGKILL);
> }
>
> from the handler instead.
>
> I guess Serge is asking a slightly different question: what if the
> task gets e.g. SIGINT from the user doing a ^C or SIGALRM or
> something, we should probably send the handler some sort of message or
> interrupt to let it know that the syscall was cancelled. Right now the
> current set doesn't behave that way, and the handler will just
> continue on its merry way and get an EINVAL when it tries to respond
> with the cancelled cookie.

Hmm, I think we have to be very careful to avoid nasty races. I think
the correct approach is to notice the signal and send a message to the
listener that a signal is pending but to take no additional action.
If the handler ends up completing the syscall with a successful
return, we don't want to replace it with -EINTR. IOW the code looks
kind of like:

send_to_listener("hey I got a signal");
wait_ret = wait_interruptible for the listener to reply;
if (wait_ret == -EINTR) {
  send_to_listener("hey there's a signal");
  wait_ret = wait_killable for the listener to reply to the original request;
}

if (wait_ret == -EINTR) {
  /* hmm, this next line might not actually be necessary, but it's
     harmless and possibly useful */
  send_to_listener("hey we're going away");
  /* and stop waiting */
}

... actually handle the result.

--Andy
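
For anyone who wants to poke at the ordering above outside the kernel, here is a minimal userspace sketch of the same two-stage wait. Everything in it is hypothetical and only for illustration: `struct sim`, the `wait_event` enum, the scripted event list, and the short message names are stand-ins, not anything from Tycho's patches. It also assumes the first message sent to the listener is the notification of the trapped syscall itself. The point it tries to show is that a non-fatal signal only produces a "signal-pending" message, a later reply from the listener still wins, and only a fatal signal makes the task give up.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical events the waiting task can observe, in scripted order. */
enum wait_event { EV_REPLY, EV_SIGNAL, EV_FATAL_SIGNAL };

/* Hypothetical stand-in for the task/listener channel. */
struct sim {
    const enum wait_event *events; /* scripted events */
    size_t n;                      /* number of events */
    size_t pos;                    /* next event to consume */
    int fatal_pending;             /* a fatal signal was already seen */
    char log[128];                 /* messages sent, comma-separated */
};

/* Record a message as if it had been sent to the listener. */
static void send_to_listener(struct sim *s, const char *msg)
{
    size_t need = strlen(msg) + (s->log[0] ? 1 : 0);

    assert(strlen(s->log) + need < sizeof(s->log));
    if (s->log[0])
        strcat(s->log, ",");
    strcat(s->log, msg);
}

/* Any signal ends the wait; returns 0 on reply, -EINTR otherwise. */
static int wait_interruptible(struct sim *s)
{
    enum wait_event e;

    if (s->pos >= s->n)
        return -EINTR; /* script exhausted: treat as interrupted */
    e = s->events[s->pos++];
    if (e == EV_FATAL_SIGNAL)
        s->fatal_pending = 1;
    return e == EV_REPLY ? 0 : -EINTR;
}

/* Only a fatal signal ends the wait; non-fatal signals are skipped. */
static int wait_killable(struct sim *s)
{
    if (s->fatal_pending)
        return -EINTR;
    while (s->pos < s->n) {
        enum wait_event e = s->events[s->pos++];

        if (e == EV_REPLY)
            return 0;
        if (e == EV_FATAL_SIGNAL)
            return -EINTR;
    }
    return -EINTR;
}

/* The flow from the pseudocode: notify, wait, escalate, maybe give up. */
static int trap_syscall(struct sim *s)
{
    int ret;

    send_to_listener(s, "request");
    ret = wait_interruptible(s);
    if (ret == -EINTR) {
        send_to_listener(s, "signal-pending");
        ret = wait_killable(s);
    }
    if (ret == -EINTR)
        send_to_listener(s, "going-away");
    return ret;
}
```

Feeding it the script { EV_SIGNAL, EV_REPLY } makes trap_syscall() return 0 after sending the request and the signal-pending message, so the handler's result is kept rather than being replaced with -EINTR. A real implementation would presumably see -ERESTARTSYS from the kernel wait helpers rather than -EINTR; the sketch follows the naming used in the pseudocode.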