From: Sargun Dhillon
Date: Wed, 28 Oct 2020 10:43:54 -0700
Subject: Re: For review: seccomp_user_notif(2) manual page
To: Jann Horn
Cc: "Michael Kerrisk (man-pages)", Tycho Andersen, Kees Cook,
    Christian Brauner, linux-man, lkml, Aleksa Sarai, Alexei Starovoitov,
    Will Drewry, bpf, Song Liu, Daniel Borkmann, Andy Lutomirski,
    Linux Containers, Giuseppe Scrivano, Robert Sesek
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 28, 2020 at 2:43 AM Jann Horn wrote:
>
> On Wed, Oct 28, 2020 at 7:32 AM Sargun Dhillon wrote:
> > On Tue, Oct 27, 2020 at 3:28 AM Jann Horn wrote:
> > > On Tue, Oct 27, 2020 at 7:14 AM Michael Kerrisk (man-pages)
> > > wrote:
> > > > On 10/26/20 4:54 PM, Jann Horn wrote:
> > > > > I'm a bit on the fence now on whether non-blocking mode should use
> > > > > ENOTCONN or not... I guess if we returned ENOENT even when there are
> > > > > no more listeners, you'd have to disambiguate through the poll()
> > > > > revents, which would be kinda ugly?
> > > >
> > > > I must confess, I'm not quite clear on which two cases you
> > > > are trying to distinguish. Can you elaborate?
> > >
> > > Let's say someone writes a program whose responsibilities are just to
> > > handle seccomp events and to listen on some other fd for commands. And
> > > this is implemented with an event loop. Then once all the target
> > > processes are gone (including zombie reaping), we'll start getting
> > > EPOLLERR.
> > >
> > > If NOTIF_RECV starts returning -ENOTCONN at this point, the event loop
> > > can just call into the seccomp logic without any arguments; it can
> > > just call NOTIF_RECV one more time, see the -ENOTCONN, and terminate.
> > > The downside is that there's one more error code userspace has to
> > > special-case.
> > > This would be more consistent with what we'd be doing in the blocking case.
> > >
> > > If NOTIF_RECV keeps returning -ENOENT, the event loop has to also tell
> > > the seccomp logic what the revents are.
> > >
> > > I guess it probably doesn't really matter much.
> >
> > So, in practice, if you're emulating a blocking syscall (such as open,
> > perf_event_open, or any of a number of other syscalls), you probably
> > have to do it on a separate thread in the supervisor because you want
> > to continue to be able to receive new notifications if any other process
> > generates a seccomp notification event that you need to handle.
> >
> > In addition to that, some of these syscalls are preemptible, so you need
> > to poll SECCOMP_IOCTL_NOTIF_ID_VALID to make sure that the program
> > under supervision hasn't left the syscall.
> >
> > If we're to implement a mechanism that makes the seccomp ioctl receive
> > non-blocking, it would be valuable to address this problem as well (getting
> > a notification when the supervisor is processing a syscall and needs to
> > preempt it). In the best case, this can be a minor inconvenience, and
> > in the worst case this can result in weird errors where you're keeping
> > resources open that the container expects to be closed.
>
> Does "a notification" mean signals?
> Or would you want to have a second
> thread in userspace that poll()s for cancellation events on the
> seccomp fd and then somehow takes care of interrupting the first
> thread, or something like that?

I would be reluctant to be prescriptive in that it be a signal. Right now,
it's implemented as a second thread in userspace that does an ioctl(...)
and checks if the notification is valid / alive, and does what's required
if the notification has died (interrupting the first thread).

>
> Either way, I think your proposal goes beyond the scope of patching
> the existing weirdness, and should be a separate patch.

I agree it should be a separate patch, but I think that it'd be nice if there
was a way to do something like:
* opt-in to getting another message after receiving the notification
  that indicates the program has left the syscall
* when you do the RECV, you can specify a flag or some such asking
  that you get signaled / notified about the program leaving the syscall
* a multiplexed receive that can say if an existing notification in progress
  has left the valid state.

---

The reason I bring this up as part of this current thread / discussion is that
I think that they may be related in terms of how we want the behaviour to act.

I would love to hear how people think this should work, or better suggestions
than the second thread approach above, or the alternative approach of
polling all the notifications in progress on some interval [and relying
on epoll timeout to trigger that interval].
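
For concreteness, here is a rough sketch of the "poll all the notifications
in progress" approach. It is only an illustration, not what any existing
supervisor does: struct in_flight and both function names are made up for
this example, and the only kernel interface it relies on is
SECCOMP_IOCTL_NOTIF_ID_VALID, which returns 0 while the notification is
still live and fails with ENOENT once the target has left the syscall.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

/* Hypothetical bookkeeping for one notification the supervisor is handling. */
struct in_flight {
    __u64 id;        /* id taken from struct seccomp_notif */
    int   cancelled; /* set once the target has left the syscall */
};

/*
 * Returns 1 if the notification is still valid, 0 if the target has left
 * the syscall (ENOENT), and -1 on any other error (errno is left as set
 * by ioctl()).
 */
int notif_still_valid(int notify_fd, __u64 id)
{
    if (ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_ID_VALID, &id) == 0)
        return 1;
    return errno == ENOENT ? 0 : -1;
}

/*
 * Checks every request that is not yet marked cancelled and marks the ones
 * whose notification has died. Returns how many were newly marked, or -1
 * if an unexpected error occurred.
 */
int sweep_in_flight(int notify_fd, struct in_flight *reqs, size_t n)
{
    int newly_dead = 0;

    for (size_t i = 0; i < n; i++) {
        if (reqs[i].cancelled)
            continue;

        int r = notif_still_valid(notify_fd, reqs[i].id);
        if (r < 0)
            return -1;
        if (r == 0) {
            reqs[i].cancelled = 1;
            newly_dead++;
        }
    }
    return newly_dead;
}
```

The sweep could be run from the second thread, or from the event loop
whenever epoll_wait() returns 0 because its timeout expired. How a handler
thread that is blocked on behalf of a cancelled request actually gets
interrupted is deliberately left out, since that is the open question here.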