From: Jann Horn
Date: Fri, 22 Jun 2018 03:28:24 +0200
Subject: Re: [PATCH v4 1/4] seccomp: add a return code to trap to userspace
To: Tycho Andersen
Cc: Kees Cook, kernel list, containers@lists.linux-foundation.org,
    Linux API, Andy Lutomirski, Oleg Nesterov, "Eric W. Biederman",
    "Serge E. Hallyn", Christian Brauner, Tyler Hicks,
    suda.akihiro@lab.ntt.co.jp, "Tobin C. Harding"
References: <20180621220416.5412-1-tycho@tycho.ws>
    <20180621220416.5412-2-tycho@tycho.ws> <20180622005829.GK3992@cisco>
In-Reply-To: <20180622005829.GK3992@cisco>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jun 22, 2018 at 2:58 AM Tycho Andersen wrote:
>
> On Fri, Jun 22, 2018 at 01:21:47AM +0200, Jann Horn wrote:
> > On Fri, Jun 22, 2018 at 12:05 AM Tycho Andersen wrote:
> > >
> > > This patch introduces a means for syscalls matched in seccomp to notify
> > > some other task that a particular filter has been triggered.
> > [...]
> > > +Userspace Notification
> > > +======================
> > > +
> > > +The ``SECCOMP_RET_USER_NOTIF`` return code lets seccomp filters pass a
> > > +particular syscall to userspace to be handled. This may be useful for
> > > +applications like container managers, which whish to intercept particular
> >
> > typo: "wish"
> >
> > [...]
> > > +passed around via ``SCM_RIGHTS`` or similar. Alternativley, a filter fd can be
> >
> > typo: "Alternatively"
> >
> > [...]
> > > +It is worth noting that ``struct seccomp_data`` contains the values of register
> > > +arguments to the syscall, but does not contain pointers to memory. The task's
> > > +memory is accessiable to suitably privileged traces via via ``ptrace()`` or
> >
> > Typo: "accessible"
>
> Thanks!
>
> > [...]
> > > +
> > > +static void seccomp_do_user_notification(int this_syscall,
> > > +					 struct seccomp_filter *match,
> > > +					 const struct seccomp_data *sd)
> > > +{
> > > +	int err;
> > > +	long ret = 0;
> > > +	struct seccomp_knotif n = {};
> > > +
> > > +	mutex_lock(&match->notify_lock);
> > > +	err = -ENOSYS;
> > > +	if (!match->has_listener)
> > > +		goto out;
> > > +
> > > +	n.pid = task_pid(current);
> > > +	n.state = SECCOMP_NOTIFY_INIT;
> > > +	n.data = sd;
> > > +	n.id = seccomp_next_notify_id(match);
> > > +	init_completion(&n.ready);
> > > +
> > > +	list_add(&n.list, &match->notifications);
> > > +	wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);
> > > +
> > > +	mutex_unlock(&match->notify_lock);
> > > +	up(&match->request);
> > > +
> > > +	err = wait_for_completion_interruptible(&n.ready);
> > > +	mutex_lock(&match->notify_lock);
> > > +
> > > +	/*
> > > +	 * Here it's possible we got a signal and then had to wait on the mutex
> > > +	 * while the reply was sent, so let's be sure there wasn't a response
> > > +	 * in the meantime.
> > > +	 */
> > > +	if (err < 0 && n.state != SECCOMP_NOTIFY_REPLIED) {
> > > +		/*
> > > +		 * We got a signal. Let's tell userspace about it (potentially
> > > +		 * again, if we had already notified them about the first one).
> > > +		 */
> > > +		if (n.state == SECCOMP_NOTIFY_SENT) {
> > > +			n.state = SECCOMP_NOTIFY_INIT;
> > > +			up(&match->request);
> > > +		}
> > > +		mutex_unlock(&match->notify_lock);
> > > +		err = wait_for_completion_killable(&n.ready);
> >
> > Does this mean that when you get a signal that isn't SIGKILL,
> > wait_for_completion_interruptible() will bail out with -ERESTARTSYS,
> > but then you hang on this wait_for_completion_killable()? I don't
> > understand what's going on here. What's the point of using
> > wait_for_completion_interruptible() when you'll just hang on another
> > wait on the same "struct completion"?
>
> This is the implementation of this suggestion by Andy:
> https://lkml.org/lkml/2018/3/15/1122
>
> The idea is to alert the listener that there was a signal exactly
> once, in case it's in the middle of processing a request it could bail
> out and do something else. So the killable wait is intended to ignore
> other (non-fatal) signals after the first one and wait for whatever
> the handler decides to do with the signal it received.

How can the listener tell that a signal arrived? When the first
non-fatal signal comes in, you just set the state to
SECCOMP_NOTIFY_INIT if it was SECCOMP_NOTIFY_SENT, right? So the
listener will potentially see the request twice, but with no
additional indicator that a signal arrived? And in particular, if the
listener doesn't read the request before the signal arrives, it will
only see the request once, just as if it was a normal request with no
signals involved?

Would it perhaps make sense to add a field to struct seccomp_notif
that indicates whether the notification is for a normal syscall or a
canceled syscall?
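
To make that last question concrete, here is a rough userspace model of
what I have in mind. It is only a sketch, not kernel code: the `flags`
field, the NOTIF_FLAG_SIGNALED name, and the knotif_signal() and
knotif_receive() helpers are all made up for illustration and are not
part of the patch. It mirrors the INIT/SENT/REPLIED states from the
quoted code and shows how the listener could distinguish both cases
(signal after the first read, and signal before the first read).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the three notification states used in the quoted patch. */
enum notify_state { NOTIFY_INIT, NOTIFY_SENT, NOTIFY_REPLIED };

/* Hypothetical flag; not part of the patch under review. */
#define NOTIF_FLAG_SIGNALED 0x1u

/* Kernel-side bookkeeping, reduced to what this model needs. */
struct knotif {
	uint64_t id;
	enum notify_state state;
	bool signaled;		/* hypothetical: a signal hit the waiting task */
};

/* What the listener reads; `flags` is the hypothetical extra field. */
struct notif {
	uint64_t id;
	uint32_t flags;
};

/*
 * First non-fatal signal: remember it, and requeue the notification if
 * the listener has already read it. Returns true if a requeue happened,
 * i.e. where the patch would call up(&match->request) again.
 */
static bool knotif_signal(struct knotif *n)
{
	if (n->state == NOTIFY_REPLIED || n->signaled)
		return false;
	n->signaled = true;
	if (n->state == NOTIFY_SENT) {
		n->state = NOTIFY_INIT;
		return true;
	}
	return false;
}

/* Listener read: copy out the notification and mark it as sent. */
static bool knotif_receive(struct knotif *n, struct notif *out)
{
	if (n->state != NOTIFY_INIT)
		return false;
	out->id = n->id;
	out->flags = n->signaled ? NOTIF_FLAG_SIGNALED : 0;
	n->state = NOTIFY_SENT;
	return true;
}

/* Walks through both orderings discussed above. */
static void notif_example(void)
{
	struct knotif a = { .id = 1, .state = NOTIFY_INIT, .signaled = false };
	struct knotif b = { .id = 2, .state = NOTIFY_INIT, .signaled = false };
	struct notif out = { 0, 0 };

	/* Case 1: the listener reads first, then the signal arrives. */
	assert(knotif_receive(&a, &out) && out.flags == 0);
	assert(knotif_signal(&a));		/* requeued */
	assert(knotif_receive(&a, &out));
	assert(out.flags & NOTIF_FLAG_SIGNALED);

	/* Case 2: the signal arrives before the listener reads anything. */
	assert(!knotif_signal(&b));		/* still queued, no requeue */
	assert(knotif_receive(&b, &out));
	assert(out.flags & NOTIF_FLAG_SIGNALED);
}
```

With something like this, the second case no longer looks like a plain
request to the listener, and in the first case the second delivery is
clearly marked as a cancellation rather than a duplicate.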