Date: Thu, 1 Oct 2020 19:12:06 +0200
From: Christian Brauner
To: Tycho Andersen
Cc: Jann Horn, linux-man, Song Liu, Will Drewry, Kees Cook,
    Daniel Borkmann, Giuseppe Scrivano, Robert Sesek, Linux Containers,
    lkml, Alexei Starovoitov, "Michael Kerrisk (man-pages)", bpf,
    Andy Lutomirski, Christian Brauner
Subject: Re: For review: seccomp_user_notif(2) manual page
Message-ID: <20201001171206.jvkdx4htqux5agdv@gmail.com>
References: <45f07f17-18b6-d187-0914-6f341fe90857@gmail.com>
    <20201001125043.dj6taeieatpw3a4w@gmail.com>
    <20201001165850.GC1260245@cisco>
In-Reply-To: <20201001165850.GC1260245@cisco>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 01, 2020 at 10:58:50AM -0600, Tycho Andersen wrote:
> On Thu, Oct 01, 2020 at 05:47:54PM +0200, Jann Horn via Containers wrote:
> > On Thu, Oct 1, 2020 at 2:54 PM Christian Brauner wrote:
> > > On Wed, Sep 30, 2020 at 05:53:46PM +0200, Jann Horn via Containers wrote:
> > > > On Wed, Sep 30, 2020 at 1:07 PM Michael Kerrisk (man-pages) wrote:
> > > > > NOTES
> > > > > The file descriptor returned when seccomp(2) is employed with the
> > > > > SECCOMP_FILTER_FLAG_NEW_LISTENER flag can be monitored using
> > > > > poll(2), epoll(7), and select(2).
> > > > > When a notification is pending, these interfaces indicate that the
> > > > > file descriptor is readable.
> > > >
> > > > We should probably also point out somewhere that, as
> > > > include/uapi/linux/seccomp.h says:
> > > >
> > > >  * Similar precautions should be applied when stacking SECCOMP_RET_USER_NOTIF
> > > >  * or SECCOMP_RET_TRACE. For SECCOMP_RET_USER_NOTIF filters acting on the
> > > >  * same syscall, the most recently added filter takes precedence. This means
> > > >  * that the new SECCOMP_RET_USER_NOTIF filter can override any
> > > >  * SECCOMP_IOCTL_NOTIF_SEND from earlier filters, essentially allowing all
> > > >  * such filtered syscalls to be executed by sending the response
> > > >  * SECCOMP_USER_NOTIF_FLAG_CONTINUE. Note that SECCOMP_RET_TRACE can equally
> > > >  * be overriden by SECCOMP_USER_NOTIF_FLAG_CONTINUE.
> > > >
> > > > In other words, from a security perspective, you must assume that the
> > > > target process can bypass any SECCOMP_RET_USER_NOTIF (or
> > > > SECCOMP_RET_TRACE) filters unless it is completely prohibited from
> > > > calling seccomp(). This should also be noted over in the main
> > > > seccomp(2) manpage, especially the SECCOMP_RET_TRACE part.
> > >
> > > So I was actually wondering about this when I skimmed this and a while
> > > ago but forgot about this again... Afaict, you can only ever load a
> > > single filter with SECCOMP_FILTER_FLAG_NEW_LISTENER set. If there
> > > already is a filter with the SECCOMP_FILTER_FLAG_NEW_LISTENER property
> > > in the tasks filter hierarchy then the kernel will refuse to load a new
> > > one?
> > >
> > > static struct file *init_listener(struct seccomp_filter *filter)
> > > {
> > > 	struct file *ret = ERR_PTR(-EBUSY);
> > > 	struct seccomp_filter *cur;
> > >
> > > 	for (cur = current->seccomp.filter; cur; cur = cur->prev) {
> > > 		if (cur->notif)
> > > 			goto out;
> > > 	}
> > >
> > > shouldn't that be sufficient to guarantee that USER_NOTIF filters can't
> > > override each other for the same task simply because there can only ever
> > > be a single one?
> >
> > Good point. Exceeeept that that check seems ineffective because this
> > happens before we take the locks that guard against TSYNC, and also
> > before we decide to which existing filter we want to chain the new
> > filter. So if two threads race with TSYNC, I think they'll be able to
> > chain two filters with listeners together.
>
> Yep, seems the check needs to also be in seccomp_can_sync_threads() to
> be totally effective,
>
> > I don't know whether we want to eternalize this "only one listener
> > across all the filters" restriction in the manpage though, or whether
> > the man page should just say that the kernel currently doesn't support
> > it but that security-wise you should assume that it might at some
> > point.
>
> This requirement originally came from Andy, arguing that the semantics
> of this were/are confusing, which still makes sense to me. Perhaps we
> should do something like the below?

I think we should either keep this restriction and then cement it in
the manpage, or add a flag to indicate that the notifier is
non-overridable. I don't care about the default too much, i.e. whether
it's overridable by default and exclusive if opting in or the other way
around doesn't matter too much. But from a supervisor's perspective it'd
be quite nice to be able to be sure that a notifier can't be overridden
by another notifier.

I think having a flag would provide the greatest flexibility but I agree
that the semantics of multiple listeners are kinda odd.
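
To make the two options a bit more concrete, here is a rough userspace
model of them. It is only a sketch and not kernel code: struct filter,
the "exclusive" bit, and both function names are made up for
illustration. has_listener stands in for cur->notif being set, and the
"exclusive" bit stands in for the hypothetical non-overridable flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a seccomp filter chain (not kernel code). */
struct filter {
	const struct filter *prev; /* older filter in the chain, or NULL */
	bool has_listener;         /* models cur->notif != NULL */
	bool exclusive;            /* hypothetical "non-overridable" flag */
};

/* Option 1: keep the restriction. Any existing listener blocks a new one. */
int attach_one_listener_only(const struct filter *chain)
{
	for (const struct filter *cur = chain; cur; cur = cur->prev) {
		if (cur->has_listener)
			return -EBUSY;
	}
	return 0;
}

/* Option 2: allow stacking unless an existing listener opted into exclusivity. */
int attach_unless_exclusive(const struct filter *chain)
{
	for (const struct filter *cur = chain; cur; cur = cur->prev) {
		/* In this model the flag is only meaningful on a listener filter. */
		assert(!cur->exclusive || cur->has_listener);
		if (cur->exclusive)
			return -EBUSY;
	}
	return 0;
}
```

With option 1 a second listener is always refused; with option 2 it is
refused only when an earlier listener asked for exclusivity, which is
the guarantee a supervisor would want to be able to rely on.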
Below looks sane to me, though again, I'm not sitting in front of the
source code.

Christian

> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 3ee59ce0a323..7b107207c2b0 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -376,6 +376,18 @@ static int is_ancestor(struct seccomp_filter *parent,
>  	return 0;
>  }
>
> +static bool has_listener_parent(struct seccomp_filter *child)
> +{
> +	struct seccomp_filter *cur;
> +
> +	for (cur = current->seccomp.filter; cur; cur = cur->prev) {
> +		if (cur->notif)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
>  /**
>   * seccomp_can_sync_threads: checks if all threads can be synchronized
>   *
> @@ -385,7 +397,7 @@ static int is_ancestor(struct seccomp_filter *parent,
>   * either not in the correct seccomp mode or did not have an ancestral
>   * seccomp filter.
>   */
> -static inline pid_t seccomp_can_sync_threads(void)
> +static inline pid_t seccomp_can_sync_threads(unsigned int flags)
>  {
>  	struct task_struct *thread, *caller;
>
> @@ -407,6 +419,11 @@ static inline pid_t seccomp_can_sync_threads(void)
>  				 caller->seccomp.filter)))
>  			continue;
>
> +		/* don't allow TSYNC to install multiple listeners */
> +		if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER &&
> +		    !has_listener_parent(thread->seccomp.filter))
> +			continue;
> +
>  		/* Return the first thread that cannot be synchronized. */
>  		failed = task_pid_vnr(thread);
>  		/* If the pid cannot be resolved, then return -ESRCH */
> @@ -637,7 +654,7 @@ static long seccomp_attach_filter(unsigned int flags,
>  	if (flags & SECCOMP_FILTER_FLAG_TSYNC) {
>  		int ret;
>
> -		ret = seccomp_can_sync_threads();
> +		ret = seccomp_can_sync_threads(flags);
>  		if (ret) {
>  			if (flags & SECCOMP_FILTER_FLAG_TSYNC_ESRCH)
>  				return -ESRCH;
> @@ -1462,12 +1479,9 @@ static const struct file_operations seccomp_notify_ops = {
>  static struct file *init_listener(struct seccomp_filter *filter)
>  {
>  	struct file *ret = ERR_PTR(-EBUSY);
> -	struct seccomp_filter *cur;
>
> -	for (cur = current->seccomp.filter; cur; cur = cur->prev) {
> -		if (cur->notif)
> -			goto out;
> -	}
> +	if (has_listener_parent(current->seccomp.filter))
> +		goto out;
>
>  	ret = ERR_PTR(-ENOMEM);
>  	filter->notif = kzalloc(sizeof(*(filter->notif)), GFP_KERNEL);
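
For reference, this is how I read the intent of the TSYNC hunk, written
as a small userspace model rather than kernel code. The assumption here
(mine, not something the patch states) is that a TSYNC install with
SECCOMP_FILTER_FLAG_NEW_LISTENER should be refused as soon as any thread
in the group already has a listener somewhere in its own filter chain.
struct thread_model and first_listener_conflict() are hypothetical names
used only for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these types exist in the kernel. */
struct filter {
	const struct filter *prev; /* older filter in the chain, or NULL */
	bool has_listener;         /* models cur->notif != NULL */
};

struct thread_model {
	int tid;                     /* stand-in for the thread's pid */
	const struct filter *chain;  /* that thread's newest filter */
};

/* Walk the chain that is passed in, newest filter first. */
bool chain_has_listener(const struct filter *chain)
{
	for (const struct filter *cur = chain; cur; cur = cur->prev) {
		if (cur->has_listener)
			return true;
	}
	return false;
}

/*
 * Return the tid of the first thread that would end up with two
 * listeners if a new listener filter were synced to it, or 0 if the
 * install can proceed. Without a new listener there is nothing to check.
 */
int first_listener_conflict(const struct thread_model *threads, size_t n,
			    bool new_listener)
{
	if (!new_listener)
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (chain_has_listener(threads[i].chain))
			return threads[i].tid;
	}
	return 0;
}
```

Note that the model walks each thread's own chain, whereas the quoted
has_listener_parent() walks current->seccomp.filter regardless of the
argument it is given.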