Date: Thu, 1 Oct 2020 19:05:43 +0200
From: Christian Brauner
To: Jann Horn
Cc: "Michael Kerrisk (man-pages)", linux-man, Song Liu, Will Drewry,
    Kees Cook, Daniel Borkmann, Giuseppe Scrivano, Robert Sesek,
    Linux Containers, lkml, Alexei Starovoitov, bpf, Andy Lutomirski,
    Christian Brauner
Subject: Re: For review: seccomp_user_notif(2) manual page
Message-ID: <20201001170501.7umqgtfdx6jenkla@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 01, 2020 at 05:47:54PM +0200, Jann Horn wrote:
> On Thu, Oct 1, 2020 at 2:54 PM Christian Brauner
> wrote:
> > On Wed, Sep 30, 2020 at 05:53:46PM +0200, Jann Horn via Containers wrote:
> > > On Wed, Sep 30, 2020 at 1:07 PM Michael Kerrisk (man-pages)
> > > wrote:
> > > > NOTES
> > > >        The file descriptor returned when seccomp(2) is employed with the
> > > >        SECCOMP_FILTER_FLAG_NEW_LISTENER flag can be monitored using
> > > >        poll(2), epoll(7), and select(2).  When a notification is
> > > >        pending, these interfaces indicate that the file descriptor is
> > > >        readable.
> > >
> > > We should probably also point out somewhere that, as
> > > include/uapi/linux/seccomp.h says:
> > >
> > >  * Similar precautions should be applied when stacking SECCOMP_RET_USER_NOTIF
> > >  * or SECCOMP_RET_TRACE. For SECCOMP_RET_USER_NOTIF filters acting on the
> > >  * same syscall, the most recently added filter takes precedence. This means
> > >  * that the new SECCOMP_RET_USER_NOTIF filter can override any
> > >  * SECCOMP_IOCTL_NOTIF_SEND from earlier filters, essentially allowing all
> > >  * such filtered syscalls to be executed by sending the response
> > >  * SECCOMP_USER_NOTIF_FLAG_CONTINUE. Note that SECCOMP_RET_TRACE can equally
> > >  * be overriden by SECCOMP_USER_NOTIF_FLAG_CONTINUE.
> > >
> > > In other words, from a security perspective, you must assume that the
> > > target process can bypass any SECCOMP_RET_USER_NOTIF (or
> > > SECCOMP_RET_TRACE) filters unless it is completely prohibited from
> > > calling seccomp(). This should also be noted over in the main
> > > seccomp(2) manpage, especially the SECCOMP_RET_TRACE part.
> >
> > So I was actually wondering about this when I skimmed this and a while
> > ago but forgot about this again... Afaict, you can only ever load a
> > single filter with SECCOMP_FILTER_FLAG_NEW_LISTENER set. If there
> > already is a filter with the SECCOMP_FILTER_FLAG_NEW_LISTENER property
> > in the tasks filter hierarchy then the kernel will refuse to load a new
> > one?
> >
> > static struct file *init_listener(struct seccomp_filter *filter)
> > {
> > 	struct file *ret = ERR_PTR(-EBUSY);
> > 	struct seccomp_filter *cur;
> >
> > 	for (cur = current->seccomp.filter; cur; cur = cur->prev) {
> > 		if (cur->notif)
> > 			goto out;
> > 	}
> >
> > shouldn't that be sufficient to guarantee that USER_NOTIF filters can't
> > override each other for the same task simply because there can only ever
> > be a single one?
>
> Good point.
> Exceeeept that that check seems ineffective because this
> happens before we take the locks that guard against TSYNC, and also
> before we decide to which existing filter we want to chain the new
> filter. So if two threads race with TSYNC, I think they'll be able to
> chain two filters with listeners together.

That's a bug, imho. I don't have source code in front of me right now
though.

>
> I don't know whether we want to eternalize this "only one listener
> across all the filters" restriction in the manpage though, or whether
> the man page should just say that the kernel currently doesn't support
> it but that security-wise you should assume that it might at some
> point.

Maybe. I would argue that it might be worth having at least a new
flag/option to indicate either "This is a non-overridable filter." or, at
least for the seccomp notifier, an option to indicate that no other
notifier can be installed.

Christian
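
For illustration only, here is a minimal userspace sketch of the suspected
check-then-attach problem discussed above. It is not kernel code and it does
not model TSYNC or any real locking: struct model_filter and every function
name below are hypothetical stand-ins, and the "race" is replayed
sequentially as one possible interleaving in which both callers run the
listener check before either one chains its filter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct seccomp_filter: only the two fields
 * that the quoted init_listener() loop looks at. */
struct model_filter {
	const struct model_filter *prev;	/* next older filter in the chain */
	bool notif;				/* true if a listener is attached */
};

/* Same walk as the quoted loop: newest to oldest, stop at a listener. */
bool chain_has_listener(const struct model_filter *head)
{
	for (const struct model_filter *cur = head; cur; cur = cur->prev)
		if (cur->notif)
			return true;
	return false;
}

/* Count the filters in the chain that carry a listener. */
int count_listeners(const struct model_filter *head)
{
	int n = 0;

	for (const struct model_filter *cur = head; cur; cur = cur->prev)
		if (cur->notif)
			n++;
	return n;
}

/* Assumed interleaving: both callers check before either one attaches,
 * i.e. the check is not covered by whatever serializes the attach. */
int listeners_after_unlocked_race(void)
{
	struct model_filter base = { NULL, false };
	struct model_filter a = { NULL, true }, b = { NULL, true };
	const struct model_filter *head = &base;

	bool a_ok = !chain_has_listener(head);	/* caller A checks */
	bool b_ok = !chain_has_listener(head);	/* caller B checks */

	if (a_ok) { a.prev = head; head = &a; }	/* caller A attaches */
	if (b_ok) { b.prev = head; head = &b; }	/* caller B attaches */

	return count_listeners(head);
}

/* Same two callers, but check and attach happen as one step each. */
int listeners_after_serialized(void)
{
	struct model_filter base = { NULL, false };
	struct model_filter a = { NULL, true }, b = { NULL, true };
	const struct model_filter *head = &base;

	if (!chain_has_listener(head)) { a.prev = head; head = &a; }
	if (!chain_has_listener(head)) { b.prev = head; head = &b; }

	return count_listeners(head);
}
```

In this model the unserialized variant ends up with two listener filters in
one chain, while the variant that does check and attach as a single step
keeps it at one. Whether the kernel can actually hit the first interleaving
would still need to be confirmed against the source.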