Date: Thu, 28 May 2020 11:57:42 +0200
From: Christian Brauner
To: Jann Horn
Cc: kernel list, Kees Cook, Andy Lutomirski, Tycho Andersen, Matt Denton,
    Sargun Dhillon, Chris Palmer, Aleksa Sarai, Robert Sesek,
    Jeffrey Vander Stoep, Linux Containers
Subject: Re: [PATCH 1/2] seccomp: notify user trap about unused filter
Message-ID: <20200528095742.cjwemtucwgvhxnxv@wittgenstein>
References: <20200527111902.163213-1-christian.brauner@ubuntu.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 28, 2020 at 06:04:48AM +0200, Jann Horn wrote:
> On Wed, May 27, 2020 at 1:19 PM Christian Brauner wrote:
> > We've been making heavy use of the seccomp notifier to intercept and
> > handle certain syscalls for containers.
> > This patch allows a syscall supervisor listening on a given notifier to be
> > notified when a seccomp filter has become unused.
> [...]
> > To fix this, we introduce a new "live" reference counter that tracks the
> > live tasks making use of a given filter and when a notifier is
> > registered waiting tasks will be notified that the filter is now empty
> > by receiving a (E)POLLHUP event.
> > The concept this patch introduces is the same as for signal_struct,
> > i.e. reference counting for life-cycle management is decoupled from
> > reference counting live tasks using the object.
> [...]
> > + * @live: tasks that actually use this filter, only to be altered
> > + *        during fork(), exit()/free_task(), and filter installation
>
> This comment is a bit off. Actually, @live counts the number of tasks
> that use the filter directly plus the number of dependent filters that
> have non-zero @live.

I'll update the comment.

>
> [...]
> > +void seccomp_filter_notify(const struct task_struct *tsk)
> > +{
> > +	struct seccomp_filter *orig = tsk->seccomp.filter;
> > +
> > +	while (orig && refcount_dec_and_test(&orig->live)) {
> > +		if (waitqueue_active(&orig->wqh))
> > +			wake_up_poll(&orig->wqh, EPOLLHUP);
> > +		orig = orig->prev;
> > +	}
> > +}
>
> /me fetches the paint bucket
>
> Maybe name this seccomp_filter_unuse() or
> seccomp_filter_unuse_notify() or something like that? The current name
> isn't very descriptive.

I think seccomp_filter_release() might be the right color. It would also
line up nicely with:

- cgroup_release()
- exit_mm_release()
- exec_mm_release()
- futex_exec_release()
- ptrace_release_task()

and others.

Christian
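
For anyone following the thread, the @live accounting discussed above can be
modelled in plain user-space C. This is only a rough sketch, not kernel code:
struct filter, filter_fork(), filter_install(), filter_release() and
run_scenario() are hypothetical names made up for illustration, a plain int
stands in for refcount_t, and setting hup_sent stands in for
wake_up_poll(..., EPOLLHUP). The lifetime refcount is left out entirely,
since the point is that it is tracked separately from @live.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space model; NOT the kernel's struct seccomp_filter. */
struct filter {
	const char *name;
	int live;		/* direct users + dependent filters with live > 0 */
	int hup_sent;		/* stands in for wake_up_poll(..., EPOLLHUP) */
	struct filter *prev;	/* filter this one was stacked on, or NULL */
};

/* fork(): the child task uses the same leaf filter as its parent. */
static void filter_fork(struct filter *leaf)
{
	leaf->live++;
}

/*
 * A task stacks @child on top of @parent. The task's direct use of @parent
 * is replaced by @child depending on it, so @parent's count is unchanged.
 */
static void filter_install(struct filter *child, struct filter *parent)
{
	child->prev = parent;
	child->live = 1;
}

/*
 * A task using @leaf exits. Walk up the chain, dropping one live count per
 * filter, and stop at the first filter that still has users. Returns the
 * number of filters that became unused.
 */
static int filter_release(struct filter *leaf)
{
	int unused = 0;

	for (struct filter *f = leaf; f && --f->live == 0; f = f->prev) {
		f->hup_sent = 1;
		printf("filter %s is now unused\n", f->name);
		unused++;
	}
	return unused;
}

/* Two tasks share A; the second one stacks B on top and exits first. */
static int run_scenario(void)
{
	struct filter a = { .name = "A", .live = 1 };	/* task 1 installs A */
	struct filter b = { .name = "B" };
	int unused = 0;

	filter_fork(&a);		/* task 2 inherits A: a.live == 2 */
	filter_install(&b, &a);		/* task 2 stacks B:   a.live == 2 */
	unused += filter_release(&b);	/* task 2 exits: B unused, A still live */
	assert(b.hup_sent && !a.hup_sent && a.live == 1);
	unused += filter_release(&a);	/* task 1 exits: A unused as well */
	assert(a.hup_sent && a.live == 0);
	return unused;
}
```

Calling run_scenario() from a small main() walks through the case Jann
describes: A's count includes both the task using it directly and the
dependent filter B, so A is only reported unused after B has gone and the
last direct user has exited.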