From: Jann Horn
Date: Thu, 1 Oct 2020 04:14:06 +0200
Subject: Re: For review: seccomp_user_notif(2) manual page
To: Tycho Andersen
Cc: "Michael Kerrisk (man-pages)", Sargun Dhillon, Kees Cook,
    Christian Brauner, linux-man, lkml, Aleksa Sarai, Alexei Starovoitov,
    Will Drewry, bpf, Song Liu, Daniel Borkmann, Andy Lutomirski,
    Linux Containers, Giuseppe Scrivano, Robert Sesek
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 1, 2020 at 3:52 AM Jann Horn wrote:
> On Thu, Oct 1, 2020 at 1:25 AM Tycho Andersen wrote:
> > On Thu, Oct 01, 2020 at 01:11:33AM +0200, Jann Horn wrote:
> > > On Thu, Oct 1, 2020 at 1:03 AM Tycho Andersen wrote:
> > > > On Wed, Sep 30, 2020 at 10:34:51PM +0200, Michael Kerrisk (man-pages) wrote:
> > > > > On 9/30/20 5:03 PM, Tycho Andersen wrote:
> > > > > > On Wed, Sep 30, 2020 at 01:07:38PM +0200, Michael Kerrisk (man-pages) wrote:
> > > > > >> ┌────────────────────────────────────────────────────┐
> > > > > >> │FIXME                                               │
> > > > > >> ├────────────────────────────────────────────────────┤
> > > > > >> │From my experiments, it appears that if a SEC‐      │
> > > > > >> │COMP_IOCTL_NOTIF_RECV is done after the target      │
> > > > > >> │process terminates, then the ioctl() simply blocks  │
> > > > > >> │(rather than returning an error to indicate that the│
> > > > > >> │target process no longer exists).                   │
> > > > > >
> > > > > > Yeah, I think Christian wanted to fix this at some point,
> > > > >
> > > > > Do you have a pointer that discussion? I could not find it with a
> > > > > quick search.
> > > > >
> > > > > > but it's a
> > > > > > bit sticky to do.
> > > > >
> > > > > Can you say a few words about the nature of the problem?
> > > >
> > > > I remembered wrong, it's actually in the tree: 99cdb8b9a573 ("seccomp:
> > > > notify about unused filter"). So maybe there's a bug here?
> > >
> > > That thing only notifies on ->poll, it doesn't unblock ioctls; and
> > > Michael's sample code uses SECCOMP_IOCTL_NOTIF_RECV to wait. So that
> > > commit doesn't have any effect on this kind of usage.
> >
> > Yes, thanks. And the ones stuck in RECV are waiting on a semaphore so
> > we don't have a count of all of them, unfortunately.
> >
> > We could maybe look inside the wait_list, but that will probably make
> > people angry :)
>
> The easiest way would probably be to open-code the semaphore-ish part,
> and let the semaphore and poll share the waitqueue. The current code
> kind of mirrors the semaphore's waitqueue in the wqh - open-coding the
> entire semaphore would IMO be cleaner than that. And it's not like
> semaphore semantics are even a good fit for this code anyway.
>
> Let's see... if we didn't have the existing UAPI to worry about, I'd
> do it as follows (*completely* untested). That way, the ioctl would
> block exactly until either there actually is a request to deliver or
> there are no more users of the filter. The problem is that if we just
> apply this patch, existing users of SECCOMP_IOCTL_NOTIF_RECV that use
> an event loop and don't set O_NONBLOCK will be screwed. So we'd
> probably also have to add some stupid counter in place of the
> semaphore's counter that we can use to preserve the old behavior of
> returning -ENOENT once for each cancelled request. :(
>
> I guess this is a nice point in favor of Michael's usual complaint
> that if there are no man pages for a feature by the time the feature
> lands upstream, there's a higher chance that the UAPI will suck
> forever...

And I guess this would be the UAPI-compatible version - not actually
as terrible as I thought it might be. Do y'all want this? If so, feel
free to either turn this into a proper patch with Co-developed-by, or
tell me that I should do it and I'll try to get around to turning it
into something proper.

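For supervisors that have to cope with kernels as they are today, one
possible workaround (a sketch only, not something proposed in this
thread) is to poll() the listener fd and call SECCOMP_IOCTL_NOTIF_RECV
only once poll() reports input, since 99cdb8b9a573 signals an unused
filter through ->poll only. The names wait_for_notif() and
enum notif_wait are made up for illustration, and the sketch assumes
that the "filter unused" case shows up as POLLHUP on the listener fd.
The demo uses a pipe as a stand-in for the listener fd so that it can
be run without installing a filter.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not a kernel or libc API. */
enum notif_wait {
	NOTIF_READY,	/* POLLIN: a request is waiting, RECV should not block */
	NOTIF_GONE,	/* POLLHUP/POLLERR/POLLNVAL: assumed "nobody left" */
	NOTIF_TIMEOUT,	/* nothing happened within timeout_ms */
	NOTIF_ERROR,	/* poll() itself failed, errno is set */
};

/* Wait on the listener fd with poll() instead of blocking in the ioctl. */
static enum notif_wait wait_for_notif(int notify_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = notify_fd, .events = POLLIN };
	int ret;

	assert(notify_fd >= 0);

	/* Restart after a signal; note that this also restarts the timeout. */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return NOTIF_ERROR;
	if (ret == 0)
		return NOTIF_TIMEOUT;
	if (pfd.revents & POLLIN)
		return NOTIF_READY;
	return NOTIF_GONE;
}

/*
 * Demo with a pipe standing in for the listener fd.
 * Returns 0 if every step behaved as expected, -1 otherwise.
 */
static int demo_with_pipe(void)
{
	int p[2];
	char c;

	if (pipe(p) < 0)
		return -1;
	if (wait_for_notif(p[0], 0) != NOTIF_TIMEOUT)	/* nothing queued */
		return -1;
	if (write(p[1], "x", 1) != 1 || wait_for_notif(p[0], 0) != NOTIF_READY)
		return -1;
	if (read(p[0], &c, 1) != 1)
		return -1;
	close(p[1]);					/* last "user" goes away */
	if (wait_for_notif(p[0], 0) != NOTIF_GONE)
		return -1;
	close(p[0]);
	return 0;
}
```

A supervisor would call wait_for_notif() on the real listener fd and
issue the RECV ioctl only for NOTIF_READY. That RECV can still fail
with -ENOENT if the request is canceled in between.
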
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 676d4af62103..d08c453fcc2c 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -138,7 +138,7 @@ struct seccomp_kaddfd {
  * @notifications: A list of struct seccomp_knotif elements.
  */
 struct notification {
-	struct semaphore request;
+	bool canceled_reqs;
 	u64 next_id;
 	struct list_head notifications;
 };
@@ -859,7 +859,6 @@ static int seccomp_do_user_notification(int this_syscall,
 	list_add(&n.list, &match->notif->notifications);
 	INIT_LIST_HEAD(&n.addfd);
 
-	up(&match->notif->request);
 	wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);
 	mutex_unlock(&match->notify_lock);
 
@@ -901,8 +900,20 @@ static int seccomp_do_user_notification(int this_syscall,
 	 * *reattach* to a notifier right now. If one is added, we'll need to
 	 * keep track of the notif itself and make sure they match here.
 	 */
-	if (match->notif)
+	if (match->notif) {
 		list_del(&n.list);
+
+		/*
+		 * We are stuck with a UAPI that requires that after a spurious
+		 * wakeup, SECCOMP_IOCTL_NOTIF_RECV must return immediately.
+		 * This is the tracking for that, keeping track of whether we
+		 * canceled a request after waking waiters, but before userspace
+		 * picked up the notification.
+		 */
+		if (n.state == SECCOMP_NOTIFY_INIT)
+			match->notif->canceled_reqs = true;
+	}
+
 out:
 	mutex_unlock(&match->notify_lock);
 
@@ -1178,6 +1189,7 @@ static long seccomp_notify_recv(struct seccomp_filter *filter,
 				void __user *buf)
 {
 	struct seccomp_knotif *knotif = NULL, *cur;
+	DECLARE_WAITQUEUE(wait, current);
 	struct seccomp_notif unotif;
 	ssize_t ret;
 
@@ -1190,11 +1202,9 @@ static long seccomp_notify_recv(struct seccomp_filter *filter,
 
 	memset(&unotif, 0, sizeof(unotif));
 
-	ret = down_interruptible(&filter->notif->request);
-	if (ret < 0)
-		return ret;
-
 	mutex_lock(&filter->notify_lock);
+
+retry:
 	list_for_each_entry(cur, &filter->notif->notifications, list) {
 		if (cur->state == SECCOMP_NOTIFY_INIT) {
 			knotif = cur;
@@ -1202,14 +1212,32 @@ static long seccomp_notify_recv(struct seccomp_filter *filter,
 		}
 	}
 
-	/*
-	 * If we didn't find a notification, it could be that the task was
-	 * interrupted by a fatal signal between the time we were woken and
-	 * when we were able to acquire the rw lock.
-	 */
 	if (!knotif) {
-		ret = -ENOENT;
-		goto out;
+		/* This has to happen before checking &filter->users. */
+		prepare_to_wait(&filter->wqh, &wait, TASK_INTERRUPTIBLE);
+
+		/*
+		 * If all users of the filter are gone, throw an error instead
+		 * of pointlessly continuing to block.
+		 */
+		if (refcount_read(&filter->users) == 0) {
+			ret = -ENOTCONN;
+			goto out;
+		}
+		if (filter->notif->canceled_reqs) {
+			ret = -ENOENT;
+			goto out;
+		} else {
+			/* No notifications pending - wait for one, then retry. */
+			mutex_unlock(&filter->notify_lock);
+			schedule();
+			mutex_lock(&filter->notify_lock);
+			if (signal_pending(current)) {
+				ret = -EINTR;
+				goto out;
+			}
+			goto retry;
+		}
 	}
 
 	unotif.id = knotif->id;
@@ -1220,6 +1248,8 @@ static long seccomp_notify_recv(struct seccomp_filter *filter,
 	wake_up_poll(&filter->wqh, EPOLLOUT | EPOLLWRNORM);
 	ret = 0;
 out:
+	filter->notif->canceled_reqs = false;
+	finish_wait(&filter->wqh, &wait);
 	mutex_unlock(&filter->notify_lock);
 
 	if (ret == 0 && copy_to_user(buf, &unotif, sizeof(unotif))) {
@@ -1233,10 +1263,8 @@ static long seccomp_notify_recv(struct seccomp_filter *filter,
 		 */
 		mutex_lock(&filter->notify_lock);
 		knotif = find_notification(filter, unotif.id);
-		if (knotif) {
+		if (knotif)
 			knotif->state = SECCOMP_NOTIFY_INIT;
-			up(&filter->notif->request);
-		}
 		mutex_unlock(&filter->notify_lock);
 	}
 
@@ -1485,7 +1513,6 @@ static struct file *init_listener(struct seccomp_filter *filter)
 	if (!filter->notif)
 		goto out;
 
-	sema_init(&filter->notif->request, 0);
 	filter->notif->next_id = get_random_u64();
 	INIT_LIST_HEAD(&filter->notif->notifications);
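
On the supervisor side, the results that SECCOMP_IOCTL_NOTIF_RECV could
produce under this untested proposal might be handled roughly as
below. This is only a sketch: recv_action_for() and enum recv_action
are hypothetical names, and the "not connected" error for a filter
without users is what the patch above would return, not what current
kernels do.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical actions for a supervisor loop; names invented for this sketch. */
enum recv_action {
	RECV_HANDLE,	/* a notification was copied out, go process it */
	RECV_RETRY,	/* nothing usable this time, call RECV again */
	RECV_STOP,	/* no task uses the filter any more, shut down */
	RECV_FAIL,	/* unexpected error, report it */
};

/*
 * Map the result of ioctl(fd, SECCOMP_IOCTL_NOTIF_RECV, &req) to an action.
 * ret is the ioctl() return value, err the errno value seen when ret == -1.
 */
static enum recv_action recv_action_for(int ret, int err)
{
	assert(ret == 0 || ret == -1);	/* ioctl() reports failure as -1 */

	if (ret == 0)
		return RECV_HANDLE;

	switch (err) {
	case EINTR:	/* interrupted by a signal while waiting */
	case ENOENT:	/* request was canceled before it was picked up */
		return RECV_RETRY;
	case ENOTCONN:	/* proposed patch only: all users of the filter are gone */
		return RECV_STOP;
	default:
		return RECV_FAIL;
	}
}
```

An event loop would call the ioctl, pass its return value and errno to
recv_action_for(), and leave the loop on RECV_STOP instead of blocking
again.
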