References: <45f07f17-18b6-d187-0914-6f341fe90857@gmail.com> <20200930150330.GC284424@cisco> <8bcd956f-58d2-d2f0-ca7c-0a30f3fcd5b8@gmail.com> <20200930230327.GA1260245@cisco> <20200930232456.GB1260245@cisco> <656a37b5-75e3-0ded-6ba8-3bb57b537b24@gmail.com>
In-Reply-To: <656a37b5-75e3-0ded-6ba8-3bb57b537b24@gmail.com>
From: Jann Horn
Date: Mon, 26 Oct 2020 16:54:05 +0100
Subject: Re: For review: seccomp_user_notif(2) manual page
To: "Michael Kerrisk (man-pages)"
Cc: Tycho Andersen, Sargun Dhillon, Kees Cook, Christian Brauner, linux-man, lkml, Aleksa Sarai, Alexei Starovoitov, Will Drewry, bpf, Song Liu, Daniel Borkmann, Andy Lutomirski, Linux Containers, Giuseppe Scrivano, Robert Sesek
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Oct 25, 2020 at 5:32 PM Michael Kerrisk (man-pages) wrote:
> On 10/1/20 4:14 AM, Jann Horn wrote:
> > On Thu, Oct 1, 2020 at 3:52 AM Jann Horn wrote:
> >> On Thu, Oct 1, 2020 at 1:25 AM Tycho Andersen wrote:
> >>> On Thu, Oct 01, 2020 at 01:11:33AM +0200, Jann Horn wrote:
> >>>> On Thu, Oct 1, 2020 at 1:03 AM Tycho Andersen wrote:
> >>>>> On Wed, Sep 30, 2020 at 10:34:51PM +0200, Michael Kerrisk (man-pages) wrote:
> >>>>>> On 9/30/20 5:03 PM, Tycho Andersen wrote:
> >>>>>>> On Wed, Sep 30, 2020 at 01:07:38PM +0200, Michael Kerrisk (man-pages) wrote:
> >>>>>>>> ┌────────────────────────────────────────────────────┐
> >>>>>>>> │FIXME                                               │
> >>>>>>>> ├────────────────────────────────────────────────────┤
> >>>>>>>> │From my experiments, it appears that if a           │
> >>>>>>>> │SECCOMP_IOCTL_NOTIF_RECV is done after the target   │
> >>>>>>>> │process terminates, then the ioctl() simply blocks  │
> >>>>>>>> │(rather than returning an error to indicate that the│
> >>>>>>>> │target process no longer exists).                   │
> >>>>>>>
> >>>>>>> Yeah, I think Christian wanted to fix this at some point,
> >>>>>>
> >>>>>> Do you have a pointer that discussion? I could not find it with a
> >>>>>> quick search.
> >>>>>>
> >>>>>>> but it's a
> >>>>>>> bit sticky to do.
> >>>>>>
> >>>>>> Can you say a few words about the nature of the problem?
> >>>>>
> >>>>> I remembered wrong, it's actually in the tree: 99cdb8b9a573 ("seccomp:
> >>>>> notify about unused filter"). So maybe there's a bug here?
> >>>>
> >>>> That thing only notifies on ->poll, it doesn't unblock ioctls; and
> >>>> Michael's sample code uses SECCOMP_IOCTL_NOTIF_RECV to wait. So that
> >>>> commit doesn't have any effect on this kind of usage.
> >>>
> >>> Yes, thanks. And the ones stuck in RECV are waiting on a semaphore so
> >>> we don't have a count of all of them, unfortunately.
> >>>
> >>> We could maybe look inside the wait_list, but that will probably make
> >>> people angry :)
> >>
> >> The easiest way would probably be to open-code the semaphore-ish part,
> >> and let the semaphore and poll share the waitqueue. The current code
> >> kind of mirrors the semaphore's waitqueue in the wqh - open-coding the
> >> entire semaphore would IMO be cleaner than that. And it's not like
> >> semaphore semantics are even a good fit for this code anyway.
> >>
> >> Let's see... if we didn't have the existing UAPI to worry about, I'd
> >> do it as follows (*completely* untested). That way, the ioctl would
> >> block exactly until either there actually is a request to deliver or
> >> there are no more users of the filter. The problem is that if we just
> >> apply this patch, existing users of SECCOMP_IOCTL_NOTIF_RECV that use
> >> an event loop and don't set O_NONBLOCK will be screwed. So we'd
> >> probably also have to add some stupid counter in place of the
> >> semaphore's counter that we can use to preserve the old behavior of
> >> returning -ENOENT once for each cancelled request. :(
> >>
> >> I guess this is a nice point in favor of Michael's usual complaint
> >> that if there are no man pages for a feature by the time the feature
> >> lands upstream, there's a higher chance that the UAPI will suck
> >> forever...
> >
> > And I guess this would be the UAPI-compatible version - not actually
> > as terrible as I thought it might be. Do y'all want this? If so, feel
> > free to either turn this into a proper patch with Co-developed-by, or
> > tell me that I should do it and I'll try to get around to turning it
> > into something proper.
>
> Thanks for taking a shot at this.
>
> I tried applying the patch below to vanilla 5.9.0.
> (There's one typo: s/ENOTCON/ENOTCONN).
>
> It seems not to work though; when I send a signal to my test
> target process that is sleeping waiting for the notification
> response, the process enters the uninterruptible D state.
> Any thoughts?

Ah, yeah, I think I was completely misusing the wait API. I'll go
change that.

(Btw, in general, for reports about hangs like that, it can be helpful
to have the contents of /proc/$pid/stack. And for cases where CPUs are
spinning, the relevant part from the output of the "L" sysrq, or
something like that.)

Also, I guess we can probably break this part of UAPI after all, since
the only user of this interface seems to currently be completely
broken in this case anyway? So I think we want the other
implementation without the ->canceled_reqs logic after all.

I'm a bit on the fence now on whether non-blocking mode should use
ENOTCONN or not... I guess if we returned ENOENT even when there are
no more listeners, you'd have to disambiguate through the poll()
revents, which would be kinda ugly?

I'll try to turn this into a proper patch submission...
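
For illustration, here is roughly what "disambiguate through the poll()
revents" would look like on the supervisor side. This is only a sketch
under assumptions: it assumes that the ->poll notification from
99cdb8b9a573 shows up as POLLHUP on the notify fd once no task uses
the filter any more, and the enum and the function name
wait_for_notification() are made up for this example, not part of any
kernel or libc API.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical result codes for this sketch; not a kernel API. */
enum notif_wait {
    NOTIF_READY,        /* POLLIN: a notification can be fetched */
    NOTIF_TARGET_GONE,  /* POLLHUP: assumed to mean the filter is unused */
    NOTIF_TIMEOUT,      /* nothing happened within timeout_ms */
    NOTIF_ERROR         /* poll() failed, or POLLERR/POLLNVAL was set */
};

/*
 * Wait on the notify fd with poll() instead of blocking directly in
 * ioctl(SECCOMP_IOCTL_NOTIF_RECV), so that the caller can tell
 * "request pending" apart from "nobody is left to send requests".
 */
enum notif_wait wait_for_notification(int notify_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = notify_fd, .events = POLLIN };
    int n;

    /* Restart the wait if a signal interrupts it. */
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);

    if (n < 0)
        return NOTIF_ERROR;
    if (n == 0)
        return NOTIF_TIMEOUT;
    if (pfd.revents & POLLIN)
        return NOTIF_READY;        /* caller would now issue the RECV ioctl */
    if (pfd.revents & POLLHUP)
        return NOTIF_TARGET_GONE;  /* caller should stop its event loop */
    return NOTIF_ERROR;
}
```

A caller would only issue ioctl(fd, SECCOMP_IOCTL_NOTIF_RECV, &req) after
NOTIF_READY and would leave its loop on NOTIF_TARGET_GONE. Having to do
this extra check in every supervisor is the "kinda ugly" part; a distinct
error from the ioctl itself would make it unnecessary.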