From: Jann Horn
Date: Mon, 26 Oct 2020 10:32:03 +0100
Subject: Re: For review: seccomp_user_notif(2) manual page
To: "Michael Kerrisk (man-pages)"
Cc: Tycho Andersen, Sargun Dhillon, Kees Cook, Christian Brauner, linux-man, lkml, Aleksa Sarai, Alexei Starovoitov, Will Drewry, bpf, Song Liu, Daniel Borkmann, Andy Lutomirski, Linux Containers, Giuseppe Scrivano, Robert Sesek
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Oct 24, 2020 at 2:53 PM Michael Kerrisk (man-pages) wrote:
> On 10/17/20 2:25 AM, Jann Horn wrote:
> > On Fri, Oct 16, 2020 at 8:29 PM Michael Kerrisk (man-pages)
> > wrote:
[...]
> >> I'm not sure if I should write anything about this small UAPI
> >> breakage in BUGS, or not. Your thoughts?
> >
> > Thinking about it a bit more: Any code that relies on pause() or
> > epoll_wait() not restarting is buggy anyway, right? Because a signal
> > could also arrive directly before entering the syscall, while
> > userspace code is still executing? So one could argue that we're just
> > enlarging a preexisting race.
> > (Unless the signal handler checks the
> > interrupted register state to figure out whether we already entered
> > syscall handling?)
>
> Yes, that all makes sense.
>
> > If userspace relies on non-restarting behavior, it should be using
> > something like epoll_pwait(). And that stuff only unblocks signals
> > after we've already passed the seccomp checks on entry.
>
> Thanks for elaborating that detail, since as soon as you talked
> about "enlarging a preexisting race" above, I immediately wondered
> about sigsuspend(), pselect(), etc.
>
> (Mind you, I still wonder about the effect on system calls that
> are normally nonrestartable because they have timeouts. My
> understanding is that the kernel doesn't restart those system
> calls because it's impossible for the kernel to restart the call
> with the right timeout value. I wonder what happens when those
> system calls are restarted in the scenario we're discussing.)

Ah, that's an interesting edge case...

> Anyway, returning to your point... So, to be clear (and to
> quickly remind myself in case I one day reread this thread),
> there is not a problem with sigsuspend(), pselect(), ppoll(),
> and epoll_pwait() since:
>
> * Before the syscall, signals are blocked in the target.
> * Inside the syscall, signals are still blocked at the time
>   the check is made for seccomp filters.
> * If a seccomp user-space notification event kicks in, the target
>   is put to sleep with the signals still blocked.
> * The signal will only get delivered after the supervisor either
>   triggers a spoofed success/failure return in the target or the
>   supervisor sends a CONTINUE response to the kernel telling it
>   to execute the target's system call. Either way, there won't be
>   any restarting of the target's system call (and the supervisor
>   thus won't see multiple notifications).
>
> (Right?)

Yeah.

[...]

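The bullet list above depends on a block-first, unblock-atomically pattern. The following is a minimal sketch of that pattern using sigsuspend(2); it does not come from the thread. The helper names prepare_wait() and wait_for_signal() and the got_signal flag are hypothetical. epoll_pwait(), ppoll() and pselect() achieve the same effect through their sigmask argument. Because the signal stays blocked until the waiting call itself unblocks it, a signal that arrives "too early" is held pending rather than lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t keeps the write async-signal-safe. */
static volatile sig_atomic_t got_signal = 0;
/* Signal mask that was in effect before prepare_wait() blocked `sig`. */
static sigset_t saved_mask;

static void on_signal(int sig)
{
    (void) sig;
    got_signal = 1;
}

/* Step 1 (hypothetical helper): install the handler, then block `sig`
 * before looking at any state, so it cannot slip in unnoticed. */
static int prepare_wait(int sig)
{
    struct sigaction sa;
    sigset_t block;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) == -1)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, sig);
    return sigprocmask(SIG_BLOCK, &block, &saved_mask);
}

/* Step 2 (hypothetical helper): sigsuspend() swaps in a mask with `sig`
 * unblocked and sleeps in one atomic step, so a signal that became
 * pending after prepare_wait() is delivered here instead of being lost. */
static int wait_for_signal(int sig)
{
    sigset_t wait_mask = saved_mask;
    sigdelset(&wait_mask, sig);

    while (!got_signal) {
        /* sigsuspend() always returns -1; EINTR means a handler ran. */
        if (sigsuspend(&wait_mask) == -1 && errno != EINTR)
            return -1;
    }
    /* Restore the caller's original mask. */
    return sigprocmask(SIG_SETMASK, &saved_mask, NULL);
}
```

In use, prepare_wait(SIGUSR1) can be followed by raise(SIGUSR1). At that point the handler has not run yet, because the signal is only pending. wait_for_signal(SIGUSR1) then returns as soon as the pending signal is delivered.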
> > So we should probably document the restarting behavior as something
> > the supervisor has to deal with in the manpage; but for the
> > "non-restarting syscalls can restart from the target's perspective"
> > aspect, it might be enough to document this as quirky behavior that
> > can't actually break correct code? (Or not document it at all. Dunno.)
>
> So, I've added the following to the page:
>
>    Interaction with SA_RESTART signal handlers
>        Consider the following scenario:
>
>        ·  The target process has used sigaction(2) to install a signal
>           handler with the SA_RESTART flag.
>
>        ·  The target has made a system call that triggered a seccomp
>           user-space notification and the target is currently blocked
>           until the supervisor sends a notification response.
>
>        ·  A signal is delivered to the target and the signal handler is
>           executed.
>
>        ·  When (if) the supervisor attempts to send a notification
>           response, the SECCOMP_IOCTL_NOTIF_SEND ioctl(2) operation will
>           fail with the ENOENT error.
>
>        In this scenario, the kernel will restart the target's system
>        call. Consequently, the supervisor will receive another
>        user-space notification. Thus, depending on how many times the
>        blocked system call is interrupted by a signal handler, the
>        supervisor may receive multiple notifications for the same
>        system call in the target.
>
>        One oddity is that system call restarting as described in this
>        scenario will occur even for the blocking system calls listed in
>        signal(7) that would never normally be restarted by the
>        SA_RESTART flag.
>
> Does that seem okay?

Sounds good to me.

> In addition, I've queued a cross-reference in signal(7):
>
>        In certain circumstances, the seccomp(2) user-space
>        notification feature can lead to restarting of system calls
>        that would otherwise never be restarted by SA_RESTART; for
>        details, see seccomp_user_notif(2).
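The draft text above describes a supervisor-side consequence: a SECCOMP_IOCTL_NOTIF_SEND that fails with ENOENT. The following is a minimal sketch, not from the thread, of how a supervisor might handle that case. It assumes a notification fd that was already obtained elsewhere (for example via SECCOMP_FILTER_FLAG_NEW_LISTENER). The names enum notif_outcome, classify_notif() and handle_one() are hypothetical, and the "deny with EPERM" policy is only a placeholder. For brevity it sizes the structures with sizeof instead of querying SECCOMP_GET_NOTIF_SIZES. The key design point is that ENOENT is treated as "this notification is stale" rather than as a fatal error: if the target's handler used SA_RESTART, another notification for the restarted call should follow.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

/* Hypothetical outcome codes for handling one notification. */
enum notif_outcome {
    NOTIF_OK,      /* response delivered to the target */
    NOTIF_STALE,   /* ENOENT: target's call was interrupted (or target died) */
    NOTIF_FAILED   /* any other error */
};

/* Map an ioctl() return value and errno to an outcome (pure, testable). */
static enum notif_outcome classify_notif(int rc, int err)
{
    if (rc == 0)
        return NOTIF_OK;
    return err == ENOENT ? NOTIF_STALE : NOTIF_FAILED;
}

/*
 * Receive one notification and answer it with a spoofed EPERM failure
 * (placeholder policy). On NOTIF_STALE the caller should just discard
 * any per-notification state; with SA_RESTART the kernel restarts the
 * target's system call and a new notification will arrive.
 */
static enum notif_outcome handle_one(int notify_fd)
{
    struct seccomp_notif req;
    struct seccomp_notif_resp resp;

    /* The kernel rejects a non-zeroed struct for NOTIF_RECV. */
    memset(&req, 0, sizeof req);
    if (ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_RECV, &req) == -1)
        return classify_notif(-1, errno);

    memset(&resp, 0, sizeof resp);  /* flags = 0: no CONTINUE */
    resp.id = req.id;               /* must match the received cookie */
    resp.error = -EPERM;            /* negative errno: call fails with EPERM */
    resp.val = 0;

    int rc = ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_SEND, &resp);
    return classify_notif(rc, rc == -1 ? errno : 0);
}
```

A real supervisor would call handle_one() in a loop, typically after poll(2) reports the notification fd as readable.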