Subject: Re: [RFC 0/3] seccomp trap to userspace
From: Andy Lutomirski
Date: Fri, 16 Mar 2018 09:01:47 -0700
To: Christian Brauner
Cc: Andy Lutomirski, Tycho Andersen, Kees Cook, Linux Containers,
    Akihiro Suda, LKML, Oleg Nesterov, Christian Brauner,
    "Eric W. Biederman", Christian Brauner, Tyler Hicks, Alexei Starovoitov
In-Reply-To: <20180316144751.GA3304@mailbox.org>
References: <20180204104946.25559-1-tycho@tycho.ws>
    <20180315160924.GA12744@gmail.com>
    <20180315170509.GA32766@mail.hallyn.com>
    <20180315173524.k7vwnvnhomg2j5yv@smitten>
    <20180316144751.GA3304@mailbox.org>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Mar 16, 2018, at 7:47 AM, Christian Brauner wrote:
>
>> On Fri, Mar 16, 2018 at 12:46:55AM +0000, Andy Lutomirski wrote:

I bet I confused everyone with a blatant typo:

>>
>> Hmm, I think we have to be very careful to avoid nasty races.
>> I think the correct approach is to notice the signal and send a
>> message to the listener that a signal is pending but to take no
>> additional action. If the handler ends up completing the syscall with
>> a successful return, we don't want to replace it with -EINTR. IOW the
>> code looks kind of like:
>>
>> send_to_listener("hey I got a signal");

That should be “hey I got a syscall”. D’oh!

>> wait_ret = wait_interruptible for the listener to reply;
>> if (wait_ret == -EINTR) {
>
> Hm, so from the pseudo-code it looks like: The handler would inform the
> listener that it received a signal (either from the syscall requester or
> from somewhere else) and then wait for the listener to reply to that
> message. This would allow the listener to decide what action it wants
> the handler to take based on the signal, i.e. either cancel the request
> or retry? The comment makes it sound like that the handler doesn't
> really wait on the listener when it receives a signal it simply moves
> on.

It keeps waiting killably but not interruptibly.

> So no "taking no additional action" here means not have the handler
> decide to abort but the listener?

If by “handler” you mean kernel, then yes.

There’s no userspace syscall handler involved. From the kernel’s
perspective, a syscall is never still in progress when a signal handler
is invoked: we only actually invoke signal handlers in
prepare_exit_to_usermode() or the non-x86 equivalent and the functions
it calls. While a syscall is running, the kernel might notice that a
signal is pending and do one of a few things:

1. Just keep going. Not all syscalls can be interrupted.

2. Try to finish early. If a send() call has already sent some but not
   all data, it can stop waiting and return the number of bytes sent.

3. Abort with -EINTR.

4. Abort with -ERESTARTSYS or one of its relatives.
   These fiddle with user registers in a somewhat unpleasant way to
   pretend that the syscall never actually happened. This works for
   syscalls that wait with an absolute timeout, for example.

5. Set up restart_syscall() magic, rewrite regs so it looks like the
   user was about to call restart_syscall() when the signal happened,
   and abort.

In all cases, the signal is dealt with afterwards. This could result in
changing regs to call the handler or in simply returning.

1-3 should work fully in seccomp. The only issue is that the kernel
doesn’t know *which* to do, nor can the kernel force the listener to
abort cleanly, so I think we have no real choice but to let the listener
decide.

4 could be supported just like 1-3. 5 is awful, and I don’t think we
should support it for user listeners.
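
To make the wait logic discussed above concrete, here is a rough
userspace mock of it in C. It is only a sketch under assumptions: the
message kinds, struct mock_listener, and the wait helpers are
hypothetical stand-ins invented for illustration, not a real kernel or
seccomp interface, and the part of the quoted pseudo-code after the
-EINTR check is filled in from the prose ("send a message that a signal
is pending, take no additional action, keep waiting killably").

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message kinds; not a real kernel or seccomp API. */
enum notif_kind { NOTIF_SYSCALL, NOTIF_SIGNAL_PENDING };

/* Mock listener: scripted replies plus a record of what it was sent. */
struct mock_listener {
    bool signal_arrives;      /* does a signal interrupt the first wait? */
    long reply;               /* result reported when no signal arrives */
    long reply_after_signal;  /* result reported once told a signal is pending */
    enum notif_kind sent[2];
    size_t nsent;
};

static void send_to_listener(struct mock_listener *l, enum notif_kind kind)
{
    assert(l->nsent < sizeof(l->sent) / sizeof(l->sent[0]));
    l->sent[l->nsent++] = kind;
}

/* Stand-in for an interruptible wait: 0 on reply, -EINTR if a signal came first. */
static int wait_interruptible(const struct mock_listener *l, long *ret)
{
    if (l->signal_arrives)
        return -EINTR;
    *ret = l->reply;
    return 0;
}

/* Stand-in for a killable wait: ordinary signals cannot cut it short. */
static void wait_killable(const struct mock_listener *l, long *ret)
{
    *ret = l->reply_after_signal;
}

/* Returns the syscall result chosen by the listener, never one invented here. */
long trap_to_listener(struct mock_listener *l)
{
    long ret = 0;

    send_to_listener(l, NOTIF_SYSCALL);
    if (wait_interruptible(l, &ret) == -EINTR) {
        /* Tell the listener, take no other action, and keep waiting. */
        send_to_listener(l, NOTIF_SIGNAL_PENDING);
        wait_killable(l, &ret);
    }
    return ret;
}
```

The point of the shape is that a pending signal never causes the kernel
side to substitute -EINTR on its own. If the listener wants the syscall
aborted it replies with -EINTR itself; if it has already completed the
work it replies with the successful result, and that is what the caller
sees.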