Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp3633736imu; Fri, 30 Nov 2018 03:42:42 -0800 (PST) X-Google-Smtp-Source: AFSGD/Wzi4j6Akr8kMwiF0doXNHGqPa31Ig7GlxPx051NthymnJatLm6RyOU/Ye4mN9MIDoTb5sO X-Received: by 2002:a65:4142:: with SMTP id x2mr4464170pgp.356.1543578161960; Fri, 30 Nov 2018 03:42:41 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1543578161; cv=none; d=google.com; s=arc-20160816; b=pI5fqiLniyAwfRbM8bQmYj/eTdNHzqN30/mVU9ou+s55NC537mjeVFdT26Pr0o+e2G tHXbDAjokqm9LfhtTOSLkprf2BNwM9u42RvHLeM3hgds/N4W3YdDOl20HQS24FYwAlq6 fXKdUMrE6ljoWLfimobmPzZVL+k7d5OF+MegtjhDVOVS3PUux03jyiCwiy9X+Lp8UaqT nbFv67sji5sy5TPJArR7QFa+H6x7qoSxhicOPAJ9wvDi2RHqoDv+uuGt5KBsGXDv6md1 TU4yfWw7XgpGYKtDES23z3c81+lmYdjwuTI7+Z09tRNR+1o++EzI65aan/t94Gq5uTN8 8xSg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version; bh=nZQHKiZnV9bH/l6eyhiWqAq+vzH9+nbM90oA3cnkcJE=; b=W2QnIxsem1vpXnRsjD9mpaGT/43pyyw8/IzbndLTAMP67WYH+jJ2DqrKNUJk1gwwu4 ikuppTabjrv/dFWqqpGEcErpd4K3xBo0/Qax3nDXD2rZpGqNnE3rNx4tmhkRLdSBqNNh Y4G+i/H7opTDOG/xpJ130vCY1GtMKwG0yrDjb9m8YUJbsafToX+GRtZPQgvgarKIuSka NIp2Q3CppReNGVdgYSaFw7OyTxQNZ0/Tv+yUOVwrUNrDSxYm7nFElmSc+6qby9iFtygo hFZu4B/pW3n11h7QNjNPPNx7NMN8PpGMaWgGmH54zf6KrxIBVQL2jqENRygt9oCZEcWm uD5Q== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 44si5264159plb.57.2018.11.30.03.42.26; Fri, 30 Nov 2018 03:42:41 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726660AbeK3Wuu (ORCPT + 99 others); Fri, 30 Nov 2018 17:50:50 -0500 Received: from mail-qk1-f194.google.com ([209.85.222.194]:38348 "EHLO mail-qk1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726473AbeK3Wuu (ORCPT ); Fri, 30 Nov 2018 17:50:50 -0500 Received: by mail-qk1-f194.google.com with SMTP id d19so2955859qkg.5; Fri, 30 Nov 2018 03:41:49 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=nZQHKiZnV9bH/l6eyhiWqAq+vzH9+nbM90oA3cnkcJE=; b=HL5AzEmCmYUFuerpT02ODuC3QFjRk7GmaHaOZFDYYNxLjTNNWl0vIoH+3w1G5R3RDC qE8y65QDGi0AyzDroZ96guCsGjk9lgXQls6Uy9HSsL6GeSwjH9ZekPTHVDETB+T2ZuCT eoMvWAPYhYm5IzuL8miTJaZAx5qEFCxbe99DEN8YBm57ElZ76ENcgdJu534UOrGqBRt6 Ii48568lz33xPWaGQqrnfHUnC4GrEp2p016ust+Iu1P3e2o0/Qe/D8l3+Fd+iJ4aVXjI IEQ4w39PpmMla0R61jfU37QNmeuVF5VuM9bRaqMxKRBYnLcd8clNBAIA7WI/AESFYWUc 3krA== X-Gm-Message-State: AA+aEWb9qiqLyl0CSn/lIVv9VXUlJAWL0va7j/xcFoRUenYOI6h1Q56g eNybNgGqYkfEL+3Y+3kH9uEAVkO4xJ/Rvnn93Ow= X-Received: by 2002:ae9:e102:: with SMTP id g2mr4591998qkm.343.1543578109004; Fri, 30 Nov 2018 03:41:49 -0800 (PST) MIME-Version: 1.0 References: <20181120105124.14733-1-christian@brauner.io> <87in0g5aqo.fsf@oldenburg.str.redhat.com> <36323361-90BD-41AF-AB5B-EE0D7BA02C21@amacapital.net> <993B98AC-51DF-4131-AF7F-7DA2A7F485F1@brauner.io> <20181129195551.woe2bl3z3yaysqb6@brauner.io> <6E21165F-2C76-4877-ABD9-0C86D55FD6AA@amacapital.net> <87y39b2lm2.fsf@xmission.com> <20181130065606.kmilbbq46oeycjp5@brauner.io> In-Reply-To: <20181130065606.kmilbbq46oeycjp5@brauner.io> From: Arnd Bergmann Date: Fri, 30 Nov 2018 12:41:28 +0100 Message-ID: Subject: Re: [PATCH v2] signal: add procfd_signal() syscall To: christian@brauner.io Cc: "Eric W . Biederman" , Andy Lutomirski , Andy Lutomirski , Florian Weimer , Linux Kernel Mailing List , "Serge E. Hallyn" , Jann Horn , Andrew Morton , Oleg Nesterov , cyphar@cyphar.com, Al Viro , Linux FS-devel Mailing List , Linux API , Daniel Colascione , Tim Murray , linux-man@vger.kernel.org, Kees Cook Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Nov 30, 2018 at 7:56 AM Christian Brauner wrote: > On Thu, Nov 29, 2018 at 11:13:57PM -0600, Eric W. Biederman wrote: > > Arnd Bergmann writes: > > > On Thu, Nov 29, 2018 at 9:14 PM Andy Lutomirski wrote: > > > > > > It looks like we already have a 'struct signalfd_siginfo' that is defined in a > > > sane architecture-independent way, so I'd suggest we use that. > > > > Unfortunately it isn't maintained very well. What you can > > express with signalfd_siginfo is a subset what you can express with > > siginfo. Many of the linux extensions to siginfo for exception > > information add pointers and have integers right after those pointers. > > Not all of those linux specific extentions for exceptions are handled > > by signalfd_siginfo (it needs new fields). Would those fit in the 30 remaining padding bytes? > > As originally defined siginfo had the sigval_t union at the end so it > > was perfectly fine on both 32bit and 64bit as it only had a single > > pointer in the structure and the other fields were 32bits in size. The problem with sigval_t of course is that it is incompatible between 32-bit and 64-bit. We can add the same information, but at least on the syscall level that would have to be a __u64. > > Although I do feel the pain as x86_64 has to deal with 3 versions > > of siginfo. A 64bit one. A 32bit one for ia32. A 32bit one for x32 > > with a 64bit si_clock_t. At least you and Al have managed to get it down to a single source-level definition across all architectures, but there is also the (lesser) problem that the structure has a slightly different binary layout on each of the classic architectures. If we go with Andy's suggestion of having only a single binary layout on x86 for the new call, I'd argue that we should also make it have the exact same layout on all other architectures. > > > We may then also want to make sure that any system call that takes a > > > siginfo has a replacement that takes a signalfd_siginfo, and that this > > > replacement can be used to implement the old version purely in > > > user space. > > > > If you are not implementing CRIU and reinserting exceptions to yourself. > > At most user space wants the ability to implement sigqueue: > > > > AKA: > > sigqueue(pid_t pid, int sig, const union sigval value); > > > > Well sigqueue with it's own si_codes so the following would cover all > > the use cases I know of: > > int sendnewsig(pid_t pid, int sig, int si_code, const union sigval value); > > > > The si_code could even be set to SI_USER to request that the normal > > trusted SI_USER values be filled in. si_code values of < 0 if not > > recognized could reasonably safely be treated as the _rt member of > > the siginfo union. Or perhaps better we error out in such a case. > > > > If we want to be flexible and not have N kinds of system calls that > > is the direction I would go. That is simple, and you don't need any of > > the rest. I'm not sure I understand what you are suggesting here. Would the low-level implementation of this be based on procfd, or do you mean that would be done for pid_t at the kernel level, plus another syscall for procfd? > > Alternatively we abandon the mistake of sigqueueinfo and not allow > > a signal number in the arguments that differs from the si_signo in the > > siginfo and allow passing the entire thing unchanged from sender to > > receiver. That is maximum flexibility. This would be regardless of which flavor of siginfo (today's arch specific one, signalfd_siginfo, or a new one) we pass, right? > > signalfd_siginfo just sucks in practice. It is larger that siginfo 104 > > bytes versus 48. We must deliver it to userspace as a siginfo so it > > must be translated. Because of the translation signalfd_siginfo adds > > no flexiblity in practice, because it can not just be passed through. > > Finallay signalfd_siginfo does not have encodings for all of the > > siginfo union members, so it fails to be fully general. > > > > Personally if I was to define signalfd_siginfo today I would make it: > > struct siginfo_subset { > > __u32 sis_signo; > > __s32 sis_errno; > > __s32 sis_code; > > __u32 sis_pad; > > __u32 sis_pid; > > __u32 sis_uid; > > __u64 sis_data (A pointer or integer data field); > > }; > > > > That is just 32bytes, and is all that is needed for everything > > except for synchronous exceptions. Oh and that happens to be a proper > > subset of a any sane siginfo layout, on both 32bit and 64bit. And that would work for signalfd and waitid_time64, but not for procfd_signal/kill/sendsiginfo? > > This is one of those rare times where POSIX is sane and what linux > > has implemented is not. > > Thanks for the in-depth explanation. So your point is that we are better > off if we stick with siginfo_t instead of struct signalfd_siginfo in > procfd_signal()? I think it means we still need more discussion. Using signalfd_siginfo without further changes doesn't seem sufficient because of the missing sigval and the excessive length adds some cost. siginfo_t as it is now still has a number of other downsides, and Andy in particular didn't like the idea of having three new variants on x86 (depending on how you count). His alternative suggestion of having a single syscall entry point that takes a 'signfo_t __user *' but interprets it as compat_siginfo depending on in_compat_syscall()/in_x32_syscall() should work correctly, but feels wrong to me, or at least inconsistent with how we do this elsewhere. Arnd