Received: by 2002:a25:4158:0:0:0:0:0 with SMTP id o85csp1612486yba; Thu, 4 Apr 2019 14:27:06 -0700 (PDT) X-Google-Smtp-Source: APXvYqzRPHodJZq6rJw+jacbd4juIl0qYl9fJbBxSNoexri+ZgI48RWf1jGDnkyILGaheDmE/T+F X-Received: by 2002:aa7:818a:: with SMTP id g10mr8084993pfi.178.1554413226090; Thu, 04 Apr 2019 14:27:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1554413226; cv=none; d=google.com; s=arc-20160816; b=H/LZ6SYgPp73Muuad5ISXrOTGJTbms58k1wY+AoMUdTzrUpZZrW5KiARwPzpzdwFSr l0wy2aWTelkhH1bgZB0CqU2WsOpwOeoJun8LIu/S9eYYOkkJqadnPW8QVrjOYn0GHjIU vHPYjuupsVI+/t+pqw3UJK2AxqECcEjukJmCv5bs6+XRIp0rWC+rRrYbVVc37ZDORUJK CCYRGQ+MHyOLdQvMkHiGnMF9cHemyN7NMSvGgI9qznyWniVAg1NnacNTychCxQC/g8IY LNKSSpagIIz9VaFJXaDM4k8M6aQqK244gIHzvJadGk1kMl2Ph0T1qpKgasxyvBtDMCqy k4ig== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=Q/0t2/mOf/vEgSIHmwRPc/7abCdI76JiFBmL2e4UGzA=; b=XhPYFKuwKpqCaJ063xFd2oV2g6Kxi+zH3J3qUDwo9QMbuHHK6vFYUr8PhQHWjLWTHb xfKbJg0T8QgYlrYSdEOR1l4mfI8mXxfVWdla7z5RMgZgt7u8mh+p8qVsIkgd6FsLW8Bc iZB15v4FAjh0flW8oSXuBNKeOQSufVtJSV46ENIgdmS9fYCu9EbagNFfG/YfCOvQ3Gri +sJIBZf1NoZxmkq3R/YQ+n9QhWSsoNfSOT/pfjLkZIFKzNPy+j6H+bAahrSn42m7Ws0h QI3qGOTuWGVPtleWqxi+LF2t8RvHFT2l91HeOrLm1+ZesNk5NUUGOtQQ1AoLLaDYuS6O 22wA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l91si17316671plb.336.2019.04.04.14.26.50; Thu, 04 Apr 2019 14:27:06 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731237AbfDDUuP (ORCPT + 99 others); Thu, 4 Apr 2019 16:50:15 -0400 Received: from mail-qk1-f194.google.com ([209.85.222.194]:44909 "EHLO mail-qk1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727398AbfDDUuO (ORCPT ); Thu, 4 Apr 2019 16:50:14 -0400 Received: by mail-qk1-f194.google.com with SMTP id y5so2534999qkc.11 for ; Thu, 04 Apr 2019 13:50:13 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=Q/0t2/mOf/vEgSIHmwRPc/7abCdI76JiFBmL2e4UGzA=; b=aYdkLTFPveHgmucm70x6+6uHz7Ev5+SlOrItcwOcfF/dgBu86Gy0qXLOx0tBCIeZ2F 3iL1HIsqq4eZ2+737ZcFysqRM9RbuB0vtuHMrYo/c/J2FP/XnsB35VagzEKsCZf8nsXa K1VD8b2Wac1MiGbEDCLdNafBjvO2dNcgYpJ+fN3dRSciLrwQdgAmeIjBn3RmmjnrU2Y9 CO3wNVOyyd2zJoTggl9zC/YbgkuZy8NB79X0AeZWlkVS3gQF+liNZBLOGI66iEa5xlO5 /B9AX/akTu38hfVoOVGcTj+E4KZqMN3oSoSgSTma9z+Iua8Tf5YcnS0Ef5hWj2/w3mXD fmRQ== X-Gm-Message-State: APjAAAVUR3Whq21YvZ7Zvdtv21h5cpfXvLZB61bimCli5GO0dnr2cWdN 2HEjArdjpoE9wjchLXA/iATamA== X-Received: by 2002:ae9:c314:: with SMTP id n20mr7090940qkg.191.1554411013446; Thu, 04 Apr 2019 13:50:13 -0700 (PDT) Received: from [10.150.73.190] (214.sub-174-240-132.myvzw.com. [174.240.132.214]) by smtp.gmail.com with ESMTPSA id m31sm13916855qtm.46.2019.04.04.13.50.09 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 04 Apr 2019 13:50:12 -0700 (PDT) Subject: Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) To: Mathieu Desnoyers , Paul Burton , Will Deacon , Boqun Feng , Heiko Carstens , Vasily Gorbik , Martin Schwidefsky , Russell King , Benjamin Herrenschmidt , Paul Mackerras , Michael Ellerman Cc: carlos , Florian Weimer , Joseph Myers , Szabolcs Nagy , libc-alpha , Thomas Gleixner , Ben Maurer , Peter Zijlstra , "Paul E. McKenney" , Dave Watson , Paul Turner , Rich Felker , linux-kernel , linux-api References: <20190212194253.1951-1-mathieu.desnoyers@efficios.com> <20190212194253.1951-2-mathieu.desnoyers@efficios.com> <5166fbe9-cfe0-8554-abc7-4fc844cf2765@redhat.com> <1965431879.7576.1553529272844.JavaMail.zimbra@efficios.com> From: Carlos O'Donell Message-ID: <602718e0-7375-deb7-b6e6-2d17022173c5@redhat.com> Date: Thu, 4 Apr 2019 16:50:08 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0 MIME-Version: 1.0 In-Reply-To: <1965431879.7576.1553529272844.JavaMail.zimbra@efficios.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 3/25/19 11:54 AM, Mathieu Desnoyers wrote: > Hi Carlos, > > ----- On Mar 22, 2019, at 4:09 PM, Carlos O'Donell codonell@redhat.com wrote: > > [...] > > I took care of all your comments for an upcoming round of patches, except the > following that remain open (see answer inline). I'm adding Linux maintainers > for ARM, aarch64, MIPS, powerpc, s390, x86 in CC to discuss the choice of > code signature prior to the abort handler for each of those architectures. Thank you for kicking off this conversation. Every architecture should have a reasonable RSEQ_SIG that applies to their ISA with a comment about why that instruction was chosen. It should be a conscious choice, without a default. > * Support for automatically registering threads with the Linux rseq(2) > system call has been added. This system call is implemented starting > from Linux 4.18. The Restartable Sequences ABI accelerates user-space > operations on per-cpu data. It allows user-space to perform updates > on per-cpu data without requiring heavy-weight atomic operations. See > https://www.efficios.com/blog/2019/02/08/linux-restartable-sequences/ > for further explanation. > > In order to be activated, it requires that glibc is built against > kernel headers that include this system call, and that glibc detects > availability of that system call at runtime. Suggest: * Support for automatically registering threads with the Linux rseq(2) system call has been added. This system call is implemented starting from Linux 4.18. The Restartable Sequences ABI accelerates user-space operations on per-cpu data. It allows user-space to perform updates on per-cpu data without requiring heavy-weight atomic operations. Automatically registering threads allows all libraries, including libc, to make immediate use of the rseq(2) support by using the documented ABI. See 'man 2 rseq' for the details of the ABI shared between libc and the kernel. > > For reference the assembly code I'm pointing at below can be found > in the Linux selftests under: > > tools/testing/selftests/rseq/rseq-*.h OK. >>> +++ b/sysdeps/unix/sysv/linux/arm/bits/rseq.h > [...] >>> + >>> +/* Signature required before each abort handler code. */ >>> +#define RSEQ_SIG 0x53053053 >> >> Why isn't this an arm specific op code? Does the user have to mark this >> up as part of a constant pool when placing it in front of the abort handler >> to avoid disassembling the constant as code? This was at one point required >> to get gdb to work properly. >> > > For arm, the abort is defined as: > > #define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \ > abort_label, version, flags, \ > start_ip, post_commit_offset, abort_ip) \ > ".balign 32\n\t" \ > __rseq_str(table_label) ":\n\t" \ > ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ > ".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \ > ".word " __rseq_str(RSEQ_SIG) "\n\t" \ > __rseq_str(label) ":\n\t" \ > teardown \ > "b %l[" __rseq_str(abort_label) "]\n\t" > > Which contains a copy of the struct rseq_cs for that critical section > close to the actual critical section, within the code, followed by the > signature. The reason why we have a copy of the struct rseq_cs there is > to speed up entry into the critical section by using the "adr" instruction > to compute the address to store into __rseq_cs->rseq_cs. > > AFAIU, a literal pool on ARM is defined as something which is always > jumped over (never executed), which is the case here. We always have > an unconditional branch instruction ("b") skipping over each > RSEQ_ASM_DEFINE_ABORT(). > > Therefore, given that the signature is part of a literal pool on ARM, > it can be any value we choose and should not need to be an actual valid > instruction. > > aarch64 defines the abort as: > > #define RSEQ_ASM_DEFINE_ABORT(label, abort_label) \ > " b 222f\n" \ > " .inst " __rseq_str(RSEQ_SIG) "\n" \ > __rseq_str(label) ":\n" \ > " b %l[" __rseq_str(abort_label) "]\n" \ > "222:\n" > > Where the signature actually maps to a valid instruction. Considering that > aarch64 also have literal pools, and we branch over the signature, I wonder > why it's so important to ensure the signature is a valid trap instruction. > Perhaps Will Deacon can enlighten us ? In the event that you accidentally jump to it then you trap? However, you want an *uncommon* trap insn. I think the order of preference is: 1. An uncommon insn (with random immediate values), in a literal pool, that is not a useful ROP/JOP sequence (very uncommon) 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon) 2b. A NOP to avoid affecting speculative execution (maybe uncommon) With 2a/2b being roughly equivalent depending on speculative execution policy. >>> +/* Signature required before each abort handler code. */ >>> +#define RSEQ_SIG 0x53053053 >> >> Why isn't this a mips-specific op code? > > MIPS also has a literal pool just before the abort handler, and it > jumps over it. My understanding is that we can use any signature value > we want, and it does not need to be a valid instruction, similarly to ARM: > > #define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \ > abort_label, version, flags, \ > start_ip, post_commit_offset, abort_ip) \ > ".balign 32\n\t" \ > __rseq_str(table_label) ":\n\t" \ > ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ > LONG " " U32_U64_PAD(__rseq_str(start_ip)) "\n\t" \ > LONG " " U32_U64_PAD(__rseq_str(post_commit_offset)) "\n\t" \ > LONG " " U32_U64_PAD(__rseq_str(abort_ip)) "\n\t" \ > ".word " __rseq_str(RSEQ_SIG) "\n\t" \ > __rseq_str(label) ":\n\t" \ > teardown \ > "b %l[" __rseq_str(abort_label) "]\n\t" > > Perhaps Paul Burton can confirm this ? Yes please. You also want to avoid the value being a valid MIPS insn that's common. Did you check that? > [...] >>> +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h > [...] >>> +/* Signature required before each abort handler code. */ >>> +#define RSEQ_SIG 0x53053053 >> >> Why isn't this an opcode specific to power? > > On powerpc 32/64, the abort is placed in a __rseq_failure executable section: > > #define RSEQ_ASM_DEFINE_ABORT(label, abort_label) \ > ".pushsection __rseq_failure, \"ax\"\n\t" \ > ".long " __rseq_str(RSEQ_SIG) "\n\t" \ > __rseq_str(label) ":\n\t" \ > "b %l[" __rseq_str(abort_label) "]\n\t" \ > ".popsection\n\t" > > That section only contains snippets of those trampolines. Arguably, it would be > good if disassemblers could find valid instructions there. Boqun Feng could perhaps > shed some light on this signature choice ? Now would be a good time to decide > once and for all whether a valid instruction would be a better choice. This seems questionable too. > [...] >>> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h > [...] >>> + >>> +/* Signature required before each abort handler code. */ >>> +#define RSEQ_SIG 0x53053053 >> >> Why not a s390 specific value here? > > s390 also has the abort handler in a __rseq_failure section: > > #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ > ".pushsection __rseq_failure, \"ax\"\n\t" \ > ".long " __rseq_str(RSEQ_SIG) "\n\t" \ > __rseq_str(label) ":\n\t" \ > teardown \ > "j %l[" __rseq_str(abort_label) "]\n\t" \ > ".popsection\n\t" > > Same question applies as powerpc: since disassemblers will try to decode > that instruction, would it be better to define it as a valid one ? Yes, I think it needs to be a valid uncommon insn or nop. > [...] >>> +++ b/sysdeps/unix/sysv/linux/x86/bits/rseq.h > [...] >>> +/* Signature required before each abort handler code. */ >>> +#define RSEQ_SIG 0x53053053 >> >> Why not an x86-specific op code? > > On x86, we use this 4-byte signature as operand to a "no-op" instruction > taking 4-byte immediate operand: That makes perfect sense. Thanks. So what is left to audit? In summary: - Why does AArch64 choose a trap? - Is the current choice of 0x53053053 OK for MIPS? Does it map to a valid insn? - What better choice is there for power? Pick a real uncommon insn or nop? - What better choice is there for s390? Pick a real uncommon insn or nop? - Todays choice could become something special in the future since it's unassigned. -- Cheers, Carlos.