References:
 <20200925181518.4141-1-mathieu.desnoyers@efficios.com> <87r1qm2atk.fsf@oldenburg2.str.redhat.com>
In-Reply-To: <87r1qm2atk.fsf@oldenburg2.str.redhat.com>
From: Andy Lutomirski
Date: Tue, 29 Sep 2020 11:01:29 -0700
Subject: Re: [RFC PATCH 1/2] rseq: Implement KTLS prototype for x86-64
To: Florian Weimer
Cc: Mathieu Desnoyers, Peter Zijlstra, LKML, Thomas Gleixner, "Paul E. McKenney", Boqun Feng, "H. Peter Anvin", Paul Turner, Linux API, Christian Brauner, "Carlos O'Donell", Vincenzo Frascino
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Sep 28, 2020 at 8:14 AM Florian Weimer wrote:
>
> * Mathieu Desnoyers:
>
> > Upstreaming efforts aiming to integrate rseq support into glibc led to
> > interesting discussions, where we identified a clear need to extend the
> > size of the per-thread structure shared between kernel and user-space
> > (struct rseq). This is something that is not possible with the current
> > rseq ABI. The fact that the current non-extensible rseq kernel ABI
> > would also prevent glibc's ABI from being extended prevents its
> > integration into glibc.
> >
> > Discussions with glibc maintainers led to the following design, which we
> > are calling "Kernel Thread Local Storage" or KTLS:
> >
> > - at glibc library init:
> >   - glibc queries the size and alignment of the KTLS area supported by the
> >     kernel,
> >   - glibc reserves the memory area required by the kernel for main
> >     thread,
> >   - glibc registers the offset from thread pointer where the KTLS area
> >     will be placed for all threads belonging to the thread group which
> >     are created with clone3 CLONE_RSEQ_KTLS,
> > - at nptl thread creation:
> >   - glibc reserves the memory area required by the kernel,
> > - application/libraries can query glibc for the offset/size of the
> >   KTLS area, and offset from the thread pointer to access that area.
>
> One remaining challenge I see is that we want to use vDSO functions to
> abstract away the exact layout of the KTLS area. For example, there are
> various implementation strategies for getuid optimizations, some of them
> exposing a shared struct cred in a thread group, and others not doing
> that.
>
> The vDSO has access to the thread pointer because it is part of the ABI
> (something that we recently (and quite conveniently) clarified for x86).
> What it does not know is the offset of the KTLS area from the thread
> pointer. In the original rseq implementation, this offset could vary
> from thread to thread in a process, although the submitted glibc
> implementation did not use this level of flexibility and the offset is
> constant. The vDSO is not relocated by the run-time dynamic loader, so
> it can't use ELF TLS data.

I assume that, by "thread pointer", you mean the pointer stored in
GSBASE on x86_32, FSBASE on x86_64, and elsewhere on other
architectures?

The vDSO has done pretty well so far by not touching FS, GS, or their
bases at all. If we want to change that, I would be very nervous about
doing so in existing vDSO functions. Regardless of anything an ABI
document might say and anything that existing or previous glibc
versions may or may not have done, there are plenty of bizarre programs
out there that don't really respect the psABI document. Go and various
not-ready-for-prime-time-but-released-anyway Bionic branches come to
mind. So we would need to tread very, very carefully.

One way to side-step much of this would be to make the interface
explicit:

    long __vdso_do_whatever(void *ktls_ptr, ...);

Sadly, on x86, actually generating the ktls ptr is a bit nasty due to
the fact that lea %fs:(offset) doesn't do what one might have liked it
to do. I suppose this could also be:

    long __vdso_do_whatever(unsigned long ktls_offset);

which will generate quite nice code on x86_64. I can't speak for the
asm capabilities of other architectures.
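
To make the two shapes concrete, here is a minimal user-space sketch in
portable C. Everything in it is hypothetical: struct ktls_area, struct
fake_thread_block, fake_thread_pointer() and the vdso_getuid_*() names
are made up for illustration and are not part of the patch or of any
vDSO. Because portable C cannot read FSBASE, the offset variant takes
the simulated thread pointer as an extra argument; a real
implementation would presumably use an %fs-relative access on x86_64
instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical KTLS layout. Hiding this layout from user space is the
 * whole point of going through a vDSO function. */
struct ktls_area {
    uint32_t cpu_id;
    uint32_t uid;
};

/* Stand-in for a per-thread block that libc would own. */
struct fake_thread_block {
    char libc_private[64];
    struct ktls_area ktls;
};

/* Stand-in for reading the thread pointer (FSBASE on x86_64). */
static inline const unsigned char *
fake_thread_pointer(const struct fake_thread_block *tb)
{
    return (const unsigned char *)tb;
}

/* Variant 1: the caller computes the KTLS address and passes it in. */
long vdso_getuid_ptr(const void *ktls_ptr)
{
    const struct ktls_area *k = ktls_ptr;

    assert(k != NULL);
    return (long)k->uid;
}

/* Variant 2: the caller passes only the offset from the thread pointer.
 * The tp argument exists only because this sketch cannot touch segment
 * registers. */
long vdso_getuid_off(const unsigned char *tp, unsigned long ktls_offset)
{
    const struct ktls_area *k =
        (const struct ktls_area *)(tp + ktls_offset);

    return (long)k->uid;
}
```

In both variants the callee never has to guess where the KTLS area
lives: the caller, which got the offset from libc, supplies it
explicitly on every call.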

What I *don't* want to do is to accidentally repeat anything like the
%gs:0x28 mess we have with the stack cookie on x86_32. (The stack
cookie is, in kernel code, in a completely nonsensical location. I'm
quite surprised that any of the maintainers ever accepted the current
stack cookie implementation. I assume there's some history there, but
I don't know it. The end result is a festering mess in the x86_32
kernel code that only persists because no one cares quite enough about
x86_32 to fix it.) We obviously won't end up with precisely the same
type of mistake here, but a mis-step here certainly does have the
possibility of promoting an unfortunate-in-hindsight design decision in
glibc and/or psABI to something that every other x86_64 Linux software
stack has to copy to be compatible with the vDSO.

As for errno itself, with all due respect to those who designed errno
before I was born, IMO it was a mistake. Why exactly should the vDSO
know about errno?
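
One way to read that question is as an argument for the usual split
between kernel-side and libc-side error reporting, sketched below. The
function names are hypothetical and the "operation" is a placeholder;
the only assumption taken from existing practice is the Linux
convention of reporting failure as a negative error code in the return
value, leaving errno to the libc wrapper.

```c
#include <errno.h>

/* Hypothetical vDSO-style entry point: failure is reported as a
 * negative error code in the return value; errno is never touched. */
long vdso_style_op(int arg)
{
    if (arg < 0)
        return -EINVAL;
    return (long)arg * 2;   /* placeholder for the real work */
}

/* libc-side wrapper: the only layer that needs to know where errno
 * lives and how it is accessed. */
long libc_wrapper_op(int arg)
{
    long ret = vdso_style_op(arg);

    if (ret < 0) {
        errno = (int)-ret;  /* translate -Exxx into errno plus -1 */
        return -1;
    }
    return ret;
}
```

With this split, the vDSO needs no knowledge of the thread pointer or
of any TLS layout just to report an error.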