From: Florian Weimer
To: Mathieu Desnoyers
Cc: Peter Zijlstra, linux-kernel@vger.kernel.org, Thomas Gleixner,
    "Paul E . McKenney", Boqun Feng, "H . Peter Anvin", Paul Turner,
    linux-api@vger.kernel.org, Christian Brauner, carlos@redhat.com,
    Vincenzo Frascino
Subject: Re: [RFC PATCH 1/2] rseq: Implement KTLS prototype for x86-64
Date: Mon, 28 Sep 2020 17:13:59 +0200
Message-ID: <87r1qm2atk.fsf@oldenburg2.str.redhat.com>
In-Reply-To: <20200925181518.4141-1-mathieu.desnoyers@efficios.com>
    (Mathieu Desnoyers's message of "Fri, 25 Sep 2020 14:15:17 -0400")
References: <20200925181518.4141-1-mathieu.desnoyers@efficios.com>

* Mathieu Desnoyers:

> Upstreaming efforts aiming to integrate rseq support into glibc led to
> interesting discussions, where we identified a clear need to extend the
> size of the per-thread structure shared between kernel and user-space
> (struct rseq). This is something that is not possible with the current
> rseq ABI. The fact that the current non-extensible rseq kernel ABI
> would also prevent glibc's ABI to be extended prevents its integration
> into glibc.
>
> Discussions with glibc maintainers led to the following design, which we
> are calling "Kernel Thread Local Storage" or KTLS:
>
> - at glibc library init:
>   - glibc queries the size and alignment of the KTLS area supported by the
>     kernel,
>   - glibc reserves the memory area required by the kernel for main
>     thread,
>   - glibc registers the offset from thread pointer where the KTLS area
>     will be placed for all threads belonging to the threads group which
>     are created with clone3 CLONE_RSEQ_KTLS,
> - at nptl thread creation:
>   - glibc reserves the memory area required by the kernel,
> - application/libraries can query glibc for the offset/size of the
>   KTLS area, and offset from the thread pointer to access that area.

One remaining challenge I see is that we want to use vDSO functions to
abstract away the exact layout of the KTLS area. For example, there are
various implementation strategies for getuid optimizations, some of
them exposing a shared struct cred in a thread group, and others not
doing that.

The vDSO has access to the thread pointer because it's ABI (something
that we recently (and quite conveniently) clarified for x86). What it
does not know is the offset of the KTLS area from the thread pointer.
In the original rseq implementation, this offset could vary from thread
to thread in a process, although the submitted glibc implementation did
not use this level of flexibility and the offset is constant. The vDSO
is not relocated by the run-time dynamic loader, so it can't use ELF TLS
data.

Furthermore, not all threads in a thread group may have an associated
KTLS area. In a potential glibc implementation, only the threads
created by pthread_create would have it; threads created directly using
clone would lack it (and would not even run with a correctly set up
userspace TCB).

So we have a bootstrap issue here that needs to be solved, I think.

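To make the bootstrap problem concrete, here is a minimal user-space
sketch of what a vDSO helper would have to do. It is an illustration
only, not the proposed kernel or glibc interface: struct ktls_area, its
tid field, KTLS_OFFSET_NONE, sketch_gettid and gettid_slow_path are all
hypothetical names. The offset is passed in as a parameter, which
sidesteps exactly the open question of where the vDSO would get it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical KTLS layout; the real layout is not defined in this thread. */
struct ktls_area {
    uint32_t tid;
};

/* Hypothetical sentinel: this thread has no KTLS area (e.g. raw clone). */
#define KTLS_OFFSET_NONE PTRDIFF_MIN

/* Stand-in for the path that performs the real system call. */
static uint32_t gettid_slow_path(void)
{
    return 4242; /* placeholder value, not a real TID */
}

/* Locate the KTLS area from the thread pointer plus a known offset. */
static struct ktls_area *ktls_from_tp(void *tp, ptrdiff_t offset)
{
    assert(tp != NULL);
    if (offset == KTLS_OFFSET_NONE)
        return NULL;
    return (struct ktls_area *)((char *)tp + offset);
}

/* Read the TID from KTLS when present; otherwise take the slow path
 * instead of reporting ENOSYS to the caller. */
static uint32_t sketch_gettid(void *tp, ptrdiff_t offset)
{
    struct ktls_area *area = ktls_from_tp(tp, offset);
    return area != NULL ? area->tid : gettid_slow_path();
}
```

A thread set up with a KTLS area would call sketch_gettid(tp, offset)
and read the field directly; a thread without one would pass
KTLS_OFFSET_NONE and end up in the slow path.
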
In most cases, I would not be too eager to bypass the vDSO completely
and have the kernel expose a data-only interface. I could perhaps make
an exception for the current TID because that's so convenient to use in
mutex implementations, and errno. With the latter, we could directly
expose the vDSO implementation to applications, assuming that we agree
that the vDSO will not fail with ENOSYS to request fallback to the
system call, but will itself perform the system call.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill