Date: Tue, 20 Oct 2020 14:47:57 -0400 (EDT)
From: Mathieu Desnoyers
To: Florian Weimer
Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, "H. Peter Anvin", Paul Turner, linux-api, Christian Brauner, carlos, Vincenzo Frascino
Message-ID: <1247061646.32339.1603219677094.JavaMail.zimbra@efficios.com>
In-Reply-To: <873631yp8t.fsf@oldenburg2.str.redhat.com>
References: <20200925181518.4141-1-mathieu.desnoyers@efficios.com> <87r1qm2atk.fsf@oldenburg2.str.redhat.com> <905713397.71512.1601314192367.JavaMail.zimbra@efficios.com> <873631yp8t.fsf@oldenburg2.str.redhat.com>
Subject: Re: [RFC PATCH 1/2] rseq: Implement KTLS prototype for x86-64
X-Mailing-List: linux-kernel@vger.kernel.org

----- On Sep 29, 2020, at 4:13 AM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
>
>>> So we have a bootstrap issue here that needs to be solved, I think.
>>
>> The one thing I'm not sure about is whether the vDSO interface is indeed
>> superior to KTLS, or if it is just the model we are used to.
>>
>> AFAIU, the current use-cases for vDSO is that an application calls into
>> glibc, which then calls the vDSO function exposed by the kernel.
>> I wonder whether the vDSO indirection is really needed if we typically
>> have a glibc function used as indirection ? For an end user, what is the
>> benefit of vDSO over accessing KTLS data directly from glibc ?
>
> I think the kernel can only reasonably maintain a single userspace data
> structure. It's not reasonable to update several versions of the data
> structure in parallel.

I disagree with your statement. Considering that the kernel needs to keep
ABI compatibility for whatever it exposes to user-space, claiming that it
should never update several versions of data structures exposed to
user-space in parallel means that once a data structure is exposed to
user-space as ABI in a certain way, it can never ever change in the future,
even if we find a better way to do things.

It makes more sense to allow multiple data structures to be updated in
parallel until older ones become deprecated/unused/irrelevant, at which
point those can be configured out at build time and eventually phased out
after years of deprecation.

Having the ability to update multiple data structures in user-space with
replicated information is IMHO necessary to allow creation of new/better
accelerated ABIs.

>
> This means that glibc would have to support multiple kernel data
> structures, and users might lose userspace acceleration after a kernel
> update, until they update glibc as well. The glibc update should be
> ABI-compatible, but someone would still have to backport it, apply it to
> container images, etc.

No. If the kernel ever exposes a data structure to user-space as ABI,
then it needs to stay there, and not break userspace. Hence the need to
duplicate information provided to user-space if need be, so we can move
on to better ABIs without breaking the old ones.

>
> What's worse, the glibc code would be quite hard to test because we
> would have to keep around multiple kernel versions to exercise all the
> different data structure variants.
>
> In contrast, the vDSO code always matches the userspace data structures,
> is always updated at the same time, and tested together. That looks
> like a clear win to me.

For cases where the overhead of vDSO is not an issue, I agree that it makes
things tidier than directly accessing a data structure. The documentation
of the ABI becomes much simpler as well.

>
>> If we decide that using KTLS from a vDSO function is indeed a requirement,
>> then, as you point out, the thread_pointer is available as ABI, but we miss
>> the KTLS offset.
>>
>> Some ideas on how we could solve this: we could either make the KTLS
>> offset part of the ABI (fixed offset), or save the offset near the
>> thread pointer at a location that would become ABI. It would have to
>> be already populated with something which can help detect the case
>> where a vDSO is called from a thread which does not populate KTLS
>> though. Is that even remotely doable ?
>
> I don't know.
>
> We could decide that these accelerated system calls must only be called
> with a valid TCB. That's unavoidable if the vDSO sets errno directly,
> so it's perhaps not a big loss. It's also backwards-compatible because
> existing TCB-less code won't know about those new vDSO entrypoints.
> Calling into glibc from a TCB-less thread has always been undefined.
> TCB-less code would have to make direct, non-vDSO system calls, as today.
>
> For discovering the KTLS offset, a per-process page at a fixed offset
> from the vDSO code (i.e., what real shared objects already do for global
> data) could store this offset. This way, we could entirely avoid an ABI
> dependency.

Or, as Andy mentioned, we could simply pass the KTLS offset as an argument
to the vDSO? It seems simple enough. Would it fit all our use-cases,
including errno?

>
> We'll see what will break once we have the correct TID after vfork. 8->
> glibc currently supports malloc-after-vfork as an extension, and
> a lot of software depends on it (OpenJDK, for example).

I am not sure I see how that is related to KTLS.

>
>>> With the latter, we could
>>> directly expose the vDSO implementation to applications, assuming that
>>> we agree that the vDSO will not fail with ENOSYS to request fallback to
>>> the system call, but will itself perform the system call.
>>
>> We should not forget the fields needed by rseq as well: the rseq_cs
>> pointer and the cpu_id fields need to be accessed directly from the
>> rseq critical section, without function call. Those use-cases require
>> that applications and library can know the KTLS offset and size and
>> use those fields directly.
>
> Yes, but those offsets could be queried using a function from the vDSO
> (or using a glibc interface, to simplify linking).

Good point!

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
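
For illustration only, the idea discussed above of passing the KTLS offset as an argument to the vDSO could look roughly like the sketch below. Everything in it is an assumption made for the example: struct ktls_sketch, KTLS_OFFSET_NONE, sketch_vdso_getcpu() and struct fake_tcb are hypothetical names, the layout is not the one from the RFC patch, and the "vDSO" function is plain userspace C standing in for a real entry point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical KTLS layout, for illustration only (not the RFC's layout). */
struct ktls_sketch {
	uint32_t cpu_id;
	uint32_t tid;
};

/* Hypothetical sentinel: the calling thread has not registered a KTLS area. */
#define KTLS_OFFSET_NONE PTRDIFF_MIN

/*
 * Stand-in for a vDSO entry point that takes the KTLS offset as an argument.
 * Returns 0 and stores the CPU number on success, or -1 when no KTLS area is
 * available so the caller can fall back to a plain system call.
 */
static int sketch_vdso_getcpu(const void *thread_pointer,
			      ptrdiff_t ktls_offset, uint32_t *cpu)
{
	struct ktls_sketch ktls;

	if (!thread_pointer || ktls_offset == KTLS_OFFSET_NONE)
		return -1;
	/* Copy out of the thread area rather than casting the raw pointer. */
	memcpy(&ktls, (const char *)thread_pointer + ktls_offset, sizeof(ktls));
	*cpu = ktls.cpu_id;
	return 0;
}

/* Fake thread control block standing in for what libc would set up. */
struct fake_tcb {
	char libc_private[64];
	struct ktls_sketch ktls;
};

/* Usage: returns the CPU number read through the sketch (3 here). */
static uint32_t sketch_demo(void)
{
	struct fake_tcb tcb = { .ktls = { .cpu_id = 3, .tid = 1234 } };
	uint32_t cpu = UINT32_MAX;
	int ret;

	/* A thread without KTLS gets -1 and would use the system call. */
	ret = sketch_vdso_getcpu(&tcb, KTLS_OFFSET_NONE, &cpu);
	assert(ret == -1);

	ret = sketch_vdso_getcpu(&tcb, offsetof(struct fake_tcb, ktls), &cpu);
	assert(ret == 0);
	return cpu;
}
```

With this shape the offset never becomes kernel ABI by itself: the caller (glibc, in the discussion above) owns it and hands it over on each call. The open question about errno is not addressed by this sketch.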