From: Florian Weimer
To: Mathieu Desnoyers
Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng,
    "H. Peter Anvin", Paul Turner, linux-api, Christian Brauner, carlos,
    Vincenzo Frascino
Subject: Re: [RFC PATCH 1/2] rseq: Implement KTLS prototype for x86-64
Date: Tue, 29 Sep 2020 10:13:38 +0200
Message-ID: <873631yp8t.fsf@oldenburg2.str.redhat.com>
In-Reply-To: <905713397.71512.1601314192367.JavaMail.zimbra@efficios.com>
    (Mathieu Desnoyers's message of "Mon, 28 Sep 2020 13:29:52 -0400 (EDT)")
References: <20200925181518.4141-1-mathieu.desnoyers@efficios.com>
    <87r1qm2atk.fsf@oldenburg2.str.redhat.com>
    <905713397.71512.1601314192367.JavaMail.zimbra@efficios.com>
X-Mailing-List: linux-kernel@vger.kernel.org

* Mathieu Desnoyers:

>> So we have a bootstrap issue here that needs to be solved, I think.
>
> The one thing I'm not sure about is whether the vDSO interface is indeed
> superior to KTLS, or if it is just the model we are used to.
>
> AFAIU, the current use-cases for vDSO is that an application calls into
> glibc, which then calls the vDSO function exposed by the kernel. I wonder
> whether the vDSO indirection is really needed if we typically have a glibc
> function used as indirection ? For an end user, what is the benefit of vDSO
> over accessing KTLS data directly from glibc ?

I think the kernel can only reasonably maintain a single userspace data
structure.  It's not reasonable to update several versions of the data
structure in parallel.
This means that glibc would have to support multiple kernel data
structures, and users might lose userspace acceleration after a kernel
update, until they update glibc as well.  The glibc update should be
ABI-compatible, but someone would still have to backport it, apply it to
container images, etc.

What's worse, the glibc code would be quite hard to test because we
would have to keep around multiple kernel versions to exercise all the
different data structure variants.

In contrast, the vDSO code always matches the userspace data structures,
is always updated at the same time, and tested together.  That looks
like a clear win to me.

> If we decide that using KTLS from a vDSO function is indeed a requirement,
> then, as you point out, the thread_pointer is available as ABI, but we miss
> the KTLS offset.
>
> Some ideas on how we could solve this: we could either make the KTLS
> offset part of the ABI (fixed offset), or save the offset near the
> thread pointer at a location that would become ABI. It would have to
> be already populated with something which can help detect the case
> where a vDSO is called from a thread which does not populate KTLS
> though. Is that even remotely doable ?

I don't know.

We could decide that these accelerated system calls must only be called
with a valid TCB.  That's unavoidable if the vDSO sets errno directly,
so it's perhaps not a big loss.  It's also backwards-compatible because
existing TCB-less code won't know about those new vDSO entrypoints.
Calling into glibc from a TCB-less thread has always been undefined.
TCB-less code would have to make direct, non-vDSO system calls, as today.

For discovering the KTLS offset, a per-process page at a fixed offset
from the vDSO code (i.e., what real shared objects already do for global
data) could store this offset.  This way, we could entirely avoid an ABI
dependency.

We'll see what will break once we have the correct TID after vfork.
8-> glibc currently supports malloc-after-vfork as an extension, and
a lot of software depends on it (OpenJDK, for example).

>> With the latter, we could
>> directly expose the vDSO implementation to applications, assuming that
>> we agree that the vDSO will not fail with ENOSYS to request fallback to
>> the system call, but will itself perform the system call.
>
> We should not forget the fields needed by rseq as well: the rseq_cs
> pointer and the cpu_id fields need to be accessed directly from the
> rseq critical section, without function call. Those use-cases require
> that applications and library can know the KTLS offset and size and
> use those fields directly.

Yes, but those offsets could be queried using a function from the vDSO
(or using a glibc interface, to simplify linking).

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill