Date: Tue, 1 Feb 2022 20:32:37 -0500 (EST)
From: Mathieu Desnoyers
To: Florian Weimer
Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng,
    "H. Peter Anvin", Paul Turner, linux-api, Christian Brauner,
    David Laight, carlos, Peter Oskolkov
Subject: Re: [RFC PATCH 2/3] rseq: extend struct rseq with per thread group vcpu id

----- On Feb 1, 2022, at 4:30 PM, Florian Weimer fw@deneb.enyo.de wrote:

> * Mathieu Desnoyers:
>
>> ----- On Feb 1, 2022, at 3:32 PM, Florian Weimer fw@deneb.enyo.de wrote:
>> [...]
>>>
>>>>> Is the switch really useful?  I suspect it's faster to just write as
>>>>> much as possible all the time.  The switch should be well-predictable
>>>>> if running uniform userspace, but still …
>>>>
>>>> The switch ensures the kernel doesn't try to write to a memory area beyond
>>>> the rseq size which has been registered by user-space. So it seems to be
>>>> useful to ensure we don't corrupt user-space memory. Or am I missing your
>>>> point?
>>>
>>> Due to the alignment, I think you'd only ever see 32 and 64 bytes for
>>> now?
>>
>> Yes, but I would expect the rseq registration arguments to have a rseq_len
>> of offsetofend(struct rseq, tg_vcpu_id) when userspace wants the tg_vcpu_id
>> feature to be supported (but not the following features).
>
> But if rseq is managed by libc, it really has to use the full size
> unconditionally.  I would even expect that eventually, the kernel only
> supports the initial 32, maybe 64 for a few early extensions, and the
> size indicated by the auxiliary vector.
>
> Not all of that area would be ABI, some of it would be used by the
> vDSO only and opaque to userspace applications (with applications/libcs
> passing __rseq_offset as an argument to these functions).
>

I think one aspect leading to our misunderstanding here is the distinction
between the size of the rseq area _allocation_, and the offset after the last
field supported by the given kernel.

With this in mind, let's state a bit more clearly our expected aux. vector
extensibility scheme.

With CONFIG_RSEQ=y, the kernel would pass the following information through
the ELF auxv:

- rseq allocation size (auxv_rseq_alloc_size),
- rseq allocation alignment (auxv_rseq_alloc_align),
- offset after the end of the last rseq field supported by this kernel
  (auxv_rseq_offset_end).

We always have auxv_rseq_alloc_size >= auxv_rseq_offset_end.
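
To make the relationship between these three values a bit more concrete, here
is a rough sketch in C. The names (rseq_auxv_info, rseq_auxv_consistent,
rseq_field_supported) and the demo_rseq layout are made up for illustration
only; none of this is an existing kernel or libc interface, and the field
offsets are not the real struct rseq layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's offsetofend() helper. */
#define offsetofend(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical container for the three values proposed for the ELF auxv. */
struct rseq_auxv_info {
	size_t alloc_size;	/* auxv_rseq_alloc_size */
	size_t alloc_align;	/* auxv_rseq_alloc_align */
	size_t offset_end;	/* auxv_rseq_offset_end */
};

/* Illustrative layout only: an opaque 32-byte original area + 2 extensions. */
struct demo_rseq {
	uint8_t original[32];	/* stands in for the original 32-byte area */
	uint32_t ext_a;		/* hypothetical extension field, ends at 36 */
	uint32_t ext_b;		/* hypothetical extension field, ends at 40 */
};

/* Invariant stated above: the allocation covers every supported field. */
static inline int rseq_auxv_consistent(const struct rseq_auxv_info *info)
{
	return info->alloc_size >= info->offset_end;
}

/*
 * A field is populated by this kernel if it ends at or before offset_end.
 * field_end is expected to come from offsetofend().
 */
static inline int rseq_field_supported(const struct rseq_auxv_info *info,
				       size_t field_end)
{
	assert(rseq_auxv_consistent(info));
	return field_end <= info->offset_end;
}
```

With made-up values alloc_size = 64, alloc_align = 32 and offset_end = 36,
rseq_field_supported() would accept offsetofend(struct demo_rseq, ext_a) and
refuse offsetofend(struct demo_rseq, ext_b), even though both fields fit
within the 64-byte allocation.
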

I would expect libc to use this information to allocate a memory area at
least auxv_rseq_alloc_size in size, with an alignment respecting
auxv_rseq_alloc_align. It would use a value >= auxv_rseq_alloc_size as the
rseq_len argument for the rseq registration.

But I would expect libc to use the auxv_rseq_offset_end value to populate
__rseq_size, so rseq users can rely on this to check whether the fields they
are trying to access are indeed populated by the kernel.

Of course, the kernel would still allow the original 32-byte rseq_len argument
for the rseq registration, so the original ABI still works. It would however
reject any rseq registration with a size smaller than auxv_rseq_alloc_size
(other than the 32-byte special case).

Is that in line with what you have in mind? Do we really need to expose those
3 auxv variables independently, or can we somehow remove auxv_rseq_alloc_size
and use auxv_rseq_offset_end as a min value for allocation instead?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com