From: Chris Kennelly
Date: Tue, 14 Jul 2020 22:34:38 -0400
Subject: Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
To: Peter Oskolkov
Cc: Mathieu Desnoyers, Peter Oskolkov, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, "H. Peter Anvin", Paul Turner, linux-api, Christian Brauner, Florian Weimer, carlos
References: <20200714030348.6214-1-mathieu.desnoyers@efficios.com> <20200714030348.6214-3-mathieu.desnoyers@efficios.com> <775688146.12145.1594748580461.JavaMail.zimbra@efficios.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 14, 2020 at 2:33 PM Peter Oskolkov wrote:
>
> On Tue, Jul 14, 2020 at 10:43 AM Mathieu Desnoyers
> wrote:
> >
> > ----- On Jul 14, 2020, at 1:24 PM, Peter Oskolkov posk@posk.io wrote:
> >
> > > At Google, we actually extended struct rseq (I will post the patches
> > > here once they are fully deployed and we have specific
> > > benefits/improvements to report). We did this by adding several fields
> > > below __u32 flags (the last field currently), and correspondingly
> > > increasing rseq_len in rseq() syscall.
> > > If the kernel does not know of
> > > this extension, it will return -EINVAL due to an unexpected rseq_len;
> > > then the application can either fall-back to the standard/upstream
> > > rseq, or bail. If the kernel does know of this extension, it accepts
> > > it. If the application passes the old rseq_len (32), the kernel knows
> > > that this is an old application and treats it as such.
> > >
> > > I looked through the archives, but I did not find specifically why the
> > > pretty standard approach described above is considered inferior to the
> > > one taken in this patch (freeze rseq_len at 32, add additional length
> > > fields to struct rseq). Can these be summarized?
> >
> > I think you don't face the issues I'm facing with libc rseq integration
> > because you control the entire user-space software ecosystem at Google.
> >
> > The main issue we face is that the library responsible for registering
> > rseq (either glibc 2.32+, an early-adopter librseq library, or the
> > application) may very well not be the same library defining the __rseq_abi
> > symbol used in the global symbol table. Interposition with ld preload or
> > by defining the __rseq_abi in the program's executable are good examples
> > of this kind of scenario, and those use-cases are supported.

Does this work if/when we run out of bytes in the current
sizeof(__rseq_abi)? Which library provides the TLS symbol (and N bytes
of storage) seems sensitive to the choices the linker makes for us,
once the symbol sizes diverge.

> > So the size of the __rseq_abi structure may be larger than the struct
> > rseq known by glibc (and eventually smaller, if future glibc versions
> > extend their __rseq_abi size but is loaded with an older program/library
> > doing __rseq_abi interposition).

When glibc provides registration, is the anticipated use case that a
library would unregister and reregister each thread to "upgrade" it to
the most modern version of the interface it knows about that the kernel
provides?
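
For reference, the length-based negotiation Peter describes above (try the
extended rseq_len, fall back to 32 on -EINVAL) might look roughly like the
sketch below. This is only an illustration under stated assumptions: it does
not call the real rseq() syscall, mock_rseq_register() is a made-up stand-in
for the kernel side, and RSEQ_LEN_EXT (64) is a hypothetical extended size.
Only the base length of 32 and the -EINVAL behaviour come from the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* 32 is the rseq_len quoted in the thread; the extended length is hypothetical. */
#define RSEQ_LEN_BASE 32u
#define RSEQ_LEN_EXT  64u

static_assert(RSEQ_LEN_EXT > RSEQ_LEN_BASE, "extension must grow the struct");

/*
 * Stand-in for the kernel side of registration (NOT the real rseq() syscall).
 * kernel_max_len is the largest struct rseq length this mock kernel knows.
 */
static int mock_rseq_register(uint32_t kernel_max_len, uint32_t rseq_len)
{
    if (rseq_len == RSEQ_LEN_BASE)
        return 0;           /* old application, treated as such */
    if (rseq_len == RSEQ_LEN_EXT && kernel_max_len >= RSEQ_LEN_EXT)
        return 0;           /* kernel knows the extension and accepts it */
    return -EINVAL;         /* unexpected rseq_len */
}

/*
 * Application side: try the extended length first, then fall back.
 * Returns the length that was accepted, or 0 if registration failed.
 */
static uint32_t register_with_fallback(uint32_t kernel_max_len)
{
    if (mock_rseq_register(kernel_max_len, RSEQ_LEN_EXT) == 0)
        return RSEQ_LEN_EXT;
    if (mock_rseq_register(kernel_max_len, RSEQ_LEN_BASE) == 0)
        return RSEQ_LEN_BASE;
    return 0;               /* bail: run without rseq */
}
```

Calling register_with_fallback(RSEQ_LEN_BASE) models an older kernel and
yields the 32-byte registration; register_with_fallback(RSEQ_LEN_EXT) models
a kernel that knows the extension. Note that this scheme assumes the code
doing the registration also knows the size of the structure it registers,
which is exactly the assumption the interposition cases call into question.
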
> > So we need some way to allow code defining the __rseq_abi to let the kernel
> > know how much room is available, without necessarily requiring the code
> > responsible for rseq registration to be aware of that extended layout.
> > This is the purpose of the __rseq_abi.flags RSEQ_FLAG_TLS_SIZE and field
> > __rseq_abi.user_size.
> >
> > And we need some way to allow the kernel to let user-space rseq critical
> > sections (user code) know how much of those fields are actually populated
> > by the kernel. This is the purpose of __rseq_abi.flags RSEQ_FLAG_TLS_SIZE
> > with __rseq_abi.kernel_size.

I authored the userspace component
(https://github.com/google/tcmalloc/commit/ad136d45f75a273b934446699cef8b278c34ec6e)
that consumes the extensions Peter mentions and found that minimizing
the performance impact of their potential absence was a bit of a
challenge. There, I could assume an all-or-nothing registration of the
new feature--limited only by kernel availability for thread
homogeneity--but inconsistencies across early adopter libraries would
mean each thread would have to examine its own TLS to determine if a
feature were available.

Chris
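
To make the per-thread check concrete, here is a rough sketch of what
"examine its own TLS" could look like with the flags/kernel_size scheme
quoted above. Everything about the layout is an assumption for illustration:
struct rseq_sketch, the 16-bit widths of user_size and kernel_size, the bit
value chosen for RSEQ_FLAG_TLS_SIZE, and extra_field are all hypothetical and
are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag name is from the thread; the bit value here is hypothetical. */
#define RSEQ_FLAG_TLS_SIZE (1u << 3)

/* Hypothetical layout: field widths and extra_field are assumptions. */
struct rseq_sketch {
    uint32_t cpu_id_start;
    uint32_t cpu_id;
    uint64_t rseq_cs;
    uint32_t flags;
    uint16_t user_size;    /* room offered by the code defining the symbol */
    uint16_t kernel_size;  /* bytes the kernel actually populates */
    uint32_t extra_field;  /* hypothetical extension field */
};

static_assert(offsetof(struct rseq_sketch, extra_field) == 24,
              "layout assumption of this sketch");

/* Offset one past the end of the hypothetical extension field. */
#define EXTRA_FIELD_END \
    (offsetof(struct rseq_sketch, extra_field) + sizeof(uint32_t))

/* A field is usable only if the size flag is set and the kernel covers it. */
static bool field_populated(uint32_t flags, uint16_t kernel_size,
                            size_t field_end)
{
    if (!(flags & RSEQ_FLAG_TLS_SIZE))
        return false;
    return kernel_size >= field_end;
}

/* Per-thread area standing in for __rseq_abi in this sketch. */
static _Thread_local struct rseq_sketch sketch_abi;

/* Each thread consults its own TLS copy, as described above. */
static bool extra_field_available(void)
{
    return field_populated(sketch_abi.flags, sketch_abi.kernel_size,
                           EXTRA_FIELD_END);
}
```

The cost in question is that a check like extra_field_available() would sit
on the fast path of every consumer, rather than being decided once per
process as an all-or-nothing registration allows.
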