Received: by 2002:a25:c205:0:0:0:0:0 with SMTP id s5csp97278ybf; Wed, 26 Feb 2020 09:28:23 -0800 (PST) X-Google-Smtp-Source: APXvYqzCdDDsgtk+lsk42aR9vWPXCzMFaWo07CXAM1uIy65SmkQqAkFW/O1nLWroETW5NkaEgBSd X-Received: by 2002:a05:6830:128e:: with SMTP id z14mr3928419otp.184.1582738103830; Wed, 26 Feb 2020 09:28:23 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1582738103; cv=none; d=google.com; s=arc-20160816; b=LdLpnJ3hoiwu3iLvHoVDVl9YtOhr+HKKUHlFlNJ4WNw2enoam5FftPHbo+o2Vkqo11 BaRJNp9XWd3LFH4Nqi5e0C1ukdOaWpsjlw+l3FJOTBhyr3SgCvtcGTvfqPy0kGTlqg2W G8l+19o0XUhmPUTi3/NEJ0fHvn8/jrrlhZU/5bpU/zcLg4yauoZegUYlcE2PRGJ69Qwj D8qzV4AfkmTsjChkB6261Y62gl2ItOaztshKIRgPhbWQgo5npHk8l5wasPjUoWt8ArRD CkfyF1T8LswiYBiTROKm1f/4aNGYzqOIS+7KYBFQZm2mcAdaJwZzScNAQy+T0EOA8ACq L+Ow== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=Guhz1TvLa+j1kLsI8L+19deSsQysL9993WVY47CsKR0=; b=Mq5DXkEOWF+WnS/FOKModP4w2wJxOAczF2J2fboHaq7a4C2rYf1c/k+Bkfu6IItaD/ Z/0Hi+0kKpVBf+aM9UT1lQ4uZALeMPr3GutCn7CXGWiEf6CEl3aMuroQjPm1VrP7Zaj6 jbbSBz2FNBZLxSvmqm6upcl7NNlVWgBri518y8UE5tjeo0bFsWTXv9T4IJ4C7gwXQHoK fQTbew0WUhQZM+5MkywDWLzdJrZ9LsYI7Uc5ELEwer6ishnFIWHvrDIYBiBgPAZeu1Ii HHvd3hNOIKWQaEywzcHelTGo26l/6a5reeU1cjEECRManW7PjTwF9C4uhE8hAjTIhyeD sjHA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=oCEmacRF; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id i11si113200otc.105.2020.02.26.09.28.10; Wed, 26 Feb 2020 09:28:23 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=oCEmacRF; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726859AbgBZR1v (ORCPT + 99 others); Wed, 26 Feb 2020 12:27:51 -0500 Received: from mail-wr1-f45.google.com ([209.85.221.45]:45045 "EHLO mail-wr1-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726579AbgBZR1v (ORCPT ); Wed, 26 Feb 2020 12:27:51 -0500 Received: by mail-wr1-f45.google.com with SMTP id m16so4070376wrx.11 for ; Wed, 26 Feb 2020 09:27:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Guhz1TvLa+j1kLsI8L+19deSsQysL9993WVY47CsKR0=; b=oCEmacRFo+nQYpacfKMzpwTIVY0CzTR7rB6sAB57a3/DesOjLeh4CRwMIfbSeR8FYx F1uipdbUYpVCoTJCTXeQ15cNiRG9R4BnexbXkB2FPCLIyrc7ey8cgBnZQ69ivDQM1b4T O/vnc3tzeOI+oSnEArvlIU972xBCmVhn/ONkrK+oUrfqV1lXHgGpz0EISocApJHMMHAR /FZt9VuPJv6fEwCEVLG2rdwSiIqJ6BLWNo6jn8hyBtOZgkXyov3ntp3uHxZd9nfQxYDA fga64Rq72jEGJyzSImRuPiUiAHAxjseokf8Vg6lPOLh6uEl41jbti5UTGMrZOE0SMyUk 3OSw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Guhz1TvLa+j1kLsI8L+19deSsQysL9993WVY47CsKR0=; b=DAB2b1a0k7EJlEAO75nwblBNCyrtwOpPhvJLZMcp1hKjqykXfsbJYwGa6Mi2GPlFaI o6NJXMCIuElTTbqKeaXNxzf/xgRtnAyp61r6faFvb4bpcvd1pCzYl/Fk7Fhx0Y1/f/EF hOHLRb30QNXDK2gA2JTKX2MjXudNj344OYNImOwMjEptQSGeOfvG9YcM21AhZpXxsW3M 09S1mEk7ymwUoihSmMRmVE7f3H97x5I3mIPM6PCn8NJjviYmV3nh/uTZ8du5KHWelvCG vFD+zwcm0kaX5MzEnJdHJaUjw8uvPioICGXzq3hE94rQjEMKFx5NoeeueD4LP4apnB3U S/lA== X-Gm-Message-State: APjAAAU9SkmMeQ6pWFuGaACsZtVoJWjYWU6HKRKoJ7Kt1hZHFfsOtNLu LDYWRQodBi02a3XkHA3HbZRw3kD2F+FCcg52SUphSA== X-Received: by 2002:adf:e40f:: with SMTP id g15mr6482239wrm.223.1582738068080; Wed, 26 Feb 2020 09:27:48 -0800 (PST) MIME-Version: 1.0 References: <1503467992.2999.1582234410317.JavaMail.zimbra@efficios.com> <20200221154923.GC194360@google.com> <1683022606.3452.1582301632640.JavaMail.zimbra@efficios.com> <1089333712.8657.1582736509318.JavaMail.zimbra@efficios.com> In-Reply-To: <1089333712.8657.1582736509318.JavaMail.zimbra@efficios.com> From: Chris Kennelly Date: Wed, 26 Feb 2020 12:27:36 -0500 Message-ID: Subject: Re: Rseq registration: Google tcmalloc vs glibc To: Mathieu Desnoyers Cc: "Joel Fernandes, Google" , Paul Turner , Florian Weimer , "Carlos O'Donell" , libc-alpha , linux-kernel , Peter Zijlstra , paulmck , Boqun Feng , Brian Geffon Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Feb 26, 2020 at 12:01 PM Mathieu Desnoyers wrote: > > ----- On Feb 25, 2020, at 10:38 PM, Chris Kennelly ckennelly@google.com wrote: > > > On Tue, Feb 25, 2020 at 10:25 PM Joel Fernandes wrote: > >> > >> On Fri, Feb 21, 2020 at 11:13 AM Mathieu Desnoyers > >> wrote: > >> > > >> > ----- On Feb 21, 2020, at 10:49 AM, Joel Fernandes, Google > >> > joel@joelfernandes.org wrote: > >> > > >> > [...] > >> > >> > >> > >> 3) Use the __rseq_abi TLS cpu_id field to know whether Rseq has been > >> > >> registered. > >> > >> > >> > >> - Current protocol in the most recent glibc integration patch set. > >> > >> - Not supported yet by Linux kernel rseq selftests, > >> > >> - Not supported yet by tcmalloc, > >> > >> > >> > >> Use the per-thread state to figure out whether each thread need to register > >> > >> Rseq individually. > >> > >> > >> > >> Works for integration between a library which exists for the entire lifetime > >> > >> of the executable (e.g. glibc) and other libraries. However, it does not > >> > >> allow a set of libraries which are dlopen'd/dlclose'd to co-exist without > >> > >> having a library like glibc handling the registration present. > >> > > > >> > > Mathieu, could you share more details about why during dlopen/close > >> > > libraries we cannot use the same __rseq_abi TLS to detect that rseq was > >> > > registered? > >> > > >> > Sure, > >> > > >> > A library which is only loaded and never closed during the execution of the > >> > program can let the kernel implicitly unregister rseq at thread exit. For > >> > the dlopen/dlclose use-case, we need to be able to explicitly unregister > >> > each thread's __rseq_abi which sit in a library which is going to be > >> > dlclose'd. > >> > >> Mathieu, Thanks a lot for the explanation, it makes complete sense. It > >> sounds from Chris's reply that tcmalloc already checks > >> __rseq_abi.cpu_id and is not dlopened/closed. Considering these, it > >> seems to already handle things properly - CMIIW. > > > > I'll make a note about this, since we can probably benefit from some > > more comments about the assumptions/invariants the fastpath uses. > > I suspect the integration with glibc and with dlopen'd/dlclose'd libraries will not > behave correctly with the current tcmalloc implementation. > > Based on the tcmalloc code-base, InitFastPerCpu is only called from IsFast. As long > as this is the only expected caller, having IsFast comparing the RseqCpuId detects > whether glibc (or some other library) has already registered rseq for the current > thread. > > However, if the application chooses to invoke InitFastPerCpu() directly, things become > expected, because it invokes: > > absl::base_internal::LowLevelCallOnce(&init_per_cpu_once, InitPerCpu); > > which AFAIU invokes InitPerCpu once after execution of the current program. Which > does: > > static bool InitThreadPerCpu() { > if (__rseq_refcount++ > 0) { > return true; > } > > auto ret = syscall(__NR_rseq, &__rseq_abi, sizeof(__rseq_abi), 0, > PERCPU_RSEQ_SIGNATURE); > if (ret == 0) { > return true; > } else { > __rseq_refcount--; > } > > return false; > } > > static void InitPerCpu() { > // Based on the results of successfully initializing the first thread, mark > // init_status to initialize all subsequent threads. > if (InitThreadPerCpu()) { > init_status = kFastMode; > } > } > > In a scenario where glibc has already registered Rseq, the __rseq_refcount will > be incremented, the __NR_rseq syscall will fail with -1, errno=EBUSY, so the refcount > will be immediately decremented and it will return false. Therefore, "init_status" will > never be set fo kFastMode, leaving it in kSlowMode for the entire lifetime of this > program. That being said, even though this state can come as a surprise, it seems to > be entirely bypassed by the fast-paths IsFast() and IsFastNoInit(), so maybe it won't > have any observable side-effects other than leaving init_status in a state that does not > match reality. I agree that this could potentially violate inviarants, but InitFastPerCpu is not intended to be called by the application. > In the other use-case where tcmalloc co-exist with a dlopened/dlclosed library, but glibc > does not provide Rseq registration, we run into issues as well if the dlopened library > registers rseq first for a given thread. The IsFastNoInit() expects that if Rseq has been > observed as registered in the past for a thread, it stays registered. However, if a > dlclosed library unregisters Rseq, we need to be prepared to re-register it. So either > tcmalloc needs to express its use of Rseq by incrementing __rseq_refcount even when Rseq > is registered (this would hurt the fast-path however, and I would hate to have to do this), > or tcmalloc needs to be able to handle the fact that Rseq may be unregistered by a dlclosed > library which was the actual owner of the Rseq registration. We have a bit of an opportunity to figure out whether this is the first time--from TCMalloc's perspective--a thread is doing per-CPU and bump the __rseq_count accordingly. I think this could be done off of the fast path. Chris