Date: Wed, 26 Feb 2020 16:21:44 -0500 (EST)
From: Mathieu Desnoyers
To: Chris Kennelly
Cc: "Joel Fernandes, Google", Paul Turner, Florian Weimer, Carlos O'Donell, libc-alpha, linux-kernel, Peter Zijlstra, paulmck, Boqun Feng, Brian Geffon
Message-ID: <1484567919.9131.1582752104880.JavaMail.zimbra@efficios.com>
References: <1503467992.2999.1582234410317.JavaMail.zimbra@efficios.com> <1683022606.3452.1582301632640.JavaMail.zimbra@efficios.com> <1089333712.8657.1582736509318.JavaMail.zimbra@efficios.com> <982202794.8791.1582743392060.JavaMail.zimbra@efficios.com>
Subject: Re: Rseq registration: Google tcmalloc vs glibc

----- On Feb 26, 2020, at 2:12 PM, Chris Kennelly ckennelly@google.com wrote:

> On Wed, Feb 26, 2020 at 1:56 PM Mathieu Desnoyers
> wrote:
>>
>> ----- On Feb 26, 2020, at 12:27 PM, Chris Kennelly ckennelly@google.com wrote:
>>
>> > On Wed, Feb 26, 2020 at 12:01 PM Mathieu Desnoyers
>> > wrote:
>> >>
>> >> ----- On Feb 25, 2020, at 10:38 PM, Chris Kennelly ckennelly@google.com wrote:
>> >>
>> >> > On Tue, Feb 25, 2020 at 10:25 PM Joel Fernandes wrote:
>> >> >>
>> >> >> On Fri, Feb 21, 2020 at 11:13 AM Mathieu Desnoyers
>> >> >> wrote:
>> >> >> >
>> >> >> > ----- On Feb 21, 2020, at 10:49 AM, Joel Fernandes, Google
>> >> >> > joel@joelfernandes.org wrote:
>> >> >> >
>> >> >> > [...]
>> >> >> >
>> >> >> > >> 3) Use the __rseq_abi TLS cpu_id field to know whether Rseq has been
>> >> >> > >> registered.
>> >> >> > >>
>> >> >> > >> - Current protocol in the most recent glibc integration patch set.
>> >> >> > >> - Not supported yet by Linux kernel rseq selftests,
>> >> >> > >> - Not supported yet by tcmalloc,
>> >> >> > >>
>> >> >> > >> Use the per-thread state to figure out whether each thread needs to register
>> >> >> > >> Rseq individually.
>> >> >> > >>
>> >> >> > >> Works for integration between a library which exists for the entire lifetime
>> >> >> > >> of the executable (e.g. glibc) and other libraries. However, it does not
>> >> >> > >> allow a set of libraries which are dlopen'd/dlclose'd to co-exist without
>> >> >> > >> having a library like glibc handling the registration present.
>> >> >> > >
>> >> >> > > Mathieu, could you share more details about why during dlopen/close
>> >> >> > > libraries we cannot use the same __rseq_abi TLS to detect that rseq was
>> >> >> > > registered?
>> >> >> >
>> >> >> > Sure,
>> >> >> >
>> >> >> > A library which is only loaded and never closed during the execution of the
>> >> >> > program can let the kernel implicitly unregister rseq at thread exit. For
>> >> >> > the dlopen/dlclose use-case, we need to be able to explicitly unregister
>> >> >> > each thread's __rseq_abi which sits in a library which is going to be
>> >> >> > dlclose'd.
>> >> >>
>> >> >> Mathieu, Thanks a lot for the explanation, it makes complete sense. It
>> >> >> sounds from Chris's reply that tcmalloc already checks
>> >> >> __rseq_abi.cpu_id and is not dlopened/closed. Considering these, it
>> >> >> seems to already handle things properly - CMIIW.
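The explicit-unregistration requirement described above can be made concrete with a small single-threaded simulation of a refcounted registration protocol. This is a sketch, not glibc, tcmalloc, or kernel code: `sim_sys_rseq_register()`, `sim_sys_rseq_unregister()`, `sim_rseq_abi` and `sim_rseq_refcount` are hypothetical stand-ins for the `rseq` system call, `__rseq_abi` and `__rseq_refcount`. It assumes, modelling the kernel, that registering an already-registered thread fails with EBUSY and that unregistering resets `cpu_id` to -1 (uninitialized).

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Simulated stand-ins (hypothetical names): nothing here invokes the real
// rseq system call or touches glibc's symbols.
constexpr std::int32_t kCpuIdUninitialized = -1;  // like RSEQ_CPU_ID_UNINITIALIZED

struct SimRseqAbi {
  std::int32_t cpu_id = kCpuIdUninitialized;
};

thread_local SimRseqAbi sim_rseq_abi;             // plays the role of __rseq_abi
thread_local int sim_rseq_refcount = 0;           // plays the role of __rseq_refcount
thread_local bool sim_kernel_registered = false;  // the kernel's view of the thread

// Simulated registration: fails with EBUSY if the thread is already registered.
int sim_sys_rseq_register() {
  if (sim_kernel_registered) {
    errno = EBUSY;
    return -1;
  }
  sim_kernel_registered = true;
  sim_rseq_abi.cpu_id = 0;  // the kernel would store the current CPU number here
  return 0;
}

// Simulated unregistration: cpu_id goes back to "uninitialized".
int sim_sys_rseq_unregister() {
  if (!sim_kernel_registered) {
    errno = EINVAL;
    return -1;
  }
  sim_kernel_registered = false;
  sim_rseq_abi.cpu_id = kCpuIdUninitialized;
  return 0;
}

// Any library can tell from cpu_id whether rseq is live for this thread.
bool rseq_registered_current_thread() { return sim_rseq_abi.cpu_id >= 0; }

// Take a reference; only the first user on this thread issues the syscall.
bool rseq_ref_current_thread() {
  if (sim_rseq_refcount++ > 0) return true;
  if (sim_sys_rseq_register() == 0) return true;
  sim_rseq_refcount--;  // registration failed: undo our reference
  return false;
}

// Drop a reference, e.g. from a library's per-thread teardown before dlclose.
// The last user unregisters explicitly, so the kernel no longer updates a
// __rseq_abi whose TLS storage is about to disappear.
bool rseq_unref_current_thread() {
  if (sim_rseq_refcount <= 0) return false;
  if (--sim_rseq_refcount > 0) return true;
  return sim_sys_rseq_unregister() == 0;
}
```

A library that is never unloaded can skip `rseq_unref_current_thread()` and let thread exit clean up; a dlopen'd one would call it from its per-thread teardown before being dlclose'd.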
>> >> >
>> >> > I'll make a note about this, since we can probably benefit from some
>> >> > more comments about the assumptions/invariants the fastpath uses.
>> >>
>> >> I suspect the integration with glibc and with dlopen'd/dlclose'd libraries will
>> >> not behave correctly with the current tcmalloc implementation.
>> >>
>> >> Based on the tcmalloc code-base, InitFastPerCpu is only called from IsFast. As
>> >> long as this is the only expected caller, having IsFast compare the RseqCpuId
>> >> detects whether glibc (or some other library) has already registered rseq for
>> >> the current thread.
>> >>
>> >> However, if the application chooses to invoke InitFastPerCpu() directly, things
>> >> become unexpected, because it invokes:
>> >>
>> >>   absl::base_internal::LowLevelCallOnce(&init_per_cpu_once, InitPerCpu);
>> >>
>> >> which AFAIU invokes InitPerCpu once per execution of the current program.
>> >> Which does:
>> >>
>> >> static bool InitThreadPerCpu() {
>> >>   if (__rseq_refcount++ > 0) {
>> >>     return true;
>> >>   }
>> >>
>> >>   auto ret = syscall(__NR_rseq, &__rseq_abi, sizeof(__rseq_abi), 0,
>> >>                      PERCPU_RSEQ_SIGNATURE);
>> >>   if (ret == 0) {
>> >>     return true;
>> >>   } else {
>> >>     __rseq_refcount--;
>> >>   }
>> >>
>> >>   return false;
>> >> }
>> >>
>> >> static void InitPerCpu() {
>> >>   // Based on the results of successfully initializing the first thread, mark
>> >>   // init_status to initialize all subsequent threads.
>> >>   if (InitThreadPerCpu()) {
>> >>     init_status = kFastMode;
>> >>   }
>> >> }
>> >>
>> >> In a scenario where glibc has already registered Rseq, the __rseq_refcount will
>> >> be incremented, the __NR_rseq syscall will fail with -1, errno=EBUSY, so the
>> >> refcount will be immediately decremented and it will return false. Therefore,
>> >> "init_status" will never be set to kFastMode, leaving it in kSlowMode for the
>> >> entire lifetime of this program.
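The EBUSY scenario above can be reproduced in a single-threaded simulation. Everything prefixed `sim_` is a hypothetical stand-in (no real `rseq` syscall is made), and `InitThreadPerCpuChecked()` is an illustrative variant, not a proposed tcmalloc patch: it consults `cpu_id` before attempting to register, so a registration already owned by glibc counts as success.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Simulated stand-ins (hypothetical names); no real syscall is made.
constexpr std::int32_t kCpuIdUninitialized = -1;

struct SimRseqAbi {
  std::int32_t cpu_id = kCpuIdUninitialized;
};

thread_local SimRseqAbi sim_rseq_abi;             // plays the role of __rseq_abi
thread_local int sim_rseq_refcount = 0;           // plays the role of __rseq_refcount
thread_local bool sim_kernel_registered = false;

int sim_sys_rseq_register() {
  if (sim_kernel_registered) {
    errno = EBUSY;  // same thread, same area: the kernel reports EBUSY
    return -1;
  }
  sim_kernel_registered = true;
  sim_rseq_abi.cpu_id = 0;
  return 0;
}

// Models glibc registering rseq at thread start without touching the refcount.
void sim_glibc_registers_thread() { sim_sys_rseq_register(); }

enum InitStatus { kSlowMode, kFastMode };

// Same control flow as the quoted InitThreadPerCpu(), against the simulation.
bool InitThreadPerCpuAsQuoted() {
  if (sim_rseq_refcount++ > 0) return true;
  if (sim_sys_rseq_register() == 0) return true;
  sim_rseq_refcount--;
  return false;
}

// Hypothetical variant: treat an existing registration (cpu_id >= 0) as success
// instead of letting EBUSY push the whole process into kSlowMode.
bool InitThreadPerCpuChecked() {
  if (sim_rseq_abi.cpu_id >= 0) return true;
  return InitThreadPerCpuAsQuoted();
}

// Mirrors InitPerCpu(): the first thread's result decides init_status.
InitStatus InitPerCpu(bool (*init_thread)()) {
  return init_thread() ? kFastMode : kSlowMode;
}
```

Returning early without taking a reference is only safe when the existing owner lives for the whole process, such as glibc; it does not cover an owner that may later be dlclose'd.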
>> >> That being said, even though this state can come as a surprise, it seems to
>> >> be entirely bypassed by the fast-paths IsFast() and IsFastNoInit(), so maybe it
>> >> won't have any observable side-effects other than leaving init_status in a
>> >> state that does not match reality.
>> >
>> > I agree that this could potentially violate invariants, but
>> > InitFastPerCpu is not intended to be called by the application.
>>
>> OK, explicitly documenting this would be a good thing. In my own projects,
>> I prefix those symbols with double-underscores (__) to indicate that those
>> are not meant to be called by other means than the static inlines in the API.
>>
>> There may be use-cases which justify exposing InitFastPerCpu as a public API for
>> applications though, especially for those which require some level of
>> real-time guarantees from the malloc/free APIs. I've run into this situation
>> with liburcu which I maintain.
>>
>> >
>> >> In the other use-case where tcmalloc co-exists with a dlopened/dlclosed library,
>> >> but glibc does not provide Rseq registration, we run into issues as well if the
>> >> dlopened library registers rseq first for a given thread. The IsFastNoInit()
>> >> expects that if Rseq has been observed as registered in the past for a thread,
>> >> it stays registered. However, if a dlclosed library unregisters Rseq, we need to
>> >> be prepared to re-register it. So either tcmalloc needs to express its use of
>> >> Rseq by incrementing __rseq_refcount even when Rseq is registered (this would
>> >> hurt the fast-path however, and I would hate to have to do this), or tcmalloc
>> >> needs to be able to handle the fact that Rseq may be unregistered by a dlclosed
>> >> library which was the actual owner of the Rseq registration.
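The hazard described above, where a registration observed once is assumed to stay in place, can be shown in a small simulation. All names are hypothetical: the `sim_` functions stand in for the `rseq` syscall and for a dlopen'd library that owns the registration via the refcount, and `allocator_cached_fast` models the "stays registered" assumption as a cached per-thread flag. It is not a description of how tcmalloc's IsFastNoInit() is actually written.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Simulated stand-ins (hypothetical names); no real syscall is made.
constexpr std::int32_t kCpuIdUninitialized = -1;

struct SimRseqAbi {
  std::int32_t cpu_id = kCpuIdUninitialized;
};

thread_local SimRseqAbi sim_rseq_abi;             // plays the role of __rseq_abi
thread_local int sim_rseq_refcount = 0;           // plays the role of __rseq_refcount
thread_local bool sim_kernel_registered = false;

int sim_sys_rseq_register() {
  if (sim_kernel_registered) {
    errno = EBUSY;
    return -1;
  }
  sim_kernel_registered = true;
  sim_rseq_abi.cpu_id = 0;
  return 0;
}

int sim_sys_rseq_unregister() {
  if (!sim_kernel_registered) {
    errno = EINVAL;
    return -1;
  }
  sim_kernel_registered = false;
  sim_rseq_abi.cpu_id = kCpuIdUninitialized;
  return 0;
}

// A dlopen'd library that owns the registration through the refcount.
void sim_dlopened_lib_thread_init() {
  if (sim_rseq_refcount++ == 0 && sim_sys_rseq_register() != 0) sim_rseq_refcount--;
}

// Its per-thread teardown before dlclose: the last reference unregisters.
void sim_dlopened_lib_thread_fini() {
  if (sim_rseq_refcount > 0 && --sim_rseq_refcount == 0) sim_sys_rseq_unregister();
}

// Hypothetical allocator state: remembers that rseq was seen on this thread,
// assuming a registration, once observed, stays in place.
thread_local bool allocator_cached_fast = false;

bool allocator_observe_rseq() {
  if (sim_rseq_abi.cpu_id >= 0) allocator_cached_fast = true;
  return allocator_cached_fast;
}
```

Any state derived from an earlier observation of `cpu_id` goes stale as soon as the owning library drops the last reference.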
>> >
>> > We have a bit of an opportunity to figure out whether this is the
>> > first time--from TCMalloc's perspective--a thread is doing per-CPU and
>> > bump the __rseq_refcount accordingly. I think this could be done off of
>> > the fast path.
>>
>> Is there an explicit tcmalloc API call that each thread needs to make before
>> starting to use tcmalloc to allocate and free memory? If not, you'll probably
>> need to add at least a load of __rseq_refcount (or some other TLS variable), a
>> test and a conditional branch on the fast-path, which is an additional cost I
>> would ideally prefer to avoid. Or do you have something else in mind?
>
> No explicit call is necessary. This is something that can be done in
> the slow path, since we can recognize the transition from slow -> fast
> path for that thread.

Got it, it should work.

Thanks!

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
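The approach agreed on above, taking a reference on the slow -> fast transition so the fast path stays untouched, can be sketched in a self-contained simulation. All `sim_` and `allocator_` names are hypothetical, and `allocator_slow_path()` assumes (an assumption, not something stated in the thread) that each thread passes through the allocator's slow path at least once before using the fast path. The fast path remains a single load and compare of `cpu_id`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Simulated stand-ins (hypothetical names); no real syscall is made.
constexpr std::int32_t kCpuIdUninitialized = -1;

struct SimRseqAbi {
  std::int32_t cpu_id = kCpuIdUninitialized;
};

thread_local SimRseqAbi sim_rseq_abi;             // plays the role of __rseq_abi
thread_local int sim_rseq_refcount = 0;           // plays the role of __rseq_refcount
thread_local bool sim_kernel_registered = false;

int sim_sys_rseq_register() {
  if (sim_kernel_registered) {
    errno = EBUSY;
    return -1;
  }
  sim_kernel_registered = true;
  sim_rseq_abi.cpu_id = 0;
  return 0;
}

int sim_sys_rseq_unregister() {
  if (!sim_kernel_registered) {
    errno = EINVAL;
    return -1;
  }
  sim_kernel_registered = false;
  sim_rseq_abi.cpu_id = kCpuIdUninitialized;
  return 0;
}

// A dlopen'd co-owner of the registration (hypothetical).
void sim_dlopened_lib_thread_init() {
  if (sim_rseq_refcount++ == 0 && sim_sys_rseq_register() != 0) sim_rseq_refcount--;
}
void sim_dlopened_lib_thread_fini() {
  if (sim_rseq_refcount > 0 && --sim_rseq_refcount == 0) sim_sys_rseq_unregister();
}

// Hypothetical allocator state, only consulted on the slow path.
thread_local bool allocator_holds_ref = false;

// Fast path: a single TLS load of cpu_id, no refcount traffic.
bool allocator_fast_path_ok() { return sim_rseq_abi.cpu_id >= 0; }

// Slow path: on this thread's first slow -> fast transition, take exactly one
// reference so a co-owning library cannot unregister rseq underneath us.
bool allocator_slow_path() {
  if (!allocator_holds_ref) {
    const bool already_registered = sim_rseq_abi.cpu_id >= 0;
    if (sim_rseq_refcount++ == 0 && !already_registered &&
        sim_sys_rseq_register() != 0) {
      sim_rseq_refcount--;  // registration failed: undo our reference
      return false;
    }
    allocator_holds_ref = true;
  }
  return allocator_fast_path_ok();
}
```

The sketch never drops the allocator's reference and relies on the kernel's implicit unregistration at thread exit, which fits an allocator that is never dlclose'd.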