Received: by 2002:a25:1506:0:0:0:0:0 with SMTP id 6csp1116608ybv; Thu, 20 Feb 2020 13:34:01 -0800 (PST) X-Google-Smtp-Source: APXvYqxBX7oSWTNq8A8sFIODCsIIWX6QbXhUgg4bBzYeTWLv037XjNMGuopt+gVst3fLrnTUiSVB X-Received: by 2002:a05:6808:b13:: with SMTP id s19mr3493968oij.119.1582234441199; Thu, 20 Feb 2020 13:34:01 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1582234441; cv=none; d=google.com; s=arc-20160816; b=Sj9Aovqa3aF07TMnuK0ff1y8iBgdCUcNzq/k2l5ksQz9zriMmrQl7f6MzoGpEULT1c ECLtKp01dxOs917hpK0+FZZ7TBuOxkJgkNvZAcyU7FMB2/8+Q0hR5sSWwfKPerxJO/SC PUHyKR60/gqAGkYmQCCCRXIxjxlHGDrxYg1d/KfAtE9OncPCVzh60bqprkPAB394U1F9 HQ3ncPm6R758MO2t3FWwoPXbNEDNQEpSGHLnYXANrqPLleNJQqjstg72WnlYs4EoaCKg rEAPe6GnVlrpvEKQUJtZUP2HoAzLMwy/il4qdlcZ0yaa32GIy0KdIVmdhuvCvVY9+S7u Gbcg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:thread-topic:thread-index :content-transfer-encoding:mime-version:subject:message-id:cc:to :from:date:dkim-signature:dkim-filter; bh=qUvJahEG+Ifsoqa25pPmv+QBKwuV/zZcj9KWXJdnAYA=; b=cVsZRv9zMvy4EzAhSyDlDT5eYOUNWXpdnsQfbYdTOq4fiQRiHqcb1VAdEOHmnNE1V/ 4fDa8/eDAHhu/m5cW2JNCM/QpL4ySMz6dbfnsQJ0jJA13GGoeMcH9Y9AVAWDy9LuPm3o 2vdbyVYCX6fVeeZsC7PsX5DaIDCEg/hdVgg3g21YWIPHGn8cGH8wZz8tnI/bfiMqDlXu Z1eY/HlbNXUw0kDgO71aHLGQnsYBCwuQi9F1WW6WJ2ERkhtnwby3kATFPUrtpJLDwh5N avZ125YLalxASbMr7DkZuu1uN60RXO0f7zwaM0Ris830pdGRGMYXcIPL0pxjlS+JwNUL tmkA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@efficios.com header.s=default header.b=EbqMF2ly; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=efficios.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l14si346894otk.225.2020.02.20.13.33.49; Thu, 20 Feb 2020 13:34:01 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@efficios.com header.s=default header.b=EbqMF2ly; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=efficios.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729128AbgBTVdc (ORCPT + 99 others); Thu, 20 Feb 2020 16:33:32 -0500 Received: from mail.efficios.com ([167.114.26.124]:53214 "EHLO mail.efficios.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729075AbgBTVdc (ORCPT ); Thu, 20 Feb 2020 16:33:32 -0500 Received: from localhost (localhost [127.0.0.1]) by mail.efficios.com (Postfix) with ESMTP id EC75E259150; Thu, 20 Feb 2020 16:33:30 -0500 (EST) Received: from mail.efficios.com ([127.0.0.1]) by localhost (mail03.efficios.com [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id JTPi-6MrkGF1; Thu, 20 Feb 2020 16:33:30 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by mail.efficios.com (Postfix) with ESMTP id 74E4725921E; Thu, 20 Feb 2020 16:33:30 -0500 (EST) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.efficios.com 74E4725921E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=efficios.com; s=default; t=1582234410; bh=qUvJahEG+Ifsoqa25pPmv+QBKwuV/zZcj9KWXJdnAYA=; h=Date:From:To:Message-ID:MIME-Version; b=EbqMF2lyIv6SRmw4Z75e1iWR0cXEKb7mtL4W+P8cg+h32o9URa/9MLUaVkLOg/p32 zcuGwy+QiIeWJM7fAu3wxCwM1dG+JuBYvEW2JJxO0LHDTIV/cAdVmsLbmXouWrzOlJ tkn0y1Yo5LOk+bV38vxigeHS54LBv09nNAW4BCCbOq0kxrFaGaIp6HPDB6WRX3XaPO g3lWD1ooJUO+7QLk0XZQcECxOltTNyMqEUmk4wV8iKtZ+Fz3jyOoHxKbFdW3LoXIm5 0VVTWIIrnjdQQUcZ9QYqQHi04cr/mlYEDEyJR4eZOrml1qoQ7t5r7m9iYeuurnkPVr lz1anhDGhvPkg== X-Virus-Scanned: amavisd-new at efficios.com Received: from mail.efficios.com ([127.0.0.1]) by localhost (mail03.efficios.com [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id j92rGpVRavqm; Thu, 20 Feb 2020 16:33:30 -0500 (EST) Received: from mail03.efficios.com (mail03.efficios.com [167.114.26.124]) by mail.efficios.com (Postfix) with ESMTP id 64AF82591AC; Thu, 20 Feb 2020 16:33:30 -0500 (EST) Date: Thu, 20 Feb 2020 16:33:30 -0500 (EST) From: Mathieu Desnoyers To: Chris Kennelly Cc: "Joel Fernandes, Google" , Paul Turner , Florian Weimer , Carlos O'Donell , libc-alpha , linux-kernel , Peter Zijlstra , paulmck , Boqun Feng , Brian Geffon Message-ID: <1503467992.2999.1582234410317.JavaMail.zimbra@efficios.com> Subject: Rseq registration: Google tcmalloc vs glibc MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [167.114.26.124] X-Mailer: Zimbra 8.8.15_GA_3901 (ZimbraWebClient - FF72 (Linux)/8.8.15_GA_3895) Thread-Index: cKUnKpwYxwNeLX5048w9AFEVK9mHtA== Thread-Topic: Rseq registration: Google tcmalloc vs glibc Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Chris, As one of the maintainers of the Rseq system call in the Linux kernel, I would like to thank the Google team for open sourcing a tcmalloc implementation based on Rseq! I've looked into some critical integration aspects of the tcmalloc implementation, and would like to bring up a topic which involves both tcmalloc developers and the glibc community. I have been discussing aspects of co-existence between early Rseq adopter libraries and glibc for the past year with the glibc community, and tcmalloc happens to be the first project to publicly use Rseq outside of prototype branches or selftests code. Considering that there can only be one Rseq registration per thread (as imposed by the rseq ABI), there needs to be some kind of protocol between libraries to ensure we don't introduce regressions when we eventually combine a newer glibc which takes care of registration of the __rseq_abi TLS along with tcmalloc which also try to perform that registration within the same thread. Throughout the various rounds of review of the glibc Rseq integration patch set, there were a few solutions envisioned. Here is a brief history: 1) Introduce a __rseq_refcount TLS variable. - Currently used by Linux tools/testing/selftests/rseq/rseq.c, - Currently used by Google tcmalloc, - Emitted by glibc as well my the original patchset (but was later removed), A user incrementing the refcount from 0 -> 1 performs rseq registration. The last user decrementing from 1 -> 0 performs rseq unregistration. Works for co-existence of dlopen'd/dlclose'd libraries, for dynamically linked libraries, and for the main executable. The refcounting was deemed too complex for glibc's needs (it always exists for the entire executable's lifetime), so we moved to __rseq_handled instead. 2) Introduce a __rseq_handled global variable. - Currently used by Linux tools/testing/selftests/rseq/rseq.c, - At some point emitted by glibc as well in my patch set (but was later removed), A library may take rseq ownership if it is still 0 when executing the library constructor. Set to 1 by library constructor when handling rseq. Set to 0 in destructor if handling rseq. Not meant to be set by dlopen'd/dlclose'd libraries, only by libraries existing for the whole lifetime of the executable and/or the main executable. This __rseq_handled symbol has been identified as being somewhat redundant with the information provided in the __rseq_abi.cpu_id field (uninitialized state), which motivated removing this symbol from the glibc integration entirely. The only reason for having __rseq_handled separate from __rseq_abi.cpu_id was because it was then impossible to touch TLS data early in the glibc initialization. This issue was later resolved within glibc. 3) Use the __rseq_abi TLS cpu_id field to know whether Rseq has been registered. - Current protocol in the most recent glibc integration patch set. - Not supported yet by Linux kernel rseq selftests, - Not supported yet by tcmalloc, Use the per-thread state to figure out whether each thread need to register Rseq individually. Works for integration between a library which exists for the entire lifetime of the executable (e.g. glibc) and other libraries. However, it does not allow a set of libraries which are dlopen'd/dlclose'd to co-exist without having a library like glibc handling the registration present. So overall, I suspect the protocol we want for early adopters is that they only register Rseq if __rseq_abi.cpu_id == RSEQ_CPU_ID_UNINITIALIZED, which ensure they do not get -1, errno = EBUSY when linked against a newer glibc which handles Rseq registration. In order to handle multiple early adopters dlopen'd/dlclose'd in the same executable, those should synchronize with a __rseq_refcount TLS reference count, but it does not have to be taken into account by the main executable or libraries present for the entire executable lifetime (like glibc). Based on this, what I think would be missing from the current Google tcmalloc implementation is a check for __rseq_abi.cpu_id == RSEQ_CPU_ID_UNINITIALIZED in InitThreadPerCpu(). Is tcmalloc ever meant to be dlopen'd/dlclose'd (either directly or indirectly), or is it required to exist for the entire executable lifetime ? The check and increment of __rseq_refcount is only useful to co-exist with dlopen'd/dlclose'd libraries, but it would not allow discovering the presence of a glibc which takes care of the rseq registration with the planned protocol. A dlopen'd library should then only perform rseq unregistration if if brings the __rseq_refcount back to 0 (e.g. in a pthread_key destructor). Adding this check for __rseq_abi.cpu_id == RSEQ_CPU_ID_UNINITIALIZED is something I need to do in the Linux rseq selftests, but I refrained from submitting any further change to those tests until the glibc rseq integration gets finally merged. Is it something that could be easily changed at this stage in Google tcmalloc, or should we reconsider adding back __rseq_refcount within the glibc integration patch set, even though it is not strictly useful to glibc ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com