Date: Wed, 26 Feb 2020 12:01:49 -0500 (EST)
From: Mathieu Desnoyers
To: Chris Kennelly
Cc: "Joel Fernandes, Google", Paul Turner, Florian Weimer,
    Carlos O'Donell, libc-alpha, linux-kernel, Peter Zijlstra, paulmck,
    Boqun Feng, Brian Geffon
Message-ID: <1089333712.8657.1582736509318.JavaMail.zimbra@efficios.com>
Subject: Re: Rseq registration: Google tcmalloc vs glibc
X-Mailing-List: linux-kernel@vger.kernel.org

----- On Feb 25, 2020, at 10:38 PM, Chris Kennelly ckennelly@google.com wrote:

> On Tue, Feb 25, 2020 at 10:25 PM Joel Fernandes wrote:
>>
>> On Fri, Feb 21, 2020 at 11:13 AM Mathieu Desnoyers
>> wrote:
>> >
>> > ----- On Feb 21, 2020, at 10:49 AM, Joel Fernandes, Google
>> > joel@joelfernandes.org wrote:
>> >
>> > [...]
>> >
>> > >> 3) Use the __rseq_abi TLS cpu_id field to know whether Rseq has been
>> > >> registered.
>> > >>
>> > >> - Current protocol in the most recent glibc integration patch set.
>> > >> - Not supported yet by Linux kernel rseq selftests,
>> > >> - Not supported yet by tcmalloc,
>> > >>
>> > >> Use the per-thread state to figure out whether each thread need to register
>> > >> Rseq individually.
>> > >>
>> > >> Works for integration between a library which exists for the entire lifetime
>> > >> of the executable (e.g. glibc) and other libraries. However, it does not
>> > >> allow a set of libraries which are dlopen'd/dlclose'd to co-exist without
>> > >> having a library like glibc handling the registration present.
>> > >
>> > > Mathieu, could you share more details about why during dlopen/close
>> > > libraries we cannot use the same __rseq_abi TLS to detect that rseq was
>> > > registered?
>> >
>> > Sure,
>> >
>> > A library which is only loaded and never closed during the execution of the
>> > program can let the kernel implicitly unregister rseq at thread exit. For
>> > the dlopen/dlclose use-case, we need to be able to explicitly unregister
>> > each thread's __rseq_abi which sit in a library which is going to be
>> > dlclose'd.
>>
>> Mathieu, Thanks a lot for the explanation, it makes complete sense. It
>> sounds from Chris's reply that tcmalloc already checks
>> __rseq_abi.cpu_id and is not dlopened/closed. Considering these, it
>> seems to already handle things properly - CMIIW.
>
> I'll make a note about this, since we can probably benefit from some
> more comments about the assumptions/invariants the fastpath uses.

I suspect the integration with glibc and with dlopen'd/dlclose'd libraries
will not behave correctly with the current tcmalloc implementation.

Based on the tcmalloc code base, InitFastPerCpu() is only called from
IsFast(). As long as this is the only expected caller, the RseqCpuId
comparison in IsFast() detects whether glibc (or some other library) has
already registered rseq for the current thread.
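A minimal, self-contained sketch of the check-before-register idea (protocol 3: the cpu_id field of __rseq_abi tells whether rseq is already registered for the current thread). To stay runnable anywhere, it does not issue the real rseq system call: RseqAbi, rseq_abi, kernel_has_registration, FakeRseqRegister() and EnsureRseqRegistered() are hypothetical names, not tcmalloc or glibc APIs. It assumes only the kernel convention that an unregistered thread's cpu_id holds RSEQ_CPU_ID_UNINITIALIZED (-1).

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Hypothetical model of the per-thread rseq ABI area (illustration only).
struct RseqAbi {
  int32_t cpu_id = -1;  // -1 mirrors RSEQ_CPU_ID_UNINITIALIZED
};

thread_local RseqAbi rseq_abi;                      // stands in for __rseq_abi
thread_local bool kernel_has_registration = false;  // simulated kernel state

// Hypothetical stand-in for the rseq(2) registration syscall.
int FakeRseqRegister(RseqAbi* abi) {
  assert(abi != nullptr);
  if (kernel_has_registration) {
    errno = EBUSY;  // the real syscall reports EBUSY when the same area
    return -1;      // is registered twice for a thread
  }
  kernel_has_registration = true;
  abi->cpu_id = 0;  // the kernel would store the current CPU number here
  return 0;
}

// Check cpu_id first; only "register" if nobody has registered yet.
bool EnsureRseqRegistered() {
  if (rseq_abi.cpu_id >= 0) {
    return true;  // already registered (e.g. by glibc): nothing to do
  }
  return FakeRseqRegister(&rseq_abi) == 0;
}
```

Calling EnsureRseqRegistered() a second time returns true without another registration attempt, whereas calling FakeRseqRegister() directly fails with EBUSY: that is the failure a blind registration runs into when glibc got there first.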
However, if the application chooses to invoke InitFastPerCpu() directly,
things behave unexpectedly, because it invokes:

  absl::base_internal::LowLevelCallOnce(&init_per_cpu_once, InitPerCpu);

which, AFAIU, invokes InitPerCpu() only once during the execution of the
current program. InitPerCpu() does:

static bool InitThreadPerCpu() {
  if (__rseq_refcount++ > 0) {
    return true;
  }

  auto ret = syscall(__NR_rseq, &__rseq_abi, sizeof(__rseq_abi), 0,
                     PERCPU_RSEQ_SIGNATURE);
  if (ret == 0) {
    return true;
  } else {
    __rseq_refcount--;
  }

  return false;
}

static void InitPerCpu() {
  // Based on the results of successfully initializing the first thread, mark
  // init_status to initialize all subsequent threads.
  if (InitThreadPerCpu()) {
    init_status = kFastMode;
  }
}

In a scenario where glibc has already registered Rseq, __rseq_refcount
will be incremented, the __NR_rseq syscall will fail with -1 and
errno=EBUSY, so the refcount will be immediately decremented and
InitThreadPerCpu() will return false. Therefore, init_status will never be
set to kFastMode, leaving it in kSlowMode for the entire lifetime of the
program.

That being said, even though this state may come as a surprise, it seems
to be entirely bypassed by the fast paths IsFast() and IsFastNoInit(), so
it may have no observable side-effect other than leaving init_status in a
state that does not match reality.

In the other use-case, where tcmalloc coexists with a dlopen'd/dlclose'd
library but glibc does not provide Rseq registration, we run into issues
as well if the dlopen'd library registers rseq first for a given thread.
IsFastNoInit() expects that once Rseq has been observed as registered for
a thread, it stays registered. However, if a dlclose'd library unregisters
Rseq, we need to be prepared to re-register it.
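Both situations above can be reproduced with a single-threaded model. This is a sketch under assumptions: RseqModel, Register(), Unregister() and IsFastRechecking() are hypothetical names, the kernel is simulated by a boolean, and InitThreadPerCpu()/InitPerCpu() copy the shape of the quoted tcmalloc code rather than calling it. IsFastRechecking() only illustrates the idea of coping with unregistration by re-reading cpu_id; it is not what tcmalloc does.

```cpp
#include <cassert>
#include <cerrno>

enum InitStatus { kSlowMode, kFastMode };

// Hypothetical single-thread model of rseq state; not tcmalloc's code.
struct RseqModel {
  bool kernel_registered = false;  // simulated kernel-side registration
  int cpu_id = -1;                 // mirrors __rseq_abi.cpu_id (-1: none)
  int refcount = 0;                // mirrors __rseq_refcount
  InitStatus init_status = kSlowMode;

  int Register() {  // simulated rseq(2) registration
    if (kernel_registered) {
      errno = EBUSY;
      return -1;
    }
    kernel_registered = true;
    cpu_id = 0;  // the kernel would store the current CPU number here
    return 0;
  }

  void Unregister() {  // e.g. the owning library being dlclose'd
    kernel_registered = false;
    cpu_id = -1;  // the kernel resets cpu_id on unregistration
  }

  // Same logic as the quoted InitThreadPerCpu()/InitPerCpu().
  bool InitThreadPerCpu() {
    if (refcount++ > 0) return true;
    if (Register() == 0) return true;
    refcount--;
    return false;
  }
  void InitPerCpu() {
    if (InitThreadPerCpu()) init_status = kFastMode;
  }

  // Re-read cpu_id on every call instead of assuming "registered forever",
  // so a registration dropped by a dlclose'd owner gets re-established.
  bool IsFastRechecking() {
    if (cpu_id >= 0) return true;
    return Register() == 0;
  }
};
```

When the model registers first (as glibc would, without touching refcount) and then runs InitPerCpu(), rseq is usable (cpu_id >= 0) while init_status still reads kSlowMode. After Unregister(), IsFastRechecking() registers again; the price is one extra cpu_id check on every fast-path call.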
So either tcmalloc needs to express its use of Rseq by incrementing
__rseq_refcount even when Rseq is already registered (this would hurt the
fast path, however, and I would hate to have to do this), or tcmalloc needs
to handle the fact that Rseq may be unregistered by a dlclose'd library
which was the actual owner of the Rseq registration.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com