Date: Thu, 22 Nov 2018 12:29:43 -0500 (EST)
From: Mathieu Desnoyers
To: Florian Weimer
Cc: Rich Felker, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, "Paul E. McKenney", Boqun Feng, Will Deacon, Dave Watson, Paul Turner, linux-kernel, linux-api
Message-ID: <644835950.10383.1542907783295.JavaMail.zimbra@efficios.com>
In-Reply-To: <87k1l5xd33.fsf@oldenburg.str.redhat.com>
Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation
X-Mailing-List: linux-kernel@vger.kernel.org

----- On Nov 22, 2018, at 11:59 AM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
>
>> ----- On Nov 22, 2018, at 11:28 AM, Florian Weimer fweimer@redhat.com wrote:
>>
>>> * Mathieu Desnoyers:
>>>
>>>> Here is one scenario: we have 2 early adopter libraries using rseq which
>>>> are deployed in an environment with an older glibc (which does not
>>>> support rseq).
>>>>
>>>> Of course, none of those libraries can be dlclose'd unless they somehow
>>>> track all registered threads.
>>>
>>> Well, you can always make them NODELETE so that dlclose is not an issue.
>>> If the library is small enough, that shouldn't be a problem.
>>
>> That's indeed what I do with lttng-ust, mainly due to use of pthread_key.
>>
>>>
>>>> But let's focus on how exactly those libraries can handle lazily
>>>> registering rseq. They can use pthread_key, and pthread_setspecific on
>>>> first use by the thread to setup a destructor function to be invoked
>>>> at thread exit. But each early adopter library is unaware of the
>>>> other, so if we just use a "is_initialized" flag, the first destructor
>>>> to run will unregister rseq while the second library may still be
>>>> using it.
>>>
>>> I don't think you need unregistering if the memory is initial-exec TLS
>>> memory. Initial-exec TLS memory is tied directly to the TCB and cannot
>>> be freed while the thread is running, so it should be safe to put the
>>> rseq area there even if glibc knows nothing about it.
>>
>> Is it true for user-supplied stacks as well ?
>
> I'm not entirely sure because the glibc terminology is confusing, but I
> think it places initial-exec TLS into the static TLS area (so that it has
> a fixed offset from the TCB). The static TLS area is placed on the
> user-supplied stack.

You said earlier in the email thread that a user-supplied stack can be
reclaimed by __free_tcb () while the thread still runs, am I correct?
If so, then we really want to unregister the rseq TLS before that.

I notice that __free_tcb () calls __deallocate_stack (), which invokes
_dl_deallocate_tls ().
Accessing the TLS from the kernel upon preemption would appear fragile
after this call.

[...]

>> One issue here is that early adopter libraries cannot always use
>> the IE model. I tried using it for other TLS variables in lttng-ust, and
>> it ended up hanging our CI tests when tracing a sample application with
>> lttng-ust under a Java virtual machine: being dlopen'd in a process that
>> possibly already exhausts the number of available backup TLS IE entries
>> seems to have odd effects. This is why I'm worried about using the IE model
>> within lttng-ust.
>
> You can work around this by preloading the library. I'm not sure if
> this is a compelling reason not to use initial-exec TLS memory.

LTTng-UST is meant to be used as a dependency for e.g. a Java logger or
a Python logger. Those rely on dlopen, and it would be very painful to
ask all users to preload lttng-ust within their environment, which is
sometimes already complex. It works today through dlopen, and I consider
this a user-facing behavior which I am very reluctant to break.

>
>>>> The same problem arises if we have an application early adopter which
>>>> explicitly deal with rseq, with a library early adopter. The issue is
>>>> similar, except that the application will explicitly want to unregister
>>>> rseq before exiting the thread, which leaves a race window where rseq
>>>> is unregistered, but the library may still need to use it.
>>>>
>>>> The reference counter solves this: only the last rseq user for a thread
>>>> performs unregistration.
>>>
>>> If you do explicit unregistration, you will run into issues related to
>>> destructor ordering. You should really find a way to avoid that.
>>
>> The per-thread reference counter is a way to avoid issues that arise from
>> lack of destructor ordering. Is it an acceptable approach for you, or
>> you have something else in mind ?
>
> Only for the involved libraries. It will not help if other TLS
> destructors run and use these libraries.
You bring up an interesting point. The reference counter suffices to
ensure that the kernel will not try to reference the TLS area beyond its
registration scope, but it does not guarantee that another destructor
(or a signal handler) won't try to use the rseq TLS area after it has
been unregistered.

Unregistration of the TLS before freeing its memory is required for
correctness. However, a use-after-unregistration can be dealt with by
other means. This is one of the reasons why I want to upstream the
"cpu_opv" system call into Linux: it is a fallback mechanism to use when
rseq cannot make forward progress (e.g. debugger single-stepping), or in
those scenarios where rseq is not registered (early at thread creation,
or late at thread exit). Moreover, it allows handling use-cases that
migrate data between per-cpu data structures, which is pretty much
impossible to do right if we only have rseq available.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
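
For illustration, the per-thread reference counting discussed in this
thread could be sketched roughly as below. This is only a sketch under
stated assumptions, not the code from the patch series: the names
rseq_refcount, rseq_user_get(), rseq_user_put(), fake_rseq_register()
and fake_rseq_unregister() are hypothetical, and the fake functions only
count calls instead of invoking rseq(2), so the example runs on any
system. In a real deployment the counter would have to be a symbol
shared by all rseq users rather than a static variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for rseq(2) registration and unregistration;
 * they only count invocations. */
static atomic_int nr_register, nr_unregister;
static void fake_rseq_register(void)   { atomic_fetch_add(&nr_register, 1); }
static void fake_rseq_unregister(void) { atomic_fetch_add(&nr_unregister, 1); }

/* Per-thread count of rseq users (assumed to be shared by all users). */
static _Thread_local unsigned int rseq_refcount;

static void rseq_user_get(void)
{
	if (rseq_refcount++ == 0)
		fake_rseq_register();	/* first user in this thread registers */
}

static void rseq_user_put(void)
{
	assert(rseq_refcount > 0);
	if (--rseq_refcount == 0)
		fake_rseq_unregister();	/* last user in this thread unregisters */
}

/* Two "libraries", each with its own key, unaware of each other. */
static pthread_key_t key_a, key_b;
static pthread_once_t once = PTHREAD_ONCE_INIT;

static void lib_destructor(void *unused)
{
	(void) unused;
	rseq_user_put();	/* drop this library's reference at thread exit */
}

static void create_keys(void)
{
	pthread_key_create(&key_a, lib_destructor);
	pthread_key_create(&key_b, lib_destructor);
}

/* Lazy registration on first use by the calling thread. */
static void lib_first_use(pthread_key_t key)
{
	if (!pthread_getspecific(key)) {
		rseq_user_get();
		/* Non-NULL value so the destructor runs at thread exit. */
		pthread_setspecific(key, (void *) 1);
	}
}

static void *worker(void *arg)
{
	(void) arg;
	lib_first_use(key_a);
	lib_first_use(key_b);
	lib_first_use(key_a);	/* repeated use takes no extra reference */
	return NULL;
}

/* Runs one thread that uses both libraries. Returns 0 if registration
 * and unregistration each happened exactly once, -1 if the thread could
 * not be created, 1 otherwise. */
static int run_two_library_demo(void)
{
	pthread_t t;

	pthread_once(&once, create_keys);
	atomic_store(&nr_register, 0);
	atomic_store(&nr_unregister, 0);
	if (pthread_create(&t, NULL, worker, NULL))
		return -1;
	pthread_join(t, NULL);
	return (atomic_load(&nr_register) == 1 &&
		atomic_load(&nr_unregister) == 1) ? 0 : 1;
}
```

Calling run_two_library_demo() from a main() exercises the two-library
scenario. Whichever destructor runs first only decrements the counter;
the second one performs the unregistration. As noted above, this does
not stop some other destructor or a signal handler from touching the
rseq area after the count has dropped to zero.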