Date: Thu, 3 May 2018 12:12:21 -0400 (EDT)
From: Mathieu Desnoyers
To: Daniel Colascione
Cc: Peter Zijlstra, "Paul E. McKenney", Boqun Feng, Andy Lutomirski,
    Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
    Russell King, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
    Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
    Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
    Michael Kerrisk, Joel Fernandes
Message-ID: <1718748931.10084.1525363941807.JavaMail.zimbra@efficios.com>
References: <20180430224433.17407-1-mathieu.desnoyers@efficios.com>
    <660904075.9201.1525276988842.JavaMail.zimbra@efficios.com>
Subject: Re: [RFC PATCH for 4.18 00/14] Restartable Sequences
X-Mailing-List: linux-kernel@vger.kernel.org

----- On May 2, 2018, at 12:07 PM, Daniel Colascione dancol@google.com wrote:

> On Wed, May 2, 2018 at 9:03 AM Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>
>> ----- On May 1, 2018, at 11:53 PM, Daniel Colascione dancol@google.com wrote:
>> [...]
>> >
>> > I think a small enhancement to rseq would let us build a perfect userspace
>> > mutex, one that spins on lock-acquire only when the lock owner is running
>> > and that sleeps otherwise, freeing userspace from both specifying ad-hoc
>> > spin counts and from trying to detect situations in which spinning is
>> > generally pointless.
>> >
>> > It'd work like this: in the per-thread rseq data structure, we'd include a
>> > description of a futex operation for the kernel would perform (in the
>> > context of the preempted thread) upon preemption, immediately before
>> > schedule().
>> > If the futex operation itself sleeps, that's no problem: we
>> > will have still accomplished our goal of running some other thread instead
>> > of the preempted thread.
>
>> Hi Daniel,
>
>> I agree that the problem you are aiming to solve is important. Let's see
>> what prevents the proposed rseq implementation from doing what you envision.
>
>> The main issue here is touching userspace immediately before schedule().
>> At that specific point, it's not possible to take a page fault. In the
>> proposed rseq implementation, we get away with it by raising a task struct
>> flag, and using it in a return to userspace notifier (where we can actually
>> take a fault), where we touch the userspace TLS area.
>
>> If we can find a way to solve this limitation, then the rest of your design
>> makes sense to me.
>
> Thanks for taking a look!
>
> Why couldn't we take a page fault just before schedule? The reason we can't
> take a page fault in atomic context is that doing so might call schedule.
> Here, we're about to call schedule _anyway_, so what harm does it do to
> call something that might call schedule? If we schedule via that call, we
> can skip the manual schedule we were going to perform.

By the way, if we eventually find a way to enhance user-space mutexes
in the fashion you describe here, it would belong to another TLS area,
and would be registered by a system call other than rseq.

I proposed a more generic "TLS area registration" system call a few years
ago, but Linus told me he wanted a system call that was specific to rseq.
If we need to implement other use-cases in a TLS area shared between
kernel and user-space in a similar fashion, the plan is to do it in a
distinct system call.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
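
For illustration only, the deferral pattern mentioned in the quoted text
(raise a task struct flag at the point where faulting is not possible, then
touch the user-space TLS area from the return-to-user-space path) can be
modelled in plain C. This is a toy user-space model under stated assumptions,
not kernel code: every type, field and function name below is hypothetical
and chosen only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the TLS area shared with user-space. */
struct toy_tls_area {
    uint32_t cpu_id;     /* CPU number published to user-space */
    uint32_t updates;    /* how many deferred updates were delivered */
};

/* Hypothetical stand-in for the kernel's per-task state. */
struct toy_task {
    bool notify_resume;        /* models the "task struct flag" */
    uint32_t cpu;              /* CPU the task is currently assigned to */
    struct toy_tls_area *tls;  /* registered area, NULL if none */
};

/*
 * Preemption path: a page fault cannot be taken here, so only
 * kernel-side state is modified. The TLS area is not touched.
 */
static void toy_on_preempt(struct toy_task *t)
{
    assert(t != NULL);
    if (t->tls != NULL)
        t->notify_resume = true;
}

/* Migration also changes kernel-side state only. */
static void toy_migrate(struct toy_task *t, uint32_t new_cpu)
{
    assert(t != NULL);
    t->cpu = new_cpu;
}

/*
 * Return-to-user-space path: faulting is allowed at this point, so
 * the deferred write to the TLS area happens here.
 * Returns true if an update was delivered.
 */
static bool toy_return_to_user(struct toy_task *t)
{
    assert(t != NULL);
    if (!t->notify_resume)
        return false;
    t->notify_resume = false;
    if (t->tls == NULL)
        return false;
    t->tls->cpu_id = t->cpu;
    t->tls->updates++;
    return true;
}
```

In this model, calling toy_on_preempt() followed by toy_migrate() leaves the
TLS area unchanged. The new CPU number only becomes visible in the area once
toy_return_to_user() runs, which mirrors why an operation that must touch
user memory immediately before schedule() does not fit this scheme as is.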