Date: Wed, 4 May 2022 22:44:12 +0100
From: Matthew Wilcox
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long
Cc: "Paul E. McKenney", Thomas Gleixner, "Liam R. Howlett",
    linux-kernel@vger.kernel.org
Subject: Wait for mutex to become unlocked

Paul, Liam and I were talking about some code we intend to write soon
and realised there's a missing function in the mutex & rwsem API.
We're intending to use it for an rwsem, but I think it applies equally
to mutexes.

The customer has a low priority task which wants to read /proc/pid/smaps
of a higher priority task.  Today, everything is awful; smaps acquires
mmap_sem read-only, is preempted, then the high-pri task calls mmap()
and the down_write(mmap_sem) blocks on the low-pri task.  Then all the
other threads in the high-pri task block on the mmap_sem as they take
page faults because we don't want writers to starve.

The approach we're looking at is to allow RCU lookup of VMAs, and then
take a per-VMA rwsem for read.  Because we're under RCU protection,
that looks a bit like this:

	rcu_read_lock();
	vma = vma_lookup();
	if (down_read_trylock(&vma->sem)) {
		rcu_read_unlock();
	} else {
		rcu_read_unlock();
		down_read(&mm->mmap_sem);
		vma = vma_lookup();
		down_read(&vma->sem);
		up_read(&mm->mmap_sem);
	}

(for clarity, I've skipped the !vma checks; don't take this too literally)

So this is Good.  For the vast majority of cases, we avoid taking the
mmap read lock and the problem will appear much less often.  But we can
do Better with a new API.
You see, for this case, we don't actually want to acquire the mmap_sem;
we're happy to spin a bit, but there's no point in spinning waiting for
the writer to finish when we can sleep.  I'd like to write this code:

again:
	rcu_read_lock();
	vma = vma_lookup();
	if (down_read_trylock(&vma->sem)) {
		rcu_read_unlock();
	} else {
		rcu_read_unlock();
		rwsem_wait_read(&mm->mmap_sem);
		goto again;
	}

That is, rwsem_wait_read() puts the thread on the rwsem's wait queue,
and wakes it up without giving it the lock.  Now this thread will never
be able to block any thread that tries to acquire mmap_sem for write.

Similarly, it may make sense to add rwsem_wait_write() and mutex_wait().
Perhaps also mutex_wait_killable() and mutex_wait_interruptible() (the
combinatoric explosion is a bit messy; I don't know that it makes sense
to do the _nested, _io variants).

Does any of this make sense?
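
To make the "wake it up without giving it the lock" semantics concrete,
here is a rough userspace sketch of a mutex_wait()-style operation built
on pthreads rather than kernel primitives.  Everything in it (struct
waitable_lock, wl_lock(), wl_trylock(), wl_unlock(), wl_wait()) is a
hypothetical name made up for illustration.  It says nothing about how
the kernel would do this on the real wait queue, and it only models the
exclusive (mutex) case, not the reader/writer one.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a sleeping lock; not a kernel type. */
struct waitable_lock {
	pthread_mutex_t guard;    /* protects 'held' */
	pthread_cond_t released;  /* broadcast whenever the lock is dropped */
	bool held;                /* true while some thread owns the lock */
};

#define WAITABLE_LOCK_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Sleep until the lock is free, then take ownership of it. */
static void wl_lock(struct waitable_lock *l)
{
	pthread_mutex_lock(&l->guard);
	while (l->held)
		pthread_cond_wait(&l->released, &l->guard);
	l->held = true;
	pthread_mutex_unlock(&l->guard);
}

/* Take the lock only if it is free right now; never sleeps on 'released'. */
static bool wl_trylock(struct waitable_lock *l)
{
	bool got;

	pthread_mutex_lock(&l->guard);
	got = !l->held;
	if (got)
		l->held = true;
	pthread_mutex_unlock(&l->guard);
	return got;
}

/* Drop ownership and wake every sleeper, both lockers and pure waiters. */
static void wl_unlock(struct waitable_lock *l)
{
	pthread_mutex_lock(&l->guard);
	l->held = false;
	pthread_cond_broadcast(&l->released);
	pthread_mutex_unlock(&l->guard);
}

/*
 * The operation under discussion: sleep until the lock is observed free,
 * but return WITHOUT owning it.  'held' is never set here, so a thread
 * in wl_wait() is never the owner that a later wl_lock() has to wait for.
 */
static void wl_wait(struct waitable_lock *l)
{
	pthread_mutex_lock(&l->guard);
	while (l->held)
		pthread_cond_wait(&l->released, &l->guard);
	pthread_mutex_unlock(&l->guard);
}
```

A slow path in this model would call wl_wait() and then retry its
trylock, the same shape as the "goto again" loop above.  The lock may
well be taken again by the time the waiter runs, which is why the caller
has to loop rather than assume anything on return.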