Date: Wed, 4 May 2022 21:11:49 -0400
Subject: Re: Wait for mutex to become unlocked
To: Matthew Wilcox, Peter Zijlstra, Ingo Molnar, Will Deacon
Cc: "Paul E. McKenney", Thomas Gleixner, "Liam R. Howlett", linux-kernel@vger.kernel.org
From: Waiman Long
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/4/22 17:44, Matthew Wilcox wrote:
> Paul, Liam and I were talking about some code we intend to write soon
> and realised there's a missing function in the mutex & rwsem API.
> We're intending to use it for an rwsem, but I think it applies equally
> to mutexes.
>
> The customer has a low priority task which wants to read /proc/pid/smaps
> of a higher priority task.
> Today, everything is awful; smaps acquires
> mmap_sem read-only, is preempted, then the high-pri task calls mmap()
> and the down_write(mmap_sem) blocks on the low-pri task.  Then all the
> other threads in the high-pri task block on the mmap_sem as they take
> page faults because we don't want writers to starve.
>
> The approach we're looking at is to allow RCU lookup of VMAs, and then
> take a per-VMA rwsem for read.  Because we're under RCU protection,
> that looks a bit like this:
>
> 	rcu_read_lock();
> 	vma = vma_lookup();
> 	if (down_read_trylock(&vma->sem)) {
> 		rcu_read_unlock();
> 	} else {
> 		rcu_read_unlock();
> 		down_read(&mm->mmap_sem);
> 		vma = vma_lookup();
> 		down_read(&vma->sem);
> 		up_read(&mm->mmap_sem);
> 	}
>
> (for clarity, I've skipped the !vma checks; don't take this too literally)
>
> So this is Good.  For the vast majority of cases, we avoid taking the
> mmap read lock and the problem will appear much less often.  But we can
> do Better with a new API.  You see, for this case, we don't actually
> want to acquire the mmap_sem; we're happy to spin a bit, but there's no
> point in spinning waiting for the writer to finish when we can sleep.
> I'd like to write this code:
>
> again:
> 	rcu_read_lock();
> 	vma = vma_lookup();
> 	if (down_read_trylock(&vma->sem)) {
> 		rcu_read_unlock();
> 	} else {
> 		rcu_read_unlock();
> 		rwsem_wait_read(&mm->mmap_sem);
> 		goto again;
> 	}
>
> That is, rwsem_wait_read() puts the thread on the rwsem's wait queue,
> and wakes it up without giving it the lock.  Now this thread will never
> be able to block any thread that tries to acquire mmap_sem for write.

I suppose that a writer that needs to take a write lock on vma->sem
will have to take a write lock on mmap_sem first. In that case, it makes
sense to me that you want to wait for all the vma->sem writers to finish
by waiting on the wait queue of mmap_sem.
By the time the waiting task is woken up, there is no active write lock
on the vma->sem, and hopefully the down_read_trylock() it does after
waking up will succeed. However, another writer may come in and take the
vma->sem write lock during the time gap in the wakeup process. So waiting
improves the chance of a successful trylock, but does not guarantee it.
You will need a retry count and a fallback to a direct down_read() when
there are too many retries.

Since the waiting process isn't taking any lock, the name
rwsem_wait_read() may be somewhat misleading. I think a better name may
be rwsem_flush_waiters(). So do you want to flush the waiters at the
point this API is called, or do you want to wait until the wait queue is
empty?

Cheers,
Longman