Received: by 2002:ac0:a5a7:0:0:0:0:0 with SMTP id m36-v6csp1361053imm; Wed, 8 Aug 2018 15:51:20 -0700 (PDT) X-Google-Smtp-Source: AA+uWPwgyoWQ20/zFy9Ea3kxropjxo8GshgeGreUAeTVbquQp4qlVjJttlsELZiRF09PeYtjZYQg X-Received: by 2002:a17:902:a713:: with SMTP id w19-v6mr4182197plq.271.1533768680229; Wed, 08 Aug 2018 15:51:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1533768680; cv=none; d=google.com; s=arc-20160816; b=VOhOp08BDSt3GCXCz5lqKDju35Xhd35ZJZl1V0EZZCNd1LzdxCLbztpvup7/nYMXou E/8fe5DnH6inkvF/OUDDC17jrm6nPK2DB2yOSfYTb2cDZH4qKI7V1bxzvc81KfD1UMsr 7WR3SUS5a1jNcgx5eNKQoe9kTOJs4ge5MEfNszAaI+XoYTxNRLivVzp65bW1JjtNC/+p jgZgzN1AxwELjUejh5mpyB9/QYDMcFe2FHq0q9lTJvTwDOMGFg9EhbEjI1AuMRN49e02 6Njw/Hogviqic4AjPe7p0YL+3ueqqIgRjQBIY+sIJ6wGW6MiuSdIiIwZ87BzWdkAlHcu 5QEA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:date:cc:to:from:subject:message-id :dkim-signature:arc-authentication-results; bh=ajI738PULE+ShRSk2QI5N5hnfb3KCwF9ZJKMpn8dSTI=; b=hLQsv6/VTo8ChHSZKokjuTmxKx829pjLAybxAA6WnkfhETu1ggw0+Ss7B2yIFqOS7h hEPG7JvXf9T2PSDIWkRRdC3Ep+9nq8mdNzVwMhPFYsn0tB+tLGPEDSVqSX6nPFSoP2Eg f2MoSsIlaj6HveBtiL8gDgaou3UJ7E7dC4wsrQ+gpI8xw6Dckv+BxgZ8Kf4IfKZb5WJc PB8tRUSIFGusJ2sphndrlGIdsQ2PSOJDbfQRXbDYHXRouGomNeHc97qJ4sbXf5C58BG9 jFkZoCx8LrbgJuFW5d5e2xCe5NI/7w5/i4MPaErXVknB9M2GyuNO2qTUqsrR8TxjrTvV Nxiw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=PAurSwU2; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g15-v6si4339915plo.284.2018.08.08.15.51.05; Wed, 08 Aug 2018 15:51:20 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=PAurSwU2; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731148AbeHIBL5 (ORCPT + 99 others); Wed, 8 Aug 2018 21:11:57 -0400 Received: from mail.kernel.org ([198.145.29.99]:59380 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727530AbeHIBL5 (ORCPT ); Wed, 8 Aug 2018 21:11:57 -0400 Received: from tleilax.poochiereds.net (cpe-71-70-156-158.nc.res.rr.com [71.70.156.158]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 9D68221A86; Wed, 8 Aug 2018 22:50:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1533768609; bh=BJPE6ZLt0v2ZuoQxn8+/w0W5p/nDn1Jg+smoOjFnCGg=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=PAurSwU2l/rMH6P0OM3ahhQ9jvPMihnxWVLEM1lRxieuvHpLra5jZ8ZikPP5VV2uI do2SDTMlMKCqOhVG9p2FjUArir0+Ic39CdmZnT9LX/bQqWK8xMQNMM6T7HSEMSVF9m HK66nO/Z1sCHjj7VNTAyKaS5usQaZx+gz2YdqbbI= Message-ID: <04ffa27c29d2bff8bd9cb9b6d4ea6b6fd3969b6c.camel@kernel.org> Subject: Re: [PATCH 0/4] locks: avoid thundering-herd wake-ups From: Jeff Layton To: "J. Bruce Fields" , NeilBrown Cc: Alexander Viro , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Martin Wilck Date: Wed, 08 Aug 2018 18:50:06 -0400 In-Reply-To: <20180808212832.GF23873@fieldses.org> References: <153369219467.12605.13472423449508444601.stgit@noble> <20180808195445.GD23873@fieldses.org> <20180808200912.GE23873@fieldses.org> <20180808212832.GF23873@fieldses.org> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.28.5 (3.28.5-1.fc28) Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2018-08-08 at 17:28 -0400, J. Bruce Fields wrote: > On Wed, Aug 08, 2018 at 04:09:12PM -0400, J. Bruce Fields wrote: > > On Wed, Aug 08, 2018 at 03:54:45PM -0400, J. Bruce Fields wrote: > > > On Wed, Aug 08, 2018 at 11:51:07AM +1000, NeilBrown wrote: > > > > If you have a many-core machine, and have many threads all wanting to > > > > briefly lock a give file (udev is known to do this), you can get quite > > > > poor performance. > > > > > > > > When one thread releases a lock, it wakes up all other threads that > > > > are waiting (classic thundering-herd) - one will get the lock and the > > > > others go to sleep. > > > > When you have few cores, this is not very noticeable: by the time the > > > > 4th or 5th thread gets enough CPU time to try to claim the lock, the > > > > earlier threads have claimed it, done what was needed, and released. > > > > With 50+ cores, the contention can easily be measured. > > > > > > > > This patchset creates a tree of pending lock request in which siblings > > > > don't conflict and each lock request does conflict with its parent. > > > > When a lock is released, only requests which don't conflict with each > > > > other a woken. > > > > > > Are you sure you aren't depending on the (incorrect) assumption that "X > > > blocks Y" is a transitive relation? > > > > > > OK I should be able to answer that question myself, my patience for > > > code-reading is at a real low this afternoon.... > > > > In other words, is there the possibility of a tree of, say, exclusive > > locks with (offset, length) like: > > > > (0, 2) waiting on (1, 2) waiting on (2, 2) waiting on (0, 4) > > > > and when waking (0, 4) you could wake up (2, 2) but not (0, 2), leaving > > a process waiting without there being an actual conflict. > > After batting it back and forth with Jeff on IRC.... So do I understand > right that when we wake a waiter, we leave its own tree of waiters > intact, and when it wakes if it finds a conflict it just adds it lock > (with tree of waiters) in to the tree of the conflicting lock? > > If so then yes I think that depends on the transitivity > assumption--you're assuming that finding a conflict between the root of > the tree and a lock proves that all the other members of the tree also > conflict. > > So maybe this example works. (All locks are exclusive and written > (offset, length), X->Y means X is waiting on Y.) > > process acquires (0,3) > 2nd process requests (1,2), is put to sleep. > 3rd process requests (0,2), is put to sleep. > > The tree of waiters now looks like (0,2)->(1,2)->(0,3) > > (0,3) is unlocked. > A 4th process races in and locks (2,2). > The 2nd process wakes up, sees this new conflict, and waits on > (2,2). Now the tree looks like (0,2)->(1,2)->(2,2), and (0,2) > is waiting for no reason. > That seems like a legit problem. One possible fix might be to have the waiter on (1,2) walk down the entire subtree and wake up any waiter that is waiting on a lock that doesn't conflict with the lock on which it's waiting. So, before the task waiting on 1,2 goes back to sleep to wait on 2,2, it could walk down its entire fl_blocked subtree and wake up anything waiting on a lock that doesn't conflict with (2,2). That's potentially an expensive operation, but: a) the task is going back to sleep anyway, so letting it do a little extra work before that should be no big deal b) it's probably still cheaper than waking up the whole herd -- Jeff Layton