Received: by 2002:a25:e7d8:0:0:0:0:0 with SMTP id e207csp1118488ybh; Tue, 10 Mar 2020 15:09:55 -0700 (PDT) X-Google-Smtp-Source: ADFU+vt/Xx6PenYtBEjeqaA0ubZIySJMlnvUOAd7EUcxMnd5oG+LJxOTo+kon9YPIuPS6ZWYo5Dx X-Received: by 2002:aca:c70f:: with SMTP id x15mr2742950oif.80.1583878195734; Tue, 10 Mar 2020 15:09:55 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1583878195; cv=none; d=google.com; s=arc-20160816; b=yOV+7XQ3NSCBNlcHW0F1pCr9myIVUQ4Wr7Bas28l9ZA+rdPqZRHWufKG8eePMNYJTM O1ppx+nZ8l0XuUbuFmUFTZaJQ0qDlaLN+HTF82lcOC2OcZH0mfCbFbe55HnKKy+huiYl m1zwwR4Buwp3XyJhA7Gw9IQm/IK6a+1N73JrP9y91gan8oFJv0OIJ0CpW9nHHG8DPN8m i1VIu+m+2MSh11rTEM1w4rKv2mBg6XlIp1AQyeMERSSUsHdyFEIHRSvm1uIZY+QVQSbr xZEaIiaR6ywQ6oPbew7Zc/LuQwAc1w2jEmSw12WwTM+15ekvBF8OX+X3hnhYe+ied1gE jnBA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:date:cc:to:from:subject :message-id:dkim-signature; bh=euL5xX5AMdKba7MrmcohxYVIAeY15M0sfcdEc687FaY=; b=v7Mj24uboztdTnlzZdiYQ++vRLq0yZEm4UG6e0PIclyPx2mOVsyxeT2dBUM4JN3GpE 1GC+UMaNfk91SId9WMk/hKPviqkYx4EzFNGfrOY6rhTwntwaNxeXPH3KBDaHnW+P44D9 hOPMIFfNjvatMDkRHMFbTUQKq8SH3k4Ytn85c5R2KdHzBx+LQ5HuYUYTfiwO5KZMFWHv axxjAv9mumdiX35NyB6zt1N4C2BlUzqH7QXqjPH0Bjmc2hjMxPHugGrsVtR2MdxrmCtS zdOVOrs2ogxaxGibtl9ABIxwSt9/tlT6Sl1FfeDkZg4IKGEUQKQ5gxZe69L689TQUHV+ Pk2g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=YoJuo5RO; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 73si90014oii.60.2020.03.10.15.09.42; Tue, 10 Mar 2020 15:09:55 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=YoJuo5RO; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727742AbgCJWHp (ORCPT + 99 others); Tue, 10 Mar 2020 18:07:45 -0400 Received: from mail.kernel.org ([198.145.29.99]:49602 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726273AbgCJWHp (ORCPT ); Tue, 10 Mar 2020 18:07:45 -0400 Received: from tleilax.poochiereds.net (68-20-15-154.lightspeed.rlghnc.sbcglobal.net [68.20.15.154]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id C0A97208E4; Tue, 10 Mar 2020 22:07:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1583878064; bh=DHAN0I29DwoZWN2Wh3iFT+l4WD2xdxIe3omBWt1StEY=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=YoJuo5ROOds6vuSyTs8LNVz977cmPul0DuRXUS8pdyBbgJ24BMedhZfWEKZCSx2J+ +OL+gkt7Qh9P6FOrO2ZKnYxkbeKhpvc8t6JNc6TXO1zrOi9gKv/dUyqccnYLWbhmc7 KVM2aE2pMO+6epG9zL6VU2ofDpVqQe/3t80b9sII= Message-ID: <0066a9f150a55c13fcc750f6e657deae4ebdef97.camel@kernel.org> Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression From: Jeff Layton To: Linus Torvalds , NeilBrown Cc: yangerkun , kernel test robot , LKML , lkp@lists.01.org, Bruce Fields , Al Viro Date: Tue, 10 Mar 2020 18:07:42 -0400 In-Reply-To: References: <20200308140314.GQ5972@shao2-debian> <34355c4fe6c3968b1f619c60d5ff2ca11a313096.camel@kernel.org> <1bfba96b4bf0d3ca9a18a2bced3ef3a2a7b44dad.camel@kernel.org> <87blp5urwq.fsf@notabene.neil.brown.name> <41c83d34ae4c166f48e7969b2b71e43a0f69028d.camel@kernel.org> <923487db2c9396c79f8e8dd4f846b2b1762635c8.camel@kernel.org> <36c58a6d07b67aac751fca27a4938dc1759d9267.camel@kernel.org> <878sk7vs8q.fsf@notabene.neil.brown.name> <875zfbvrbm.fsf@notabene.neil.brown.name> Content-Type: text/plain; charset="UTF-8" User-Agent: Evolution 3.34.4 (3.34.4-1.fc31) MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 2020-03-10 at 14:47 -0700, Linus Torvalds wrote: > On Tue, Mar 10, 2020 at 2:22 PM NeilBrown wrote: > > A compiler barrier() is probably justified. Memory barriers delay reads > > and expedite writes so they cannot be needed. > > That's not at all guaranteed. Weakly ordered memory things can > actually have odd orderings, and not just "writes delayed, reads done > early". Reads may be delayed too by cache misses, and memory barriers > can thus expedite reads as well (by forcing the missing read to happen > before later non-missing ones). > > So don't assume that a memory barrier would only delay reads and > expedite writes. Quite the reverse: assume that there is no ordering > at all unless you impose one with a memory barrier (*). > > Linus > > (*) it's a bit more complex than that, in that we do assume that > control dependencies end up gating writes, for example, but those > kinds of implicit ordering things should *not* be what you depend on > in the code unless you're doing some seriously subtle memory ordering > work and comment on it extensively. Good point. I too prefer code that's understandable by mere mortals. Given that, and the fact that Neil pointed out that yangerkun's latest patch would reintroduce the original race, I'm leaning back toward the patch Neil sent yesterday. It relies solely on spinlocks, and so doesn't have the subtle memory-ordering requirements of the others. I did some cursory testing with it and it seems to fix the performance regression. If you guys are OK with this patch, and Neil can send an updated changelog, I'll get it into -next and we can get this sorted out. Thanks, -------------------8<------------------- [PATCH] locks: reintroduce locks_delete_block shortcut --- fs/locks.c | 29 ++++++++++++++++++++++++++++- 1 file changed, 28 insertions(+), 1 deletion(-) diff --git a/fs/locks.c b/fs/locks.c index 426b55d333d5..8aa04d5ac8b3 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -735,11 +735,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker) waiter = list_first_entry(&blocker->fl_blocked_requests, struct file_lock, fl_blocked_member); + spin_lock(&waiter->fl_wait.lock); __locks_delete_block(waiter); if (waiter->fl_lmops && waiter->fl_lmops->lm_notify) waiter->fl_lmops->lm_notify(waiter); else - wake_up(&waiter->fl_wait); + wake_up_locked(&waiter->fl_wait); + spin_unlock(&waiter->fl_wait.lock); } } @@ -753,6 +755,31 @@ int locks_delete_block(struct file_lock *waiter) { int status = -ENOENT; + /* + * If fl_blocker is NULL, it won't be set again as this thread + * "owns" the lock and is the only one that might try to claim + * the lock. So it is safe to test fl_blocker locklessly. + * Also if fl_blocker is NULL, this waiter is not listed on + * fl_blocked_requests for some lock, so no other request can + * be added to the list of fl_blocked_requests for this + * request. So if fl_blocker is NULL, it is safe to + * locklessly check if fl_blocked_requests is empty. If both + * of these checks succeed, there is no need to take the lock. + * However, some other thread might have only *just* set + * fl_blocker to NULL and it about to send a wakeup on + * fl_wait, so we mustn't return too soon or we might free waiter + * before that wakeup can be sent. So take the fl_wait.lock + * to serialize with the wakeup in __locks_wake_up_blocks(). + */ + if (waiter->fl_blocker == NULL) { + spin_lock(&waiter->fl_wait.lock); + if (waiter->fl_blocker == NULL && + list_empty(&waiter->fl_blocked_requests)) { + spin_unlock(&waiter->fl_wait.lock); + return status; + } + spin_unlock(&waiter->fl_wait.lock); + } spin_lock(&blocked_lock_lock); if (waiter->fl_blocker) status = 0; -- 2.24.1