From: NeilBrown
To: Jeff Layton, Alexander Viro
Date: Wed, 08 Aug 2018 11:51:08 +1000
Subject: [PATCH 4/4] fs/locks: create a tree of dependent requests.
Cc: "J. Bruce Fields", linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Martin Wilck
Message-ID: <153369306798.12605.11900283893787605168.stgit@noble>
In-Reply-To: <153369219467.12605.13472423449508444601.stgit@noble>
References: <153369219467.12605.13472423449508444601.stgit@noble>
X-Mailing-List: linux-kernel@vger.kernel.org

When we find an existing lock which conflicts with a request,
and the request wants to wait, we currently add the request
to a list.  When the lock is removed, the whole list is woken.
This can cause the thundering-herd problem.
To reduce the problem, we make use of the (new) fact that
a pending request can itself have a list of blocked requests.
When we find a conflict, we look through the existing blocked requests.
If any one of them blocks the new request, the new request is attached
below that request.
This way, when the lock is released, only a set of non-conflicting
locks will be woken.  The rest of the herd can stay asleep.

Reported-and-tested-by: Martin Wilck
Signed-off-by: NeilBrown
---
 fs/locks.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index aaa55925c788..69a30421218b 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -723,11 +723,24 @@ static void locks_delete_block(struct file_lock *waiter)
  * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring
  * that the flc_lock is also held on insertions we can avoid taking the
  * blocked_lock_lock in some cases when we see that the fl_blocked list is empty.
+ *
+ * Rather than just adding to the list, we check for conflicts with any existing
+ * waiter, and add to that waiter instead.
+ * Thus wakeups don't happen until needed.
  */
 static void __locks_insert_block(struct file_lock *blocker,
-					struct file_lock *waiter)
+				 struct file_lock *waiter,
+				 bool conflict(struct file_lock *,
+					       struct file_lock *))
 {
+	struct file_lock *fl;
 	BUG_ON(!list_empty(&waiter->fl_block));
+new_blocker:
+	list_for_each_entry(fl, &blocker->fl_blocked, fl_block)
+		if (conflict(fl, waiter)) {
+			blocker = fl;
+			goto new_blocker;
+		}
 	waiter->fl_blocker = blocker;
 	list_add_tail(&waiter->fl_block, &blocker->fl_blocked);
 	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
@@ -736,10 +749,12 @@ static void __locks_insert_block(struct file_lock *blocker,
 
 /* Must be called with flc_lock held. */
 static void locks_insert_block(struct file_lock *blocker,
-					struct file_lock *waiter)
+			       struct file_lock *waiter,
+			       bool conflict(struct file_lock *,
+					     struct file_lock *))
 {
 	spin_lock(&blocked_lock_lock);
-	__locks_insert_block(blocker, waiter);
+	__locks_insert_block(blocker, waiter, conflict);
 	spin_unlock(&blocked_lock_lock);
 }
 
@@ -995,7 +1010,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)
 		if (!(request->fl_flags & FL_SLEEP))
 			goto out;
 		error = FILE_LOCK_DEFERRED;
-		locks_insert_block(fl, request);
+		locks_insert_block(fl, request, flock_locks_conflict);
 		goto out;
 	}
 	if (request->fl_flags & FL_ACCESS)
@@ -1069,7 +1084,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
 			spin_lock(&blocked_lock_lock);
 			if (likely(!posix_locks_deadlock(request, fl))) {
 				error = FILE_LOCK_DEFERRED;
-				__locks_insert_block(fl, request);
+				__locks_insert_block(fl, request,
+						     posix_locks_conflict);
 			}
 			spin_unlock(&blocked_lock_lock);
 			goto out;
@@ -1542,7 +1558,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
 	break_time -= jiffies;
 	if (break_time == 0)
 		break_time++;
-	locks_insert_block(fl, new_fl);
+	locks_insert_block(fl, new_fl, leases_conflict);
 	trace_break_lease_block(inode, new_fl);
 	spin_unlock(&ctx->flc_lock);
 	percpu_up_read_preempt_enable(&file_rwsem);
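
For anyone who wants to play with the insertion walk outside the kernel,
here is a minimal user-space sketch of the idea in this patch.  It is not
the kernel code: struct demo_lock, demo_conflict(), demo_insert_block()
and demo_herd_size() are hypothetical names, a fixed-size array stands in
for the fl_blocked list, all locking is omitted, and the conflict rule
(overlapping byte ranges with at least one exclusive lock) is a
simplification that ignores lock owners.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 8	/* arbitrary bound, for this sketch only */

/* Hypothetical stand-in for struct file_lock. */
struct demo_lock {
	long start, end;			/* inclusive byte range */
	bool exclusive;				/* write lock if true */
	struct demo_lock *blocker;		/* parent in the tree */
	struct demo_lock *blocked[MAX_WAITERS];	/* requests waiting on us */
	size_t nblocked;
};

/* Assumed rule: ranges overlap and at least one side is exclusive. */
static bool demo_conflict(struct demo_lock *a, struct demo_lock *b)
{
	if (a->end < b->start || b->end < a->start)
		return false;
	return a->exclusive || b->exclusive;
}

/*
 * Same walk as the patched __locks_insert_block(): if an existing
 * waiter conflicts with the new one, descend and queue below it.
 */
static void demo_insert_block(struct demo_lock *blocker,
			      struct demo_lock *waiter,
			      bool conflict(struct demo_lock *,
					    struct demo_lock *))
{
	size_t i;

	assert(waiter->blocker == NULL);
new_blocker:
	for (i = 0; i < blocker->nblocked; i++)
		if (conflict(blocker->blocked[i], waiter)) {
			blocker = blocker->blocked[i];
			goto new_blocker;
		}
	assert(blocker->nblocked < MAX_WAITERS);
	waiter->blocker = blocker;
	blocker->blocked[blocker->nblocked++] = waiter;
}

/* Queue three requests and return how many wait directly on the held lock. */
static size_t demo_herd_size(void)
{
	struct demo_lock held = { .start = 0,  .end = 99, .exclusive = true };
	struct demo_lock w1   = { .start = 0,  .end = 49, .exclusive = true };
	struct demo_lock w2   = { .start = 50, .end = 99, .exclusive = true };
	struct demo_lock w3   = { .start = 0,  .end = 9,  .exclusive = true };

	demo_insert_block(&held, &w1, demo_conflict);
	demo_insert_block(&held, &w2, demo_conflict);
	demo_insert_block(&held, &w3, demo_conflict);

	return held.nblocked;
}
```

In this sketch w1 and w2 do not overlap, so both wait directly on the
held lock, while w3 overlaps w1 and is queued below it.  Releasing the
held lock would therefore wake two waiters rather than all three.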