From: NeilBrown
To: Jeff Layton, Linus Torvalds
Cc: kernel test robot, yangerkun, LKML, lkp@lists.01.org, Bruce Fields, Al Viro
Date: Tue, 10 Mar 2020 08:42:13 +1100
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression
In-Reply-To: <1bfba96b4bf0d3ca9a18a2bced3ef3a2a7b44dad.camel@kernel.org>
References: <20200308140314.GQ5972@shao2-debian> <34355c4fe6c3968b1f619c60d5ff2ca11a313096.camel@kernel.org> <1bfba96b4bf0d3ca9a18a2bced3ef3a2a7b44dad.camel@kernel.org>
Message-ID: <87blp5urwq.fsf@notabene.neil.brown.name>

On Mon, Mar 09 2020, Jeff Layton wrote:
> On Mon, 2020-03-09 at 13:22 -0400, Jeff
Layton wrote:
>> On Mon, 2020-03-09 at 08:52 -0700, Linus Torvalds wrote:
>> > On Mon, Mar 9, 2020 at 7:36 AM Jeff Layton wrote:
>> > > On Sun, 2020-03-08 at 22:03 +0800, kernel test robot wrote:
>> > > > FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:
>> > >
>> > > This is not completely unexpected, as we're banging on the global
>> > > blocked_lock_lock now for every unlock. This test just thrashes file
>> > > locks and unlocks without doing anything in between, so the workload
>> > > looks pretty artificial [1].
>> > >
>> > > It would be nice to avoid the global lock in this codepath, but it
>> > > doesn't look simple to do. I'll keep thinking about it, but for now I'm
>> > > inclined to ignore this result unless we see a problem in more realistic
>> > > workloads.
>> >
>> > That is a _huge_ regression, though.
>> >
>> > What about something like the attached? Wouldn't that work? And it makes
>> > the code actually match the old comment about how "fl_blocker" being
>> > NULL is special.
>> >
>> > The old code seemed not to know about things like memory ordering either.
>> >
>> > The patch is entirely untested, but aims to have smp_store_release()
>> > mean "I'm done and not going to touch it any more", making that
>> > smp_load_acquire() test hopefully valid, as per the comment.
>>
>> Yeah, something along those lines, maybe. I don't think we can use
>> fl_blocker that way, though, as wait_event_interruptible is waiting
>> for it to go to NULL, and the wake_up happens before fl_blocker is
>> cleared.
>>
>> Maybe we need to mix in some sort of FL_BLOCK_ACTIVE flag and use that,
>> instead of testing for !fl_blocker, to see whether we can avoid the
>> blocked_lock_lock?
>
> How about something like this instead? (untested other than for
> compilation)
>
> Basically, this just switches the waiters over to wait for the
> fl_blocked_requests list to go empty.
> That still happens before the
> wakeup, so it should be ok to wait on that.
>
> I think we can also eliminate the lockless list_empty check in
> locks_delete_block, as the fl_blocker check should be sufficient now.
> --
> Jeff Layton
>
> From c179d779c9b72838ed9996a65d686d86679d1639 Mon Sep 17 00:00:00 2001
> From: Linus Torvalds
> Date: Mon, 9 Mar 2020 14:35:43 -0400
> Subject: [PATCH] locks: reinstate locks_delete_lock optimization
>
> ...by using smp_load_acquire and smp_store_release to close the race
> window.
>
> [ jlayton: wait on the fl_blocked_requests list to go empty instead of
>            the fl_blocker pointer to clear. Remove the list_empty check
>            from locks_delete_lock shortcut. ]

Why do you think it is OK to remove that list_empty check? I don't
think it is. There might be blocked requests that need to be woken up.

As the problem here is a use-after-free due to a race, one option would
be to free the file_lock with kfree_rcu() and hold rcu_read_lock()
around the test and use. Another option is to use a different lock.
fl_wait contains a spinlock, and we have wake_up_locked(), which is
provided for exactly these sorts of situations where the wake_up call
can race with a thread waking up.

So my compile-tested-only proposal is below. I can write a proper
change-log entry if you think the patch is a good way to go.
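The shape of that proposal can be sketched in userspace. This is a hypothetical model, not kernel code: the struct and function names below are invented, "blocker" stands in for fl_blocker, the "nblocked" count for the fl_blocked_requests list, and a pthread mutex for fl_wait.lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a waiter; names invented for this sketch. */
struct waiter {
	void *blocker;             /* stands in for fl_blocker */
	int nblocked;              /* stands in for fl_blocked_requests */
	pthread_mutex_t wait_lock; /* stands in for fl_wait.lock */
};

/*
 * Lockless check, re-validated under the per-waiter lock.  Taking
 * wait_lock serializes with a waker that clears blocker and issues
 * its wakeup while holding the same lock, so once this returns true
 * the waker can no longer touch the waiter, and the global lock
 * (blocked_lock_lock in the kernel) can be skipped entirely.
 */
bool fast_path_done(struct waiter *w)
{
	if (w->blocker == NULL) {
		pthread_mutex_lock(&w->wait_lock);
		if (w->blocker == NULL && w->nblocked == 0) {
			pthread_mutex_unlock(&w->wait_lock);
			return true;	/* nothing to do; skip slow path */
		}
		pthread_mutex_unlock(&w->wait_lock);
	}
	return false;			/* fall back to the slow path */
}
```

The point of the second check under the lock is that the lockless one can observe blocker == NULL while the waker is still mid-wakeup; holding the lock briefly waits out that window.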
NeilBrown

diff --git a/fs/locks.c b/fs/locks.c
index 426b55d333d5..8aa04d5ac8b3 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -735,11 +735,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
 
 		waiter = list_first_entry(&blocker->fl_blocked_requests,
 					  struct file_lock, fl_blocked_member);
+		spin_lock(&waiter->fl_wait.lock);
 		__locks_delete_block(waiter);
 		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
 			waiter->fl_lmops->lm_notify(waiter);
 		else
-			wake_up(&waiter->fl_wait);
+			wake_up_locked(&waiter->fl_wait);
+		spin_unlock(&waiter->fl_wait.lock);
 	}
 }
 
@@ -753,6 +755,31 @@ int locks_delete_block(struct file_lock *waiter)
 {
 	int status = -ENOENT;
 
+	/*
+	 * If fl_blocker is NULL, it won't be set again as this thread
+	 * "owns" the lock and is the only one that might try to claim
+	 * the lock.  So it is safe to test fl_blocker locklessly.
+	 * Also if fl_blocker is NULL, this waiter is not listed on
+	 * fl_blocked_requests for some lock, so no other request can
+	 * be added to the list of fl_blocked_requests for this
+	 * request.  So if fl_blocker is NULL, it is safe to
+	 * locklessly check if fl_blocked_requests is empty.  If both
+	 * of these checks succeed, there is no need to take the lock.
+	 * However, some other thread might have only *just* set
+	 * fl_blocker to NULL and be about to send a wakeup on
+	 * fl_wait, so we mustn't return too soon or we might free waiter
+	 * before that wakeup can be sent.  So take the fl_wait.lock
+	 * to serialize with the wakeup in __locks_wake_up_blocks().
+	 */
+	if (waiter->fl_blocker == NULL) {
+		spin_lock(&waiter->fl_wait.lock);
+		if (waiter->fl_blocker == NULL &&
+		    list_empty(&waiter->fl_blocked_requests)) {
+			spin_unlock(&waiter->fl_wait.lock);
+			return status;
+		}
+		spin_unlock(&waiter->fl_wait.lock);
+	}
 	spin_lock(&blocked_lock_lock);
 	if (waiter->fl_blocker)
 		status = 0;
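Linus's smp_store_release()/smp_load_acquire() suggestion quoted earlier relies on the release/acquire handoff pattern. Here is a minimal userspace sketch of that pattern using C11 atomics in place of the kernel primitives; the struct and function names are invented for illustration, with "blocker" playing the role of fl_blocker.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative model; names invented for this sketch.  NULL
 * published with release semantics means "the waker is finished and
 * will not touch this request again". */
struct blocked_req {
	int payload;             /* written by the waker before release */
	_Atomic(void *) blocker; /* plays the role of fl_blocker */
};

/* Waker side: complete all writes to the request, then release-store
 * NULL last, as smp_store_release() would in the kernel. */
void waker_finish(struct blocked_req *r)
{
	r->payload = 42;
	atomic_store_explicit(&r->blocker, NULL, memory_order_release);
}

/* Waiter side: if the acquire load (cf. smp_load_acquire) observes
 * NULL, every write the waker made before its release store is
 * guaranteed to be visible here, so the waiter may safely free the
 * request. */
bool waiter_sees_done(struct blocked_req *r)
{
	return atomic_load_explicit(&r->blocker, memory_order_acquire) == NULL;
}
```

The subtlety the thread is wrestling with is exactly what this pattern cannot express on its own: the wakeup on fl_wait happens before fl_blocker is cleared, so the NULL store is not the last touch of the waiter, which is why the follow-up proposals reach for fl_blocked_requests or the fl_wait spinlock instead.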