Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757334AbcLUTCW (ORCPT ); Wed, 21 Dec 2016 14:02:22 -0500 Received: from mail-pg0-f68.google.com ([74.125.83.68]:36412 "EHLO mail-pg0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752058AbcLUTCV (ORCPT ); Wed, 21 Dec 2016 14:02:21 -0500 Date: Thu, 22 Dec 2016 05:01:30 +1000 From: Nicholas Piggin To: Linus Torvalds Cc: Peter Zijlstra , Dave Hansen , Bob Peterson , Linux Kernel Mailing List , Steven Whitehouse , Andrew Lutomirski , Andreas Gruenbacher , Mel Gorman , linux-mm , Hugh Dickins Subject: Re: [RFC][PATCH] make global bitlock waitqueues per-node Message-ID: <20161222050130.49d93982@roar.ozlabs.ibm.com> In-Reply-To: <20161222043331.31aab9cc@roar.ozlabs.ibm.com> References: <20161219225826.F8CB356F@viggo.jf.intel.com> <156a5b34-ad3b-d0aa-83c9-109b366c1bdf@linux.intel.com> <20161221080931.GQ3124@twins.programming.kicks-ass.net> <20161221083247.GW3174@twins.programming.kicks-ass.net> <20161222043331.31aab9cc@roar.ozlabs.ibm.com> Organization: IBM X-Mailer: Claws Mail 3.14.1 (GTK+ 2.24.31; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2143 Lines: 54 On Thu, 22 Dec 2016 04:33:31 +1000 Nicholas Piggin wrote: > On Wed, 21 Dec 2016 10:02:27 -0800 > Linus Torvalds wrote: > > > I do think your approach of just re-using the existing bit waiting > > with just a page-specific waiting function is nicer than Nick's "let's > > just roll new waiting functions" approach. It also avoids the extra > > initcall. > > > > Nick, comments? > > Well yes we should take my patch 1 and use the new bit for this > purpose regardless of what way we go with patch 2. I'll reply to > that in the other mail. Actually when I hit send, I thought your next mail was addressing a different subject. So back here. Peter's patch is less code and in that regard a bit nicer. I tried going that way once, but I just thought it was a bit too sloppy to do nicely with wait bit APIs. - The page can be added to waitqueue without PageWaiters being set. This is transient condition where the lock is retested, but it remains that PageWaiters is not quite the same as waitqueue_active to some degree. - This set + retest means every time a page gets a waiter, the cost is 2 test-and-set for the lock bit plus 2 spin_lock+spin_unlock for the waitqueue. - Setting PageWaiters is done outside the waitqueue lock, so you also have a new interleavings to think about versus clearing the bit. - It fails to clear up the bit and return to fastpath when there are hash collisions. Yes I know this is a rare case and on average it probably does not matter. But jitter is important, but also we really *want* to keep the waitqueue table small and lean like you have made it if possible. None of this 100KB per zone crap -- I do want to keep it small and tolerating collisions better would help that. Anyway that's about my 2c. Keep in mind Mel just said he might have seen a lockup with Peter's patch, and mine has not been hugely tested either, so let's wait for a bit more testing before merging either. Although we could start pipelining the process by merging patch 1 if Hugh acks it (cc'ed), then I'll resend with SOB and Ack. Thanks, Nick