Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757810AbcLTNVn (ORCPT ); Tue, 20 Dec 2016 08:21:43 -0500 Received: from mail-pg0-f66.google.com ([74.125.83.66]:34758 "EHLO mail-pg0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752426AbcLTNVl (ORCPT ); Tue, 20 Dec 2016 08:21:41 -0500 Date: Tue, 20 Dec 2016 23:21:22 +1000 From: Nicholas Piggin To: Mel Gorman Cc: Dave Hansen , Linus Torvalds , Bob Peterson , Linux Kernel Mailing List , swhiteho@redhat.com, luto@kernel.org, agruenba@redhat.com, peterz@infradead.org, linux-mm@kvack.org Subject: Re: [RFC][PATCH] make global bitlock waitqueues per-node Message-ID: <20161220232122.62c8196e@roar.ozlabs.ibm.com> In-Reply-To: <20161220125825.hfwyzy2mzc4lna7x@techsingularity.net> References: <20161219225826.F8CB356F@viggo.jf.intel.com> <156a5b34-ad3b-d0aa-83c9-109b366c1bdf@linux.intel.com> <20161220123113.1e1de7b0@roar.ozlabs.ibm.com> <20161220125825.hfwyzy2mzc4lna7x@techsingularity.net> Organization: IBM X-Mailer: Claws Mail 3.14.1 (GTK+ 2.24.31; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3584 Lines: 86 On Tue, 20 Dec 2016 12:58:25 +0000 Mel Gorman wrote: > On Tue, Dec 20, 2016 at 12:31:13PM +1000, Nicholas Piggin wrote: > > On Mon, 19 Dec 2016 16:20:05 -0800 > > Dave Hansen wrote: > > > > > On 12/19/2016 03:07 PM, Linus Torvalds wrote: > > > > +wait_queue_head_t *bit_waitqueue(void *word, int bit) > > > > +{ > > > > + const int __maybe_unused nid = page_to_nid(virt_to_page(word)); > > > > + > > > > + return __bit_waitqueue(word, bit, nid); > > > > > > > > No can do. Part of the problem with the old coffee was that it did that > > > > virt_to_page() crud. That doesn't work with the virtually mapped stack. > > > > > > Ahhh, got it. > > > > > > So, what did you have in mind? Just redirect bit_waitqueue() to the > > > "first_online_node" waitqueues? > > > > > > wait_queue_head_t *bit_waitqueue(void *word, int bit) > > > { > > > return __bit_waitqueue(word, bit, first_online_node); > > > } > > > > > > We could do some fancy stuff like only do virt_to_page() for things in > > > the linear map, but I'm not sure we'll see much of a gain for it. None > > > of the other waitqueue users look as pathological as the 'struct page' > > > ones. Maybe: > > > > > > wait_queue_head_t *bit_waitqueue(void *word, int bit) > > > { > > > int nid > > > if (word >= VMALLOC_START) /* all addrs not in linear map */ > > > nid = first_online_node; > > > else > > > nid = page_to_nid(virt_to_page(word)); > > > return __bit_waitqueue(word, bit, nid); > > > } > > > > I think he meant just make the page_waitqueue do the per-node thing > > and leave bit_waitqueue as the global bit. > > > > I'm pressed for time but at a glance, that might require a separate > structure of wait_queues for page waitqueue. Most users of bit_waitqueue > are not operating with pages. The first user is based on a word inside > a block_device for example. All non-page users could assume node-0. Yes it would require something or other like that. Trivial to keep things balanced (if not local) over nodes by take a simple hash of the virtual address to spread over the nodes. Or just keep using this separate global table for the bit_waitqueue... But before Linus grumps at me again, let's try to do the waitqueue avoidance bit first before we worry about that :) > It > shrinks the available hash table space but as before, maybe collisions > are not common enough to actually matter. That would be worth checking > out. Alternatively, careful auditing to pick a node when it's known it's > safe to call virt_to_page may work but it would be fragile. > > Unfortunately I won't be able to review or test any patches until January > 3rd after I'm back online properly. Right now, I have intermittent internet > access at best. During the next 4 days, I know I definitely will not have > any internet access. > > The last time around, there were three patch sets to avoid the overhead for > pages in particular. One was dropped (mine, based on Nick's old work) as > it was too complicated. Peter had some patches but after enough hammering > it failed due to a missed wakup that I didn't pin down before having to > travel to a conference. > > I hadn't tested Nick's prototype although it looked fine because others > reviewed it before I looked and I was waiting for another version to > appear. If one appears, I'll take a closer look and bash it across a few > machines to see if it has any lost wakeup problems. > Sure I'll respin it this week. Thanks, Nick