Date: Wed, 26 Oct 2016 23:03:39 +0100
From: Mel Gorman
To: Linus Torvalds
Cc: Andy Lutomirski, Andreas Gruenbacher, Peter Zijlstra, LKML,
 Bob Peterson, Steven Whitehouse, linux-mm
Subject: Re: CONFIG_VMAP_STACK, on-stack struct, and wake_up_bit
Message-ID: <20161026220339.GE2699@techsingularity.net>
References: <20161026203158.GD2699@techsingularity.net>

On Wed, Oct 26, 2016 at 02:26:57PM -0700, Linus Torvalds wrote:
> On Wed, Oct 26, 2016 at 1:31 PM, Mel Gorman wrote:
> >
> > IO wait activity is not all that matters. We hit the lock/unlock paths
> > during a lot of operations like reclaim.
>
> I doubt we do.
>
> Yes, we hit the lock/unlock itself, but do we hit the *contention*?
>
> The current code is nasty, and always ends up touching the wait-queue
> regardless of whether it needs to or not, but we have a fix for that.
>

To be clear, are you referring to PeterZ's patch that avoids the lookup?
If so, I see your point. (A rough sketch of the idea is further down for
anyone not following that thread.)

> With that fixed, do we actually get contention on a per-page basis?

Reclaim would have to be running in parallel with migrations, faults,
write-protect clearing and the like. I can't think of a situation where
a normal workload would hit that regularly and/or for long durations.

> Because without contention, we'd never actually look up the wait-queue
> at all.
>
> I suspect that without IO, it's really really hard to actually get
> that contention, because things like reclaim end up looking at the LRU
> queue etc with their own locking, so it should look at various
> individual pages one at a time, not have multiple queues look at the
> same page.
>

Except for many direct reclaimers hitting small LRUs while the system is
thrashing -- but that is not a case that really matters, as by then
you've already lost.

> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >> index 7f2ae99e5daf..0f088f3a2fed 100644
> >> --- a/include/linux/mmzone.h
> >> +++ b/include/linux/mmzone.h
> >> @@ -440,33 +440,7 @@ struct zone {
> >> +	int initialized;
> >>
> >>  	/* Write-intensive fields used from the page allocator */
> >>  	ZONE_PADDING(_pad1_)
> >
> > zone_is_initialized is mostly the domain of hotplug. A potential cleanup
> > is to use a page flag and shrink the size of zone slightly. Nothing to
> > panic over.
>
> I really did that to make it very obvious that there was no semantic
> change. I just set the "initialized" flag in the same place where it
> used to initialize the wait_table, so that this:
>
> >> static inline bool zone_is_initialized(struct zone *zone)
> >> {
> >> -	return !!zone->wait_table;
> >> +	return zone->initialized;
> >> }
>
> ends up being obviously equivalent.
>

No problem with that.
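For reference, the rough sketch of the lookup-avoidance idea mentioned
above -- not PeterZ's actual patch. It assumes a PG_waiters-style page
flag (not in mainline as of this writing) that sleepers set before
waiting, so the unlock side only touches the hashed wait table when
someone is actually waiting. The real patch has to be far more careful
about when the flag is cleared and about the memory ordering:

void unlock_page(struct page *page)
{
	VM_BUG_ON_PAGE(!PageLocked(page), page);
	clear_bit_unlock(PG_locked, &page->flags);

	/* Order the clear of PG_locked against the PG_waiters test. */
	smp_mb__after_atomic();

	/* Common case: no sleepers, so skip the hash lookup entirely. */
	if (test_bit(PG_waiters, &page->flags))
		wake_up_page(page, PG_locked);
}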
> >> +#define WAIT_TABLE_BITS 8
> >> +#define WAIT_TABLE_SIZE (1 << WAIT_TABLE_BITS)
> >> +static wait_queue_head_t bit_wait_table[WAIT_TABLE_SIZE] __cacheline_aligned;
> >> +
> >> +wait_queue_head_t *bit_waitqueue(void *word, int bit)
> >> +{
> >> +	const int shift = BITS_PER_LONG == 32 ? 5 : 6;
> >> +	unsigned long val = (unsigned long)word << shift | bit;
> >> +
> >> +	return bit_wait_table + hash_long(val, WAIT_TABLE_BITS);
> >> +}
> >> +EXPORT_SYMBOL(bit_waitqueue);
> >> +
> >
> > Minor nit that it's unfortunate this moved to the scheduler core. It
> > wouldn't have been a complete disaster to add a page_waitqueue_init() or
> > something similar after sched_init.
>
> I considered that, but decided that "minimal patch" was better. Plus,
> with that bit_waitqueue() actually also being used for the page
> locking queues (which act _kind of_, but not quite, like a bitlock),
> the bit_wait_table is actually more core than just the bit-wait code.
>
> In fact, I considered just renaming it to "hashed_wait_queue", because
> that's effectively how we use it now, rather than being particularly
> specific to the bit-waiting. But again, that would have made the patch
> bigger, which I wanted to avoid since this is a post-rc2 thing due to
> the gfs2 breakage.
>

No objection. Shuffling it around would not make it obviously better in
any way.

In the meantime, a machine freed up. FWIW, the patch survived booting on
a 2-socket machine and about 20 minutes of multiple processes hammering
the reclaim paths to beat on lock/unlock. I didn't do a performance
comparison or gather profile data, but I wouldn't expect anything
interesting from the profiles other than some cycles saved.

-- 
Mel Gorman
SUSE Labs
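P.S. For anyone curious how the hashing behaves, below is a standalone
userspace approximation of the bit_waitqueue() quoted above. hash_long()
here is a hand-rolled stand-in for the kernel's include/linux/hash.h
version (same golden-ratio multiply, 64-bit only), and bit_wait_index()
is a made-up name that returns a table index instead of a
wait_queue_head_t pointer:

#include <stdio.h>

#define WAIT_TABLE_BITS 8

/* Stand-in for the kernel's 64-bit hash_long(). */
static unsigned long hash_long(unsigned long val, unsigned int bits)
{
	/* GOLDEN_RATIO_64 from include/linux/hash.h */
	return (val * 0x61C8864680B583EBUL) >> (64 - bits);
}

/* Mirrors bit_waitqueue(): shift is 6 because BITS_PER_LONG == 64, so
 * the low (always-zero) bits of an aligned word pointer make room for
 * the bit number before the whole value is hashed. */
static unsigned long bit_wait_index(void *word, int bit)
{
	const int shift = 6;
	unsigned long val = (unsigned long)word << shift | bit;

	return hash_long(val, WAIT_TABLE_BITS);
}

int main(void)
{
	static unsigned long flags[4];
	int i;

	/* Adjacent words and different bits spread across the table. */
	for (i = 0; i < 4; i++)
		printf("word %p bit %d -> queue %lu\n",
		       (void *)&flags[i], i, bit_wait_index(&flags[i], i));
	return 0;
}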