From: Linus Torvalds
Date: Tue, 27 Dec 2016 12:17:34 -0800
Subject: Re: [PATCH 2/2] mm: add PageWaiters indicating tasks are waiting for a page bit
To: Nicholas Piggin
Cc: Dave Hansen, Bob Peterson, Linux Kernel Mailing List, Steven Whitehouse, Andrew Lutomirski, Andreas Gruenbacher, Peter Zijlstra, linux-mm, Mel Gorman
List-ID: linux-kernel@vger.kernel.org

On Tue, Dec 27, 2016 at 11:40 AM, Linus Torvalds wrote:
>
> This patch at least might have a chance in hell of working. Let's see..

Ok, with that fixed, things do indeed seem to work.

And things also look fairly good on my "lots of nasty little shortlived scripts" benchmark ("make -j32 test" for git, in case people care).

That benchmark used to have "unlock_page()" and "__wake_up_bit()" together using about 3% of all CPU time. Now __wake_up_bit() doesn't show up at all (ok, it's something like 0.02%, so it's technically still there, but..) and "unlock_page()" is at 0.66% of CPU time. So it's about a quarter of where it used to be.
And now it's about the same cost as the "trylock_page()" that is inlined into filemap_map_pages() - it used to be that unlocking the page was much more expensive than locking it, because of all the unnecessary waitqueue games.

So the benchmark still does a ton of page lock/unlock action, but it doesn't stand out in the profiles as some kind of WTF thing any more. And the profiles really show that the cost is the atomic op itself rather than bad effects from bad code generation, which is what you want to see.

Would I love to fix this all by not taking the page lock at all? Yes I would. I suspect we should be able to do something clever and lockless, at least in theory. But in the meantime, I'm happy with where our page locking overhead is.

And while I haven't seen the NUMA numbers from Dave Hansen with all this, his early testing was that the original patch from Nick already fixed the regression and was the fastest one anyway. And this optimization will only have improved things further, although it might not be as noticeable on NUMA as it is on a regular single-socket system.

Linus