Date: Wed, 12 Aug 2015 22:10:09 -0700 (PDT)
From: Hugh Dickins
To: "Kirill A. Shutemov"
cc: Hugh Dickins, "Kirill A. Shutemov", Andrew Morton, Andrea Arcangeli, David Rientjes, Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka, Christoph Lameter, Naoya Horiguchi, Steve Capper, "Aneesh Kumar K.V", Johannes Weiner, Michal Hocko, Jerome Marchand, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: page-flags behavior on compound pages: a worry
In-Reply-To: <20150807144949.GA12177@node.dhcp.inet.fi>

On Fri, 7 Aug 2015, Kirill A. Shutemov wrote:
> On Thu, Aug 06, 2015 at 12:24:22PM -0700, Hugh Dickins wrote:
> >
> > Oh, and I know a patchset which avoids these problems completely,
> > by not using compound pages at all ;)
>
> BTW, I haven't heard anything about the patchset for a while.
> What's the status?

It's gone well, and is being put into wider use here. But I'm not one
for monthly updates of large patchsets myself, always too much to do;
and nobody else seemed anxious to have it yet, back in March.
As I said at the time of posting huge tmpfs against v3.19, it was fully
working (and little changed since); but once memory pressure had
disbanded a team to swap it out, there was nothing to put it together
again later, to restore the original hugepage performance.

I couldn't imagine people putting it into real use while that remained
the case, so I spent the next months adding "huge tmpfs recovery": I
considered hooking into khugepaged, but settled on a work item queued
from fault. That has worked out well, except that I had to rush it in
before I went on vacation in June, then spent last month fixing all the
concurrent hole-punching bugs Andres found with his fuzzing while I was
away. Busy time, stable now; but I do want to reconsider a few rushed
decisions before offering the rebased and extended set.

And there are three pieces of the work not yet begun.

The page-table allocation delay in mm/memory.c had been great for the
first posting, but is not good enough for recovery (replacing ptes by a
pmd): for the moment I skate around that by guarding with mmap_sem, but
mmap_sem usually ends up regrettable, and shouldn't be necessary. There
are just a lot of scattered walks to work through, adjusting them to
the racy replacement of ptes by a pmd. Maybe I can get away without
doing this for now; we seem to be working well enough without it.

And I suspect that my queueing a recovery work item from fault is
over-eager, and needs some stats and knobs to tune it down. It has not
surfaced as a problem yet, though; and I don't think we could live with
the opposite extreme, of khugepaged lumbering its way around the vmas.

But the one I think I shall have to do something better about before
posting is NUMA. For a confluence of reasons that rule out swapin
readahead for now, it's not a serious issue for us as yet. But swapin
readahead and NUMA have always been a joke in tmpfs, and I'll be
amplifying that joke with my current NUMA placement in recovery.
Unfortunately, there's a lot of opportunity to make silly mistakes when
trying to get NUMA right: I doubt I can get it right, but I do need to
get it a little less wrong before letting others take over.

>
> Optimizing rmap operations in my patchset (see PG_double_map), I found
> that it would be very tricky to expand team pages to anon-THP without
> a performance regression on the rmap side, due to the amount of atomic
> ops it requires.

Thanks for thinking of it: I've been too busy with the recovery to put
more thought into extending teams to anon THP, though I'd certainly like
to try that once the huge tmpfs end is "complete".

Yes, there's no doubt that starting from compound pages is more rigid
but should involve much less repetition; whereas starting from the other
end, with a team of ordinary 4k pages, is more flexible but involves a
lot of wasted effort. I can't predict where we shall meet.

>
> Is there any clever approach to the issue?

I'd been hoping that I could implement first, and then optimize away the
unnecessary; but you're right that it's easier to live with that in the
pagecache case, whereas with anon THP it would be a regression.

Hugh

>
> Team pages are probably fine for file mappings due to a different
> performance baseline. I'm less optimistic about anon-THP.
>
> --
> Kirill A. Shutemov