Date: Wed, 12 Aug 2015 21:12:07 -0700 (PDT)
From: Hugh Dickins
To: "Kirill A. Shutemov"
cc: Andrew Morton, Greg Thelen, Hugh Dickins, David Rientjes,
    Vlastimil Babka, "Kirill A. Shutemov", Andrea Arcangeli, Dave Hansen,
    Mel Gorman, Rik van Riel, Christoph Lameter, Naoya Horiguchi,
    Steve Capper, "Aneesh Kumar K.V", Johannes Weiner, Michal Hocko,
    Jerome Marchand, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: page-flags behavior on compound pages: a worry

On Thu, 13 Aug 2015, Kirill A. Shutemov wrote:
>
> All this situation is ugly. I'm thinking on more general solution for
> PageTail() vs. ->first_page race.
>
> We would be able to avoid the race in first place if we encode PageTail()
> and position of head page within the same word in struct page. This way we
> update both thing in one shot without possibility of race.
>
> Details get tricky.
>
> I'm going to try tomorrow something like this: encode the position of head
> as offset from the tail page and store it as negative number in the union
> with ->mapping and ->s_mem. PageTail() can be implemented as check value
> of the field to be in range -1..-MAX_ORDER_NR_PAGES.
>
> I'm not sure at all if it's going to work, especially looking on
> ridiculously high CONFIG_FORCE_MAX_ZONEORDER some architectures allow.
>
> We could also try to encode page order instead (again as negative number)
> and calculate head page position based on alignment...
>
> Any other ideas are welcome.

Good luck, I've not given it any thought, but hope it works out:
my reasoning was the same when I put the PageAnon bit into
page->mapping instead of page->flags.

Something to beware of though: although exceedingly unlikely to be a
problem, page->mapping always contained a pointer to or into a relevant
structure, or else something that could not possibly be a kernel pointer,
when I was working on KSM swapping: see comment above get_ksm_page()
in mm/ksm.c. It is best to keep page->mapping for pointers if possible
(and probably avoid having the PageAnon bit set unless really Anon).

I've only just read your mail, and I'm too slow a thinker to have
worked through your isolate_migratepages_block() race yet.

But, given the timing, cannot resist sending you a code fragment I wrote
earlier today for our v3.11-based kernel: which still has
compound_trans_order(), which we had been using in a similar racy
physical scan.

I'm not for a moment suggesting that this fragment is relevant to your
race; but it is something amusing to consider when you're thinking of
such races. Credit to Greg Thelen for thinking of the
prep_compound_page() end of it, when I'd been focussed on the
__split_huge_page_refcount() end.

	/*
	 * It is not safe to use compound_lock (inside compound_trans_order)
	 * until we have a reference on the page (okay, done above) and have
	 * then seen PageLRU on it (just below): because mm/huge_memory.c uses
	 * the non-atomic __SetPageUptodate on a freshly allocated THPage in
	 * several places, believing it to be invisible to the outside world,
	 * but liable to race and leave PG_compound_lock set when cleared here.
	 */
	nr_pages = 1;
	if (PageHead(page)) {
		/*
		 * smp_rmb() against the smp_wmb() in the first iteration of
		 * prep_compound_page(), so that the PageTail test ensures
		 * that compound_order(page) is now correctly readable.
		 */
		smp_rmb();
		if (PageTail(page + 1)) {
			nr_pages = 1 << compound_order(page);
			/*
			 * Then smp_rmb() against smp_wmb() in last iteration of
			 * __split_huge_page_refcount(), to ensure that has not
			 * yet written something else into page[1].lru.prev.
			 */
			smp_rmb();
			if (!PageTail(page + 1))
				nr_pages = 1;
		}
	}

Hugh
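
For anyone following the thread, the encoding Kirill describes in the quoted
text (head position stored as a negative offset in one word, with PageTail()
as a range check on that word) might look roughly like the userspace sketch
below. This is only an illustration under stated assumptions: struct toy_page,
the toy_* helpers and TOY_MAX_ORDER_NR_PAGES are hypothetical names, not
kernel code, and a real implementation would need proper once-only accessors
and would have to address the page->mapping concern raised above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for MAX_ORDER_NR_PAGES, which is config-dependent. */
#define TOY_MAX_ORDER_NR_PAGES 1024

/* Toy page descriptor; NOT the kernel's struct page. */
struct toy_page {
	/*
	 * Models the word shared with ->mapping: on a tail page it holds
	 * the (negative) offset to the head, otherwise something else.
	 */
	intptr_t word;
};

/* Range check from the quoted proposal: -1..-MAX_ORDER_NR_PAGES. */
static int toy_is_tail_word(intptr_t w)
{
	return w <= -1 && w >= -(intptr_t)TOY_MAX_ORDER_NR_PAGES;
}

/* Mark 'tail' as a tail of 'head' with a single store to one word. */
static void toy_set_tail(struct toy_page *tail, struct toy_page *head)
{
	assert(head < tail && tail - head <= TOY_MAX_ORDER_NR_PAGES);
	tail->word = (intptr_t)(head - tail);	/* negative offset */
}

/* "Is this a tail?" is answered from the same word as the head position. */
static int toy_page_tail(const struct toy_page *page)
{
	return toy_is_tail_word(page->word);
}

/*
 * Read the word once, then decide and compute from that one value, so
 * the tail test and the head position cannot disagree with each other.
 */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
	intptr_t w = page->word;

	return toy_is_tail_word(w) ? page + w : page;
}
```

The point of the design is that toy_compound_head() never combines a tail
flag read from one word with a head pointer read from another, which is the
window the PageTail() vs. ->first_page race relies on.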