Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751722AbbHLWVl (ORCPT ); Wed, 12 Aug 2015 18:21:41 -0400
Received: from mail-wi0-f173.google.com ([209.85.212.173]:37165 "EHLO
	mail-wi0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751617AbbHLWVj (ORCPT ); Wed, 12 Aug 2015 18:21:39 -0400
Date: Thu, 13 Aug 2015 01:21:36 +0300
From: "Kirill A. Shutemov"
To: Andrew Morton
Cc: Hugh Dickins, David Rientjes, Vlastimil Babka, "Kirill A. Shutemov",
	Andrea Arcangeli, Dave Hansen, Mel Gorman, Rik van Riel,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	"Aneesh Kumar K.V", Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: page-flags behavior on compound pages: a worry
Message-ID: <20150812222136.GA15010@node.dhcp.inet.fi>
References: <1426784902-125149-1-git-send-email-kirill.shutemov@linux.intel.com>
	<1426784902-125149-5-git-send-email-kirill.shutemov@linux.intel.com>
	<20150806153259.GA2834@node.dhcp.inet.fi>
	<20150812143509.GA12320@node.dhcp.inet.fi>
	<20150812141644.ceb541e5b52d76049339a243@linux-foundation.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150812141644.ceb541e5b52d76049339a243@linux-foundation.org>
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3610
Lines: 94

On Wed, Aug 12, 2015 at 02:16:44PM -0700, Andrew Morton wrote:
> On Wed, 12 Aug 2015 17:35:09 +0300 "Kirill A. Shutemov" wrote:
>
> > On Thu, Aug 06, 2015 at 12:24:22PM -0700, Hugh Dickins wrote:
> > > > IIUC, the only potentially problematic callsites left are physical memory
> > > > scanners. This code requires audit. I'll do that.
> > >
> > > Please.
> >
> > I haven't finished the exercise yet.
> > But here's an issue I believe present
> > in current *Linus* tree:
> >
> > >From e78eec7d7a8c4cba8b5952a997973f7741e704f4 Mon Sep 17 00:00:00 2001
> > From: "Kirill A. Shutemov"
> > Date: Wed, 12 Aug 2015 17:09:16 +0300
> > Subject: [PATCH] mm: fix potential race in isolate_migratepages_block()
> >
> > Hugh has pointed that compound_head() call can be unsafe in some context.
> > There's one example:
> >
> >         CPU0                                    CPU1
> >
> > isolate_migratepages_block()
> >   page_count()
> >     compound_head()
> >       !!PageTail() == true
> >                                         put_page()
> >                                           tail->first_page = NULL
> >       head = tail->first_page
> >                                         alloc_pages(__GFP_COMP)
> >                                            prep_compound_page()
> >                                              tail->first_page = head
> >                                              __SetPageTail(p);
> >       !!PageTail() == true
> >
> >
> > The race is pure theoretical. I don't think it's possible to trigger it in
> > practice. But who knows.
> >
> > This can be fixed by avoiding compound_head() in unsafe context.
>
> This is nuts :( page_count() should Just Work without us having to
> worry about bizarre races against splitting.

Sigh. Split is not involved. And this race is present even for THP=n. :(

> >
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -787,7 +787,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> >  		 * admittedly racy check.
> >  		 */
> >  		if (!page_mapping(page) &&
> > -		    page_count(page) > page_mapcount(page))
> > +		    atomic_read(&page->_count) > page_mapcount(page))
> >  			continue;
>
> If we're going to do this sort of thing, can we please do it in a more
> transparent manner? Let's not sprinkle unexplained and
> incomprehensible direct accesses to ->_count all over the place.
>
> Create a formal function to do this, with an appropriate name and with
> documentation which fully explains what's going on. Then use that
> here, and in has_unmovable_pages() (at least).

This whole situation is ugly.

I'm thinking about a more general solution for the PageTail() vs.
->first_page race.
We would be able to avoid the race in the first place if we encoded
PageTail() and the position of the head page within the same word in
struct page. This way we would update both things in one shot, without
the possibility of a race.

Details get tricky.

I'm going to try something like this tomorrow: encode the position of
the head as an offset from the tail page and store it as a negative
number in the union with ->mapping and ->s_mem. PageTail() can then be
implemented as a check that the value of the field is in the range
-1..-MAX_ORDER_NR_PAGES.

I'm not sure at all that it's going to work, especially looking at the
ridiculously high CONFIG_FORCE_MAX_ZONEORDER values some architectures
allow.

We could also try to encode the page order instead (again as a negative
number) and calculate the head page position based on alignment...

Any other ideas are welcome.

-- 
 Kirill A. Shutemov
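
For illustration only, the single-word encoding described above could be
modelled in user space roughly as follows. This is a sketch under stated
assumptions, not kernel code and not the patch that was eventually
written: struct page_model, MODEL_MAX_ORDER_NR_PAGES and the model_*()
helpers are all hypothetical names, and C11 atomics stand in for
whatever the kernel would actually use. The point it shows is that the
"is this a tail?" answer and the head position come from one load of one
word, so they cannot disagree with each other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit standing in for MAX_ORDER_NR_PAGES. */
#define MODEL_MAX_ORDER_NR_PAGES 1024L

/* Hypothetical stand-in for struct page; only the shared word is modelled. */
struct page_model {
	/* -1..-MODEL_MAX_ORDER_NR_PAGES: tail page, value is -(distance to head).
	 * Anything else: not a tail page. */
	_Atomic long word;
};

/* True if a snapshot of the word encodes "tail page". */
static inline bool word_is_tail(long w)
{
	return w <= -1 && w >= -MODEL_MAX_ORDER_NR_PAGES;
}

/* Mark 'tail' as a tail of 'head' with a single store. */
static void model_set_tail(struct page_model *tail, struct page_model *head)
{
	ptrdiff_t dist = tail - head;

	assert(dist >= 1 && dist <= MODEL_MAX_ORDER_NR_PAGES);
	atomic_store(&tail->word, -(long)dist);
}

/* Clear the tail state with a single store. */
static void model_clear_tail(struct page_model *page)
{
	atomic_store(&page->word, 0);
}

static bool model_page_tail(struct page_model *page)
{
	return word_is_tail(atomic_load(&page->word));
}

/* Take one snapshot of the word, then derive both answers from it. */
static struct page_model *model_compound_head(struct page_model *page)
{
	long w = atomic_load(&page->word);

	return word_is_tail(w) ? page + w : page;
}
```

In this model a reader racing with model_clear_tail() or
model_set_tail() sees either the old word or the new one, never a tail
flag paired with a stale or NULL head pointer. Whether the resulting
head is still meaningful by the time the caller uses it is a separate
question that the sketch does not address.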