Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933045AbbHXKSB (ORCPT ); Mon, 24 Aug 2015 06:18:01 -0400 Received: from mail-wi0-f170.google.com ([209.85.212.170]:35549 "EHLO mail-wi0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932457AbbHXKR7 (ORCPT ); Mon, 24 Aug 2015 06:17:59 -0400 Date: Mon, 24 Aug 2015 13:17:56 +0300 From: "Kirill A. Shutemov" To: "Kirill A. Shutemov" Cc: Andrew Morton , Hugh Dickins , Andrea Arcangeli , Dave Hansen , Vlastimil Babka , Johannes Weiner , Michal Hocko , David Rientjes , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCHv3 4/5] mm: make compound_head() robust Message-ID: <20150824101755.GA2370@node.dhcp.inet.fi> References: <1439976106-137226-1-git-send-email-kirill.shutemov@linux.intel.com> <1439976106-137226-5-git-send-email-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1439976106-137226-5-git-send-email-kirill.shutemov@linux.intel.com> User-Agent: Mutt/1.5.23.1 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2719 Lines: 79 On Wed, Aug 19, 2015 at 12:21:45PM +0300, Kirill A. Shutemov wrote: > Hugh has pointed that compound_head() call can be unsafe in some > context. There's one example: > > CPU0 CPU1 > > isolate_migratepages_block() > page_count() > compound_head() > !!PageTail() == true > put_page() > tail->first_page = NULL > head = tail->first_page > alloc_pages(__GFP_COMP) > prep_compound_page() > tail->first_page = head > __SetPageTail(p); > !!PageTail() == true > > > The race is pure theoretical. I don't it's possible to trigger it in > practice. But who knows. > > We can fix the race by changing how encode PageTail() and compound_head() > within struct page to be able to update them in one shot. > > The patch introduces page->compound_head into third double word block in > front of compound_dtor and compound_order. That means it shares storage > space with: > > - page->lru.next; > - page->next; > - page->rcu_head.next; > - page->pmd_huge_pte; > > That's too long list to be absolutely sure, but looks like nobody uses > bit 0 of the word. It can be used to encode PageTail(). And if the bit > set, rest of the word is pointer to head page. > > Signed-off-by: Kirill A. Shutemov > Acked-by: Michal Hocko > Cc: Hugh Dickins > Cc: David Rientjes > Cc: Vlastimil Babka If DEFERRED_STRUCT_PAGE_INIT=n, combining this patchset with my page-flags patches causes oops in SetPageReserved() called from reserve_bootmem_region(). It happens because we haven't yet initilized the word in struct page and PageTail() inside SetPageReserved() can give false-positive, which leads to bogus compound_head() result. IIUC, we initialize the word only on first allocation of the page. It can be too late: pfn scanner can see false-positive PageTail() from not yet allocated pages too. Here's fixlet for patch to address the issue. diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 347724850665..d0e3fca830f8 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -892,6 +892,8 @@ static void init_reserved_page(unsigned long pfn) #else static inline void init_reserved_page(unsigned long pfn) { + /* Avoid false-positive PageTail() */ + INIT_LIST_HEAD(&pfn_to_page(pfn)->lru); } #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ -- Kirill A. Shutemov -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/