Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754583AbaDNOmi (ORCPT ); Mon, 14 Apr 2014 10:42:38 -0400 Received: from mta-out.inet.fi ([195.156.147.13]:48627 "EHLO jenni2.inet.fi" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751151AbaDNOmg (ORCPT ); Mon, 14 Apr 2014 10:42:36 -0400 Date: Mon, 14 Apr 2014 17:42:18 +0300 From: "Kirill A. Shutemov" To: Andrea Arcangeli , Rik van Riel Cc: Michel Lespinasse , Sasha Levin , Andrew Morton , Dave Jones , "linux-mm@kvack.org" , LKML , "Kirill A. Shutemov" , Vlastimil Babka , Bob Liu Subject: Re: mm: kernel BUG at mm/huge_memory.c:1829! Message-ID: <20140414144218.GA26515@node.dhcp.inet.fi> References: <53440991.9090001@oracle.com> <20140410102527.GA24111@node.dhcp.inet.fi> <20140410134436.GA25933@node.dhcp.inet.fi> <20140410162750.GD2749@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140410162750.GD2749@redhat.com> User-Agent: Mutt/1.5.22.1-rc1 (2013-10-16) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Apr 10, 2014 at 06:27:50PM +0200, Andrea Arcangeli wrote: > Hi, > > On Thu, Apr 10, 2014 at 04:44:36PM +0300, Kirill A. Shutemov wrote: > > Okay, below is my attempt to fix the bug. I'm not entirely sure it's > > correct. Andrea, could you take a look? > > The possibility the interval tree implicitly broke the walk order of > the anon_vma list didn't cross my mind, that's very good catch! > Breakage of the rmap walk order definitely can explain that BUG_ON in > split_huge_page that signals a pte was missed by the rmap walk. I've spent few day trying to understand rmap code. And now I think my patch is wrong. I actually don't see where walk order requirement comes from. It seems all operations (insert, remove, foreach) on anon_vma is serialized with anon_vma->root->rwsem. Andrea, could you explain this for me? The actual bug was introduced by me with split PMD page table lock patchset. The patch below should fix it. Please review. It also means it can't be the root cause of other bug report[1] since split PMD lock was introduced in v3.13 and bug report is about v3.8. [1] https://bugzilla.redhat.com/show_bug.cgi?id=923817 >From 8e0095daf2f0fac113c2e98369c7fc9ff4cde484 Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Mon, 14 Apr 2014 16:05:33 +0300 Subject: [PATCH] thp: close race between split and zap huge pages Sasha Levin has reported two THP BUGs[1][2]. I believe both of them have the same root cause. Let's look to them one by one. The first bug[1] is "kernel BUG at mm/huge_memory.c:1829!". It's BUG_ON(mapcount != page_mapcount(page)) in __split_huge_page(). >From my testing I see that page_mapcount() is higher than mapcount here. I think it happens due to race between zap_huge_pmd() and page_check_address_pmd(). page_check_address_pmd() misses PMD which is under zap: CPU0 CPU1 zap_huge_pmd() pmdp_get_and_clear() __split_huge_page() anon_vma_interval_tree_foreach() __split_huge_page_splitting() page_check_address_pmd() mm_find_pmd() /* * We check if PMD present without taking ptl: no * serialization against zap_huge_pmd(). We miss this PMD, * it's not accounted to 'mapcount' in __split_huge_page(). */ pmd_present(pmd) == 0 BUG_ON(mapcount != page_mapcount(page)) // CRASH!!! page_remove_rmap(page) atomic_add_negative(-1, &page->_mapcount) The second bug[2] is "kernel BUG at mm/huge_memory.c:1371!". It's VM_BUG_ON_PAGE(!PageHead(page), page) in zap_huge_pmd(). This happens in similar way: CPU0 CPU1 zap_huge_pmd() pmdp_get_and_clear() page_remove_rmap(page) atomic_add_negative(-1, &page->_mapcount) __split_huge_page() anon_vma_interval_tree_foreach() __split_huge_page_splitting() page_check_address_pmd() mm_find_pmd() pmd_present(pmd) == 0 /* The same comment as above */ /* * No crash this time since we already decremented page->_mapcount in * zap_huge_pmd(). */ BUG_ON(mapcount != page_mapcount(page)) /* * We split the compound page here into small pages without * serialization against zap_huge_pmd() */ __split_huge_page_refcount() VM_BUG_ON_PAGE(!PageHead(page), page); // CRASH!!! So my understanding the problem is pmd_present() check in mm_find_pmd() without taking page table lock. The bug was introduced by me commit with commit 117b0791ac42. Sorry for that. :( Let's open code mm_find_pmd() in page_check_address_pmd() and do the check under page table lock. Note that __page_check_address() does the same for PTE entires if sync != 0. [1] https://lkml.kernel.org/g/<53440991.9090001@oracle.com> [2] https://lkml.kernel.org/g/<5310C56C.60709@oracle.com> Signed-off-by: Kirill A. Shutemov Reported-by: Sasha Levin Cc: #3.13+ --- mm/huge_memory.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 5025709bb3b5..d02a83852ee9 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1536,16 +1536,23 @@ pmd_t *page_check_address_pmd(struct page *page, enum page_check_address_pmd_flag flag, spinlock_t **ptl) { + pgd_t *pgd; + pud_t *pud; pmd_t *pmd; if (address & ~HPAGE_PMD_MASK) return NULL; - pmd = mm_find_pmd(mm, address); - if (!pmd) + pgd = pgd_offset(mm, address); + if (!pgd_present(*pgd)) return NULL; + pud = pud_offset(pgd, address); + if (!pud_present(*pud)) + return NULL; + pmd = pmd_offset(pud, address); + *ptl = pmd_lock(mm, pmd); - if (pmd_none(*pmd)) + if (!pmd_present(*pmd)) goto unlock; if (pmd_page(*pmd) != page) goto unlock; -- Kirill A. Shutemov -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/