Date: Thu, 2 Oct 2014 17:25:03 +0300
From: "Kirill A. Shutemov"
To: Linus Torvalds
Cc: Sasha Levin, Hugh Dickins, Dave Jones, Al Viro, Linux Kernel,
    Rik van Riel, Ingo Molnar, Michel Lespinasse,
    "Kirill A. Shutemov", Mel Gorman
Subject: Re: pipe/page fault oddness.
Message-ID: <20141002142503.GA13203@node.dhcp.inet.fi>
References: <20140930164047.GA18354@redhat.com>
    <20140930182059.GA24431@redhat.com>
    <542C7B5E.2020000@oracle.com>

On Wed, Oct 01, 2014 at 03:42:53PM -0700, Linus Torvalds wrote:
> On Wed, Oct 1, 2014 at 3:08 PM, Sasha Levin wrote:
> >
> > I've tried this patch on the same configuration that was triggering
> > the VM_BUG_ON that Hugh mentioned previously. Surprisingly enough it
> > ran fine for ~20 minutes before exploding with:
>
> Well, that's somewhat encouraging. I didn't expect it to be perfect.
>
> That said, "ran fine" isn't necessarily the same thing as "worked".
> Who knows how buggy it was without showing overt symptoms until the
> BUG_ON() triggered. But hey, I'll be optimistic.
>
> > [ 2781.566206] kernel BUG at mm/huge_memory.c:1293!
>
> So that's
>
>     BUG_ON(is_huge_zero_page(page));
>
> and the reason is trivial: the old code used to have a magical special
> case for the zero-page hugepage (see change_huge_pmd()) and I got rid
> of that (because now it's just about setting protections, and the
> zero-page hugepage is in no way special).
>
> So I think the solution is equally trivial: just accept that the
> zero-page can happen, and ignore it (just un-numa it).
>
> Appended is an incremental diff on top of the previous one. Even less
> tested than the last case, but I think you get the idea if it doesn't
> work as-is.
>
>               Linus

> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 14de54af6c38..fc33952d59c4 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1290,7 +1290,9 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  	}
>  
>  	page = pmd_page(pmd);
> -	BUG_ON(is_huge_zero_page(page));
> +	if (is_huge_zero_page(page))
> +		goto huge_zero_page;
> +
>  	page_nid = page_to_nid(page);
>  	last_cpupid = page_cpupid_last(page);
>  	count_vm_numa_event(NUMA_HINT_FAULTS);
> @@ -1381,6 +1383,11 @@ out:
>  		task_numa_fault(last_cpupid, page_nid, HPAGE_PMD_NR, flags);
>  
>  	return 0;
> +huge_zero_page:
> +	pmd = pmd_modify(pmd, vma->vm_page_prot);
> +	set_pmd_at(mm, haddr, pmdp, pmd);
> +	update_mmu_cache_pmd(vma, addr, pmdp);
> +	goto out_unlock;

I don't see what prevents this code from making the zero page writable
here. We need at least

	pmd = pmd_wrprotect(pmd);

before set_pmd_at().

-- 
 Kirill A. Shutemov
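
To illustrate the concern, here is a rough user-space sketch, not kernel
code. It models a pmd as a plain bitmask. Every model_* name, the MODEL_*
flag bits and the two zero_page_fixup_* helpers are hypothetical stand-ins
invented for this illustration, and the sketch assumes the VMA's protection
value carries a write bit. Under that assumption, the hunk as posted leaves
the entry writable, while adding a write-protect step before installing it
does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: low bits are protection flags, the rest is the "pfn". */
typedef uint32_t model_pmd_t;

#define MODEL_PRESENT   (1u << 0)
#define MODEL_WRITE     (1u << 1)
#define MODEL_NUMA      (1u << 2)  /* stands in for the NUMA-hinting marker */
#define MODEL_PROT_MASK (MODEL_PRESENT | MODEL_WRITE | MODEL_NUMA)

/* Stand-in for pmd_modify(): keep the non-protection bits, take the
 * protection bits from the VMA. */
static model_pmd_t model_pmd_modify(model_pmd_t pmd, model_pmd_t vma_prot)
{
    return (pmd & ~MODEL_PROT_MASK) | (vma_prot & MODEL_PROT_MASK);
}

/* Stand-in for pmd_wrprotect(): clear the write bit. */
static model_pmd_t model_pmd_wrprotect(model_pmd_t pmd)
{
    return pmd & ~MODEL_WRITE;
}

/* Stand-in for pmd_write(): is the entry writable? */
static bool model_pmd_write(model_pmd_t pmd)
{
    return (pmd & MODEL_WRITE) != 0;
}

/* Models the huge_zero_page: path as posted: only the VMA protections
 * are applied before the entry would be installed. */
static model_pmd_t zero_page_fixup_as_posted(model_pmd_t pmd,
                                             model_pmd_t vma_prot)
{
    return model_pmd_modify(pmd, vma_prot);
}

/* Models the same path with a write-protect step added before the
 * entry would be installed. */
static model_pmd_t zero_page_fixup_wrprotected(model_pmd_t pmd,
                                               model_pmd_t vma_prot)
{
    pmd = model_pmd_modify(pmd, vma_prot);
    pmd = model_pmd_wrprotect(pmd);
    assert(!model_pmd_write(pmd));  /* the zero page must stay read-only */
    return pmd;
}
```

Whether vma->vm_page_prot can actually carry write permission on this path
in the real kernel is exactly the open question; the model only shows what
happens if it does.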