Date: Thu, 2 Oct 2014 08:57:32 -0700
From: Linus Torvalds
To: Hugh Dickins
Cc: Dave Jones, Al Viro, Linux Kernel, Rik van Riel, Ingo Molnar, Michel Lespinasse, "Kirill A. Shutemov", Mel Gorman, Sasha Levin
Subject: Re: pipe/page fault oddness.
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 2, 2014 at 1:47 AM, Hugh Dickins wrote:
>
> I hesitate to admit, I still don't see it: please illuminate further.

No, you're looking at what I was looking at.

> We're talking about the loop in __split_huge_page_map(), where it does

Yes.

>         entry = mk_pte(page + i, vma->vm_page_prot);
>         entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>         if (!pmd_write(*pmd))
>                 entry = pte_wrprotect(entry);
>         if (!pmd_young(*pmd))
>                 entry = pte_mkold(entry);
>         if (pmd_numa(*pmd))
>                 entry = pte_mknuma(entry);
>
> , right? I only see that adding _PAGE_NUMA to _PAGE_PROTNONE if
> pmd_numa(*pmd): but that would mean we had already gone wrong, setting
> pmd_numa in a PROT_NONE vma, which task_numa_work takes care not to do;
> or have mprotected an area to PROT_NONE without doing the pmd_mknonnuma.

Fair enough.
Except this code has no locking that I see, so if we *ever* see that
numa entry in the pmd while walking the page tables in vmscan, we're
basically screwed.

> Or are you noticing a deficiency in the pmd locking? I have not
> worked my way through that, so cannot guarantee it, but please
> point me to the weakness where you see it.

So I don't see any locking at all wrt mprotect (or new mmap). That's
kind of the whole point for page-out - it bypasses all the normal VM
locks, and only uses the last-level pte locking.

So the whole use of vma->vm_page_prot here is a bit scary. That gets
modified outside of the page table locks. So how do you know it's not
already PROT_NONE, but mprotect just hasn't gotten to actually take
the page table locks yet?

I dunno. It all makes me just very nervous.

The whole "numa bit is separate from the protections, has different
locking, and is just oddly and subtly different" is really what I
fundamentally object to. And it seems so _unnecessary_. All this odd
complexity for no actual gain - just extra code, and extra room for
subtle bugs.

Which is exactly why I hate that magic NUMA bit so much.

                 Linus