Date: Thu, 19 Feb 2015 17:01:04 +0000
From: Mel Gorman
To: David Vrabel
Cc: Linus Torvalds, Wei Liu, "linux-kernel@vger.kernel.org", "Xen-devel@lists.xen.org"
Subject: Re: NUMA_BALANCING and Xen PV guest regression in 3.20-rc0
Message-ID: <20150219170104.GS3087@suse.de>
References: <54E5DFED.9050700@citrix.com>
In-Reply-To: <54E5DFED.9050700@citrix.com>

On Thu, Feb 19, 2015 at 01:06:53PM +0000, David Vrabel wrote:
> Mel,
>
> The NUMA_BALANCING series beginning with 5d833062139d (mm: numa: do not
> dereference pmd outside of the lock during NUMA hinting fault) and
> specifically 8a0516ed8b90 (mm: convert p[te|md]_numa users to
> p[te|md]_protnone_numa) breaks Xen 64-bit PV guests.
>
> Any fault on a present userspace mapping (e.g., a write to a read-only
> mapping) is being misinterpreted as a NUMA hinting fault and not handled
> correctly. All userspace programs end up continuously faulting.
>
> This is because the hypervisor sets _PAGE_GLOBAL (== _PAGE_PROTNONE) on
> all present userspace page table entries.
>

I see, this is a variation of the problem where the NUMA hinted PTE was
treated as special due to the paravirt interfaces not being used.

> Note that the comment in asm/pgtable_types.h that says that
> _PAGE_BIT_PROTNONE is only valid on non-present entries.
>
> /* If _PAGE_BIT_PRESENT is clear, we use these: */
> /* - if the user mapped it with PROT_NONE; pte_present gives true */
> #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL
>
> Adjusting pte_protnone() and pmd_protnone() to check for the absence of
> _PAGE_PRESENT allows 64-bit Xen PV guests to work correctly again (see
> following patch), but I'm not sure if NUMA_BALANCING would correctly
> work with this change.
>

Thanks for the analysis and the reminder of some of the details from the
previous discussion.

>
> 8<---------------------------
> x86: pte_protnone() and pmd_protnone() must check entry is
> not present
>
> Since _PAGE_PROTNONE aliases _PAGE_GLOBAL it is only valid if
> _PAGE_PRESENT is clear. Make pte_protnone() and pmd_protnone() check
> for this.
>
> This fixes a 64-bit Xen PV guest regression introduced by
> 8a0516ed8b90c95ffa1363b420caa37418149f21 (mm: convert p[te|md]_numa
> users to p[te|md]_protnone_numa). Any userspace process would
> endlessly fault.
>
> In a 64-bit PV guest, userspace page table entries have _PAGE_GLOBAL
> set by the hypervisor. This meant that any fault on a present
> userspace entry (e.g., a write to a read-only mapping) would be
> misinterpreted as a NUMA hinting fault and the fault would not be
> correctly handled, resulting in the access endlessly faulting.
>
> Signed-off-by: David Vrabel
> Cc: Mel Gorman

I cannot think of a reason why this would fail for NUMA balancing on bare
metal. The PAGE_NONE protection clears the present bit on p[te|md]_modify
so the expectations are matched before or after the patch is applied. So,
for bare metal at least

Acked-by: Mel Gorman

I *think* this will work ok with Xen but I cannot 100% convince myself.
I'm adding Wei Liu to the cc who may have a Xen PV setup handy that
supports NUMA and may be able to test the patch to confirm.
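
For anyone following along, the aliasing described above can be modelled in
a few lines of userspace C. This is only a sketch, not the kernel code and
not the diff from David's patch: every MODEL_* and model_* name is made up
for illustration, and the bit positions (0 for present, 8 for global) are
assumed to be the usual x86 values. It compares a protnone test that ignores
the present bit with one that requires it to be clear, which is the change
the patch description asks for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86 bit positions; PROTNONE aliases GLOBAL as in the quoted header. */
#define MODEL_PAGE_BIT_PRESENT  0
#define MODEL_PAGE_BIT_GLOBAL   8
#define MODEL_PAGE_BIT_PROTNONE MODEL_PAGE_BIT_GLOBAL

#define MODEL_PAGE_PRESENT  (UINT64_C(1) << MODEL_PAGE_BIT_PRESENT)
#define MODEL_PAGE_GLOBAL   (UINT64_C(1) << MODEL_PAGE_BIT_GLOBAL)
#define MODEL_PAGE_PROTNONE (UINT64_C(1) << MODEL_PAGE_BIT_PROTNONE)

/* Hypothetical helper: looks only at the PROTNONE bit, ignoring PRESENT. */
static inline bool model_protnone_unchecked(uint64_t flags)
{
	return (flags & MODEL_PAGE_PROTNONE) != 0;
}

/* Hypothetical helper: PROTNONE counts only when PRESENT is clear. */
static inline bool model_protnone_checked(uint64_t flags)
{
	return (flags & (MODEL_PAGE_PROTNONE | MODEL_PAGE_PRESENT)) ==
	       MODEL_PAGE_PROTNONE;
}

/* Walks the three cases discussed in the thread. */
static inline void model_self_check(void)
{
	/* Xen 64-bit PV userspace entry: present, GLOBAL set by the hypervisor. */
	uint64_t xen_pv_user = MODEL_PAGE_PRESENT | MODEL_PAGE_GLOBAL;
	/* PROT_NONE / NUMA hinting entry: PROTNONE set, PRESENT clear. */
	uint64_t prot_none = MODEL_PAGE_PROTNONE;
	/* Ordinary bare-metal userspace entry: present, GLOBAL clear. */
	uint64_t bare_metal_user = MODEL_PAGE_PRESENT;

	/* Without the present check the PV entry looks like a hinting entry. */
	assert(model_protnone_unchecked(xen_pv_user));
	assert(!model_protnone_checked(xen_pv_user));

	/* A real protnone entry is recognised by both tests. */
	assert(model_protnone_unchecked(prot_none));
	assert(model_protnone_checked(prot_none));

	/* A bare-metal present entry is rejected by both tests. */
	assert(!model_protnone_unchecked(bare_metal_user));
	assert(!model_protnone_checked(bare_metal_user));
}
```

Calling model_self_check() from a test main() passes all assertions. In this
model the two tests disagree only for the entry that is both present and has
the global bit set, which is the Xen PV case; the protnone and bare-metal
entries get the same answer either way, matching the bare-metal reasoning
above.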

-- 
Mel Gorman
SUSE Labs