Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753157AbcKRWMk convert rfc822-to-8bit (ORCPT ); Fri, 18 Nov 2016 17:12:40 -0500 Received: from aserp1040.oracle.com ([141.146.126.69]:26545 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752136AbcKRWMi (ORCPT ); Fri, 18 Nov 2016 17:12:38 -0500 Subject: Re: [PATCH v3 (re-send)] xen/gntdev: Use mempolicy instead of VM_IO flag to avoid NUMA balancing To: Hugh Dickins References: <1479413404-27332-1-git-send-email-boris.ostrovsky@oracle.com> Cc: Mel Gorman , david.vrabel@citrix.com, jgross@suse.com, xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, olaf@aepfle.de From: Boris Ostrovsky Message-ID: <2bf041f3-8918-3c6f-8afb-c9edcc03dcd9@oracle.com> Date: Fri, 18 Nov 2016 17:14:55 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8BIT X-Source-IP: aserv0022.oracle.com [141.146.126.234] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3805 Lines: 126 On 11/18/2016 04:51 PM, Hugh Dickins wrote: > On Thu, 17 Nov 2016, Boris Ostrovsky wrote: > >> Commit 9c17d96500f7 ("xen/gntdev: Grant maps should not be subject to >> NUMA balancing") set VM_IO flag to prevent grant maps from being >> subjected to NUMA balancing. >> >> It was discovered recently that this flag causes get_user_pages() to >> always fail with -EFAULT. >> >> check_vma_flags >> __get_user_pages >> __get_user_pages_locked >> __get_user_pages_unlocked >> get_user_pages_fast >> iov_iter_get_pages >> dio_refill_pages >> do_direct_IO >> do_blockdev_direct_IO >> do_blockdev_direct_IO >> ext4_direct_IO_read >> generic_file_read_iter >> aio_run_iocb >> >> (which can happen if guest's vdisk has direct-io-safe option). >> >> To avoid this don't use vm_flags. Instead, use mempolicy that prohibits >> page migration (i.e. clear MPOL_F_MOF|MPOL_F_MORON) and make sure we >> don't consult task's policy (which may include those flags) if vma >> doesn't have one. >> >> Reported-by: Olaf Hering >> Signed-off-by: Boris Ostrovsky >> Cc: stable@vger.kernel.org > Hmm, sorry, but this seems overcomplicated to me: ingenious, but an > unusual use of the ->get_policy method, which is a little worrying, > since it has only been used for shmem (+ shm and kernfs) until now. > > Maybe I'm wrong, but wouldn't substituting VM_MIXEDMAP for VM_IO > solve the problem more simply? It would indeed. I didn't want to use it because it has specific meaning ("Can contain "struct page" and pure PFN pages") and that didn't seem like the right flag to describe this vma. -boris > > Hugh > >> --- >> >> Mis-spelled David's address. >> >> Changes in v3: >> * Don't use __mpol_dup() and get_task_policy() which are not exported >> for use by drivers. Add vm_operations_struct.get_policy(). >> * Copy to stable >> >> >> drivers/xen/gntdev.c | 27 ++++++++++++++++++++++++++- >> 1 files changed, 26 insertions(+), 1 deletions(-) >> >> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c >> index bb95212..632edd4 100644 >> --- a/drivers/xen/gntdev.c >> +++ b/drivers/xen/gntdev.c >> @@ -35,6 +35,7 @@ >> #include >> #include >> #include >> +#include >> >> #include >> #include >> @@ -433,10 +434,28 @@ static void gntdev_vma_close(struct vm_area_struct *vma) >> return map->pages[(addr - map->pages_vm_start) >> PAGE_SHIFT]; >> } >> >> +#ifdef CONFIG_NUMA >> +/* >> + * We have this op to make sure callers (such as vma_policy_mof()) don't >> + * check current task's policy which may include migrate flags (MPOL_F_MOF >> + * or MPOL_F_MORON) >> + */ >> +static struct mempolicy *gntdev_vma_get_policy(struct vm_area_struct *vma, >> + unsigned long addr) >> +{ >> + if (mpol_needs_cond_ref(vma->vm_policy)) >> + mpol_get(vma->vm_policy); >> + return vma->vm_policy; >> +} >> +#endif >> + >> static const struct vm_operations_struct gntdev_vmops = { >> .open = gntdev_vma_open, >> .close = gntdev_vma_close, >> .find_special_page = gntdev_vma_find_special_page, >> +#ifdef CONFIG_NUMA >> + .get_policy = gntdev_vma_get_policy, >> +#endif >> }; >> >> /* ------------------------------------------------------------------ */ >> @@ -1007,7 +1026,13 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma) >> >> vma->vm_ops = &gntdev_vmops; >> >> - vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP | VM_IO; >> + vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP; >> + >> +#ifdef CONFIG_NUMA >> + /* Prevent NUMA balancing */ >> + if (vma->vm_policy) >> + vma->vm_policy->flags &= ~(MPOL_F_MOF | MPOL_F_MORON); >> +#endif >> >> if (use_ptemod) >> vma->vm_flags |= VM_DONTCOPY; >> -- >> 1.7.1 >> >>