Date: Mon, 7 Sep 2009 22:35:32 +0100 (BST)
From: Hugh Dickins <hugh.dickins@tiscali.co.uk>
To: Andrew Morton
Cc: Mel Gorman, Jeff Chua, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
    Linus Torvalds, Nick Piggin, Rik van Riel,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 4/8] mm: FOLL_DUMP replace FOLL_ANON

The "FOLL_ANON optimization" and its use_zero_page() test have caused
confusion and bugs: why does it test VM_SHARED? for the very good but
unsatisfying reason that VMware crashed without.  As we look to maybe
reinstating anonymous use of the ZERO_PAGE, we need to sort this out.

Easily done: it's silly for __get_user_pages() and follow_page() to be
guessing whether it's safe to assume that they're being used for a
coredump (which can take a shortcut snapshot where other uses must
handle a fault) - just tell them with GUP_FLAGS_DUMP and FOLL_DUMP.

get_dump_page() doesn't even want a ZERO_PAGE: an error suits fine.
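For context (a rough sketch, not part of this patch): the consuming side
is the ELF coredump loop, approximated below from the fs/binfmt_elf.c of
this era; size, limit, file, end_coredump and the dump_write()/dump_seek()
helpers all belong to the surrounding elf_core_dump().  Whether
get_dump_page() returns NULL for a genuine pte hole or because
follow_page() now says -EFAULT under FOLL_DUMP, the dumper just seeks:

	/*
	 * Sketch: dump one vma page by page, leaving a hole in the core
	 * file wherever get_dump_page() returns NULL - including when an
	 * untouched anonymous area is skipped via FOLL_DUMP.
	 */
	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
		struct page *page = get_dump_page(addr);
		int stop;

		if (page) {
			void *kaddr = kmap(page);
			stop = ((size += PAGE_SIZE) > limit) ||
				!dump_write(file, kaddr, PAGE_SIZE);
			kunmap(page);
			page_cache_release(page);
		} else {
			/* hole: advance the file offset without writing */
			stop = !dump_seek(file, PAGE_SIZE);
		}
		if (stop)
			goto end_coredump;
	}

So an enormous untouched anonymous area costs neither page nor page
table allocations: it simply becomes a sparse stretch of the dump.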
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
---

 include/linux/mm.h |    2 +-
 mm/internal.h      |    1 +
 mm/memory.c        |   43 ++++++++++++-------------------------------
 3 files changed, 14 insertions(+), 32 deletions(-)

--- mm3/include/linux/mm.h	2009-09-07 13:16:32.000000000 +0100
+++ mm4/include/linux/mm.h	2009-09-07 13:16:39.000000000 +0100
@@ -1247,7 +1247,7 @@ struct page *follow_page(struct vm_area_
 #define FOLL_WRITE	0x01	/* check pte is writable */
 #define FOLL_TOUCH	0x02	/* mark page accessed */
 #define FOLL_GET	0x04	/* do get_page on page */
-#define FOLL_ANON	0x08	/* give ZERO_PAGE if no pgtable */
+#define FOLL_DUMP	0x08	/* give error on hole if it would be zero */
 
 typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
 			void *data);
--- mm3/mm/internal.h	2009-09-07 13:16:22.000000000 +0100
+++ mm4/mm/internal.h	2009-09-07 13:16:39.000000000 +0100
@@ -252,6 +252,7 @@ static inline void mminit_validate_memmo
 
 #define GUP_FLAGS_WRITE		0x01
 #define GUP_FLAGS_FORCE		0x02
+#define GUP_FLAGS_DUMP		0x04
 
 int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		     unsigned long start, int len, int flags,
--- mm3/mm/memory.c	2009-09-07 13:16:32.000000000 +0100
+++ mm4/mm/memory.c	2009-09-07 13:16:39.000000000 +0100
@@ -1174,41 +1174,22 @@ no_page:
 	pte_unmap_unlock(ptep, ptl);
 	if (!pte_none(pte))
 		return page;
-	/* Fall through to ZERO_PAGE handling */
+
 no_page_table:
 	/*
 	 * When core dumping an enormous anonymous area that nobody
-	 * has touched so far, we don't want to allocate page tables.
+	 * has touched so far, we don't want to allocate unnecessary pages or
+	 * page tables.  Return error instead of NULL to skip handle_mm_fault,
+	 * then get_dump_page() will return NULL to leave a hole in the dump.
+	 * But we can only make this optimization where a hole would surely
+	 * be zero-filled if handle_mm_fault() actually did handle it.
 	 */
-	if (flags & FOLL_ANON) {
-		page = ZERO_PAGE(0);
-		if (flags & FOLL_GET)
-			get_page(page);
-		BUG_ON(flags & FOLL_WRITE);
-	}
+	if ((flags & FOLL_DUMP) &&
+	    (!vma->vm_ops || !vma->vm_ops->fault))
+		return ERR_PTR(-EFAULT);
 	return page;
 }
 
-/* Can we do the FOLL_ANON optimization? */
-static inline int use_zero_page(struct vm_area_struct *vma)
-{
-	/*
-	 * We don't want to optimize FOLL_ANON for make_pages_present()
-	 * when it tries to page in a VM_LOCKED region.  As to VM_SHARED,
-	 * we want to get the page from the page tables to make sure
-	 * that we serialize and update with any other user of that
-	 * mapping.
-	 */
-	if (vma->vm_flags & (VM_LOCKED | VM_SHARED))
-		return 0;
-	/*
-	 * And if we have a fault routine, it's not an anonymous region.
-	 */
-	return !vma->vm_ops || !vma->vm_ops->fault;
-}
-
-
-
 int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		     unsigned long start, int nr_pages, int flags,
 		     struct page **pages, struct vm_area_struct **vmas)
@@ -1288,8 +1269,8 @@ int __get_user_pages(struct task_struct
 		foll_flags = FOLL_TOUCH;
 		if (pages)
 			foll_flags |= FOLL_GET;
-		if (!write && use_zero_page(vma))
-			foll_flags |= FOLL_ANON;
+		if (flags & GUP_FLAGS_DUMP)
+			foll_flags |= FOLL_DUMP;
 
 		do {
 			struct page *page;
@@ -1447,7 +1428,7 @@ struct page *get_dump_page(unsigned long
 	struct page *page;
 
 	if (__get_user_pages(current, current->mm, addr, 1,
-			GUP_FLAGS_FORCE, &page, &vma) < 1)
+			GUP_FLAGS_FORCE | GUP_FLAGS_DUMP, &page, &vma) < 1)
 		return NULL;
 	if (page == ZERO_PAGE(0)) {
 		page_cache_release(page);