Date: Thu, 26 Sep 2013 09:59:24 +0800
From: Fengguang Wu
To: Bob Liu
Cc: Vlastimil Babka, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [munlock] BUG: Bad page map in process killall5 pte:cf17e720 pmd:05a22067
Message-ID: <20130926015924.GA10453@localhost>
References: <20130926004028.GB9394@localhost> <52439258.3010904@oracle.com>
In-Reply-To: <52439258.3010904@oracle.com>

Hi Bob,

On Thu, Sep 26, 2013 at 09:48:08AM +0800, Bob Liu wrote:
> Hi Fengguang,
>
> Would you please have a try with the attached patch?
> It added a small fix based on Vlastimil's patch.

Thanks for the quick response! I just noticed Andrew added this patch
to the -mm tree:

------------------------------------------------------
From: Vlastimil Babka
Subject: mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration

Which git tree is your v2 patch based on? If you already have a git
tree, it would be simpler to push it to a branch and tell me the
tree/branch/commit to test.

Thanks,
Fengguang

> On 09/26/2013 08:40 AM, Fengguang Wu wrote:
> > Hi Vlastimil,
> >
> > FYI, this bug seems still not fixed in linux-next 20130925.
> >
> > commit 7a8010cd36273ff5f6fea5201ef9232f30cebbd9
> > Author: Vlastimil Babka
> > Date:   Wed Sep 11 14:22:35 2013 -0700
> >
> >     mm: munlock: manual pte walk in fast path instead of follow_page_mask()
> >
> >     Currently munlock_vma_pages_range() calls follow_page_mask() to obtain
> >     each individual struct page.  This entails repeated full page table
> >     translations and page table lock taken for each page separately.
> >
> >     This patch avoids the costly follow_page_mask() where possible, by
> >     iterating over ptes within single pmd under single page table lock.  The
> >     first pte is obtained by get_locked_pte() for non-THP page acquired by the
> >     initial follow_page_mask().  The rest of the on-stack pagevec for munlock
> >     is filled up using pte_walk as long as pte_present() and vm_normal_page()
> >     are sufficient to obtain the struct page.
> >
> >     After this patch, a 14% speedup was measured for munlocking a 56GB large
> >     memory area with THP disabled.
> >
> >     Signed-off-by: Vlastimil Babka
> >     Cc: Jörn Engel
> >     Cc: Mel Gorman
> >     Cc: Michel Lespinasse
> >     Cc: Hugh Dickins
> >     Cc: Rik van Riel
> >     Cc: Johannes Weiner
> >     Cc: Michal Hocko
> >     Cc: Vlastimil Babka
> >     Signed-off-by: Andrew Morton
> >     Signed-off-by: Linus Torvalds
> >
> >
> > [   89.835504] init: plymouth-upstart-bridge main process (3556) terminated with status 1
> > [   89.986606] init: tty6 main process (3529) killed by TERM signal
> > [   91.414086] BUG: Bad page map in process killall5  pte:cf17e720 pmd:05a22067
> > [   91.416626] addr:bfc00000 vm_flags:00100173 anon_vma:cf128c80 mapping: (null) index:bfff0
> > [   91.419402] CPU: 0 PID: 3574 Comm: killall5 Not tainted 3.12.0-rc1-00010-g5fbc0a6 #24
> > [   91.422171] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [   91.423998]  00000000 00000000 c0199e34 c1db5db4 00000000 c0199e54 c10e72d4 000bfff0
> > [   91.427933]  00000000 bfc00000 00000000 000cf17e cf17e720 c0199e74 c10e7995 00000000
> > [   91.431940]  bfc00000 cf1ca190 bfc00000 cf180000 cf1ca190 c0199ee0 c10eb8cf ce6d1900
> > [   91.435894] Call Trace:
> > [   91.436969]  [] dump_stack+0x4b/0x66
> > [   91.438503]  [] print_bad_pte+0x14b/0x162
> > [   91.440204]  [] vm_normal_page+0x67/0x9b
> > [   91.441811]  [] munlock_vma_pages_range+0xf9/0x176
> > [   91.443633]  [] exit_mmap+0x86/0xf7
> > [   91.445156]  [] ? lock_release+0x169/0x1ef
> > [   91.446795]  [] ? rcu_read_unlock+0x17/0x23
> > [   91.448465]  [] ? exit_aio+0x2b/0x6c
> > [   91.449990]  [] mmput+0x6a/0xcb
> > [   91.451508]  [] do_exit+0x362/0x8be
> > [   91.453013]  [] ? hrtimer_debug_hint+0xd/0xd
> > [   91.454700]  [] do_group_exit+0x51/0x9e
> > [   91.456296]  [] SyS_exit_group+0x16/0x16
> > [   91.457901]  [] sysenter_do_call+0x12/0x33
> > [   91.459553] Disabling lock debugging due to kernel taint
> >
> > git bisect start 272b98c6455f00884f0350f775c5342358ebb73f v3.11 --
> > git bisect good 57d730924d5cc2c3e280af16a9306587c3a511db  # 02:21  495+  Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect good 3bb22ec53e2bd12a241ed84359bffd591a40ab87  # 12:03  495+  staging/lustre/ptlrpc: convert to new shrinker API
> > git bisect bad a5b7c87f92076352dbff2fe0423ec255e1c9a71b  # 12:18   31-  vmscan, memcg: do softlimit reclaim also for targeted reclaim
> > git bisect good 3d94ea51c1d8db6f41268a9d2aea5f5771e9a8d3  # 15:40  495+  ocfs2: clean up dead code in ocfs2_acl_from_xattr()
> > git bisect bad d62a201f24cba74e2fbf9f6f7af86ff5f5e276fc  # 16:46   79-  checkpatch: enforce sane perl version
> > git bisect good 83467efbdb7948146581a56cbd683a22a0684bbb  # 01:29  585+  mm: migrate: check movability of hugepage in unmap_and_move_huge_page()
> > git bisect bad 2bff24a3707093c435ab3241c47dcdb5f16e432b  # 02:07  148-  memcg: fix multiple large threshold notifications
> > git bisect bad 1ecfd533f4c528b0b4cc5bc115c4c47f0b5e4828  # 02:34   64-  mm/mremap.c: call pud_free() after fail calling pmd_alloc()
> > git bisect good 0ec3b74c7f5599c8a4d2b33d430a5470af26ebf6  # 13:10 1170+  mm: putback_lru_page: remove unnecessary call to page_lru_base_type()
> > git bisect good 5b40998ae35cf64561868370e6c9f3d3e94b6bf7  # 16:52 1170+  mm: munlock: remove redundant get_page/put_page pair on the fast path
> > git bisect bad 187320932dcece9c4b93f38f56d1f888bd5c325f  # 17:11    0-  mm/sparse: introduce alloc_usemap_and_memmap
> > git bisect bad 6e543d5780e36ff5ee56c44d7e2e30db3457a7ed  # 17:29    2-  mm: vmscan: fix do_try_to_free_pages() livelock
> > git bisect bad 7a8010cd36273ff5f6fea5201ef9232f30cebbd9  # 17:59   14-  mm: munlock: manual pte walk in fast path instead of follow_page_mask()
> > git bisect good 5b40998ae35cf64561868370e6c9f3d3e94b6bf7  # 22:10 3510+  mm: munlock: remove redundant get_page/put_page pair on the fast path
> > git bisect bad 5fbc0a6263a147cde905affbfb6622c26684344f  # 22:10    0-  Merge remote-tracking branch 'pinctrl/for-next' into kbuild_tmp
> > git bisect good 87e37036dcf96eb73a8627524be8b722bd1ac526  # 04:31 3510+  Revert "mm: munlock: manual pte walk in fast path instead of follow_page_mask()"
> > git bisect bad 22356f447ceb8d97a4885792e7d9e4607f712e1b  # 04:40   48-  mm: Place preemption point in do_mlockall() loop
> > git bisect bad 050f4da86e9bdbcc9e11789e0f291aafa57b8a20  # 04:55  133-  Add linux-next specific files for 20130925
> >
> > Thanks,
> > Fengguang
>
> >From aef673d802a92aef8dc082c244fef51ae9c4a13c Mon Sep 17 00:00:00 2001
> From: Bob Liu
> Date: Thu, 26 Sep 2013 09:41:27 +0800
> Subject: [PATCH v2] mm: munlock: Prevent walking off the end of a pagetable in
>  no-pmd configuration
>
> The function __munlock_pagevec_fill() introduced in commit 7a8010cd3
> ("mm: munlock: manual pte walk in fast path instead of follow_page_mask()")
> uses pmd_addr_end() for restricting its operation within current page table.
> This is insufficient on architectures/configurations where pmd is folded
> and pmd_addr_end() just returns the end of the full range to be walked.  In
> this case, it allows pte++ to walk off the end of a page table resulting in
> unpredictable behaviour.
>
> This patch fixes the function by using pgd_addr_end() and pud_addr_end()
> before pmd_addr_end(), which will yield the correct page table boundary on
> all configurations.  This is similar to what existing page walkers do when
> walking each level of the page table.
>
> Additionally, the patch clarifies a comment for the get_locked_pte() call
> in the function.
>
> v2: walk the page table after start += PAGE_SIZE
>
> Reported-by: Fengguang Wu
> Signed-off-by: Vlastimil Babka
> Signed-off-by: Bob Liu
> ---
>  mm/mlock.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/mm/mlock.c b/mm/mlock.c
> index d638026..a91114a 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -379,13 +379,19 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
>
>  	/*
>  	 * Initialize pte walk starting at the already pinned page where we
> -	 * are sure that there is a pte.
> +	 * are sure that there is a pte, as it was pinned under the same
> +	 * mmap_sem write op.
>  	 */
>  	pte = get_locked_pte(vma->vm_mm, start, &ptl);
> -	end = min(end, pmd_addr_end(start, end));
>
>  	/* The page next to the pinned page is the first we will try to get */
>  	start += PAGE_SIZE;
> +
> +	/* Make sure we do not cross the page table boundary */
> +	end = pgd_addr_end(start, end);
> +	end = pud_addr_end(start, end);
> +	end = pmd_addr_end(start, end);
> +
>  	while (start < end) {
>  		struct page *page = NULL;
>  		pte++;
> --
> 1.7.10.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/