Message-ID: <539F5BC5.3010501@oracle.com>
Date: Mon, 16 Jun 2014 17:04:05 -0400
From: Sasha Levin
To: Hugh Dickins
CC: "Kirill A. Shutemov", Konstantin Khlebnikov, Mel Gorman, Bob Liu, linux-mm@kvack.org, Christoph Lameter, Andrew Morton, LKML, Dave Jones
Subject: Re: mm: NULL ptr deref in remove_migration_pte
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/10/2014 12:20 AM, Hugh Dickins wrote:
> On Tue, 27 May 2014, Sasha Levin wrote:
>> On 05/26/2014 04:05 PM, Hugh Dickins wrote:
>>> On Fri, 23 May 2014, Sasha Levin wrote:
>>>
>>>> Ping?
>>>>
>>>> On 05/05/2014 11:51 AM, Sasha Levin wrote:
>>>>> Did anyone have a chance to look at it? I still see it in -next.
>>>>>
>>>>> Thanks,
>>>>> Sasha
>>>>>
>>>>> On 04/16/2014 10:59 AM, Sasha Levin wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> While fuzzing with trinity inside a KVM tools guest running latest -next
>>>>>> kernel I've stumbled on the following:
>>>>>>
>>>>>> [ 2552.313602] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
>>>>>> [ 2552.315878] IP: __lock_acquire (kernel/locking/lockdep.c:3070 (discriminator 1))
>>>>>> [ 2552.315878] PGD 465836067 PUD 465837067 PMD 0
>>>>>> [ 2552.315878] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>>>>>> [ 2552.315878] Dumping ftrace buffer:
>>>>>> [ 2552.315878] (ftrace buffer empty)
>>>>>> [ 2552.315878] Modules linked in:
>>>>>> [ 2552.315878] CPU: 6 PID: 16173 Comm: trinity-c364 Tainted: G W 3.15.0-rc1-next-20140415-sasha-00020-gaa90d09 #398
>>>>>> [ 2552.315878] task: ffff88046548b000 ti: ffff88044e532000 task.ti: ffff88044e532000
>>>>>> [ 2552.320286] RIP: __lock_acquire (kernel/locking/lockdep.c:3070 (discriminator 1))
>>>>>> [ 2552.320286] RSP: 0018:ffff88044e5339c8 EFLAGS: 00010002
>>>>>> [ 2552.320286] RAX: 0000000000000082 RBX: ffff88046548b000 RCX: 0000000000000000
>>>>>> [ 2552.320286] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000018
>>>>>> [ 2552.320286] RBP: ffff88044e533ab8 R08: 0000000000000001 R09: 0000000000000000
>>>>>> [ 2552.320286] R10: ffff88046548b000 R11: 0000000000000001 R12: 0000000000000000
>>>>>> [ 2552.320286] R13: 0000000000000018 R14: 0000000000000000 R15: 0000000000000000
>>>>>> [ 2552.320286] FS: 00007fd286a9a700(0000) GS:ffff88018b000000(0000) knlGS:0000000000000000
>>>>>> [ 2552.320286] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>>> [ 2552.320286] CR2: 0000000000000018 CR3: 0000000442c17000 CR4: 00000000000006a0
>>>>>> [ 2552.320286] DR0: 0000000000695000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>> [ 2552.320286] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>>>>>> [ 2552.320286] Stack:
>>>>>> [ 2552.320286] ffff88044e5339e8 ffffffff9f56e761 0000000000000000 ffff880315c13000
>>>>>> [ 2552.320286] ffff88044e533a38 ffffffff9c193f0d ffffffff9c193e34 ffff8804654e8000
>>>>>> [ 2552.320286] ffff8804654e8000 0000000000000001 ffff88046548b000 0000000000000007
>>>>>> [ 2552.320286] Call Trace:
>>>>>> [ 2552.320286] ? _raw_spin_unlock_irq (arch/x86/include/asm/preempt.h:98 include/linux/spinlock_api_smp.h:169 kernel/locking/spinlock.c:199)
>>>>>> [ 2552.320286] ? finish_task_switch (include/linux/tick.h:206 kernel/sched/core.c:2163)
>>>>>> [ 2552.320286] ? finish_task_switch (arch/x86/include/asm/current.h:14 kernel/sched/sched.h:993 kernel/sched/core.c:2145)
>>>>>> [ 2552.320286] ? retint_restore_args (arch/x86/kernel/entry_64.S:1040)
>>>>>> [ 2552.320286] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
>>>>>> [ 2552.320286] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 kernel/locking/lockdep.c:2599)
>>>>>> [ 2552.320286] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
>>>>>> [ 2552.320286] ? remove_migration_pte (mm/migrate.c:137)
>>>>>> [ 2552.320286] ? retint_restore_args (arch/x86/kernel/entry_64.S:1040)
>>>>>> [ 2552.320286] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
>>>>>> [ 2552.320286] ? remove_migration_pte (mm/migrate.c:137)
>>>>>> [ 2552.320286] remove_migration_pte (mm/migrate.c:137)
>>>>>> [ 2552.320286] rmap_walk (mm/rmap.c:1628 mm/rmap.c:1699)
>>>>>> [ 2552.320286] remove_migration_ptes (mm/migrate.c:224)
>>>>>> [ 2552.320286] ? new_page_node (mm/migrate.c:107)
>>>>>> [ 2552.320286] ? remove_migration_pte (mm/migrate.c:195)
>>>>>> [ 2552.320286] migrate_pages (mm/migrate.c:922 mm/migrate.c:960 mm/migrate.c:1126)
>>>>>> [ 2552.320286] ? perf_trace_mm_numa_migrate_ratelimit (mm/migrate.c:1574)
>>>>>> [ 2552.320286] migrate_misplaced_page (mm/migrate.c:1733)
>>>>>> [ 2552.320286] __handle_mm_fault (mm/memory.c:3762 mm/memory.c:3812 mm/memory.c:3925)
>>>>>> [ 2552.320286] ? __const_udelay (arch/x86/lib/delay.c:126)
>>>>>> [ 2552.320286] ? __rcu_read_unlock (kernel/rcu/update.c:97)
>>>>>> [ 2552.320286] handle_mm_fault (mm/memory.c:3948)
>>>>>> [ 2552.320286] __get_user_pages (mm/memory.c:1851)
>>>>>> [ 2552.320286] ? preempt_count_sub (kernel/sched/core.c:2527)
>>>>>> [ 2552.320286] __mlock_vma_pages_range (mm/mlock.c:255)
>>>>>> [ 2552.320286] __mm_populate (mm/mlock.c:711)
>>>>>> [ 2552.320286] SyS_mlockall (include/linux/mm.h:1799 mm/mlock.c:817 mm/mlock.c:791)
>>>>>> [ 2552.320286] tracesys (arch/x86/kernel/entry_64.S:749)
>>>>>> [ 2552.320286] Code: 85 2d 1e 00 00 48 c7 c1 d7 68 6c a0 48 c7 c2 47 11 6c a0 31 c0 be fa 0b 00 00 48 c7 c7 91 68 6c a0 e8 1c 6d f9 ff e9 07 1e 00 00 <49> 81 7d 00 80 31 76 a2 b8 00 00 00 00 44 0f 44 c0 eb 07 0f 1f
>>>>>> [ 2552.320286] RIP __lock_acquire (kernel/locking/lockdep.c:3070 (discriminator 1))
>>>>>> [ 2552.320286] RSP
>>>>>> [ 2552.320286] CR2: 0000000000000018
>>>
>>> Sasha, please clarify your Ping: I've seen you say in other mail
>>> "I had to disable transhuge/hugetlb in my testing .config".
>>>
>>> Do you see this remove_migration_pte oops even with THP disabled?
>>>
>>> Do you see the filemap.c:202 BUG_ON(page_mapped(page))
>>> even with THP disabled?
>>
>> The mail that you mentioned prompted me to go back and re-enable THP and
>> see what still breaks, which would explain why I pinged this thread again (I
>> only do that once I see that problem still occurs).
>>
>> However, I can't confirm if these problems happen without THP as I didn't
>> think they were related. I'll disable THP again and give it a go.
> Although there's nothing in the backtrace to implicate it,
> I think this crash is caused by THP: please try this patch - thanks.
>
> [PATCH] mm: let mm_find_pmd fix buggy race with THP fault
>
> Trinity has reported:
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
> IP: __lock_acquire (kernel/locking/lockdep.c:3070 (discriminator 1))
> CPU: 6 PID: 16173 Comm: trinity-c364 Tainted: G W
> 3.15.0-rc1-next-20140415-sasha-00020-gaa90d09 #398
> lock_acquire (arch/x86/include/asm/current.h:14
> kernel/locking/lockdep.c:3602)
> _raw_spin_lock (include/linux/spinlock_api_smp.h:143
> kernel/locking/spinlock.c:151)
> remove_migration_pte (mm/migrate.c:137)
> rmap_walk (mm/rmap.c:1628 mm/rmap.c:1699)
> remove_migration_ptes (mm/migrate.c:224)
> migrate_pages (mm/migrate.c:922 mm/migrate.c:960 mm/migrate.c:1126)
> migrate_misplaced_page (mm/migrate.c:1733)
> __handle_mm_fault (mm/memory.c:3762 mm/memory.c:3812 mm/memory.c:3925)
> handle_mm_fault (mm/memory.c:3948)
> __get_user_pages (mm/memory.c:1851)
> __mlock_vma_pages_range (mm/mlock.c:255)
> __mm_populate (mm/mlock.c:711)
> SyS_mlockall (include/linux/mm.h:1799 mm/mlock.c:817 mm/mlock.c:791)
>
> I believe this comes about because, whereas collapsing and splitting
> THP functions take anon_vma lock in write mode (which excludes
> concurrent rmap walks), faulting THP functions (write protection and
> misplaced NUMA) do not - and mostly they do not need to.
>
> But they do use a pmdp_clear_flush(), set_pmd_at() sequence which,
> for an instant (indeed, for a long instant, given the inter-CPU
> TLB flush in there), leaves *pmd neither present nor trans_huge.
>
> Which can confuse a concurrent rmap walk, as when removing migration
> ptes, seen in the dumped trace. Although that rmap walk has a 4k
> page to insert, anon_vmas containing THPs are in no way segregated
> from 4k-page anon_vmas, so the 4k-intent mm_find_pmd() does need to
> cope with that instant when a trans_huge pmd is temporarily absent.
>
> I don't think we need to strengthen the locking at the THP end: it's
> easily handled with an ACCESS_ONCE() before testing both conditions.
>
> And since mm_find_pmd() had only one caller who wanted a THP rather
> than a pmd, let's slightly repurpose it to fail when it hits a THP
> or non-present pmd, and open code split_huge_page_address() again.
>
> Reported-by: Sasha Levin
> Signed-off-by: Hugh Dickins

Hi Hugh,

It took some time to hit something here, but I think that the following is related:

[ 489.152166] INFO: trying to register non-static key.
[ 489.152166] the code is fine but needs lockdep annotation.
[ 489.152166] turning off the locking correctness validator.
[ 489.152166] CPU: 23 PID: 12148 Comm: trinity-c79 Not tainted 3.15.0-next-20140616-sasha-00025-g0fd1f7d-dirty #657
[ 489.152166] ffff8804dd013000 ffff8804e15a38e8 ffffffff965140d1 0000000000000002
[ 489.152166] ffffffff9a5ce7c0 ffff8804e15a39e8 ffffffff931ca363 ffff8804e15a3928
[ 489.152166] 0000000000000000 0000000000000000 ffff8804e4730978 0000000000000001
[ 489.152166] Call Trace:
[ 489.152166] dump_stack (lib/dump_stack.c:52)
[ 489.152166] __lock_acquire (kernel/locking/lockdep.c:743 kernel/locking/lockdep.c:3078)
[ 489.152166] ? __lock_acquire (kernel/locking/lockdep.c:3189)
[ 489.152166] ? kvm_clock_read (./arch/x86/include/asm/preempt.h:90 arch/x86/kernel/kvmclock.c:86)
[ 489.152166] lock_acquire (./arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 489.152166] ? __page_check_address (include/linux/spinlock.h:303 mm/rmap.c:630)
[ 489.152166] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 489.152166] ? __page_check_address (include/linux/spinlock.h:303 mm/rmap.c:630)
[ 489.152166] ? get_parent_ip (kernel/sched/core.c:2546)
[ 489.152166] __page_check_address (include/linux/spinlock.h:303 mm/rmap.c:630)
[ 489.152166] try_to_unmap_one (mm/rmap.c:1153)
[ 489.152166] ? __const_udelay (arch/x86/lib/delay.c:126)
[ 489.152166] ? __rcu_read_unlock (kernel/rcu/update.c:97)
[ 489.152166] ? page_lock_anon_vma_read (mm/rmap.c:448)
[ 489.152166] rmap_walk (mm/rmap.c:1654 mm/rmap.c:1725)
[ 489.152166] ? preempt_count_sub (kernel/sched/core.c:2602)
[ 489.152166] try_to_unmap (mm/rmap.c:1547)
[ 489.152166] ? page_remove_rmap (mm/rmap.c:1144)
[ 489.152166] ? invalid_migration_vma (mm/rmap.c:1503)
[ 489.152166] ? try_to_unmap_one (mm/rmap.c:1411)
[ 489.152166] ? anon_vma_prepare (mm/rmap.c:448)
[ 489.152166] ? invalid_mkclean_vma (mm/rmap.c:1498)
[ 489.152166] ? page_get_anon_vma (mm/rmap.c:405)
[ 489.152166] migrate_pages (mm/migrate.c:913 mm/migrate.c:959 mm/migrate.c:1146)
[ 489.152166] ? _raw_spin_unlock_irq (./arch/x86/include/asm/preempt.h:98 include/linux/spinlock_api_smp.h:169 kernel/locking/spinlock.c:199)
[ 489.152166] ? perf_trace_mm_numa_migrate_ratelimit (mm/migrate.c:1594)
[ 489.152166] migrate_misplaced_page (mm/migrate.c:1754)
[ 489.152166] __handle_mm_fault (mm/memory.c:3157 mm/memory.c:3207 mm/memory.c:3317)
[ 489.152166] handle_mm_fault (include/linux/memcontrol.h:151 mm/memory.c:3343)
[ 489.152166] ? __do_page_fault (arch/x86/mm/fault.c:1163)
[ 489.152166] __do_page_fault (arch/x86/mm/fault.c:1230)
[ 489.152166] ? vtime_account_user (kernel/sched/cputime.c:687)
[ 489.152166] ? get_parent_ip (kernel/sched/core.c:2546)
[ 489.152166] ? preempt_count_sub (kernel/sched/core.c:2602)
[ 489.152166] ? context_tracking_user_exit (kernel/context_tracking.c:184)
[ 489.152166] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 489.152166] ? trace_hardirqs_off_caller (kernel/locking/lockdep.c:2638 (discriminator 2))
[ 489.152166] trace_do_page_fault (arch/x86/mm/fault.c:1313 include/linux/jump_label.h:115 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1314)
[ 489.152166] do_async_page_fault (arch/x86/kernel/kvm.c:264)
[ 489.152166] async_page_fault (arch/x86/kernel/entry_64.S:1322)
[ 494.710068] =============================================================================
[ 494.710068] BUG page->ptl (Not tainted): Redzone overwritten
[ 494.710068] -----------------------------------------------------------------------------
[ 494.710068]
[ 494.710068] INFO: 0xffff8804e4730e58-0xffff8804e4730e5f. First byte 0x0 instead of 0xbb
[ 494.710068] INFO: Slab 0xffffea001391cc00 objects=40 used=40 fp=0x (null) flags=0x56fffff80004080
[ 494.710068] INFO: Object 0xffff8804e4730e10 @offset=3600 fp=0x (null)
[ 494.710068]
[ 494.710068] Bytes b4 ffff8804e4730e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 494.710068] Object ffff8804e4730e10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 494.710068] Object ffff8804e4730e20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 494.710068] Object ffff8804e4730e30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 494.710068] Object ffff8804e4730e40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 494.710068] Object ffff8804e4730e50: 00 00 00 00 00 00 00 00 ........
[ 494.710068] Redzone ffff8804e4730e58: 00 00 00 00 00 00 00 00 ........
[ 494.710068] Padding ffff8804e4730f98: 00 00 00 00 00 00 00 00 ........
[ 494.710068] CPU: 21 PID: 12452 Comm: trinity-c128 Tainted: G B 3.15.0-next-20140616-sasha-00025-g0fd1f7d-dirty #657
[ 494.710068] ffff8804e4730e10 ffff88040b7d3980 ffffffff965140d1 0000000000000001
[ 494.710068] ffff88003680bb80 ffff88040b7d39b0 ffffffff932eac11 ffff8804e4730e60
[ 494.710068] ffff88003680bb80 00000000000000bb ffff8804e4730e10 ffff88040b7d3a00
[ 494.710068] Call Trace:
[ 494.710068] dump_stack (lib/dump_stack.c:52)
[ 494.710068] print_trailer (mm/slub.c:641)
[ 494.710068] check_bytes_and_report (mm/slub.c:680 mm/slub.c:704)
[ 494.710068] check_object (mm/slub.c:804)
[ 494.710068] ? ptlock_alloc (mm/memory.c:3826)
[ 494.742119] alloc_debug_processing (mm/slub.c:1082)
[ 494.742119] __slab_alloc (mm/slub.c:2382 (discriminator 1))
[ 494.742119] ? ptlock_alloc (mm/memory.c:3826)
[ 494.742119] ? get_parent_ip (kernel/sched/core.c:2546)
[ 494.742119] kmem_cache_alloc (mm/slub.c:2442 mm/slub.c:2484 mm/slub.c:2489)
[ 494.742119] ? ptlock_alloc (mm/memory.c:3826)
[ 494.742119] ? pte_alloc_one (arch/x86/mm/pgtable.c:28)
[ 494.742119] ? copy_huge_pmd (./arch/x86/include/asm/paravirt.h:571 ./arch/x86/include/asm/pgtable.h:168 mm/huge_memory.c:867)
[ 494.742119] ptlock_alloc (mm/memory.c:3826)
[ 494.742119] pte_alloc_one (include/linux/mm.h:1464 include/linux/mm.h:1499 arch/x86/mm/pgtable.c:30)
[ 494.742119] copy_huge_pmd (mm/huge_memory.c:858)
[ 494.742119] copy_page_range (mm/memory.c:968 mm/memory.c:998 mm/memory.c:1062)
[ 494.742119] copy_process (kernel/fork.c:460 kernel/fork.c:835 kernel/fork.c:898 kernel/fork.c:1346)
[ 494.742119] ? trace_hardirqs_off_caller (kernel/locking/lockdep.c:2619)
[ 494.742119] do_fork (kernel/fork.c:1607)
[ 494.742119] ? get_parent_ip (kernel/sched/core.c:2546)
[ 494.742119] ? context_tracking_user_exit (./arch/x86/include/asm/paravirt.h:809 (discriminator 2) kernel/context_tracking.c:184 (discriminator 2))
[ 494.742119] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2564)
[ 494.742119] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
[ 494.742119] SyS_clone (kernel/fork.c:1693)
[ 494.742119] stub_clone (arch/x86/kernel/entry_64.S:637)
[ 494.742119] ? tracesys (arch/x86/kernel/entry_64.S:542)
[ 494.742119] FIX page->ptl: Restoring 0xffff8804e4730e58-0xffff8804e4730e5f=0xbb
[ 494.742119]
[ 494.742119] FIX page->ptl: Marking all objects used

Thanks,
Sasha